
Nix install runs out of memory and fails when binary package size > RAM size #1969

Closed
adevress opened this issue Mar 12, 2018 · 27 comments

@adevress
Contributor

In the case of a binary installation where the size of the NAR exceeds the size of the RAM, I hit an issue where Nix 2.0 simply runs out of memory and aborts.

unexpected error in download thread: std::bad_alloc
download of 'http://********.cache.somewhere/cache/nar/yxnmlrrdng8ff8xk5lrxahvzjyc1i01p.nar.bz2' was interrupted
cannot build derivation '/nix/store/0m6yzz1x57zr81pvvlx6yqcbhnifz74q-all-modules.drv': 1 dependencies couldn't be built
error: build of '/nix/store/0m6yzz1x57zr81pvvlx6yqcbhnifz74q-all-modules.drv', '/nix/store/xfr1cbmd8j46dh2dgv20vcqa15fnspd0-all-benchs.drv' failed
Build step 'Execute shell' marked build as failure

This was triggered by a simple "nix-build" and did not happen before Nix 2.0.

@adevress
Contributor Author

This is a major issue because installing something like CUDA 9.0 (> 2 GB) on a medium machine with 4 GB RAM fails...

@edolstra
Member

This should partially be addressed by #619 / #1754. However another reason for the memory use regression is that evaluation and building are now done in a single process (i.e. nix-build is no longer a wrapper around nix-instantiate and nix-store -r). A solution might be to force a Boehm GC run after evaluation.

@adevress
Contributor Author

Thx @edolstra

@adevress
Contributor Author

Indeed:

This is on something more "lightweight", compiling GCC 5.

There is a memory leak somewhere:

c-family/.deps/c-gimplify.TPo ../../gcc-5.5.0/gcc/c-family/c-gimplify.c

error: out of memory
Build step 'Execute shell' marked build as failure
Finished: FAILURE

adevress added a commit to BlueBrain/bbp-nixpkgs that referenced this issue Mar 12, 2018
@dtzWill
Member

dtzWill commented Mar 13, 2018

Boehm only governs eval memory, which is peanuts compared to the memory used by copying around std::strings. Profiling shows that copying paths is by far the largest contributor to peak memory usage, since it requires both the compressed (from the binary cache) and the decompressed data to be held in memory as strings at the same time (briefly, but nevertheless).

Our usage of std::string::append appears particularly painful, FWIW. Decompression appends to the string two pages at a time (on glibc)... :(.

This isn't addressed by the linked issues, although it is similar in spirit, so perhaps it can be handled similarly.

(We have "Sink"s for compression, but decompression, by far the more common path for everyone that isn't Hydra, decompresses into a string instead of being a Source or something.)

I poked at this for a few hours yesterday but didn't find a clean and satisfactory way to improve this.

This is particularly problematic for the above reason (it needs to store the nar and nar.xz in memory concurrently), but especially because downloading is done by worker threads, making the memory requirement something like cores * max(nar + nar.xz), which I suspect is what causes the reported memory usage problems.

Swap helps, but at some point Nix is responsible for using the disk for huge paths instead of manipulating them in memory.
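The contrast described above can be sketched as follows (a Python illustration with invented names, not Nix code): decompressing a whole .nar.bz2 into one buffer holds the compressed and decompressed copies simultaneously, while streaming through a sink caps peak memory at the chunk size regardless of NAR size.

```python
import bz2
import io

CHUNK = 64 * 1024  # stream in 64 KiB chunks

def decompress_whole(compressed: bytes) -> bytes:
    # Peak memory ~ len(compressed) + len(decompressed), held simultaneously.
    return bz2.decompress(compressed)

def decompress_streaming(src, write_chunk) -> int:
    # Peak memory ~ CHUNK plus decompressor state, independent of NAR size.
    d = bz2.BZ2Decompressor()
    total = 0
    while True:
        block = src.read(CHUNK)
        if not block:
            break
        out = d.decompress(block)
        write_chunk(out)  # hand each chunk to a sink instead of accumulating
        total += len(out)
    return total

data = b"nix-archive-1" * 100_000
compressed = bz2.compress(data)

sink = bytearray()
n = decompress_streaming(io.BytesIO(compressed), sink.extend)
assert n == len(data) and bytes(sink) == data
```

With one decompression per worker thread, the whole-buffer variant multiplies that simultaneous footprint by the number of threads, which matches the cores * max(nar + nar.xz) estimate above.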

@adevress
Contributor Author

> This is particularly problematic for the above reason (needs to store nar and nar.xz in memory, concurrently) but especially because downloading is done by worker threads causing memory requirement to be something like cores*(max(nar + nar.xz)), which I suspect is what causes memory usage problems reported.
>
> Swap helps but at some point Nix is responsible for using disk for huge paths instead of manipulating them in-memory.

I agree.

Something like:

if nar.size() < 250 MB
    -> memory
else
    -> memory-mapped file

sounds reasonable to me. Unpacking a nar > 2 GB in memory and hoping swap will handle it looks like a dangerous solution to me.
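The proposed policy can be sketched like this (a minimal Python illustration; the 250 MB threshold comes from the comment above, and `open_nar_buffer` is an invented name, not a Nix API):

```python
import mmap
import tempfile

IN_MEMORY_LIMIT = 250 * 1024 * 1024  # 250 MB, per the suggestion above

def open_nar_buffer(size: int):
    """Return a writable buffer suitable for a NAR of `size` bytes."""
    if size < IN_MEMORY_LIMIT:
        return bytearray(size)     # small: plain in-memory buffer
    f = tempfile.TemporaryFile()
    f.truncate(size)               # large: disk-backed file,
    return mmap.mmap(f.fileno(), size)  # paged in and out by the OS on demand

buf = open_nar_buffer(16)
buf[0:13] = b"nix-archive-1"
assert bytes(buf[0:13]) == b"nix-archive-1"
```

The advantage over swap is that the kernel can drop clean pages of a file-backed mapping cheaply, whereas anonymous memory has to be written to swap before it can be reclaimed.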

@7c6f434c
Member

7c6f434c commented Mar 13, 2018 via email

@dtzWill
Member

dtzWill commented Mar 13, 2018

Maybe useful: http://stxxl.org

@TravisWhitaker

Just hit this with CI that needs cudatoolkit. Is there a workaround other than downgrading Nix or adding swap?

edolstra added a commit that referenced this issue Mar 16, 2018
copyStorePath() now pipes the output of srcStore->narFromPath()
directly into dstStore->addToStore(). The sink used by the former is
converted into a source usable by the latter using
boost::coroutine2. This is based on [1].

This reduces the maximum resident size of

  $ nix build --store ~/my-nix/ /nix/store/b0zlxla7dmy1iwc3g459rjznx59797xy-binutils-2.28.1 --substituters file:///tmp/binary-cache-xz/ --no-require-sigs

from 418592 KiB to 53416 KiB. (The previous commit also reduced the
runtime from ~4.2s to ~3.4s, not sure why.) A further improvement will
be to download files into a Sink.

[1] master...Mathnerd314:dump-fix-coroutine#diff-dcbcac55a634031f9cc73707da6e4b18

Issue #1969.
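The sink-to-source inversion the commit describes can be sketched in Python (invented names; `nar_from_path` stands in for `srcStore->narFromPath()`). Nix bridges the push-based writer and the pull-based reader with boost::coroutine2; a thread plus a bounded queue gives the same constant-memory handoff here:

```python
import queue
import threading

def nar_from_path(sink):
    # Stand-in producer: pushes NAR chunks into a sink, like narFromPath().
    for chunk in (b"nix-", b"archive-1"):
        sink(chunk)

def sink_to_source(producer, maxsize=4):
    """Invert a push-based producer into a pull-based iterator."""
    q = queue.Queue(maxsize=maxsize)
    done = object()  # sentinel marking the end of the stream

    def run():
        producer(q.put)  # each sink(chunk) blocks once the queue is full
        q.put(done)

    threading.Thread(target=run, daemon=True).start()
    while (item := q.get()) is not done:
        yield item

assert b"".join(sink_to_source(nar_from_path)) == b"nix-archive-1"
```

Because the queue is bounded, only a few chunks are ever in flight, which is why piping narFromPath() straight into addToStore() drops the resident size from hundreds of MiB to tens.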
@dtzWill
Member

dtzWill commented Mar 20, 2018

The recent commits should address the worst of this (yay!!), can this be closed?

Also.... 2.1 or whatever would be next, soon-ish? :D

@bgamari
Contributor

bgamari commented Mar 26, 2018

Sadly this fix is still very much needed. As of 0cb1e52 I am still unable to build a derivation depending upon a 15GB tarball (an FPGA toolchain provided by a hardware vendor) on a machine with 32GB of RAM.

edolstra added a commit to edolstra/nix that referenced this issue Mar 27, 2018
This reduces memory consumption of

  nix copy --from https://cache.nixos.org --to ~/my-nix /nix/store/95cwv4q54dc6giaqv6q6p4r02ia2km35-blender-2.79

from 176 MiB to 82 MiB. (The remaining memory is probably due to xz
decompression overhead.)

Issue NixOS#1681.
Issue NixOS#1969.
@edolstra
Member

@bgamari How are you depending on that tarball? If it's via a path reference (e.g. foo = ./bla.tar), that's a separate issue.

@bgamari
Contributor

bgamari commented Mar 29, 2018

@edolstra indeed it is via a path reference (since the tarball must be downloaded manually due to vendor login requirements).

@7c6f434c
Member

@bgamari I would consider requireFile and nix-prefetch-url file:///path/to/file for that use case.

@shlevy
Member

shlevy commented Mar 30, 2018

@7c6f434c That was the inspiration for #2019 😄

@shlevy shlevy added the backlog label Apr 1, 2018
@shlevy shlevy self-assigned this Apr 1, 2018
dtzWill pushed a commit to dtzWill/nix that referenced this issue Apr 4, 2018
@lheckemann
Member

dup of #1681?

edolstra added a commit that referenced this issue May 30, 2018
nh2 added a commit to nh2/nix that referenced this issue Jun 3, 2018
Fixes `error: out of memory` of `nix-store --serve --write`
when receiving packages via SSH (and perhaps other sources).

See NixOS#1681 NixOS#1969 NixOS#1988 NixOS/nixpkgs#38808.

Performance improvement on `nix-store --import` of a 2.2 GB cudatoolkit closure:

When the store path already exists:
  Before:
    10.82user 2.66system 0:20.14elapsed 66%CPU (0avgtext+0avgdata   12556maxresident)k
  After:
    11.43user 2.94system 0:16.71elapsed 86%CPU (0avgtext+0avgdata 4204664maxresident)k
When the store path doesn't yet exist (after `nix-store --delete`):
  Before:
    11.15user 2.09system 0:13.26elapsed 99%CPU (0avgtext+0avgdata 4204732maxresident)k
  After:
     5.27user 1.48system 0:06.80elapsed 99%CPU (0avgtext+0avgdata   12032maxresident)k

The reduction is 4200 MB -> 12 MB RAM usage, and it also takes less time.
@nh2
Contributor

nh2 commented Jun 3, 2018

Try out #2206

mboes added a commit to tweag/rules_haskell that referenced this issue Jun 13, 2018
@bjornfor
Contributor

What does the 'backlog' label mean? That it's not a priority? You can add me to the list of users who consider this issue a blocker for Nix 2.0, and I hope to see a release with the fixes soon. (I'm currently dealing with FPGA toolchains of ~8 GiB.)

@lheckemann
Member

I believe the backlog label just means it hasn't been triaged yet.

Anton-Latukha pushed a commit to Anton-Latukha/nix that referenced this issue Jul 12, 2018
@domenkozar
Member

domenkozar commented Sep 25, 2018

https://travis-ci.com/cachix/cachix/jobs/147861767 still has the issue, snippets:

...
$ nix-env --version
nix-env (Nix) 2.1.2
...
copying path '/nix/store/birp3a0apx7iyl5zq71vggdizlxbrn5d-haskell-src-exts-1.20.2' from 'https://cachix.cachix.org'...
std::bad_alloc
copying path '/nix/store/vvl9kjp778q79ii28kfkknya6q9sgf8b-prettyprinter-ansi-terminal-1.1.1.2' from 'https://cachix.cachix.org'...
...

@domenkozar
Member

Nix has 6 GB of RAM available:

             total       used       free     shared    buffers     cached
Mem:          7479       1391       6088        179        170        939

and it needs to unpack 4 GB in total, even though I run it with -j1:

these paths will be fetched (322.92 MiB download, 4064.84 MiB unpacked):

@lheckemann
Member

Just to make sure: you're either running single-user, or the daemon is 2.1.2 as well?

@domenkozar
Member

domenkozar commented Sep 26, 2018

AFAIK it's still single-user on Linux :)

@edolstra
Member

This is because this NAR is corrupt:

# curl -s https://cachix.cachix.org/nar/a577be0ee2cd4bcfe39437941c4ade8b73357427deee3090ddb8e3dbdbb5cb83.nar.xz | xz -d | hexdump  -C | head -n2
00000000  00 00 88 27 02 00 04 00  00 00 40 00 00 00 00 00  |...'......@.....|
00000010  00 00 00 00 00 00 00 00  00 00 b8 a4 11 00 00 00  |................|

Note the missing NAR header, which looks like this:

# curl -s https://cachix.cachix.org/nar/a8d3c42efa059d0ebdfdae7f1c8fe98935dbe293c960c7507d462bac85f6c610.nar.xz | xz -d | hexdump  -C | head -n2
00000000  0d 00 00 00 00 00 00 00  6e 69 78 2d 61 72 63 68  |........nix-arch|
00000010  69 76 65 2d 31 00 00 00  01 00 00 00 00 00 00 00  |ive-1...........|
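The valid header above is the NAR encoding of the magic string "nix-archive-1": an 8-byte little-endian length, then the bytes, zero-padded to an 8-byte boundary. A quick sketch (assuming that standard NAR string encoding) reproduces the dump byte for byte:

```python
import struct

def nar_string(s: bytes) -> bytes:
    """Encode a string as NAR does: u64 LE length + bytes + zero padding to 8."""
    pad = (8 - len(s) % 8) % 8
    return struct.pack("<Q", len(s)) + s + b"\x00" * pad

header = nar_string(b"nix-archive-1")
assert header.hex(" ") == (
    "0d 00 00 00 00 00 00 00 "   # length 13, little-endian
    "6e 69 78 2d 61 72 63 68 "   # "nix-arch"
    "69 76 65 2d 31 00 00 00"    # "ive-1" + 3 bytes of padding
)
```

A downloader could cheaply validate a NAR by checking its first 24 bytes against this header before committing to a full decompression.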

@edolstra
Member

44e8630

@domenkozar
Member

Thank you @edolstra - I'll add checks to prevent this.

@domenkozar
Member

So this is fixed in Nix 2.1, and my bug had a similar error but a different cause.

nh2 pushed a commit to nh2/nix that referenced this issue Jun 27, 2019
(cherry picked from commit 48662d1)