nix copy uses too much memory #1681

Open
LisannaAtHome opened this issue Nov 15, 2017 · 42 comments

@LisannaAtHome LisannaAtHome commented Nov 15, 2017

I'm running nix copy in runInLinuxVM, and I notice that for any nontrivial closure, the VM runs out of memory during the copying process. I left it set at the default 512 megabytes. I could obviously increase the amount of memory the VM is given, but that doesn't scale for copying complex derivations with many dependencies.

I suggest adding an option to only load and copy the contents of the paths one at a time, or even better, a way to specify an upper bound on the memory to be used while copying.

Member

@copumpkin copumpkin commented Nov 15, 2017

Intuitively it feels that it should be possible for it to run in constant memory. What am I missing?
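
A minimal sketch of what "constant memory" means for a copy: read a fixed-size chunk, write it, and repeat, so that peak usage depends on the chunk size rather than on the size of the store path. This is illustrative Python, not Nix's C++ implementation; `copy_stream` and `CHUNK_SIZE` are hypothetical names, and the in-memory streams merely stand in for a NAR source and sink.

```python
import io

CHUNK_SIZE = 64 * 1024  # hypothetical buffer size; not a Nix setting


def copy_stream(src, dst, chunk_size=CHUNK_SIZE):
    """Copy src to dst one chunk at a time; return the number of bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)  # holds at most chunk_size bytes at once
        if not chunk:  # an empty read signals end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    source = io.BytesIO(b"x" * 1_000_000)  # stand-in for a NAR stream
    sink = io.BytesIO()
    print(copy_stream(source, sink))
```

The copy loop itself never holds more than one chunk; any growth beyond that would have to come from the source or sink buffering data on their own.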

Member

@lheckemann lheckemann commented Feb 23, 2018

I'm encountering this issue with a single path — nix copy, nix-store --import, and a number of other commands I've tried all fail to import the path. Would be great to know if there's any way at all I can import it…

Author

@LisannaAtHome LisannaAtHome commented Mar 17, 2018

Possibly related to #1969? It looks like some patches have gone in recently that might improve things here: 48662d1, 3e6b194

edolstra added a commit to edolstra/nix that referenced this issue Mar 26, 2018
Continuation of 97002b6. This makes
the daemon use constant memory. For example, it reduces the daemon's
maximum RSS on

  $ nix copy --from ~/my-nix --to daemon /nix/store/1n7x0yv8vq6zi90hfmian84vdhd04bgp-blender-2.79a

from 264 MiB to 7 MiB.

We now use a TunnelSource to prevent the connection from ending up in
an undefined state if an exception is thrown while the NAR is being
sent.

Issue NixOS#1681.
edolstra added a commit to edolstra/nix that referenced this issue Mar 27, 2018
This reduces memory consumption of

  nix copy --from file://... --to ~/my-nix /nix/store/95cwv4q54dc6giaqv6q6p4r02ia2km35-blender-2.79

from 514 MiB to 18 MiB for an uncompressed binary cache, and from 192
MiB to 53 MiB for a bzipped binary cache. It may also be faster
because fetching can happen concurrently with decompression/writing.

Continuation of 48662d1.

Issue NixOS#1681.
edolstra added a commit to edolstra/nix that referenced this issue Mar 27, 2018
This reduces memory consumption of

  nix copy --from https://cache.nixos.org --to ~/my-nix /nix/store/95cwv4q54dc6giaqv6q6p4r02ia2km35-blender-2.79

from 176 MiB to 82 MiB. (The remaining memory is probably due to xz
decompression overhead.)

Issue NixOS#1681.
Issue NixOS#1969.
@shlevy shlevy added the backlog label Apr 1, 2018

@Ralith Ralith commented Apr 2, 2018

I see commits purporting to address this for a number of different cases, but none concerning uploading to an S3 bucket. Trying to copy a 2.8GB store path to an S3 bucket took nearly 4GB of memory and more than twenty minutes of 100% CPU. Has that been fixed?

dtzWill added a commit to dtzWill/nix that referenced this issue Apr 4, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Apr 4, 2018

@andrewchambers andrewchambers commented Apr 8, 2018

Hitting this issue trying to do something like - nixos-rebuild build ; nix copy ./result --to ssh://low_ram_machine

@dtzWill will those experimental changes help with ssh copy?

Member

@edolstra edolstra commented Apr 12, 2018

@Ralith I'm probably not going to make S3BinaryCacheStore do uploads in constant space. It might not even be supported by aws-sdk-cpp.

I assume the 100% CPU is caused by compression, which you can disable.

Member

@copumpkin copumpkin commented Apr 12, 2018

FWIW I too am another big-upload-to-S3 guy using nix copy 😄

It would surprise me if aws-sdk-cpp didn't support it, given that S3 supports almost arbitrarily large objects and multi-part uploads. If someone figured out how to implement it, would you accept the PR?
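
To illustrate why multi-part uploads make constant-space uploading plausible: S3's multipart API takes an object as a sequence of numbered parts, each at least 5 MiB except the last, so an uploader only needs to hold one part at a time. The sketch below is illustrative Python under that assumption; `iter_parts`, `upload_in_parts`, and the `upload_part` callback are hypothetical names, and nothing here calls aws-sdk-cpp or any real AWS API.

```python
import io

MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for every part except the last


def iter_parts(stream, part_size=MIN_PART_SIZE):
    """Yield (part_number, data) pairs, holding one part in memory at a time.

    Assumes stream.read(n) returns n bytes until end of stream, which holds
    for buffered files and BytesIO but not necessarily for raw sockets.
    """
    part_number = 1  # S3 part numbers start at 1
    while True:
        data = stream.read(part_size)
        if not data:
            break
        yield part_number, data
        part_number += 1


def upload_in_parts(stream, upload_part, part_size=MIN_PART_SIZE):
    """Pass each part to upload_part, a hypothetical stand-in for a real client call."""
    count = 0
    for part_number, data in iter_parts(stream, part_size):
        upload_part(part_number, data)
        count += 1
    return count
```

With this shape, peak memory is bounded by `part_size` regardless of how large the store path is.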


@Ralith Ralith commented Apr 13, 2018

> I assume the 100% CPU is caused by compression, which you can disable.

It seems very strange that it would take twenty minutes on my i7-4980HQ, even so. 2.8GB is big but it's not that big.

Member

@edolstra edolstra commented Apr 13, 2018

IIRC xz compression can easily take that long.


@coretemp coretemp commented Apr 23, 2018

This is what I am seeing too:

a...........> copying path '/nix/store/fl3mcaqqk2vg0dmk01dfbs6nbm5skpzc-systemd-237' from 'https://cache.nixos.org'...
a...........> error: out of memory

The main problem I see is that the error message merely says "out of memory" instead of saying how much it tried to allocate and how much was available before the allocation. Copying data should run in constant space, as others have already mentioned.

If the compression is causing higher memory requirements than needed, this is a problem too, because it raises the hosting costs for no reason other than the initial deployment.

Before the deployment at least 300MB was available on host a.

Contributor

@dtzWill dtzWill commented Apr 23, 2018

FWIW it looks like they do support streaming at least for fetches:

https://sdk.amazonaws.com/cpp/api/LATEST/index.html

(Near end, look for IOStreams).

Hopefully upload has similar.

Seconded re: xz compression taking that long. There's an option somewhere to enable parallel xz compression if you have idle cores. IIRC the result will be slightly bigger for the same compression level.

Anyway, if someone tackled the API spelunking would it be welcome? Or is there a reason that will have problems or is a bad idea?

EDIT: Oops, I think we already use the stream API, although at a glance it looks like we pull it all into a string; that seems resolvable. Anyway, fetching from S3 is probably not as important.

Member

@lheckemann lheckemann commented May 2, 2018

As far as I can tell, the fixes in 2.0.1 still don't really fix the issue.

Member

@edolstra edolstra commented May 3, 2018

@lheckemann IIRC we didn't cherry-pick any memory improvements in 2.0.1. You need master for some of the fixes or my experimental branch for the rest.

Member

@lheckemann lheckemann commented May 3, 2018

Oh, that would explain it! Any chance they could be included in a 2.0.2 release? There have been so many complaints about this issue on IRC and I've run into it myself more times than I would like as well.


@SebastianCallh SebastianCallh commented May 29, 2018

Does "nixops deploy" use this? I get out of memory during deploy, even though I have several gigabytes free (both on disk and working memory) which is odd. Just wondering if this is addressed here or should be investigated further.


@coretemp coretemp commented May 29, 2018

@SebastianCallh you are not specifying which machine goes out of memory, so I assume you don't know it's talking about the machine you are deploying to. The solution to this is to use 512MB of swap.

Perhaps I might commit some of my changes to fix this in an AWS environment when t2.nanos are being used, but only if there is interest in them from people with commit access.


@SebastianCallh SebastianCallh commented May 29, 2018

@coretemp That was the machine I was referring to. The machine being deployed to has plenty of both disk and working memory to spare when the error occurs.

edolstra added a commit that referenced this issue May 30, 2018
Continuation of 97002b6. This makes
the daemon use constant memory. For example, it reduces the daemon's
maximum RSS on

  $ nix copy --from ~/my-nix --to daemon /nix/store/1n7x0yv8vq6zi90hfmian84vdhd04bgp-blender-2.79a

from 264 MiB to 7 MiB.

We now use a TunnelSource to prevent the connection from ending up in
an undefined state if an exception is thrown while the NAR is being
sent.

Issue #1681.
Contributor

@dtzWill dtzWill commented Jun 22, 2018

Please don't release too many of the recent memory fixes until we've fixed #2203. Apologies if the proposed changes don't depend on the bits that broke nix log usage for paths built by hydra. I just don't want to accidentally end up with a release with such a regression :).


@coretemp coretemp commented Jul 6, 2018

edolstra/nix@c94b4fc is controversial only because in many cases it raises the cost of cloud resources without a good reason.

If it simply inspected the machine to check how much storage and/or memory is available, it could use that to choose a default.

Another solution would be to run until it goes out of memory and then retry automatically with the optimization applied. That way it would always work. For people who want to squeeze out the last bits of performance, you could add variables (like the ones that already exist, though they probably need better names) to control this behavior. With this design, everyone would be happy. Similarly, you could have flags that optimize for deployment time (e.g. spend more cloud resources to save developer time).

As a guiding principle, I would like to see it acknowledged that increased cloud resource cost should weigh heavily in implementation decisions.

In general, even if you don't implement exactly one of the suggestions above, it is likely possible to create something uncontroversial. The problem with the existing patch is that the variable one can control is an implementation detail, not a high-level policy.
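
The "inspect the machine" idea could be sketched roughly as follows, assuming a Linux host where `/proc/meminfo` reports a `MemAvailable` line in kB. This is illustrative Python, not anything Nix does; `parse_mem_available`, `choose_strategy`, and the `headroom` factor are hypothetical. Because Linux can overcommit memory, a check like this is only a heuristic, not a guarantee that a later allocation will succeed.

```python
def parse_mem_available(meminfo_text):
    """Return MemAvailable in bytes from /proc/meminfo-style text, or None if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Line format: "MemAvailable:    1234567 kB"
            return int(line.split()[1]) * 1024
    return None


def choose_strategy(available_bytes, payload_bytes, headroom=0.5):
    """Buffer in memory only if the payload fits in a fraction of available memory."""
    if available_bytes is None:
        return "stream"  # unknown memory: take the conservative path
    if payload_bytes <= available_bytes * headroom:
        return "buffer"
    return "stream"


if __name__ == "__main__":
    with open("/proc/meminfo") as f:  # Linux only
        available = parse_mem_available(f.read())
    print(choose_strategy(available, 2 * 1024**3))
```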

Anton-Latukha added a commit to Anton-Latukha/nix that referenced this issue Jul 12, 2018
Anton-Latukha added a commit to Anton-Latukha/nix that referenced this issue Jul 12, 2018
Anton-Latukha added a commit to Anton-Latukha/nix that referenced this issue Jul 12, 2018
Member

@vcunat vcunat commented Jul 22, 2018

The OOM condition is rather hard to handle, as it depends on the host OS. Typically it will let you allocate too much and then invoke an OOM killer later, so you don't have the option to react to the condition nicely.

Member

@vaibhavsagar vaibhavsagar commented Sep 7, 2018

Has this been fixed in Nix 2.1?


@coretemp coretemp commented Sep 18, 2018

Why is this critical issue not being addressed?

@edolstra edolstra closed this Sep 18, 2018
Contributor

@nh2 nh2 commented Sep 20, 2018

> Has this been fixed in Nix 2.1?

@vaibhavsagar I think so.

> Why is this critical issue not being addressed?

@coretemp It was addressed in 2825e05.

@coretemp

This comment was marked as disruptive content.

Member

@domenkozar domenkozar commented Sep 21, 2018

@coretemp please behave with respect and avoid ad hominem attacks, as they do no good to anyone.

Nix is provided for free and comes with zero obligations from developers. If you'd like professional support, I'd recommend contacting some of the consulting companies: https://nixos.org/nixos/support.html

I'm locking this issue as nothing good can come out of this. If there's an issue with the recent fix, please open another issue describing the problem.

Member

@domenkozar domenkozar commented Jun 27, 2019

coretemp has since been banned, so we can unlock.

Contributor

@nh2 nh2 commented Jun 27, 2019

I have backported @edolstra's memory fixes to Nix 2.0.4 (because I'm still using that in one place):

2.0.4...nh2:nh2-2.0.4-issue-1681-cherry-pick

Note this fixes the case where the machine that's running nixops runs out of memory.

@edolstra edolstra reopened this Jun 27, 2019
Contributor

@nh2 nh2 commented Jun 27, 2019

I think this issue is solved in Nix 2.2, at least for my use cases (given that my RAM problems in nixops disappear in my backport, including #38808).

But it would make sense to ask the subscribers to this issue whether they have observed any further nix copy or nix-copy-closure related memory problems since these commits landed.

If not, we can probably close this.

(There is still #2774 which says that 2.2 is used and which is relatively recent.)

So, does anybody here still have memory problems with current nix?


@AleXoundOS AleXoundOS commented Jun 29, 2019

> So, does anybody here still have memory problems with current nix?

I have.
I'm the author of #2774, and I have even slowly started to write my own solution to the problem of downloading a binary cache (using a reasonable amount of RAM). Also, here at my work, the lack of a ready-to-use mirroring solution is the main issue that currently prevents our company from using NixOS, since no internet connection is possible and everything needs to be downloaded beforehand.

@lordcirth

This comment was marked as off-topic.

@zimbatm

This comment has been hidden.

Member

@tazjin tazjin commented Oct 8, 2019

> So, does anybody here still have memory problems with current nix?

Yes, on Nix 2.2.2 I'm still seeing several GB of memory usage when substituting large paths (e.g. GHC) from a cache (as part of a larger build). This is problematic for running Nixery on something like Cloud Run where memory is hard-capped at 2GB.

I haven't yet tried this with 2.3 to see if it makes a difference, but it's on the todo-list.

Edit: I won't be able to test this with 2.3 easily, as it no longer works in gVisor even with my SQLite connection patch. Might get around to more advanced debugging during the weekend ...


@nagisa nagisa commented Dec 11, 2019

I have observed this when copying a locally built output to a http cache:

nix copy --to 'http://localhost:3000' /nix/store/HASH-NAME-v0.1.0 --option narinfo-cache-negative-ttl 0 --option narinfo-cache-positive-ttl 0

and have observed nix copy consuming approximately as much memory as the data copied. That is, the total size reported by nix copy was 8G for all the outputs it copied, and I saw the nix copy process consume approximately as much.

The memory usage slowly but surely rises towards that number (and never goes down) as nix copy is compressing outputs.


@nagisa nagisa commented Dec 11, 2019

I think what happens here is that nix copy stores the compressed result in memory and then sends it all out in one go, rather than streaming the data out as it compresses the nar.xz.

EDIT: nix version 2.3.1
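
For contrast with buffering the whole compressed result, here is a minimal sketch of streaming xz compression, where each compressed chunk can be sent as soon as it is produced. This is illustrative Python using the standard `lzma` module, not Nix's code; `compress_stream` is a hypothetical name and the consumer of the yielded chunks (e.g. an HTTP upload) is left out.

```python
import io
import lzma


def compress_stream(src, chunk_size=64 * 1024):
    """Yield xz-compressed chunks as input is read, instead of building one big buffer."""
    compressor = lzma.LZMACompressor(format=lzma.FORMAT_XZ)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        out = compressor.compress(chunk)
        if out:  # the compressor may buffer internally and return b""
            yield out
    yield compressor.flush()  # emit remaining data and the stream footer


if __name__ == "__main__":
    nar = io.BytesIO(b"example contents " * 100_000)  # stand-in for a NAR
    sent = sum(len(piece) for piece in compress_stream(nar))
    print(sent)
```

The xz encoder still needs its own working memory, which depends on the compression preset, but that cost does not grow with the size of the input.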
