nix copy uses too much memory #1681
Comments
Intuitively it feels that it should be possible for it to run in constant memory. What am I missing? |
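To illustrate what "constant memory" means here, the following is a minimal Python sketch, not Nix's actual implementation (Nix is written in C++). It copies a stream through one fixed-size buffer, so peak memory depends on the chunk size rather than on the size of the path being copied. The name `copy_stream` and the 64 KiB chunk size are assumptions made for illustration, and the in-memory `BytesIO` objects merely stand in for a real source and sink.

```python
import io


def copy_stream(src, dst, chunk_size=64 * 1024):
    """Copy src to dst one chunk at a time; return the number of bytes copied.

    Only one chunk (at most chunk_size bytes) is held at any moment.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty read signals end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total


# Stand-in data: in practice src would be a file or socket, not a buffer.
payload = b"x" * (1024 * 1024)
src = io.BytesIO(payload)
dst = io.BytesIO()
copied = copy_stream(src, dst)
```

Memory use grows beyond this only when some stage in the pipeline collects the whole stream before passing it on, which is what the commits referenced in this thread appear to address.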
I'm encountering this issue with a single path — |
Continuation of 97002b6. This makes the daemon use constant memory. For example, it reduces the daemon's maximum RSS on $ nix copy --from ~/my-nix --to daemon /nix/store/1n7x0yv8vq6zi90hfmian84vdhd04bgp-blender-2.79a from 264 MiB to 7 MiB. We now use a TunnelSource to prevent the connection from ending up in an undefined state if an exception is thrown while the NAR is being sent. Issue NixOS#1681.
This reduces memory consumption of nix copy --from file://... --to ~/my-nix /nix/store/95cwv4q54dc6giaqv6q6p4r02ia2km35-blender-2.79 from 514 MiB to 18 MiB for an uncompressed binary cache, and from 192 MiB to 53 MiB for a bzipped binary cache. It may also be faster because fetching can happen concurrently with decompression/writing. Continuation of 48662d1. Issue NixOS#1681.
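As a rough illustration of why streaming helps for a compressed cache, here is a hedged Python sketch of incremental bzip2 decompression using the standard library's `bz2.BZ2Decompressor`. It is not the code from the commit; `decompress_stream` and the chunk size are hypothetical. Both the compressed input and the decompressed output are bounded per step, so the whole NAR never needs to sit in memory at once.

```python
import bz2
import io


def decompress_stream(src, dst, chunk_size=64 * 1024):
    """Decompress a bzip2 stream from src into dst with bounded buffers.

    Returns the number of decompressed bytes written.
    """
    decomp = bz2.BZ2Decompressor()
    written = 0
    while not decomp.eof:
        if decomp.needs_input:
            chunk = src.read(chunk_size)
            if not chunk:
                raise EOFError("compressed stream ended early")
        else:
            # The decompressor still holds buffered data; feed nothing new.
            chunk = b""
        # max_length caps how much output a single call may produce.
        data = decomp.decompress(chunk, max_length=chunk_size)
        dst.write(data)
        written += len(data)
    return written


# Stand-in data for a compressed NAR.
original = bytes(range(256)) * 4096
src = io.BytesIO(bz2.compress(original))
dst = io.BytesIO()
written = decompress_stream(src, dst)
```

The decompressor itself still needs working memory for its internal state, which is consistent with the bzipped case in the commit message ending up above the uncompressed one.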
This reduces memory consumption of nix copy --from https://cache.nixos.org --to ~/my-nix /nix/store/95cwv4q54dc6giaqv6q6p4r02ia2km35-blender-2.79 from 176 MiB to 82 MiB. (The remaining memory is probably due to xz decompression overhead.) Issue NixOS#1681. Issue NixOS#1969.
I see commits purporting to address this for a number of different cases, but none concerning uploading to an S3 bucket. Trying to copy a 2.8GB store path to an S3 bucket took nearly 4GB of memory and more than twenty minutes of 100% CPU. Has that been fixed? |
Hitting this issue when trying to do something like: nixos-rebuild build ; nix copy ./result --to ssh://low_ram_machine. @dtzWill, will those experimental changes help with ssh copy? |
@Ralith I'm probably not going to make S3BinaryCacheStore do uploads in constant space. It might not even be supported by aws-sdk-cpp. I assume the 100% CPU is caused by compression, which you can disable. |
FWIW I too am another big-upload-to-S3 guy using It would surprise me if aws-sdk-cpp didn't support it, given that S3 supports almost arbitrarily large objects and multi-part uploads. If someone figured out how to implement it, would you accept the PR? |
It seems very strange that it would take twenty minutes on my i7-4980HQ, even so. 2.8GB is big but it's not that big. |
IIRC xz compression can easily take that long. |
This is what I am seeing too:
The main problem I see is that it merely says "out of memory", instead of saying how much it tried to allocate, and how much was available before the allocation in the error message. Copying data should run in constant space as others have already mentioned. If the compression is causing higher memory requirements than needed, this is a problem too, because it raises the hosting costs for no reason other than the initial deployment. Before the deployment at least 300MB was available on host |
FWIW it looks like they do support streaming, at least for fetches: https://sdk.amazonaws.com/cpp/api/LATEST/index.html (near the end, look for IOStreams). Hopefully upload has something similar. Seconded re: xz compression taking that long. There's an option somewhere to enable parallel xz compression if you have idle cores. IIRC the result will be slightly bigger for the same compression level. Anyway, if someone tackled the API spelunking, would it be welcome? Or is there a reason it would have problems or be a bad idea? EDIT: oops, I think we already use the stream thing, although at a glance it looks like we pull it all into a string, but that seems resolvable. Anyway, fetch from S3 is probably not as important. |
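For the upload direction, the general idea behind a multipart upload can be sketched without touching any AWS API: read the source in fixed-size parts and hand each part to an uploader as soon as it is read. Everything below is an assumption for illustration. `upload_in_parts` and `fake_upload_part` are hypothetical names, and no claim is made about what aws-sdk-cpp exposes. The 5 MiB part size reflects S3's documented minimum for all parts except the last.

```python
import io


def upload_in_parts(src, upload_part, part_size=5 * 1024 * 1024):
    """Read src in parts and pass each to upload_part(number, data).

    Returns the number of parts sent. Only one part is held at a time.
    """
    part_number = 0
    while True:
        data = src.read(part_size)
        if not data:
            break
        part_number += 1
        upload_part(part_number, data)  # hypothetical uploader callback
    return part_number


received = []


def fake_upload_part(number, data):
    # Stand-in for a real network call; records only the part sizes.
    received.append((number, len(data)))


src = io.BytesIO(b"a" * (12 * 1024 * 1024))
parts = upload_in_parts(src, fake_upload_part)
```

With a real socket or pipe, a single `read` may return fewer bytes than requested, so a production version would need to accumulate reads until a full part is available.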
As far as I can tell, the fixes in 2.0.1 still don't really fix the issue. |
@lheckemann IIRC we didn't cherry-pick any memory improvements in 2.0.1. You need master for some of the fixes or my experimental branch for the rest. |
Oh, that would explain it! Any chance they could be included in a 2.0.2 release? There have been so many complaints about this issue on IRC and I've run into it myself more times than I would like as well. |
Does "nixops deploy" use this? I get out of memory during deploy, even though I have several gigabytes free (both on disk and working memory) which is odd. Just wondering if this is addressed here or should be investigated further. |
@SebastianCallh you are not specifying which machine goes out of memory, so I assume you don't know it's talking about the machine you are deploying to. The solution to this is to use 512MB of swap. Perhaps I might commit some of my changes to fix this in an AWS environment when t2.nanos are being used, but only if there is interest in them from people with commit access. |
@coretemp That was the machine I was referring to. The machine being deployed to has plenty of both disk and working memory to spare when the error occurs. |
coretemp has since been banned, so we can unlock. |
I have backported @edolstra's memory fixes to Nix 2.0.4...nh2:nh2-2.0.4-issue-1681-cherry-pick Note this fixes the case where the machine that's running |
I think this issue is solved in Nix 2.2, at least for my use cases (given that my RAM problems in nixops disappear in my backport, including #38808). But it would make sense to ask around among the subscribers to this issue if you have observed any further If not, we can probably close this. (There is still #2774, which says that 2.2 is used and which is relatively recent.) So, does anybody here still have memory problems with current nix? |
I have. |
Yes, on Nix 2.2.2 I'm still seeing several GB of memory usage when substituting large paths (e.g. GHC) from a cache (as part of a larger build). This is problematic for running Nixery on something like Cloud Run where memory is hard-capped at 2GB. I haven't yet tried this with 2.3 to see if it makes a difference, but it's on the todo-list. Edit: I won't be able to test this with 2.3 easily, as it no longer works in gVisor even with my SQLite connection patch. Might get around to more advanced debugging during the weekend ... |
I have observed this when copying a locally built output to a http cache:
and have observed The memory usage slowly but surely rises towards that number (and never goes down) as |
I think what happens here is that EDIT: nix version 2.3.1 |
I marked this as stale due to inactivity. |
Is there a plan to fix this for |
I think it is fixed on master. |
@Ericson2314 I mean the specific issue of |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixosstag.fcio.net/t/code-of-conduct-or-whatever/2134/21 |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/2-nov-2021-discourse-migration-to-flying-circus/15752/7 |
I'm running nix copy in runInLinuxVM, and notice that for any nontrivial closures, the VM will run out of memory during the copying process. I left it set at the default 512 megabytes. I could obviously increase the amount of memory the VM is given, but that doesn't scale for copying complex derivations with many dependencies.
I suggest adding an option to only load and copy the contents of the paths one at a time, or even better, a way to specify an upper bound on the memory to be used while copying.