
Memory usage in eval #8621

Open
SaltyKitkat opened this issue Jul 1, 2023 · 12 comments
Labels
language: The Nix expression language; parser, interpreter, primops, evaluation, etc.
performance


@SaltyKitkat

Just evaluating my NixOS configuration takes about 1 GB of RAM, which is a bit too much for me. And when running something like nixpkgs-review, Nix just keeps taking more and more RAM (over 10 GB at peak in the nix-env run below).

Is this by design?

Or is there any way I can reduce the memory usage?

time -v nix eval --raw .#nixosConfigurations.SaltyKitkat.config.system.build.toplevel
/nix/store/v0dh21kn18a74d6gk6ayvcawprcywd65-nixos-system-SaltyKitkat-23.11.20230629.4bc72ca
	Command being timed: "nix eval --raw .#nixosConfigurations.SaltyKitkat.config.system.build.toplevel"
	User time (seconds): 5.28
	System time (seconds): 0.75
	Percent of CPU this job got: 77%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.77
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 1046296
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 2
	Minor (reclaiming a frame) page faults: 270506
	Voluntary context switches: 43679
	Involuntary context switches: 152
	Swaps: 0
	File system inputs: 123200
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

The nix-env invocation run by nixpkgs-review:

Command being timed: "nix-env --extra-experimental-features no-url-literals --option system x86_64-linux -f /home/***/.cache/nixpkgs-review/rev-0df1938e62e6084894afab9846e5a842e0091833/nixpkgs -qaP --xml --out-path --show-trace --no-allow-import-from-derivation"
	User time (seconds): 80.84
	System time (seconds): 3.40
	Percent of CPU this job got: 89%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 1:34.38
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 10705384
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 18322
	Minor (reclaiming a frame) page faults: 3116626
	Voluntary context switches: 1565
	Involuntary context switches: 884
	Swaps: 0
	File system inputs: 41600
	File system outputs: 40
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
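To compare runs like the two above, the peak memory figure can be pulled out of GNU `time -v` output mechanically. A minimal sketch, assuming GNU time's "Maximum resident set size (kbytes): N" line format and treating its kbytes as KiB; `peak_rss_gib` is a hypothetical helper name, not part of Nix or nixpkgs-review:

```shell
#!/bin/sh
# Hypothetical helper: read GNU `time -v` output on stdin and print the
# peak resident set size in GiB (1 GiB = 1024 * 1024 KiB), one decimal place.
# Assumes the "Maximum resident set size (kbytes): N" line that GNU time emits.
peak_rss_gib() {
  awk -F': ' '/Maximum resident set size/ { printf "%.1f\n", $2 / (1024 * 1024) }'
}

# Example invocation (not run here; assumes GNU time is installed as `time`).
# `command` bypasses the shell's built-in `time` keyword; the report goes to
# stderr, so stderr is piped and stdout is discarded:
#   command time -v nix eval --raw .#nixosConfigurations.SaltyKitkat.config.system.build.toplevel \
#     2>&1 >/dev/null | peak_rss_gib
```

Fed the two "Maximum resident set size" lines above, this prints 1.0 and 10.2 (GiB) respectively.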
@roberth
Member

roberth commented Jul 1, 2023

We are aware that Nix evaluation tends to consume significant amounts of memory.
Causes and potential causes I'm aware of

roberth added the language and performance labels Jul 1, 2023
@jsoo1

jsoo1 commented Jul 1, 2023

I want to add that the Boehm garbage collector is a conservative collector that does not allow heap compaction.

I was hoping to spark some interest in assessing the mark-region algorithm as a possible new garbage collection algorithm for Nix, because it allows for heap compaction. There are some existing implementations in Rust (immix) and C (whippet). The whippet implementation in particular seems relevant to Nix because it has zero dependencies and a Boehm-compatible API.

@roberth
Member

roberth commented Jul 1, 2023

@jsoo1 Interesting! Would you be interested in giving whippet a try? I've added notes about gc.

@jsoo1

jsoo1 commented Jul 1, 2023

Would you be interested in giving whippet a try? I've added notes about gc.

@roberth Sweet! Yes, I would be interested! I was planning on setting aside some time for it if there seemed to be interest from the team.

@roberth
Member

roberth commented Jul 1, 2023

Let's move the discussion of replacing the GC over to #8626.

@SaltyKitkat
Author

Thanks for the summary!

Since there are already memory leaks, I'm wondering whether the GC is working as expected. Maybe just improving the GC makes no sense if most of the memory usage comes from leaked memory.

@roberth
Member

roberth commented Jul 5, 2023

I don't expect the GC itself to be broken, and I don't expect many leaks from it being conservative either.
It manages to collect an amount about equal to the final heap size in a typical evaluation by ofborg (i.e. about half of all allocations are collected). It is hard to know how much it should be able to collect, though.
So that makes your question a good one. It could perhaps be answered with a combination of profiling and debugging, although we might need custom tooling to really start relating expressions to the heap and the GC.
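The "about half" figure follows from a short piece of arithmetic. Writing A for the total bytes allocated during an evaluation, C for the bytes the GC collected, and H for the final heap size (symbols introduced here for illustration only), and treating the final heap as exactly the allocations that were never collected:

```latex
A = C + H, \qquad C \approx H \;\Longrightarrow\; \frac{C}{A} \approx \frac{H}{2H} = \frac{1}{2}
```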

@majewsky

majewsky commented Dec 9, 2023

I ran into this while upgrading from NixOS 23.05 to 23.11 on my cloud VM with 2 GB of RAM. nix-build itself took 1 GB of that, and some server services were also running, taking up about 500 MB, which left only 500 MB for the actual derivation builds. Naturally, it ran out of memory (OOM) quite a lot.

I worked around that by taking the derivation file paths from the "these NNN derivations will be built:" output, pasting them into a file and running xargs -n1 nix-build < derivations.txt. I'm not sure whether the -n1 also helped, but it feels like some gains could be had here by separating the two phases. I will happily be corrected if I'm working from incorrect assumptions, but it appears to me that the memory usage of nix-build is all related to Nix expressions, which at this point in the build process are entirely unneeded, since all the required information exists in the .drv files. Maybe the Nix expression evaluation could happen in a separate process that terminates before nix-build moves on to building the derivations, or the Nix expressions could be allocated in an arena that is freed all at once after evaluation is done, or something like that?

That would not solve the original problem, and looking into a different GC still sounds valuable, but it might make the problem less acute for a portion of affected users.
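The manual workaround described in this comment can be scripted. A minimal sketch, assuming the dry-run report lists each derivation as a /nix/store/<32-character hash>-<name>.drv path; `extract_drvs` is a hypothetical helper, and the `nix-build --dry-run '<nixpkgs/nixos>' -A system` invocation in the comments is an example that assumes a classic (non-flake) NixOS setup:

```shell
#!/bin/sh
# Sketch of the two-phase workaround: evaluate once, save the .drv paths,
# then build them in separate short-lived processes.
# extract_drvs is a hypothetical helper, not part of Nix.

# Print each store derivation path found on stdin once, in first-seen order.
extract_drvs() {
  grep -oE '/nix/store/[0-9a-z]{32}-[^[:space:]]+\.drv' | awk '!seen[$0]++'
}

# Phase 1 (evaluation; the nix-build process exits afterwards, freeing its heap).
# The dry-run report is printed on stderr, hence 2>&1:
#   nix-build --dry-run '<nixpkgs/nixos>' -A system 2>&1 | extract_drvs > derivations.txt
# Phase 2 (building from the saved .drv paths, one nix-build process per path):
#   xargs -n1 nix-build < derivations.txt
```

Paths that the report lists under "will be fetched" do not end in .drv, so they are skipped; each phase-2 nix-build fetches or builds whatever dependencies it still needs.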

@roberth
Copy link
Member

roberth commented Dec 9, 2023

Regarding freeing the expressions, a starting point would be #5747 (comment), while also making sure to destruct EvalState and the expression cache.

If you have really small machines to deploy to, you might want to use nixos-rebuild --target-host. That will neither build nor evaluate on the target machine.

@majewsky

nixos-rebuild --target-host is a good hint and I will take it into consideration. But for what it's worth, as far as I can see, it does not solve OOMs during auto-upgrades triggered by system.autoUpgrade.enable = true;

@thkoch2001
Contributor

CC @astro FYI

While I was learning Nix and Nix flakes, this command froze my dear 16 GB laptop, which was mostly idle at that point, by eating more than 10 GB:

nix flake show microvm

shortened output:

github:astro/microvm.nix/7bd9255e535c8cbada7f574ddd3bcf3bfa5e1eae
├───apps
│   ├───aarch64-linux
│   │   ├───graphics: app
│   │   ├───qemu-vnc: app
│   │   ├───vm: app
│   │   └───waypipe-client: app
│   └───x86_64-linux
│       ├───graphics: app
│       ├───qemu-vnc: app
│       ├───vm: app
│       └───waypipe-client: app
├───defaultTemplate: template: Flake with MicroVMs
├───hydraJobs
│   ├───aarch64-linux
│   │   ├───cloud-hypervisor-overlay-shutdown-command: derivation 'microvm-test-shutdown-command'
[...SNIP...]
│   │   └───vm-stratovirt-iperf: derivation 'vm-stratovirt-iperf'
error: interrupted by the user
nix flake show microvm  58,38s user 4,46s system 92% cpu 1:07,85 total

The output above is actually from a run after I found https://github.com/rfjakob/earlyoom. You might want to recommend this nice tool somewhere!

Please don't let me side-track this issue. I just thought it might be interesting to mention earlyoom here and to give an example of how to reliably eat a lot of memory.

@roberth
Member

roberth commented Dec 21, 2023

NixOS/rfcs#163 may reduce memory use for NixOS, by virtue of not having to load service modules that aren't used.

It's one possible solution among others, such as #9650 for cases like nix flake show microvm.


5 participants