'nixops deploy' exits with 'Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS' #287

Closed
soenkehahn opened this issue Jun 17, 2014 · 34 comments

@soenkehahn
Contributor

After invoking nixops deploy -j4 ... I got this error message:

Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS
nix-instantiate killed by signal 6

The same command worked with -j3.

The full command was:

 nixops deploy -d some_deployment -I .. --read-write --option binary-caches http://hydra.nixos.org -j4
@soenkehahn
Contributor Author

I have been running into this more frequently now. It feels very much non-deterministic.

@edolstra
Member

Is this with a very large network?

@shlevy
Member

shlevy commented Jun 19, 2014

Single machine.

@soenkehahn
Contributor Author

I was hitting this again and was able to work around it by setting the environment variable GC_MAXIMUM_HEAP_SIZE to something big (5G worked).
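
A minimal sketch of this workaround, assuming a POSIX shell. The thread reports that the value "5G" worked; the sketch spells the same size out in bytes, which is an assumption made here to avoid depending on suffix parsing. Boehm GC reads the variable from the environment of the process it runs in, so it has to be exported before the evaluation starts.

```shell
# Raise Boehm GC's heap ceiling for commands started from this shell.
# 5 GiB in bytes (assumption: bytes instead of the "5G" form used above).
GC_MAXIMUM_HEAP_SIZE=$((5 * 1024 * 1024 * 1024))
export GC_MAXIMUM_HEAP_SIZE
echo "GC_MAXIMUM_HEAP_SIZE=$GC_MAXIMUM_HEAP_SIZE"

# Then re-run the failing command from the report in the same shell:
# nixops deploy -d some_deployment -I .. --read-write --option binary-caches http://hydra.nixos.org -j4
```

The export only affects the current shell session, so nothing persists after the terminal is closed.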

@phunehehe
Contributor

For future reference, Nix uses boehm-gc for garbage collection.

[5:23:35 PM] Soenke Hahn: That has a limit on the amount of memory that is allowed to be allocated.
[5:23:56 PM] Soenke Hahn: When the evaluation takes up more memory it crashes.
[5:24:17 PM] Soenke Hahn: Fortunately, the library allows that memory limit to be changed through GC_MAXIMUM_HEAP_SIZE.

@domenkozar
Member

This now also happens on Hydra while running nix-build nixos/release-combined.nix

@jgeerds
Member

jgeerds commented Sep 19, 2014

Is someone working on this issue? (especially hydra)

@lucabrunox
Contributor

It should be fixed in recent master.


@jgeerds
Member

jgeerds commented Sep 19, 2014

@lethalman: Great! So nixos-rebuild --upgrade will work again? (or in a few hours/days)

@lucabrunox
Contributor

The problem with the GC has been solved, but now Hydra is faster at evaluation and thus queues jobs faster :P So we've got another problem, @edolstra:
http://hydra.nixos.org/jobset/nixos/trunk-combined#tabs-evaluations


@jgeerds
Member

jgeerds commented Sep 22, 2014

Hopefully this will be fixed :-)

@shlevy
Member

shlevy commented Oct 8, 2014

Been hitting this with 1.8pre3823_53b044c

@shlevy
Member

shlevy commented Oct 15, 2014

@edolstra Can you suggest anything we can do to profile/investigate this? This keeps hitting us.

@edolstra
Member

Try the latest Nix version. Commit 6bb4c0b should improve garbage collection quite a bit.

Also, you could build boehmgc with enableLargeConfig = true. In my experience, it makes the "Too many heap sections" message go away, but actually increases memory use. That was before 6bb4c0b, though; it might be better now.

@shlevy
Member

shlevy commented Oct 24, 2014

@edolstra no help, unfortunately. Any other ideas here?

@shlevy
Member

shlevy commented Nov 9, 2014

@edolstra ping

@shlevy
Member

shlevy commented Nov 24, 2014

@edolstra ping?

@edolstra
Member

Sorry, no ideas. I haven't seen this message myself in a while. And I don't think I've ever seen it on a single-machine network, only on large Hydra jobset evaluations.

@shlevy
Member

shlevy commented Nov 24, 2014

@edolstra Any advice for investigating this ourselves?

@edolstra
Member

Not really, sorry. Have you tried doing what the message suggests (namely increase MAXHINCR or MAX_HEAP_SECTS)?

@wmertens
Contributor

How about building a VM that reproduces the problem so we can all have a look?

@domenkozar
Member

I can reproduce this in current nixpkgs master, though it doesn't hit the limit.

$ nix-build nixos/release-combined.nix -A tested                                                                                                                                                                                             
GC Warning: Repeated allocation of very large block (appr. size 135168):
        May lead to memory leak and poor performance.
GC Warning: Repeated allocation of very large block (appr. size 135168):
        May lead to memory leak and poor performance.
GC Warning: Repeated allocation of very large block (appr. size 151552):
        May lead to memory leak and poor performance.
GC Warning: Repeated allocation of very large block (appr. size 131072):
        May lead to memory leak and poor performance.
GC Warning: Repeated allocation of very large block (appr. size 151552):
        May lead to memory leak and poor performance.
[the same warning, appr. size 151552, repeated 24 more times]

@edolstra
Member

edolstra commented Dec 2, 2014

@iElectric Right. But that's a pretty big evaluation (containing dozens of NixOS VMs), not a single machine case.

@domenkozar
Member

The error is back on master: http://hydra.nixos.org/jobset/nixos/trunk-combined

@aszlig
Member

aszlig commented Mar 6, 2015

Related: NixOS/nixpkgs#3594

@domenkozar
Member

domenkozar added a commit to snabblab/snabblab-nixos that referenced this issue May 26, 2016
@domenkozar
Member

I got the same error with 100 nodes deployed via NixOps to EC2.

GC_INITIAL_HEAP_SIZE=$((8*1024*1024*1024)) fixes it, but uses almost 13GB of RAM (barely fitting on our 16GB machine).

@domenkozar
Member

@volth see https://ac.els-cdn.com/S157106610900396X/1-s2.0-S157106610900396X-main.pdf?_tid=14162726-dce0-11e7-a77d-00000aacb361&acdnat=1512824242_7e9551614d8141a063f6582e02c10e8f

If I understood @edolstra correctly, it's hard to implement GC on top of it. In NixOps it would probably pay off to turn GC off and share memory.

A short-term solution is to get NixOps to evaluate each machine separately, in a multiprocess manner.
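
A rough sketch of what "one process per machine" could look like, assuming a POSIX shell. `eval_machine` and the machine names are hypothetical placeholders, not anything NixOps provides; a real version would run the evaluator for that one machine only.

```shell
# Hypothetical stand-in for evaluating a single machine's configuration.
eval_machine() {
    echo "evaluated $1"
}

outdir=$(mktemp -d)
for machine in web1 web2 db1; do
    # Each background job is its own process, so its heap is returned
    # to the OS when it exits instead of accumulating in one evaluator.
    eval_machine "$machine" > "$outdir/$machine.out" &
done
wait    # block until every per-machine process has finished

cat "$outdir"/*.out
```

Starting every job at once still needs the sum of their heaps at the same moment; capping the number of concurrent jobs is what bounds the peak.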

Currently, the memory used by NixOS evaluation grows linearly with the number of machines: if one machine takes 100MB of memory to evaluate, then 100 machines take ~10GB.
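
The arithmetic behind that estimate, as a small sketch. `estimate_eval_mb` is a hypothetical helper, and it assumes the cost is simply the per-machine cost times the machine count, with nothing shared between machines.

```shell
# Estimated peak evaluator memory in MB under the linear model:
# per-machine cost multiplied by the number of machines.
estimate_eval_mb() {
    machines=$1
    per_machine_mb=$2
    echo $((machines * per_machine_mb))
}

estimate_eval_mb 100 100   # prints 10000, i.e. roughly 10GB
```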

@orivej
Contributor

orivej commented Dec 13, 2017

There is a significant memory usage improvement in Nixpkgs staging: NixOS/nixpkgs#32544

@wmertens
Contributor

wmertens commented Aug 6, 2018

@edolstra A thought: using eval time as a metric for cache eviction

Would it be hard to keep track of how long it took to evaluate an expression, and use that to decide which expressions to memoize?

So if you could somehow say "cache should be below 100MB", and then when the cache is bigger, you evict items sorted by increasing eval time?

(possibly this is a trivial concept to you and not possible to implement, I just thought of it and wondered if that was a worthwhile approach to improving memory usage)
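
To make the proposal concrete, here is a toy sketch of the eviction rule using `sort` and `awk`. It is not how Nix's evaluator works; the `key size_mb eval_ms` input format, the function name, and the sample entries are all made up for illustration.

```shell
# Print the keys to evict so the cache fits within budget_mb.
# Input on stdin: one entry per line, "key size_mb eval_ms".
evict_by_eval_time() {
    budget_mb=$1
    sort -k3,3n | awk -v budget="$budget_mb" '
        { key[NR] = $1; size[NR] = $2; total += $2 }
        END {
            # Entries arrive sorted by increasing eval time, so the
            # cheapest-to-recompute ones are evicted first.
            for (i = 1; i <= NR && total > budget; i++) {
                print key[i]
                total -= size[i]
            }
        }'
}

printf '%s\n' \
    'pkgs 60 900' \
    'lib 30 20' \
    'modules 50 400' |
    evict_by_eval_time 100
```

With a 100MB budget and 140MB cached, this evicts `lib` and then `modules`, keeping `pkgs`, the entry that took longest to evaluate.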

@stale

stale bot commented Feb 16, 2021

I marked this as stale due to inactivity.

@stale stale bot added the stale label Feb 16, 2021
@stale

stale bot commented Apr 29, 2022

I closed this issue due to inactivity.

@fricklerhandwerk
Contributor

Closing this as it's very likely not relevant any more. Reopen if needed.
