Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Nix build CPU utilization with NIX_MAX_CORES #31965

Closed
wants to merge 2 commits into from

Conversation

orivej
Copy link
Contributor

@orivej orivej commented Nov 23, 2017

Motivation for this change

When build-cores is less than the number of CPUs, make -l$NIX_BUILD_CORES keeps the remaining CPUs unutilized, even when max-jobs is greater than 1. This is good if you want to dedicate those CPUs to something other than Nix builds, and it is bad if you want to balance build-cores with max-jobs to maximize utilization of the system without overloading it. To achieve the latter goal make -l should be given the optimal load average which is now available in $NIX_MAX_CORES.

The implementation mirrors Ninja:
https://github.com/ninja-build/ninja/blob/e234a7bd/src/ninja.cc#L222-L233
https://github.com/ninja-build/ninja/blob/e234a7bd/src/util.cc#L473-L481
(getconf _NPROCESSORS_ONLN works on Linux and Darwin.)

If keeping some CPUs unused is desirable, Nix may add an option and export NIX_MAX_CORES with its value for the builder.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option build-use-sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Fits CONTRIBUTING.md.

@domenkozar
Copy link
Member

domenkozar commented Nov 23, 2017

Isn't that the whole point of the setting, to be able to choose? I think that default should be 0, not 1.

@orivej
Copy link
Contributor Author

orivej commented Nov 23, 2017

@domenkozar If you mean that NIX_BUILD_CORES by default should equal NIX_MAX_CORES, not 1, then I agree, but I'm just keeping the old behaviour, and the default in Nixpkgs does not matter because AFAIK Nix always passes some NIX_BUILD_CORES to the builder.

@vcunat
Copy link
Member

vcunat commented Nov 24, 2017

Sounds OK to me. In particular, that increase in guessParallelism will probably decrease the "oscillations" that I tend to see sometimes.

If we go ahead with this, it might be worth to make this value overridable like --cores is. I sometimes use -l (via --cores) for setting poor man's priorities – setting it slightly higher for one job will make other jobs use just a single core each as long as the prioritized one can fill the load...

@orivej orivej force-pushed the parallelism branch 2 times, most recently from 9361fcd to 769d156 Compare December 4, 2017 16:17
Copy link
Member

@vcunat vcunat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this, but we should get an opinion of some other member as well, since it's a change in stdenv behavior.

@vcunat
Copy link
Member

vcunat commented Dec 20, 2017

Let me try /cc @edolstra ("code owner" anyway).

@orivej
Copy link
Contributor Author

orivej commented Dec 20, 2017

This PR has stalled because it removes the ability to prioritize builds for you, and it is only useful if 0 < NIX_BUILD_CORES < $(nproc), but I have not found a single reason for the latter. On the contrary, I have become more convinced that this will not help with e.g. starving QEMU, see #32271 (comment)

When `build-cores` is less than the number of CPUs, `make -l$NIX_BUILD_CORES`
keeps the remaining CPUs unutilized even when `max-jobs` is greater than 1.
This is good if you want to dedicate those CPUs to something other than Nix
builds, and it is bad if you want to balance `build-cores` with `max-jobs` to
maximize utilization of the system without overloading it. To achieve the latter
goal `make -l` should be given the optimal load average which is now available
in `$NIX_MAX_CORES`.

The implementation mirrors Ninja:
https://github.com/ninja-build/ninja/blob/e234a7bd/src/ninja.cc#L222-L233
https://github.com/ninja-build/ninja/blob/e234a7bd/src/util.cc#L473-L481
@vcunat
Copy link
Member

vcunat commented Dec 20, 2017

I would like to have this for the 96-core aarch64 builder. We often get into a state where processes cannot fork, and I assume it's caused by occasional load oscillations when many builds fire up to 96 make jobs at once.

@vcunat
Copy link
Member

vcunat commented Dec 20, 2017

The prioritization is a tiny nitpick. A really proper solution would be via a machine-wide jobserver coordinating this.

@orivej
Copy link
Contributor Author

orivej commented Dec 20, 2017

My hypothesis (observed on my hydra) is that aarch64 is flooded by haskell builds that do not respect NIX_BUILD_CORES anyway, and the only temporary solution is to decrease build-max-jobs: #32697 (comment) . I think that a reasonably good solution is to increase ulimit -u far above the practical limit and introduce cgroup classifier: #32271 (comment)

@edolstra
Copy link
Member

I don't understand the implications of this PR. What are the semantics of NIX_MAX_CORES and NIX_BUILD_CORES?

@vcunat
Copy link
Member

vcunat commented Dec 20, 2017

@edolstra -l and -j parameters of make, respectively, i.e. allowing to control them separately.

@orivej
Copy link
Contributor Author

orivej commented Dec 20, 2017

What are the semantics of NIX_MAX_CORES and NIX_BUILD_CORES?

Both parameters limit the parallelism of the build.

  • NIX_BUILD_CORES caps the number of CPU-intersive processes a build may start.
  • NIX_MAX_CORES caps the number of CPU-intersive processes that may be running on the system at any given time (or on average). (Regardless of who started them. On Linux it is implemented by looking at /proc/loadavg.)

NIX_MAX_CORES was not meant to be an option (of Nix), it was meant to be a common value (in Nixpkgs) for use by make and ninja (the only tools implementing this feature) because the invocations of make and ninja are a bit scattered around Nixpkgs, and because make and ninja are defined in two different places yet need this shared value.

I don't understand the implications of this PR.

When NIX_BUILD_CORES = $(nproc), it has almost no effect. When 0 < NIX_BUILD_CORES < $(nproc), it allows make and ninja to utilize all cores. Currently they are restricted to keep load average at $NIX_BUILD_CORES, underutilizing CPU when there are more cores available.

@grahamc and @vcunat consider(ed) setting NIX_BUILD_CORES < $(nproc) on some Hydra machines, especially aarch64. This is why I made this PR (and also because capping load average with NIX_BUILD_CORES does not make sense). It might be a worthwhile experiment, but currently I do not expect the result to be positive.

@orivej
Copy link
Contributor Author

orivej commented Dec 20, 2017

NIX_MAX_CORES was not meant to be an option of Nix

No, this is not right. The code anticipates that Nix might start providing NIX_MAX_CORES and this motivated the choice of the name; but I was not aware of the reason why Nix would do it until #31965 (comment).

@mmahut
Copy link
Member

mmahut commented Aug 19, 2019

Are there any updates on this pull request, please?

@stale
Copy link

stale bot commented Oct 2, 2020

Hello, I'm a bot and I thank you in the name of the community for your contributions.

Nixpkgs is a busy repository, and unfortunately sometimes PRs get left behind for too long. Nevertheless, we'd like to help committers reach the PRs that are still important. This PR has had no activity for 180 days, and so I marked it as stale, but you can rest assured it will never be closed by a non-human.

If this is still important to you and you'd like to remove the stale label, we ask that you leave a comment. Your comment can be as simple as "still important to me". But there's a bit more you can do:

If you received an approval by an unprivileged maintainer and you are just waiting for a merge, you can @ mention someone with merge permissions and ask them to help. You might be able to find someone relevant by using Git blame on the relevant files, or via GitHub's web interface. You can see if someone's a member of the nixpkgs-committers team, by hovering with the mouse over their username on the web interface, or by searching them directly on the list.

If your PR wasn't reviewed at all, it might help to find someone who's perhaps a user of the package or module you are changing, or alternatively, ask once more for a review by the maintainer of the package/module this is about. If you don't know any, you can use Git blame on the relevant files, or GitHub's web interface to find someone who touched the relevant files in the past.

If your PR has had reviews and nevertheless got stale, make sure you've responded to all of the reviewer's requests / questions. Usually when PR authors show responsibility and dedication, reviewers (privileged or not) show dedication as well. If you've pushed a change, it's possible the reviewer wasn't notified about your push via email, so you can always officially request them for a review, or just @ mention them and say you've addressed their comments.

Lastly, you can always ask for help at our Discourse Forum, or more specifically, at this thread or at #nixos' IRC channel.

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Oct 2, 2020
@omasanori
Copy link
Contributor

This PR seems promising. Is there any chance to be rebased and landed?

@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Oct 23, 2020
@SuperSandro2000
Copy link
Member

@orivej can you please rebase and fix the merge conflicts?

@stale
Copy link

stale bot commented Jun 3, 2021

I marked this as stale due to inactivity. → More info

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jun 3, 2021
@Artturin
Copy link
Member

i rebased this in #141266

@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Oct 11, 2021
@Artturin Artturin closed this Nov 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet