build.d: Reduce the default number of jobs in cgroups #15504
Conversation
Thanks for your pull request and interest in making D better, @tim-dlang! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please see CONTRIBUTING.md for more information. If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue. If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment set up, you can use Digger to test this PR: dub run digger -- build "master + dmd#15504"
From the CircleCI documentation:
The program build.d uses totalCPUs to determine the number of jobs. The log on CircleCI for this pull request shows 36 CPUs. This could explain random failures like the one in dlang/dlang.org#3681, because the jobs can run out of memory.
Force-pushed from 3b9327f to 03e5e5d
CircleCI runs the tests on a server with many CPUs, but restricts the number of CPUs and the memory using Linux cgroups. Using the total number of CPUs as the default number of jobs can result in many parallel DMD processes, which can consume too much memory. This can result in random failures. This commit tries to detect a reduced number of CPUs, so the number of jobs can be decreased.
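The detection described above can be done without the cpu.shares heuristic by reading the cgroup v1 CFS quota, which is a hard limit on CPU time. The following is a minimal sketch, not the PR's actual code: the function name `effectiveJobs` is hypothetical, and the cgroup v1 paths are assumed (cgroup v2 uses `cpu.max` instead).

```d
import std.algorithm : max;
import std.conv : to;
import std.file : exists, readText;
import std.parallelism : totalCPUs;
import std.string : strip;

// Hypothetical helper: derive a job count from the cgroup v1 CFS quota,
// falling back to totalCPUs when no limit is set or the files are absent.
uint effectiveJobs()
{
    enum quotaFile  = "/sys/fs/cgroup/cpu/cpu.cfs_quota_us";
    enum periodFile = "/sys/fs/cgroup/cpu/cpu.cfs_period_us";
    if (quotaFile.exists && periodFile.exists)
    {
        try
        {
            immutable quota  = quotaFile.readText.strip.to!long;
            immutable period = periodFile.readText.strip.to!long;
            // A quota of -1 means "unlimited" in cgroup v1.
            if (quota > 0 && period > 0)
                return max(1u, cast(uint)(quota / period));
        }
        catch (Exception)
        {
            // Unreadable or malformed files: fall through to the default.
        }
    }
    return totalCPUs;
}
```

For example, a container restricted to a quota of 400000us per 100000us period would yield 4 jobs, regardless of how many cores the host machine has.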
Force-pushed from 03e5e5d to abb0e5f
}
catch (ConvException)
{
    stderr.writeln("Warning: /sys/fs/cgroup/cpu/cpu.shares contains unknown value:", cpuSharesStr);
This warning message is uninformative. It should say what the code was looking for, what the consequence of the failure is, and suggest how to fix it.
I now think using the cpu.shares file is not the best way. The factor 1024 is the default, and CircleCI uses it for one CPU, but other environments could use different values.
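To illustrate why this heuristic is fragile, here is a hypothetical sketch of it (the function name `jobsFromShares` is invented for illustration, and the cgroup v1 path is assumed): cpu.shares is a relative weight, not a hard limit, so dividing by 1024 only works in environments that follow that convention.

```d
import std.algorithm : max;
import std.conv : to;
import std.file : readText;
import std.string : strip;

// Hypothetical illustration of the cpu.shares heuristic discussed above.
// cpu.shares is a *relative* scheduling weight; 1024 per CPU is only a
// convention that CircleCI happens to follow, so this is not portable.
uint jobsFromShares()
{
    immutable shares = readText("/sys/fs/cgroup/cpu/cpu.shares").strip.to!long;
    return max(1u, cast(uint)(shares / 1024)); // assumes 1024 == one CPU
}
```

An orchestrator that assigns, say, 512 shares to mean "half the machine" would make this function report 1 job on any host, which is why a hard limit such as the CFS quota is a more reliable signal.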
Any idea why it reports 36 instead of 32?
I don't know exactly. Maybe they only had servers with 32 CPU cores when the documentation was written and added servers with 36 cores later. If they have different servers, that could also help explain why the tests only sometimes run out of memory.
Closing in favour of #15799 and dlang/dlang.org#3724.