build.d: Reduce the default number of jobs in cgroups #15504

tim-dlang · 2023-08-05T17:54:51Z

CircleCI runs the tests on a server with many CPUs, but restricts the
number of CPUs and the memory using Linux cgroups. Using the total
number of CPUs as the default number of jobs can result in many
parallel DMD processes, which can consume too much memory. This can
result in random failures.

This commit tries to detect a reduced number of CPUs, so the number of
jobs can be decreased.

dlang-bot · 2023-08-05T17:54:53Z

Thanks for your pull request and interest in making D better, @tim-dlang! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
I have provided a detailed rationale explaining my changes
New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.

If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + dmd#15504"

tim-dlang · 2023-08-05T18:03:33Z

From the CircleCI documentation:

Java, Erlang and any other languages that introspect the /proc directory for information about CPU count may require additional configuration to prevent them from slowing down when using the CircleCI resource class feature. Programs with this issue may request 32 CPU cores and run slower than they would when requesting one core. Users of languages with this issue should pin their CPU count to their guaranteed CPU resources.

build.d uses totalCPUs to determine the default number of jobs. The CircleCI log for this pull request shows 36. This could explain random failures like those seen in dlang/dlang.org#3681, because the jobs can run out of memory.
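To illustrate the idea of capping the default, here is a minimal sketch and not the code from this pull request. The helper `defaultJobs` and its `cpuLimit` parameter are hypothetical names; only `totalCPUs` from `std.parallelism` is an existing Phobos symbol. The limit of 4 in `main` is an arbitrary example value.

```d
import std.algorithm.comparison : max, min;
import std.parallelism : totalCPUs;
import std.stdio : writeln;

/// Hypothetical helper: clamp the default job count to a known CPU limit.
/// A cpuLimit of 0 means "no limit known", so the host count is used.
/// The result is never less than 1.
uint defaultJobs(uint hostCPUs, uint cpuLimit)
{
    if (cpuLimit == 0)
        return max(hostCPUs, 1u);
    return max(min(hostCPUs, cpuLimit), 1u);
}

void main()
{
    // totalCPUs is the value build.d uses by default (36 in the CircleCI log).
    writeln("totalCPUs: ", totalCPUs);
    // Assumed example: the environment guarantees only 4 CPUs.
    writeln("jobs with a limit of 4 CPUs: ", defaultJobs(totalCPUs, 4));
}
```

With a host count of 36 and a limit of 4, the helper returns 4, so at most four DMD processes would run in parallel.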

dkorpel · 2023-08-07T13:45:58Z

compiler/src/build.d

+            }
+            catch (ConvException)
+            {
+                stderr.writeln("Warning: /sys/fs/cgroup/cpu/cpu.shares contains unknown value:", cpuSharesStr);


This is an uninformative warning message. It should say what it was looking for, what the consequence of the failure is, and how to fix it.
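A message along the lines the review asks for could be sketched as follows. This is only an illustration: `cpuSharesWarning` is a hypothetical helper, the wording is invented, and the values passed in `main` are example inputs rather than anything taken from a real log.

```d
import std.format : format;
import std.stdio : stderr;

/// Hypothetical helper: build a warning that states what was being read,
/// what happens as a consequence, and what the user can do about it.
string cpuSharesWarning(string path, string value, uint fallbackJobs)
{
    return format(
        "Warning: could not parse '%s' from %s as a CPU share count.\n" ~
        "Falling back to %s parallel jobs.\n" ~
        "Set the number of jobs explicitly to silence this warning.",
        value, path, fallbackJobs);
}

void main()
{
    // Example inputs only; "max" stands in for any unparseable content.
    stderr.writeln(cpuSharesWarning("/sys/fs/cgroup/cpu/cpu.shares", "max", 36));
}
```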

tim-dlang · 2023-08-08T17:27:59Z

I now think using the cpu.shares file is not the best approach. The factor of 1024 is the default and is what CircleCI uses for one CPU, but other environments could use different values.
Maybe it would be better to set the number of jobs in the .circleci/run.sh files. The Makefiles would need to forward the parameter to build.d.
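The cpu.shares heuristic and its weakness can be sketched as below. This is a reconstruction for illustration, not the code from the diff: `cpusFromShares` is a hypothetical helper, and the rule "1024 shares equal one CPU" is exactly the assumption questioned above.

```d
import std.conv : ConvException, to;
import std.file : exists, readText;
import std.stdio : writeln;
import std.string : strip;

/// Hypothetical helper: convert the content of cpu.shares to a CPU count,
/// ASSUMING 1024 shares correspond to one CPU (rounded up).
/// Returns 0 when the content cannot be parsed as an unsigned number.
uint cpusFromShares(string cpuSharesStr)
{
    try
    {
        immutable shares = cpuSharesStr.strip.to!uint;
        return shares / 1024 + (shares % 1024 != 0 ? 1 : 0);
    }
    catch (ConvException)
    {
        return 0;
    }
}

void main()
{
    // cgroup v1 path, as it appears in the reviewed diff.
    enum path = "/sys/fs/cgroup/cpu/cpu.shares";
    if (!path.exists)
    {
        writeln("no cpu.shares file; keeping the default job count");
        return;
    }
    immutable cpus = cpusFromShares(readText(path));
    writeln(cpus == 0 ? "cpu.shares not usable" : "estimated CPUs: " ~ cpus.to!string);
}
```

Because 1024 is also the default value, an unrestricted machine would be estimated at one CPU by this rule, which is why the heuristic does not carry over to other environments.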

Imperatorn · 2023-11-08T06:23:27Z

Any idea why it reports 36 instead of 32?

tim-dlang · 2023-11-08T16:23:27Z

> Any idea why it reports 36 instead of 32?

I don't know exactly. Maybe they only had servers with 32 CPU cores when the documentation was written and added servers with 36 cores later. If they run different kinds of servers, that could also help explain why the tests only sometimes run out of memory.

tim-dlang · 2023-11-11T12:09:08Z

Closing in favour of #15799 and dlang/dlang.org#3724.

tim-dlang marked this pull request as draft August 5, 2023 17:54

tim-dlang force-pushed the circleci_failures branch 7 times, most recently from 3b9327f to 03e5e5d August 5, 2023 18:48

tim-dlang force-pushed the circleci_failures branch from 03e5e5d to abb0e5f August 5, 2023 19:16

tim-dlang changed the title from "WIP: Log the number of jobs" to "build.d: Reduce the default number of jobs in cgroups" Aug 5, 2023

tim-dlang marked this pull request as ready for review August 5, 2023 20:07

tim-dlang mentioned this pull request Aug 5, 2023

Allow underscores at more positions in grammar for numbers dlang/dlang.org#3681

Merged

dkorpel reviewed Aug 7, 2023

tim-dlang marked this pull request as draft August 8, 2023 17:28

dlang-bot added the stalled label Nov 7, 2023

tim-dlang mentioned this pull request Nov 7, 2023

Use consistent order for qualifiers in conversion table dlang/dlang.org#3704

Merged

This was referenced Nov 7, 2023

Document enum copying and assignment behavior dlang/dlang.org#3716

Merged

Update index.dd dlang/dlang.org#3698

Merged

dlang-bot removed the stalled label Nov 8, 2023

tim-dlang closed this Nov 11, 2023

tim-dlang mentioned this pull request Nov 13, 2023

Issue 24239 - dlang.org tests on CircleCI run out of memory #15799

Merged
