Add a job that measures build memory consumption and time #315
Conversation
Example is here: https://github.com/paulgessinger/acts/runs/851467964
Codecov Report
@@ Coverage Diff @@
## master #315 +/- ##
==========================================
+ Coverage 48.32% 48.45% +0.13%
==========================================
Files 323 324 +1
Lines 16376 16280 -96
Branches 7603 7554 -49
==========================================
- Hits 7913 7889 -24
+ Misses 3178 3139 -39
+ Partials 5285 5252 -33
Continue to review full report at Codecov.
I would personally measure a RelWithDebInfo job, because...
- In one measurement you did on the CKF tests, there was quite a big difference in build RSS between Release builds and RelWithDebInfo builds (generating debug info for all those templates isn't free, I assume).
- The parts of Acts that have a big build overhead problem are unit tests. These will mostly be built by developers, and developers tend to care about debug symbols for all sorts of reasons (debugging, profiling, tracing, dynamic analysis...).
...but in any case, thanks for adding this, and I'm obviously in favor of it ;)
.github/workflows/perf.yml
Outdated
-DACTS_BUILD_EVERYTHING=ON
-DCMAKE_EXPORT_COMPILE_COMMANDS=ON
- name: Measure
  run: cmakeperf collect build/compile_commands.json -o perf.csv -j$(nproc)
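For context, the `cmakeperf collect` step runs each entry of the compilation database and records its time and memory footprint. This is not cmakeperf's actual implementation (which samples per-process memory, making `-j > 1` workable); `measure_command` below is a hypothetical stdlib-only sketch of the idea, with the caveat noted in the comment:

```python
import resource
import shlex
import subprocess
import time

def measure_command(command, directory="."):
    """Run one compilation-database entry; return (wall_seconds, peak_rss_mb).

    Caveat: ru_maxrss is a high-water mark over *all* children of this
    process, so the peak is only attributable to a single translation
    unit when entries run strictly sequentially (-j1 semantics).
    """
    start = time.monotonic()
    subprocess.run(shlex.split(command), cwd=directory, check=True)
    wall = time.monotonic() - start
    # On Linux ru_maxrss is reported in kilobytes (bytes on macOS).
    peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return wall, peak_kb / 1024.0
```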
If you want stable time measurements, parallelism is dangerous because you don't know what's running in parallel with you and it may interact badly through some shared resources (memory bus, storage...).
Personally, I care most about RAM consumption right now, and think that timing measurements on cloud VMs are a lost cause, so I'm okay with this.
Right. I wouldn't give too much weight to the time measurements anyway. It speeds up the job a little bit, and the memory measurement should be robust enough with concurrency.
Hmmm... this kind of relates to something that I was wondering about: wouldn't it make sense to time one of the existing builds instead of adding another build to the CI workflow? (Also, out of curiosity, how do you know that the build ran out of disk space rather than, say, out of RAM? The failure symptoms do not look super obvious to me...)
Also, if disk usage is the issue, it intuitively sounds like you might be able to get away with amending your measurement script so that it deletes the .o file after every monitored compilation. This is obviously an incompatible alternative to the "altering one of the existing builds" strategy. You would need to filter out linking jobs or anything else which makes use of those files from the compilation database, though.
That's exactly what I tried. Are the linker jobs even part of the compilation database? I didn't see any, at least. |
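(For reference, the filter-then-delete idea discussed above could be sketched roughly like this; `object_path`, `compile_steps`, and `cleanup` are hypothetical helpers, not part of cmakeperf, and as noted, compile_commands.json typically only lists compile steps anyway, so the filter is a belt-and-braces precaution.)

```python
import os
import shlex

def object_path(command):
    """Return the -o target of a compile command, or None if absent."""
    args = shlex.split(command)
    for i, arg in enumerate(args):
        if arg == "-o" and i + 1 < len(args):
            return args[i + 1]
    return None

def compile_steps(entries):
    """Keep only entries that emit a .o file, dropping any link steps."""
    return [e for e in entries if str(object_path(e["command"])).endswith(".o")]

def cleanup(entry):
    """Delete a monitored entry's object file to cap disk usage."""
    obj = object_path(entry["command"])
    if obj is not None:
        target = os.path.join(entry["directory"], obj)
        if os.path.exists(target):
            os.remove(target)
```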
It's mostly a guess. If the kernel nukes processes because it's OOM, you'll sometimes get output from the termination signal. If the VM manager kills the whole VM when it goes over disk, it just never prints any output. But yeah, it could be that the VM manager also terminates the whole VM if it goes over some memory limit. I'm not sure.
I could run the script, and then afterwards invoke …
Since the job still fails after removing the .o files, I suspect that this could be an OOM scenario. This is consistent with the fact that building with debug info consumes a lot more RAM, and with the fact that GitHub claims to provide 7 GB of RAM per worker, which is too little to build with 2 cores and debug info according to my measurements (our biggest processes are 5 GB in RelWithDebInfo mode here). It is also consistent with the fact that no output is printed. Given GitHub Actions' low disk space limits, I bet that they are not using swap, and Linux's behavior when running out of RAM without a swap partition is to instantly freeze without any sane recovery option, as I've experienced too many times while playing with …

If this is the issue, running this build at -j1 will work around the problem, and running the coverage build at -j1 may also work around its problem.
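(If OOM is indeed the culprit, the -j1 workaround suggested above might look like this in the workflow step; this is a sketch, where only the `-j` flag differs from the current step:)

```yaml
- name: Measure
  # Sequential compiles keep peak RSS to one translation unit at a time,
  # staying under the ~7 GB available on a GitHub-hosted runner.
  run: cmakeperf collect build/compile_commands.json -o perf.csv -j1
```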
Maybe. Yeah. Interesting. |
This is a cool tool, but I am not sure that it helps us in the CI (at least in the current form). To be a useful part of the CI we would need a comparison against a reference, so we are alerted when there are regressions (similar to what the coverage is doing). If you want to add it in its current form (as a first step), I would suggest combining the coverage workflow and your new build performance one into a single …
Nevermind, I misunderstood your suggestion as merging the two jobs, instead of merely grouping them together in the GitHub UI. |
Right, but that is considerably more work. Maybe I'll throw something together, but not anytime soon.
Not sure why this would matter, honestly.
So they are logically grouped together. Similar to the builds and the checks.
Completely understandable. That is why I mentioned that we could still add it as-is if you want to have this as a first prototype in the CI.
Ok, joined the workflows, removed the …
On the positive side, it's running through this time, and we may have found a way to stabilize the coverage build without even having to exclude every optional Acts component from it.
Ok. Care to approve @HadrienG2? |
This runs compilation units individually, based on a compilation database.
Currently, it only prints the output in two tables, one sorted by memory and one sorted by compile time.
@msmk0, @HadrienG2 what do you think?