buildkitd stuck at 100% CPU for 5 minutes during SAVE ARTIFACT for ~20 MiB output #1187
Comments
Hi @mologie, your assumptions could be correct. A few questions:
|
Hi @vladaionescu, thank you for your response, and sorry for the delay on my end. The CI changes I am working on do not have high priority in my team right now, but rest assured this is being worked on; we love Earthly so far and hope to be able to contribute back here. First of all, thank you for confirming that I /could/ be on the right track. I will continue investigating! If you mind having this open with the sparse info I have given, feel free to close it; otherwise I will keep documenting my findings here, probably beginning next week. My steps to reproduce will likely have to contain an Earthfile combined with a Fedora CoreOS machine config and shell scripts to spawn a VM with qemu or vmware-vmx. I have not yet been able to reproduce this outside of FCOS, e.g. with just Ubuntu and Docker. Just to answer your questions in advance:
It runs natively,
An image for the cross-compiler is collected with
Negative, the issue can be observed only in CI, but not on my computer. CI runs Fedora CoreOS 34.20210808.3.0 with Docker 20.10.7.
I appreciate the offer! So far I have failed to reliably find a minimal reproducible example, because it looks like something in my CI environment is the deciding factor, as described above. All builds in our CI environment are slower with Earthly (compared to just running Docker), but the massive difference is only observed with the cross-build that uses the large build container. |
Some more things to try:
|
I have a similar issue, which seems tied to pushing an image to the registry. All the build steps are cached and then Earthly takes a while before it begins pushing the image. This is running with
During the ongoing lines, buildkitd is running at 100% CPU (only using a single thread apparently) |
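One way to check the "single thread" observation on a Linux host is to compare per-thread CPU time from /proc before and after a short interval. The following is a minimal sketch, not something taken from this thread: the helper names (`parse_cpu_ticks`, `thread_cpu_ticks`, `busiest_threads`) are hypothetical, and it assumes the buildkitd PID is visible from where the script runs (for example on the host, or inside the buildkitd container).

```python
import os

def parse_cpu_ticks(stat_line):
    """Return utime + stime (clock ticks) from a /proc/<pid>/task/<tid>/stat line."""
    # The comm field is wrapped in parentheses and may contain spaces,
    # so everything up to the last ')' is skipped before splitting.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])

def thread_cpu_ticks(pid):
    """Map thread id -> accumulated CPU ticks for every thread of pid (Linux only)."""
    ticks = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                ticks[int(tid)] = parse_cpu_ticks(f.read())
        except FileNotFoundError:
            continue  # the thread exited between listdir() and open()
    return ticks

def busiest_threads(before, after, top=5):
    """Return (tid, tick delta) pairs for the threads that used the most CPU."""
    deltas = {tid: after[tid] - before[tid] for tid in after if tid in before}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

To use it, call `thread_cpu_ticks(pid)` twice a few seconds apart and pass both results to `busiest_threads`. If one thread accounts for nearly all of the delta, the work is effectively single-threaded; several threads with similar deltas would point to parallel work instead.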
If I remove The image that is eventually pushed to the registry is only about 100 MB |
If I replace |
It's very possible - I think I've noticed that correlation too in the past. Could be some internal buildkit cache prep routine, but not sure. |
I've started seeing this on my builds now too. Anything potentially useful to diagnose this behavior? |
Tailing the logs of buildkit I see something like
At this point buildkitd is using 300% CPU and just spinning |
$ perf top -g
|
Might be related moby/buildkit#2009 |
Additionally when cancelling the build, the buildkitd instance is still busy doing whatever it was stuck on. |
This is happening for us as well in the I have removed all cache hints, and am still seeing the issue as soon as I add the |
My only workaround is to not use the cache features |
I wonder if there's some kind of infinite loop in the Buildkit code on computing the inline cache metadata. If anyone has a minimal reproduction Earthfile that we can run, it would help tremendously to pinpoint the exact cause. |
@vladaionescu I'm using explicit caching (without any additional hints), so it's probably not just happening on inline caching.
|
+1 for this issue. It really took some time to figure this out since it's just stuck forever without any error when using |
We have identified an issue and have fixed it in Earthly v0.6.20. Closing this as resolved. |
There are signs that this issue still reproduces after the fix too. Reopening... |
Having the same problem in GitHub Actions: 2 hours with --ci, 10 minutes without. |
Versions: earthly 0.5.22 (in --ci mode), docker 20.10.7
Hi, I am trying to migrate an existing GitLab CI pipeline to Earthly, but I am hitting a major performance roadblock. I would be glad to contribute to a solution here, but as it stands I would need some pointers to start diagnosing this properly.
Symptoms: A cross-compilation task that takes 9 minutes with Docker takes 14 minutes with Earthly on the same build machine. Split into 4 sub-tasks, each of which takes roughly 2m20s individually, they take over 7 minutes each with Earthly!
I found that the time difference is explained by the SAVE ARTIFACT step. Earthly prints "ongoing" messages, and buildkitd shows 100% CPU usage. It eventually completes and produces the correct result:
The container from which artifacts are saved weighs roughly 8 GiB. However, the exported artifacts are only 20 MiB.
Indeed, the average read rate of ~22 MiB/s and the delay of roughly 5 minutes suggest that it is probably reading and exporting the whole 8 GiB build container, and is thereby CPU-bottlenecked by compression (note how it writes 6 MiB/s).
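As a rough sanity check of that estimate, the reported rates can be multiplied by the observed delay. This is only a back-of-the-envelope sketch using the figures quoted above (22 MiB/s read, 6 MiB/s write, roughly 5 minutes); the function and variable names are illustrative, and the rates are treated as constant averages, which is an assumption.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

def bytes_transferred(rate_mib_per_s, seconds):
    """Total bytes moved at a constant average rate over the given duration."""
    return rate_mib_per_s * MIB * seconds

duration_s = 5 * 60        # "roughly 5 minutes" from the report
read_rate = 22.0           # MiB/s average read, from the report
write_rate = 6.0           # MiB/s write, from the report

read_gib = bytes_transferred(read_rate, duration_s) / GIB
written_gib = bytes_transferred(write_rate, duration_s) / GIB

print(f"read ~{read_gib:.1f} GiB, written ~{written_gib:.1f} GiB")
# prints "read ~6.4 GiB, written ~1.8 GiB"
```

About 6.4 GiB read is on the order of the 8 GiB container and far more than the 20 MiB of exported artifacts, which is consistent with the whole build container being read during the step.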
Is my assumption plausible? If so, how can I prevent it, or could Earthly be improved to directly export artifacts over some sidechannel instead of reading in the whole build image?