build: use threads to speed up tar layer creation #77
Merged
This gives nicer output and feedback from e.g. hyperfine and also propagates Ctrl-C properly.
I hit a weird issue when running benchmarks across two different worktrees to compare their results: the chunkah built in one worktree ended up being reused by the benchmark for the other. I think this is a consequence of (1) using a shared cache for the cargo target dir, and (2) cargo using mtime for freshness, which can be wrong (e.g. a worktree with older timestamps than the one that was last built would not trigger a rebuild). Fix this by making the cache name unique to the workdir.
By far the most time during a build is spent in tar layer creation (specifically the SHA-256 calculations). Since tar layers are independent, we can pretty easily parallelize this.

Do this by default, but add a `-T`/`--threads` knob (and a `CHUNKAH_THREADS` env var) to control it.

This reduces split time for my workstation bootc image (2.5 GiB compressed) from 21s to 12.5s.

Closes #15.

Assisted-by: Claude Opus 4.6
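For readers following along, the precedence between the `-T`/`--threads` flag, the `CHUNKAH_THREADS` env var, and the auto-detected default could look roughly like the sketch below. This is not the code from the PR: `resolve_threads` is a hypothetical helper, and both the flag-over-env ordering and treating `0` or an unparsable value as "auto-detect" are assumptions made for illustration.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical helper (not from the PR): choose a worker count from an
/// explicit CLI value, then an environment value, then detected parallelism.
fn resolve_threads(cli: Option<usize>, env: Option<&str>) -> usize {
    // Unparsable env values are ignored rather than treated as errors
    // (an assumption; a real CLI might prefer to report them).
    let from_env = env.and_then(|v| v.trim().parse::<usize>().ok());

    // Fall back to a single thread if the platform cannot report parallelism.
    let detected = thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1);

    // Assumption: zero means "auto-detect" at every level.
    cli.filter(|&n| n > 0)
        .or(from_env.filter(|&n| n > 0))
        .unwrap_or(detected)
}

fn main() {
    let env = std::env::var("CHUNKAH_THREADS").ok();
    let n = resolve_threads(None, env.as_deref());
    println!("using {n} threads");
}
```

Keeping the resolution in one pure function makes the precedence easy to unit-test without touching the process environment.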
Code Review
This pull request introduces parallel processing for tar layer creation, which significantly speeds up the build process. The implementation is well-done, using scoped threads and an atomic counter for work distribution, which is a robust pattern. A new command-line argument and environment variable are added to control the number of threads, with auto-detection as a sensible default. My feedback includes a couple of suggestions for improving code clarity and robustness in the new parallel processing logic.
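The pattern named in the review (scoped threads pulling work via an atomic counter) can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: `process_layers` and `digest` are hypothetical names, layers are modeled as plain byte vectors, and an FNV-1a checksum stands in for the tar writing and SHA-256 hashing so the example needs no external crates.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Stand-in for the expensive per-layer work (tar + SHA-256 in the PR).
/// FNV-1a is used only to keep the sketch dependency-free.
fn digest(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in data {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Hypothetical driver: process every layer with up to `threads` workers.
/// Results are returned in input order regardless of completion order.
fn process_layers(layers: &[Vec<u8>], threads: usize) -> Vec<u64> {
    // Shared index of the next unclaimed layer.
    let next = AtomicUsize::new(0);
    let results: Mutex<Vec<Option<u64>>> = Mutex::new(vec![None; layers.len()]);
    // Never spawn zero workers, and never more workers than layers.
    let workers = threads.clamp(1, layers.len().max(1));

    // Scoped threads may borrow `layers`, `next`, and `results` directly;
    // the scope joins all workers before returning.
    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                // Each fetch_add hands out a unique index to exactly one worker.
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= layers.len() {
                    break;
                }
                let d = digest(&layers[i]);
                // The lock is held only to store the result, not while hashing.
                results.lock().unwrap()[i] = Some(d);
            });
        }
    });

    results
        .into_inner()
        .unwrap()
        .into_iter()
        .map(|r| r.expect("every index is claimed exactly once"))
        .collect()
}

fn main() {
    let layers: Vec<Vec<u8>> = (0..8u8).map(|i| vec![i; 1024]).collect();
    let parallel = process_layers(&layers, 4);
    let serial: Vec<u64> = layers.iter().map(|l| digest(l)).collect();
    assert_eq!(parallel, serial);
    println!("{} layers processed", parallel.len());
}
```

Compared with splitting the layers into fixed chunks up front, the counter lets a worker that finishes a small layer immediately claim the next one, which helps when layer sizes are uneven.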