Fix race condition in concurrent downloads for duplicate URLs #21468
base: main
Conversation
This commit fixes the issue by:
- Adding a `@downloads_by_location` hash to track downloads by file path
- Deduplicating downloads based on `cached_location` instead of `Downloadable` object identity
- Ensuring multiple downloadables sharing the same file reuse the same download future

Fixes Homebrew#21425
MikeMcQuaid
left a comment
Thanks @ooye-sanket, great work so far, some questions!
Co-authored-by: Mike McQuaid <mike@mikemcquaid.com>
MikeMcQuaid
left a comment
Thanks for clarifying! A few more questions ❤️
That makes sense 👍 Since the actual operation is already deduplicated to one download per file, showing output per file would be more accurate and simpler for users. Instead of rendering progress for each downloadable (formula + resource), we could render a single progress entry per `cached_location`. I can refactor the fetch loop to iterate over `cached_location` entries instead.
Good catch! Yes, alongside `downloads.clear` we should also call `@downloads_by_location.clear` and `@symlink_targets.clear`.
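A minimal sketch of that cleanup, assuming all three structures live on the queue object (the method name here is hypothetical, not part of the actual `DownloadQueue` API):

```ruby
# Hypothetical helper: reset every tracking structure together so a reused
# queue never sees stale futures or symlink targets.
def reset_tracking!
  downloads.clear
  @downloads_by_location.clear
  @symlink_targets.clear
end
```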
@MikeMcQuaid should I implement the refactoring to use …? If yes, I'll push the changes shortly.
@ooye-sanket yes, this sounds great, thank you so much 😍
- Remove the `downloads` hash entirely
- Show one progress line per file instead of per downloadable
- Add cleanup for `@symlink_targets` and `@downloadable_to_location`
- Simplify the code by eliminating a redundant data structure
@MikeMcQuaid Refactored as discussed! The changes are listed in the commit message above.
MikeMcQuaid
left a comment
Thanks @ooye-sanket! Code looks good. Only issue now from local testing is we want to retain the existing format of e.g.
==> Fetching downloads for: ffmpeg-full
✔︎ Bottle Manifest ffmpeg-full (8.0.1) Downloaded 108.1KB/108.1KB
✔︎ Bottle Manifest giflib (5.2.2) Downloaded 10.8KB/ 10.8KB
✔︎ Bottle giflib (5.2.2) Downloaded 153.0KB/153.0KB
✔︎ Bottle Manifest highway (1.3.0) Downloaded 9.7KB/ 9.7KB
✔︎ Bottle highway (1.3.0) Downloaded 896.7KB/896.7KB
✔︎ Bottle Manifest imath (3.2.2) Downloaded 7.4KB/ 7.4KB
✔︎ Bottle imath (3.2.2) Downloaded 194.3KB/194.3KB
✔︎ Bottle Manifest libdeflate (1.25) Downloaded 7.8KB/ 7.8KB
✔︎ Bottle libdeflate (1.25) Downloaded 113.3KB/113.3KB
✔︎ Bottle Manifest openjph (0.26.0) Downloaded 12.2KB/ 12.2KB
✔︎ Bottle openjph (0.26.0) Downloaded 157.5KB/157.5KB
✔︎ Bottle Manifest openexr (3.4.4_1) Downloaded 15.6KB/ 15.6KB
✔︎ Bottle openexr (3.4.4_1)
rather than the new:
✔︎ cask.jws.json Downloaded 15.3MB/ 15.3MB
✔︎ formula.jws.json Downloaded 32.0MB/ 32.0MB
==> Fetching downloads for: ffmpeg-full
✔︎ 7332868699be1d72494b42c0966eca2044bb87ceb55952d79068bdff04542aa6--ffmpeg-full-8.0.1-1.bottle_manifest.json Downloaded 108.1KB/108.1KB
✔︎ 283773c4d2db4fe867419d7eea6811a6417889d78fad8871041c07f49b22d2a1--giflib-5.2.2.bottle_manifest.json Downloaded 10.8KB/ 10.8KB
✔︎ 995b62b1518a0deae8dd9c0016ee099e69ae5a0e54bed3370983a80522d71783--giflib--5.2.2.arm64_tahoe.bottle.tar.gz Downloaded 153.0KB/153.0KB
✔︎ 9c45ed9405a2a0988cb2c80afd415bcdacca07ef87ac47c277905cb96a0b611e--highway-1.3.0.bottle_manifest.json Downloaded 9.7KB/ 9.7KB
✔︎ aa8ca7b1c25da4f0dd9d753cee28095bc01fbefcb01490047e20bc6e949a599a--highway--1.3.0.arm64_tahoe.bottle.tar.gz Downloaded 896.7KB/896.7KB
✔︎ 82322ae05e389eb5b059000e61d0a7acb8176046888dc2e15344c8fd7704ade4--imath-3.2.2.bottle_manifest.json Downloaded 7.4KB/ 7.4KB
✔︎ 74571d1c82f2668cb71807eb96f5fb3b7be738c2caf8dede9226d35b4a8b83b2--imath--3.2.2.arm64_tahoe.bottle.tar.gz Downloaded 194.3KB/194.3KB
If it's easier to do that by continuing to use `downloads` as before: no worries, feel free to go back to that. Given you're adding `downloadable_to_location` here anyway, I don't mind which 3 variables are used as long as we can fix the issue you've fixed here and preserve the (better) current output format. Users don't need to be told the actual filenames. If we're deduping, it's fine to just hide/lie in the output about something being downloaded for two formulae/resources/places.
Thanks again!
…nloadable, e) and Remove trailing whitespace
Thanks for the detailed feedback, that makes sense!
Brings back the `downloads` hash for display purposes while keeping `@downloads_by_location` for deduplication. Each downloadable maps to the shared future, so we fix the race condition but preserve the user-friendly output format showing bottle names instead of raw cache filenames.
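Roughly, that dual-hash arrangement could look like the plain-Ruby sketch below. Everything here is illustrative: `display_name`, the `Struct`, and the paths are hypothetical stand-ins, not the real `download_queue.rb` internals or `Downloadable` accessors.

```ruby
# Illustrative sketch only: one shared download handle per file on disk,
# one display entry per downloadable, so output keeps friendly names.
Downloadable = Struct.new(:display_name, :cached_location)

downloads = {}             # downloadable    => shared download handle
downloads_by_location = {} # cached_location => shared download handle

[
  Downloadable.new("docbook 5.2 (formula)",   "/cache/docbook-5.2.zip"),
  Downloadable.new("docbook xml52 (resource)", "/cache/docbook-5.2.zip"),
].each do |downloadable|
  handle = downloads_by_location[downloadable.cached_location] ||= Object.new
  downloads[downloadable] = handle
end

# One output line per downloadable, but only one download per file.
downloads.each_key { |d| puts "✔︎ #{d.display_name}" }
puts "unique downloads: #{downloads_by_location.size}" # => 1
```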
Fix race condition in concurrent downloads for duplicate URLs
Root Cause:
The race condition occurs because download_queue.rb creates separate download futures for each Downloadable object. In the docbook formula, the same URL (docbook-5.2.zip) appears both as the main formula download and as a resource. Even though they point to the same file, they create different Downloadable objects, so both attempt to download simultaneously and lock the same .incomplete file.
Solution Approach:
Instead of tracking downloads by Downloadable object (which treats the formula and resource as separate downloads), I've implemented deduplication based on cached_location (the actual file path). This ensures that when multiple downloadables need the same file, they share a single download operation.
Result:
The docbook formula and its xml52 resource both need docbook-5.2.zip, but now only one download is initiated and both share it. No more lock conflicts!
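As a rough illustration of that approach, here is a sketch assuming concurrent-ruby style futures; the class name and methods below are illustrative, not the actual `download_queue.rb` code:

```ruby
require "concurrent"

# Minimal sketch: futures are keyed by cached_location, so two Downloadables
# that resolve to the same file share one future and one ".incomplete" lock.
class DedupedDownloadQueue
  def initialize(concurrency)
    @pool = Concurrent::FixedThreadPool.new(concurrency)
    @downloads_by_location = {} # file path => Concurrent::Promises::Future
  end

  def enqueue(downloadable)
    # Reuse the existing future if another downloadable already wants this file.
    @downloads_by_location[downloadable.cached_location] ||=
      Concurrent::Promises.future_on(@pool) { downloadable.fetch }
  end

  def wait_for_all
    Concurrent::Promises.zip(*@downloads_by_location.values).value!
  ensure
    @pool.shutdown
  end
end
```

Calling `enqueue` twice with downloadables that share a `cached_location` returns the same future, so only one thread ever touches the corresponding `.incomplete` file.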
When `HOMEBREW_DOWNLOAD_CONCURRENCY` is enabled and a formula has the same URL listed both as the main download and as a resource (e.g., docbook), concurrent download attempts would try to lock the same `.incomplete` file, causing an "operation in progress" error.

The issue was that `download_queue.rb` created separate download futures for each `Downloadable` object, even when multiple downloadables shared the same `cached_location` (file path). This resulted in race conditions when downloading identical files.

This commit fixes the issue by:
- Adding a `@downloads_by_location` hash to track downloads by file path
- Deduplicating downloads based on `cached_location` instead of `Downloadable` object identity

When the same file is needed by multiple downloadables, they now share a single download operation, preventing concurrent lock attempts on the same `.incomplete` file.

Fixes #21425
Does `brew` lgtm (style, typechecking and tests) with your changes locally?