retryable_download: preserve partial download on network errors#22048
Merged
MikeMcQuaid merged 1 commit intoHomebrew:mainfrom Apr 20, 2026
Merged
Conversation
`RetryableDownload#fetch` cleared the download cache before every retry, which removes the `.incomplete` file that `CurlDownloadStrategy` uses for resumable downloads. As a result `--continue-at -` never kicked in: each retry restarted from zero, making installs on unstable networks effectively impossible once a bottle was large enough to never finish within one uninterrupted session. Only clear the cache when the fully-downloaded file is known-bad (`ChecksumMismatchError` or `Resource::BottleManifest::Error`). For network-level `DownloadError`, keep the partial `.incomplete` file so the next attempt can resume via `--continue-at`. If that partial was itself corrupted mid-flight, the post-download checksum check will catch it on a later iteration and clear the cache via the existing path, so the loop is self-healing. Closes Homebrew#21518. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
e474a19 to
a2d76a8
Compare
MikeMcQuaid
approved these changes
Apr 20, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
brew lgtm(style, typechecking and tests) with your changes locally?The root-cause analysis and patch were drafted with Claude (Opus 4.7) assistance. I then manually read
retryable_download.rb,download_strategy.rb, andutils/curl.rbto verify the trace, reproduced the symptom, ranbin/brew style,bin/brew typecheck, andbin/brew tests --only=utils/curllocally before committing.Summary
Fix a long-standing bug where bottle downloads on unstable networks always restart from zero instead of resuming from
.incomplete, making large bottles effectively undownloadable over flaky links.Closes #21518.
Observed symptom
On a slow/flaky connection (e.g. behind a proxy that drops TLS streams mid-transfer), the user sees:
brew install <formula>downloads N MB, then the connection drops.brewprintsRetrying download in 2s…and starts again from zero — the partial MB is gone.Failed to download resource "llvm@21").brew install …, the first request actually resumes (curl--continue-at -picks up from the current.incompletesize), gets partway further, and drops again.Net effect: every fresh
brew installinvocation downloads a slice, then throws the slice away. If no single streaming session can complete the full bottle, the install can never succeed regardless of retries, even though each attempt does make forward progress.Root cause
RetryableDownload#fetch(Library/Homebrew/retryable_download.rb) wraps the download strategy with exponential-backoff retries. Before every retry it unconditionally calls:For
CurlDownloadStrategy(Library/Homebrew/download_strategy.rb),clear_cachedoes:So the
.incompletefile — whichcurl_downloadwould otherwise pass to--continue-at -on the next attempt (Library/Homebrew/utils/curl.rb, line ~278) — is deleted before every retry. Thetry_partialmachinery added in #5421 and #11381 is effectively bypassed for any download that hits aDownloadErrormore than once. The "kill the process and restart" workaround described in #21518 works precisely becauseSIGINTbypasses this rescue path and leaves the.incompletefile in place.The three kinds of exceptions caught have different semantics, but
clear_cachetreats them identically:.incompleteat catch timeDownloadError--continue-atChecksumMismatchErrorcached_location; it's badResource::BottleManifest::ErrorChange
Only clear the cache for the "finished but bad" cases. For
DownloadErrorthe partial is kept, so the next iteration'scurl_downloadseesdestination.exist?, takes thetry_partialbranch, and passes--continue-at -:Total diff: +5 / −2 on a single file.
Self-healing against a corrupted partial
If a proxy writes garbage bytes into
.incomplete(e.g. a captive-portal HTML response), a resumed download will eventually finish with a wrong sha256. That surfaces asChecksumMismatchErroron the next iteration, which does still callclear_cache, wiping the bad partial. The subsequent attempt downloads fresh. So the loop is still self-terminating under adversarial middleware — it just doesn't give up the first 200 MB of good data to an unrelated TCP hiccup.Tests
bin/brew tests --only=utils/curl— all 65 specs pass.No dedicated spec exists for
RetryableDownload#fetch's exception-handling branches today. Happy to add one that mocksdownloadableand asserts:DownloadErrorduring#fetchdoes not callclear_cacheon retry;ChecksumMismatchErrordoes callclear_cacheon retry.Let me know if you'd like that folded into this PR or tracked separately.
Prior art / related
Open reports of the same symptom
ballodescribes the exact "kill the process and re-run and it resumes, but let brew's own retry run and it restarts from zero" behavior this PR fixes.HOMEBREW_NO_INSTALL_FROM_API=1) doesn't change how the bottle blob itself is fetched, so it doesn't resolve the underlying loop. The symptom (slow-connection installs that never finish even though each attempt makes progress) matches Option to save and resume partial download #21518 and the one this PR is submitted against.Merged PRs that built resume support but are silently defeated today
The machinery for resumable downloads already exists and works correctly for the single-attempt case. What's missing is that the outer retry loop in
RetryableDownload#fetchdeletes the.incompletefile before the next attempt, so none of the following take effect once brew's own retry kicks in:utils/curl.rb) — changed the range-support probe from HEAD to GET so servers that reject HEAD don't defeat resume. Working as intended — but only reaches the point of emitting--continue-at -if an.incompletefile is still on disk on the next attempt.curl#11381 (2021-05, merged, "Fix range requests withcurl",utils/curl.rb) — reworked howcurl_downloaddetects range support (previously it sent a range-request and checkedContent-Range; now it does a proper HEAD and looks atAccept-Ranges). Same situation: correct probe, but the partial file it would probe against on retry has already been removed.#curl_download: defaulttry_partialtofalse#13179 (2022-04, merged, "#curl_download: defaulttry_partialtofalse") — made partial-download mode opt-in to save a HEAD when it isn't useful.CurlDownloadStrategystill opts in (@try_partial = true), so bottle fetches still request resume; this PR is what finally lets that opt-in actually produce savings on the second+ retry.Recent touches to
RetryableDownloadthat did not look at this pathbottle_tmp_keg) fromformula_installer.rb. Orthogonal to.incompletefile lifetime; doesn't touchclear_cache.Both PRs touched
retryable_download.rbrecently without noticing thatclear_cacheruns unconditionally between retries.Longstanding feature requests this helps but does not fully resolve
Out of scope
The
utils/curl.rbside could additionally pass--retry-all-errors(curl ≥ 7.71) inside theretries&.positive?block so curl's own retry can survive connection drops, reducing how often the outerRetryableDownloadloop runs at all. I deliberately left that out of this PR to keep the change minimal and unambiguous — happy to open a follow-up if reviewers want it.