
Support resuming from binary caches that don't support ranged requests #445

Merged
edolstra merged 1 commit into main from resume-unranged-downloads
May 6, 2026
Conversation

@edolstra (Collaborator) commented May 4, 2026

Motivation

We just start over and discard the data we've already received.

Context

This fixes resuming NAR downloads from binary caches that don't support ranged requests.

Summary by CodeRabbit

  • Bug Fixes
    • Prevented duplicate data when resuming transfers by tracking per-response progress and discarding bytes already written.
    • Reset per-response progress on new responses to ensure accurate resume behavior.
    • Only allow resuming when server byte-range support is confirmed.
    • Disable transparent decompression for partial-range uploads/downloads.
    • Disable retries when streaming handlers are used with non-identity content encodings.

@coderabbitai Bot commented May 4, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 63bf9a07-8ca9-4041-9fa6-48e442c747d7

📥 Commits

Reviewing files that changed from the base of the PR and between dbd6993 and 6839d85.

📒 Files selected for processing (1)
  • src/libstore/filetransfer.cc

📝 Walkthrough


Adds per-response state to TransferItem (requestRange flag, bytesReceived counter), resets per-response fields on new HTTP responses/requests, updates sink handling to discard duplicate data across retries, tightens retry gating for range-resume and compressed responses, and conditions transparent decompression and CURLOPT_RESUME_FROM_LARGE on requestRange.
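The duplicate-discard idea described above can be sketched as a pure helper. This is a hypothetical illustration, not the actual code: the real logic lives inside TransferItem's sink callback, and the function name `partToWrite` is invented here.

```cpp
#include <cstdint>
#include <string>

// Hypothetical sketch: given how many bytes this response has delivered so
// far (bytesReceived) and how many bytes earlier attempts already wrote to
// the sink (writtenToSink), return the part of `data` that still needs to
// be written, trimming any prefix that would duplicate earlier output.
std::string partToWrite(uint64_t bytesReceived, uint64_t writtenToSink, const std::string & data)
{
    uint64_t end = bytesReceived + data.size();
    if (end <= writtenToSink)
        return ""; // entire chunk was already written in a previous attempt
    uint64_t skip = writtenToSink > bytesReceived ? writtenToSink - bytesReceived : 0;
    return data.substr(skip);
}
```

With `writtenToSink = 5` and a fresh response starting at `bytesReceived = 0`, the first five bytes of the new body are dropped and only the remainder reaches the sink.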

Changes

Transfer/resume / per-response state

  • Data Shape (src/libstore/filetransfer.cc): Adds requestRange flag and bytesReceived counter to TransferItem and initializes/resets per-request and per-response fields.
  • Core Implementation (src/libstore/filetransfer.cc): Resets bytesReceived on a new HTTP status line and on request init; updates finalSink handling to trim/ignore bytes that would duplicate previously written data and to update bytesReceived on successful writes.
  • Wiring / Options (src/libstore/filetransfer.cc): Enables transparent decompression only when not using range requests and not uploading (!requestRange && !request.data); applies CURLOPT_RESUME_FROM_LARGE only when requestRange is true.
  • Retry Logic (src/libstore/filetransfer.cc): Tightens retry gating: disallows retries when dataCallback is active with a non-identity Content-Encoding; when retrying with non-zero writtenToSink, sets requestRange only if acceptRanges is true, so resumption occurs only when range requests are active.
  • Tests / Documentation: No test or doc files changed in this diff.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant TransferItem
    participant Server
    participant Sink

    Client->>TransferItem: start request (may set requestRange)
    TransferItem->>Server: send HTTP request (with Range if requestRange)
    Server-->>TransferItem: HTTP response (status, headers, body)
    TransferItem->>TransferItem: parse status line -> reset bytesReceived, acceptRanges, hasContentEncoding
    alt hasContentEncoding && dataCallback active
        TransferItem->>Client: block retries (no retry)
    end
    TransferItem->>Sink: finalSink writes bytes
    Sink-->>TransferItem: writtenToSink update
    TransferItem->>TransferItem: bytesReceived += bytesWritten
    alt error / retry needed
        TransferItem->>TransferItem: if writtenToSink>0 and acceptRanges set requestRange=true
        TransferItem->>Server: retry (with Range if requestRange)
        Note right of Server: New response may overlap previous written bytes
        TransferItem->>TransferItem: discard/trim duplicate bytes based on bytesReceived vs writtenToSink
        TransferItem->>Sink: continue writing remaining bytes
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

"I nibbled bytes and tracked each hop,
Reset my paws when headers pop,
Skip the crumbs the last attempt left,
Resume with ranges — tidy and deft.
🐇✨"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check (✅ Passed): The title directly and specifically describes the main change: adding support for resuming downloads from binary caches without range request support, which aligns with the code changes adding per-response state tracking and retry logic.
  • Linked Issues check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): Check skipped because no linked issues were found for this pull request.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai Bot commented May 4, 2026

📝 Walkthrough


The file transfer implementation now tracks bytes received across retry attempts and introduces a request-range flag. It prevents duplicate data from being written by comparing previously and currently received bytes, resets counters on new HTTP responses, and ties decompression behavior to range-request state rather than first-attempt checks. Retry eligibility is tightened to block retries when dataCallback is combined with a non-identity Content-Encoding, and range requests are only set on retry if the server supports them.

Changes

HTTP Range Request and Deduplication Logic

  • State Tracking (src/libstore/filetransfer.cc, lines 126–139): TransferItem adds a bytesReceived counter and requestRange flag to coordinate deduplication and range-request behavior across retries.
  • Response Reset (src/libstore/filetransfer.cc, lines 317–320): bytesReceived is reset to zero when a new HTTP response status line arrives, clearing per-response state.
  • Deduplication Logic (src/libstore/filetransfer.cc, lines 184–195): The sink write path increments bytesReceived and discards data already written in prior attempts by comparing against writtenToSink, trimming or skipping the chunk to prevent duplicates.
  • Decompression Condition (src/libstore/filetransfer.cc, lines 510–513): Transparent decompression is now gated on requestRange and download-only state instead of checking writtenToSink == 0 on the first attempt.
  • Retry Eligibility & Range Control (src/libstore/filetransfer.cc, lines 767–774): Retries are blocked when dataCallback coexists with a non-identity Content-Encoding; range requests are set on retry only if acceptRanges is true, preventing inappropriate range requests.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 A rabbit hops through ranges true,
Deduplication sees what's new,
No bytes repeat, no data lost,
Smart retries win without the cost!


@github-actions Bot temporarily deployed to pull request May 4, 2026 15:39 (Inactive)
@coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/libstore/filetransfer.cc`:
- Around line 185-194: The duplicate-trimming logic incorrectly compares the
response-local bytesReceived (which resets on a resumed 206 response) against
the global writtenToSink, causing valid resumed bytes to be dropped; change the
comparison to work in absolute file-offset space by adding the response's
starting offset to prevReceived (i.e., compute absolutePrev = responseStart +
prevReceived or use the existing resume/expected offset variable) and only trim
data when absolutePrev < writtenToSink, calculating the trim amount as
writtenToSink - absolutePrev and slicing data accordingly; update the block that
references bytesReceived, prevReceived, writtenToSink, and data to use this
absolute-offset logic so ranged (206) retries do not discard bytes.
- Around line 510-513: The condition that sets CURLOPT_ACCEPT_ENCODING is
inverted: currently it enables transparent decompression when requestRange is
true, but the comment and semantics require skipping decompression for range
requests; change the conditional that calls curl_easy_setopt(req,
CURLOPT_ACCEPT_ENCODING, "") to run only when requestRange is false and
request.data is false (i.e., use if (!requestRange && !request.data)) so
byte-range downloads are not decompressed; update the surrounding comment if
needed to reflect this behavior and ensure the check references the same
requestRange and request.data symbols used in the surrounding code.
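The absolute-offset fix suggested in the first comment above can be illustrated as a pure helper. Names here (`partToWriteAbs`, `responseStart`) are hypothetical, chosen only to mirror the reviewer's description:

```cpp
#include <cstdint>
#include <string>

// Hypothetical illustration of the absolute-offset comparison: positions
// are compared in file-offset space, so a resumed 206 response (whose
// per-response counter restarts at 0) is not mistaken for duplicate data.
// `responseStart` is the offset at which this response's body begins
// (0 for a full 200 response, the Range start for a 206).
std::string partToWriteAbs(uint64_t responseStart, uint64_t prevReceived,
                           uint64_t writtenToSink, const std::string & data)
{
    uint64_t absolutePrev = responseStart + prevReceived;
    if (absolutePrev >= writtenToSink)
        return data; // no overlap with previously written bytes
    uint64_t trim = writtenToSink - absolutePrev;
    return trim >= data.size() ? "" : data.substr(trim);
}
```

A 206 response resuming at offset 100 after 100 bytes were already sunk passes its data through untrimmed, while a full restart from offset 0 has its duplicate prefix dropped.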

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5ed76e76-dde5-4967-ab79-cd73d33754ab

📥 Commits

Reviewing files that changed from the base of the PR and between d461a35 and 9cc82b9.

📒 Files selected for processing (1)
  • src/libstore/filetransfer.cc

@cole-h self-requested a review May 5, 2026 15:14
@edolstra force-pushed the resume-unranged-downloads branch 2 times, most recently from 08118a0 to dbd6993, May 5, 2026 18:05
@github-actions Bot temporarily deployed to pull request May 5, 2026 18:11 (Inactive)
@coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/libstore/filetransfer.cc (1)

772-781: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Differentiate real range resumes from restart-and-skip retries.

When acceptRanges is false, requestRange stays false, but this still warns "retrying from offset ...". On the non-ranged-cache path, the next attempt restarts from byte 0 and only skips duplicates locally, so the message is misleading.

💡 Proposed fix
                     if (writtenToSink) {
                         if (acceptRanges)
                             requestRange = true;
-                        warn(
-                            "%s; retrying from offset %d in %d ms (attempt %d/%d)",
-                            exc.message(),
-                            writtenToSink,
-                            ms,
-                            attempt,
-                            fileTransfer.settings.tries);
+                        if (requestRange)
+                            warn(
+                                "%s; retrying from offset %d in %d ms (attempt %d/%d)",
+                                exc.message(),
+                                writtenToSink,
+                                ms,
+                                attempt,
+                                fileTransfer.settings.tries);
+                        else
+                            warn(
+                                "%s; retrying in %d ms and skipping the first %d bytes locally (attempt %d/%d)",
+                                exc.message(),
+                                ms,
+                                writtenToSink,
+                                attempt,
+                                fileTransfer.settings.tries);
                     } else {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/libstore/filetransfer.cc` around lines 772 - 781, The warning message is
misleading when acceptRanges is false because requestRange stays false but the
code logs "retrying from offset X" even though the next attempt restarts from 0
and skips duplicates locally; update the branch that handles writtenToSink to
check acceptRanges and emit two different warn messages: when acceptRanges is
true keep the existing "retrying from offset %d ..." message and when
acceptRanges is false log something like "retrying by restarting from the
beginning and skipping %d bytes locally ..." (use writtenToSink and
fileTransfer.settings.tries in the message), and ensure this logic is placed
around the existing requestRange/acceptRanges handling so the correct message
corresponds to requestRange being true or false.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/libstore/filetransfer.cc`:
- Line 139: The bytesReceived counter (curl_off_t bytesReceived) isn't reset on
each retry, so non-HTTP transfers reuse the previous attempt's value and can
re-send sunk bytes to dataCallback; fix this by resetting bytesReceived = 0 at
the start of every new download attempt (i.e., inside the retry/attempt loop or
immediately before the logic that handles each attempt) so both the
getHTTPStatus() != 0 and getHTTPStatus() == 0 code paths use a fresh counter;
reference the bytesReceived variable, getHTTPStatus(), and dataCallback to
locate where to add the reset.

---

Outside diff comments:
In `@src/libstore/filetransfer.cc`:
- Around line 772-781: The warning message is misleading when acceptRanges is
false because requestRange stays false but the code logs "retrying from offset
X" even though the next attempt restarts from 0 and skips duplicates locally;
update the branch that handles writtenToSink to check acceptRanges and emit two
different warn messages: when acceptRanges is true keep the existing "retrying
from offset %d ..." message and when acceptRanges is false log something like
"retrying by restarting from the beginning and skipping %d bytes locally ..."
(use writtenToSink and fileTransfer.settings.tries in the message), and ensure
this logic is placed around the existing requestRange/acceptRanges handling so
the correct message corresponds to requestRange being true or false.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bac9bc7f-7016-41a9-b784-88112531127f

📥 Commits

Reviewing files that changed from the base of the PR and between 08118a0 and dbd6993.

📒 Files selected for processing (1)
  • src/libstore/filetransfer.cc

Commit: "We just start over and discard the data we've already received."

@edolstra force-pushed the resume-unranged-downloads branch from dbd6993 to 6839d85, May 6, 2026 14:07
@github-actions Bot temporarily deployed to pull request May 6, 2026 14:14 (Inactive)
@edolstra added this pull request to the merge queue May 6, 2026
Merged via the queue into main with commit 8596db3, May 6, 2026; 31 checks passed
@edolstra deleted the resume-unranged-downloads branch May 6, 2026 15:29