fix: retry transient errors and clean up partial files in model downloads#422

Merged
bigcat88 merged 3 commits into main from fix/download-retry-timeout
Apr 12, 2026
Conversation

Contributor

@bigcat88 bigcat88 commented Apr 9, 2026

Summary

Fixes #420: comfy model download crashes with a massive ReadTimeout traceback when a server briefly stalls during a large download.

  • Generous read timeout: httpx default is 5s for everything; now set to Timeout(10s connect, 300s read) so large model downloads from slow/congested servers (e.g. CivitAI) don't fail prematurely
  • Retry with backoff: Up to 3 attempts with linear backoff (2s, 4s) for transient errors (TimeoutException, NetworkError). HTTP status errors (404, 403, etc.) fail immediately — no retry
  • Partial file cleanup: Removes partially-written files on any failure (timeout, network error, HTTP error, Ctrl-C), not just KeyboardInterrupt
  • User-friendly error messages: Raw httpx tracebacks are caught and converted to a clear DownloadException, e.g. "Download failed after 3 attempts: the server stopped sending data (read timeout)"
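The retry behavior described above can be sketched as a small stdlib-only helper. Names like download_with_retry are hypothetical; the real code in comfy_cli/file_utils.py catches httpx exception types such as TimeoutException and NetworkError, but the control flow is the same:

```python
import time

# Retry policy sketch: up to 3 attempts, linear backoff (2s, 4s),
# retrying only on the exception types passed in as "transient".
MAX_RETRIES = 3
BACKOFF_STEP = 2  # seconds; wait = BACKOFF_STEP * attempt_number


def download_with_retry(do_download, transient_excs, sleep=time.sleep):
    """Call do_download(); retry transient failures with linear backoff."""
    last_exc = None
    for attempt in range(MAX_RETRIES):
        try:
            return do_download()
        except transient_excs as exc:
            last_exc = exc
            if attempt < MAX_RETRIES - 1:
                wait = BACKOFF_STEP * (attempt + 1)  # 2s, then 4s
                print(f"Download error (attempt {attempt + 1}/{MAX_RETRIES}): {exc}")
                print(f"Retrying in {wait}s...")
                sleep(wait)
    # All attempts exhausted: raise one wrapped error, chaining the original.
    raise RuntimeError(
        f"Download failed after {MAX_RETRIES} attempts: {last_exc}"
    ) from last_exc
```

HTTP status errors are deliberately not in the transient tuple, so a 404 or 403 propagates on the first attempt instead of being retried.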

Before (from the issue):

ReadTimeout: The read operation timed out

(preceded by ~80 lines of httpx/httpcore/h11 internal traceback)

After:

Download error (attempt 1/3): the server stopped sending data (read timeout)
Retrying in 2s...
Download error (attempt 2/3): the server stopped sending data (read timeout)
Retrying in 4s...
DownloadException: Download failed after 3 attempts: the server stopped sending data (read timeout)
Please try again later.

Test plan

  • 19 new tests covering retry, timeout, cleanup, and error message behavior
  • All 659 existing unit tests still pass
  • ruff check and ruff format clean

…oads

The httpx default read timeout of 5 seconds is too aggressive for large
model downloads from servers under load (e.g. CivitAI). A brief server
stall causes an unhandled ReadTimeout with a massive traceback.

- Set read timeout to 300s (connect stays at 10s)
- Retry up to 3 times with linear backoff (2s, 4s) on transient
  network errors (timeouts, connection resets)
- Clean up partial files on any download failure, not just Ctrl-C
- Convert raw httpx exceptions into DownloadException with a clear
  user-facing message

Closes #420

coderabbitai bot commented Apr 9, 2026

📝 Walkthrough

Walkthrough

Adds an HTTPX-backed streamed download with configured timeouts, retry/backoff for transient httpx errors, partial-file cleanup, friendly network-error mapping, and adjusted KeyboardInterrupt handling. download_file now retries up to configured attempts, preserves exception chaining, and exposes helpers for cleanup and error messaging.

Changes

  • Download resilience implementation (comfy_cli/file_utils.py): Introduced _download_file_httpx using httpx.stream with Timeout(read=300, connect=10), added a retry loop for transient httpx exceptions with linear backoff, added _cleanup_partial and _friendly_network_error helpers, improved KeyboardInterrupt-aware cleanup, and ensured exception chaining via __cause__.
  • Network behavior test coverage (tests/test_file_utils_network.py): Added tests (and httpx usage) that validate the timeout config, transient-exception retries/backoff and eventual success, retry-exhaustion behavior, immediate failure on non-200 responses, partial-file cleanup semantics, preservation of pre-existing files, exception chaining, and KeyboardInterrupt prompt/cleanup flows.
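The _friendly_network_error helper mentioned above maps low-level exceptions to one-line, user-facing messages. A minimal sketch, using built-in exception types instead of httpx's so it stays self-contained (the real helper dispatches on httpx.TimeoutException, httpx.NetworkError, etc.):

```python
def friendly_network_error(exc: Exception) -> str:
    """Translate a low-level network exception into a short user-facing message."""
    if isinstance(exc, TimeoutError):
        # Stand-in for httpx.ReadTimeout / httpx.TimeoutException
        return "the server stopped sending data (read timeout)"
    if isinstance(exc, ConnectionError):
        # Stand-in for httpx.NetworkError (connection reset, refused, ...)
        return "the connection was interrupted"
    return str(exc)  # fall back to the raw message
```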

Sequence Diagram(s)

sequenceDiagram
    participant Caller as Caller
    participant DF as download_file()
    participant HTTPx as httpx.stream()
    participant FS as FileSystem
    participant Cleanup as _cleanup_partial()
    participant Map as _friendly_network_error()

    Caller->>DF: request download(url, dest)
    loop attempts (up to N with backoff)
        DF->>HTTPx: httpx.stream(..., timeout=Timeout(read=300,connect=10))
        alt response 200 OK
            HTTPx-->>DF: response stream
            DF->>FS: open dest & write chunks
            DF-->>Caller: return success
        else transient network error
            HTTPx-->>DF: throws Timeout/Network/Protocol/Proxy error
            DF->>Cleanup: remove partial file (if opened)
            Cleanup-->>DF: cleaned
            DF->>DF: sleep(backoff) then retry
        else non-200 status
            HTTPx-->>DF: response (non-200)
            DF->>Cleanup: remove partial file (if opened)
            DF->>Map: map status -> friendly message
            DF-->>Caller: raise DownloadException (no retry)
        else KeyboardInterrupt during write
            DF->>Cleanup: optionally remove partial file (based on prompt)
            DF-->>Caller: re-raise KeyboardInterrupt
        end
    end
    alt attempts exhausted
        DF->>Map: map last error -> friendly message
        DF-->>Caller: raise DownloadException (includes original __cause__)
    end

When networks misbehave, retries rhyme with time — try once more, then stop on chime.

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Linked Issues check ✅ Passed PR successfully addresses all objectives from issue #420: handles ReadTimeout/network errors with retry logic (up to 3 attempts), provides friendly DownloadException messages replacing raw tracebacks, cleans up partial files, and implements practical timeout settings (10s connect, 300s read).
Out of Scope Changes check ✅ Passed All changes are scoped to improving download robustness: modifications to file_utils.py implement retry/timeout/cleanup logic and corresponding tests validate these behaviors; no unrelated modifications detected.



codecov bot commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 95.31250% with 3 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
comfy_cli/file_utils.py 95.31% 3 Missing ⚠️
@@            Coverage Diff             @@
##             main     #422      +/-   ##
==========================================
+ Coverage   76.07%   76.36%   +0.29%     
==========================================
  Files          35       35              
  Lines        4213     4261      +48     
==========================================
+ Hits         3205     3254      +49     
+ Misses       1008     1007       -1     
Files with missing lines Coverage Δ
comfy_cli/file_utils.py 88.38% <95.31%> (+3.00%) ⬆️

@bigcat88 bigcat88 marked this pull request as ready for review April 12, 2026 06:55
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Apr 12, 2026
…load retry

Broaden transient exception coverage to include ProtocolError and ProxyError.
Track whether the output file was actually opened so that cleanup only runs
when a partial download exists — preventing deletion of unrelated pre-existing
files on HTTP errors, connection failures, or early KeyboardInterrupt. Restore
the user confirmation prompt on Ctrl+C (the original commit removed it).

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@comfy_cli/file_utils.py`:
- Around line 207-213: The download loop currently writes directly to
local_filepath (using response.iter_bytes() and ui.show_progress) which can
truncate or delete a prior good file on failure; change it to write to a temp
sibling (e.g., local_filepath + ".part"), stream data into that temp file, and
on successful completion atomically replace the target with
os.replace(temp_path, local_filepath); on exceptions or cancellations ensure you
only unlink the temp file and never the original local_filepath. Apply the same
pattern for the similar block handling the other download (lines 223-246) so
both uses of response.iter_bytes(), ui.show_progress and the final file handling
perform write-to-temp then atomic replace/cleanup.
- Around line 233-252: The code currently only catches _TRANSIENT_EXCEPTIONS;
add an additional except block for httpx.HTTPError (or the package-aliased
HTTPError) after the transient/KeyboardInterrupt handlers that does not retry:
call _cleanup_partial(local_filepath), set last_exc = exc, and raise
DownloadException(...) from exc (using _friendly_network_error(exc) and
preserving the same message format used in the final raise) so non-transient
httpx errors are wrapped and cleaned up properly; ensure this new except sits
before the final raise and references DownloadException, _cleanup_partial,
_friendly_network_error, last_exc and _DOWNLOAD_MAX_RETRIES.
- Around line 195-198: The timeout wrapper currently calls response.read()
(inside httpx.stream) and treats timeouts as HTTPError and also passes bytes
into guess_status_code_reason; update the failure path in the httpx.stream block
so it does not pass raw bytes to guess_status_code_reason (use
response.content.decode('utf-8', errors='replace') or otherwise convert bytes to
str) and ensure timeouts are handled as timeouts: catch/propagate
httpx.ReadTimeout (or httpx.TimeoutException) instead of classifying it as an
HTTPError so the retry logic behaves correctly; update uses of
guess_status_code_reason, DownloadException, httpx.stream and _DOWNLOAD_TIMEOUT
accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c59f9e41-4bef-416c-bafe-bf3c9b04f781

📥 Commits

Reviewing files that changed from the base of the PR and between 514b1b3 and b6d4796.

📒 Files selected for processing (2)
  • comfy_cli/file_utils.py
  • tests/test_file_utils_network.py


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (3)
comfy_cli/file_utils.py (3)

229-279: ⚠️ Potential issue | 🟠 Major

Write into a .part file before touching the destination.

Line 229 opens local_filepath in "wb" before the download is known-good. If the stream fails after that, Lines 264-278 clean up the same path, which deletes the user's last good artifact instead of only the partial one. A temp sibling keeps the broom from sweeping too far.

🧹 Safer pattern
     local_filepath.parent.mkdir(parents=True, exist_ok=True)
+    temp_path = local_filepath.with_name(f"{local_filepath.name}.part")

     if downloader == "aria2":
         return _download_file_aria2(url, local_filepath, headers)

     last_exc: Exception | None = None
     state: dict = {"file_opened": False}

     for attempt in range(_DOWNLOAD_MAX_RETRIES):
         state["file_opened"] = False
         try:
-            _download_file_httpx(url, local_filepath, headers, state=state)
+            _download_file_httpx(url, temp_path, headers, state=state)
+            temp_path.replace(local_filepath)
             return
         except _TRANSIENT_EXCEPTIONS as exc:
             last_exc = exc
             if state["file_opened"]:
-                _cleanup_partial(local_filepath)
+                _cleanup_partial(temp_path)
             if attempt < _DOWNLOAD_MAX_RETRIES - 1:
                 wait = _DOWNLOAD_RETRY_BACKOFF * (attempt + 1)
                 print(f"Download error (attempt {attempt + 1}/{_DOWNLOAD_MAX_RETRIES}): {_friendly_network_error(exc)}")
                 print(f"Retrying in {wait}s...")
                 time.sleep(wait)
         except KeyboardInterrupt:
             if state["file_opened"]:
                 delete_eh = ui.prompt_confirm_action("Download interrupted, cleanup files?", True)
                 if delete_eh:
-                    _cleanup_partial(local_filepath)
+                    _cleanup_partial(temp_path)
             raise
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy_cli/file_utils.py` around lines 229 - 279, The download routine opens
the final destination (local_filepath) before the stream is verified which risks
deleting a previously valid file on failures; change _download_file_httpx (and
its caller download_file) to write to a temporary sibling (e.g.,
local_filepath.with_suffix(local_filepath.suffix + ".part") or
local_filepath.with_name(local_filepath.name + ".part")) and only rename/move
the temp to local_filepath on successful completion; ensure state["file_opened"]
and _cleanup_partial track/operate on the .part temp path so cleanup and
KeyboardInterrupt prompts never delete the original destination, and keep
retry/aria2 logic unchanged.

255-285: ⚠️ Potential issue | 🟠 Major

Wrap the remaining httpx.HTTPError cases too.

Lines 260-279 only normalize _TRANSIENT_EXCEPTIONS. httpx.TooManyRedirects, httpx.UnsupportedProtocol, and other non-transient httpx.HTTPError subclasses still bubble out raw, so the traceback imp can still escape the bottle. Add a non-retrying except httpx.HTTPError that cleans up any partial output and raises DownloadException.

🩹 Proposed fix
         except _TRANSIENT_EXCEPTIONS as exc:
             last_exc = exc
             if state["file_opened"]:
                 _cleanup_partial(local_filepath)
             if attempt < _DOWNLOAD_MAX_RETRIES - 1:
                 wait = _DOWNLOAD_RETRY_BACKOFF * (attempt + 1)
                 print(f"Download error (attempt {attempt + 1}/{_DOWNLOAD_MAX_RETRIES}): {_friendly_network_error(exc)}")
                 print(f"Retrying in {wait}s...")
                 time.sleep(wait)
+        except httpx.HTTPError as exc:
+            if state["file_opened"]:
+                _cleanup_partial(local_filepath)
+            raise DownloadException(
+                f"Download failed: {_friendly_network_error(exc)}\nPlease try again later."
+            ) from exc
         except KeyboardInterrupt:
             if state["file_opened"]:
                 delete_eh = ui.prompt_confirm_action("Download interrupted, cleanup files?", True)
                 if delete_eh:
                     _cleanup_partial(local_filepath)
What exception classes in HTTPX are subclasses of `httpx.HTTPError` but not covered by `TimeoutException`, `NetworkError`, `ProtocolError`, or `ProxyError` (for example `TooManyRedirects` or `UnsupportedProtocol`)?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy_cli/file_utils.py` around lines 255 - 285, The retry loop only catches
_TRANSIENT_EXCEPTIONS, so other httpx.HTTPError subclasses (e.g.,
httpx.TooManyRedirects, httpx.UnsupportedProtocol) escape; add a non-retrying
except httpx.HTTPError block inside the for-loop after the transient-except
that: checks state["file_opened"] and calls _cleanup_partial(local_filepath) if
true (mirroring the KeyboardInterrupt cleanup), then raises a DownloadException
wrapping the error message (use _friendly_network_error(exc) for the message)
and chain the original exception; reference the retry loop around
_download_file_httpx, variables state and local_filepath, helper
_cleanup_partial, and exception type DownloadException.

217-220: ⚠️ Potential issue | 🟠 Major

Don't let response.read() turn HTTP errors into retries.

Line 219 still reads the error body inside the streaming context without a timeout guard. A slow 403/404 body can raise ReadTimeout, get caught by the transient retry loop, and violate the “HTTP status errors fail immediately” contract — a tiny timeout goblin with a big retry rhyme. Catch httpx.TimeoutException around the body read, or skip the body entirely when building the status message.

🩹 Proposed fix
         if response.status_code != 200:
-            status_reason = guess_status_code_reason(response.status_code, response.read())
+            error_body = ""
+            try:
+                error_body = response.read().decode("utf-8", errors="replace")
+            except httpx.TimeoutException:
+                pass
+            status_reason = guess_status_code_reason(response.status_code, error_body)
             raise DownloadException(f"Failed to download file.\n{status_reason}")
In HTTPX, when using `with httpx.stream(...) as response:`, can `response.read()` raise `ReadTimeout` / `TimeoutException` while consuming the response body, and is `ReadTimeout` a subclass of `TimeoutException`?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy_cli/file_utils.py` around lines 217 - 220, The current error-path reads
the response body inside the httpx.stream context which can raise
httpx.TimeoutException and cause transient retries; change the error handling in
the httpx.stream block where you call response.read() (used to build the status
message passed to guess_status_code_reason and the DownloadException) to either
skip reading the body or wrap the body read in a try/except catching
httpx.TimeoutException (and any ReadTimeout) and fall back to a safe placeholder
(e.g. empty bytes or a short message) so the DownloadException is raised
immediately; update the block that constructs status_reason (calling
guess_status_code_reason) to use the safe body value and ensure
DownloadException still contains useful context.
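The defensive body-read suggested above can be sketched as a small helper. read_body is a hypothetical stand-in for response.read() inside httpx.stream(); the real code would catch httpx.TimeoutException rather than a generic Exception:

```python
def safe_error_body(read_body) -> str:
    """Read an HTTP error body without letting a stall derail the error path."""
    try:
        raw = read_body()
    except Exception:  # stand-in for httpx.TimeoutException in the real code
        # A stalling server must not turn a 403/404 into a retried timeout;
        # fall back to an empty body and raise the status error immediately.
        return ""
    if isinstance(raw, bytes):
        # Decode defensively so a malformed body can't raise UnicodeDecodeError.
        return raw.decode("utf-8", errors="replace")
    return raw
```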

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 260a3f27-e4c8-4f1a-a005-a6affb1f28cc

📥 Commits

Reviewing files that changed from the base of the PR and between b6d4796 and d0007c6.

📒 Files selected for processing (2)
  • comfy_cli/file_utils.py
  • tests/test_file_utils_network.py

Wrap the error-body read in a try/except so that a stalling server
cannot turn an HTTP error (e.g. 403) into a retried ReadTimeout.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
comfy_cli/file_utils.py (2)

233-241: ⚠️ Potential issue | 🟠 Major

Write into a sibling .part file, then replace on success.

Line 233 still truncates local_filepath before the transfer has actually landed. If the stream fails mid-flight, the cleanup path deletes the target itself, so a previously good file can vanish in a puff of smoke. Stream into a temp sibling and replace() only after success—less smash, more stash.

🧹 Safer pattern
     local_filepath.parent.mkdir(parents=True, exist_ok=True)
+    temp_path = local_filepath.with_name(f"{local_filepath.name}.part")

     if downloader == "aria2":
         return _download_file_aria2(url, local_filepath, headers)

     last_exc: Exception | None = None
     state: dict = {"file_opened": False}

     for attempt in range(_DOWNLOAD_MAX_RETRIES):
         state["file_opened"] = False
         try:
-            _download_file_httpx(url, local_filepath, headers, state=state)
+            _download_file_httpx(url, temp_path, headers, state=state)
+            temp_path.replace(local_filepath)
             return
         except _TRANSIENT_EXCEPTIONS as exc:
             last_exc = exc
             if state["file_opened"]:
-                _cleanup_partial(local_filepath)
+                _cleanup_partial(temp_path)
             ...
         except KeyboardInterrupt:
             if state["file_opened"]:
                 delete_eh = ui.prompt_confirm_action("Download interrupted, cleanup files?", True)
                 if delete_eh:
-                    _cleanup_partial(local_filepath)
+                    _cleanup_partial(temp_path)
             raise

Also applies to: 268-269, 279-282

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy_cli/file_utils.py` around lines 233 - 241, The current download writes
directly to local_filepath (opening it at top of the block and setting
state["file_opened"]) and can truncate the target if the stream fails; change
the pattern to stream into a sibling temporary file (e.g., local_filepath +
".part") using the same ui.show_progress(response.iter_bytes(), ...)/f.write
loop, only call Path(local_filepath).replace(temp_path) after the loop completes
successfully, and ensure state["file_opened"] (or any cleanup logic) refers to
the final target only after a successful replace so partial failures don’t
remove an existing good file; apply the same pattern to the other occurrences
around the code referenced (lines with opening local_filepath at the other noted
locations).

264-289: ⚠️ Potential issue | 🟠 Major

Wrap the remaining httpx.HTTPError cases instead of leaking raw tracebacks.

Only transient transport errors are normalized here. httpx can still raise non-retriable HTTPError subclasses such as TooManyRedirects, UnsupportedProtocol, or DecodingError, and those will currently escape as raw library exceptions rather than DownloadException. Add a non-retrying except httpx.HTTPError that cleans up and re-raises a friendly message.

🪤 Non-retry wrapper
         except _TRANSIENT_EXCEPTIONS as exc:
             last_exc = exc
             if state["file_opened"]:
                 _cleanup_partial(local_filepath)
             if attempt < _DOWNLOAD_MAX_RETRIES - 1:
                 wait = _DOWNLOAD_RETRY_BACKOFF * (attempt + 1)
                 print(f"Download error (attempt {attempt + 1}/{_DOWNLOAD_MAX_RETRIES}): {_friendly_network_error(exc)}")
                 print(f"Retrying in {wait}s...")
                 time.sleep(wait)
+        except httpx.HTTPError as exc:
+            if state["file_opened"]:
+                _cleanup_partial(local_filepath)
+            raise DownloadException(f"Failed to download file: {_friendly_network_error(exc)}") from exc
         except KeyboardInterrupt:
             ...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy_cli/file_utils.py` around lines 264 - 289, The catch only handles
_TRANSIENT_EXCEPTIONS, so other httpx.HTTPError subclasses can leak raw
tracebacks; add a non-retry except httpx.HTTPError block (after the transient
except and before KeyboardInterrupt/raise) that mirrors the transient cleanup
logic: if state["file_opened"] call _cleanup_partial(local_filepath), then raise
a DownloadException (using _friendly_network_error(exc) in the message) from the
caught exc so non-retriable httpx errors are normalized; reference
functions/vars to modify: _download_file_httpx, _TRANSIENT_EXCEPTIONS,
state["file_opened"], _cleanup_partial, DownloadException, and
_friendly_network_error.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 84443ad4-06d2-4a89-bda0-811042bc847a

📥 Commits

Reviewing files that changed from the base of the PR and between d0007c6 and e589aeb.

📒 Files selected for processing (1)
  • comfy_cli/file_utils.py

Comment thread comfy_cli/file_utils.py
Comment on lines +233 to +241
with open(local_filepath, "wb") as f:
    if state is not None:
        state["file_opened"] = True
    for data in ui.show_progress(
        response.iter_bytes(),
        total,
        description=description,
    ):
        f.write(data)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, locate and examine the file
find . -name "file_utils.py" -path "*/comfy_cli/*" | head -5

Repository: Comfy-Org/comfy-cli

Length of output: 88


🏁 Script executed:

# Get the file size to understand scope
wc -l comfy_cli/file_utils.py

Repository: Comfy-Org/comfy-cli

Length of output: 90


🏁 Script executed:

# Read the specific lines and surrounding context (220-300 to get full picture)
sed -n '220,300p' comfy_cli/file_utils.py | cat -n

Repository: Comfy-Org/comfy-cli

Length of output: 4076


🏁 Script executed:

# Search for _cleanup_partial function to verify it exists
rg -n "_cleanup_partial" comfy_cli/file_utils.py

Repository: Comfy-Org/comfy-cli

Length of output: 230


🏁 Script executed:

# Search for OSError handling in the file
rg -n "OSError" comfy_cli/file_utils.py

Repository: Comfy-Org/comfy-cli

Length of output: 110


🏁 Script executed:

# Search for state["file_opened"] usage patterns
rg -n 'state\["file_opened"\]' comfy_cli/file_utils.py

Repository: Comfy-Org/comfy-cli

Length of output: 319


🏁 Script executed:

# Search for _TRANSIENT_EXCEPTIONS definition
rg -n "_TRANSIENT_EXCEPTIONS" comfy_cli/file_utils.py -A 5 -B 2

Repository: Comfy-Org/comfy-cli

Length of output: 1187


🏁 Script executed:

# Look at the complete exception handling block (lines 245-290)
sed -n '245,290p' comfy_cli/file_utils.py | cat -n

Repository: Comfy-Org/comfy-cli

Length of output: 2429


Catch local file I/O errors before they file away without a trace.

Lines 233 and 241 can raise OSError (disk full, permission changes, flaky filesystem) after the destination is truncated—a write failure that's bound to get a rise out of your users. Currently that error escapes raw and leaves a half-baked file sitting around. Add an except OSError handler in the retry loop that cleans up via _cleanup_partial() when state["file_opened"] is true, then re-raises a friendly DownloadException. Otherwise the cleanup promise doesn't write out to completion.

💾 Handle write failures without leaving partial files behind
         except KeyboardInterrupt:
             if state["file_opened"]:
                 delete_eh = ui.prompt_confirm_action("Download interrupted, cleanup files?", True)
                 if delete_eh:
                     _cleanup_partial(local_filepath)
             raise
+        except OSError as exc:
+            if state["file_opened"]:
+                _cleanup_partial(local_filepath)
+            raise DownloadException(f"Failed to write downloaded file: {exc}") from exc

Also applies to: 275–289

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy_cli/file_utils.py` around lines 233 - 241, Wrap the local file
open/write block inside the retry loop (the with open(local_filepath, "wb") ...
f.write(data) section in comfy_cli/file_utils.py) with an except OSError handler
that, if state is not None and state.get("file_opened") is True, calls
_cleanup_partial(local_filepath) to remove the truncated partial file, then
re-raises a DownloadException that wraps/contains the original OSError; apply
the same change to the similar write block around lines 275–289 so both
write-failure paths clean up partial files and raise a friendly
DownloadException instead of letting OSError escape raw.

@bigcat88 bigcat88 merged commit b414616 into main Apr 12, 2026
15 checks passed
@bigcat88 bigcat88 deleted the fix/download-retry-timeout branch April 12, 2026 12:12

Labels

size:L This PR changes 100-499 lines, ignoring generated files.


Development

Successfully merging this pull request may close these issues.

[Potential bug] Error crush during download

1 participant