fix: retry transient errors and clean up partial files in model downloads#422

Merged
bigcat88 merged 3 commits into main from fix/download-retry-timeout
Apr 12, 2026
Conversation

Contributor

@bigcat88 bigcat88 commented Apr 9, 2026

Summary

Fixes #420: comfy model download crashes with a massive ReadTimeout traceback when a server briefly stalls during a large download.

  • Generous read timeout: httpx default is 5s for everything; now set to Timeout(10s connect, 300s read) so large model downloads from slow/congested servers (e.g. CivitAI) don't fail prematurely
  • Retry with backoff: Up to 3 attempts with linear backoff (2s, 4s) for transient errors (TimeoutException, NetworkError). HTTP status errors (404, 403, etc.) fail immediately — no retry
  • Partial file cleanup: Removes partially-written files on any failure (timeout, network error, HTTP error, Ctrl-C), not just KeyboardInterrupt
  • User-friendly error messages: Raw httpx tracebacks are caught and converted to a clear DownloadException, e.g. "Download failed after 3 attempts: the server stopped sending data (read timeout)"
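The retry behavior described above can be sketched as a small stdlib-only helper. Names like download_with_retry are hypothetical; the real code in comfy_cli/file_utils.py catches httpx exception types such as TimeoutException and NetworkError, but the control flow is the same:

```python
import time

# Retry policy sketch: up to 3 attempts, linear backoff (2s, 4s),
# retrying only on the exception types passed in as "transient".
MAX_RETRIES = 3
BACKOFF_STEP = 2  # seconds; wait = BACKOFF_STEP * attempt_number


def download_with_retry(do_download, transient_excs, sleep=time.sleep):
    """Call do_download(); retry transient failures with linear backoff."""
    last_exc = None
    for attempt in range(MAX_RETRIES):
        try:
            return do_download()
        except transient_excs as exc:
            last_exc = exc
            if attempt < MAX_RETRIES - 1:
                wait = BACKOFF_STEP * (attempt + 1)  # 2s, then 4s
                print(f"Download error (attempt {attempt + 1}/{MAX_RETRIES}): {exc}")
                print(f"Retrying in {wait}s...")
                sleep(wait)
    # All attempts exhausted: raise one wrapped error, chaining the original.
    raise RuntimeError(
        f"Download failed after {MAX_RETRIES} attempts: {last_exc}"
    ) from last_exc
```

HTTP status errors are deliberately not in the transient tuple, so a 404 or 403 propagates on the first attempt instead of being retried.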

Before (from the issue):

ReadTimeout: The read operation timed out

(preceded by ~80 lines of httpx/httpcore/h11 internal traceback)

After:

Download error (attempt 1/3): the server stopped sending data (read timeout)
Retrying in 2s...
Download error (attempt 2/3): the server stopped sending data (read timeout)
Retrying in 4s...
DownloadException: Download failed after 3 attempts: the server stopped sending data (read timeout)
Please try again later.

Test plan

  • 19 new tests covering retry, timeout, cleanup, and error message behavior
  • All 659 existing unit tests still pass
  • ruff check and ruff format clean

…oads

The httpx default read timeout of 5 seconds is too aggressive for large
model downloads from servers under load (e.g. CivitAI). A brief server
stall causes an unhandled ReadTimeout with a massive traceback.

- Set read timeout to 300s (connect stays at 10s)
- Retry up to 3 times with linear backoff (2s, 4s) on transient
  network errors (timeouts, connection resets)
- Clean up partial files on any download failure, not just Ctrl-C
- Convert raw httpx exceptions into DownloadException with a clear
  user-facing message

Closes #420

coderabbitai bot commented Apr 9, 2026

📝 Walkthrough

Walkthrough

Adds an HTTPX-backed streamed download with configured timeouts, retry/backoff for transient httpx errors, partial-file cleanup, friendly network-error mapping, and adjusted KeyboardInterrupt handling. download_file now retries up to configured attempts, preserves exception chaining, and exposes helpers for cleanup and error messaging.

Changes

  • Download resilience implementation (comfy_cli/file_utils.py): Introduced _download_file_httpx using httpx.stream with Timeout(read=300, connect=10), added a retry loop for transient httpx exceptions with linear backoff, added _cleanup_partial and _friendly_network_error helpers, improved KeyboardInterrupt-aware cleanup, and ensured exception chaining via __cause__.
  • Network behavior test coverage (tests/test_file_utils_network.py): Added tests (and httpx usage) that validate the timeout config, transient-exception retries/backoff and eventual success, retry-exhaustion behavior, immediate failure on non-200 responses, partial-file cleanup semantics, preservation of pre-existing files, exception chaining, and KeyboardInterrupt prompt/cleanup flows.
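The _friendly_network_error helper mentioned above maps low-level exceptions to one-line, user-facing messages. A minimal sketch, using built-in exception types instead of httpx's so it stays self-contained (the real helper dispatches on httpx.TimeoutException, httpx.NetworkError, etc.):

```python
def friendly_network_error(exc: Exception) -> str:
    """Translate a low-level network exception into a short user-facing message."""
    if isinstance(exc, TimeoutError):
        # Stand-in for httpx.ReadTimeout / httpx.TimeoutException
        return "the server stopped sending data (read timeout)"
    if isinstance(exc, ConnectionError):
        # Stand-in for httpx.NetworkError (connection reset, refused, ...)
        return "the connection was interrupted"
    return str(exc)  # fall back to the raw message
```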

Sequence Diagram(s)

sequenceDiagram
    participant Caller as Caller
    participant DF as download_file()
    participant HTTPx as httpx.stream()
    participant FS as FileSystem
    participant Cleanup as _cleanup_partial()
    participant Map as _friendly_network_error()

    Caller->>DF: request download(url, dest)
    loop attempts (up to N with backoff)
        DF->>HTTPx: httpx.stream(..., timeout=Timeout(read=300,connect=10))
        alt response 200 OK
            HTTPx-->>DF: response stream
            DF->>FS: open dest & write chunks
            DF-->>Caller: return success
        else transient network error
            HTTPx-->>DF: throws Timeout/Network/Protocol/Proxy error
            DF->>Cleanup: remove partial file (if opened)
            Cleanup-->>DF: cleaned
            DF->>DF: sleep(backoff) then retry
        else non-200 status
            HTTPx-->>DF: response (non-200)
            DF->>Cleanup: remove partial file (if opened)
            DF->>Map: map status -> friendly message
            DF-->>Caller: raise DownloadException (no retry)
        else KeyboardInterrupt during write
            DF->>Cleanup: optionally remove partial file (based on prompt)
            DF-->>Caller: re-raise KeyboardInterrupt
        end
    end
    alt attempts exhausted
        DF->>Map: map last error -> friendly message
        DF-->>Caller: raise DownloadException (includes original __cause__)
    end

When networks misbehave, retries rhyme with time — try once more, then stop on chime.

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Linked Issues check ✅ Passed PR successfully addresses all objectives from issue #420: handles ReadTimeout/network errors with retry logic (up to 3 attempts), provides friendly DownloadException messages replacing raw tracebacks, cleans up partial files, and implements practical timeout settings (10s connect, 300s read).
Out of Scope Changes check ✅ Passed All changes are scoped to improving download robustness: modifications to file_utils.py implement retry/timeout/cleanup logic and corresponding tests validate these behaviors; no unrelated modifications detected.



codecov bot commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 95.31250% with 3 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
comfy_cli/file_utils.py 95.31% 3 Missing ⚠️
@@            Coverage Diff             @@
##             main     #422      +/-   ##
==========================================
+ Coverage   76.07%   76.36%   +0.29%     
==========================================
  Files          35       35              
  Lines        4213     4261      +48     
==========================================
+ Hits         3205     3254      +49     
+ Misses       1008     1007       -1     
Files with missing lines Coverage Δ
comfy_cli/file_utils.py 88.38% <95.31%> (+3.00%) ⬆️

@bigcat88 bigcat88 marked this pull request as ready for review April 12, 2026 06:55
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Apr 12, 2026
…load retry

Broaden transient exception coverage to include ProtocolError and ProxyError.
Track whether the output file was actually opened so that cleanup only runs
when a partial download exists — preventing deletion of unrelated pre-existing
files on HTTP errors, connection failures, or early KeyboardInterrupt. Restore
the user confirmation prompt on Ctrl+C (the original commit removed it).

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@comfy_cli/file_utils.py`:
- Around line 207-213: The download loop currently writes directly to
local_filepath (using response.iter_bytes() and ui.show_progress) which can
truncate or delete a prior good file on failure; change it to write to a temp
sibling (e.g., local_filepath + ".part"), stream data into that temp file, and
on successful completion atomically replace the target with
os.replace(temp_path, local_filepath); on exceptions or cancellations ensure you
only unlink the temp file and never the original local_filepath. Apply the same
pattern for the similar block handling the other download (lines 223-246) so
both uses of response.iter_bytes(), ui.show_progress and the final file handling
perform write-to-temp then atomic replace/cleanup.
- Around line 233-252: The code currently only catches _TRANSIENT_EXCEPTIONS;
add an additional except block for httpx.HTTPError (or the package-aliased
HTTPError) after the transient/KeyboardInterrupt handlers that does not retry:
call _cleanup_partial(local_filepath), set last_exc = exc, and raise
DownloadException(...) from exc (using _friendly_network_error(exc) and
preserving the same message format used in the final raise) so non-transient
httpx errors are wrapped and cleaned up properly; ensure this new except sits
before the final raise and references DownloadException, _cleanup_partial,
_friendly_network_error, last_exc and _DOWNLOAD_MAX_RETRIES.
- Around line 195-198: The timeout wrapper currently calls response.read()
(inside httpx.stream) and treats timeouts as HTTPError and also passes bytes
into guess_status_code_reason; update the failure path in the httpx.stream block
so it does not pass raw bytes to guess_status_code_reason (use
response.content.decode('utf-8', errors='replace') or otherwise convert bytes to
str) and ensure timeouts are handled as timeouts: catch/propagate
httpx.ReadTimeout (or httpx.TimeoutException) instead of classifying it as an
HTTPError so the retry logic behaves correctly; update uses of
guess_status_code_reason, DownloadException, httpx.stream and _DOWNLOAD_TIMEOUT
accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c59f9e41-4bef-416c-bafe-bf3c9b04f781

📥 Commits

Reviewing files that changed from the base of the PR and between 514b1b3 and b6d4796.

📒 Files selected for processing (2)
  • comfy_cli/file_utils.py
  • tests/test_file_utils_network.py


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (3)
comfy_cli/file_utils.py (3)

229-279: ⚠️ Potential issue | 🟠 Major

Write into a .part file before touching the destination.

Line 229 opens local_filepath in "wb" before the download is known-good. If the stream fails after that, Lines 264-278 clean up the same path, which deletes the user's last good artifact instead of only the partial one. A temp sibling keeps the broom from sweeping too far.

🧹 Safer pattern
     local_filepath.parent.mkdir(parents=True, exist_ok=True)
+    temp_path = local_filepath.with_name(f"{local_filepath.name}.part")

     if downloader == "aria2":
         return _download_file_aria2(url, local_filepath, headers)

     last_exc: Exception | None = None
     state: dict = {"file_opened": False}

     for attempt in range(_DOWNLOAD_MAX_RETRIES):
         state["file_opened"] = False
         try:
-            _download_file_httpx(url, local_filepath, headers, state=state)
+            _download_file_httpx(url, temp_path, headers, state=state)
+            temp_path.replace(local_filepath)
             return
         except _TRANSIENT_EXCEPTIONS as exc:
             last_exc = exc
             if state["file_opened"]:
-                _cleanup_partial(local_filepath)
+                _cleanup_partial(temp_path)
             if attempt < _DOWNLOAD_MAX_RETRIES - 1:
                 wait = _DOWNLOAD_RETRY_BACKOFF * (attempt + 1)
                 print(f"Download error (attempt {attempt + 1}/{_DOWNLOAD_MAX_RETRIES}): {_friendly_network_error(exc)}")
                 print(f"Retrying in {wait}s...")
                 time.sleep(wait)
         except KeyboardInterrupt:
             if state["file_opened"]:
                 delete_eh = ui.prompt_confirm_action("Download interrupted, cleanup files?", True)
                 if delete_eh:
-                    _cleanup_partial(local_filepath)
+                    _cleanup_partial(temp_path)
             raise
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy_cli/file_utils.py` around lines 229 - 279, The download routine opens
the final destination (local_filepath) before the stream is verified which risks
deleting a previously valid file on failures; change _download_file_httpx (and
its caller download_file) to write to a temporary sibling (e.g.,
local_filepath.with_suffix(local_filepath.suffix + ".part") or
local_filepath.with_name(local_filepath.name + ".part")) and only rename/move
the temp to local_filepath on successful completion; ensure state["file_opened"]
and _cleanup_partial track/operate on the .part temp path so cleanup and
KeyboardInterrupt prompts never delete the original destination, and keep
retry/aria2 logic unchanged.

255-285: ⚠️ Potential issue | 🟠 Major

Wrap the remaining httpx.HTTPError cases too.

Lines 260-279 only normalize _TRANSIENT_EXCEPTIONS. httpx.TooManyRedirects, httpx.UnsupportedProtocol, and other non-transient httpx.HTTPError subclasses still bubble out raw, so the traceback imp can still escape the bottle. Add a non-retrying except httpx.HTTPError that cleans up any partial output and raises DownloadException.

🩹 Proposed fix
         except _TRANSIENT_EXCEPTIONS as exc:
             last_exc = exc
             if state["file_opened"]:
                 _cleanup_partial(local_filepath)
             if attempt < _DOWNLOAD_MAX_RETRIES - 1:
                 wait = _DOWNLOAD_RETRY_BACKOFF * (attempt + 1)
                 print(f"Download error (attempt {attempt + 1}/{_DOWNLOAD_MAX_RETRIES}): {_friendly_network_error(exc)}")
                 print(f"Retrying in {wait}s...")
                 time.sleep(wait)
+        except httpx.HTTPError as exc:
+            if state["file_opened"]:
+                _cleanup_partial(local_filepath)
+            raise DownloadException(
+                f"Download failed: {_friendly_network_error(exc)}\nPlease try again later."
+            ) from exc
         except KeyboardInterrupt:
             if state["file_opened"]:
                 delete_eh = ui.prompt_confirm_action("Download interrupted, cleanup files?", True)
                 if delete_eh:
                     _cleanup_partial(local_filepath)
What exception classes in HTTPX are subclasses of `httpx.HTTPError` but not covered by `TimeoutException`, `NetworkError`, `ProtocolError`, or `ProxyError` (for example `TooManyRedirects` or `UnsupportedProtocol`)?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy_cli/file_utils.py` around lines 255 - 285, The retry loop only catches
_TRANSIENT_EXCEPTIONS, so other httpx.HTTPError subclasses (e.g.,
httpx.TooManyRedirects, httpx.UnsupportedProtocol) escape; add a non-retrying
except httpx.HTTPError block inside the for-loop after the transient-except
that: checks state["file_opened"] and calls _cleanup_partial(local_filepath) if
true (mirroring the KeyboardInterrupt cleanup), then raises a DownloadException
wrapping the error message (use _friendly_network_error(exc) for the message)
and chain the original exception; reference the retry loop around
_download_file_httpx, variables state and local_filepath, helper
_cleanup_partial, and exception type DownloadException.

217-220: ⚠️ Potential issue | 🟠 Major

Don't let response.read() turn HTTP errors into retries.

Line 219 still reads the error body inside the streaming context without a timeout guard. A slow 403/404 body can raise ReadTimeout, get caught by the transient retry loop, and violate the “HTTP status errors fail immediately” contract — a tiny timeout goblin with a big retry rhyme. Catch httpx.TimeoutException around the body read, or skip the body entirely when building the status message.

🩹 Proposed fix
         if response.status_code != 200:
-            status_reason = guess_status_code_reason(response.status_code, response.read())
+            error_body = ""
+            try:
+                error_body = response.read().decode("utf-8", errors="replace")
+            except httpx.TimeoutException:
+                pass
+            status_reason = guess_status_code_reason(response.status_code, error_body)
             raise DownloadException(f"Failed to download file.\n{status_reason}")
In HTTPX, when using `with httpx.stream(...) as response:`, can `response.read()` raise `ReadTimeout` / `TimeoutException` while consuming the response body, and is `ReadTimeout` a subclass of `TimeoutException`?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy_cli/file_utils.py` around lines 217 - 220, The current error-path reads
the response body inside the httpx.stream context which can raise
httpx.TimeoutException and cause transient retries; change the error handling in
the httpx.stream block where you call response.read() (used to build the status
message passed to guess_status_code_reason and the DownloadException) to either
skip reading the body or wrap the body read in a try/except catching
httpx.TimeoutException (and any ReadTimeout) and fall back to a safe placeholder
(e.g. empty bytes or a short message) so the DownloadException is raised
immediately; update the block that constructs status_reason (calling
guess_status_code_reason) to use the safe body value and ensure
DownloadException still contains useful context.
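The defensive body-read suggested above can be sketched as a small helper. read_body is a hypothetical stand-in for response.read() inside httpx.stream(); the real code would catch httpx.TimeoutException rather than a generic Exception:

```python
def safe_error_body(read_body) -> str:
    """Read an HTTP error body without letting a stall derail the error path."""
    try:
        raw = read_body()
    except Exception:  # stand-in for httpx.TimeoutException in the real code
        # A stalling server must not turn a 403/404 into a retried timeout;
        # fall back to an empty body and raise the status error immediately.
        return ""
    if isinstance(raw, bytes):
        # Decode defensively so a malformed body can't raise UnicodeDecodeError.
        return raw.decode("utf-8", errors="replace")
    return raw
```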

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 260a3f27-e4c8-4f1a-a005-a6affb1f28cc

📥 Commits

Reviewing files that changed from the base of the PR and between b6d4796 and d0007c6.

📒 Files selected for processing (2)
  • comfy_cli/file_utils.py
  • tests/test_file_utils_network.py

Wrap the error-body read in a try/except so that a stalling server
cannot turn an HTTP error (e.g. 403) into a retried ReadTimeout.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
comfy_cli/file_utils.py (2)

233-241: ⚠️ Potential issue | 🟠 Major

Write into a sibling .part file, then replace on success.

Line 233 still truncates local_filepath before the transfer has actually landed. If the stream fails mid-flight, the cleanup path deletes the target itself, so a previously good file can vanish in a puff of smoke. Stream into a temp sibling and replace() only after success—less smash, more stash.

🧹 Safer pattern
     local_filepath.parent.mkdir(parents=True, exist_ok=True)
+    temp_path = local_filepath.with_name(f"{local_filepath.name}.part")

     if downloader == "aria2":
         return _download_file_aria2(url, local_filepath, headers)

     last_exc: Exception | None = None
     state: dict = {"file_opened": False}

     for attempt in range(_DOWNLOAD_MAX_RETRIES):
         state["file_opened"] = False
         try:
-            _download_file_httpx(url, local_filepath, headers, state=state)
+            _download_file_httpx(url, temp_path, headers, state=state)
+            temp_path.replace(local_filepath)
             return
         except _TRANSIENT_EXCEPTIONS as exc:
             last_exc = exc
             if state["file_opened"]:
-                _cleanup_partial(local_filepath)
+                _cleanup_partial(temp_path)
             ...
         except KeyboardInterrupt:
             if state["file_opened"]:
                 delete_eh = ui.prompt_confirm_action("Download interrupted, cleanup files?", True)
                 if delete_eh:
-                    _cleanup_partial(local_filepath)
+                    _cleanup_partial(temp_path)
             raise

Also applies to: 268-269, 279-282

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy_cli/file_utils.py` around lines 233 - 241, The current download writes
directly to local_filepath (opening it at top of the block and setting
state["file_opened"]) and can truncate the target if the stream fails; change
the pattern to stream into a sibling temporary file (e.g., local_filepath +
".part") using the same ui.show_progress(response.iter_bytes(), ...)/f.write
loop, only call Path(local_filepath).replace(temp_path) after the loop completes
successfully, and ensure state["file_opened"] (or any cleanup logic) refers to
the final target only after a successful replace so partial failures don’t
remove an existing good file; apply the same pattern to the other occurrences
around the code referenced (lines with opening local_filepath at the other noted
locations).

264-289: ⚠️ Potential issue | 🟠 Major

Wrap the remaining httpx.HTTPError cases instead of leaking raw tracebacks.

Only transient transport errors are normalized here. httpx can still raise non-retriable HTTPError subclasses such as TooManyRedirects, UnsupportedProtocol, or DecodingError, and those will currently escape as raw library exceptions rather than DownloadException. Add a non-retrying except httpx.HTTPError that cleans up and re-raises a friendly message.

🪤 Non-retry wrapper
         except _TRANSIENT_EXCEPTIONS as exc:
             last_exc = exc
             if state["file_opened"]:
                 _cleanup_partial(local_filepath)
             if attempt < _DOWNLOAD_MAX_RETRIES - 1:
                 wait = _DOWNLOAD_RETRY_BACKOFF * (attempt + 1)
                 print(f"Download error (attempt {attempt + 1}/{_DOWNLOAD_MAX_RETRIES}): {_friendly_network_error(exc)}")
                 print(f"Retrying in {wait}s...")
                 time.sleep(wait)
+        except httpx.HTTPError as exc:
+            if state["file_opened"]:
+                _cleanup_partial(local_filepath)
+            raise DownloadException(f"Failed to download file: {_friendly_network_error(exc)}") from exc
         except KeyboardInterrupt:
             ...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy_cli/file_utils.py` around lines 264 - 289, The catch only handles
_TRANSIENT_EXCEPTIONS, so other httpx.HTTPError subclasses can leak raw
tracebacks; add a non-retry except httpx.HTTPError block (after the transient
except and before KeyboardInterrupt/raise) that mirrors the transient cleanup
logic: if state["file_opened"] call _cleanup_partial(local_filepath), then raise
a DownloadException (using _friendly_network_error(exc) in the message) from the
caught exc so non-retriable httpx errors are normalized; reference
functions/vars to modify: _download_file_httpx, _TRANSIENT_EXCEPTIONS,
state["file_opened"], _cleanup_partial, DownloadException, and
_friendly_network_error.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 84443ad4-06d2-4a89-bda0-811042bc847a

📥 Commits

Reviewing files that changed from the base of the PR and between d0007c6 and e589aeb.

📒 Files selected for processing (1)
  • comfy_cli/file_utils.py

Comment thread comfy_cli/file_utils.py
Comment on lines +233 to +241
with open(local_filepath, "wb") as f:
    if state is not None:
        state["file_opened"] = True
    for data in ui.show_progress(
        response.iter_bytes(),
        total,
        description=description,
    ):
        f.write(data)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, locate and examine the file
find . -name "file_utils.py" -path "*/comfy_cli/*" | head -5

Repository: Comfy-Org/comfy-cli

Length of output: 88


🏁 Script executed:

# Get the file size to understand scope
wc -l comfy_cli/file_utils.py

Repository: Comfy-Org/comfy-cli

Length of output: 90


🏁 Script executed:

# Read the specific lines and surrounding context (220-300 to get full picture)
sed -n '220,300p' comfy_cli/file_utils.py | cat -n

Repository: Comfy-Org/comfy-cli

Length of output: 4076


🏁 Script executed:

# Search for _cleanup_partial function to verify it exists
rg -n "_cleanup_partial" comfy_cli/file_utils.py

Repository: Comfy-Org/comfy-cli

Length of output: 230


🏁 Script executed:

# Search for OSError handling in the file
rg -n "OSError" comfy_cli/file_utils.py

Repository: Comfy-Org/comfy-cli

Length of output: 110


🏁 Script executed:

# Search for state["file_opened"] usage patterns
rg -n 'state\["file_opened"\]' comfy_cli/file_utils.py

Repository: Comfy-Org/comfy-cli

Length of output: 319


🏁 Script executed:

# Search for _TRANSIENT_EXCEPTIONS definition
rg -n "_TRANSIENT_EXCEPTIONS" comfy_cli/file_utils.py -A 5 -B 2

Repository: Comfy-Org/comfy-cli

Length of output: 1187


🏁 Script executed:

# Look at the complete exception handling block (lines 245-290)
sed -n '245,290p' comfy_cli/file_utils.py | cat -n

Repository: Comfy-Org/comfy-cli

Length of output: 2429


Catch local file I/O errors before they file away without a trace.

Lines 233 and 241 can raise OSError (disk full, permission changes, flaky filesystem) after the destination is truncated—a write failure that's bound to get a rise out of your users. Currently that error escapes raw and leaves a half-baked file sitting around. Add an except OSError handler in the retry loop that cleans up via _cleanup_partial() when state["file_opened"] is true, then re-raises a friendly DownloadException. Otherwise the cleanup promise doesn't write out to completion.

💾 Handle write failures without leaving partial files behind
         except KeyboardInterrupt:
             if state["file_opened"]:
                 delete_eh = ui.prompt_confirm_action("Download interrupted, cleanup files?", True)
                 if delete_eh:
                     _cleanup_partial(local_filepath)
             raise
+        except OSError as exc:
+            if state["file_opened"]:
+                _cleanup_partial(local_filepath)
+            raise DownloadException(f"Failed to write downloaded file: {exc}") from exc

Also applies to: 275–289

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy_cli/file_utils.py` around lines 233 - 241, Wrap the local file
open/write block inside the retry loop (the with open(local_filepath, "wb") ...
f.write(data) section in comfy_cli/file_utils.py) with an except OSError handler
that, if state is not None and state.get("file_opened") is True, calls
_cleanup_partial(local_filepath) to remove the truncated partial file, then
re-raises a DownloadException that wraps/contains the original OSError; apply
the same change to the similar write block around lines 275–289 so both
write-failure paths clean up partial files and raise a friendly
DownloadException instead of letting OSError escape raw.

@bigcat88 bigcat88 merged commit b414616 into main Apr 12, 2026
15 checks passed
@bigcat88 bigcat88 deleted the fix/download-retry-timeout branch April 12, 2026 12:12

Labels

size:L This PR changes 100-499 lines, ignoring generated files.


Development

Successfully merging this pull request may close these issues.

[Potential bug] Error crush during download

1 participant