
fix: retry on protocol errors and 504 during streaming #1544

Closed
mvanhorn wants to merge 2 commits into MoonshotAI:main from mvanhorn:osc/1540-fix-network-retry-unstable


@mvanhorn mvanhorn commented Mar 22, 2026

Summary

Network disconnects during streaming now trigger a retry instead of terminating the session.

Why this matters

When using kimi-cli over unstable connections (mobile hotspot, high-latency networks), generation terminates entirely when the connection drops instead of retrying. Retry infrastructure already exists (via tenacity), but two error types weren't classified as retryable.

Changes

Two files, 3 lines total:

  • packages/kosong/src/kosong/chat_provider/openai_common.py - Added a mapping from httpx.ProtocolError to APIConnectionError in convert_error(). Previously, RemoteProtocolError (the server drops the connection mid-stream) fell through to the generic ChatProviderError, bypassing retry. httpx.NetworkError was already handled, but httpx.ProtocolError is a separate branch of the httpx exception hierarchy.

  • src/kimi_cli/soul/kimisoul.py - Added 504 Gateway Timeout to retryable status codes in _is_retryable_error(). 504 is a transient error (upstream server timeout) that should be retried alongside 429/500/502/503.
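A minimal sketch of the classification described above, for illustration only. The exception classes are hypothetical stand-ins for kosong's error types (not their actual definitions), and treating APITimeoutError as retryable is an assumption; the status-code set reflects the 429/500/502/503 codes named in this PR plus the proposed 504.

```python
# Hypothetical stand-ins for kosong's error types; the real classes live in
# kosong.chat_provider and may differ in fields and constructors.
class ChatProviderError(Exception):
    pass


class APIConnectionError(ChatProviderError):
    pass


class APITimeoutError(ChatProviderError):
    pass


class APIStatusError(ChatProviderError):
    def __init__(self, status_code: int, message: str) -> None:
        super().__init__(message)
        self.status_code = status_code


# Transient HTTP statuses worth retrying; 504 is the addition proposed here.
RETRYABLE_STATUS_CODES = frozenset({429, 500, 502, 503, 504})


def is_retryable_error(error: BaseException) -> bool:
    """Sketch of the decision _is_retryable_error makes (assumed shape)."""
    # Assumption: connection and timeout failures are always transient.
    if isinstance(error, (APIConnectionError, APITimeoutError)):
        return True
    if isinstance(error, APIStatusError):
        return error.status_code in RETRYABLE_STATUS_CODES
    # Anything else, including a bare ChatProviderError, is treated as fatal.
    return False
```

With this shape, `is_retryable_error(APIStatusError(504, "Gateway Timeout"))` is true, while a generic `ChatProviderError` is never retried, which is why the error mapping in the first change matters as much as the status-code list.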

Testing

Verified the httpx exception hierarchy: ProtocolError (parent of RemoteProtocolError, LocalProtocolError) inherits from TransportError but NOT from NetworkError. The existing httpx.NetworkError case only catches ConnectError, ReadError, WriteError, CloseError.

Fixes #1540

This contribution was developed with AI assistance (Claude Code).



Contributor

@devin-ai-integration devin-ai-integration bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.



@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 40b13dc20e


Comment on lines +90 to +91
case httpx.ProtocolError():
    return APIConnectionError(str(error))


P1: Route ProtocolError through retryable converter

This new httpx.ProtocolError mapping is unreachable because convert_error() returns immediately for any httpx.HTTPError earlier in the function, and ProtocolError is an HTTPError. In practice, protocol disconnects still go through convert_httpx_error (packages/kosong/src/kosong/chat_provider/__init__.py, lines 165-171), which does not classify ProtocolError as APIConnectionError, so these failures remain non-retryable in KimiSoul._is_retryable_error and the stream can still terminate instead of retrying.


Contributor

@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 1 new potential issue.


Comment on lines +86 to +93
case httpx.TimeoutException():
    return APITimeoutError(str(error))
case httpx.NetworkError():
    return APIConnectionError(str(error))
case httpx.ProtocolError():
    return APIConnectionError(str(error))
case httpx.HTTPStatusError():
    return APIStatusError(error.response.status_code, str(error))

🔴 New httpx match cases in convert_error are unreachable dead code

The new case httpx.TimeoutException() / NetworkError() / ProtocolError() / HTTPStatusError() match arms (lines 86-93) can never be reached. The isinstance(error, httpx.HTTPError) guard at line 77 already catches all httpx.HTTPError subclasses and returns early via convert_httpx_error. The match statement on line 81 is only entered for OpenAIError instances, which do not inherit from any httpx class — so the new httpx class patterns will never match.

This matters most for httpx.ProtocolError: the dead match case at line 90 would map it to APIConnectionError (which is retryable per kimisoul.py:859-870), but the actually-executed path through convert_httpx_error (packages/kosong/src/kosong/chat_provider/__init__.py:171) maps it to a generic ChatProviderError (which is not retryable). If the intent was to make ProtocolError retryable, the fix needs to go into convert_httpx_error instead.

Prompt for agents
The new httpx match cases at lines 86-93 of packages/kosong/src/kosong/chat_provider/openai_common.py are unreachable because the isinstance(error, httpx.HTTPError) check at line 77 already catches all httpx.HTTPError subclasses and returns via convert_httpx_error.

To fix this:
1. Remove the dead httpx match cases (lines 86-93) from the convert_error function in packages/kosong/src/kosong/chat_provider/openai_common.py.
2. If the intent is to make httpx.ProtocolError retryable by mapping it to APIConnectionError, add a ProtocolError check to the convert_httpx_error function in packages/kosong/src/kosong/chat_provider/__init__.py (around line 169), before the final fallback return. For example, add: if isinstance(error, httpx.ProtocolError): return APIConnectionError(str(error)) between the NetworkError and HTTPStatusError checks.

Collaborator

n-WN commented Mar 25, 2026

The issue has been resolved in #1577, so I will close this PR.

@n-WN n-WN closed this Mar 25, 2026


Development

Linked issue:

Generation terminates on unstable networks instead of retrying/resuming (#1540)

3 participants