fix: retry on protocol errors and 504 during streaming#1544
fix: retry on protocol errors and 504 during streaming#1544mvanhorn wants to merge 2 commits intoMoonshotAI:mainfrom
Conversation
Signed-off-by: Kai <me@kaiyi.cool>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 40b13dc20e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| case httpx.ProtocolError(): | ||
| return APIConnectionError(str(error)) |
There was a problem hiding this comment.
Route ProtocolError through retryable converter
This new httpx.ProtocolError mapping is unreachable because convert_error() returns immediately for any httpx.HTTPError earlier in the function, and ProtocolError is an HTTPError. In practice, protocol disconnects still go through convert_httpx_error (packages/kosong/src/kosong/chat_provider/__init__.py, lines 165-171), which does not classify ProtocolError as APIConnectionError, so these failures remain non-retryable in KimiSoul._is_retryable_error and the stream can still terminate instead of retrying.
Useful? React with 👍 / 👎.
| case httpx.TimeoutException(): | ||
| return APITimeoutError(str(error)) | ||
| case httpx.NetworkError(): | ||
| return APIConnectionError(str(error)) | ||
| case httpx.ProtocolError(): | ||
| return APIConnectionError(str(error)) | ||
| case httpx.HTTPStatusError(): | ||
| return APIStatusError(error.response.status_code, str(error)) |
There was a problem hiding this comment.
🔴 New httpx match cases in convert_error are unreachable dead code
The new case httpx.TimeoutException() / NetworkError() / ProtocolError() / HTTPStatusError() match arms (lines 86-93) can never be reached. The isinstance(error, httpx.HTTPError) guard at line 77 already catches all httpx.HTTPError subclasses and returns early via convert_httpx_error. The match statement on line 81 is only entered for OpenAIError instances, which do not inherit from any httpx class — so the new httpx class patterns will never match.
This matters most for httpx.ProtocolError: the dead match case at line 90 would map it to APIConnectionError (which is retryable per kimisoul.py:859-870), but the actually-executed path through convert_httpx_error (packages/kosong/src/kosong/chat_provider/__init__.py:171) maps it to a generic ChatProviderError (which is not retryable). If the intent was to make ProtocolError retryable, the fix needs to go into convert_httpx_error instead.
Prompt for agents
The new httpx match cases at lines 86-93 of packages/kosong/src/kosong/chat_provider/openai_common.py are unreachable because the isinstance(error, httpx.HTTPError) check at line 77 already catches all httpx.HTTPError subclasses and returns via convert_httpx_error.
To fix this:
1. Remove the dead httpx match cases (lines 86-93) from the convert_error function in packages/kosong/src/kosong/chat_provider/openai_common.py.
2. If the intent is to make httpx.ProtocolError retryable by mapping it to APIConnectionError, add a ProtocolError check to the convert_httpx_error function in packages/kosong/src/kosong/chat_provider/__init__.py (around line 169), before the final fallback return. For example, add: if isinstance(error, httpx.ProtocolError): return APIConnectionError(str(error)) between the NetworkError and HTTPStatusError checks.
Was this helpful? React with 👍 or 👎 to provide feedback.
|
The issue has been resolved at #1577, so I will close this PR. |
Summary
Network disconnects during streaming now trigger retry instead of terminating the session.
Why this matters
When using kimi-cli over unstable connections (mobile hotspot, high-latency networks), the generation terminates completely on connection drops instead of retrying. The retry infrastructure exists via tenacity, but two error types weren't being classified as retryable.
Changes
Two files, 3 lines total:
packages/kosong/src/kosong/chat_provider/openai_common.py- Addedhttpx.ProtocolErrormapping toAPIConnectionErrorinconvert_error(). Previously,RemoteProtocolError(server drops connection mid-stream) fell through to genericChatProviderError, bypassing retry.httpx.NetworkErrorwas already handled buthttpx.ProtocolErroris a separate branch in the httpx exception hierarchy.src/kimi_cli/soul/kimisoul.py- Added504 Gateway Timeoutto retryable status codes in_is_retryable_error(). 504 is a transient error (upstream server timeout) that should be retried alongside 429/500/502/503.Testing
Verified the httpx exception hierarchy:
ProtocolError(parent ofRemoteProtocolError,LocalProtocolError) inherits fromTransportErrorbut NOT fromNetworkError. The existinghttpx.NetworkErrorcase only catchesConnectError,ReadError,WriteError,CloseError.Fixes #1540
This contribution was developed with AI assistance (Claude Code).