Conversation
vLLM returns 500 for prompt-too-long errors instead of 400. This causes the circuit breaker to penalise healthy workers for bad client input. Rewrite the status to 400 when the response body matches known input validation patterns.
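The rewrite described above could be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the matched substrings are assumed examples of vLLM validation messages, and the function names are placeholders.

```rust
// Sketch of the 500 -> 400 rewrite. The matched substrings are assumptions
// for illustration, not confirmed vLLM error messages.
fn is_input_validation_error(body: &str) -> bool {
    const PATTERNS: &[&str] = &[
        "maximum context length",
        "Please reduce the length of the messages",
    ];
    PATTERNS.iter().any(|p| body.contains(p))
}

fn rewrite_status(status: u16, body: &str) -> u16 {
    // Only rewrite genuine upstream 500s whose body matches a known input
    // validation pattern; everything else passes through untouched, so the
    // circuit breaker still sees real worker failures.
    if status == 500 && is_input_validation_error(body) {
        400
    } else {
        status
    }
}
```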
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a5a29702aa
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Approvability Verdict: Needs human review (1 blocking correctness issue found). Multiple reviewers have identified that this fix for vLLM input validation errors only applies to non-streaming requests, leaving streaming requests vulnerable to the same circuit breaker issue the PR aims to solve. This incomplete fix warrants human review to determine whether the streaming path should also be addressed. You can customize Macroscope's approvability policy.
Move the 500 body inspection before the stream/non-stream branch so both paths get the 400 rewrite. vLLM error responses are always synchronous JSON even when the client requested streaming.
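One way to express that ordering is sketched below, assuming illustrative names (`route`, the pattern string) rather than the router's actual types: the 500 body is inspected before the streaming branch, so both delivery paths see the rewritten status.

```rust
// Sketch: classify the upstream status BEFORE branching on streaming.
// vLLM error responses are synchronous JSON even when the client requested
// streaming, so the body can be inspected in both modes.
fn route(status: u16, body: &str, streaming: bool) -> (u16, &'static str) {
    let status = if status == 500 && body.contains("maximum context length") {
        400 // rewritten: bad client input, not a worker failure
    } else {
        status
    };
    // Only after classification do we pick the delivery path.
    let path = if streaming { "stream" } else { "buffer" };
    (status, path)
}
```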
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a045cfcd2f
```rust
// doesn't penalise the worker for bad input.
if status == StatusCode::INTERNAL_SERVER_ERROR {
    let response_headers = header_utils::preserve_response_headers(res.headers());
    let body = res.bytes().await.unwrap_or_default();
```
Handle 500 body read errors explicitly
Using `res.bytes().await.unwrap_or_default()` converts upstream read failures into an empty body while still forwarding preserved upstream headers. If the upstream included `Content-Length` (or other body-dependent metadata), this can emit a malformed response (e.g., a declared nonzero length with an empty body) and hides the real transport failure from callers and logs. Before this change, the error path returned a router-generated 500 with diagnostics.
Log the transport error and return a diagnostic 500 to the caller rather than silently forwarding an empty body with stale headers.
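That suggestion could be sketched as below. The error type and message are illustrative; the actual router would also log the failure through its own logging facility.

```rust
// Sketch: surface upstream body-read failures instead of defaulting to an
// empty body, which could contradict a preserved Content-Length header.
fn read_upstream_body(
    result: Result<Vec<u8>, std::io::Error>,
) -> Result<Vec<u8>, (u16, String)> {
    match result {
        Ok(bytes) => Ok(bytes),
        Err(e) => {
            // Build the diagnostic router-generated 500 that replaces the
            // silent empty body; the transport error text is preserved.
            Err((500, format!("Failed to read upstream response: {e}")))
        }
    }
}
```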
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit cdad352.
Only decrement load in the early-return path for rewritten 400s (input validation). Genuine 500s are retryable and the caller retry closure already handles their load cleanup. Also properly handle body read errors without swallowing them.
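The load-accounting rule in that comment might look roughly like this; `Worker`, the flag, and the return convention are illustrative placeholders, not the router's real API.

```rust
// Sketch: decrement load only on the early-return path for rewritten 400s.
// Genuine 500s fall through to the retry closure, which owns their cleanup,
// so decrementing here as well would double-count.
struct Worker {
    load: u32,
}

fn on_response(worker: &mut Worker, status: u16, rewritten: bool) -> bool {
    if rewritten && status == 400 {
        // Early return: bad client input, release the slot here.
        worker.load = worker.load.saturating_sub(1);
        return true; // handled; skip the retry path
    }
    false // let the retry closure handle load cleanup for real failures
}
```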
@codex review

Note
Medium Risk

Changes upstream error handling by reading and conditionally rewriting 500 responses to 400, which affects client-visible status codes and load/circuit-breaker accounting. Risk is moderate due to altered retry/cleanup paths and potential edge cases if the upstream body is large or patterns mismatch.

Overview

Reclassifies certain vLLM request validation failures that incorrectly arrive as HTTP 500 into client errors by inspecting the upstream response body and rewriting the status to 400, preventing these from counting as worker failures for circuit breaker purposes. Adds `is_vllm_input_validation_error` pattern matching plus a focused unit test, and updates the request proxy path to preserve headers/body while ensuring worker load is decremented correctly when a 500 is rewritten and the normal retry/streaming cleanup is bypassed.

Reviewed by Cursor Bugbot for commit aa4a5cc. Bugbot is set up for automated code reviews on this repo.
Note
Treat vLLM input validation 500s as 400 for circuit breaker
- Adds `is_vllm_input_validation_error` in router.rs to detect vLLM prompt-length validation errors by inspecting the response body for known substrings.
- If the upstream body cannot be read, a diagnostic 'Failed to read upstream response' error is returned.

Macroscope summarized aa4a5cc.
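The summarized detector and its focused unit test could take roughly this shape. The substrings are assumed examples, not verified vLLM output, and the real implementation lives in router.rs.

```rust
// Illustrative detector: match the upstream 500 body against known
// vLLM prompt-length validation substrings.
fn is_vllm_input_validation_error(body: &str) -> bool {
    ["maximum context length", "Please reduce the length"]
        .iter()
        .any(|p| body.contains(p))
}
```

A focused unit test would assert the positive case on a sample vLLM JSON error body and a negative case such as an out-of-memory error, which must keep its 500 so the circuit breaker still counts it against the worker.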