Skip to content

treat vLLM input validation 500s as 400 for circuit breaker#24

Merged
JannikSt merged 4 commits intomainfrom
improvement/exclude-input-validation-from-circuit-breaker
Apr 15, 2026
Merged

treat vLLM input validation 500s as 400 for circuit breaker#24
JannikSt merged 4 commits intomainfrom
improvement/exclude-input-validation-from-circuit-breaker

Conversation

@JannikSt
Copy link
Copy Markdown
Member

@JannikSt JannikSt commented Apr 15, 2026

  • vLLM returns 500 for prompt-too-long errors (e.g. 65537 tokens vs 65536 limit) instead of 400
  • these were counting as backend failures and tripping the circuit breaker on healthy workers
  • now rewrites the status to 400 when the body matches known input validation patterns so the circuit breaker ignores them

Note

Medium Risk
Changes upstream error handling by reading and conditionally rewriting 500 responses to 400, which affects client-visible status codes and load/circuit-breaker accounting. Risk is moderate due to altered retry/cleanup paths and potential edge cases if the upstream body is large or patterns mis-match.

Overview
Reclassifies certain vLLM request validation failures that incorrectly arrive as HTTP 500 into client errors by inspecting the upstream response body and rewriting the status to 400, preventing these from counting as worker failures for circuit breaker purposes.

Adds is_vllm_input_validation_error pattern matching plus a focused unit test, and updates the request proxy path to preserve headers/body while ensuring worker load is decremented correctly when a 500 is rewritten and the normal retry/streaming cleanup is bypassed.

Reviewed by Cursor Bugbot for commit aa4a5cc. Bugbot is set up for automated code reviews on this repo. Configure here.

Note

Treat vLLM input validation 500s as 400 for circuit breaker

  • Adds is_vllm_input_validation_error in router.rs to detect vLLM prompt-length validation errors by inspecting the response body for known substrings.
  • When a worker returns a 500 matching this pattern, the router rewrites the status to 400, returns the original body with preserved headers, and decrements the worker's load counter so the circuit breaker is not triggered.
  • Non-matching 500s are passed through unchanged; if the body cannot be read, a 500 with 'Failed to read upstream response' is returned.
  • Behavioral Change: clients receiving vLLM input validation errors will now see 400 instead of 500.

Macroscope summarized aa4a5cc.

vLLM returns 500 for prompt-too-long errors instead of 400. This
causes the circuit breaker to penalise healthy workers for bad client
input. Rewrite the status to 400 when the response body matches known
input validation patterns.
Comment thread src/routers/http/router.rs
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a5a29702aa

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/routers/http/router.rs Outdated
Comment thread src/routers/http/router.rs
@macroscopeapp
Copy link
Copy Markdown

macroscopeapp bot commented Apr 15, 2026

Approvability

Verdict: Needs human review

1 blocking correctness issue found. Multiple reviewers have identified that this fix for vLLM input validation errors only applies to non-streaming requests, leaving streaming requests vulnerable to the same circuit breaker issue the PR aims to solve. This incomplete fix warrants human review to determine if the streaming path should also be addressed.

You can customize Macroscope's approvability policy. Learn more.

Move the 500 body inspection before the stream/non-stream branch so
both paths get the 400 rewrite. vLLM error responses are always
synchronous JSON even when the client requested streaming.
@JannikSt
Copy link
Copy Markdown
Member Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a045cfcd2f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/routers/http/router.rs Outdated
// doesn't penalise the worker for bad input.
if status == StatusCode::INTERNAL_SERVER_ERROR {
let response_headers = header_utils::preserve_response_headers(res.headers());
let body = res.bytes().await.unwrap_or_default();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Handle 500 body read errors explicitly

Using res.bytes().await.unwrap_or_default() converts upstream read failures into an empty body while still forwarding preserved upstream headers. If the upstream included Content-Length (or other body-dependent metadata), this can emit a malformed response (e.g., declared nonzero length with empty body) and hides the real transport failure from callers and logs; before this change, the error path returned a router-generated 500 with diagnostics.

Useful? React with 👍 / 👎.

Log the transport error and return a diagnostic 500 to the caller
rather than silently forwarding an empty body with stale headers.
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit cdad352. Configure here.

Comment thread src/routers/http/router.rs
Only decrement load in the early-return path for rewritten 400s
(input validation). Genuine 500s are retryable and the caller
retry closure already handles their load cleanup. Also properly
handle body read errors without swallowing them.
@JannikSt
Copy link
Copy Markdown
Member Author

@codex review

@JannikSt JannikSt merged commit bf5e83d into main Apr 15, 2026
7 checks passed
@JannikSt JannikSt mentioned this pull request Apr 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant