treat vLLM input validation 500s as 400 for circuit breaker by JannikSt · Pull Request #24 · PrimeIntellect-ai/router

JannikSt · 2026-04-15T09:35:59Z

vLLM returns 500 for prompt-too-long errors (e.g. 65537 tokens vs 65536 limit) instead of 400.
These were counting as backend failures and tripping the circuit breaker on healthy workers.
The router now rewrites the status to 400 when the body matches known input validation patterns, so the circuit breaker ignores them.

Note

Medium Risk
Changes upstream error handling by reading and conditionally rewriting 500 responses to 400, which affects client-visible status codes and load/circuit-breaker accounting. Risk is moderate due to altered retry/cleanup paths and potential edge cases if the upstream body is large or the patterns mismatch.

Overview
Reclassifies certain vLLM request validation failures that incorrectly arrive as HTTP 500 into client errors by inspecting the upstream response body and rewriting the status to 400, preventing these from counting as worker failures for circuit breaker purposes.

Adds is_vllm_input_validation_error pattern matching plus a focused unit test, and updates the request proxy path to preserve headers/body while ensuring worker load is decremented correctly when a 500 is rewritten and the normal retry/streaming cleanup is bypassed.
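
A minimal sketch of what such a body matcher and its focused unit test could look like. The function name comes from the PR; the signature and the two substrings are assumptions for illustration, since the PR text does not list the real patterns.

```rust
/// Hypothetical substrings; the actual patterns in router.rs are not shown in the PR.
const INPUT_VALIDATION_PATTERNS: &[&str] = &[
    "maximum context length",
    "is longer than the maximum model length",
];

/// Returns true when an upstream 500 body looks like a prompt-length rejection.
/// Taking `&[u8]` is an assumption; the real function may accept a different type.
pub fn is_vllm_input_validation_error(body: &[u8]) -> bool {
    // A lossy UTF-8 view is enough for substring checks on a JSON error body.
    let text = String::from_utf8_lossy(body);
    INPUT_VALIDATION_PATTERNS.iter().any(|p| text.contains(*p))
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn matches_prompt_too_long_but_not_generic_failures() {
        let too_long = br#"{"message":"This model's maximum context length is 65536 tokens."}"#;
        assert!(is_vllm_input_validation_error(too_long));
        assert!(!is_vllm_input_validation_error(b"CUDA out of memory"));
    }
}
```

Substring matching is brittle by design here: a pattern that is too broad would hide genuine worker failures from the circuit breaker, so a short, specific list seems the safer default.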

Reviewed by Cursor Bugbot for commit aa4a5cc. Bugbot is set up for automated code reviews on this repo.

Note

Treat vLLM input validation 500s as 400 for circuit breaker

Adds is_vllm_input_validation_error in router.rs to detect vLLM prompt-length validation errors by inspecting the response body for known substrings.
When a worker returns a 500 matching this pattern, the router rewrites the status to 400, returns the original body with preserved headers, and decrements the worker's load counter so the circuit breaker is not triggered.
Non-matching 500s are passed through unchanged; if the body cannot be read, a 500 with 'Failed to read upstream response' is returned.
Behavioral Change: clients receiving vLLM input validation errors will now see 400 instead of 500.
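
The rewrite described above can be sketched as a pure function over a buffered response. `ProxiedResponse`, `reclassify`, and the stand-in matcher are hypothetical names, not the router's real types; the point is only that the status changes while headers and body are carried over untouched.

```rust
/// Hypothetical buffered view of an upstream response.
#[derive(Debug, Clone, PartialEq)]
pub struct ProxiedResponse {
    pub status: u16,
    pub headers: Vec<(String, String)>,
    pub body: Vec<u8>,
}

/// Stand-in for the real matcher; the substring is an assumption.
fn looks_like_input_validation(body: &[u8]) -> bool {
    String::from_utf8_lossy(body).contains("maximum context length")
}

/// Rewrites a matching 500 to 400; every other response passes through unchanged.
pub fn reclassify(mut upstream: ProxiedResponse) -> ProxiedResponse {
    if upstream.status == 500 && looks_like_input_validation(&upstream.body) {
        upstream.status = 400;
    }
    upstream
}
```

Because the circuit breaker only counts server-side failures, flipping the status is enough to keep the worker out of its accounting in this sketch.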

Macroscope summarized aa4a5cc.

vLLM returns 500 for prompt-too-long errors instead of 400. This causes the circuit breaker to penalise healthy workers for bad client input. Rewrite the status to 400 when the response body matches known input validation patterns.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a5a29702aa

macroscopeapp · 2026-04-15T09:41:15Z

Approvability

Verdict: Needs human review

1 blocking correctness issue found. Multiple reviewers have identified that this fix for vLLM input validation errors only applies to non-streaming requests, leaving streaming requests vulnerable to the same circuit breaker issue the PR aims to solve. This incomplete fix warrants human review to determine if the streaming path should also be addressed.

Move the 500 body inspection before the stream/non-stream branch so both paths get the 400 rewrite. vLLM error responses are always synchronous JSON even when the client requested streaming.
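
A rough sketch of that ordering, with the 500 check placed ahead of the stream/non-stream decision. `Reply`, `dispatch`, and the matcher are illustrative names, and the body is passed in already buffered to keep the example small.

```rust
/// Hypothetical shape of what the proxy hands back to the client.
#[derive(Debug, PartialEq)]
pub enum Reply {
    Buffered { status: u16, body: Vec<u8> },
    Streamed { status: u16 },
}

/// Stand-in for the real matcher; the substring is an assumption.
fn looks_like_input_validation(body: &[u8]) -> bool {
    String::from_utf8_lossy(body).contains("maximum context length")
}

pub fn dispatch(status: u16, body: Vec<u8>, client_wants_stream: bool) -> Reply {
    // Inspect 500s first: the error body is plain JSON even when the
    // client asked for a stream, so both paths share the 400 rewrite.
    if status == 500 && looks_like_input_validation(&body) {
        return Reply::Buffered { status: 400, body };
    }
    if client_wants_stream {
        Reply::Streamed { status }
    } else {
        Reply::Buffered { status, body }
    }
}
```

With the check inside only the non-streaming branch, the first call in the usage below would have been treated as an ordinary streamed 500.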

JannikSt · 2026-04-15T10:06:37Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a045cfcd2f

chatgpt-codex-connector · 2026-04-15T10:11:09Z

+        // doesn't penalise the worker for bad input.
+        if status == StatusCode::INTERNAL_SERVER_ERROR {
+            let response_headers = header_utils::preserve_response_headers(res.headers());
+            let body = res.bytes().await.unwrap_or_default();


Handle 500 body read errors explicitly

Using res.bytes().await.unwrap_or_default() converts upstream read failures into an empty body while still forwarding preserved upstream headers. If the upstream included Content-Length (or other body-dependent metadata), this can emit a malformed response (e.g., declared nonzero length with empty body) and hides the real transport failure from callers and logs; before this change, the error path returned a router-generated 500 with diagnostics.

Log the transport error and return a diagnostic 500 to the caller rather than silently forwarding an empty body with stale headers.
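
One way to express that, sketched with `std::io::Result` standing in for the HTTP client's body-read result. `RouterResponse` and `after_body_read` are hypothetical names; the error text is the one quoted in the PR summary.

```rust
use std::io;

/// Hypothetical response type built by the router.
#[derive(Debug, PartialEq)]
pub struct RouterResponse {
    pub status: u16,
    pub headers: Vec<(String, String)>,
    pub body: Vec<u8>,
}

/// Builds the reply for an upstream 500 once the body read has finished.
pub fn after_body_read(
    upstream_headers: Vec<(String, String)>,
    read: io::Result<Vec<u8>>,
) -> RouterResponse {
    match read {
        // Body arrived intact: upstream headers still describe it.
        Ok(body) => RouterResponse { status: 500, headers: upstream_headers, body },
        // Read failed: log it and drop the upstream headers, since a
        // Content-Length from upstream would no longer match the body.
        Err(err) => {
            eprintln!("failed to read upstream 500 body: {err}");
            RouterResponse {
                status: 500,
                headers: Vec::new(),
                body: b"Failed to read upstream response".to_vec(),
            }
        }
    }
}
```

Compared with `unwrap_or_default()`, the `match` keeps the transport error visible in logs and never pairs stale headers with an empty body.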

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit cdad352.

Only decrement load in the early-return path for rewritten 400s (input validation). Genuine 500s are retryable and the caller retry closure already handles their load cleanup. Also properly handle body read errors without swallowing them.
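
The accounting rule can be illustrated with a small model in which each in-flight request holds one unit of load. `Worker`, `Attempt`, `on_upstream_500`, and `retry_cleanup` are hypothetical stand-ins for the router's real types and its retry closure.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical worker handle; only the in-flight load counter is modeled.
pub struct Worker {
    load: AtomicUsize,
}

impl Worker {
    pub fn with_load(load: usize) -> Self {
        Worker { load: AtomicUsize::new(load) }
    }
    pub fn decrement_load(&self) {
        self.load.fetch_sub(1, Ordering::SeqCst);
    }
    pub fn load(&self) -> usize {
        self.load.load(Ordering::SeqCst)
    }
}

#[derive(Debug, PartialEq)]
pub enum Attempt {
    /// Final answer for the client; load was released on the early return.
    Final(u16),
    /// Handed back to the caller, whose retry logic releases the load.
    Retryable(u16),
}

/// Inner send path for an upstream 500.
pub fn on_upstream_500(worker: &Worker, is_input_validation: bool) -> Attempt {
    if is_input_validation {
        // The early return bypasses the normal cleanup, so release load here.
        worker.decrement_load();
        Attempt::Final(400)
    } else {
        // Genuine 500: do not touch the counter, the caller will.
        Attempt::Retryable(500)
    }
}

/// Stand-in for the cleanup done by the caller's retry closure.
pub fn retry_cleanup(worker: &Worker, attempt: &Attempt) {
    if let Attempt::Retryable(_) = attempt {
        worker.decrement_load();
    }
}
```

In this model each request releases its unit of load exactly once: rewritten 400s in the early-return path, genuine 500s in the caller's cleanup.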

JannikSt · 2026-04-15T10:43:17Z

@codex review

cursor bot reviewed Apr 15, 2026

Comment thread src/routers/http/router.rs

chatgpt-codex-connector bot reviewed Apr 15, 2026

Comment thread src/routers/http/router.rs Outdated

macroscopeapp bot reviewed Apr 15, 2026

Comment thread src/routers/http/router.rs

handle vLLM input validation 500s for streaming requests too

a045cfc

chatgpt-codex-connector bot reviewed Apr 15, 2026

handle 500 body read errors explicitly instead of unwrap_or_default

cdad352

cursor bot reviewed Apr 15, 2026

Comment thread src/routers/http/router.rs

fix double load decrement for 500 responses

aa4a5cc

JannikSt merged commit bf5e83d into main Apr 15, 2026
7 checks passed

JannikSt mentioned this pull request Apr 16, 2026

release: v0.1.22 #25

Merged

JannikSt merged 4 commits into main from improvement/exclude-input-validation-from-circuit-breaker
