feat: add /openai/v1 passthrough to bedrock-mantle #62
Merged
16 tasks covering feature flag, auth middleware extension, usage extraction, httpx passthrough client, /chat/completions and /responses endpoints with streaming, full Responses CRUD, /models, guardrail header forwarding, and documentation updates. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mounts /openai/v1/* (chat/completions, responses + CRUD, models) as raw httpx passthrough to bedrock-mantle. Reuses proxy API key auth, rate limits, budgets, and usage tracking. Independent of ENABLE_OPENAI_COMPAT. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implements the FastAPI router for OpenAI passthrough, mounts it conditionally under /openai/v1 when ENABLE_OPENAI_PASSTHROUGH=True, and adds four integration tests (non-streaming forward, model mapping, 4xx passthrough, and 401 on missing auth). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ancel, input_items)
….md, and features.md
…ut; add flag-off and timeout tests Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… base path

httpx follows RFC 3986 path-merging on AsyncClient.base_url: a request path starting with `/` REPLACES the base_url's path entirely. With OPENAI_BASE_URL=https://bedrock-mantle.us-west-2.api.aws/v1, calls like `client.post("/chat/completions")` were being sent to `bedrock-mantle.us-west-2.api.aws/chat/completions` (no `/v1`), causing 404s in production.

Fix:
- Drop base_url from the AsyncClient
- Add upstream_url(path) that explicitly joins OPENAI_BASE_URL + path
- Use upstream_url() everywhere we previously passed bare paths
- Add unit tests covering leading-slash, trailing-slash, and ID-in-path cases that would have caught this

Integration tests passed previously because respx joins base_url + path intuitively; only real httpx exhibits the RFC 3986 replacement behaviour.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
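To illustrate the fix described in this commit, here is a minimal sketch of an explicit join helper. Only the name `upstream_url(path)` and the `OPENAI_BASE_URL` value come from the commit message; the `base_url` parameter, the function body, and the `resp_123` ID are assumptions for illustration, not the repository's code. The standard library's `urljoin` is used only to show the RFC 3986 replacement rule the message refers to.

```python
from urllib.parse import urljoin

# Value quoted in the commit message above.
OPENAI_BASE_URL = "https://bedrock-mantle.us-west-2.api.aws/v1"


def upstream_url(path: str, base_url: str = OPENAI_BASE_URL) -> str:
    """Join base URL and path explicitly so the base URL's own path survives.

    Hypothetical body: the real helper may differ.
    """
    return f"{base_url.rstrip('/')}/{path.lstrip('/')}"


# RFC 3986 reference resolution: a path starting with "/" replaces the base path,
# so the "/v1" segment is lost.
replaced = urljoin(OPENAI_BASE_URL, "/chat/completions")

# Explicit joining keeps "/v1" whether or not either side carries a slash.
joined = upstream_url("/chat/completions")
with_id = upstream_url("responses/resp_123/cancel", OPENAI_BASE_URL + "/")
```

Normalising slashes on both sides is what lets the leading-slash, trailing-slash, and ID-in-path cases from the commit's unit tests all produce the same shape of URL.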
…inition

The previous deploy mounted the new /openai/v1/* code but the CDK never passed ENABLE_OPENAI_PASSTHROUGH through to the container, so the conditional router mount at app/main.py evaluated False and the routes weren't registered.

Add support symmetrical to enableOpenaiCompat:
- AppConfig: new enableOpenaiPassthrough field
- prod default: true (ship the feature on by default)
- dev default: false (avoid accidental routing changes in dev)
- env-var override: ENABLE_OPENAI_PASSTHROUGH at deploy time
- ECS task env: emit ENABLE_OPENAI_PASSTHROUGH=<value>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
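The resolution order listed in this commit (deploy-time override first, then the per-stage default) can be sketched as follows. This is an illustrative Python model only: the actual CDK code is not shown in this PR text, and the function names, the `STAGE_DEFAULTS` table, and the rule that only the string "true" (case-insensitive) enables the flag are all assumptions.

```python
from typing import Dict, Mapping

# Per-stage defaults as stated in the commit message.
STAGE_DEFAULTS = {"prod": True, "dev": False}


def resolve_enable_openai_passthrough(stage: str, env: Mapping[str, str]) -> bool:
    """Return the flag value: a deploy-time env var wins over the stage default."""
    raw = env.get("ENABLE_OPENAI_PASSTHROUGH")
    if raw is not None:
        # Assumed parsing rule: only "true" (any case) turns the flag on.
        return raw.strip().lower() == "true"
    return STAGE_DEFAULTS[stage]


def task_env(stage: str, env: Mapping[str, str]) -> Dict[str, str]:
    """Build the container environment entry that the earlier deploy was missing."""
    value = resolve_enable_openai_passthrough(stage, env)
    return {"ENABLE_OPENAI_PASSTHROUGH": str(value)}
```

Emitting the variable unconditionally, rather than only when it is true, keeps the container's view of the flag explicit in both stages.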
Bedrock-mantle emits Responses API SSE as data-only frames (the event type is embedded as a JSON field but no `event: <type>` line is present). This matches the SSE spec but diverges from real OpenAI servers, which prepend each frame with `event: <type>`. Strict clients like OpenAI Codex CLI key off the `event:` field and report "stream closed before response.completed" when they don't see it.

Synthesize `event: <type>` lines from each data frame's JSON `type` field when api_surface == "responses". Chat Completions streams remain unchanged (real OpenAI doesn't use event: lines for that endpoint).

Tests:
- test_streaming_responses_synthesizes_event_lines_for_data_only_upstream asserts every data: frame is preceded by the matching event: line
- test_streaming_chat_completions_does_not_inject_event_lines pins the no-injection contract for chat completions

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
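A minimal sketch of the synthesis step described above, assuming each SSE frame arrives as one string without its trailing blank line. The function name `synthesize_event_lines` and the frame representation are hypothetical; the proxy's real implementation works on the upstream byte stream and may differ.

```python
import json
from typing import Iterable, Iterator


def synthesize_event_lines(frames: Iterable[str]) -> Iterator[str]:
    """Prepend `event: <type>` to data-only frames whose JSON carries a `type`."""
    for frame in frames:
        lines = frame.split("\n")
        data = [ln[len("data:"):].strip() for ln in lines if ln.startswith("data:")]
        # Leave frames alone if they have no data or already name their event.
        if not data or any(ln.startswith("event:") for ln in lines):
            yield frame
            continue
        try:
            payload = json.loads("\n".join(data))
        except json.JSONDecodeError:
            # Non-JSON payloads such as "[DONE]" pass through untouched.
            yield frame
            continue
        event_type = payload.get("type") if isinstance(payload, dict) else None
        if isinstance(event_type, str):
            yield f"event: {event_type}\n{frame}"
        else:
            yield frame
```

In this sketch the caller would apply the function only when `api_surface == "responses"`, which leaves Chat Completions streams untouched as the commit requires.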
…s codes

Previously, when client requested stream=true and upstream returned a non-2xx status (e.g. validation 400) or a connection error, the proxy would still return 200 text/event-stream and dump the JSON error body (or a synthetic SSE frame) into the stream. Strict SSE clients like OpenAI Codex CLI then hang waiting for response.completed and report "stream closed before response.completed", masking the real error.

Refactor: split open_upstream_stream() (peeks at status) from stream_passthrough_response() (streams an open 2xx body). The router now:
- Returns the real upstream status as JSONResponse when the upstream responds with 4xx/5xx for a streaming request.
- Returns 502/504 JSON when the upstream is unreachable (TimeoutException / RequestError) before any bytes flow.
- Continues to emit an SSE error+[DONE] frame only for failures that occur AFTER the 2xx stream has begun (where we cannot retroactively change the HTTP status).

Tests:
- test_streaming_responses_upstream_4xx_returns_json_not_sse
- test_streaming_upstream_timeout_returns_json_504 (replaces the prior test that asserted the buggy SSE-error behavior)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
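The router's decision before any bytes are streamed can be modelled as a small pure function. This is a sketch under assumptions: `UpstreamOutcome` and `choose_reply` are hypothetical names, and the mapping of timeouts to 504 and other connection failures to 502 is inferred from the test name `test_streaming_upstream_timeout_returns_json_504`, not confirmed by the source.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class UpstreamOutcome:
    """What is known after opening the upstream request, before streaming."""
    status: Optional[int] = None  # upstream HTTP status, if a response arrived
    timed_out: bool = False       # stands in for a timeout error
    unreachable: bool = False     # stands in for any other request error


def choose_reply(outcome: UpstreamOutcome) -> Tuple[str, int]:
    """Return (body kind, HTTP status) for a stream=true request."""
    if outcome.timed_out:
        return ("json", 504)  # assumed: timeout maps to 504
    if outcome.unreachable or outcome.status is None:
        return ("json", 502)  # assumed: other connection failures map to 502
    if not 200 <= outcome.status < 300:
        return ("json", outcome.status)  # surface the real upstream status
    return ("sse", outcome.status)  # only a 2xx upstream is streamed as SSE
```

Failures after the 2xx stream has started fall outside this function, since the status line has already been sent by then and only an in-stream error frame is possible.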
Summary
- Adds `/openai/v1/*` endpoints that accept OpenAI-native API requests (Chat Completions, Responses API + full CRUD, /models) and forward them to AWS bedrock-mantle, gated by `ENABLE_OPENAI_PASSTHROUGH=False`.
- Reuses proxy API key auth (`x-api-key` or `Authorization: Bearer`), rate limits, budgets, and usage tracking. Usage is normalized into the existing DynamoDB schema with two new sparse columns (`api_surface`, `reasoning_tokens`).
- Independent of `ENABLE_OPENAI_COMPAT`; both flags can be enabled together. The new endpoints are pure raw-httpx passthrough (no Pydantic schemas for OpenAI types) for forward compatibility with new Mantle features.

Endpoints:
- `/openai/v1/chat/completions` (streaming + non-streaming)
- `/openai/v1/responses` (streaming + non-streaming)
- `/openai/v1/responses/{id}`
- `/openai/v1/responses/{id}/cancel`
- `/openai/v1/responses/{id}/input_items`
- `/openai/v1/models`

Documentation: see `docs/plans/2026-05-25-openai-passthrough-design.md` for the design rationale and `docs/architecture/features.md` for the user-facing feature doc.

Test plan
- … (`uv run pytest tests/unit tests/integration/test_openai_passthrough`)
- Routes mounted when `ENABLE_OPENAI_PASSTHROUGH=True`; not present when `False` (regression test included)
- `Authorization: Bearer` and `x-api-key` both accepted
- … `data: {... "usage": ...}` chunk when client sends `stream_options: {"include_usage": true}`
- … `response.completed` SSE event
- … `data: {"error": ...}\n\n[DONE]` instead of crashing the stream
- Guardrail headers (`X-Amzn-Bedrock-*`) forwarded to upstream

Manual verification recommended before deploy:
- … (`OPENAI_API_KEY`, `OPENAI_BASE_URL`) using OpenAI Python SDK
- … `previous_response_id`
- … `api_surface` and `reasoning_tokens` columns

Follow-ups (out of scope)
- … `api_surface` filter (deferred)