Enhance API compatibility, logging, and Docker build efficiency#40
Enhance API compatibility, logging, and Docker build efficiency#40technowhizz wants to merge 5 commits into
Conversation
Adds a minimal non-streaming /v1/responses endpoint that translates supported Responses API input shapes into the existing chat completions request path. The response is adapted back into the core Responses fields expected by OpenAI-compatible clients, including output message content, output_text, timestamps, status, and usage where available.
Introduces request logging through the application middleware using the existing access_log setting as the source of truth. This keeps HTTP method, path, status, duration, and client details available when access logging is enabled without forcing noisy request logs for installations that have disabled them.
Reorders the Docker build so dependency installation and slow setup steps can be cached independently from application source changes. This reduces rebuild time during local iteration while keeping the final runtime image behavior and startup command unchanged.
Closes the Claude CLI stdin stream explicitly for requests that do not provide stdin data, preventing the CLI from waiting before continuing. This removes the recurring warning and avoids adding avoidable latency to simple non-interactive API calls.
Updates the default model catalog with current Claude Opus and Sonnet aliases while preserving existing canonical model entries. The aliases now resolve to the latest documented model IDs, and the model API tests cover the new defaults and compatibility mappings.
Reviewer's GuideImplements an OpenAI-compatible /v1/responses endpoint by adapting Requests-style inputs to existing chat completions, adds access-log-aware logging configuration and middleware, updates Claude model defaults/aliases, hardens the Docker build for caching and non-root execution, and extends tests for the new API, logging behavior, and subprocess stdin handling. Sequence diagram for OpenAI Responses API request handlingsequenceDiagram
actor Client
participant FastAPIApp
participant ChatRouter
participant create_response
participant ChatCompletion as create_chat_completion
participant ClaudeBackend
Client->>FastAPIApp: POST /v1/responses
FastAPIApp->>ChatRouter: Route request
ChatRouter->>create_response: create_response(ResponsesCreateRequest)
create_response->>create_response: _responses_request_to_chat_request()
create_response->>ChatCompletion: create_chat_completion(ChatCompletionRequest)
ChatCompletion->>ClaudeBackend: Call Claude model
ClaudeBackend-->>ChatCompletion: ChatCompletionResponse or StreamingResponse
alt stream == false
ChatCompletion-->>create_response: ChatCompletionResponse (dict or model)
create_response->>create_response: _chat_response_to_responses_response()
create_response-->>FastAPIApp: ResponsesResponse (JSON)
FastAPIApp-->>Client: 200 OK JSON
else stream == true
ChatCompletion-->>create_response: StreamingResponse (SSE chat stream)
create_response->>create_response: _create_responses_sse_from_chat_stream()
create_response-->>FastAPIApp: StreamingResponse (Responses SSE)
FastAPIApp-->>Client: 200 OK text/event-stream
end
Sequence diagram for HTTP access logging middleware and configurationsequenceDiagram
actor Client
participant Uvicorn
participant FastAPIApp
participant AuthMW as auth_middleware
participant ReqLogMW as request_logging_middleware
participant Endpoint
participant LoggingConfig as configure_logging
participant StructlogLogger as logger
Note over LoggingConfig,StructlogLogger: Startup
LoggingConfig->>StructlogLogger: configure_logging(settings)
StructlogLogger-->>LoggingConfig: Processors respect access_log flag
Note over Client,Endpoint: Per HTTP request
Client->>Uvicorn: HTTP request
Uvicorn->>FastAPIApp: ASGI call
FastAPIApp->>AuthMW: auth_middleware
AuthMW-->>FastAPIApp: Next handler
FastAPIApp->>ReqLogMW: request_logging_middleware
alt settings.access_log is False
ReqLogMW->>Endpoint: call_next(request)
Endpoint-->>ReqLogMW: Response
ReqLogMW-->>FastAPIApp: Response (no access log)
else settings.access_log is True
ReqLogMW->>ReqLogMW: Measure duration
ReqLogMW->>Endpoint: call_next(request)
Endpoint-->>ReqLogMW: Response
ReqLogMW->>StructlogLogger: logger.info("HTTP request", access_log=True, ...)
ReqLogMW-->>FastAPIApp: Response
end
FastAPIApp-->>Uvicorn: Response
Uvicorn-->>Client: HTTP response
Class diagram for new OpenAI Responses API modelsclassDiagram
class ResponsesCreateRequest {
+str model
+str|List~Any~ input
+float temperature
+int max_output_tokens
+bool stream
+str instructions
+str project_id
+str session_id
}
class ResponsesOutputText {
+str type = "output_text"
+str text
+List~Any~ annotations
}
class ResponsesOutputMessage {
+str id
+str type = "message"
+str status = "completed"
+str role = "assistant"
+List~ResponsesOutputText~ content
}
class ResponsesUsage {
+int input_tokens
+int output_tokens
+int total_tokens
}
class ResponsesResponse {
+str id
+str object = "response"
+int created_at
+str status = "completed"
+int completed_at
+Dict~str,Any~ error
+Dict~str,Any~ incomplete_details
+str instructions
+int max_output_tokens
+str model
+List~ResponsesOutputMessage~ output
+str output_text
+ResponsesUsage usage
}
class ChatCompletionRequest {
+str model
+List~Dict~ messages
+float temperature
+int max_tokens
+bool stream
+str project_id
+str session_id
+str system_prompt
}
ResponsesResponse *-- ResponsesUsage : usage
ResponsesResponse *-- ResponsesOutputMessage : output
ResponsesOutputMessage *-- ResponsesOutputText : content
ResponsesCreateRequest --> ChatCompletionRequest : converted_to
ChatCompletionRequest --> ResponsesResponse : adapted_from_chat
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Review Summary by QodoAdd Responses API support, access logging, and Docker build optimizations
WalkthroughsDescription• Add OpenAI Responses API endpoint with streaming and non-streaming support - Converts Responses API input shapes to chat completions format - Transforms chat responses back to Responses API format with proper event streaming - Supports text blocks, message arrays, and developer role mapping • Implement configurable HTTP access logging middleware - New access_log setting enables detailed request/response logging - Logs HTTP method, path, status, duration, and client details - Automatically maintains INFO log level when access logging enabled • Improve Docker build cache efficiency and best practices - Reorder build stages to cache dependencies separately from source - Add .dockerignore file to exclude unnecessary files - Use build cache mounts for pip and apt package managers - Implement non-root user with explicit UID/GID configuration • Update Claude model defaults and add new model aliases - Set default model to claude-sonnet-4-6 - Add claude-opus-4-7 and claude-sonnet-4-6 model entries - Update aliases to resolve to latest documented model IDs • Fix Claude CLI stdin handling to prevent wait warnings - Change stdin from PIPE to DEVNULL for non-interactive requests - Eliminates recurring CLI warnings and reduces latency Diagramflowchart LR
A["Responses API Request"] -->|"Convert input to chat format"| B["Chat Completions"]
B -->|"Stream or non-stream"| C["Chat Response"]
C -->|"Transform to Responses format"| D["Responses API Response"]
E["HTTP Request"] -->|"Log with middleware"| F["Access Log"]
G["Docker Build"] -->|"Cache dependencies"| H["Faster Rebuilds"]
I["Model Config"] -->|"Update defaults"| J["Latest Claude Models"]
File Changes1. claude_code_api/api/chat.py
|
Code Review by Qodo
1. Docker cache UID mismatch
|
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The Responses API conversion and streaming helpers added to
api/chat.pyare quite substantial; consider moving them into a dedicated module (e.g.api/responses.pyor a helper module) to keepchat.pyfocused and easier to navigate. - In the updated Dockerfile you removed the
rm -rf /var/lib/apt/lists/*cleanup step afterapt-get install; reintroducing this (or an equivalent cleanup) will help keep the final image size smaller.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The Responses API conversion and streaming helpers added to `api/chat.py` are quite substantial; consider moving them into a dedicated module (e.g. `api/responses.py` or a helper module) to keep `chat.py` focused and easier to navigate.
- In the updated Dockerfile you removed the `rm -rf /var/lib/apt/lists/*` cleanup step after `apt-get install`; reintroducing this (or an equivalent cleanup) will help keep the final image size smaller.
## Individual Comments
### Comment 1
<location path="claude_code_api/api/chat.py" line_range="489-450" />
<code_context>
+ except json.JSONDecodeError:
+ continue
+
+ if "error" in chunk:
+ yield _responses_stream_event(
+ "response.failed", {"response": {"id": response_id, **chunk}}
+ )
</code_context>
<issue_to_address>
**issue (bug_risk):** Error chunks in streaming path overwrite the generated response_id, which makes the emitted response ID inconsistent with earlier events.
In the error branch, the payload is built as `{"response": {"id": response_id, **chunk}}`. If `chunk` already has an `id`, it will override `response_id`, so `response.created` and `response.failed` can emit different IDs. Please either construct the response object explicitly (copy only needed fields from `chunk`) or ensure `response_id` always wins (e.g. `{**chunk, "id": response_id}`) to keep IDs consistent across events.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| output_parts: List[str] = [] | ||
| content_started = False | ||
|
|
||
| yield _responses_stream_event( |
There was a problem hiding this comment.
issue (bug_risk): Error chunks in streaming path overwrite the generated response_id, which makes the emitted response ID inconsistent with earlier events.
In the error branch, the payload is built as {"response": {"id": response_id, **chunk}}. If chunk already has an id, it will override response_id, so response.created and response.failed can emit different IDs. Please either construct the response object explicitly (copy only needed fields from chunk) or ensure response_id always wins (e.g. {**chunk, "id": response_id}) to keep IDs consistent across events.
There was a problem hiding this comment.
Code Review
This pull request implements a new OpenAI-compatible 'Responses' API endpoint with support for SSE streaming, updates model configurations to include Claude Sonnet 4.6 and Opus 4.7, and adds request access logging middleware. Additionally, the Dockerfile was refactored to utilize buildkit cache mounts for faster builds. Review feedback points out that changing the subprocess stdin to DEVNULL breaks interactive features and identifies a potential character encoding issue in the streaming logic that requires an incremental decoder.
| stdout=asyncio.subprocess.PIPE, | ||
| stderr=asyncio.subprocess.PIPE, | ||
| stdin=asyncio.subprocess.PIPE, | ||
| stdin=asyncio.subprocess.DEVNULL, |
There was a problem hiding this comment.
Changing stdin to asyncio.subprocess.DEVNULL will break the send_input method (line 252) and the continue_conversation functionality (line 559). When stdin is set to DEVNULL, self.process.stdin becomes None, causing send_input to skip writing any data. If the intention is to support interactive sessions or continuing conversations within the same process, asyncio.subprocess.PIPE must be used. If interaction is truly not intended, the related dead code should be removed to avoid confusion.
| stdin=asyncio.subprocess.DEVNULL, | |
| stdin=asyncio.subprocess.PIPE, |
| async def _iter_sse_events(body_iterator: Any) -> AsyncGenerator[str, None]: | ||
| buffer = "" | ||
| async for chunk in body_iterator: | ||
| if isinstance(chunk, bytes): | ||
| buffer += chunk.decode("utf-8") | ||
| else: | ||
| buffer += str(chunk) |
There was a problem hiding this comment.
Decoding bytes directly from a stream chunk can lead to a UnicodeDecodeError if a multi-byte UTF-8 character (such as an emoji) is split across chunks. It is safer to use codecs.IncrementalDecoder to handle partial characters correctly.
| async def _iter_sse_events(body_iterator: Any) -> AsyncGenerator[str, None]: | |
| buffer = "" | |
| async for chunk in body_iterator: | |
| if isinstance(chunk, bytes): | |
| buffer += chunk.decode("utf-8") | |
| else: | |
| buffer += str(chunk) | |
| async def _iter_sse_events(body_iterator: Any) -> AsyncGenerator[str, None]: | |
| import codecs | |
| decoder = codecs.getincrementaldecoder("utf-8")() | |
| buffer = "" | |
| async for chunk in body_iterator: | |
| if isinstance(chunk, bytes): | |
| buffer += decoder.decode(chunk, final=False) | |
| else: | |
| buffer += str(chunk) |
| RUN --mount=type=cache,id=claude-api-pip-cache,target=/home/claudeuser/.cache/pip,uid=1001,gid=1001,mode=0775 \ | ||
| pip install --upgrade pip setuptools wheel |
There was a problem hiding this comment.
1. Docker cache uid mismatch 🐞 Bug ☼ Reliability
docker/Dockerfile defines APP_UID/APP_GID but the pip cache mounts are hard-coded to uid/gid 1001, so overriding APP_UID/APP_GID can cause permission errors writing to /home/claudeuser/.cache/pip and fail the Docker build.
Agent Prompt
### Issue description
`docker/Dockerfile` introduces `APP_UID`/`APP_GID`, but the BuildKit cache mounts for pip still use `uid=1001,gid=1001`. If a builder overrides these args, the container user and the cache directory ownership can diverge and pip will fail with permission errors.
### Issue Context
This occurs in all `--mount=type=cache,...target=/home/claudeuser/.cache/pip,...` layers.
### Fix Focus Areas
- docker/Dockerfile[5-7]
- docker/Dockerfile[43-44]
- docker/Dockerfile[48-50]
- docker/Dockerfile[127-128]
### Suggested change
Update every cache mount to use the build args:
- `uid=${APP_UID},gid=${APP_GID}` (or drop uid/gid entirely if you want BuildKit defaults), so the mount ownership matches the created user when args are overridden.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| access_log_enabled = bool(getattr(settings, "access_log", False)) | ||
|
|
||
| if access_log_enabled and log_level > logging.INFO: | ||
| log_level = logging.INFO |
There was a problem hiding this comment.
2. Access log forces global info 🐞 Bug ◔ Observability
configure_logging() overrides the configured log_level by forcing the root logger down to INFO when access_log is enabled, which can unintentionally enable unrelated stdlib INFO logs and, combined with uvicorn’s access_log=true, can duplicate request access logs.
Agent Prompt
### Issue description
Enabling `settings.access_log` currently forces the **global** root logger level/handlers to `INFO`, overriding the configured `log_level` and potentially enabling unrelated third-party INFO logs. In addition, `uvicorn.run(..., access_log=settings.access_log)` can produce **duplicate** per-request logs because the app also logs requests via `request_logging_middleware`.
### Issue Context
- `configure_logging()` sets `log_level = INFO` when `access_log` is enabled.
- `main.py` also enables uvicorn access logs while emitting custom access logs.
### Fix Focus Areas
- claude_code_api/core/logging_config.py[123-175]
- claude_code_api/main.py[125-145]
- claude_code_api/main.py[246-253]
### Suggested fixes (choose one approach)
1) **Prefer structured middleware access logs only**
- Stop enabling uvicorn’s access logs (`uvicorn.run(..., access_log=False)`), keep your middleware log.
- Remove the global `log_level = INFO` override; keep the user’s configured root level.
- If you still need access logs when `log_level` is `ERROR`, emit access logs through a **dedicated logger/handler** configured at INFO (so you don’t need to lower the root/handlers).
2) **If you must keep root at INFO** (less ideal)
- Add filtering so only access logs (and WARNING+) are emitted from stdlib loggers, and avoid double-logging by disabling uvicorn access logs.
Goal: turning on `access_log` should not globally change application log verbosity and should not double-log each request.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
This pull request introduces support for the OpenAI "Responses" API, upgrades the Claude model defaults and aliases, improves logging configurability (including an access log feature), and makes significant enhancements to the Docker build for efficiency and best practices. It also adds new and updated tests to ensure these features work as intended.
OpenAI Responses API Support
/v1/responsesendpoint in the API root. (claude_code_api/models/openai.py,claude_code_api/main.py) [1] [2]Model Configuration Updates
claude-sonnet-4-6and added/updated aliases for new model versions, includingclaude-opus-4-7andclaude-sonnet-4-6. (claude_code_api/config/models.json) [1] [2]Logging and Access Log Improvements
access_logsetting to enable detailed HTTP request logging, including middleware for structured access logs and configuration to ensure logs are emitted at the correct level. (claude_code_api/core/config.py,claude_code_api/core/logging_config.py,claude_code_api/main.py) [1] [2] [3] [4] [5] [6] [7] [8]Docker Build Optimization
docker/Dockerfile,.dockerignore) [1] [2] [3]Testing Enhancements
/dev/nullas intended. (tests/test_logging_config.py,tests/test_claude_manager_unit.py) [1] [2] [3] [4] [5] [6]Summary by Sourcery
Add an OpenAI-compatible /v1/responses endpoint, improve logging configurability including HTTP access logs, update Claude model defaults/aliases, harden subprocess handling, and optimize the Docker image build for performance and best practices.
New Features:
Bug Fixes:
Enhancements:
Build:
Tests: