Skip to content

Enhance API compatibility, logging, and Docker build efficiency#40

Open
technowhizz wants to merge 5 commits into
codingworkflow:mainfrom
technowhizz:main
Open

Enhance API compatibility, logging, and Docker build efficiency#40
technowhizz wants to merge 5 commits into
codingworkflow:mainfrom
technowhizz:main

Conversation

@technowhizz
Copy link
Copy Markdown

@technowhizz technowhizz commented May 10, 2026

This pull request introduces support for the OpenAI "Responses" API, upgrades the Claude model defaults and aliases, improves logging configurability (including an access log feature), and makes significant enhancements to the Docker build for efficiency and best practices. It also adds new and updated tests to ensure these features work as intended.

OpenAI Responses API Support

  • Added models and endpoints for the OpenAI-compatible "Responses" API, including request/response schemas and a new /v1/responses endpoint in the API root. (claude_code_api/models/openai.py, claude_code_api/main.py) [1] [2]

Model Configuration Updates

  • Updated the default Claude model to claude-sonnet-4-6 and added/updated aliases for new model versions, including claude-opus-4-7 and claude-sonnet-4-6. (claude_code_api/config/models.json) [1] [2]

Logging and Access Log Improvements

  • Introduced an access_log setting to enable detailed HTTP request logging, including middleware for structured access logs and configuration to ensure logs are emitted at the correct level. (claude_code_api/core/config.py, claude_code_api/core/logging_config.py, claude_code_api/main.py) [1] [2] [3] [4] [5] [6] [7] [8]

Docker Build Optimization

  • Overhauled the Dockerfile for better caching, non-root user setup, and more efficient dependency installation, including use of build cache mounts and staged file copying. (docker/Dockerfile, .dockerignore) [1] [2] [3]

Testing Enhancements

  • Added and updated unit tests for the new access log functionality and for ensuring subprocesses redirect stdin to /dev/null as intended. (tests/test_logging_config.py, tests/test_claude_manager_unit.py) [1] [2] [3] [4] [5] [6]

Summary by Sourcery

Add an OpenAI-compatible /v1/responses endpoint, improve logging configurability including HTTP access logs, update Claude model defaults/aliases, harden subprocess handling, and optimize the Docker image build for performance and best practices.

New Features:

  • Introduce a minimal OpenAI Responses API implementation built on top of the existing chat completion pipeline, with support for non-streaming and SSE streaming responses.
  • Expose an HTTP access logging middleware controlled by configuration to record structured request metadata.

Bug Fixes:

  • Ensure Claude subprocesses run with stdin redirected to /dev/null to avoid unintended stdin usage.

Enhancements:

  • Expand OpenAPI models to cover Responses API request/response schemas for better schema-driven integration.
  • Map new Claude model IDs and aliases so generic names like opus and sonnet resolve to the latest supported versions.
  • Allow access-log events to bypass minimal log filtering and keep INFO-level logging enabled when access logging is turned on.

Build:

  • Restructure the Dockerfile to use a non-root user, pip cache mounts, and staged dependency installation for faster, more reproducible builds.
  • Add a .dockerignore file to reduce Docker build context size and improve caching.

Tests:

  • Add comprehensive tests for Responses API behavior, including streaming SSE events and OpenAPI schema wiring.
  • Extend logging tests to cover access-log filtering behavior and logging level interactions.
  • Add a unit test to verify the Claude process subprocess is created with stdin redirected to /dev/null.
  • Update model tests to assert availability and aliasing of the latest Claude opus and sonnet models.

Adds a minimal non-streaming /v1/responses endpoint that translates supported Responses API input shapes into the existing chat completions request path. The response is adapted back into the core Responses fields expected by OpenAI-compatible clients, including output message content, output_text, timestamps, status, and usage where available.
Introduces request logging through the application middleware using the existing access_log setting as the source of truth. This keeps HTTP method, path, status, duration, and client details available when access logging is enabled without forcing noisy request logs for installations that have disabled them.
Reorders the Docker build so dependency installation and slow setup steps can be cached independently from application source changes. This reduces rebuild time during local iteration while keeping the final runtime image behavior and startup command unchanged.
Closes the Claude CLI stdin stream explicitly for requests that do not provide stdin data, preventing the CLI from waiting before continuing. This removes the recurring warning and avoids adding avoidable latency to simple non-interactive API calls.
Updates the default model catalog with current Claude Opus and Sonnet aliases while preserving existing canonical model entries. The aliases now resolve to the latest documented model IDs, and the model API tests cover the new defaults and compatibility mappings.
@sourcery-ai
Copy link
Copy Markdown

sourcery-ai Bot commented May 10, 2026

Reviewer's Guide

Implements an OpenAI-compatible /v1/responses endpoint by adapting Requests-style inputs to existing chat completions, adds access-log-aware logging configuration and middleware, updates Claude model defaults/aliases, hardens the Docker build for caching and non-root execution, and extends tests for the new API, logging behavior, and subprocess stdin handling.

Sequence diagram for OpenAI Responses API request handling

sequenceDiagram
    actor Client
    participant FastAPIApp
    participant ChatRouter
    participant create_response
    participant ChatCompletion as create_chat_completion
    participant ClaudeBackend

    Client->>FastAPIApp: POST /v1/responses
    FastAPIApp->>ChatRouter: Route request
    ChatRouter->>create_response: create_response(ResponsesCreateRequest)
    create_response->>create_response: _responses_request_to_chat_request()
    create_response->>ChatCompletion: create_chat_completion(ChatCompletionRequest)
    ChatCompletion->>ClaudeBackend: Call Claude model
    ClaudeBackend-->>ChatCompletion: ChatCompletionResponse or StreamingResponse

    alt stream == false
        ChatCompletion-->>create_response: ChatCompletionResponse (dict or model)
        create_response->>create_response: _chat_response_to_responses_response()
        create_response-->>FastAPIApp: ResponsesResponse (JSON)
        FastAPIApp-->>Client: 200 OK JSON
    else stream == true
        ChatCompletion-->>create_response: StreamingResponse (SSE chat stream)
        create_response->>create_response: _create_responses_sse_from_chat_stream()
        create_response-->>FastAPIApp: StreamingResponse (Responses SSE)
        FastAPIApp-->>Client: 200 OK text/event-stream
    end
Loading

Sequence diagram for HTTP access logging middleware and configuration

sequenceDiagram
    actor Client
    participant Uvicorn
    participant FastAPIApp
    participant AuthMW as auth_middleware
    participant ReqLogMW as request_logging_middleware
    participant Endpoint
    participant LoggingConfig as configure_logging
    participant StructlogLogger as logger

    Note over LoggingConfig,StructlogLogger: Startup
    LoggingConfig->>StructlogLogger: configure_logging(settings)
    StructlogLogger-->>LoggingConfig: Processors respect access_log flag

    Note over Client,Endpoint: Per HTTP request
    Client->>Uvicorn: HTTP request
    Uvicorn->>FastAPIApp: ASGI call
    FastAPIApp->>AuthMW: auth_middleware
    AuthMW-->>FastAPIApp: Next handler
    FastAPIApp->>ReqLogMW: request_logging_middleware

    alt settings.access_log is False
        ReqLogMW->>Endpoint: call_next(request)
        Endpoint-->>ReqLogMW: Response
        ReqLogMW-->>FastAPIApp: Response (no access log)
    else settings.access_log is True
        ReqLogMW->>ReqLogMW: Measure duration
        ReqLogMW->>Endpoint: call_next(request)
        Endpoint-->>ReqLogMW: Response
        ReqLogMW->>StructlogLogger: logger.info("HTTP request", access_log=True, ...)
        ReqLogMW-->>FastAPIApp: Response
    end

    FastAPIApp-->>Uvicorn: Response
    Uvicorn-->>Client: HTTP response
Loading

Class diagram for new OpenAI Responses API models

classDiagram
    class ResponsesCreateRequest {
        +str model
        +str|List~Any~ input
        +float temperature
        +int max_output_tokens
        +bool stream
        +str instructions
        +str project_id
        +str session_id
    }

    class ResponsesOutputText {
        +str type = "output_text"
        +str text
        +List~Any~ annotations
    }

    class ResponsesOutputMessage {
        +str id
        +str type = "message"
        +str status = "completed"
        +str role = "assistant"
        +List~ResponsesOutputText~ content
    }

    class ResponsesUsage {
        +int input_tokens
        +int output_tokens
        +int total_tokens
    }

    class ResponsesResponse {
        +str id
        +str object = "response"
        +int created_at
        +str status = "completed"
        +int completed_at
        +Dict~str,Any~ error
        +Dict~str,Any~ incomplete_details
        +str instructions
        +int max_output_tokens
        +str model
        +List~ResponsesOutputMessage~ output
        +str output_text
        +ResponsesUsage usage
    }

    class ChatCompletionRequest {
        +str model
        +List~Dict~ messages
        +float temperature
        +int max_tokens
        +bool stream
        +str project_id
        +str session_id
        +str system_prompt
    }

    ResponsesResponse *-- ResponsesUsage : usage
    ResponsesResponse *-- ResponsesOutputMessage : output
    ResponsesOutputMessage *-- ResponsesOutputText : content

    ResponsesCreateRequest --> ChatCompletionRequest : converted_to
    ChatCompletionRequest --> ResponsesResponse : adapted_from_chat
Loading

File-Level Changes

Change Details Files
Add minimal OpenAI Responses API support on top of existing chat completions, including streaming over SSE.
  • Introduce ResponsesCreateRequest/ResponsesResponse models and related types to represent the minimal Responses API schema.
  • Map Responses-style input (string or message objects) into internal ChatCompletionRequest messages, with validation and coercion helpers for roles and content.
  • Wrap non-streaming chat completion responses into ResponsesResponse objects, including usage mapping and ID/timestamp generation.
  • Transform streaming chat completion SSE into a sequence of Responses events, handling creation, deltas, completion, and error cases, and expose this via a new /v1/responses route and OpenAPI schema.
claude_code_api/api/chat.py
claude_code_api/models/openai.py
claude_code_api/main.py
tests/test_responses_api.py
tests/test_models_unit.py
Enhance logging configuration to support optional structured HTTP access logging while preserving minimal logging in non-debug mode.
  • Extend settings with an access_log flag and plumb it into uvicorn.run and logging configuration.
  • Update logging_config to keep INFO level enabled when access_log is on, adjust uvicorn.access logger level accordingly, and allow access_log-tagged events through the minimal event filter.
  • Add an HTTP middleware that conditionally logs per-request metadata (method, path, status, latency, client) as structured access_log events.
  • Add tests to verify access-log filtering behavior, root/uvicorn logger levels, and middleware logging semantics.
claude_code_api/core/config.py
claude_code_api/core/logging_config.py
claude_code_api/main.py
tests/test_logging_config.py
tests/test_request_logging.py
Optimize Docker image build for caching, non-root execution, and dependency installation best practices.
  • Switch to Dockerfile v1.7 syntax and introduce APP_UID/APP_GID args plus VIRTUAL_ENV/PATH env configuration.
  • Refine apt-get installation with cache mounts and --no-install-recommends, and create a dedicated claudeuser user/group and application directories with proper ownership.
  • Pre-create the virtualenv, cache pip downloads, and install runtime dependencies from pyproject.toml before copying the full source to maximize layer re-use.
  • Copy project metadata and then the application package separately, finishing with a pip install --no-deps --no-build-isolation . under the non-root user, and add a .dockerignore for cleaner build context.
docker/Dockerfile
.dockerignore
Update Claude model configuration defaults and aliases to reflect latest Opus and Sonnet versions, and extend tests accordingly.
  • Change the default Claude model to claude-sonnet-4-6 and add entries for claude-opus-4-7 and claude-sonnet-4-6 in the models configuration.
  • Adjust alias resolution so generic names like opus, sonnet, and claude-sonnet-latest resolve to the new canonical IDs.
  • Extend unit tests to assert availability of the latest models and correct behavior of alias resolution, including fallback behavior for older aliases.
claude_code_api/config/models.json
tests/test_models_unit.py
Align subprocess handling and tests with desired stdin behavior for Claude process management.
  • Change ClaudeProcess startup to redirect stdin to asyncio.subprocess.DEVNULL instead of PIPE to avoid hanging on input.
  • Add a unit test that monkeypatches asyncio.create_subprocess_exec and verifies stdin is set to DEVNULL, while still exercising startup and shutdown paths.
claude_code_api/core/claude_manager.py
tests/test_claude_manager_unit.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@qodo-free-for-open-source-projects
Copy link
Copy Markdown

Review Summary by Qodo

Add Responses API support, access logging, and Docker build optimizations

✨ Enhancement 🐞 Bug fix

Grey Divider

Walkthroughs

Description
• Add OpenAI Responses API endpoint with streaming and non-streaming support
  - Converts Responses API input shapes to chat completions format
  - Transforms chat responses back to Responses API format with proper event streaming
  - Supports text blocks, message arrays, and developer role mapping
• Implement configurable HTTP access logging middleware
  - New access_log setting enables detailed request/response logging
  - Logs HTTP method, path, status, duration, and client details
  - Automatically maintains INFO log level when access logging enabled
• Improve Docker build cache efficiency and best practices
  - Reorder build stages to cache dependencies separately from source
  - Add .dockerignore file to exclude unnecessary files
  - Use build cache mounts for pip and apt package managers
  - Implement non-root user with explicit UID/GID configuration
• Update Claude model defaults and add new model aliases
  - Set default model to claude-sonnet-4-6
  - Add claude-opus-4-7 and claude-sonnet-4-6 model entries
  - Update aliases to resolve to latest documented model IDs
• Fix Claude CLI stdin handling to prevent wait warnings
  - Change stdin from PIPE to DEVNULL for non-interactive requests
  - Eliminates recurring CLI warnings and reduces latency
Diagram
flowchart LR
  A["Responses API Request"] -->|"Convert input to chat format"| B["Chat Completions"]
  B -->|"Stream or non-stream"| C["Chat Response"]
  C -->|"Transform to Responses format"| D["Responses API Response"]
  E["HTTP Request"] -->|"Log with middleware"| F["Access Log"]
  G["Docker Build"] -->|"Cache dependencies"| H["Faster Rebuilds"]
  I["Model Config"] -->|"Update defaults"| J["Latest Claude Models"]
Loading

Grey Divider

File Changes

1. claude_code_api/api/chat.py ✨ Enhancement +547/-1

Add Responses API endpoint with full streaming support

claude_code_api/api/chat.py


2. claude_code_api/models/openai.py ✨ Enhancement +87/-0

Add Responses API request and response models

claude_code_api/models/openai.py


3. claude_code_api/main.py ✨ Enhancement +25/-0

Add HTTP request logging middleware and Responses endpoint

claude_code_api/main.py


View more (11)
4. claude_code_api/core/config.py ⚙️ Configuration changes +1/-0

Add access_log configuration setting

claude_code_api/core/config.py


5. claude_code_api/core/logging_config.py ✨ Enhancement +22/-5

Support access log filtering and log level management

claude_code_api/core/logging_config.py


6. claude_code_api/core/claude_manager.py 🐞 Bug fix +1/-1

Change stdin to DEVNULL to prevent CLI warnings

claude_code_api/core/claude_manager.py


7. claude_code_api/config/models.json ⚙️ Configuration changes +34/-5

Update default model and add new model aliases

claude_code_api/config/models.json


8. docker/Dockerfile ✨ Enhancement +40/-13

Optimize build cache and improve best practices

docker/Dockerfile


9. .dockerignore ⚙️ Configuration changes +34/-0

Add Docker ignore file for build efficiency

.dockerignore


10. tests/test_responses_api.py 🧪 Tests +155/-0

Add comprehensive Responses API endpoint tests

tests/test_responses_api.py


11. tests/test_request_logging.py 🧪 Tests +61/-0

Add HTTP request logging middleware tests

tests/test_request_logging.py


12. tests/test_claude_manager_unit.py 🧪 Tests +38/-0

Add test for stdin DEVNULL redirection

tests/test_claude_manager_unit.py


13. tests/test_logging_config.py 🧪 Tests +40/-0

Add tests for access log filtering and log level

tests/test_logging_config.py


14. tests/test_models_unit.py 🧪 Tests +14/-3

Update model tests for new defaults and aliases

tests/test_models_unit.py


Grey Divider

Qodo Logo

@qodo-free-for-open-source-projects
Copy link
Copy Markdown

qodo-free-for-open-source-projects Bot commented May 10, 2026

Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (0)

Grey Divider


Action required

1. Docker cache UID mismatch 🐞 Bug ☼ Reliability
Description
docker/Dockerfile defines APP_UID/APP_GID but the pip cache mounts are hard-coded to uid/gid 1001,
so overriding APP_UID/APP_GID can cause permission errors writing to /home/claudeuser/.cache/pip and
fail the Docker build.
Code

docker/Dockerfile[R43-44]

+RUN --mount=type=cache,id=claude-api-pip-cache,target=/home/claudeuser/.cache/pip,uid=1001,gid=1001,mode=0775 \
+    pip install --upgrade pip setuptools wheel
Evidence
The image user is created from APP_UID/APP_GID, but the BuildKit cache mount ownership stays fixed
at 1001:1001 across all pip install layers; when APP_UID/APP_GID differ, the non-root user may not
be able to write to the cache directory.

docker/Dockerfile[5-30]
docker/Dockerfile[43-50]
docker/Dockerfile[125-129]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`docker/Dockerfile` introduces `APP_UID`/`APP_GID`, but the BuildKit cache mounts for pip still use `uid=1001,gid=1001`. If a builder overrides these args, the container user and the cache directory ownership can diverge and pip will fail with permission errors.

### Issue Context
This occurs in all `--mount=type=cache,...target=/home/claudeuser/.cache/pip,...` layers.

### Fix Focus Areas
- docker/Dockerfile[5-7]
- docker/Dockerfile[43-44]
- docker/Dockerfile[48-50]
- docker/Dockerfile[127-128]

### Suggested change
Update every cache mount to use the build args:
- `uid=${APP_UID},gid=${APP_GID}` (or drop uid/gid entirely if you want BuildKit defaults), so the mount ownership matches the created user when args are overridden.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. Access log forces global INFO 🐞 Bug ◔ Observability
Description
configure_logging() overrides the configured log_level by forcing the root logger down to INFO when
access_log is enabled, which can unintentionally enable unrelated stdlib INFO logs and, combined
with uvicorn’s access_log=true, can duplicate request access logs.
Code

claude_code_api/core/logging_config.py[R134-137]

+    access_log_enabled = bool(getattr(settings, "access_log", False))
+
+    if access_log_enabled and log_level > logging.INFO:
+        log_level = logging.INFO
Evidence
When access_log is enabled, logging_config sets the root logger level (and all handler levels) to
INFO even if the user requested ERROR, and main.py also enables uvicorn’s built-in access logs while
the app emits its own structured access log event per request.

claude_code_api/core/logging_config.py[123-170]
claude_code_api/main.py[125-145]
claude_code_api/main.py[243-253]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
Enabling `settings.access_log` currently forces the **global** root logger level/handlers to `INFO`, overriding the configured `log_level` and potentially enabling unrelated third-party INFO logs. In addition, `uvicorn.run(..., access_log=settings.access_log)` can produce **duplicate** per-request logs because the app also logs requests via `request_logging_middleware`.

### Issue Context
- `configure_logging()` sets `log_level = INFO` when `access_log` is enabled.
- `main.py` also enables uvicorn access logs while emitting custom access logs.

### Fix Focus Areas
- claude_code_api/core/logging_config.py[123-175]
- claude_code_api/main.py[125-145]
- claude_code_api/main.py[246-253]

### Suggested fixes (choose one approach)
1) **Prefer structured middleware access logs only**
  - Stop enabling uvicorn’s access logs (`uvicorn.run(..., access_log=False)`), keep your middleware log.
  - Remove the global `log_level = INFO` override; keep the user’s configured root level.
  - If you still need access logs when `log_level` is `ERROR`, emit access logs through a **dedicated logger/handler** configured at INFO (so you don’t need to lower the root/handlers).

2) **If you must keep root at INFO** (less ideal)
  - Add filtering so only access logs (and WARNING+) are emitted from stdlib loggers, and avoid double-logging by disabling uvicorn access logs.

Goal: turning on `access_log` should not globally change application log verbosity and should not double-log each request.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

3. Access logs missing on errors 🐞 Bug ◔ Observability
Description
request_logging_middleware() logs only after call_next() returns; if call_next raises, no access log
entry is emitted for that failing request.
Code

claude_code_api/main.py[R125-144]

+@app.middleware("http")
+async def request_logging_middleware(request, call_next):
+    if not settings.access_log:
+        return await call_next(request)
+
+    start = time.perf_counter()
+    response = await call_next(request)
+    duration_ms = round((time.perf_counter() - start) * 1000, 2)
+
+    logger.info(
+        "HTTP request",
+        access_log=True,
+        method=request.method,
+        path=request.url.path,
+        status_code=response.status_code,
+        duration_ms=duration_ms,
+        client_host=request.client.host if request.client else None,
+    )
+
+    return response
Evidence
The middleware has no try/finally around call_next(), so any exception path skips the
logger.info(...) call and the request is absent from access logs.

claude_code_api/main.py[125-145]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`request_logging_middleware` only logs after `response = await call_next(request)` succeeds. If request handling raises, the function exits without emitting an access log.

### Issue Context
This creates observability gaps specifically for failing requests (often the most important ones to track).

### Fix Focus Areas
- claude_code_api/main.py[125-145]

### Suggested change
Wrap `call_next` so logging always occurs:
- Start timer.
- `status_code = 500`
- `try: response = await call_next(request); status_code = response.status_code; return response`
- `except Exception: raise`
- `finally: logger.info("HTTP request", ..., status_code=status_code, duration_ms=..., ...)`

This ensures a log line is emitted even when the request fails.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

Copy link
Copy Markdown

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • The Responses API conversion and streaming helpers added to api/chat.py are quite substantial; consider moving them into a dedicated module (e.g. api/responses.py or a helper module) to keep chat.py focused and easier to navigate.
  • In the updated Dockerfile you removed the rm -rf /var/lib/apt/lists/* cleanup step after apt-get install; reintroducing this (or an equivalent cleanup) will help keep the final image size smaller.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The Responses API conversion and streaming helpers added to `api/chat.py` are quite substantial; consider moving them into a dedicated module (e.g. `api/responses.py` or a helper module) to keep `chat.py` focused and easier to navigate.
- In the updated Dockerfile you removed the `rm -rf /var/lib/apt/lists/*` cleanup step after `apt-get install`; reintroducing this (or an equivalent cleanup) will help keep the final image size smaller.

## Individual Comments

### Comment 1
<location path="claude_code_api/api/chat.py" line_range="489-450" />
<code_context>
+            except json.JSONDecodeError:
+                continue
+
+            if "error" in chunk:
+                yield _responses_stream_event(
+                    "response.failed", {"response": {"id": response_id, **chunk}}
+                )
</code_context>
<issue_to_address>
**issue (bug_risk):** Error chunks in streaming path overwrite the generated response_id, which makes the emitted response ID inconsistent with earlier events.

In the error branch, the payload is built as `{"response": {"id": response_id, **chunk}}`. If `chunk` already has an `id`, it will override `response_id`, so `response.created` and `response.failed` can emit different IDs. Please either construct the response object explicitly (copy only needed fields from `chunk`) or ensure `response_id` always wins (e.g. `{**chunk, "id": response_id}`) to keep IDs consistent across events.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

output_parts: List[str] = []
content_started = False

yield _responses_stream_event(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Error chunks in streaming path overwrite the generated response_id, which makes the emitted response ID inconsistent with earlier events.

In the error branch, the payload is built as {"response": {"id": response_id, **chunk}}. If chunk already has an id, it will override response_id, so response.created and response.failed can emit different IDs. Please either construct the response object explicitly (copy only needed fields from chunk) or ensure response_id always wins (e.g. {**chunk, "id": response_id}) to keep IDs consistent across events.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements a new OpenAI-compatible 'Responses' API endpoint with support for SSE streaming, updates model configurations to include Claude Sonnet 4.6 and Opus 4.7, and adds request access logging middleware. Additionally, the Dockerfile was refactored to utilize buildkit cache mounts for faster builds. Review feedback points out that changing the subprocess stdin to DEVNULL breaks interactive features and identifies a potential character encoding issue in the streaming logic that requires an incremental decoder.

stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
stdin=asyncio.subprocess.PIPE,
stdin=asyncio.subprocess.DEVNULL,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Changing stdin to asyncio.subprocess.DEVNULL will break the send_input method (line 252) and the continue_conversation functionality (line 559). When stdin is set to DEVNULL, self.process.stdin becomes None, causing send_input to skip writing any data. If the intention is to support interactive sessions or continuing conversations within the same process, asyncio.subprocess.PIPE must be used. If interaction is truly not intended, the related dead code should be removed to avoid confusion.

Suggested change
stdin=asyncio.subprocess.DEVNULL,
stdin=asyncio.subprocess.PIPE,

Comment on lines +368 to +374
async def _iter_sse_events(body_iterator: Any) -> AsyncGenerator[str, None]:
buffer = ""
async for chunk in body_iterator:
if isinstance(chunk, bytes):
buffer += chunk.decode("utf-8")
else:
buffer += str(chunk)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Decoding bytes directly from a stream chunk can lead to a UnicodeDecodeError if a multi-byte UTF-8 character (such as an emoji) is split across chunks. It is safer to use codecs.IncrementalDecoder to handle partial characters correctly.

Suggested change
async def _iter_sse_events(body_iterator: Any) -> AsyncGenerator[str, None]:
buffer = ""
async for chunk in body_iterator:
if isinstance(chunk, bytes):
buffer += chunk.decode("utf-8")
else:
buffer += str(chunk)
async def _iter_sse_events(body_iterator: Any) -> AsyncGenerator[str, None]:
import codecs
decoder = codecs.getincrementaldecoder("utf-8")()
buffer = ""
async for chunk in body_iterator:
if isinstance(chunk, bytes):
buffer += decoder.decode(chunk, final=False)
else:
buffer += str(chunk)

Comment thread docker/Dockerfile
Comment on lines +43 to +44
RUN --mount=type=cache,id=claude-api-pip-cache,target=/home/claudeuser/.cache/pip,uid=1001,gid=1001,mode=0775 \
pip install --upgrade pip setuptools wheel
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

1. Docker cache uid mismatch 🐞 Bug ☼ Reliability

docker/Dockerfile defines APP_UID/APP_GID but the pip cache mounts are hard-coded to uid/gid 1001,
so overriding APP_UID/APP_GID can cause permission errors writing to /home/claudeuser/.cache/pip and
fail the Docker build.
Agent Prompt
### Issue description
`docker/Dockerfile` introduces `APP_UID`/`APP_GID`, but the BuildKit cache mounts for pip still use `uid=1001,gid=1001`. If a builder overrides these args, the container user and the cache directory ownership can diverge and pip will fail with permission errors.

### Issue Context
This occurs in all `--mount=type=cache,...target=/home/claudeuser/.cache/pip,...` layers.

### Fix Focus Areas
- docker/Dockerfile[5-7]
- docker/Dockerfile[43-44]
- docker/Dockerfile[48-50]
- docker/Dockerfile[127-128]

### Suggested change
Update every cache mount to use the build args:
- `uid=${APP_UID},gid=${APP_GID}` (or drop uid/gid entirely if you want BuildKit defaults), so the mount ownership matches the created user when args are overridden.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Comment on lines +134 to +137
access_log_enabled = bool(getattr(settings, "access_log", False))

if access_log_enabled and log_level > logging.INFO:
log_level = logging.INFO
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

2. Access log forces global info 🐞 Bug ◔ Observability

configure_logging() overrides the configured log_level by forcing the root logger down to INFO when
access_log is enabled, which can unintentionally enable unrelated stdlib INFO logs and, combined
with uvicorn’s access_log=true, can duplicate request access logs.
Agent Prompt
### Issue description
Enabling `settings.access_log` currently forces the **global** root logger level/handlers to `INFO`, overriding the configured `log_level` and potentially enabling unrelated third-party INFO logs. In addition, `uvicorn.run(..., access_log=settings.access_log)` can produce **duplicate** per-request logs because the app also logs requests via `request_logging_middleware`.

### Issue Context
- `configure_logging()` sets `log_level = INFO` when `access_log` is enabled.
- `main.py` also enables uvicorn access logs while emitting custom access logs.

### Fix Focus Areas
- claude_code_api/core/logging_config.py[123-175]
- claude_code_api/main.py[125-145]
- claude_code_api/main.py[246-253]

### Suggested fixes (choose one approach)
1) **Prefer structured middleware access logs only**
   - Stop enabling uvicorn’s access logs (`uvicorn.run(..., access_log=False)`), keep your middleware log.
   - Remove the global `log_level = INFO` override; keep the user’s configured root level.
   - If you still need access logs when `log_level` is `ERROR`, emit access logs through a **dedicated logger/handler** configured at INFO (so you don’t need to lower the root/handlers).

2) **If you must keep root at INFO** (less ideal)
   - Add filtering so only access logs (and WARNING+) are emitted from stdlib loggers, and avoid double-logging by disabling uvicorn access logs.

Goal: turning on `access_log` should not globally change application log verbosity and should not double-log each request.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant