fix(proxy): pre-validate strict function tool schemas#658
Conversation
Strict-mode JSON schema violations on a function tool (e.g. `strict:true` with no `additionalProperties:false`) reached the upstream Codex backend, which closed the WebSocket with `close_code=1000` and surfaced as a generic `502 upstream_rejected_input`. Real OpenAI returns deterministic `400 invalid_function_parameters` for the same payload — and a 5xx on a permanently-broken request triggers retry/failover loops in clients. Two parts: 1. Add `validate_strict_function_tool_schema` + `enforce_strict_function_tools_format` mirroring the existing `text.format.json_schema` strict pre-validator. Wired into both `/v1/responses` and `/v1/chat/completions` handlers plus `normalize_responses_request_payload` for the WS path. Param shape matches the source endpoint: native responses callers see `tools[i].parameters`; chat-completions callers see `tools[i].function.parameters`. 2. Stop dropping the function-tool `strict` field in `_normalize_chat_tools`. Previously chat-completions silently masked strict tool schema violations (200 + tool_calls), diverging from both real OpenAI and `/v1/responses`. With the flag preserved, the chat path enters the same strict pre-validation pipeline. Side note: `scripts/verify_v1_responses_openai_sdk.py::case_tool_call_streaming` was sending a strict tool schema without `additionalProperties:false`, matching the new 400. Fix the script to use a spec-compliant strict schema — verify suite is now deterministic 5/5. Tests: 16 new unit cases in `test_strict_schema_validation.py` (validator + enforce + chat coercion preservation) and 2 new integration cases covering both endpoints' 400 envelope shape. Validation: - openspec validate validate-strict-function-tool-schema → "is valid" - targeted sweep: 241 passed - SDK e2e: 20 passed - uvx ruff check / ruff format --check / uv run ty check: all green - live four-cell matrix on `codex-lb-dev-server` with a real account: 8/8 PASS (A: 400 + correct code/param both endpoints; B/C/D: 200) - verify_v1_responses_openai_sdk.py vs the dev container: 5/5 PASS (was 4/5 due to the script's own spec-violating payload)
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f7ebcfb198
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| """ | ||
| if not request.tools: | ||
| return | ||
| for index, tool in enumerate(request.tools): |
There was a problem hiding this comment.
Use original tool index in chat strict-schema errors
enforce_strict_function_tools_format builds param from the index in request.tools, but for chat-completions that list has already been normalized by _normalize_chat_tools, which can drop entries (for example function tools with missing/empty names). In that case a strict-schema violation is reported against tools[i].function.parameters with a shifted i, so clients receive a deterministic 400 that points to the wrong input element and cannot reliably map the error back to their original payload.
Useful? React with 👍 / 👎.
Codex review on PR #658 flagged that `enforce_strict_function_tools_format` on the chat-completions path walked `responses_payload.tools` — which is the output of `_normalize_chat_tools`, a normalizer that silently drops invalid entries (non-dict tools, function tools with missing/empty `name`). When the normalizer dropped any earlier entry, the rejected `param` ("tools[i].function.parameters") referenced an index that no longer matched the client's inbound payload. Refactor: - Switch `enforce_strict_function_tools_format` to take a raw `tools: list[JsonValue]` plus a `nested: bool` flag. `nested=False` reads parameters/strict from the flat /v1/responses shape; `nested=True` reads them from the chat-completions `tool["function"][...]` shape. - Chat handler now validates the *raw* `payload.tools` list before `to_responses_request()` runs, so the reported index always lines up with what the client sent. Compliant payloads continue through the existing coercion path unchanged. - /v1/responses handler and `normalize_responses_request_payload` are updated to pass `responses.tools` directly (flat shape, default template). Regression coverage: - `tests/unit/test_strict_schema_validation.py`: - `test_chat_strict_violation_param_uses_original_index_when_normalizer_drops` - `test_chat_strict_enforcement_runs_against_raw_payload_in_handler_path` - existing chat coercion tests adjusted to call the new signature. - `tests/integration/test_openai_compat_features.py`: - `test_v1_chat_completions_strict_violation_param_uses_original_index` posts [dropped, violating] and asserts `param == "tools[1].function.parameters"`. Validation: - ruff check, ruff format --check, ty check: clean - pytest tests/unit tests/integration: 2180 passed - live docker /v1/chat/completions + /v1/responses 9-cell matrix (4 original + 1 shifted-index regression + 4 responses): 9/9 PASS - verify_v1_responses_openai_sdk.py: 5/5 PASS Refs: PR #658 codex review feedback.
|
Addressed codex P2 feedback in 48d7889. Root cause — Fix — refactored the helper to take a raw Regression coverage
Validation
|
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 48d7889758
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if tool.get("type") != "function": | ||
| continue |
There was a problem hiding this comment.
Treat missing tool type as function during strict validation
The chat pre-validator now skips any tool whose type is not exactly "function", but _normalize_chat_tools still coerces nested function tools to "function" when type is omitted ("type": tool_type or "function" in app/core/openai/chat_requests.py). With the new flow (validate raw chat tools before normalization), a payload like { "function": {..., "strict": true, "parameters": <invalid schema>} } bypasses pre-validation, gets normalized into a strict function tool, and can still fail upstream as a 5xx instead of the intended local 400 invalid_function_parameters.
Useful? React with 👍 / 👎.
Second codex review pass on PR #658 (commit 48d7889) flagged that the chat-side pre-validator's `tool.get("type") != "function"` guard is out of sync with `_normalize_chat_tools`. The normalizer coerces any tool whose `function` value is a dict into a function tool — including ones with no top-level `type` (`"type": tool_type or "function"` in chat_requests.py:198). That meant a payload like {"function": {"strict": true, "parameters": <invalid>}} was skipped by pre-validation, normalized into a strict function tool, sent upstream, and rejected with a 5xx instead of the intended local 400 `invalid_function_parameters`. Fix: in the `nested=True` branch of `enforce_strict_function_tools_format`, anchor detection on the presence of a `tool["function"]` dict — exactly the rule `_normalize_chat_tools` uses. `type" check stays in the flat (`nested=False`) branch for the /v1/responses path, where the request model already requires an explicit type. Regression coverage: - Unit (tests/unit/test_strict_schema_validation.py): - test_chat_strict_violation_when_type_omitted_but_function_dict_present - test_chat_strict_skips_tool_without_function_wrapper_even_with_type - test_responses_path_still_requires_type_function - Integration (tests/integration/test_openai_compat_features.py): - test_v1_chat_completions_strict_violation_when_type_omitted_in_chat_tool Validation: - ruff / format / ty: clean - pytest tests/unit tests/integration: 2184 passed - Live docker matrix on codex-lb-dev-server with real account: 12/12 PASS — A-H chat (incl. F=type-omitted strict violation -> 400, G=type-omitted compliant -> 200, H=type-omitted strict=false -> 200) + A-D responses - verify_v1_responses_openai_sdk.py: 5/5 PASS Refs: PR #658 codex review pass 2.
|
Addressed second codex review pass (P2 on Root cause — chat pre-validator's Fix — in the Regression coverage (3 unit + 1 integration):
Validation
@codex review |
|
Codex Review: Didn't find any major issues. 🚀 ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Release 1.18.0 — v1.17.0 → main, 27 PR. ## Pre-merge verification (release-gate e2e) - ✅ CI 18/18 (mergeStateStatus CLEAN) - ✅ baseline release-gate-smoke: 10/10 (`v1.models`, chat non-stream/stream/tool_call, responses non-stream/stream, unknown-msg-key drop, transient codes, dashboard 72-byte, alembic index) - ✅ per-PR e2e extra: 14/14 (#658 strict tool × chat valid/invalid/type-omitted + responses invalid, #650-652 trim replay markers, #516 prev-resp miss mask, #558 prolite plan, #600 token_expired, #635 device poll, #647 frontend 72B, #594 account selection, #640 alembic head, #653/#654 conv archive, #655/#656 weekly pace) - ✅ pytest 562 passed / 3 PG-only skip (unit + integration suite) - ✅ SDK e2e 20/20 (`tests/e2e/test_openai_sdk_compat.py` + `tests/e2e/test_v1_responses_openai_sdk.py`) - ✅ `scripts/verify_v1_responses_openai_sdk.py` 5/5 against live container with real account - ✅ ruff check + ruff format --check + uv run ty check — All clean - ✅ prod parity probe: #637 (unknown-key drop) + #658 (strict tool 400) both confirmed fixed vs prod 1.17.0 ## Behavior change to note in announcements - **#658**: `strict: true` function tool schemas missing `additionalProperties: false` (or other strict-mode requirements) now return a local **400 `invalid_function_parameters`** with `param=tools[N].function.parameters` (chat) / `tools[N].parameters` (responses). Pre-1.18.0 these were silently forwarded upstream; 1.18.0 enforces OpenAI's documented strict-mode contract locally before forwarding. OpenSpec changes shipped in this release (to be archived after tag publish): - deactivate-on-token-expired (#600) - drop-unknown-message-fields (#637) - mask-single-http-previous-response-miss (#516) - normalize-v1-responses-openai-sdk-stream (#639) - start-device-oauth-polling (#635) - support-prolite-plan-capacity (#558) - treat-upstream-overloaded-as-retryable (#601) - validate-password-length (#598) - validate-strict-function-tool-schema (#658)
Archive changes shipped in v1.18.0 and v1.18.1 per AGENTS.md "release 이후 archive" workflow. All 12 changes verified shipped in production at 10.0.0.113 (image 1.18.1, sha256:0431a474432d). 1.18.0 (9 changes): - deactivate-on-token-expired (#600) - drop-unknown-message-fields (#637) - mask-single-http-previous-response-miss (#516) - normalize-v1-responses-openai-sdk-stream (#639) - start-device-oauth-polling (#635) [delta header fixed: MODIFIED -> ADDED] - support-prolite-plan-capacity (#558) [delta header fixed: MODIFIED -> ADDED] - treat-upstream-overloaded-as-retryable (#601) - validate-password-length (#598) - validate-strict-function-tool-schema (#658) 1.18.1 (3 changes): - harden-long-codex-websocket-turns (#586) - suppress-duplicate-side-effect-tool-calls (#586) - hotfix-db-pool-pre-ping-and-firewall-cache-ttl (#679) Specs updated: 9 capabilities (admin-auth, api-firewall, chat-completions-compat, chat-completions-extensions, database-backends, frontend-architecture, responses-api-compat, usage-refresh-policy). openspec validate --specs: 19/19 passed. Co-authored-by: Hermes Agent <hermes-agent@nous.ai>
Summary
Strict-mode schema violations on a function tool (e.g.
tools[i].function.parameterswithstrict: truebut missingadditionalProperties: false) currently reach the upstream Codex backend, which closes the WebSocket withclose_code=1000. codex-lb surfaces this as a generic502 server_error / upstream_rejected_input. Real OpenAI returns a deterministic400 invalid_function_parametersfor the same payload (OpenAI Structured Outputs docs,github/github-mcp-server#376).This PR fixes two related bugs around strict function tools:
/v1/responsesreturns 5xx where real OpenAI returns 4xx. Well-behaved retry/failover loops classify 5xx as transient and retry into a flood for a permanently-broken request. The skill notes this exact pattern: "the 502-vs-400 quirk is particularly nasty because well-behaved retry loops classify 5xx as transient."enforce_strict_text_formatalready pre-validatestext.format.json_schema; function tools were the only un-validated strict surface left./v1/chat/completionssilently masks the same violation._normalize_chat_toolsbuilds the responses-side tool item by explicit field selection but never copies thestrictfield, so chat-completions clients with violating schemas appear to succeed (200 +tool_calls). This both (a) diverges from real OpenAI and/v1/responses, and (b) lets schema bugs hide indefinitely from the client.Root Cause
/v1/chat/completions_normalize_chat_toolsdropsstrict→ upstream gets schema without strict modetool_calls(strict silently lost)invalid_function_parameters/v1/responsesclose_code=1000)upstream_rejected_inputinvalid_function_parametersBoth paths go through
to_responses_request()→enforce_strict_*pre-validation. The fix is local and symmetric:validate_strict_function_tool_schema(mirrors the existingvalidate_strict_json_schema, distinct envelopecode=invalid_function_parameters).enforce_strict_function_tools_format(request, *, param_template=...)and call it in both endpoint handlers +normalize_responses_request_payload. The chat handler passes the chat-styleparam_template="tools[{index}].function.parameters"; everyone else uses the defaulttools[{index}].parameters.stricton chat function tools — preserve it so the responses-side enforcement applies via the shared coercion pipeline.Changes
app/core/openai/strict_schema.py— newvalidate_strict_function_tool_schema()re-using the shared_find_violationwalker.app/modules/proxy/request_policy.py— newenforce_strict_function_tools_format(); wired intonormalize_responses_request_payload().app/modules/proxy/api.py— callenforce_strict_function_tools_format()in bothv1_responsesandv1_chat_completionshandlers (the chat handler passes the chat-style param template).app/core/openai/chat_requests.py—_normalize_chat_tools()now preserves the function-levelstrictboolean.scripts/verify_v1_responses_openai_sdk.py—case_tool_call_streamingnow uses a spec-compliant strict schema (additionalProperties: false). Suite is now deterministic 5/5.Spec
openspec/changes/validate-strict-function-tool-schema/specs/responses-api-compat/spec.md—ADDED Requirement: Strict function tool parameter schemas are pre-validated. Five scenarios covering missingadditionalProperties,additionalProperties:true, missingrequired, compliant strict schema, andstrict:false/omitted skip.openspec/changes/validate-strict-function-tool-schema/specs/chat-completions-compat/spec.md—ADDED Requirement: _normalize_chat_tools preserves the function tool strict flag. Four scenarios covering compliant strict 200, violating strict 400, strict=false preservation, and built-in tool unaffected.Tests
tests/unit/test_strict_schema_validation.py:additionalProperties,additionalProperties=true, missingrequired, valid schema, nested violation, anonymous-name fallback.strict=falseskipped, omittedstrictskipped, compliant strict accepted, chat-style param template.tests/integration/test_openai_compat_features.py:test_v1_chat_completions_rejects_strict_function_tool_violation(assertstools[0].function.parameters).test_v1_responses_rejects_strict_function_tool_violation(assertstools[0].parameters).Validation
✅
openspec validate validate-strict-function-tool-schema→ "is valid"✅ Targeted sweep (per
codex-lb-developmentskill §5): 241 passed✅ SDK-level e2e (skill §5): 20 passed (
tests/e2e/test_openai_sdk_compat.py tests/e2e/test_v1_responses_openai_sdk.py)✅
uvx ruff check . && uvx ruff format --check . && uv run ty check: all green✅ Live four-cell matrix on
codex-lb-dev-serverwith a real account: 8/8 PASS/v1/chat/completionsinvalid_function_parameters,tools[0].function.parameters/v1/chat/completionsfalse/v1/chat/completions/v1/chat/completions/v1/responsesinvalid_function_parameters,tools[0].parameters/v1/responsesfalse/v1/responses/v1/responses✅
scripts/verify_v1_responses_openai_sdk.pyagainst the dev container: 5/5 PASS (was 4/5 pre-fix because the script itself sent a spec-violating payload).Notes
mask-single-http-previous-response-miss/treat-upstream-overloaded-as-retryablealready addressed for other code paths; function tools were just the last un-validated strict surface.tool_choice, tool-type registry, and built-in tools (web_search,image_generation, …) are unchanged. Onlyfunctiontools withstrict:trueand a non-Noneparametersare touched.param_templateparameter keeps the error envelope'sparamfield aligned with whatever shape the client originally sent, mirroring real OpenAI's behavior across the two endpoints.