Skip to content

fix(proxy): pre-validate strict function tool schemas#658

Merged
Soju06 merged 3 commits into
mainfrom
fix/validate-strict-function-tool-schema
May 17, 2026
Merged

fix(proxy): pre-validate strict function tool schemas#658
Soju06 merged 3 commits into
mainfrom
fix/validate-strict-function-tool-schema

Conversation

@Soju06
Copy link
Copy Markdown
Owner

@Soju06 Soju06 commented May 15, 2026

Summary

Strict-mode schema violations on a function tool (e.g. tools[i].function.parameters with strict: true but missing additionalProperties: false) currently reach the upstream Codex backend, which closes the WebSocket with close_code=1000. codex-lb surfaces this as a generic 502 server_error / upstream_rejected_input. Real OpenAI returns a deterministic 400 invalid_function_parameters for the same payload (OpenAI Structured Outputs docs, github/github-mcp-server#376).

This PR fixes two related bugs around strict function tools:

  1. /v1/responses returns 5xx where real OpenAI returns 4xx. Well-behaved retry/failover loops classify 5xx as transient and retry into a flood for a permanently-broken request. The skill notes this exact pattern: "the 502-vs-400 quirk is particularly nasty because well-behaved retry loops classify 5xx as transient." enforce_strict_text_format already pre-validates text.format.json_schema; function tools were the only un-validated strict surface left.

  2. /v1/chat/completions silently masks the same violation. _normalize_chat_tools builds the responses-side tool item by explicit field selection but never copies the strict field, so chat-completions clients with violating schemas appear to succeed (200 + tool_calls). This both (a) diverges from real OpenAI and /v1/responses, and (b) lets schema bugs hide indefinitely from the client.

Root Cause

Endpoint Same invalid strict tool payload codex-lb (pre-fix) Real OpenAI
/v1/chat/completions _normalize_chat_tools drops strict → upstream gets schema without strict mode 200 + tool_calls (strict silently lost) 400 invalid_function_parameters
/v1/responses strict pass-through → upstream rejects (close_code=1000) 502 upstream_rejected_input 400 invalid_function_parameters

Both paths go through to_responses_request()enforce_strict_* pre-validation. The fix is local and symmetric:

  • Add validate_strict_function_tool_schema (mirrors the existing validate_strict_json_schema, distinct envelope code=invalid_function_parameters).
  • Add enforce_strict_function_tools_format(request, *, param_template=...) and call it in both endpoint handlers + normalize_responses_request_payload. The chat handler passes the chat-style param_template="tools[{index}].function.parameters"; everyone else uses the default tools[{index}].parameters.
  • Stop dropping strict on chat function tools — preserve it so the responses-side enforcement applies via the shared coercion pipeline.

Changes

  • app/core/openai/strict_schema.py — new validate_strict_function_tool_schema() re-using the shared _find_violation walker.
  • app/modules/proxy/request_policy.py — new enforce_strict_function_tools_format(); wired into normalize_responses_request_payload().
  • app/modules/proxy/api.py — call enforce_strict_function_tools_format() in both v1_responses and v1_chat_completions handlers (the chat handler passes the chat-style param template).
  • app/core/openai/chat_requests.py_normalize_chat_tools() now preserves the function-level strict boolean.
  • scripts/verify_v1_responses_openai_sdk.pycase_tool_call_streaming now uses a spec-compliant strict schema (additionalProperties: false). Suite is now deterministic 5/5.

Spec

  • openspec/changes/validate-strict-function-tool-schema/specs/responses-api-compat/spec.mdADDED Requirement: Strict function tool parameter schemas are pre-validated. Five scenarios covering missing additionalProperties, additionalProperties:true, missing required, compliant strict schema, and strict:false/omitted skip.
  • openspec/changes/validate-strict-function-tool-schema/specs/chat-completions-compat/spec.mdADDED Requirement: _normalize_chat_tools preserves the function tool strict flag. Four scenarios covering compliant strict 200, violating strict 400, strict=false preservation, and built-in tool unaffected.

Tests

  • 16 new unit cases in tests/unit/test_strict_schema_validation.py:
    • Validator (6): missing additionalProperties, additionalProperties=true, missing required, valid schema, nested violation, anonymous-name fallback.
    • Enforce hook (6): rejection envelope (code/param/type/message), correct index reporting, strict=false skipped, omitted strict skipped, compliant strict accepted, chat-style param template.
    • Chat coercion (4): strict=true preserved, strict=true violating pre-validates, strict=false preserved, omitted strict has no key.
  • 2 new integration cases in tests/integration/test_openai_compat_features.py:
    • test_v1_chat_completions_rejects_strict_function_tool_violation (asserts tools[0].function.parameters).
    • test_v1_responses_rejects_strict_function_tool_violation (asserts tools[0].parameters).

Validation

  • openspec validate validate-strict-function-tool-schema → "is valid"

  • ✅ Targeted sweep (per codex-lb-development skill §5): 241 passed

    pytest tests/unit/test_strict_schema_validation.py tests/unit/test_chat_request_mapping.py \
      tests/unit/test_openai_requests.py tests/integration/test_proxy_chat_completions.py \
      tests/integration/test_proxy_responses.py tests/integration/test_openai_client_compat.py \
      tests/integration/test_openai_compat_features.py
    
  • ✅ SDK-level e2e (skill §5): 20 passed (tests/e2e/test_openai_sdk_compat.py tests/e2e/test_v1_responses_openai_sdk.py)

  • uvx ruff check . && uvx ruff format --check . && uv run ty check: all green

  • Live four-cell matrix on codex-lb-dev-server with a real account: 8/8 PASS

    Endpoint Cell strict additionalProperties Result
    /v1/chat/completions A true missing ✓ 400 invalid_function_parameters, tools[0].function.parameters
    /v1/chat/completions B true false ✓ 200
    /v1/chat/completions C false missing ✓ 200
    /v1/chat/completions D omitted missing ✓ 200
    /v1/responses A true missing ✓ 400 invalid_function_parameters, tools[0].parameters
    /v1/responses B true false ✓ 200
    /v1/responses C false missing ✓ 200
    /v1/responses D omitted missing ✓ 200
  • scripts/verify_v1_responses_openai_sdk.py against the dev container: 5/5 PASS (was 4/5 pre-fix because the script itself sent a spec-violating payload).

Notes

  • Not a regression — 1.17.0 and earlier had the same behavior. This is the inverse of the 502→400 fix surface that fix(proxy): make /v1/responses streams parseable by the OpenAI SDK #639 / mask-single-http-previous-response-miss / treat-upstream-overloaded-as-retryable already addressed for other code paths; function tools were just the last un-validated strict surface.
  • tool_choice, tool-type registry, and built-in tools (web_search, image_generation, …) are unchanged. Only function tools with strict:true and a non-None parameters are touched.
  • The param_template parameter keeps the error envelope's param field aligned with whatever shape the client originally sent, mirroring real OpenAI's behavior across the two endpoints.

Strict-mode JSON schema violations on a function tool (e.g. `strict:true`
with no `additionalProperties:false`) reached the upstream Codex backend,
which closed the WebSocket with `close_code=1000` and surfaced as a
generic `502 upstream_rejected_input`. Real OpenAI returns deterministic
`400 invalid_function_parameters` for the same payload — and a 5xx on a
permanently-broken request triggers retry/failover loops in clients.

Two parts:

1. Add `validate_strict_function_tool_schema` + `enforce_strict_function_tools_format`
   mirroring the existing `text.format.json_schema` strict pre-validator.
   Wired into both `/v1/responses` and `/v1/chat/completions` handlers
   plus `normalize_responses_request_payload` for the WS path. Param
   shape matches the source endpoint: native responses callers see
   `tools[i].parameters`; chat-completions callers see
   `tools[i].function.parameters`.

2. Stop dropping the function-tool `strict` field in
   `_normalize_chat_tools`. Previously chat-completions silently masked
   strict tool schema violations (200 + tool_calls), diverging from both
   real OpenAI and `/v1/responses`. With the flag preserved, the chat
   path enters the same strict pre-validation pipeline.

Side note: `scripts/verify_v1_responses_openai_sdk.py::case_tool_call_streaming`
was sending a strict tool schema without `additionalProperties:false`,
matching the new 400. Fix the script to use a spec-compliant strict
schema — verify suite is now deterministic 5/5.

Tests: 16 new unit cases in `test_strict_schema_validation.py` (validator
+ enforce + chat coercion preservation) and 2 new integration cases
covering both endpoints' 400 envelope shape.

Validation:
- openspec validate validate-strict-function-tool-schema → "is valid"
- targeted sweep: 241 passed
- SDK e2e: 20 passed
- uvx ruff check / ruff format --check / uv run ty check: all green
- live four-cell matrix on `codex-lb-dev-server` with a real account: 8/8 PASS
  (A: 400 + correct code/param both endpoints; B/C/D: 200)
- verify_v1_responses_openai_sdk.py vs the dev container: 5/5 PASS
  (was 4/5 due to the script's own spec-violating payload)
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f7ebcfb198

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread app/modules/proxy/request_policy.py Outdated
"""
if not request.tools:
return
for index, tool in enumerate(request.tools):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use original tool index in chat strict-schema errors

enforce_strict_function_tools_format builds param from the index in request.tools, but for chat-completions that list has already been normalized by _normalize_chat_tools, which can drop entries (for example function tools with missing/empty names). In that case a strict-schema violation is reported against tools[i].function.parameters with a shifted i, so clients receive a deterministic 400 that points to the wrong input element and cannot reliably map the error back to their original payload.

Useful? React with 👍 / 👎.

Codex review on PR #658 flagged that `enforce_strict_function_tools_format`
on the chat-completions path walked `responses_payload.tools` — which is
the output of `_normalize_chat_tools`, a normalizer that silently drops
invalid entries (non-dict tools, function tools with missing/empty
`name`). When the normalizer dropped any earlier entry, the rejected
`param` ("tools[i].function.parameters") referenced an index that no
longer matched the client's inbound payload.

Refactor:

- Switch `enforce_strict_function_tools_format` to take a raw
  `tools: list[JsonValue]` plus a `nested: bool` flag. `nested=False`
  reads parameters/strict from the flat /v1/responses shape;
  `nested=True` reads them from the chat-completions
  `tool["function"][...]` shape.
- Chat handler now validates the *raw* `payload.tools` list before
  `to_responses_request()` runs, so the reported index always lines up
  with what the client sent. Compliant payloads continue through the
  existing coercion path unchanged.
- /v1/responses handler and `normalize_responses_request_payload` are
  updated to pass `responses.tools` directly (flat shape, default
  template).

Regression coverage:

- `tests/unit/test_strict_schema_validation.py`:
  - `test_chat_strict_violation_param_uses_original_index_when_normalizer_drops`
  - `test_chat_strict_enforcement_runs_against_raw_payload_in_handler_path`
  - existing chat coercion tests adjusted to call the new signature.
- `tests/integration/test_openai_compat_features.py`:
  - `test_v1_chat_completions_strict_violation_param_uses_original_index`
    posts [dropped, violating] and asserts
    `param == "tools[1].function.parameters"`.

Validation:

- ruff check, ruff format --check, ty check: clean
- pytest tests/unit tests/integration: 2180 passed
- live docker /v1/chat/completions + /v1/responses 9-cell matrix
  (4 original + 1 shifted-index regression + 4 responses): 9/9 PASS
- verify_v1_responses_openai_sdk.py: 5/5 PASS

Refs: PR #658 codex review feedback.
@Soju06
Copy link
Copy Markdown
Owner Author

Soju06 commented May 16, 2026

Addressed codex P2 feedback in 48d7889.

Root causeenforce_strict_function_tools_format was walking responses_payload.tools (the output of _normalize_chat_tools), which silently drops invalid entries (non-dict tools, function tools with missing/empty name). When an earlier entry was dropped, the reported param ("tools[i].function.parameters") referenced a shifted i that no longer matched the client's inbound payload.

Fix — refactored the helper to take a raw tools: list[JsonValue] plus nested: bool instead of a ResponsesRequest. The chat handler now validates the raw payload.tools list before to_responses_request() runs _normalize_chat_tools, so the reported index always lines up with the inbound request. The /v1/responses handler is unaffected behaviorally (passes responses.tools directly).

Regression coverage

  • Unit:
    • test_chat_strict_violation_param_uses_original_index_when_normalizer_drops — payload [dropped@0, violation@1], asserts param == 'tools[1].function.parameters'.
    • test_chat_strict_enforcement_runs_against_raw_payload_in_handler_path — confirms compliant payload validates pre-coercion.
  • Integration: test_v1_chat_completions_strict_violation_param_uses_original_index exercises the same scenario through the FastAPI handler.

Validation

  • ruff / ty: clean
  • pytest tests/unit tests/integration: 2180 passed
  • Live docker matrix on codex-lb-dev-server with a real account (Pro): 9/9 PASS (4 chat × strict combos + new shifted-index regression cell + 4 responses × strict combos)
  • scripts/verify_v1_responses_openai_sdk.py: 5/5 PASS

@Soju06
Copy link
Copy Markdown
Owner Author

Soju06 commented May 16, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 48d7889758

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread app/modules/proxy/request_policy.py Outdated
Comment on lines +279 to +280
if tool.get("type") != "function":
continue
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Treat missing tool type as function during strict validation

The chat pre-validator now skips any tool whose type is not exactly "function", but _normalize_chat_tools still coerces nested function tools to "function" when type is omitted ("type": tool_type or "function" in app/core/openai/chat_requests.py). With the new flow (validate raw chat tools before normalization), a payload like { "function": {..., "strict": true, "parameters": <invalid schema>} } bypasses pre-validation, gets normalized into a strict function tool, and can still fail upstream as a 5xx instead of the intended local 400 invalid_function_parameters.

Useful? React with 👍 / 👎.

Second codex review pass on PR #658 (commit 48d7889) flagged that the
chat-side pre-validator's `tool.get("type") != "function"` guard is
out of sync with `_normalize_chat_tools`. The normalizer coerces any
tool whose `function` value is a dict into a function tool — including
ones with no top-level `type` (`"type": tool_type or "function"` in
chat_requests.py:198). That meant a payload like

    {"function": {"strict": true, "parameters": <invalid>}}

was skipped by pre-validation, normalized into a strict function tool,
sent upstream, and rejected with a 5xx instead of the intended local
400 `invalid_function_parameters`.

Fix: in the `nested=True` branch of `enforce_strict_function_tools_format`,
anchor detection on the presence of a `tool["function"]` dict —
exactly the rule `_normalize_chat_tools` uses. `type" check stays in
the flat (`nested=False`) branch for the /v1/responses path, where the
request model already requires an explicit type.

Regression coverage:

- Unit (tests/unit/test_strict_schema_validation.py):
  - test_chat_strict_violation_when_type_omitted_but_function_dict_present
  - test_chat_strict_skips_tool_without_function_wrapper_even_with_type
  - test_responses_path_still_requires_type_function
- Integration (tests/integration/test_openai_compat_features.py):
  - test_v1_chat_completions_strict_violation_when_type_omitted_in_chat_tool

Validation:

- ruff / format / ty: clean
- pytest tests/unit tests/integration: 2184 passed
- Live docker matrix on codex-lb-dev-server with real account:
  12/12 PASS — A-H chat (incl. F=type-omitted strict violation -> 400,
  G=type-omitted compliant -> 200, H=type-omitted strict=false -> 200)
  + A-D responses
- verify_v1_responses_openai_sdk.py: 5/5 PASS

Refs: PR #658 codex review pass 2.
@Soju06
Copy link
Copy Markdown
Owner Author

Soju06 commented May 16, 2026

Addressed second codex review pass (P2 on request_policy.py:280) in 008d502.

Root cause — chat pre-validator's tool.get("type") != "function" guard was out of sync with _normalize_chat_tools, which coerces any tool with "function": {...} into a function tool ("type": tool_type or "function" at chat_requests.py:198). A payload like {"function": {"strict": true, "parameters": <invalid>}} (no top-level type) was skipped by pre-validation, normalized into a strict function tool, sent upstream, and surfaced as a misleading 5xx instead of the local 400.

Fix — in the nested=True branch of enforce_strict_function_tools_format, anchor function-tool detection on the presence of the tool["function"] dict (matching the normalizer's rule). The flat /v1/responses branch (nested=False) keeps requiring an explicit type == "function" since the request model already enforces it.

Regression coverage (3 unit + 1 integration):

  • test_chat_strict_violation_when_type_omitted_but_function_dict_present — exact payload codex described, asserts 400 with param=tools[0].function.parameters.
  • test_chat_strict_skips_tool_without_function_wrapper_even_with_type — symmetric guardrail.
  • test_responses_path_still_requires_type_function — defensive check that flat-shape behavior didn't shift.
  • test_v1_chat_completions_strict_violation_when_type_omitted_in_chat_tool — same scenario via FastAPI handler.

Validation

  • ruff / format / ty: clean
  • pytest tests/unit tests/integration: 2184 passed
  • Live docker matrix (A–H chat + A–D responses): 12/12 PASS, including the new F (type-omitted nested strict violation → 400), G (type-omitted compliant → 200), and H (type-omitted strict=false → 200).
  • verify_v1_responses_openai_sdk.py: 5/5 PASS

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. 🚀

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@Komzpa Komzpa added the 🤖 codex: ok [@codex review] says no issues found. label May 16, 2026
@Soju06 Soju06 merged commit 0998cac into main May 17, 2026
18 checks passed
@Soju06 Soju06 deleted the fix/validate-strict-function-tool-schema branch May 17, 2026 04:56
Soju06 added a commit that referenced this pull request May 17, 2026
Release 1.18.0 — v1.17.0 → main, 27 PR.

## Pre-merge verification (release-gate e2e)

- ✅ CI 18/18 (mergeStateStatus CLEAN)
- ✅ baseline release-gate-smoke: 10/10 (`v1.models`, chat non-stream/stream/tool_call, responses non-stream/stream, unknown-msg-key drop, transient codes, dashboard 72-byte, alembic index)
- ✅ per-PR e2e extra: 14/14 (#658 strict tool × chat valid/invalid/type-omitted + responses invalid, #650-652 trim replay markers, #516 prev-resp miss mask, #558 prolite plan, #600 token_expired, #635 device poll, #647 frontend 72B, #594 account selection, #640 alembic head, #653/#654 conv archive, #655/#656 weekly pace)
- ✅ pytest 562 passed / 3 PG-only skip (unit + integration suite)
- ✅ SDK e2e 20/20 (`tests/e2e/test_openai_sdk_compat.py` + `tests/e2e/test_v1_responses_openai_sdk.py`)
- ✅ `scripts/verify_v1_responses_openai_sdk.py` 5/5 against live container with real account
- ✅ ruff check + ruff format --check + uv run ty check — All clean
- ✅ prod parity probe: #637 (unknown-key drop) + #658 (strict tool 400) both confirmed fixed vs prod 1.17.0

## Behavior change to note in announcements

- **#658**: `strict: true` function tool schemas missing `additionalProperties: false` (or other strict-mode requirements) now return a local **400 `invalid_function_parameters`** with `param=tools[N].function.parameters` (chat) / `tools[N].parameters` (responses). Pre-1.18.0 these were silently forwarded upstream; 1.18.0 enforces OpenAI's documented strict-mode contract locally before forwarding.

OpenSpec changes shipped in this release (to be archived after tag publish):
- deactivate-on-token-expired (#600)
- drop-unknown-message-fields (#637)
- mask-single-http-previous-response-miss (#516)
- normalize-v1-responses-openai-sdk-stream (#639)
- start-device-oauth-polling (#635)
- support-prolite-plan-capacity (#558)
- treat-upstream-overloaded-as-retryable (#601)
- validate-password-length (#598)
- validate-strict-function-tool-schema (#658)
Soju06 added a commit that referenced this pull request May 17, 2026
Archive changes shipped in v1.18.0 and v1.18.1 per AGENTS.md
"release 이후 archive" workflow. All 12 changes verified shipped in
production at 10.0.0.113 (image 1.18.1, sha256:0431a474432d).

1.18.0 (9 changes):
- deactivate-on-token-expired (#600)
- drop-unknown-message-fields (#637)
- mask-single-http-previous-response-miss (#516)
- normalize-v1-responses-openai-sdk-stream (#639)
- start-device-oauth-polling (#635) [delta header fixed: MODIFIED -> ADDED]
- support-prolite-plan-capacity (#558) [delta header fixed: MODIFIED -> ADDED]
- treat-upstream-overloaded-as-retryable (#601)
- validate-password-length (#598)
- validate-strict-function-tool-schema (#658)

1.18.1 (3 changes):
- harden-long-codex-websocket-turns (#586)
- suppress-duplicate-side-effect-tool-calls (#586)
- hotfix-db-pool-pre-ping-and-firewall-cache-ttl (#679)

Specs updated: 9 capabilities (admin-auth, api-firewall, chat-completions-compat,
chat-completions-extensions, database-backends, frontend-architecture,
responses-api-compat, usage-refresh-policy).

openspec validate --specs: 19/19 passed.

Co-authored-by: Hermes Agent <hermes-agent@nous.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🤖 codex: ok [@codex review] says no issues found.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants