@ammario commented on Oct 7, 2025

Summary

This PR adds integration tests that reproduce the intermittent OpenAI reasoning error and updates dependencies in an attempt to fix it:

Item 'rs_*' of type 'reasoning' was provided without its required following item.

Changes

1. Integration Test (Test Coverage)

File: tests/ipcMain/openaiReasoning.test.ts

Two test approaches:

  1. Random prompts - Generates aggressive prompts to trigger extensive reasoning
  2. Deterministic history replay ✅ - Uses real chat history from cmux-docs-style workspace that had 3800+ reasoning parts and 17 tool calls

The deterministic test successfully reproduces the error with 100% consistency on Turn 1.
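A minimal sketch of the replay approach, assuming a jest test run under bun; `createFreshWorkspace` and `sendMessageAndCollectStream` are hypothetical stand-ins for the real ipcMain test utilities:

```typescript
import { readFileSync } from "node:fs";

// Hypothetical helpers standing in for the actual ipcMain test harness:
declare function createFreshWorkspace(): Promise<unknown>;
declare function sendMessageAndCollectStream(
  workspace: unknown,
  message: unknown
): Promise<Array<{ type: string; message?: string }>>;

const REASONING_ERROR =
  /Item 'rs_.+' of type 'reasoning' was provided without its required following item/;

test("should reproduce error with real chat history", async () => {
  // Replay the recorded user messages from the cmux-docs-style conversation.
  const history = readFileSync("tests/testdata/openai-reasoning-error-repro.jsonl", "utf8")
    .trim()
    .split("\n")
    .map((line) => JSON.parse(line));
  const userMessages = history.filter((msg) => msg.role === "user");

  const workspace = await createFreshWorkspace();
  for (const message of userMessages) {
    const events = await sendMessageAndCollectStream(workspace, message);
    const hit = events.find(
      (e) => e.type === "stream-error" && REASONING_ERROR.test(String(e.message ?? ""))
    );
    if (hit) {
      expect(hit.type).toBe("stream-error"); // error reproduced on this turn
      return;
    }
  }
});
```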

Run with:

# Run deterministic test (1 attempt)
OPENAI_REASONING_TEST_RUNS=1 TEST_INTEGRATION=1 bun x jest tests/ipcMain/openaiReasoning.test.ts -t "should reproduce error with real chat history"

# Run random test (10 attempts)
OPENAI_REASONING_TEST_RUNS=10 TEST_INTEGRATION=1 bun x jest tests/ipcMain/openaiReasoning.test.ts

2. Dependency Updates & Configuration Fix

  • AI SDK: 5.0.56 → 5.0.60
  • @ai-sdk/openai: 2.0.40 → 2.0.44
  • Removed include: ['reasoning.encrypted_content'] from provider options

Rationale: Per OpenAI docs, encrypted reasoning content is only needed when store: false (ZDR compliance). We use store: true (default) with previousResponseId, so OpenAI manages reasoning state automatically.
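A hedged sketch of the provider configuration after this change, using the @ai-sdk/openai Responses provider; the model id, message plumbing, and function shape are illustrative assumptions:

```typescript
import { openai } from "@ai-sdk/openai";
import { streamText, type ModelMessage } from "ai";

// Sketch of the options after dropping include: ['reasoning.encrypted_content'].
function streamAssistantTurn(messages: ModelMessage[], lastResponseId?: string) {
  return streamText({
    model: openai.responses("gpt-5"), // model id is an assumption
    messages,
    providerOptions: {
      openai: {
        reasoningEffort: "high",
        // store defaults to true; with previousResponseId OpenAI keeps reasoning
        // state server-side, so encrypted reasoning content is not required.
        ...(lastResponseId ? { previousResponseId: lastResponseId } : {}),
        // Removed: include: ["reasoning.encrypted_content"] (only needed with store: false / ZDR)
      },
    },
  });
}
```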

Investigation Results

Key Findings

  1. The error is a Vercel AI SDK bug, tracked in vercel/ai#7099 (reportedly fixed) and vercel/ai#8031 (duplicate)

  2. Error occurs during streaming on the FIRST turn (no history), when:

    • Model generates reasoning
    • Model makes a tool call
    • Stream sequence: [stream-start, reasoning-delta..., reasoning-end, tool-call-start, stream-error] (a stream-inspection sketch follows this list)
  3. Configuration doesn't fix it: tested both WITH and WITHOUT include: ['reasoning.encrypted_content']; the error occurs in both cases

  4. SDK updates don't fix it: Error persists even with latest AI SDK 5.0.60
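A sketch of how the stream can be inspected for this sequence; part type names follow the AI SDK v5 fullStream, and `result` stands for the return value of a streamText call like the sketch above:

```typescript
// Runs inside an async test. Records the stream part sequence and flags the
// specific reasoning error when it arrives as an error part.
const observedSequence: string[] = [];
let reproduced = false;

for await (const part of result.fullStream) {
  observedSequence.push(part.type);
  if (part.type === "error") {
    const message = part.error instanceof Error ? part.error.message : String(part.error);
    if (message.includes("of type 'reasoning' was provided without its required following item")) {
      reproduced = true; // matches the reasoning-end -> tool-call -> error pattern above
    }
  }
}

console.log(observedSequence.join(" -> "), { reproduced });
```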

Debug Dumps

The test writes detailed dumps to ~/.cmux/debug_obj/ (a dump-helper sketch follows this list):

  • 1_original_messages.json - Messages from history
  • 1b_openai_stripped.json - After stripping reasoning
  • 2_model_messages.json - After convertToModelMessages
  • 3_final_messages.json - Final messages sent to API
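A small sketch of how such dumps can be written; only the directory and file names come from the test, the helper itself is illustrative:

```typescript
import { mkdirSync, writeFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Writes one pretty-printed JSON file per pipeline stage into ~/.cmux/debug_obj/
// so the message transformations can be diffed by hand.
function dumpDebugObject(name: string, value: unknown): void {
  const dir = join(homedir(), ".cmux", "debug_obj");
  mkdirSync(dir, { recursive: true });
  writeFileSync(join(dir, `${name}.json`), JSON.stringify(value, null, 2));
}

// e.g. dumpDebugObject("2_model_messages", modelMessages);
```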

Known Limitation

⚠️ The underlying Vercel AI SDK bug is NOT resolved. The error still occurs due to how the SDK handles reasoning items during tool call streaming. This appears to be an issue in the SDK's serialization of reasoning items when transitioning to tool calls.

Testing Status

  • ✅ Integration test reproduces error with 100% consistency
  • ✅ Test confirms error occurs with both config options
  • ✅ All existing tests pass
  • ⚠️ Error remains unfixed - this PR tracks the issue and provides a reproduction case

Next Steps

If monitoring shows the error persists in production:

  1. Report to Vercel with our detailed reproduction case
  2. Implement retry logic for this specific error (see the sketch after this list)
  3. Consider workarounds: disable reasoning summaries, use different tool-calling patterns, etc.
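For item 2, a minimal retry sketch scoped to this one error (not part of this PR; the wrapper name and retry count are assumptions):

```typescript
const RETRYABLE_REASONING_ERROR =
  /of type 'reasoning' was provided without its required following item/;

// Retries only the known intermittent reasoning error; everything else rethrows.
async function sendWithReasoningRetry<T>(send: () => Promise<T>, maxRetries = 2): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await send();
    } catch (error) {
      lastError = error;
      const message = error instanceof Error ? error.message : String(error);
      if (!RETRYABLE_REASONING_ERROR.test(message)) throw error;
    }
  }
  throw lastError;
}
```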

Generated with cmux

@ammario force-pushed the openai-reasoning-test branch 5 times, most recently from e82a50a to 6b32d5e on October 7, 2025 at 18:01
This test reproduces the intermittent error:
"Item 'rs_*' of type 'reasoning' was provided without its required following item"

⚠️  DISABLED BY DEFAULT - Opt-in via OPENAI_REASONING_TEST_RUNS env variable

Key features (based on cmux-docs-style analysis):
- Uses ONLY read_file tool (safety - no file modifications)
- Sets reasoning effort to HIGH (maximize reasoning content)
- Uses aggressive prompts that trigger extensive reasoning (3800+ parts observed in real error)
- Tests pattern: user → assistant (reasoning + tools) → tool results → user (follow-up)

Rationale for opt-in:
- Makes real API calls (costs money)
- Specifically for reproducing an intermittent error, not for validating functionality
- Should be run manually when investigating the bug

Usage:
  OPENAI_REASONING_TEST_RUNS=10 TEST_INTEGRATION=1 bun x jest tests/ipcMain/openaiReasoning.test.ts

The test:
- Runs N attempts to trigger the intermittent error
- For each attempt, creates fresh workspace
- Sends 3 messages with prompts designed to trigger extensive reasoning
- Tool policy limits to read_file only (safety)
- Thinking level set to 'high' for maximum reasoning
- Checks for the specific error in stream events
- Reports whether error was reproduced

Generated with `cmux`
Added second test that replays actual chat history from cmux-docs-style workspace
which reliably reproduces the OpenAI reasoning error.

Benefits:
- Deterministic reproduction (no random prompts)
- Real-world conversation pattern that actually triggered the bug
- Faster debugging since we know exactly what causes it

The test:
- Loads chat.jsonl from tests/testdata/openai-reasoning-error-repro.jsonl
- Replays the 3 user messages in sequence
- Tests against the actual conversation that had 3800+ reasoning parts
- Should reliably trigger the error

Both tests are opt-in (require OPENAI_REASONING_TEST_RUNS env variable).

Generated with `cmux`
## Changes
- Update AI SDK from 5.0.56 to 5.0.60
- Update @ai-sdk/openai from 2.0.40 to 2.0.44
- Remove include: ['reasoning.encrypted_content'] from OpenAI provider options

## Reasoning
Based on OpenAI documentation and AI SDK guidelines, encrypted reasoning
content is only required when store: false (for ZDR compliance). Since we use
store: true (default) with previousResponseId, OpenAI manages reasoning state
automatically via the response ID.

## Known Issue
Despite this configuration, there is an intermittent Vercel AI SDK bug that
causes 'reasoning without following item' errors with OpenAI reasoning models
when using tool calls. This is tracked in:
- vercel/ai#7099 (supposedly fixed)
- vercel/ai#8031 (duplicate)

The error still occurs occasionally in production. Future work may include
implementing retry logic or additional workarounds if the SDK fix is incomplete.

Generated with `cmux`