@ammario commented on Oct 7, 2025

Summary

This PR adds integration tests that reproduce the intermittent OpenAI reasoning error and updates dependencies in an attempt to fix it:

Item 'rs_*' of type 'reasoning' was provided without its required following item.

Changes

1. Integration Test (Test Coverage)

File: tests/ipcMain/openaiReasoning.test.ts

Two test approaches:

  1. Random prompts - Generates aggressive prompts to trigger extensive reasoning
  2. Deterministic history replay ✅ - Uses real chat history from cmux-docs-style workspace that had 3800+ reasoning parts and 17 tool calls

The deterministic test successfully reproduces the error with 100% consistency on Turn 1.
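A minimal sketch of the replay approach, assuming a jest test run under bun; `createFreshWorkspace` and `sendMessageAndCollectStream` are hypothetical stand-ins for the real ipcMain test utilities:

```typescript
import { readFileSync } from "node:fs";

// Hypothetical helpers standing in for the actual ipcMain test harness:
declare function createFreshWorkspace(): Promise<unknown>;
declare function sendMessageAndCollectStream(
  workspace: unknown,
  message: unknown
): Promise<Array<{ type: string; message?: string }>>;

const REASONING_ERROR =
  /Item 'rs_.+' of type 'reasoning' was provided without its required following item/;

test("should reproduce error with real chat history", async () => {
  // Replay the recorded user messages from the cmux-docs-style conversation.
  const history = readFileSync("tests/testdata/openai-reasoning-error-repro.jsonl", "utf8")
    .trim()
    .split("\n")
    .map((line) => JSON.parse(line));
  const userMessages = history.filter((msg) => msg.role === "user");

  const workspace = await createFreshWorkspace();
  for (const message of userMessages) {
    const events = await sendMessageAndCollectStream(workspace, message);
    const hit = events.find(
      (e) => e.type === "stream-error" && REASONING_ERROR.test(String(e.message ?? ""))
    );
    if (hit) {
      expect(hit.type).toBe("stream-error"); // error reproduced on this turn
      return;
    }
  }
});
```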

Run with:

# Run deterministic test (1 attempt)
OPENAI_REASONING_TEST_RUNS=1 TEST_INTEGRATION=1 bun x jest tests/ipcMain/openaiReasoning.test.ts -t "should reproduce error with real chat history"

# Run random test (10 attempts)
OPENAI_REASONING_TEST_RUNS=10 TEST_INTEGRATION=1 bun x jest tests/ipcMain/openaiReasoning.test.ts

2. Dependency Updates & Configuration Fix

  • AI SDK: 5.0.56 → 5.0.60
  • @ai-sdk/openai: 2.0.40 → 2.0.44
  • Removed include: ['reasoning.encrypted_content'] from provider options

Rationale: Per OpenAI docs, encrypted reasoning content is only needed when store: false (ZDR compliance). We use store: true (default) with previousResponseId, so OpenAI manages reasoning state automatically.
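A hedged sketch of the provider configuration after this change, using the @ai-sdk/openai Responses provider; the model id, message plumbing, and function shape are illustrative assumptions:

```typescript
import { openai } from "@ai-sdk/openai";
import { streamText, type ModelMessage } from "ai";

// Sketch of the options after dropping include: ['reasoning.encrypted_content'].
function streamAssistantTurn(messages: ModelMessage[], lastResponseId?: string) {
  return streamText({
    model: openai.responses("gpt-5"), // model id is an assumption
    messages,
    providerOptions: {
      openai: {
        reasoningEffort: "high",
        // store defaults to true; with previousResponseId OpenAI keeps reasoning
        // state server-side, so encrypted reasoning content is not required.
        ...(lastResponseId ? { previousResponseId: lastResponseId } : {}),
        // Removed: include: ["reasoning.encrypted_content"] (only needed with store: false / ZDR)
      },
    },
  });
}
```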

Investigation Results

Key Findings

  1. The error is a Vercel AI SDK bug, tracked in vercel/ai#7099 (reportedly fixed) and vercel/ai#8031 (duplicate)

  2. Error occurs during streaming on the FIRST turn (no history), when:

    • Model generates reasoning
    • Model makes a tool call
    • Stream sequence: [stream-start, reasoning-delta..., reasoning-end, tool-call-start, stream-error] (a stream-inspection sketch follows this list)
  3. Configuration doesn't fix it: tested both WITH and WITHOUT include: ['reasoning.encrypted_content']; the error occurs in both cases

  4. SDK updates don't fix it: Error persists even with latest AI SDK 5.0.60
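A sketch of how the stream can be inspected for this sequence; part type names follow the AI SDK v5 fullStream, and `result` stands for the return value of a streamText call like the sketch above:

```typescript
// Runs inside an async test. Records the stream part sequence and flags the
// specific reasoning error when it arrives as an error part.
const observedSequence: string[] = [];
let reproduced = false;

for await (const part of result.fullStream) {
  observedSequence.push(part.type);
  if (part.type === "error") {
    const message = part.error instanceof Error ? part.error.message : String(part.error);
    if (message.includes("of type 'reasoning' was provided without its required following item")) {
      reproduced = true; // matches the reasoning-end -> tool-call -> error pattern above
    }
  }
}

console.log(observedSequence.join(" -> "), { reproduced });
```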

Debug Dumps

The test writes detailed dumps to ~/.cmux/debug_obj/ (a dump-helper sketch follows this list):

  • 1_original_messages.json - Messages from history
  • 1b_openai_stripped.json - After stripping reasoning
  • 2_model_messages.json - After convertToModelMessages
  • 3_final_messages.json - Final messages sent to API
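A small sketch of how such dumps can be written; only the directory and file names come from the test, the helper itself is illustrative:

```typescript
import { mkdirSync, writeFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Writes one pretty-printed JSON file per pipeline stage into ~/.cmux/debug_obj/
// so the message transformations can be diffed by hand.
function dumpDebugObject(name: string, value: unknown): void {
  const dir = join(homedir(), ".cmux", "debug_obj");
  mkdirSync(dir, { recursive: true });
  writeFileSync(join(dir, `${name}.json`), JSON.stringify(value, null, 2));
}

// e.g. dumpDebugObject("2_model_messages", modelMessages);
```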

Known Limitation

⚠️ The underlying Vercel AI SDK bug is NOT resolved. The error still occurs due to how the SDK handles reasoning items during tool call streaming. This appears to be an issue in the SDK's serialization of reasoning items when transitioning to tool calls.

Testing Status

  • ✅ Integration test reproduces error with 100% consistency
  • ✅ Test confirms error occurs with both config options
  • ✅ All existing tests pass
  • ⚠️ Error remains unfixed - this PR tracks the issue and provides a reproduction case

Next Steps

If monitoring shows the error persists in production:

  1. Report to Vercel with our detailed reproduction case
  2. Implement retry logic for this specific error (see the sketch after this list)
  3. Consider workarounds: disable reasoning summaries, use different tool-calling patterns, etc.
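For item 2, a minimal retry sketch scoped to this one error (not part of this PR; the wrapper name and retry count are assumptions):

```typescript
const RETRYABLE_REASONING_ERROR =
  /of type 'reasoning' was provided without its required following item/;

// Retries only the known intermittent reasoning error; everything else rethrows.
async function sendWithReasoningRetry<T>(send: () => Promise<T>, maxRetries = 2): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await send();
    } catch (error) {
      lastError = error;
      const message = error instanceof Error ? error.message : String(error);
      if (!RETRYABLE_REASONING_ERROR.test(message)) throw error;
    }
  }
  throw lastError;
}
```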

Generated with cmux

@ammario force-pushed the openai-reasoning-test branch 5 times, most recently from e82a50a to 6b32d5e on October 7, 2025 at 18:01
This test reproduces the intermittent error:
"Item 'rs_*' of type 'reasoning' was provided without its required following item"

⚠️  DISABLED BY DEFAULT - Opt-in via OPENAI_REASONING_TEST_RUNS env variable

Key features (based on cmux-docs-style analysis):
- Uses ONLY read_file tool (safety - no file modifications)
- Sets reasoning effort to HIGH (maximize reasoning content)
- Uses aggressive prompts that trigger extensive reasoning (3800+ parts observed in real error)
- Tests pattern: user → assistant (reasoning + tools) → tool results → user (follow-up)

Rationale for opt-in:
- Makes real API calls (costs money)
- Specifically for reproducing an intermittent error, not for validating functionality
- Should be run manually when investigating the bug

Usage:
  OPENAI_REASONING_TEST_RUNS=10 TEST_INTEGRATION=1 bun x jest tests/ipcMain/openaiReasoning.test.ts

The test:
- Runs N attempts to trigger the intermittent error
- For each attempt, creates fresh workspace
- Sends 3 messages with prompts designed to trigger extensive reasoning
- Tool policy limits to read_file only (safety)
- Thinking level set to 'high' for maximum reasoning
- Checks for the specific error in stream events
- Reports whether error was reproduced

Generated with `cmux`
Added second test that replays actual chat history from cmux-docs-style workspace
which reliably reproduces the OpenAI reasoning error.

Benefits:
- Deterministic reproduction (no random prompts)
- Real-world conversation pattern that actually triggered the bug
- Faster debugging since we know exactly what causes it

The test:
- Loads chat.jsonl from tests/testdata/openai-reasoning-error-repro.jsonl
- Replays the 3 user messages in sequence
- Tests against the actual conversation that had 3800+ reasoning parts
- Should reliably trigger the error

Both tests are opt-in (require OPENAI_REASONING_TEST_RUNS env variable).

Generated with `cmux`
## Changes
- Update AI SDK from 5.0.56 to 5.0.60
- Update @ai-sdk/openai from 2.0.40 to 2.0.44
- Remove include: ['reasoning.encrypted_content'] from OpenAI provider options

## Reasoning
Based on OpenAI documentation and AI SDK guidelines, encrypted reasoning
content is only required when store: false (for ZDR compliance). Since we use
store: true (default) with previousResponseId, OpenAI manages reasoning state
automatically via the response ID.

## Known Issue
Despite this configuration, there is an intermittent Vercel AI SDK bug that
causes 'reasoning without following item' errors with OpenAI reasoning models
when using tool calls. This is tracked in:
- vercel/ai#7099 (supposedly fixed)
- vercel/ai#8031 (duplicate)

The error still occurs occasionally in production. Future work may include
implementing retry logic or additional workarounds if the SDK fix is incomplete.

Generated with `cmux`