
@ammario (Member) commented Oct 7, 2025

Summary

This PR adds an integration test to reproduce and track the intermittent OpenAI reasoning error:

Item 'rs_*' of type 'reasoning' was provided without its required following item.

Test Implementation

File: tests/ipcMain/openaiReasoning.test.ts

The test:

  • Uses OpenAI reasoning model (gpt-5-codex)
  • Sends multi-turn conversation with reasoning + tool calls
  • Runs multiple attempts (default 10, configurable via OPENAI_REASONING_TEST_RUNS)
  • Detects the specific error in stream events
  • Reports whether error was reproduced

Run with:

TEST_INTEGRATION=1 bun x jest tests/ipcMain/openaiReasoning.test.ts

# Or with custom attempt count
OPENAI_REASONING_TEST_RUNS=20 TEST_INTEGRATION=1 bun x jest tests/ipcMain/openaiReasoning.test.ts
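
The per-attempt error detection can be sketched as a predicate over the collected stream events. The event shape and helper below are hypothetical stand-ins for the test's actual IPC types:

```typescript
// Hypothetical event shape; the real test consumes cmux IPC stream events.
interface StreamEvent {
  type: string; // e.g. "stream-start", "reasoning-end", "tool-call-start", "stream-error"
  message?: string;
}

// Matches the intermittent OpenAI Responses API error this test tracks.
const ORPHANED_REASONING =
  /Item 'rs_[^']+' of type 'reasoning' was provided without its required following item/;

function detectReasoningError(events: StreamEvent[]): boolean {
  return events.some(
    (e) => e.type === "stream-error" && ORPHANED_REASONING.test(e.message ?? "")
  );
}
```

Each attempt collects its events and runs this check; any match counts as a successful reproduction.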

Test Results

First Test Run (3 attempts)

  • Run 3: ✅ ERROR REPRODUCED
    • Error: Item 'rs_05d1dd2ba9ba43270068e541ecb9ec81938b35eead69f3d8c3' of type 'reasoning' was provided without its required following item.
    • Occurred on FIRST message (not a follow-up)
    • Stream events: [stream-start, reasoning-end, tool-call-start, stream-error]

Second Test Run (10 attempts)

  • All 10 runs: ✅ NO ERRORS
    • All messages succeeded
    • All tool calls completed successfully

Key Findings

  1. Error is intermittent: not deterministic; likely depends on OpenAI's internal state or timing

  2. Current fix MAY be working: the clearProviderMetadataForOpenAI() function on this branch appears to reduce error frequency significantly:
    • First run: 1/3 failures (33%)
    • Second run: 0/10 failures (0%)

  3. Error can occur on the FIRST message: not just on follow-ups with conversation history

  4. Test is functional: successfully detects the error when it occurs

Next Steps

  1. Run extensive testing: 50-100 attempts to get statistical significance
  2. Compare branches: Test error rates on main (without fix) vs tokens (with fix)
  3. Analyze debug dumps: When errors occur, examine the dumped JSON files
  4. Consider additional fixes if needed:
    • Omit previous_response_id entirely when errors occur
    • Clear providerMetadata more aggressively (on ALL part types)
    • Add retry logic for this specific error
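
The retry option above could be sketched as follows; `withReasoningRetry` and the error-matching regex are hypothetical illustrations, not existing cmux code:

```typescript
// Only retry the specific orphaned-reasoning error; rethrow everything else.
const ORPHANED_REASONING =
  /of type 'reasoning' was provided without its required following item/;

async function withReasoningRetry<T>(
  send: () => Promise<T>,
  maxRetries = 2
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await send();
    } catch (err) {
      lastError = err;
      const msg = err instanceof Error ? err.message : String(err);
      if (!ORPHANED_REASONING.test(msg)) throw err; // unrelated error: fail fast
    }
  }
  throw lastError;
}
```

Since the error is intermittent and appears tied to OpenAI-side state, a single retry may be enough to recover most failed turns.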

Related


Generated with cmux

Commits

The previous fix added 'reasoning.encrypted_content' to the include option,
but the root cause was that reasoning parts from history were being sent
back to OpenAI's Responses API.

When reasoning parts are included in messages sent to OpenAI, the SDK creates
separate reasoning items with IDs (e.g., rs_*). These orphaned reasoning items
cause errors: 'Item rs_* of type reasoning was provided without its required
following item.'

Solution: Strip reasoning parts from CmuxMessages BEFORE converting to
ModelMessages. Reasoning content is only for display/debugging and should
never be sent back to the API in subsequent turns.

This happens in filterEmptyAssistantMessages() which runs before
convertToModelMessages(), ensuring reasoning parts never reach the API.

Per Anthropic documentation, reasoning content SHOULD be sent back
to Anthropic models via the sendReasoning option (defaults to true).

However, OpenAI's Responses API uses encrypted reasoning items (IDs like rs_*)
that are managed automatically via previous_response_id. Anthropic-style
text-based reasoning parts sent to OpenAI create orphaned reasoning items
that cause 'reasoning without following item' errors.

Changes:
- Reverted filterEmptyAssistantMessages() to only filter reasoning-only messages
- Added new stripReasoningForOpenAI() function for OpenAI-specific stripping
- Apply reasoning stripping only for OpenAI provider in aiService.ts
- Added detailed comments explaining the provider-specific differences
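
A minimal sketch of the OpenAI-specific stripping described above, using simplified hypothetical stand-ins for the real cmux message types:

```typescript
// Simplified stand-ins for cmux's message types (hypothetical shapes).
interface MessagePart {
  type: "text" | "reasoning" | "tool-call" | "tool-result";
  text?: string;
}

interface CmuxMessageLike {
  role: "user" | "assistant";
  parts: MessagePart[];
}

// Drop reasoning parts before conversion for OpenAI; messages left with
// no parts are dropped entirely (mirroring the reasoning-only filter).
function stripReasoningForOpenAI(messages: CmuxMessageLike[]): CmuxMessageLike[] {
  return messages
    .map((m) => ({
      ...m,
      parts: m.parts.filter((p) => p.type !== "reasoning"),
    }))
    .filter((m) => m.parts.length > 0);
}
```

For Anthropic the messages pass through unchanged, since sendReasoning expects reasoning content to be replayed.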

OpenAI's Responses API uses encrypted reasoning items (rs_*) managed via
previous_response_id. Sending stale provider metadata from history causes:
- "Item 'rs_*' of type 'reasoning' was provided without its required following item"
- "referenced reasoning on a function_call was not provided"

Solution: Blank out providerMetadata on all content parts for OpenAI after
convertToModelMessages(). This preserves reasoning content while preventing
metadata conflicts.

Also fixed splitMixedContentMessages to treat reasoning parts as text parts
(they stay together with text, not with tool calls).

Fixes #7099 (Vercel AI SDK issue)
Reference: https://github.com/gvkhna/vibescraper

# Conflicts:
#	src/services/aiService.ts
#	src/utils/messages/modelMessageTransform.ts

- Change 'let filteredMessages' to 'const' (no longer reassigned)
- Remove unused 'provider' parameter from transformModelMessages()
- Fix clearProviderMetadataForOpenAI to actually clear reasoning parts
  (was only checking part.type === 'text', now checks both 'text' and 'reasoning')
- Update all test calls to remove provider parameter
- Update docstrings to reflect new behavior

Tool result messages (role: 'tool') can also contain stale providerMetadata
on ToolResultPart that references the parent tool-call. This metadata can
cause the same 'reasoning without following item' errors when sent back to OpenAI.

Extended clearProviderMetadataForOpenAI() to also process tool messages.

Evidence:
- LanguageModelV3ToolResultPart has providerOptions field
- @kristoph noted error occurs when items 'immediately after reasoning' lack IDs
- Tool results are sent immediately after tool calls, completing the chain

This makes the fix comprehensive for all message types that can have stale metadata.
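
The comprehensive metadata clearing can be sketched as follows, with simplified hypothetical stand-ins for the AI SDK's message types:

```typescript
// Simplified ModelMessage shapes (hypothetical; the real ones come from the AI SDK).
interface ContentPart {
  type: "text" | "reasoning" | "tool-call" | "tool-result";
  providerMetadata?: Record<string, unknown>;
  [key: string]: unknown;
}

interface ModelMessageLike {
  role: "system" | "user" | "assistant" | "tool";
  content: ContentPart[];
}

// Strip providerMetadata from every content part, on assistant AND tool
// messages, so stale rs_* references from history are never replayed.
function clearProviderMetadataForOpenAI(
  messages: ModelMessageLike[]
): ModelMessageLike[] {
  return messages.map((m) => ({
    ...m,
    content: m.content.map(({ providerMetadata: _drop, ...rest }) => rest),
  }));
}
```

Running this after convertToModelMessages() keeps the reasoning text intact while removing the item IDs that conflict with previous_response_id.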

This test attempts to reproduce the intermittent error:
"Item 'rs_*' of type 'reasoning' was provided without its required following item"

The test:
- Uses OpenAI reasoning model (gpt-5-codex)
- Sends multi-turn conversation with reasoning + tool calls
- Runs multiple attempts (default 10, configurable via OPENAI_REASONING_TEST_RUNS)
- Checks for the specific error in stream events

Run with: TEST_INTEGRATION=1 bun x jest tests/ipcMain/openaiReasoning.test.ts

The error is intermittent, so multiple attempts increase chances of reproduction.
Once reproduced, debug dumps can be examined to understand the root cause.

Generated with `cmux`

The test was incorrectly waiting for stream-end events even when
stream-error occurred. Now it catches timeout exceptions and checks
for error events regardless of whether stream-end was received.

This allows the test to properly detect the OpenAI reasoning error when
it occurs.

Generated with `cmux`

@ammario (Member, Author) commented Oct 7, 2025

Closing to recreate with clean branch

@ammario ammario closed this Oct 7, 2025
@ThomasK33 ThomasK33 deleted the tokens branch October 10, 2025 20:31