
@ammario (Member) commented Oct 7, 2025

Summary

This PR adds an integration test to reproduce and track the intermittent OpenAI reasoning error:

Item 'rs_*' of type 'reasoning' was provided without its required following item.

Test Implementation

File: tests/ipcMain/openaiReasoning.test.ts

The test:

  • Uses OpenAI reasoning model (gpt-5-codex)
  • Sends multi-turn conversation with reasoning + tool calls
  • Runs multiple attempts (default 10, configurable via OPENAI_REASONING_TEST_RUNS)
  • Detects the specific error in stream events
  • Reports whether error was reproduced

Run with:

TEST_INTEGRATION=1 bun x jest tests/ipcMain/openaiReasoning.test.ts

# Or with custom attempt count
OPENAI_REASONING_TEST_RUNS=20 TEST_INTEGRATION=1 bun x jest tests/ipcMain/openaiReasoning.test.ts
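
The per-attempt error detection can be sketched as a predicate over the collected stream events. The event shape and helper below are hypothetical stand-ins for the test's actual IPC types:

```typescript
// Hypothetical event shape; the real test consumes cmux IPC stream events.
interface StreamEvent {
  type: string; // e.g. "stream-start", "reasoning-end", "tool-call-start", "stream-error"
  message?: string;
}

// Matches the intermittent OpenAI Responses API error this test tracks.
const ORPHANED_REASONING =
  /Item 'rs_[^']+' of type 'reasoning' was provided without its required following item/;

function detectReasoningError(events: StreamEvent[]): boolean {
  return events.some(
    (e) => e.type === "stream-error" && ORPHANED_REASONING.test(e.message ?? "")
  );
}
```

Each attempt collects its events and runs this check; any match counts as a successful reproduction.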

Test Results

First Test Run (3 attempts)

  • Run 3: ✅ ERROR REPRODUCED
    • Error: Item 'rs_05d1dd2ba9ba43270068e541ecb9ec81938b35eead69f3d8c3' of type 'reasoning' was provided without its required following item.
    • Occurred on FIRST message (not a follow-up)
    • Stream events: [stream-start, reasoning-end, tool-call-start, stream-error]

Second Test Run (10 attempts)

  • All 10 runs: ✅ NO ERRORS
    • All messages succeeded
    • All tool calls completed successfully

Key Findings

  1. Error is intermittent: not deterministic; likely depends on OpenAI's internal state or timing

  2. Current fix MAY be working: the clearProviderMetadataForOpenAI() function on this branch appears to reduce error frequency significantly:
    • First run: 1/3 failures (33%)
    • Second run: 0/10 failures (0%)

  3. Error can occur on the FIRST message: not just on follow-ups with conversation history

  4. Test is functional: successfully detects the error when it occurs

Next Steps

  1. Run extensive testing: 50-100 attempts to get statistical significance
  2. Compare branches: Test error rates on main (without fix) vs tokens (with fix)
  3. Analyze debug dumps: When errors occur, examine the dumped JSON files
  4. Consider additional fixes if needed:
    • Omit previous_response_id entirely when errors occur
    • Clear providerMetadata more aggressively (on ALL part types)
    • Add retry logic for this specific error
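
The retry option above could be sketched as follows; `withReasoningRetry` and the error-matching regex are hypothetical illustrations, not existing cmux code:

```typescript
// Only retry the specific orphaned-reasoning error; rethrow everything else.
const ORPHANED_REASONING =
  /of type 'reasoning' was provided without its required following item/;

async function withReasoningRetry<T>(
  send: () => Promise<T>,
  maxRetries = 2
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await send();
    } catch (err) {
      lastError = err;
      const msg = err instanceof Error ? err.message : String(err);
      if (!ORPHANED_REASONING.test(msg)) throw err; // unrelated error: fail fast
    }
  }
  throw lastError;
}
```

Since the error is intermittent and appears tied to OpenAI-side state, a single retry may be enough to recover most failed turns.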

Related


Generated with cmux

Commits

The previous fix added 'reasoning.encrypted_content' to the include option,
but the root cause was that reasoning parts from history were being sent
back to OpenAI's Responses API.

When reasoning parts are included in messages sent to OpenAI, the SDK creates
separate reasoning items with IDs (e.g., rs_*). These orphaned reasoning items
cause errors: 'Item rs_* of type reasoning was provided without its required
following item.'

Solution: Strip reasoning parts from CmuxMessages BEFORE converting to
ModelMessages. Reasoning content is only for display/debugging and should
never be sent back to the API in subsequent turns.

This happens in filterEmptyAssistantMessages() which runs before
convertToModelMessages(), ensuring reasoning parts never reach the API.

Per Anthropic documentation, reasoning content SHOULD be sent back
to Anthropic models via the sendReasoning option (defaults to true).

However, OpenAI's Responses API uses encrypted reasoning items (IDs like rs_*)
that are managed automatically via previous_response_id. Anthropic-style
text-based reasoning parts sent to OpenAI create orphaned reasoning items
that cause 'reasoning without following item' errors.

Changes:
- Reverted filterEmptyAssistantMessages() to only filter reasoning-only messages
- Added new stripReasoningForOpenAI() function for OpenAI-specific stripping
- Apply reasoning stripping only for OpenAI provider in aiService.ts
- Added detailed comments explaining the provider-specific differences
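
A minimal sketch of the OpenAI-specific stripping described above, using simplified hypothetical stand-ins for the real cmux message types:

```typescript
// Simplified stand-ins for cmux's message types (hypothetical shapes).
interface MessagePart {
  type: "text" | "reasoning" | "tool-call" | "tool-result";
  text?: string;
}

interface CmuxMessageLike {
  role: "user" | "assistant";
  parts: MessagePart[];
}

// Drop reasoning parts before conversion for OpenAI; messages left with
// no parts are dropped entirely (mirroring the reasoning-only filter).
function stripReasoningForOpenAI(messages: CmuxMessageLike[]): CmuxMessageLike[] {
  return messages
    .map((m) => ({
      ...m,
      parts: m.parts.filter((p) => p.type !== "reasoning"),
    }))
    .filter((m) => m.parts.length > 0);
}
```

For Anthropic the messages pass through unchanged, since sendReasoning expects reasoning content to be replayed.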

OpenAI's Responses API uses encrypted reasoning items (rs_*) managed via
previous_response_id. Sending stale provider metadata from history causes:
- "Item 'rs_*' of type 'reasoning' was provided without its required following item"
- "referenced reasoning on a function_call was not provided"

Solution: Blank out providerMetadata on all content parts for OpenAI after
convertToModelMessages(). This preserves reasoning content while preventing
metadata conflicts.

Also fixed splitMixedContentMessages to treat reasoning parts as text parts
(they stay together with text, not with tool calls).

Fixes #7099 (Vercel AI SDK issue)
Reference: https://github.com/gvkhna/vibescraper

# Conflicts:
#	src/services/aiService.ts
#	src/utils/messages/modelMessageTransform.ts

- Change 'let filteredMessages' to 'const' (no longer reassigned)
- Remove unused 'provider' parameter from transformModelMessages()
- Fix clearProviderMetadataForOpenAI to actually clear reasoning parts
  (was only checking part.type === 'text', now checks both 'text' and 'reasoning')
- Update all test calls to remove provider parameter
- Update docstrings to reflect new behavior

Tool result messages (role: 'tool') can also contain stale providerMetadata
on ToolResultPart that references the parent tool-call. This metadata can
cause the same 'reasoning without following item' errors when sent back to OpenAI.

Extended clearProviderMetadataForOpenAI() to also process tool messages.

Evidence:
- LanguageModelV3ToolResultPart has providerOptions field
- @kristoph noted error occurs when items 'immediately after reasoning' lack IDs
- Tool results are sent immediately after tool calls, completing the chain

This makes the fix comprehensive for all message types that can have stale metadata.
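
The comprehensive metadata clearing can be sketched as follows, with simplified hypothetical stand-ins for the AI SDK's message types:

```typescript
// Simplified ModelMessage shapes (hypothetical; the real ones come from the AI SDK).
interface ContentPart {
  type: "text" | "reasoning" | "tool-call" | "tool-result";
  providerMetadata?: Record<string, unknown>;
  [key: string]: unknown;
}

interface ModelMessageLike {
  role: "system" | "user" | "assistant" | "tool";
  content: ContentPart[];
}

// Strip providerMetadata from every content part, on assistant AND tool
// messages, so stale rs_* references from history are never replayed.
function clearProviderMetadataForOpenAI(
  messages: ModelMessageLike[]
): ModelMessageLike[] {
  return messages.map((m) => ({
    ...m,
    content: m.content.map(({ providerMetadata: _drop, ...rest }) => rest),
  }));
}
```

Running this after convertToModelMessages() keeps the reasoning text intact while removing the item IDs that conflict with previous_response_id.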

This test attempts to reproduce the intermittent error:
"Item 'rs_*' of type 'reasoning' was provided without its required following item"

The test:
- Uses OpenAI reasoning model (gpt-5-codex)
- Sends multi-turn conversation with reasoning + tool calls
- Runs multiple attempts (default 10, configurable via OPENAI_REASONING_TEST_RUNS)
- Checks for the specific error in stream events

Run with: TEST_INTEGRATION=1 bun x jest tests/ipcMain/openaiReasoning.test.ts

The error is intermittent, so multiple attempts increase chances of reproduction.
Once reproduced, debug dumps can be examined to understand the root cause.

Generated with `cmux`

The test was incorrectly waiting for stream-end events even when
stream-error occurred. Now it catches timeout exceptions and checks
for error events regardless of whether stream-end was received.

This allows the test to properly detect the OpenAI reasoning error when
it occurs.

Generated with `cmux`

@ammario (Member, Author) commented Oct 7, 2025

Closing to recreate with clean branch

@ammario ammario closed this Oct 7, 2025
@ThomasK33 ThomasK33 deleted the tokens branch October 10, 2025 20:31