Conversation

@ammario (Member) commented Oct 7, 2025

Prepares support for automatic conversation truncation with OpenAI Responses API.

⚠️ Note: After investigating the Vercel AI SDK source, this change will not work with the current SDK version (@ai-sdk/openai v2.0.40) because the SDK does not map the truncation parameter from provider options. See investigation comment below for details.

Changes

  • Added truncation: "auto" parameter to OpenAI provider options in buildProviderOptions()
  • Extended TypeScript types to include the truncation parameter
  • Documented the OpenAI Responses API limitation with /truncate command
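A rough sketch of the first change (the buildProviderOptions() name and the truncation value come from this PR; the surrounding shape and the model-id check are assumptions):

```ts
// Sketch only: not the exact cmux implementation.
function buildProviderOptions(modelId: string): Record<string, Record<string, unknown>> {
  if (modelId.startsWith("openai:")) {
    return {
      openai: {
        // Ask the Responses API to drop middle-of-conversation items when the
        // context window would otherwise overflow.
        truncation: "auto",
      },
    };
  }
  return {};
}
```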

Current Behavior

  • The type extension is prepared for a future SDK update
  • OpenAI models will continue using server-side state management without explicit truncation control
  • Users should use /clear or /compact commands to manage conversation history

Next Steps

File an issue/PR with the Vercel AI SDK to add truncation to the provider options mapping.

Generated with cmux

@ammario force-pushed the oai-truncate branch 2 times, most recently from 45d44e4 to 6168efc on October 7, 2025 at 21:15
@ammario (Member Author) commented Oct 7, 2025

SDK Investigation Results

After investigating the Vercel AI SDK source code (@ai-sdk/openai v2.0.40), I discovered that the truncation parameter will NOT be passed through with the current SDK version.

What I Found:

  1. SDK supports truncation: "auto" - it's in the implementation (line 2587 of dist/index.js)
  2. However, it's only added when modelConfig.requiredAutoTruncation is true
  3. No models currently set this flag - defaults to false for all models
  4. Provider options don't include truncation mapping - only explicitly mapped options are passed through

Code Evidence:

// From @ai-sdk/openai/dist/index.js line 2586-2588
...modelConfig.requiredAutoTruncation && {
  truncation: "auto"
}

The SDK uses explicit mapping for provider options - it doesn't spread the entire openaiOptions object. Since truncation isn't in the mapping list, our provider option won't be passed to the API.
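To illustrate the mapping behavior (a paraphrase of that pattern, not the SDK's actual code; field names other than truncation are illustrative):

```ts
// Paraphrase of the SDK's allow-list mapping, not copied from it.
type OpenAIOptions = { store?: boolean; truncation?: "auto" | "disabled" };

function buildRequestBody(
  modelId: string,
  openaiOptions: OpenAIOptions,
  modelConfig: { requiredAutoTruncation: boolean }
) {
  return {
    model: modelId,
    // Known options are mapped one by one...
    ...(openaiOptions.store != null && { store: openaiOptions.store }),
    // ...while truncation is only emitted via the internal flag, which no
    // model currently sets:
    ...(modelConfig.requiredAutoTruncation && { truncation: "auto" }),
    // openaiOptions.truncation is never read, so the provider option is dropped.
  };
}
```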

Current Status:

  • ✅ Type extension added (prepared for future SDK update)
  • ❌ Parameter will NOT be passed through until SDK is updated
  • 📝 Documentation explains the limitation and workarounds

Next Steps:

We should file an issue/PR with the Vercel AI SDK to add truncation to the provider options mapping. Until then, users should:

  • Use /clear to start fresh conversations
  • Use /compact to intelligently summarize history
  • Wait for SDK update to enable automatic truncation

Generated with cmux

@ammario (Member Author) commented Oct 8, 2025

Update: Force truncation: "auto" via a fetch wrapper

Implemented a fetch wrapper that injects truncation: "auto" into every
OpenAI Responses API request. This works around the missing SDK support and
ensures automatic truncation works immediately.

Key safeguards:

  • Only applies to POST requests hitting the Responses API endpoint
  • Does not modify requests that already set truncation
  • Preserves original headers (removing any made stale by the rewritten body)
  • Works with custom fetch implementations (the wrapper still delegates to them)

All lint/type checks pass.
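A minimal sketch of the approach, assuming a JSON string body and matching on the /responses path (names and details here are assumptions, not the exact cmux implementation):

```ts
function withAutoTruncation(baseFetch: typeof fetch = fetch): typeof fetch {
  return async (input, init) => {
    const url =
      typeof input === "string" ? input : input instanceof URL ? input.href : input.url;
    const isResponsesPost =
      (init?.method ?? "GET").toUpperCase() === "POST" && url.includes("/responses");

    if (isResponsesPost && typeof init?.body === "string") {
      const body = JSON.parse(init.body);
      // Leave the request alone if the caller already chose a truncation mode.
      if (body.truncation === undefined) {
        body.truncation = "auto";
        const headers = new Headers(init?.headers);
        headers.delete("content-length"); // stale once the body is rewritten
        return baseFetch(input, { ...init, body: JSON.stringify(body), headers });
      }
    }
    return baseFetch(input, init);
  };
}
```

Since createOpenAI() accepts a custom fetch, the wrapper can be wired in with something like createOpenAI({ fetch: withAutoTruncation(existingFetch) }).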

_Generated with `cmux`_

Enables automatic conversation truncation for OpenAI Responses API to
prevent context overflow errors. When set to 'auto', the API will
automatically drop input items in the middle of the conversation to fit
within the model's context window.

This prevents failures when the conversation history exceeds the
available context size and allows long conversations to continue
seamlessly.

_Generated with `cmux`_

Clarifies that the /truncate command does not work with OpenAI models
due to the Responses API's server-side conversation state management.

Explains that OpenAI uses automatic truncation instead and provides
workarounds for users (/clear, /compact, or relying on auto-truncation).

_Generated with `cmux`_

Extended the OpenAIResponsesProviderOptions type to include the truncation
parameter; the SDK types don't yet include it, but it's supported
by the OpenAI Responses API.
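One way the extension might be written (whether cmux uses an intersection type or module augmentation, and whether the base type is exported from the package root, are assumptions):

```ts
import type { OpenAIResponsesProviderOptions } from "@ai-sdk/openai";

// truncation is accepted by the Responses API but missing from the SDK's
// types as of @ai-sdk/openai v2.0.40.
export type OpenAIResponsesProviderOptionsWithTruncation =
  OpenAIResponsesProviderOptions & {
    truncation?: "auto" | "disabled";
  };
```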

_Generated with `cmux`_

Wrap the OpenAI provider fetch to inject truncation: "auto" on every
Responses API call. This ensures automatic conversation truncation works
immediately, regardless of @ai-sdk/openai support.

_Generated with `cmux`_

Add disableAutoTruncation flag to SendMessageOptions for testing context
overflow behavior. Test verifies:
1. Context limit exceeded when auto-truncation disabled
2. Successful recovery with auto-truncation enabled

Test sends large messages (~10k tokens each) to exceed the 128k context limit,
then verifies truncation: "auto" allows the conversation to continue. Will run in CI with API keys.

_Generated with `cmux`_
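A sketch of the flag described above (field placement and the surrounding option shape are assumptions):

```ts
// Hypothetical shape; the real SendMessageOptions has more fields.
interface SendMessageOptions {
  model: string;
  /**
   * Test-only escape hatch: when true, the truncation: "auto" injection is
   * skipped so context-overflow behavior can be exercised deliberately.
   */
  disableAutoTruncation?: boolean;
}
```
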
ammar-agent and others added 4 commits October 8, 2025 14:17

The test now properly waits for either stream-end or stream-error events
during the context overflow phase, rather than assuming only IPC-level
errors. This matches OpenAI's actual behavior where context exceeded
errors are sent as stream-error events.

_Generated with `cmux`_

The existing token limit error test was failing because auto-truncation
is now enabled by default for OpenAI, preventing the expected context
errors from occurring. This commit adds disableAutoTruncation: true to
that test to preserve its original behavior.

_Generated with `cmux`_

// Send up to 20 large messages (200k tokens total)
// Should exceed 128k context limit and trigger error
for (let i = 0; i < 20; i++) {
const result = await sendMessageWithModel(
@ammario (Member Author):

use historyService to fill up context to reduce API costs... see sendMessage e2e tests for example

Collaborator:

✅ Done! Updated the test to use HistoryService to populate history directly, reducing from 20 API calls to just 2. Test timeout reduced from 180s to 60s.

Collaborator:

✅ Done in c5fdf66

Instead of making 20 API calls to fill up the context window, directly
use HistoryService to populate the conversation history. This reduces
API costs significantly while still testing the auto-truncation behavior.

Changes:
- Use HistoryService.appendToHistory() to inject 12 large messages
- Reduced from 20 API calls to just 2 (one to trigger error, one to verify success)
- Reduced timeout from 180s to 60s (much faster execution)
- Follows same pattern as existing token limit error test
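A sketch of that setup (the HistoryService.appendToHistory() name comes from this commit; its signature, the message shape, and the filler size are assumptions):

```ts
// Assumed API shape; historyService and workspaceId come from the test's setup.
declare const historyService: {
  appendToHistory(
    workspaceId: string,
    message: { role: "user" | "assistant"; content: string }
  ): Promise<void>;
};
declare const workspaceId: string;

// Roughly 40k characters, on the order of 10k tokens of filler per message.
const filler = "lorem ipsum ".repeat(3500);

for (let i = 0; i < 12; i++) {
  await historyService.appendToHistory(workspaceId, {
    role: i % 2 === 0 ? "user" : "assistant",
    content: filler,
  });
}
// Only two real API calls remain: one expected to hit the context limit with
// auto-truncation disabled, one expected to succeed with it enabled.
```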

_Generated with `cmux`_

@ammar-agent (Collaborator) commented:

✅ Review comment addressed in commit c5fdf66 - test now uses HistoryService to reduce API costs

The test was only using 12 messages, which wasn't enough to exceed
the context window. Updated to use 80 messages (4M chars) to match
the token limit error test for OpenAI.

_Generated with `cmux`_
@ammario added this pull request to the merge queue Oct 8, 2025
Merged via the queue into main with commit dba64ac Oct 8, 2025
8 checks passed
@ammario deleted the oai-truncate branch October 8, 2025 20:22