🤖 Add truncation: auto to OpenAI Responses API #87
Conversation
Force-pushed 45d44e4 to 6168efc
**SDK Investigation Results**

After investigating the Vercel AI SDK source code (v2.0.40), I discovered that the SDK does not pass the `truncation` parameter from provider options.

**What I Found**

Code evidence:

```js
// From @ai-sdk/openai/dist/index.js, lines 2586-2588
...modelConfig.requiredAutoTruncation && {
  truncation: "auto"
}
```

The SDK uses explicit mapping for provider options - it doesn't spread the entire provider options object into the request body, and `truncation: "auto"` is only set from the internal `requiredAutoTruncation` model config flag.

**Current Status**

The `truncation` option passed via provider options is silently dropped, so it never reaches the OpenAI Responses API.

**Next Steps**

We should file an issue/PR with the Vercel AI SDK to add `truncation` to the provider options mapping.

_Generated with `cmux`_
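To illustrate the explicit-mapping pattern described above, here is a simplified TypeScript sketch (not the actual SDK source; field names other than `truncation` are illustrative) of why an unmapped provider option never reaches the request body:

```ts
// Simplified sketch of the field-by-field mapping pattern -- not the real
// @ai-sdk/openai code.
interface ResponsesProviderOptions {
  parallelToolCalls?: boolean;
  truncation?: "auto" | "disabled"; // not part of the SDK's mapping today
}

function buildRequestBody(
  options: ResponsesProviderOptions,
  requiredAutoTruncation?: boolean
) {
  return {
    // only explicitly mapped keys are copied into the request body
    ...(options.parallelToolCalls != null && {
      parallel_tool_calls: options.parallelToolCalls,
    }),
    // truncation comes only from the internal model-config flag,
    // never from caller-supplied provider options:
    ...(requiredAutoTruncation && { truncation: "auto" }),
  };
}
```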
**Update: force-injection implemented**

Implemented a fetch wrapper that injects `truncation: "auto"` into every Responses API request, so automatic truncation takes effect regardless of SDK support. Key safeguards are in place, and all lint/type checks pass.

_Generated with `cmux`_
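A minimal sketch of the fetch-wrapper approach, assuming JSON string request bodies and the standard `createOpenAI({ fetch })` hook; the actual implementation in this PR may differ:

```ts
// Sketch: wrap fetch so every Responses API request carries truncation: "auto".
// URL matching and body handling are illustrative, not the PR's exact code.
const truncatingFetch: typeof fetch = async (input, init) => {
  const url =
    typeof input === "string" ? input : input instanceof URL ? input.href : input.url;

  if (url.includes("/responses") && typeof init?.body === "string") {
    const body = JSON.parse(init.body);
    if (body.truncation === undefined) {
      body.truncation = "auto"; // only inject when the caller hasn't set it
    }
    init = { ...init, body: JSON.stringify(body) };
  }
  return fetch(input, init);
};

// Usage: pass the wrapper as the provider's custom fetch.
// const openai = createOpenAI({ apiKey, fetch: truncatingFetch });
```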
Enables automatic conversation truncation for OpenAI Responses API to prevent context overflow errors. When set to 'auto', the API will automatically drop input items in the middle of the conversation to fit within the model's context window. This prevents failures when the conversation history exceeds the available context size and allows long conversations to continue seamlessly. _Generated with `cmux`_
Clarifies that the /truncate command does not work with OpenAI models due to the Responses API's server-side conversation state management. Explains that OpenAI uses automatic truncation instead and provides workarounds for users (/clear, /compact, or relying on auto-truncation). _Generated with `cmux`_
Extended the OpenAIResponsesProviderOptions type to include the truncation parameter, since the SDK types don't yet include it even though the OpenAI Responses API supports it. _Generated with `cmux`_
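A sketch of that type extension, assuming `OpenAIResponsesProviderOptions` is importable from `@ai-sdk/openai`; the exact shape used in the PR may differ:

```ts
import type { OpenAIResponsesProviderOptions } from "@ai-sdk/openai";

// Local extension until truncation is supported upstream; values mirror the
// OpenAI Responses API ("auto" | "disabled").
type ExtendedResponsesProviderOptions = OpenAIResponsesProviderOptions & {
  truncation?: "auto" | "disabled";
};

// Illustrative shape of buildProviderOptions() for OpenAI models.
function buildProviderOptions(): { openai: ExtendedResponsesProviderOptions } {
  return {
    openai: {
      truncation: "auto",
    },
  };
}
```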
Wrap the OpenAI provider fetch to inject `truncation: "auto"` on every Responses API call. This ensures automatic conversation truncation works immediately, regardless of @ai-sdk/openai support. _Generated with `cmux`_
Add `disableAutoTruncation` flag to `SendMessageOptions` for testing context overflow behavior. The test verifies: 1. context limit exceeded when auto-truncation is disabled; 2. successful recovery with auto-truncation enabled. It sends large messages (~10k tokens each) to exceed the 128k context limit, then verifies that `truncation: "auto"` allows the conversation to continue. Will run in CI with API keys. _Generated with `cmux`_
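A rough outline of the two-phase test, under the assumption that `sendMessageWithModel` (named in the test excerpt below) returns a result the test can inspect; everything apart from `sendMessageWithModel` and `disableAutoTruncation` is illustrative:

```ts
// Phase 1: with auto-truncation disabled, the oversized history should trip
// the 128k context limit and surface a context-exceeded error.
const overflow = await sendMessageWithModel(largeMessage, model, {
  disableAutoTruncation: true,
});
expect(String(overflow.error)).toMatch(/context/i);

// Phase 2: with auto-truncation enabled (the default), the same conversation
// should continue because the API drops items from the middle of the history.
const recovered = await sendMessageWithModel("continue", model);
expect(recovered.success).toBe(true);
```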
Force-pushed d18a38b to eb6ff33
_Generated with `cmux`_
The test now properly waits for either stream-end or stream-error events during the context overflow phase, rather than assuming only IPC-level errors. This matches OpenAI's actual behavior where context exceeded errors are sent as stream-error events. _Generated with `cmux`_
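One way to wait for either outcome is a small helper that resolves on whichever event fires first (a sketch; the event names follow the comment above, and the emitter type is assumed):

```ts
import { EventEmitter } from "node:events";

// Resolve with whichever stream event arrives first; OpenAI context-exceeded
// errors arrive as "stream-error" events rather than IPC-level errors.
function waitForStreamOutcome(
  emitter: EventEmitter
): Promise<"stream-end" | "stream-error"> {
  return new Promise((resolve) => {
    emitter.once("stream-end", () => resolve("stream-end"));
    emitter.once("stream-error", () => resolve("stream-error"));
  });
}
```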
The existing token limit error test was failing because auto-truncation is now enabled by default for OpenAI, preventing the expected context errors from occurring. This commit adds disableAutoTruncation: true to that test to preserve its original behavior. _Generated with `cmux`_
tests/ipcMain/sendMessage.test.ts (outdated)

```ts
// Send up to 20 large messages (200k tokens total)
// Should exceed 128k context limit and trigger error
for (let i = 0; i < 20; i++) {
  const result = await sendMessageWithModel(
```
use historyService to fill up context to reduce API costs... see sendMessage e2e tests for example
✅ Done! Updated the test to use HistoryService to populate history directly, reducing from 20 API calls to just 2. Test timeout reduced from 180s to 60s.
✅ Done in c5fdf66
Instead of making 20 API calls to fill up the context window, directly use HistoryService to populate the conversation history. This reduces API costs significantly while still testing the auto-truncation behavior.

Changes:
- Use HistoryService.appendToHistory() to inject 12 large messages
- Reduced from 20 API calls to just 2 (one to trigger the error, one to verify success)
- Reduced timeout from 180s to 60s (much faster execution)
- Follows the same pattern as the existing token limit error test

_Generated with `cmux`_
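A sketch of the history-seeding step; `HistoryService.appendToHistory()` is named in the commit, while the message shape and `workspaceId` are assumed for illustration:

```ts
// Seed the conversation history directly instead of paying for 20 API calls.
const largeText = "x".repeat(50_000); // roughly 10k+ tokens per message

for (let i = 0; i < 12; i++) {
  await historyService.appendToHistory(workspaceId, {
    role: "user",
    content: `filler message ${i}: ${largeText}`,
  });
}

// After seeding, only two real API calls remain:
// 1) one expected to hit the context limit with auto-truncation disabled,
// 2) one expected to succeed with truncation: "auto" enabled.
```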
✅ Review comment addressed in commit c5fdf66 - test now uses HistoryService to reduce API costs
The test was only using 12 messages, which wasn't enough to exceed the context window. Updated to use 80 messages (4M chars, roughly 1M tokens at ~4 chars per token, well above the 128k limit) to match the token limit error test for OpenAI. _Generated with `cmux`_
Prepares support for automatic conversation truncation with the OpenAI Responses API.

**Note:** the Vercel AI SDK does not currently pass the `truncation` parameter from provider options. See investigation comment below for details.

**Changes**
- Add `truncation: "auto"` parameter to OpenAI provider options in `buildProviderOptions()`
- Document that the `/truncate` command does not apply to OpenAI models

**Current Behavior**
- Use the `/clear` or `/compact` commands to manage conversation history

**Next Steps**
- File an issue/PR with Vercel AI SDK to add `truncation` to the provider options mapping.

_Generated with `cmux`_