Description
Feature hasn't been suggested before.
- I have verified this feature I'm about to request hasn't been suggested before.
Describe the enhancement you want to request
Problem
Some custom providers' OpenAI-compatible APIs don't return usage information in their streaming responses. This causes two critical issues:
- Context usage not displayed: opencode cannot show token consumption to users
- Automatic compact not triggered: The compact mechanism relies on usage data to know when to trigger, so it fails entirely when usage is missing
Context
Users are integrating with various custom OpenAI-compatible providers that may not include usage metadata in their streaming response chunks. This is a common scenario for:
- Self-hosted language model services
- Custom proxy implementations
- Alternative LLM providers that don't follow the full OpenAI API specification
Without accurate usage tracking, users lose visibility into their token consumption and, more importantly, the automatic compact feature becomes non-functional, potentially leading to context overflow.
Request
Add a fallback mechanism to locally estimate token usage when the API response doesn't include usage information. This should:
- Apply primarily to streaming responses (where usage is most commonly missing)
- Maintain compatibility with providers that do return usage (prefer actual usage when available)
- Ensure the automatic compact mechanism continues to work even with estimated data (see the sketch after this list)
- Be reasonably accurate for practical purposes
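As a rough illustration (not opencode's actual internals), the resolved usage could carry an `estimated` flag so that display and auto-compact logic consume provider-reported and locally estimated numbers the same way. The `TokenUsage` shape, `shouldCompact` helper, and 90% threshold below are all hypothetical:

```ts
// Hypothetical shape of the resolved usage after the fallback is applied.
// `estimated` marks numbers that came from the local counter rather than
// from the provider's response.
interface TokenUsage {
  promptTokens: number
  completionTokens: number
  estimated: boolean
}

// The auto-compact check keeps working regardless of where the numbers
// came from. `contextLimit` and the 0.9 threshold are placeholders.
function shouldCompact(usage: TokenUsage, contextLimit: number): boolean {
  return usage.promptTokens + usage.completionTokens > contextLimit * 0.9
}
```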
Proposed Solution
- Implement a local token counter that tracks tokens as streaming chunks arrive (see the sketch after this list)
- When the API response's usage is empty or missing:
  - Use the local estimate as a fallback
  - Log a warning that the usage is estimated (not from the API)
- Keep using actual API usage when it's available (for accuracy)
- The estimation should handle both prompt tokens and completion tokens separately
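A minimal sketch of how the counter could work, assuming a hypothetical `StreamUsageEstimator` class and an injected `tokenize` function (tiktoken-backed or heuristic); none of these names come from the opencode codebase:

```ts
// Accumulates a local estimate as streaming chunks arrive and falls back
// to it only when the provider omits usage. All names are illustrative.
class StreamUsageEstimator {
  private promptTokens: number
  private completionTokens = 0

  constructor(promptText: string, private tokenize: (text: string) => number) {
    // Prompt tokens are estimated once, from the request we sent.
    this.promptTokens = tokenize(promptText)
  }

  // Called for each streamed delta of assistant output.
  onChunk(deltaText: string): void {
    this.completionTokens += this.tokenize(deltaText)
  }

  // Called once the stream ends; prefers the provider's usage when present.
  finish(apiUsage?: { prompt_tokens?: number; completion_tokens?: number }) {
    if (apiUsage?.prompt_tokens != null && apiUsage?.completion_tokens != null) {
      return {
        promptTokens: apiUsage.prompt_tokens,
        completionTokens: apiUsage.completion_tokens,
        estimated: false,
      }
    }
    console.warn("provider response had no usage; falling back to local estimate")
    return {
      promptTokens: this.promptTokens,
      completionTokens: this.completionTokens,
      estimated: true,
    }
  }
}
```

Tracking prompt and completion tokens separately keeps the displayed breakdown consistent with providers that do report usage.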
Additional Notes
Token estimation libraries such as tiktoken, or simple character-based heuristics, could be used for the fallback implementation. The estimate doesn't need to be perfect, just accurate enough for the automatic compact functionality to work reliably.
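For example, a character-based heuristic (roughly 4 characters per token for English text) is trivial to implement; the function below is only a sketch:

```ts
// Crude character-based fallback: roughly 4 characters per token.
// Good enough to drive the compact threshold, not for billing accuracy.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}
```

A tokenizer-backed estimate (for example via tiktoken) would be more accurate, at the cost of an extra dependency.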