
[FEATURE]: Add local token estimation fallback when API usage is empty #13141

@FurryWolfX

Description


Feature hasn't been suggested before.

  • I have verified this feature I'm about to request hasn't been suggested before.

Describe the enhancement you want to request

Problem

Some custom providers' OpenAI-compatible streaming APIs don't return usage information in their responses. This causes two critical issues:

  1. Context usage is not displayed: opencode cannot show token consumption to users
  2. Automatic compact is not triggered: the compact mechanism relies on usage data to know when to fire, so it fails entirely when usage is missing (see the chunk sketch below)
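
For concreteness, a sketch of the two final-chunk shapes in play (field names follow the OpenAI chat-completions streaming format; the token counts are invented). Spec-compliant providers attach usage to a final chunk when the request sets stream_options: { include_usage: true }; the providers this request targets send null or omit the field entirely:

```ts
// Final chunk from a spec-compliant provider: when the request sets
// stream_options: { include_usage: true }, a last chunk arrives with an
// empty choices array and populated usage (counts here are invented).
const compliantFinalChunk = {
  choices: [],
  usage: { prompt_tokens: 812, completion_tokens: 164, total_tokens: 976 },
};

// Final chunk from a provider that never reports usage: the field is null
// or missing entirely, so opencode has nothing to display and nothing to
// feed the compact trigger.
const nonCompliantFinalChunk = {
  choices: [{ delta: {}, finish_reason: "stop" }],
  usage: null,
};
```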

Context

Users are integrating with various custom OpenAI-compatible providers that may not include usage metadata in their streaming response chunks. This is a common scenario for:

  • Self-hosted language model services
  • Custom proxy implementations
  • Alternative LLM providers that don't follow the full OpenAI API specification

Without accurate usage tracking, users lose visibility into their token consumption and, more importantly, the automatic compact feature becomes non-functional, potentially leading to context overflow.

Request

Add a fallback mechanism to locally estimate token usage when the API response doesn't include usage information. This should:

  • Apply primarily to streaming responses (where usage is most commonly missing)
  • Maintain compatibility with providers that do return usage (prefer actual usage when available; see the type sketch after this list)
  • Ensure the automatic compact mechanism continues to work even with estimated data
  • Be reasonably accurate for practical purposes
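
One way to satisfy the compatibility point above is to record where the numbers came from alongside the numbers themselves. A minimal sketch; TokenUsage and its fields are hypothetical names, not existing opencode identifiers:

```ts
// Hypothetical usage shape; opencode's real type may differ. The estimated
// flag lets the compact trigger and the UI consume one shape regardless of
// whether the numbers came from the API or from local counting, and lets
// the UI render estimates distinctly (e.g. "~976 tokens").
interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  estimated: boolean;
}
```

The compact threshold check then stays provider-agnostic: it compares totalTokens against the context window whether or not the value is estimated.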

Proposed Solution

  1. Implement a local token counter that tracks tokens as streaming chunks arrive (sketched after this list)
  2. When API response usage is empty or missing:
    • Use the local estimation as a fallback
    • Log a warning that usage is estimated (not from API)
  3. Keep using actual API usage when it's available (for accuracy)
  4. The estimation should handle both prompt tokens and completion tokens separately
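
A sketch of how the four steps could fit together on the streaming path. Everything here (the class, the method names, the warning text, the 4-chars-per-token constant) is an assumption for illustration, not opencode's actual internals:

```ts
// Hypothetical accumulator fed from the streaming loop.
class StreamingTokenEstimator {
  private completionChars = 0;
  constructor(private promptChars: number) {}

  // Step 1: called for each streamed delta as chunks arrive.
  onDelta(text: string): void {
    this.completionChars += text.length;
  }

  // Step 4: prompt and completion sides estimated separately.
  // Rough heuristic: ~4 characters per token for English-like text;
  // a tiktoken-based count could be swapped in for better accuracy.
  estimate(): { promptTokens: number; completionTokens: number } {
    return {
      promptTokens: Math.ceil(this.promptChars / 4),
      completionTokens: Math.ceil(this.completionChars / 4),
    };
  }
}

// Steps 2 and 3: at stream end, prefer the provider's usage when present;
// otherwise log a warning and fall back to the local estimate.
function finalizeUsage(
  apiUsage: { prompt_tokens: number; completion_tokens: number } | null,
  estimator: StreamingTokenEstimator,
) {
  if (apiUsage) {
    return {
      promptTokens: apiUsage.prompt_tokens,
      completionTokens: apiUsage.completion_tokens,
      estimated: false,
    };
  }
  console.warn("usage missing from provider response; using local estimate");
  return { ...estimator.estimate(), estimated: true };
}
```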

Additional Notes

Token estimation libraries like tiktoken or simple character-based heuristics could be used for the fallback implementation. The estimation doesn't need to be perfect, just accurate enough for the automatic compact functionality to work reliably.
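
If a real tokenizer is worth the dependency, the js-tiktoken package (a pure-JS port of tiktoken) exposes getEncoding; otherwise the character heuristic costs nothing. A sketch assuming js-tiktoken is acceptable as a dependency:

```ts
import { getEncoding } from "js-tiktoken";

// cl100k_base is only an approximation for non-OpenAI models, but for
// deciding when to compact, "close" is good enough.
const enc = getEncoding("cl100k_base");

export function countTokens(text: string): number {
  return enc.encode(text).length;
}

// Zero-dependency alternative: ~4 characters per token for English-like
// text; cheap enough to run on every streamed chunk.
export function countTokensCheap(text: string): number {
  return Math.ceil(text.length / 4);
}
```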

Metadata

Labels

discussion: Used for feature requests, proposals, ideas, etc.
