Problem
The /usage command currently only shows total input/output token counts per model. It does not show how many input tokens were served from the provider prompt cache (cache hits) vs freshly computed. Users cannot verify whether prompt caching is effective without this breakdown.
Proposed Solution
Add a cache breakdown subline below each model line in /usage, showing:
- A 20-segment progress bar for the cache hit ratio
- The percentage of input tokens read from cache
- The absolute numbers: cache read tokens and non-cached (other) tokens
Example:
Session usage
kimi-code/kimi-for-coding input 776.2k output 12.9k total 789.1k
cache ████████████████░░░░ 81.7% hit (634.4k read · 141.8k other)
deepseek-v4-flash input 693.4k output 4.8k total 698.2k
cache ███████████████████░ 92.6% hit (642.2k read · 51.2k other)
The underlying TokenUsage type already tracks inputCacheRead, inputCacheCreation, and inputOther — this change only surfaces that data in the UI.
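As a rough illustration of how the bar in the example could be derived, here is a minimal sketch. The `CacheUsage` interface, `cacheHitRatio`, and `renderBar` are hypothetical names, not the actual code in usage-panel.ts. It assumes the "other" count covers every input token not read from cache, and that the bar rounds to the nearest segment, which matches both example lines.

```typescript
// Hypothetical shape for illustration; the real TokenUsage type may differ.
interface CacheUsage {
  inputCacheRead: number; // input tokens served from the provider prompt cache
  inputOther: number;     // assumed: all remaining (non-cached) input tokens
}

const BAR_SEGMENTS = 20;

// Fraction of input tokens read from cache; 0 when there is no input at all.
function cacheHitRatio(u: CacheUsage): number {
  const total = u.inputCacheRead + u.inputOther;
  return total === 0 ? 0 : u.inputCacheRead / total;
}

// Fixed-width bar. Rounding to the nearest segment is an assumption.
function renderBar(ratio: number, segments: number = BAR_SEGMENTS): string {
  const clamped = Math.min(1, Math.max(0, ratio));
  const filled = Math.round(clamped * segments);
  return "█".repeat(filled) + "░".repeat(segments - filled);
}
```

With the figures from the example (634.4k read, 141.8k other), the ratio is about 0.817, which fills 16 of 20 segments.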
Implementation Notes
- Model names are now padded to max width for clean alignment across multiple models
- Percentage shows one decimal place when not a whole number (e.g., 81.7%)
- The cache line always appears, showing 0% when no input tokens were read from cache
- Total summary line format is unchanged
- Only usage-panel.ts and its tests are changed
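The percentage and alignment rules above could be sketched as follows. `formatPercent` and `padModelNames` are hypothetical helper names for illustration only; rounding to one decimal before checking for a whole number is an assumption about the intended behavior.

```typescript
// One decimal place unless the rounded value is whole (e.g. 81.7%, 0%, 100%).
function formatPercent(ratio: number): string {
  const pct = Math.round(ratio * 1000) / 10; // round to one decimal place
  return Number.isInteger(pct) ? `${pct}%` : `${pct.toFixed(1)}%`;
}

// Pads every model name to the width of the longest one so columns line up.
function padModelNames(names: string[]): string[] {
  const width = Math.max(0, ...names.map((n) => n.length));
  return names.map((n) => n.padEnd(width));
}
```

Padding to the longest name keeps the input/output/total columns aligned without hard-coding a width, at the cost of a first pass over all model names before rendering.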