fix: use actual token counts for /stats cost calculation instead of estimates (#106)
Previously, `/stats` expense calculations used estimated input tokens (`body.length / 4`) and `max_tokens` for output, then applied a 20% buffer. This caused logged costs to significantly exceed actual costs.

Changes:
- Capture `completion_tokens` from API responses (streaming + non-streaming)
- Use actual input/output token counts when available, fall back to estimates
- Remove the 20% buffer from the logged cost (the buffer remains in `estimateAmount()` for pre-payment, which is correct)
- Add an `outputTokens` field to the `UsageEntry` type and log entries

The pre-payment 20% buffer in `estimateAmount()` and the 1.5x `BALANCE_CHECK_BUFFER` are intentionally unchanged — those are conservative safeguards for payment, not for reporting.

Closes BlockRunAI#36
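The before/after of the logged-cost calculation can be sketched as follows. The prices and `calcCost` helper here are hypothetical stand-ins (the real function in this repo is `calculateModelCost()`), used only to show the shape of the change:

```typescript
// Hypothetical per-token prices; calcCost stands in for calculateModelCost().
const IN_PRICE = 3e-6;   // $3 per 1M input tokens (assumed)
const OUT_PRICE = 15e-6; // $15 per 1M output tokens (assumed)
const calcCost = (inTok: number, outTok: number) =>
  inTok * IN_PRICE + outTok * OUT_PRICE;

const body = JSON.stringify({ messages: [{ role: "user", content: "hi" }] });
const maxTokens = 4096;
const usage = { prompt_tokens: 12, completion_tokens: 20 }; // from the API response

// Before: estimated input, maxTokens as output, plus a 20% buffer on the logged value.
const loggedBefore = calcCost(Math.ceil(body.length / 4), maxTokens) * 1.2;

// After: actual counts from the response's usage object, no buffer.
const loggedAfter = calcCost(usage.prompt_tokens, usage.completion_tokens);

console.log(loggedBefore > loggedAfter); // the old path overstated the logged cost
```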
**Walkthrough**

The changes introduce output token tracking alongside existing input token tracking and refactor cost calculations to use actual token counts from provider responses instead of estimates based on body length. A 20% buffer previously applied to cost estimates is removed. The logging payload is extended to include output token data when available.
**1bcMax** left a comment:
Fix is correct and well-scoped. Both issues accurately diagnosed: `maxTokens` ≠ `completion_tokens`, and the 1.2x buffer belongs only in pre-payment estimation, not in logged costs. The pre-payment path (`estimateAmount`, `BALANCE_CHECK_BUFFER`) was correctly left untouched. No conflicts with recent main. Merging.
## Problem
Fixes #36
The `/stats` endpoint overstates costs by ~2.5x compared to on-chain charges. Two compounding issues:

**1. `maxTokens` used instead of actual output tokens**

`calculateModelCost()` receives `maxTokens` (the maximum allowed output, e.g. 4096) rather than the actual `completion_tokens` from the response. A response generating 200 tokens is costed as if 4096 were produced.

**2. 20% buffer applied to logged cost**

The pre-payment estimation correctly applies a 1.2x buffer (you overpay slightly because the exact cost is unknown upfront). However, this same buffer was also applied to the logged cost shown in `/stats`, inflating displayed expenses.

## Fix
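To see how the two issues compound, here is a worked example with hypothetical per-token prices and a hypothetical request shape (none of these numbers come from the repo):

```typescript
// Hypothetical prices: $3 / 1M input tokens, $15 / 1M output tokens.
const inputPrice = 3 / 1_000_000;
const outputPrice = 15 / 1_000_000;

// A request with 1,000 input tokens that produced 2,000 output tokens,
// but was sent with maxTokens = 4096.
const actualCost = 1000 * inputPrice + 2000 * outputPrice;
// Old logging: maxTokens costed as output, then a 1.2x buffer on top.
const loggedCost = (1000 * inputPrice + 4096 * outputPrice) * 1.2;

console.log((loggedCost / actualCost).toFixed(1) + "x overstatement"); // ≈ 2.3x
```

The shorter the actual completion relative to `maxTokens`, the larger the overstatement, which is why the observed factor varies around ~2.5x.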
Changes in `src/proxy.ts`:

- Capture `completion_tokens` from upstream responses (both streaming and non-streaming paths) into a new `responseOutputTokens` variable
- Use actual token counts (`responseInputTokens`, `responseOutputTokens`) for cost calculation when available, falling back to estimates only when the upstream provider doesn't return usage data
- `UsageEntry` — log the actual cost, not the pre-payment estimate
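A minimal sketch of the capture-and-fallback logic described above. The variable names (`responseInputTokens`, `responseOutputTokens`) come from the PR; the response shape assumes an OpenAI-style `usage` object, and the actual code in `src/proxy.ts` may differ:

```typescript
// OpenAI-style usage payload (assumed shape).
interface Usage {
  prompt_tokens?: number;
  completion_tokens?: number;
}

let responseInputTokens: number | undefined;
let responseOutputTokens: number | undefined;

// Non-streaming: usage arrives on the final JSON body.
// (Streaming providers typically send it on a final chunk instead.)
function captureUsage(body: { usage?: Usage }): void {
  responseInputTokens = body.usage?.prompt_tokens;
  responseOutputTokens = body.usage?.completion_tokens;
}

// Cost calculation prefers actual counts, falling back to estimates
// only when the provider returned no usage data.
function tokensForCost(estimatedInput: number, maxTokens: number) {
  return {
    input: responseInputTokens ?? estimatedInput,
    output: responseOutputTokens ?? maxTokens,
  };
}
```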
Changes in `src/logger.ts`:

- Add `outputTokens` field to the `UsageEntry` type for richer `/stats` reporting

What was NOT changed:
- `estimateAmount()` — pre-payment 20% buffer is correct and stays
- `BALANCE_CHECK_BUFFER` (1.5x) — balance-checking buffer is correct and stays
- `calculateModelCost()` — the function is correct; it was just receiving wrong inputs

## Testing
- `tsc --noEmit`
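For reference, the logger-side type change can be sketched as follows. Only `outputTokens` is the field this PR adds; the other `UsageEntry` fields shown here are hypothetical placeholders, since the real type lives in `src/logger.ts`:

```typescript
// Hypothetical shape; only `outputTokens` is the field added by this PR.
interface UsageEntry {
  timestamp: number;
  model: string;
  inputTokens: number;
  outputTokens?: number; // new: actual completion_tokens when the provider reports them
  cost: number;          // actual cost, without the pre-payment buffer
}

// Serialize one entry as a JSON log line.
function logUsage(entry: UsageEntry): string {
  return JSON.stringify(entry);
}
```

Keeping `outputTokens` optional preserves the fallback path: entries logged when the provider returned no usage data simply omit the field.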