tl;dr
I suspect that OpenCode Go/Zen incorrectly includes the reasoning_tokens count in the output token count when calculating costs for models that use the OpenAI-compatible API.
I believe it is standard to use completion_tokens as the sole source of the output token count when calculating API costs, because reasoning tokens are already included in that count. completion_tokens_details.reasoning_tokens is not a reliable data point, as shown in the example below, and should be handled carefully or ignored completely.
Recommended actions: Please verify whether the calculation is indeed off, as I suspect. If so, please rectify the issue so that users do not get overcharged. Additionally, the internal OpenCode bookkeeping table may need some adjustments to handle cases where the reasoning token count is larger than the output token count (see the extended evidence section).
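For concreteness, here is the difference between the standard calculation and what I suspect is happening, as a small sketch (the per-token rates below are made-up placeholders, not Zen's actual pricing; the token counts are taken from the test request in the Evidence section):

input_rate, output_rate = 1e-06, 4e-06  # hypothetical $/token, for illustration only
prompt_tokens, completion_tokens, reasoning_tokens = 22, 77, 78

# Standard: completion_tokens already includes the reasoning tokens.
cost_standard = prompt_tokens * input_rate + completion_tokens * output_rate

# Suspected: reasoning tokens are added on top of completion_tokens,
# so they are effectively billed twice.
cost_suspected = prompt_tokens * input_rate + (completion_tokens + reasoning_tokens) * output_rate

print(cost_standard, cost_suspected)  # the second number overbills by reasoning_tokens * output_rate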
A related issue: #17566
Evidence (Quick Version)
I've sent a test request and asked the model (Kimi K2.6) to simply respond with the word "Hi". The OpenCode Usage console page reported 77 output tokens and 78 reasoning tokens, resulting in 155 total output tokens billed. It's extremely unlikely that a single "Hi" would require 77 tokens. The more plausible explanation is that the API-reported 77 output tokens (completion_tokens) include reasoning tokens, and that the reported 78 reasoning tokens are inaccurate and higher than they should be. The API-reported upstream_inference_completions_cost is also based purely on completion_tokens.
The same behavior is also present in other models that use an OpenAI-compatible API (e.g., MiMo v2.5).
{
  "id": "gen-1777100177-ykze5vmKKPzbRy7Itoe2",
  "object": "chat.completion",
  "created": 1777100177,
  "model": "moonshotai/kimi-k2.6-20260420",
  "provider": "Moonshot AI",
  "system_fingerprint": "fpv0_d5906b82",
  "choices": [
    {
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop",
      "native_finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "hi",
        "refusal": null,
        "reasoning": "The user wants me to answer with a single word \"hi\". This is a very simple request. I should respond with exactly \"hi\" and nothing else.\n\nWait, let me check the instructions. The user said \"Answer with a single word 'hi'.\" So I should output just: hi\n\nNo punctuation, no extra text, just the word hi.",
        "reasoning_details": [
          {
            "type": "reasoning.text",
            "text": "The user wants me to answer with a single word \"hi\". This is a very simple request. I should respond with exactly \"hi\" and nothing else.\n\nWait, let me check the instructions. The user said \"Answer with a single word 'hi'.\" So I should output just: hi\n\nNo punctuation, no extra text, just the word hi.",
            "format": "unknown",
            "index": 0
          }
        ]
      }
    }
  ],
  "usage": {
    "prompt_tokens": 22,
    "completion_tokens": 77,
    "total_tokens": 99,
    "cost": 0,
    "is_byok": true,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "cache_write_tokens": 0,
      "audio_tokens": 0,
      "video_tokens": 0
    },
    "cost_details": {
      "upstream_inference_cost": 0.0003289,
      "upstream_inference_prompt_cost": 2.09e-05,
      "upstream_inference_completions_cost": 0.000308
    },
    "completion_tokens_details": {
      "reasoning_tokens": 78,
      "image_tokens": 0,
      "audio_tokens": 0
    }
  },
  "cost": "0"
}
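As a quick sanity check on the usage block above (my own back-of-the-envelope arithmetic; the implied per-million rates are not a claim about the provider's actual price list), the upstream completions cost divides into a round per-million rate using completion_tokens alone, but not using completion_tokens + reasoning_tokens:

completion_tokens = 77
reasoning_tokens = 78
completions_cost = 0.000308  # usage.cost_details.upstream_inference_completions_cost

print(completions_cost / completion_tokens * 1e6)                       # ~4.00 USD per 1M tokens
print(completions_cost / (completion_tokens + reasoning_tokens) * 1e6)  # ~1.99 USD per 1M tokens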
Extended Evidence
(These screenshots have also been shared in the OpenCode Discord's zen channel).
Let's take a single message in an OpenCode session as an example. The Usage console (first screenshot) shows a total output of 2,016 tokens, consisting of 1,226 output tokens and 790 reasoning tokens. However, the message log from the local OpenCode database (.local/share/opencode/opencode.db) shows 436 output tokens and 790 reasoning tokens (second screenshot); note that 436 + 790 = 1,226, which presumably corresponds to the API-reported completion_tokens. The cost on the Usage console seems to be based on the 2,016-token count (third screenshot).
It seems that the local OpenCode database correctly treats completion_tokens as a count that already includes the reasoning tokens. However, as shown in the previous section, the reasoning token count can be inaccurate, so cases where reasoning_tokens is larger than completion_tokens need careful handling. Storing negative token.output values would yield the correct total completion tokens when aggregating, but is semantically incoherent. On the other hand, using max(0, completion_tokens - reasoning_tokens) would skew the overall statistics. I'm not sure how this is currently handled in OpenCode; I just want to flag the potential issue for inspection.
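One possible way to keep the bookkeeping consistent (a minimal sketch; the record and field names are hypothetical, not OpenCode's actual schema) is to store the raw API counts as-is, bill and aggregate from completion_tokens alone, and only clamp when deriving the non-reasoning split for display:

from dataclasses import dataclass

@dataclass
class UsageRecord:
    completion_tokens: int  # as reported by the API; already includes reasoning tokens
    reasoning_tokens: int   # unreliable, see the example above (78 > 77)

    @property
    def billable_output_tokens(self) -> int:
        # Billing and aggregation should follow completion_tokens alone.
        return self.completion_tokens

    @property
    def display_output_tokens(self) -> int:
        # Clamped for display so it never goes negative, at the cost of the
        # output/reasoning split no longer summing exactly to completion_tokens.
        return max(0, self.completion_tokens - self.reasoning_tokens)

rec = UsageRecord(completion_tokens=77, reasoning_tokens=78)
assert rec.billable_output_tokens == 77
assert rec.display_output_tokens == 0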
I've confirmed that OpenRouter charges users based on completion_tokens and ignores reasoning_tokens (following the norm). Below is an example in which reasoning_tokens is larger than completion_tokens.
Steps to reproduce
Run this Python script (it requires the requests package and the OPENAI_API_KEY environment variable to be set) and check the OpenCode Zen Usage console page:
import os
import json

import requests

# The API key is read from the environment; the request goes to the
# OpenCode Zen OpenAI-compatible chat completions endpoint.
API_KEY = os.environ.get("OPENAI_API_KEY")
URL = "https://opencode.ai/zen/go/v1/chat/completions"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
payload = {
    "model": "kimi-k2.6",
    "messages": [{"role": "user", "content": 'Hello, this is a test. Answer with a single word "hi".'}],
}

resp = requests.post(URL, headers=headers, json=payload)
resp.raise_for_status()
data = resp.json()
print(json.dumps(data, indent=2))
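If useful, the fields worth comparing against the Usage console can be pulled straight out of the response (field names as in the example response above):

usage = data["usage"]
print("completion_tokens:", usage["completion_tokens"])
print("reasoning_tokens:", usage.get("completion_tokens_details", {}).get("reasoning_tokens"))
print("upstream completions cost:", usage.get("cost_details", {}).get("upstream_inference_completions_cost"))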
Screenshot and/or share link
No response
Operating System
No response
Terminal
No response