tl;dr
I suspect that OpenCode Go/Zen incorrectly includes the reasoning_tokens count in the output token count when calculating costs for models that use the OpenAI-compatible API.
I believe it is standard to use completion_tokens as the sole source of the output token count when calculating API costs, because reasoning tokens are already included in that count. completion_tokens_details.reasoning_tokens is not a reliable data point, as shown in the example below, and should be handled carefully or ignored completely.
Recommended actions: Please verify whether the calculation is indeed off, as I suspect. If so, please rectify the issue so that users do not get overcharged. Additionally, the internal OpenCode bookkeeping table may need some adjustments to handle cases where the reasoning token count is larger than the output token count (see the extended evidence section).
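For concreteness, here is the difference between the standard calculation and what I suspect is happening, as a small sketch (the per-token rates below are made-up placeholders, not Zen's actual pricing; the token counts are taken from the test request in the Evidence section):

input_rate, output_rate = 1e-06, 4e-06  # hypothetical $/token, for illustration only
prompt_tokens, completion_tokens, reasoning_tokens = 22, 77, 78

# Standard: completion_tokens already includes the reasoning tokens.
cost_standard = prompt_tokens * input_rate + completion_tokens * output_rate

# Suspected: reasoning tokens are added on top of completion_tokens,
# so they are effectively billed twice.
cost_suspected = prompt_tokens * input_rate + (completion_tokens + reasoning_tokens) * output_rate

print(cost_standard, cost_suspected)  # the second number overbills by reasoning_tokens * output_rate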
A related issue: #17566
Evidence (Quick Version)
I've sent a test request and asked the model (Kimi K2.6) to simply respond with the word "Hi". The OpenCode Usage console page reported 77 output tokens and 78 reasoning tokens, resulting in 155 total output tokens billed. It's extremely unlikely that a single "Hi" would require 77 tokens. The more plausible explanation is that the API-reported 77 output tokens (completion_tokens) include reasoning tokens, and that the reported 78 reasoning tokens are inaccurate and higher than they should be. The API-reported upstream_inference_completions_cost is also based purely on completion_tokens.
The same behavior is also present in other models that use an OpenAI-compatible API (e.g., MiMo v2.5).
{
  "id": "gen-1777100177-ykze5vmKKPzbRy7Itoe2",
  "object": "chat.completion",
  "created": 1777100177,
  "model": "moonshotai/kimi-k2.6-20260420",
  "provider": "Moonshot AI",
  "system_fingerprint": "fpv0_d5906b82",
  "choices": [
    {
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop",
      "native_finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "hi",
        "refusal": null,
        "reasoning": "The user wants me to answer with a single word \"hi\". This is a very simple request. I should respond with exactly \"hi\" and nothing else.\n\nWait, let me check the instructions. The user said \"Answer with a single word 'hi'.\" So I should output just: hi\n\nNo punctuation, no extra text, just the word hi.",
        "reasoning_details": [
          {
            "type": "reasoning.text",
            "text": "The user wants me to answer with a single word \"hi\". This is a very simple request. I should respond with exactly \"hi\" and nothing else.\n\nWait, let me check the instructions. The user said \"Answer with a single word 'hi'.\" So I should output just: hi\n\nNo punctuation, no extra text, just the word hi.",
            "format": "unknown",
            "index": 0
          }
        ]
      }
    }
  ],
  "usage": {
    "prompt_tokens": 22,
    "completion_tokens": 77,
    "total_tokens": 99,
    "cost": 0,
    "is_byok": true,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "cache_write_tokens": 0,
      "audio_tokens": 0,
      "video_tokens": 0
    },
    "cost_details": {
      "upstream_inference_cost": 0.0003289,
      "upstream_inference_prompt_cost": 2.09e-05,
      "upstream_inference_completions_cost": 0.000308
    },
    "completion_tokens_details": {
      "reasoning_tokens": 78,
      "image_tokens": 0,
      "audio_tokens": 0
    }
  },
  "cost": "0"
}
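As a quick sanity check on the usage block above (my own back-of-the-envelope arithmetic; the implied per-million rates are not a claim about the provider's actual price list), the upstream completions cost divides into a round per-million rate using completion_tokens alone, but not using completion_tokens + reasoning_tokens:

completion_tokens = 77
reasoning_tokens = 78
completions_cost = 0.000308  # usage.cost_details.upstream_inference_completions_cost

print(completions_cost / completion_tokens * 1e6)                       # ~4.00 USD per 1M tokens
print(completions_cost / (completion_tokens + reasoning_tokens) * 1e6)  # ~1.99 USD per 1M tokens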
Extended Evidence
(These screenshots have also been shared in the OpenCode Discord's zen channel).
Let's take a single message in an OpenCode session as an example. The Usage console (first screenshot) shows a total output of 2,016 tokens, consisting of 1,226 output tokens and 790 reasoning tokens. However, the message log from the local OpenCode database (.local/share/opencode/opencode.db) shows 436 output tokens and 790 reasoning tokens (second screenshot); note that 436 + 790 = 1,226, which presumably corresponds to the API-reported completion_tokens. The cost on the Usage console seems to be based on the 2,016-token count (third screenshot).
It seems that the local OpenCode database correctly treats completion_tokens as a count that already includes the reasoning tokens. However, as shown in the previous section, the reasoning token count can be inaccurate, so cases where reasoning_tokens is larger than completion_tokens need careful handling. Storing negative token.output values would yield the correct total completion tokens when aggregating, but is semantically incoherent. On the other hand, using max(0, completion_tokens - reasoning_tokens) would skew the overall statistics. I'm not sure how this is currently handled in OpenCode; I just want to flag the potential issue for inspection.
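One possible way to keep the bookkeeping consistent (a minimal sketch; the record and field names are hypothetical, not OpenCode's actual schema) is to store the raw API counts as-is, bill and aggregate from completion_tokens alone, and only clamp when deriving the non-reasoning split for display:

from dataclasses import dataclass

@dataclass
class UsageRecord:
    completion_tokens: int  # as reported by the API; already includes reasoning tokens
    reasoning_tokens: int   # unreliable, see the example above (78 > 77)

    @property
    def billable_output_tokens(self) -> int:
        # Billing and aggregation should follow completion_tokens alone.
        return self.completion_tokens

    @property
    def display_output_tokens(self) -> int:
        # Clamped for display so it never goes negative, at the cost of the
        # output/reasoning split no longer summing exactly to completion_tokens.
        return max(0, self.completion_tokens - self.reasoning_tokens)

rec = UsageRecord(completion_tokens=77, reasoning_tokens=78)
assert rec.billable_output_tokens == 77
assert rec.display_output_tokens == 0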
I've confirmed that OpenRouter charges users based on completion_tokens and ignores reasoning_tokens (following the norm). Below is an example in which reasoning_tokens is larger than completion_tokens.
Steps to reproduce
Run this Python script (it requires the requests package and the OPENAI_API_KEY environment variable to be set) and check the OpenCode Zen Usage console page:
import os
import json

import requests

# The API key is read from the environment; the request goes to the
# OpenCode Zen OpenAI-compatible chat completions endpoint.
API_KEY = os.environ.get("OPENAI_API_KEY")
URL = "https://opencode.ai/zen/go/v1/chat/completions"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
payload = {
    "model": "kimi-k2.6",
    "messages": [{"role": "user", "content": 'Hello, this is a test. Answer with a single word "hi".'}],
}

resp = requests.post(URL, headers=headers, json=payload)
resp.raise_for_status()
data = resp.json()
print(json.dumps(data, indent=2))
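If useful, the fields worth comparing against the Usage console can be pulled straight out of the response (field names as in the example response above):

usage = data["usage"]
print("completion_tokens:", usage["completion_tokens"])
print("reasoning_tokens:", usage.get("completion_tokens_details", {}).get("reasoning_tokens"))
print("upstream completions cost:", usage.get("cost_details", {}).get("upstream_inference_completions_cost"))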
Screenshot and/or share link
No response
Operating System
No response
Terminal
No response