Fix token usage accounting buckets by chihsuan · Pull Request #58 · Automattic/symphony

chihsuan · 2026-05-21T14:05:49Z

Context

Codex/OpenAI and Claude/Anthropic report cached input differently, which made token totals overstate Codex usage and undercount Claude cache creation.

TL;DR

Split agent token accounting into comparable uncached, cached, cache-created, and output buckets.

Summary

Normalize Codex total input into separate uncached and cached input buckets.
Preserve Claude uncached, cache-read, cache-creation, and output token buckets.
Propagate provider-neutral usage fields through run state, quality entries, and dashboards.
Keep legacy input-token payloads compatible, including Claude-style cache fields.
Merge current origin/main and fix merged-tree lint and Dialyzer failures.
Document the normalized token bucket contract.
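
The bucket split described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the module name `TokenBucketsSketch` and its two functions are hypothetical, and it assumes string-keyed usage maps that use the provider field names mentioned in this thread (`cached_input_tokens` for Codex, `cache_read_input_tokens` and `cache_creation_input_tokens` for Claude).

```elixir
defmodule TokenBucketsSketch do
  @moduledoc false
  # Hypothetical helper, not part of the symphony codebase.

  # Codex/OpenAI style: input_tokens already includes cached_input_tokens,
  # so the uncached bucket is the difference, clamped at zero.
  def from_codex(usage) do
    input = Map.get(usage, "input_tokens", 0)
    cached = Map.get(usage, "cached_input_tokens", 0)

    %{
      uncached_input_tokens: max(input - cached, 0),
      cached_input_tokens: cached,
      # Codex has no cache-creation bucket, so this is always zero.
      cache_creation_input_tokens: 0,
      output_tokens: Map.get(usage, "output_tokens", 0)
    }
  end

  # Claude/Anthropic style: input_tokens is already uncached input and the
  # cache fields are reported separately.
  def from_claude(usage) do
    %{
      uncached_input_tokens: Map.get(usage, "input_tokens", 0),
      cached_input_tokens: Map.get(usage, "cache_read_input_tokens", 0),
      cache_creation_input_tokens: Map.get(usage, "cache_creation_input_tokens", 0),
      output_tokens: Map.get(usage, "output_tokens", 0)
    }
  end
end

# Two equivalent cache-heavy turns, one per provider (made-up numbers).
codex =
  TokenBucketsSketch.from_codex(%{
    "input_tokens" => 1_000,
    "cached_input_tokens" => 800,
    "output_tokens" => 50
  })

claude =
  TokenBucketsSketch.from_claude(%{
    "input_tokens" => 200,
    "cache_read_input_tokens" => 800,
    "cache_creation_input_tokens" => 0,
    "output_tokens" => 50
  })

IO.inspect(codex == claude)
# prints true
```

With a single input counter the same two turns would show 1,000 input tokens for Codex and 200 for Claude, which is the asymmetry the four buckets are meant to remove.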

Alternatives

Keep one input counter with cache labels; rejected because provider semantics would still differ.

Test Plan

make all blocked locally: MCP socket open returns {:mcp_socket_open_failed, :eperm}; 113 tests fail and coverage reports 99.09% against the 100% threshold.
mix format --check-formatted
mix specs.check
mix lint
MIX_HOME=/private/tmp/symphony-mix-home HEX_HOME=/private/tmp/symphony-hex-home mix dialyzer --format short
MIX_ENV=test mix test test/symphony_elixir/app_server_test.exs:18 test/symphony_elixir/claude_code/app_server_test.exs:275 test/symphony_elixir/orchestrator_status_test.exs:836 test/symphony_elixir/orchestrator_status_test.exs:1314 test/symphony_elixir/orchestrator_status_test.exs:1570 test/symphony_elixir/orchestrator_status_test.exs:1659 test/symphony_elixir/orchestrator_status_test.exs:4832 test/symphony_elixir/status_dashboard_snapshot_test.exs test/symphony_elixir/quality_test.exs
git diff --check origin/main...HEAD

Summary:
- Track uncached input, cached input, cache creation input, output, and total token counters through agent updates, run metadata, quality entries, dashboards, and presenter payloads.
- Convert Claude usage into the four-bucket shape and split Codex total input into uncached and cached buckets while preserving legacy counters for compatibility.
- Update dashboard snapshots and focused tests for Codex cache reads, Claude cache creation, legacy counter conversion, and partial totals.
Rationale:
- Codex reports cached prompt tokens inside input_tokens while Claude reports cache reads separately, so using a single input counter made comparable runs look asymmetric.
- Keeping explicit buckets avoids silent provider-specific semantics and lets old payloads continue rendering while new runs expose the clearer accounting shape.
Tests:
- mix test test/symphony_elixir/claude_code/app_server_test.exs:125 test/symphony_elixir/claude_code/app_server_test.exs:270
- mix test test/symphony_elixir/orchestrator_status_test.exs test/symphony_elixir/extensions_test.exs test/symphony_elixir/quality_test.exs test/symphony_elixir/status_dashboard_snapshot_test.exs
- mix specs.check
- make all (fails: app-server MCP socket setup returns {:mcp_socket_open_failed, :eperm}, causing coverage to miss 100%)

Summary:
- Add an orchestrator regression test that feeds equivalent Codex and Claude cache-heavy usage into running entries.
- Assert both providers report comparable uncached input buckets while preserving cached, cache-creation, and output buckets.
Rationale:
- Codex reports cached tokens inside input_tokens while Claude reports cache reads and creations separately, so this guards the normalized four-bucket accounting from regressing.
Tests:
- mix test test/symphony_elixir/claude_code/app_server_test.exs:273 test/symphony_elixir/orchestrator_status_test.exs:1510 test/symphony_elixir/orchestrator_status_test.exs:1562 test/symphony_elixir/quality_test.exs
- mix test test/symphony_elixir/orchestrator_status_test.exs
- mix specs.check
- mix format --check-formatted

Summary:
- Extract Codex total_token_usage from nested event payload metadata.
- Normalize Codex usage into uncached, cached, cache-creation, output, and total token buckets before the orchestrator sees it.
- Add a focused regression test for the Codex cached-token split.
Rationale:
- Codex reports input_tokens as full prompt input including cached tokens, so downstream provider-neutral accounting needs the split at ingestion time.
- Codex has no cache-creation bucket, so the normalized payload emits zero for that field while preserving total token metadata.
Tests:
- mix test test/symphony_elixir/app_server_test.exs:20
- mix test test/symphony_elixir/claude_code/app_server_test.exs:273 test/symphony_elixir/claude_code/app_server_test.exs:124
- mix test test/symphony_elixir/orchestrator_status_test.exs:1615 test/symphony_elixir/orchestrator_status_test.exs:1480
- mix test test/symphony_elixir/quality_test.exs
- mix test test/symphony_elixir/status_dashboard_snapshot_test.exs
- mix test test/symphony_elixir/extensions_test.exs:1916 test/symphony_elixir/extensions_test.exs:2179
- mix format --check-formatted lib/symphony_elixir/codex/app_server.ex test/symphony_elixir/app_server_test.exs
- mix specs.check
- make all (fails: app-server/agent-runner tests cannot open MCP sockets in this sandbox with {:mcp_socket_open_failed, :eperm}; coverage then reports 99.12% below the 100% threshold)
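
Pulling `total_token_usage` out of a nested event payload could look roughly like the sketch below. The real nesting path is not shown in this thread, so the example searches for the key instead of hard-coding a path; the module name `NestedUsageSketch` and the shape of the sample event are assumptions made purely for illustration.

```elixir
defmodule NestedUsageSketch do
  @moduledoc false
  # Hypothetical helper, not part of the symphony codebase.

  # Found it: the current map carries a total_token_usage map directly.
  def find_total_usage(%{"total_token_usage" => usage}) when is_map(usage), do: usage

  # Otherwise search the values of the map depth-first.
  def find_total_usage(payload) when is_map(payload) do
    payload
    |> Map.values()
    |> Enum.find_value(&find_total_usage/1)
  end

  # Lists may wrap further maps, so search their elements too.
  def find_total_usage(payload) when is_list(payload) do
    Enum.find_value(payload, &find_total_usage/1)
  end

  # Scalars and anything else cannot contain usage metadata.
  def find_total_usage(_other), do: nil
end

# Made-up event shape; the real Codex payload layout may differ.
event = %{
  "method" => "event",
  "params" => %{
    "msg" => %{
      "info" => %{
        "total_token_usage" => %{
          "input_tokens" => 1_000,
          "cached_input_tokens" => 800,
          "output_tokens" => 50
        }
      }
    }
  }
}

IO.inspect(NestedUsageSketch.find_total_usage(event))
```

Once the usage map is found, it can be split into buckets at ingestion time, so the orchestrator only ever sees the provider-neutral shape.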

Summary:
- Treat Claude assistant usage input_tokens as the derived total input value while preserving uncached, cached, and cache-creation buckets.
- Return total input deltas from orchestrator token extraction alongside the provider-neutral bucket deltas.
- Update the token-accounting docs and spec examples for the four-bucket token contract and legacy input_tokens compatibility.
Rationale:
- Direct consumers of input_tokens should not see provider-specific semantics now that the explicit bucket fields are the source of truth.
- The docs and spec need to match the implementation so dashboards, run history, and quality entries have a consistent token contract.
Tests:
- mix format --check-formatted
- mix specs.check
- MIX_ENV=test mix test test/symphony_elixir/app_server_test.exs:18 test/symphony_elixir/claude_code/app_server_test.exs:275 test/symphony_elixir/orchestrator_status_test.exs:836 test/symphony_elixir/orchestrator_status_test.exs:1314 test/symphony_elixir/orchestrator_status_test.exs:1570 test/symphony_elixir/status_dashboard_snapshot_test.exs test/symphony_elixir/quality_test.exs
- make all (fails: MCP Unix socket open returns :eperm in this environment)
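
One way to picture bucket deltas returned alongside a total input delta is the sketch below. It rests on two assumptions that this thread does not confirm: that usage is reported as cumulative totals per run, and that the derived `input_tokens` value is the sum of the three input buckets. The module name `TokenDeltaSketch` is hypothetical.

```elixir
defmodule TokenDeltaSketch do
  @moduledoc false
  # Hypothetical helper, not part of the symphony codebase.

  @input_buckets [
    :uncached_input_tokens,
    :cached_input_tokens,
    :cache_creation_input_tokens
  ]
  @buckets @input_buckets ++ [:output_tokens]

  # previous and reported are cumulative bucket totals (assumption).
  def deltas(previous, reported) do
    bucket_deltas =
      Map.new(@buckets, fn key ->
        # Clamp at zero so a counter reset never produces a negative delta.
        {key, max(Map.get(reported, key, 0) - Map.get(previous, key, 0), 0)}
      end)

    # Derived total input delta: the sum of the three input buckets (assumption).
    input_delta =
      @input_buckets
      |> Enum.map(&Map.fetch!(bucket_deltas, &1))
      |> Enum.sum()

    Map.put(bucket_deltas, :input_tokens, input_delta)
  end
end

previous = %{uncached_input_tokens: 100, cached_input_tokens: 400, output_tokens: 20}

reported = %{
  uncached_input_tokens: 150,
  cached_input_tokens: 900,
  cache_creation_input_tokens: 30,
  output_tokens: 45
}

IO.inspect(TokenDeltaSketch.deltas(previous, reported))
```

Here the bucket deltas are 50 uncached, 500 cached, 30 cache-created, and 25 output, and the derived input delta is 580, so a consumer that still reads `input_tokens` sees the same meaning for both providers.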

Copilot

Pull request overview

This PR refactors token usage accounting to make Codex/OpenAI and Claude/Anthropic usage comparable by splitting input tokens into provider-neutral buckets (uncached/new, cached, cache-created) while preserving legacy totals and propagating the new fields through orchestrator state, metadata, UI, and docs.

Changes:

Introduces uncached_input_tokens, cached_input_tokens, cache_creation_input_tokens buckets across orchestrator state/totals, quality, presenter/API payloads, and dashboard rendering.
Normalizes Codex nested total_token_usage payloads and extends Claude stream-json parsing to include cache-creation tokens.
Updates tests and status-dashboard snapshot fixtures to match the new bucketed display/contract.

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 2 comments.

File	Description
test/symphony_elixir/quality_test.exs	Updates expected token map to include `cache_creation_input_tokens`.
test/symphony_elixir/orchestrator_status_test.exs	Expands token accounting assertions and adds coverage for legacy conversion and provider parity.
test/symphony_elixir/extensions_test.exs	Updates API/state payload expectations for new token buckets.
test/symphony_elixir/claude_code/app_server_test.exs	Extends Claude event parsing tests for uncached + cache creation buckets.
test/symphony_elixir/app_server_test.exs	Adds test for normalizing nested Codex `total_token_usage` metadata.
test/fixtures/status_dashboard_snapshots/watching_issues.snapshot.txt	Updates dashboard snapshot token line to bucketed format.
test/fixtures/status_dashboard_snapshots/watching_issues.evidence.md	Updates evidence text to bucketed token format.
test/fixtures/status_dashboard_snapshots/super_busy.snapshot.txt	Updates dashboard snapshot token line to bucketed format.
test/fixtures/status_dashboard_snapshots/super_busy.evidence.md	Updates evidence text to bucketed token format.
test/fixtures/status_dashboard_snapshots/idle.snapshot.txt	Updates dashboard snapshot token line to bucketed format.
test/fixtures/status_dashboard_snapshots/idle.evidence.md	Updates evidence text to bucketed token format.
test/fixtures/status_dashboard_snapshots/idle_with_dashboard_url.snapshot.txt	Updates dashboard snapshot token line to bucketed format.
test/fixtures/status_dashboard_snapshots/idle_with_dashboard_url.evidence.md	Updates evidence text to bucketed token format.
test/fixtures/status_dashboard_snapshots/credits_unlimited.snapshot.txt	Updates dashboard snapshot token line and row summary text to bucketed format.
test/fixtures/status_dashboard_snapshots/credits_unlimited.evidence.md	Updates evidence text to bucketed token format.
test/fixtures/status_dashboard_snapshots/backoff_queue.snapshot.txt	Updates dashboard snapshot token line to bucketed format.
test/fixtures/status_dashboard_snapshots/backoff_queue.evidence.md	Updates evidence text to bucketed token format.
SPEC.md	Documents the provider-neutral token bucket contract and legacy compatibility fields.
lib/symphony_elixir/status_dashboard.ex	Renders bucketed token totals and updates event “usage” humanization formatting.
lib/symphony_elixir/quality.ex	Normalizes tokens into the new bucketed structure for quality evaluations.
lib/symphony_elixir/orchestrator.ex	Implements bucket-aware token delta extraction, accumulation, persistence, and compatibility aliases.
lib/symphony_elixir/one_shot.ex	Extends empty token map to include `cache_creation_input_tokens`.
lib/symphony_elixir/codex/app_server.ex	Normalizes Codex usage payloads (including nested `total_token_usage`) into bucketed fields.
lib/symphony_elixir/claude_code/app_server.ex	Includes cache-creation tokens in Claude usage deltas and turn results.
lib/symphony_elixir/audit_log.ex	Logs delta/reported fields for the new token buckets.
lib/symphony_elixir/agent_runner.ex	Extends review-agent verdict token payload shape to include new buckets.
lib/symphony_elixir_web/presenter.ex	Propagates bucketed tokens through API payload normalization and transcript formatting.
lib/symphony_elixir_web/live/quality_live.ex	Shows created-cache tokens in the UI token breakdown.
lib/symphony_elixir_web/live/dashboard_live.ex	Updates dashboard token labels/lines to reflect bucketed semantics.
docs/token_accounting.md	Adds documentation for provider-neutral token buckets and mapping rules.
docs/configuration.md	Updates config docs to describe normalized usage buckets and dashboard display.

Copilot · 2026-05-21T14:42:53Z

+  defp get_token_usage(usage, :uncached_input) do
+    explicit =
      payload_get(usage, [
-        "input_tokens",
-        "prompt_tokens",
-        :input_tokens,
-        :prompt_tokens,
-        :input,
-        "promptTokens",
-        :promptTokens,
-        "inputTokens",
-        :inputTokens
+        "uncached_input_tokens",
+        :uncached_input_tokens,
+        "uncachedInputTokens",
+        :uncachedInputTokens
      ])

+    if is_integer(explicit) do
+      explicit
+    else
+      input =
+        payload_get(usage, [
+          "input_tokens",
+          "prompt_tokens",
+          :input_tokens,
+          :prompt_tokens,
+          :input,
+          "promptTokens",
+          :promptTokens,
+          "inputTokens",
+          :inputTokens
+        ])
+
+      cached = get_token_usage(usage, :cached_input)
+
+      cond do
+        is_integer(input) and is_integer(cached) -> max(input - cached, 0)
+        is_integer(input) -> input
+        true -> nil
+      end


Updated in 9c927cd. The uncached-input fallback now subtracts only cached_input_tokens (Codex/OpenAI-style), while Anthropic cache fields (cache_read_input_tokens/cache_creation_input_tokens) keep input_tokens as uncached when explicit uncached tokens are absent.
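
The fallback rule described in this reply can be sketched as below. This is a simplified illustration, not the code from 9c927cd: the module name `UncachedFallbackSketch` is hypothetical, and it only handles snake_case string keys, whereas the real helper accepts several key spellings.

```elixir
defmodule UncachedFallbackSketch do
  @moduledoc false
  # Hypothetical helper, not part of the symphony codebase.

  def uncached_input(usage) do
    explicit = usage["uncached_input_tokens"]
    input = usage["input_tokens"]
    # Only the Codex/OpenAI-style field takes part in the subtraction.
    # cache_read_input_tokens is deliberately not consulted here.
    cached = usage["cached_input_tokens"]

    cond do
      # An explicit bucket always wins.
      is_integer(explicit) -> explicit
      # Codex/OpenAI style: input_tokens includes cached tokens.
      is_integer(input) and is_integer(cached) -> max(input - cached, 0)
      # Anthropic style or bare legacy payload: input_tokens is already uncached.
      is_integer(input) -> input
      true -> nil
    end
  end
end

codex_style = %{"input_tokens" => 1_000, "cached_input_tokens" => 800}
claude_style = %{"input_tokens" => 200, "cache_read_input_tokens" => 800}

IO.inspect(UncachedFallbackSketch.uncached_input(codex_style))
# prints 200
IO.inspect(UncachedFallbackSketch.uncached_input(claude_style))
# prints 200
```

If the Claude-style cache read were subtracted as well, the second payload would clamp to `max(200 - 800, 0) = 0`, which is the undercount the fix removes.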

Copilot · 2026-05-21T14:42:57Z

+    cached =
+      parse_integer(
+        map_value(usage, [
+          "cached_input_tokens",
+          :cached_input_tokens,
+          "cachedInputTokens",
+          :cachedInputTokens,
+          "cache_read_input_tokens",
+          :cache_read_input_tokens
+        ])
+      )
+
+    input =
+      parse_integer(
+        map_value(usage, [
+          "uncached_input_tokens",
+          :uncached_input_tokens,
+          "uncachedInputTokens",
+          :uncachedInputTokens
+        ])
+      ) || if(is_integer(legacy_input) and is_integer(cached), do: max(legacy_input - cached, 0), else: legacy_input)
+


Addressed in 9c927cd. format_usage_counts/1 now uses subtraction fallback only with cached_input_tokens; Anthropic-style cache fields no longer reduce the new bucket when uncached_input_tokens is missing.

Summary:
- Treat legacy Claude usage maps with cache-read or cache-creation fields as Anthropic-style payloads where input_tokens is already uncached input.
- Keep the Codex/OpenAI subtraction fallback for payloads that report cached_input_tokens, and cover the dashboard and orchestrator behavior.
- Remove unreachable fallback clauses flagged by Dialyzer in token usage helpers.
Rationale:
- The previous compatibility fallback subtracted cached tokens from Anthropic-style input_tokens, which could undercount or clamp new tokens to zero in old usage maps.
Tests:
- mix format --check-formatted
- mix specs.check
- mix lint
- MIX_HOME=/private/tmp/symphony-mix-home HEX_HOME=/private/tmp/symphony-hex-home mix dialyzer --format short
- MIX_ENV=test mix test test/symphony_elixir/app_server_test.exs:18 test/symphony_elixir/claude_code/app_server_test.exs:275 test/symphony_elixir/orchestrator_status_test.exs:836 test/symphony_elixir/orchestrator_status_test.exs:1314 test/symphony_elixir/orchestrator_status_test.exs:1570 test/symphony_elixir/orchestrator_status_test.exs:1659 test/symphony_elixir/orchestrator_status_test.exs:4832 test/symphony_elixir/status_dashboard_snapshot_test.exs test/symphony_elixir/quality_test.exs
- make all blocked locally: MCP socket open returns {:mcp_socket_open_failed, :eperm}

Agent-Logs-Url: https://github.com/Automattic/symphony/sessions/8ba1a677-73a6-492d-ae5d-a24148faad5b
Co-authored-by: chihsuan <4344253+chihsuan@users.noreply.github.com>

Summary:
- Merge the token-accounting branch with current origin/main.
- Replace two main-branch lint offenders with equivalent code.
- Remove the unreachable Codex stderr-tail nil branch.
Rationale:
- GitHub validates the PR against the branch merged with current main.
- The merged tree exposed Credo and Dialyzer failures that were not present on the older branch tip alone.
Tests:
- mix format --check-formatted
- mix lint
- MIX_HOME=/private/tmp/symphony-mix-home HEX_HOME=/private/tmp/symphony-hex-home mix dialyzer --format short
- MIX_ENV=test mix test test/symphony_elixir/app_server_test.exs:18 test/symphony_elixir/claude_code/app_server_test.exs:275 test/symphony_elixir/orchestrator_status_test.exs:836 test/symphony_elixir/orchestrator_status_test.exs:1314 test/symphony_elixir/orchestrator_status_test.exs:1570 test/symphony_elixir/orchestrator_status_test.exs:1659 test/symphony_elixir/orchestrator_status_test.exs:4832 test/symphony_elixir/status_dashboard_snapshot_test.exs test/symphony_elixir/quality_test.exs
- make all (blocked locally by MCP Unix socket {:mcp_socket_open_failed, :eperm}; 113 failures, coverage 99.09%)

Summary:
- Match the recovered command-completion event in the receive pattern.
- Keep the recovery reason assertion on a separate line for lint.
Rationale:
- The prior line-length fix consumed the first app-server message too broadly, making the CI coverage shard fail when session_started arrived first.
Tests:
- mix format --check-formatted
- mix lint
- MIX_ENV=test mix test test/symphony_elixir/app_server_test.exs:1838 (blocked locally by {:mcp_socket_open_failed, :eperm})

chihsuan added 4 commits May 21, 2026 19:03

chihsuan requested a review from Copilot May 21, 2026 14:06

Copilot started reviewing on behalf of chihsuan May 21, 2026 14:06

Copilot AI reviewed May 21, 2026

Copilot started work on behalf of chihsuan May 21, 2026 14:33

Copilot AI and others added 2 commits May 21, 2026 14:41

fix(tokens): only subtract cached input for codex-style usage

9c927cd

Agent-Logs-Url: https://github.com/Automattic/symphony/sessions/8ba1a677-73a6-492d-ae5d-a24148faad5b
Co-authored-by: chihsuan <4344253+chihsuan@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into auto/RSM-3704

d0f83fe

Copilot finished work on behalf of chihsuan May 21, 2026 14:43

chihsuan added 3 commits May 21, 2026 22:47

Merge remote-tracking branch 'origin/auto/RSM-3704' into auto/RSM-3704

c24b7b6

chihsuan merged commit b9d661f into main May 21, 2026
11 checks passed

chihsuan deleted the auto/RSM-3704 branch May 25, 2026 06:00

chihsuan merged 10 commits into main from auto/RSM-3704

3 participants
