Fix token usage accounting buckets #58
Summary:
- Track uncached input, cached input, cache creation input, output,
and total token counters through agent updates, run metadata,
quality entries, dashboards, and presenter payloads.
- Convert Claude usage into the four-bucket shape and split Codex
total input into uncached and cached buckets while preserving
legacy counters for compatibility.
- Update dashboard snapshots and focused tests for Codex cache reads,
Claude cache creation, legacy counter conversion, and partial totals.
Rationale:
- Codex reports cached prompt tokens inside input_tokens while Claude
reports cache reads separately, so using a single input counter made
comparable runs look asymmetric.
- Keeping explicit buckets avoids silent provider-specific semantics and
lets old payloads continue rendering while new runs expose the clearer
accounting shape.
Tests:
- mix test test/symphony_elixir/claude_code/app_server_test.exs:125 test/symphony_elixir/claude_code/app_server_test.exs:270
- mix test test/symphony_elixir/orchestrator_status_test.exs test/symphony_elixir/extensions_test.exs test/symphony_elixir/quality_test.exs test/symphony_elixir/status_dashboard_snapshot_test.exs
- mix specs.check
- make all (fails: app-server MCP socket setup returns {:mcp_socket_open_failed, :eperm}, causing coverage to miss 100%)
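The asymmetry described in the rationale above can be illustrated with a minimal sketch. The module name, function names, and token counts here are hypothetical and are not taken from the PR's implementation; only the usage field names follow the provider conventions mentioned in this thread.

```elixir
defmodule TokenBucketsSketch do
  # Hypothetical helper module for illustration only.

  # Claude/Anthropic style: input_tokens already excludes cache reads
  # and cache creation, which are reported as separate fields.
  def from_claude(usage) do
    %{
      uncached_input_tokens: Map.get(usage, "input_tokens", 0),
      cached_input_tokens: Map.get(usage, "cache_read_input_tokens", 0),
      cache_creation_input_tokens: Map.get(usage, "cache_creation_input_tokens", 0),
      output_tokens: Map.get(usage, "output_tokens", 0)
    }
  end

  # Codex/OpenAI style: input_tokens includes cached prompt tokens,
  # so the uncached bucket is derived by subtraction (clamped at zero).
  def from_codex(usage) do
    input = Map.get(usage, "input_tokens", 0)
    cached = Map.get(usage, "cached_input_tokens", 0)

    %{
      uncached_input_tokens: max(input - cached, 0),
      cached_input_tokens: cached,
      cache_creation_input_tokens: 0,
      output_tokens: Map.get(usage, "output_tokens", 0)
    }
  end
end

# Two equivalent runs: 200 new prompt tokens, 800 cache reads, 50 output tokens.
claude_raw = %{"input_tokens" => 200, "cache_read_input_tokens" => 800, "output_tokens" => 50}
codex_raw = %{"input_tokens" => 1_000, "cached_input_tokens" => 800, "output_tokens" => 50}

# A single raw input counter would show 200 versus 1_000 for the same work.
claude = TokenBucketsSketch.from_claude(claude_raw)
codex = TokenBucketsSketch.from_codex(codex_raw)

IO.inspect(claude == codex) # prints true
```

With explicit buckets, both runs report 200 uncached and 800 cached input tokens, which is the comparison the dashboards need.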
Summary:
- Add an orchestrator regression test that feeds equivalent Codex and
Claude cache-heavy usage into running entries.
- Assert both providers report comparable uncached input buckets while
preserving cached, cache-creation, and output buckets.
Rationale:
- Codex reports cached tokens inside input_tokens while Claude reports
cache reads and creations separately, so this guards the normalized
four-bucket accounting from regressing.
Tests:
- mix test test/symphony_elixir/claude_code/app_server_test.exs:273 test/symphony_elixir/orchestrator_status_test.exs:1510 test/symphony_elixir/orchestrator_status_test.exs:1562 test/symphony_elixir/quality_test.exs
- mix test test/symphony_elixir/orchestrator_status_test.exs
- mix specs.check
- mix format --check-formatted
Summary:
- Extract Codex total_token_usage from nested event payload metadata.
- Normalize Codex usage into uncached, cached, cache-creation, output,
and total token buckets before the orchestrator sees it.
- Add a focused regression test for the Codex cached-token split.
Rationale:
- Codex reports input_tokens as full prompt input including cached
tokens, so downstream provider-neutral accounting needs the split at
ingestion time.
- Codex has no cache-creation bucket, so the normalized payload emits
zero for that field while preserving total token metadata.
Tests:
- mix test test/symphony_elixir/app_server_test.exs:20
- mix test test/symphony_elixir/claude_code/app_server_test.exs:273 test/symphony_elixir/claude_code/app_server_test.exs:124
- mix test test/symphony_elixir/orchestrator_status_test.exs:1615 test/symphony_elixir/orchestrator_status_test.exs:1480
- mix test test/symphony_elixir/quality_test.exs
- mix test test/symphony_elixir/status_dashboard_snapshot_test.exs
- mix test test/symphony_elixir/extensions_test.exs:1916 test/symphony_elixir/extensions_test.exs:2179
- mix format --check-formatted lib/symphony_elixir/codex/app_server.ex test/symphony_elixir/app_server_test.exs
- mix specs.check
- make all (fails: app-server/agent-runner tests cannot open MCP
sockets in this sandbox with {:mcp_socket_open_failed, :eperm};
coverage then reports 99.12% below the 100% threshold)
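As a rough sketch of the ingestion-time split described in this commit, the snippet below pulls a nested total_token_usage map out of an event and normalizes it. The event nesting ("payload" then "info"), the module name, and the function names are assumptions made for illustration; only the total_token_usage key and the bucket names come from this PR.

```elixir
defmodule CodexUsageSketch do
  # Hypothetical module; the nesting path below is an assumed event shape.
  def extract(event) do
    case get_in(event, ["payload", "info", "total_token_usage"]) do
      %{} = usage -> normalize(usage)
      _ -> nil
    end
  end

  # Split Codex total input into uncached and cached buckets.
  # Codex has no cache-creation bucket, so that field is emitted as zero.
  def normalize(usage) do
    input = Map.get(usage, "input_tokens", 0)
    cached = Map.get(usage, "cached_input_tokens", 0)
    output = Map.get(usage, "output_tokens", 0)

    %{
      uncached_input_tokens: max(input - cached, 0),
      cached_input_tokens: cached,
      cache_creation_input_tokens: 0,
      output_tokens: output,
      # Keep the provider-reported total when present.
      total_tokens: Map.get(usage, "total_tokens", input + output)
    }
  end
end

event = %{
  "payload" => %{
    "info" => %{
      "total_token_usage" => %{
        "input_tokens" => 1_000,
        "cached_input_tokens" => 800,
        "output_tokens" => 50,
        "total_tokens" => 1_050
      }
    }
  }
}

IO.inspect(CodexUsageSketch.extract(event))
```

Events without the nested map return nil in this sketch, leaving the caller to decide whether to skip the update.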
Summary:
- Treat Claude assistant usage input_tokens as the derived total input
value while preserving uncached, cached, and cache-creation buckets.
- Return total input deltas from orchestrator token extraction alongside
the provider-neutral bucket deltas.
- Update the token-accounting docs and spec examples for the four-bucket
token contract and legacy input_tokens compatibility.
Rationale:
- Direct consumers of input_tokens should not see provider-specific
semantics now that the explicit bucket fields are the source of truth.
- The docs and spec need to match the implementation so dashboards, run
history, and quality entries have a consistent token contract.
Tests:
- mix format --check-formatted
- mix specs.check
- MIX_ENV=test mix test test/symphony_elixir/app_server_test.exs:18 \
test/symphony_elixir/claude_code/app_server_test.exs:275 \
test/symphony_elixir/orchestrator_status_test.exs:836 \
test/symphony_elixir/orchestrator_status_test.exs:1314 \
test/symphony_elixir/orchestrator_status_test.exs:1570 \
test/symphony_elixir/status_dashboard_snapshot_test.exs \
test/symphony_elixir/quality_test.exs
- make all (fails: MCP Unix socket open returns :eperm in this environment)
Pull request overview
This PR refactors token usage accounting to make Codex/OpenAI and Claude/Anthropic usage comparable by splitting input tokens into provider-neutral buckets (uncached/new, cached, cache-created) while preserving legacy totals and propagating the new fields through orchestrator state, metadata, UI, and docs.
Changes:
- Introduces uncached_input_tokens, cached_input_tokens, and cache_creation_input_tokens buckets across orchestrator state/totals, quality, presenter/API payloads, and dashboard rendering.
- Normalizes Codex nested total_token_usage payloads and extends Claude stream-json parsing to include cache-creation tokens.
- Updates tests and status-dashboard snapshot fixtures to match the new bucketed display/contract.
Reviewed changes
Copilot reviewed 31 out of 31 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| test/symphony_elixir/quality_test.exs | Updates expected token map to include cache_creation_input_tokens. |
| test/symphony_elixir/orchestrator_status_test.exs | Expands token accounting assertions and adds coverage for legacy conversion and provider parity. |
| test/symphony_elixir/extensions_test.exs | Updates API/state payload expectations for new token buckets. |
| test/symphony_elixir/claude_code/app_server_test.exs | Extends Claude event parsing tests for uncached + cache creation buckets. |
| test/symphony_elixir/app_server_test.exs | Adds test for normalizing nested Codex total_token_usage metadata. |
| test/fixtures/status_dashboard_snapshots/watching_issues.snapshot.txt | Updates dashboard snapshot token line to bucketed format. |
| test/fixtures/status_dashboard_snapshots/watching_issues.evidence.md | Updates evidence text to bucketed token format. |
| test/fixtures/status_dashboard_snapshots/super_busy.snapshot.txt | Updates dashboard snapshot token line to bucketed format. |
| test/fixtures/status_dashboard_snapshots/super_busy.evidence.md | Updates evidence text to bucketed token format. |
| test/fixtures/status_dashboard_snapshots/idle.snapshot.txt | Updates dashboard snapshot token line to bucketed format. |
| test/fixtures/status_dashboard_snapshots/idle.evidence.md | Updates evidence text to bucketed token format. |
| test/fixtures/status_dashboard_snapshots/idle_with_dashboard_url.snapshot.txt | Updates dashboard snapshot token line to bucketed format. |
| test/fixtures/status_dashboard_snapshots/idle_with_dashboard_url.evidence.md | Updates evidence text to bucketed token format. |
| test/fixtures/status_dashboard_snapshots/credits_unlimited.snapshot.txt | Updates dashboard snapshot token line and row summary text to bucketed format. |
| test/fixtures/status_dashboard_snapshots/credits_unlimited.evidence.md | Updates evidence text to bucketed token format. |
| test/fixtures/status_dashboard_snapshots/backoff_queue.snapshot.txt | Updates dashboard snapshot token line to bucketed format. |
| test/fixtures/status_dashboard_snapshots/backoff_queue.evidence.md | Updates evidence text to bucketed token format. |
| SPEC.md | Documents the provider-neutral token bucket contract and legacy compatibility fields. |
| lib/symphony_elixir/status_dashboard.ex | Renders bucketed token totals and updates event “usage” humanization formatting. |
| lib/symphony_elixir/quality.ex | Normalizes tokens into the new bucketed structure for quality evaluations. |
| lib/symphony_elixir/orchestrator.ex | Implements bucket-aware token delta extraction, accumulation, persistence, and compatibility aliases. |
| lib/symphony_elixir/one_shot.ex | Extends empty token map to include cache_creation_input_tokens. |
| lib/symphony_elixir/codex/app_server.ex | Normalizes Codex usage payloads (including nested total_token_usage) into bucketed fields. |
| lib/symphony_elixir/claude_code/app_server.ex | Includes cache-creation tokens in Claude usage deltas and turn results. |
| lib/symphony_elixir/audit_log.ex | Logs delta/reported fields for the new token buckets. |
| lib/symphony_elixir/agent_runner.ex | Extends review-agent verdict token payload shape to include new buckets. |
| lib/symphony_elixir_web/presenter.ex | Propagates bucketed tokens through API payload normalization and transcript formatting. |
| lib/symphony_elixir_web/live/quality_live.ex | Shows created-cache tokens in the UI token breakdown. |
| lib/symphony_elixir_web/live/dashboard_live.ex | Updates dashboard token labels/lines to reflect bucketed semantics. |
| docs/token_accounting.md | Adds documentation for provider-neutral token buckets and mapping rules. |
| docs/configuration.md | Updates config docs to describe normalized usage buckets and dashboard display. |
```elixir
defp get_token_usage(usage, :uncached_input) do
  # Prefer an explicitly reported uncached bucket when one is present.
  explicit =
    payload_get(usage, [
      "uncached_input_tokens",
      :uncached_input_tokens,
      "uncachedInputTokens",
      :uncachedInputTokens
    ])

  if is_integer(explicit) do
    explicit
  else
    # Fall back to the legacy total-input counter.
    input =
      payload_get(usage, [
        "input_tokens",
        "prompt_tokens",
        :input_tokens,
        :prompt_tokens,
        :input,
        "promptTokens",
        :promptTokens,
        "inputTokens",
        :inputTokens
      ])

    cached = get_token_usage(usage, :cached_input)

    cond do
      # As reviewed: subtracts any cached count from the legacy input.
      is_integer(input) and is_integer(cached) -> max(input - cached, 0)
      is_integer(input) -> input
      true -> nil
    end
  end
end
```
Updated in 9c927cd. The uncached-input fallback now subtracts only cached_input_tokens (Codex/OpenAI-style), while Anthropic cache fields (cache_read_input_tokens/cache_creation_input_tokens) keep input_tokens as uncached when explicit uncached tokens are absent.
```elixir
cached =
  parse_integer(
    map_value(usage, [
      "cached_input_tokens",
      :cached_input_tokens,
      "cachedInputTokens",
      :cachedInputTokens,
      "cache_read_input_tokens",
      :cache_read_input_tokens
    ])
  )

input =
  parse_integer(
    map_value(usage, [
      "uncached_input_tokens",
      :uncached_input_tokens,
      "uncachedInputTokens",
      :uncachedInputTokens
    ])
  ) || if(is_integer(legacy_input) and is_integer(cached), do: max(legacy_input - cached, 0), else: legacy_input)
```
Addressed in 9c927cd. format_usage_counts/1 now uses subtraction fallback only with cached_input_tokens; Anthropic-style cache fields no longer reduce the new bucket when uncached_input_tokens is missing.
Summary:
- Treat legacy Claude usage maps with cache-read or cache-creation fields as
Anthropic-style payloads where input_tokens is already uncached input.
- Keep the Codex/OpenAI subtraction fallback for payloads that report
cached_input_tokens, and cover the dashboard and orchestrator behavior.
- Remove unreachable fallback clauses flagged by Dialyzer in token usage
helpers.
Rationale:
- The previous compatibility fallback subtracted cached tokens from
Anthropic-style input_tokens, which could undercount or clamp new tokens to
zero in old usage maps.
Tests:
- mix format --check-formatted
- mix specs.check
- mix lint
- MIX_HOME=/private/tmp/symphony-mix-home HEX_HOME=/private/tmp/symphony-hex-home
mix dialyzer --format short
- MIX_ENV=test mix test test/symphony_elixir/app_server_test.exs:18
test/symphony_elixir/claude_code/app_server_test.exs:275
test/symphony_elixir/orchestrator_status_test.exs:836
test/symphony_elixir/orchestrator_status_test.exs:1314
test/symphony_elixir/orchestrator_status_test.exs:1570
test/symphony_elixir/orchestrator_status_test.exs:1659
test/symphony_elixir/orchestrator_status_test.exs:4832
test/symphony_elixir/status_dashboard_snapshot_test.exs
test/symphony_elixir/quality_test.exs
- make all blocked locally: MCP socket open returns
{:mcp_socket_open_failed, :eperm}
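The corrected fallback rule from this commit can be sketched as follows. This is a simplified illustration under stated assumptions, not the code from 9c927cd: the module and function names are hypothetical, and only string keys are checked, whereas the real helpers also accept atom and camelCase variants.

```elixir
defmodule UncachedFallbackSketch do
  # Presence of either field marks the map as Anthropic-style in this sketch.
  @anthropic_keys ["cache_read_input_tokens", "cache_creation_input_tokens"]

  def uncached_input(usage) do
    explicit = Map.get(usage, "uncached_input_tokens")
    input = Map.get(usage, "input_tokens")
    cached = Map.get(usage, "cached_input_tokens")

    cond do
      # An explicit bucket always wins.
      is_integer(explicit) -> explicit
      # Anthropic-style: input_tokens is already uncached, so no subtraction.
      anthropic_style?(usage) -> input
      # Codex/OpenAI-style: input_tokens includes cached tokens.
      is_integer(input) and is_integer(cached) -> max(input - cached, 0)
      # Otherwise pass through the legacy counter (may be nil).
      true -> input
    end
  end

  defp anthropic_style?(usage) do
    Enum.any?(@anthropic_keys, &Map.has_key?(usage, &1))
  end
end

# Anthropic-style legacy map: input_tokens stays at 100.
IO.inspect(UncachedFallbackSketch.uncached_input(%{"input_tokens" => 100, "cache_read_input_tokens" => 900}))

# Codex/OpenAI-style legacy map: 1_000 - 900 leaves 100 uncached tokens.
IO.inspect(UncachedFallbackSketch.uncached_input(%{"input_tokens" => 1_000, "cached_input_tokens" => 900}))
```

Under the previous fallback, the first map would have been clamped to zero, because the 900 cache-read tokens were subtracted from an input_tokens value that never included them.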
Agent-Logs-Url: https://github.com/Automattic/symphony/sessions/8ba1a677-73a6-492d-ae5d-a24148faad5b
Co-authored-by: chihsuan <4344253+chihsuan@users.noreply.github.com>
Summary:
- Merge the token-accounting branch with current origin/main.
- Replace two main-branch lint offenders with equivalent code.
- Remove the unreachable Codex stderr-tail nil branch.
Rationale:
- GitHub validates the PR against the branch merged with current main.
- The merged tree exposed Credo and Dialyzer failures that were not
present on the older branch tip alone.
Tests:
- mix format --check-formatted
- mix lint
- MIX_HOME=/private/tmp/symphony-mix-home HEX_HOME=/private/tmp/symphony-hex-home mix dialyzer --format short
- MIX_ENV=test mix test test/symphony_elixir/app_server_test.exs:18 test/symphony_elixir/claude_code/app_server_test.exs:275 test/symphony_elixir/orchestrator_status_test.exs:836 test/symphony_elixir/orchestrator_status_test.exs:1314 test/symphony_elixir/orchestrator_status_test.exs:1570 test/symphony_elixir/orchestrator_status_test.exs:1659 test/symphony_elixir/orchestrator_status_test.exs:4832 test/symphony_elixir/status_dashboard_snapshot_test.exs test/symphony_elixir/quality_test.exs
- make all (blocked locally by MCP Unix socket {:mcp_socket_open_failed, :eperm}; 113 failures, coverage 99.09%)
Summary:
- Match the recovered command-completion event in the receive pattern.
- Keep the recovery reason assertion on a separate line for lint.
Rationale:
- The prior line-length fix consumed the first app-server message too broadly,
making the CI coverage shard fail when session_started arrived first.
Tests:
- mix format --check-formatted
- mix lint
- MIX_ENV=test mix test test/symphony_elixir/app_server_test.exs:1838 (blocked locally by {:mcp_socket_open_failed, :eperm})
Context
Codex/OpenAI and Claude/Anthropic report cached input differently, which made token totals overstate Codex usage and undercount Claude cache creation.
TL;DR
Split agent token accounting into comparable uncached, cached, cache-created, and output buckets.
Summary
- Merge current origin/main and fix merged-tree lint and Dialyzer failures.
Alternatives
Test Plan
- make all blocked locally: MCP socket open returns {:mcp_socket_open_failed, :eperm}; 113 tests fail and coverage reports 99.09% against the 100% threshold.
- mix format --check-formatted
- mix specs.check
- mix lint
- MIX_HOME=/private/tmp/symphony-mix-home HEX_HOME=/private/tmp/symphony-hex-home mix dialyzer --format short
- MIX_ENV=test mix test test/symphony_elixir/app_server_test.exs:18 test/symphony_elixir/claude_code/app_server_test.exs:275 test/symphony_elixir/orchestrator_status_test.exs:836 test/symphony_elixir/orchestrator_status_test.exs:1314 test/symphony_elixir/orchestrator_status_test.exs:1570 test/symphony_elixir/orchestrator_status_test.exs:1659 test/symphony_elixir/orchestrator_status_test.exs:4832 test/symphony_elixir/status_dashboard_snapshot_test.exs test/symphony_elixir/quality_test.exs
- git diff --check origin/main...HEAD