
Fix token usage accounting buckets#58

Merged
chihsuan merged 10 commits into main from auto/RSM-3704
May 21, 2026

Conversation

@chihsuan
Member

@chihsuan chihsuan commented May 21, 2026

Context

Codex/OpenAI and Claude/Anthropic report cached input differently, which made token totals overstate Codex usage and undercount Claude cache creation.

TL;DR

Split agent token accounting into comparable uncached, cached, cache-created, and output buckets.

Summary

  • Normalize Codex total input into separate uncached and cached input buckets.
  • Preserve Claude uncached, cache-read, cache-creation, and output token buckets.
  • Propagate provider-neutral usage fields through run state, quality entries, and dashboards.
  • Keep legacy input-token payloads compatible, including Claude-style cache fields.
  • Merge current origin/main and fix merged-tree lint and Dialyzer failures.
  • Document the normalized token bucket contract.
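
The bucket split summarized above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the module name `TokenBucketsSketch` and the function `normalize/2` are hypothetical, and the string keys assume the provider field names mentioned in this conversation (`input_tokens`, `cached_input_tokens`, `cache_read_input_tokens`, `cache_creation_input_tokens`, `output_tokens`).

```elixir
defmodule TokenBucketsSketch do
  @moduledoc "Hypothetical helper for illustration; not the module used in this PR."

  # Codex/OpenAI style: input_tokens is the full prompt, cached tokens included,
  # so the uncached bucket is what remains after subtracting the cached part.
  def normalize(:codex, usage) do
    input = Map.get(usage, "input_tokens", 0)
    cached = Map.get(usage, "cached_input_tokens", 0)

    %{
      uncached_input_tokens: max(input - cached, 0),
      cached_input_tokens: cached,
      # Codex has no cache-creation bucket, so this is always zero.
      cache_creation_input_tokens: 0,
      output_tokens: Map.get(usage, "output_tokens", 0)
    }
  end

  # Claude/Anthropic style: input_tokens is already uncached input, and cache
  # reads and cache creation arrive as separate fields.
  def normalize(:claude, usage) do
    %{
      uncached_input_tokens: Map.get(usage, "input_tokens", 0),
      cached_input_tokens: Map.get(usage, "cache_read_input_tokens", 0),
      cache_creation_input_tokens: Map.get(usage, "cache_creation_input_tokens", 0),
      output_tokens: Map.get(usage, "output_tokens", 0)
    }
  end
end

codex =
  TokenBucketsSketch.normalize(:codex, %{
    "input_tokens" => 1_000,
    "cached_input_tokens" => 800,
    "output_tokens" => 50
  })

claude =
  TokenBucketsSketch.normalize(:claude, %{
    "input_tokens" => 200,
    "cache_read_input_tokens" => 800,
    "output_tokens" => 50
  })

# Both providers now report 200 uncached input tokens for the same workload.
IO.inspect(codex.uncached_input_tokens == claude.uncached_input_tokens)
```

With a single input counter, the same two runs would have shown 1,000 versus 200 input tokens, which is the asymmetry the Context section describes.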

Alternatives

  • Keep one input counter with cache labels; rejected because provider semantics would still differ.

Test Plan

  • make all blocked locally: MCP socket open returns {:mcp_socket_open_failed, :eperm}; 113 tests fail and coverage reports 99.09% against the 100% threshold.
  • mix format --check-formatted
  • mix specs.check
  • mix lint
  • MIX_HOME=/private/tmp/symphony-mix-home HEX_HOME=/private/tmp/symphony-hex-home mix dialyzer --format short
  • MIX_ENV=test mix test test/symphony_elixir/app_server_test.exs:18 test/symphony_elixir/claude_code/app_server_test.exs:275 test/symphony_elixir/orchestrator_status_test.exs:836 test/symphony_elixir/orchestrator_status_test.exs:1314 test/symphony_elixir/orchestrator_status_test.exs:1570 test/symphony_elixir/orchestrator_status_test.exs:1659 test/symphony_elixir/orchestrator_status_test.exs:4832 test/symphony_elixir/status_dashboard_snapshot_test.exs test/symphony_elixir/quality_test.exs
  • git diff --check origin/main...HEAD

chihsuan added 4 commits May 21, 2026 19:03
Summary:
- Track uncached input, cached input, cache creation input, output,
  and total token counters through agent updates, run metadata,
  quality entries, dashboards, and presenter payloads.
- Convert Claude usage into the four-bucket shape and split Codex
  total input into uncached and cached buckets while preserving
  legacy counters for compatibility.
- Update dashboard snapshots and focused tests for Codex cache reads,
  Claude cache creation, legacy counter conversion, and partial totals.

Rationale:
- Codex reports cached prompt tokens inside input_tokens while Claude
  reports cache reads separately, so using a single input counter made
  comparable runs look asymmetric.
- Keeping explicit buckets avoids silent provider-specific semantics and
  lets old payloads continue rendering while new runs expose the clearer
  accounting shape.

Tests:
- mix test test/symphony_elixir/claude_code/app_server_test.exs:125 test/symphony_elixir/claude_code/app_server_test.exs:270
- mix test test/symphony_elixir/orchestrator_status_test.exs test/symphony_elixir/extensions_test.exs test/symphony_elixir/quality_test.exs test/symphony_elixir/status_dashboard_snapshot_test.exs
- mix specs.check
- make all (fails: app-server MCP socket setup returns {:mcp_socket_open_failed, :eperm}, causing coverage to miss 100%)

Summary:
- Add an orchestrator regression test that feeds equivalent Codex and
  Claude cache-heavy usage into running entries.
- Assert both providers report comparable uncached input buckets while
  preserving cached, cache-creation, and output buckets.

Rationale:
- Codex reports cached tokens inside input_tokens while Claude reports
  cache reads and creations separately, so this guards the normalized
  four-bucket accounting from regressing.

Tests:
- mix test test/symphony_elixir/claude_code/app_server_test.exs:273 test/symphony_elixir/orchestrator_status_test.exs:1510 test/symphony_elixir/orchestrator_status_test.exs:1562 test/symphony_elixir/quality_test.exs
- mix test test/symphony_elixir/orchestrator_status_test.exs
- mix specs.check
- mix format --check-formatted

Summary:
- Extract Codex total_token_usage from nested event payload metadata.
- Normalize Codex usage into uncached, cached, cache-creation, output,
  and total token buckets before the orchestrator sees it.
- Add a focused regression test for the Codex cached-token split.

Rationale:
- Codex reports input_tokens as full prompt input including cached
  tokens, so downstream provider-neutral accounting needs the split at
  ingestion time.
- Codex has no cache-creation bucket, so the normalized payload emits
  zero for that field while preserving total token metadata.
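
As a rough sketch of the ingestion-time split described in this commit: the nesting path below is an assumption made for illustration (the actual Codex event shape is not shown on this page), and `CodexUsageSketch` and `normalize_event/1` are hypothetical names.

```elixir
defmodule CodexUsageSketch do
  # Hypothetical nesting; the real event layout may differ.
  @usage_path ["payload", "info", "total_token_usage"]

  # Returns {:ok, buckets} when the nested usage map is present, :no_usage otherwise.
  def normalize_event(event) when is_map(event) do
    case get_in(event, @usage_path) do
      usage when is_map(usage) -> {:ok, to_buckets(usage)}
      _missing -> :no_usage
    end
  end

  defp to_buckets(usage) do
    input = Map.get(usage, "input_tokens", 0)
    cached = Map.get(usage, "cached_input_tokens", 0)
    output = Map.get(usage, "output_tokens", 0)

    %{
      uncached_input_tokens: max(input - cached, 0),
      cached_input_tokens: cached,
      # Codex reports no cache creation, so the normalized payload emits zero.
      cache_creation_input_tokens: 0,
      output_tokens: output,
      # Keep the provider's total when present; otherwise fall back to input + output.
      total_tokens: Map.get(usage, "total_tokens", input + output)
    }
  end
end
```

Doing the split before the orchestrator sees the payload means downstream code only ever handles the provider-neutral shape.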

Tests:
- mix test test/symphony_elixir/app_server_test.exs:20
- mix test test/symphony_elixir/claude_code/app_server_test.exs:273 test/symphony_elixir/claude_code/app_server_test.exs:124
- mix test test/symphony_elixir/orchestrator_status_test.exs:1615 test/symphony_elixir/orchestrator_status_test.exs:1480
- mix test test/symphony_elixir/quality_test.exs
- mix test test/symphony_elixir/status_dashboard_snapshot_test.exs
- mix test test/symphony_elixir/extensions_test.exs:1916 test/symphony_elixir/extensions_test.exs:2179
- mix format --check-formatted lib/symphony_elixir/codex/app_server.ex test/symphony_elixir/app_server_test.exs
- mix specs.check
- make all (fails: app-server/agent-runner tests cannot open MCP
  sockets in this sandbox with {:mcp_socket_open_failed, :eperm};
  coverage then reports 99.12% below the 100% threshold)

Summary:
- Treat Claude assistant usage input_tokens as the derived total input
  value while preserving uncached, cached, and cache-creation buckets.
- Return total input deltas from orchestrator token extraction alongside
  the provider-neutral bucket deltas.
- Update the token-accounting docs and spec examples for the four-bucket
  token contract and legacy input_tokens compatibility.

Rationale:
- Direct consumers of input_tokens should not see provider-specific
  semantics now that the explicit bucket fields are the source of truth.
- The docs and spec need to match the implementation so dashboards, run
  history, and quality entries have a consistent token contract.
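
One way to read "derived total input" is as the sum of the three input buckets. The commit message implies this but does not spell it out, so treat the sketch below as an assumption; `TotalInputSketch` and its functions are hypothetical names.

```elixir
defmodule TotalInputSketch do
  # Assumed derivation: total input = uncached + cached + cache-created.
  def total_input(buckets) do
    Map.get(buckets, :uncached_input_tokens, 0) +
      Map.get(buckets, :cached_input_tokens, 0) +
      Map.get(buckets, :cache_creation_input_tokens, 0)
  end

  # Adds a legacy-compatible :input_tokens field derived from the buckets,
  # leaving the explicit bucket fields untouched as the source of truth.
  def with_legacy_input(buckets) do
    Map.put(buckets, :input_tokens, total_input(buckets))
  end
end
```

Under this reading, a consumer that still reads `input_tokens` sees the same meaning for both providers, while new consumers read the explicit buckets.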

Tests:
- mix format --check-formatted
- mix specs.check
- MIX_ENV=test mix test test/symphony_elixir/app_server_test.exs:18 \
  test/symphony_elixir/claude_code/app_server_test.exs:275 \
  test/symphony_elixir/orchestrator_status_test.exs:836 \
  test/symphony_elixir/orchestrator_status_test.exs:1314 \
  test/symphony_elixir/orchestrator_status_test.exs:1570 \
  test/symphony_elixir/status_dashboard_snapshot_test.exs \
  test/symphony_elixir/quality_test.exs
- make all (fails: MCP Unix socket open returns :eperm in this environment)

Copilot AI left a comment

Pull request overview

This PR refactors token usage accounting to make Codex/OpenAI and Claude/Anthropic usage comparable by splitting input tokens into provider-neutral buckets (uncached/new, cached, cache-created) while preserving legacy totals and propagating the new fields through orchestrator state, metadata, UI, and docs.

Changes:

  • Introduces uncached_input_tokens, cached_input_tokens, cache_creation_input_tokens buckets across orchestrator state/totals, quality, presenter/API payloads, and dashboard rendering.
  • Normalizes Codex nested total_token_usage payloads and extends Claude stream-json parsing to include cache-creation tokens.
  • Updates tests and status-dashboard snapshot fixtures to match the new bucketed display/contract.

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| test/symphony_elixir/quality_test.exs | Updates expected token map to include cache_creation_input_tokens. |
| test/symphony_elixir/orchestrator_status_test.exs | Expands token accounting assertions and adds coverage for legacy conversion and provider parity. |
| test/symphony_elixir/extensions_test.exs | Updates API/state payload expectations for new token buckets. |
| test/symphony_elixir/claude_code/app_server_test.exs | Extends Claude event parsing tests for uncached + cache creation buckets. |
| test/symphony_elixir/app_server_test.exs | Adds test for normalizing nested Codex total_token_usage metadata. |
| test/fixtures/status_dashboard_snapshots/watching_issues.snapshot.txt | Updates dashboard snapshot token line to bucketed format. |
| test/fixtures/status_dashboard_snapshots/watching_issues.evidence.md | Updates evidence text to bucketed token format. |
| test/fixtures/status_dashboard_snapshots/super_busy.snapshot.txt | Updates dashboard snapshot token line to bucketed format. |
| test/fixtures/status_dashboard_snapshots/super_busy.evidence.md | Updates evidence text to bucketed token format. |
| test/fixtures/status_dashboard_snapshots/idle.snapshot.txt | Updates dashboard snapshot token line to bucketed format. |
| test/fixtures/status_dashboard_snapshots/idle.evidence.md | Updates evidence text to bucketed token format. |
| test/fixtures/status_dashboard_snapshots/idle_with_dashboard_url.snapshot.txt | Updates dashboard snapshot token line to bucketed format. |
| test/fixtures/status_dashboard_snapshots/idle_with_dashboard_url.evidence.md | Updates evidence text to bucketed token format. |
| test/fixtures/status_dashboard_snapshots/credits_unlimited.snapshot.txt | Updates dashboard snapshot token line and row summary text to bucketed format. |
| test/fixtures/status_dashboard_snapshots/credits_unlimited.evidence.md | Updates evidence text to bucketed token format. |
| test/fixtures/status_dashboard_snapshots/backoff_queue.snapshot.txt | Updates dashboard snapshot token line to bucketed format. |
| test/fixtures/status_dashboard_snapshots/backoff_queue.evidence.md | Updates evidence text to bucketed token format. |
| SPEC.md | Documents the provider-neutral token bucket contract and legacy compatibility fields. |
| lib/symphony_elixir/status_dashboard.ex | Renders bucketed token totals and updates event “usage” humanization formatting. |
| lib/symphony_elixir/quality.ex | Normalizes tokens into the new bucketed structure for quality evaluations. |
| lib/symphony_elixir/orchestrator.ex | Implements bucket-aware token delta extraction, accumulation, persistence, and compatibility aliases. |
| lib/symphony_elixir/one_shot.ex | Extends empty token map to include cache_creation_input_tokens. |
| lib/symphony_elixir/codex/app_server.ex | Normalizes Codex usage payloads (including nested total_token_usage) into bucketed fields. |
| lib/symphony_elixir/claude_code/app_server.ex | Includes cache-creation tokens in Claude usage deltas and turn results. |
| lib/symphony_elixir/audit_log.ex | Logs delta/reported fields for the new token buckets. |
| lib/symphony_elixir/agent_runner.ex | Extends review-agent verdict token payload shape to include new buckets. |
| lib/symphony_elixir_web/presenter.ex | Propagates bucketed tokens through API payload normalization and transcript formatting. |
| lib/symphony_elixir_web/live/quality_live.ex | Shows created-cache tokens in the UI token breakdown. |
| lib/symphony_elixir_web/live/dashboard_live.ex | Updates dashboard token labels/lines to reflect bucketed semantics. |
| docs/token_accounting.md | Adds documentation for provider-neutral token buckets and mapping rules. |
| docs/configuration.md | Updates config docs to describe normalized usage buckets and dashboard display. |

Comment on lines +5882 to +5913
defp get_token_usage(usage, :uncached_input) do
  explicit =
    payload_get(usage, [
      "uncached_input_tokens",
      :uncached_input_tokens,
      "uncachedInputTokens",
      :uncachedInputTokens
    ])

  if is_integer(explicit) do
    explicit
  else
    input =
      payload_get(usage, [
        "input_tokens",
        "prompt_tokens",
        :input_tokens,
        :prompt_tokens,
        :input,
        "promptTokens",
        :promptTokens,
        "inputTokens",
        :inputTokens
      ])

    cached = get_token_usage(usage, :cached_input)

    cond do
      is_integer(input) and is_integer(cached) -> max(input - cached, 0)
      is_integer(input) -> input
      true -> nil
    end

Updated in 9c927cd. The uncached-input fallback now subtracts only cached_input_tokens (Codex/OpenAI-style), while Anthropic cache fields (cache_read_input_tokens/cache_creation_input_tokens) keep input_tokens as uncached when explicit uncached tokens are absent.

Comment on lines +2264 to +2285
cached =
  parse_integer(
    map_value(usage, [
      "cached_input_tokens",
      :cached_input_tokens,
      "cachedInputTokens",
      :cachedInputTokens,
      "cache_read_input_tokens",
      :cache_read_input_tokens
    ])
  )

input =
  parse_integer(
    map_value(usage, [
      "uncached_input_tokens",
      :uncached_input_tokens,
      "uncachedInputTokens",
      :uncachedInputTokens
    ])
  ) || if(is_integer(legacy_input) and is_integer(cached), do: max(legacy_input - cached, 0), else: legacy_input)


Addressed in 9c927cd. format_usage_counts/1 now uses subtraction fallback only with cached_input_tokens; Anthropic-style cache fields no longer reduce the new bucket when uncached_input_tokens is missing.

Summary:
- Treat legacy Claude usage maps with cache-read or cache-creation fields as
  Anthropic-style payloads where input_tokens is already uncached input.
- Keep the Codex/OpenAI subtraction fallback for payloads that report
  cached_input_tokens, and cover the dashboard and orchestrator behavior.
- Remove unreachable fallback clauses flagged by Dialyzer in token usage
  helpers.

Rationale:
- The previous compatibility fallback subtracted cached tokens from
  Anthropic-style input_tokens, which could undercount or clamp new tokens to
  zero in old usage maps.
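
The difference between the previous fallback and the corrected one can be illustrated with a small sketch. It assumes string-keyed usage maps only and uses hypothetical names (`UncachedFallbackSketch`, `previous/1`, `corrected/1`); the real helpers accept more key variants.

```elixir
defmodule UncachedFallbackSketch do
  # Previous behaviour as described in the rationale: any cached count,
  # including Anthropic-style cache reads, was subtracted from input_tokens.
  def previous(usage) do
    input = usage["input_tokens"]
    cached = usage["cached_input_tokens"] || usage["cache_read_input_tokens"]

    if is_integer(input) and is_integer(cached), do: max(input - cached, 0), else: input
  end

  # Corrected behaviour: subtract only for Codex/OpenAI-style cached_input_tokens.
  # Anthropic-style maps keep input_tokens as the uncached value.
  def corrected(usage) do
    explicit = usage["uncached_input_tokens"]
    input = usage["input_tokens"]
    cached = usage["cached_input_tokens"]

    cond do
      is_integer(explicit) -> explicit
      anthropic_style?(usage) -> input
      is_integer(input) and is_integer(cached) -> max(input - cached, 0)
      true -> input
    end
  end

  defp anthropic_style?(usage) do
    Map.has_key?(usage, "cache_read_input_tokens") or
      Map.has_key?(usage, "cache_creation_input_tokens")
  end
end

legacy_claude = %{"input_tokens" => 100, "cache_read_input_tokens" => 800}

# The previous fallback clamps to zero; the corrected one keeps 100.
IO.inspect({UncachedFallbackSketch.previous(legacy_claude),
            UncachedFallbackSketch.corrected(legacy_claude)})
```

For a Codex-style map such as `%{"input_tokens" => 100, "cached_input_tokens" => 80}`, both versions return 20, so only Anthropic-style legacy payloads change.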

Tests:
- mix format --check-formatted
- mix specs.check
- mix lint
- MIX_HOME=/private/tmp/symphony-mix-home HEX_HOME=/private/tmp/symphony-hex-home
  mix dialyzer --format short
- MIX_ENV=test mix test test/symphony_elixir/app_server_test.exs:18
  test/symphony_elixir/claude_code/app_server_test.exs:275
  test/symphony_elixir/orchestrator_status_test.exs:836
  test/symphony_elixir/orchestrator_status_test.exs:1314
  test/symphony_elixir/orchestrator_status_test.exs:1570
  test/symphony_elixir/orchestrator_status_test.exs:1659
  test/symphony_elixir/orchestrator_status_test.exs:4832
  test/symphony_elixir/status_dashboard_snapshot_test.exs
  test/symphony_elixir/quality_test.exs
- make all blocked locally: MCP socket open returns
  {:mcp_socket_open_failed, :eperm}
chihsuan added 3 commits May 21, 2026 22:47
Summary:
- Merge the token-accounting branch with current origin/main.
- Replace two main-branch lint offenders with equivalent code.
- Remove the unreachable Codex stderr-tail nil branch.

Rationale:
- GitHub validates the PR against the branch merged with current main.
- The merged tree exposed Credo and Dialyzer failures that were not
  present on the older branch tip alone.

Tests:
- mix format --check-formatted
- mix lint
- MIX_HOME=/private/tmp/symphony-mix-home HEX_HOME=/private/tmp/symphony-hex-home mix dialyzer --format short
- MIX_ENV=test mix test test/symphony_elixir/app_server_test.exs:18 test/symphony_elixir/claude_code/app_server_test.exs:275 test/symphony_elixir/orchestrator_status_test.exs:836 test/symphony_elixir/orchestrator_status_test.exs:1314 test/symphony_elixir/orchestrator_status_test.exs:1570 test/symphony_elixir/orchestrator_status_test.exs:1659 test/symphony_elixir/orchestrator_status_test.exs:4832 test/symphony_elixir/status_dashboard_snapshot_test.exs test/symphony_elixir/quality_test.exs
- make all (blocked locally by MCP Unix socket {:mcp_socket_open_failed, :eperm}; 113 failures, coverage 99.09%)

Summary:
- Match the recovered command-completion event in the receive pattern.
- Keep the recovery reason assertion on a separate line for lint.

Rationale:
- The prior line-length fix consumed the first app-server message too broadly,
  making the CI coverage shard fail when session_started arrived first.
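
The ordering problem described above is a general property of `receive`: a broad pattern takes whichever matching message is first in the mailbox, while a specific pattern skips earlier messages and waits for the intended one. A minimal sketch, with hypothetical message shapes and names (`ReceiveSketch`, `:app_server_event`, `:command_completed`) that are not taken from the test in this PR:

```elixir
defmodule ReceiveSketch do
  # Too broad: returns the first app-server message, whatever its event type.
  def await_any(timeout \\ 1_000) do
    receive do
      {:app_server_event, event} -> {:ok, event}
    after
      timeout -> :timeout
    end
  end

  # Selective: only a command-completion event matches, so an earlier
  # session_started message stays in the mailbox and is not consumed.
  def await_command_completed(timeout \\ 1_000) do
    receive do
      {:app_server_event, %{event: :command_completed} = event} -> {:ok, event}
    after
      timeout -> :timeout
    end
  end
end

# Simulate the ordering from the failing shard: session_started arrives first.
send(self(), {:app_server_event, %{event: :session_started}})
send(self(), {:app_server_event, %{event: :command_completed, recovered: true}})

{:ok, event} = ReceiveSketch.await_command_completed(100)
IO.inspect(event.recovered)
```

In ExUnit tests, `assert_receive` with a specific pattern gives the same selective behaviour.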

Tests:
- mix format --check-formatted
- mix lint
- MIX_ENV=test mix test test/symphony_elixir/app_server_test.exs:1838 (blocked locally by {:mcp_socket_open_failed, :eperm})
@chihsuan chihsuan merged commit b9d661f into main May 21, 2026
11 checks passed
@chihsuan chihsuan deleted the auto/RSM-3704 branch May 25, 2026 06:00

3 participants