Add provider-specific OPTIONAL fields to usage.jsonl + drop spec-requires framing by riddim-developer-bot[bot] · Pull Request #5 · RiddimSoftware/groove

riddim-developer-bot · 2026-05-27T20:52:06Z

Summary

Previously the backfilled usage.jsonl dropped three high-signal cost fields:

Codex per-window quota readouts (the most visceral cost-as-it-happens metric)
Codex reasoning-vs-visible output split
Claude cache-tier split (5m vs 1h ephemeral cache writes)

The cost-telemetry spec allows OPTIONAL field additions without bumping schemaVersion (per its own §6.3), so this PR adds them. The bake path is now lossless for every signal the cost analysis actually uses.

Spec additions (`specs/symphony-cost-telemetry-extension/SPEC.md`)

§5.2.1 Input-Token Breakdown

inputUncachedTokens, inputCachedReadTokens, inputCacheWriteTokens
inputCacheWriteEphemeral5mTokens, inputCacheWriteEphemeral1hTokens (Anthropic-only; other writers MUST omit)

§5.2.2 Output-Token Breakdown

outputVisibleTokens, outputReasoningTokens (reasoning is Codex/o-series-only)

§5.2.3 Quota Sample

"quota": {
  "planType": "pro",
  "windows": [
    { "label": "primary",   "windowMinutes": 300,   "usedPercent": 64, "resetsAt": 1779863673 },
    { "label": "secondary", "windowMinutes": 10080, "usedPercent": 57, "resetsAt": 1780187884 }
  ]
}

Generic shape — Codex emits primary (5h) and secondary (7d), but any provider with any number of rate-limit windows fits the structure.
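
To illustrate what "generic over windows" can mean for a consumer, here is a minimal sketch of a renderer that walks whatever windows a quota sample carries instead of looking for fixed labels. The names `formatWindowLength` and `renderQuota` are hypothetical and are not taken from the package. Treating `resetsAt` as optional Unix seconds is an assumption based on the sample values above.

```javascript
// Hypothetical helpers, not the package's actual code.

// Turn a window length in minutes into a short label such as "5h" or "7d".
function formatWindowLength(minutes) {
  if (minutes % 1440 === 0) return `${minutes / 1440}d`;
  if (minutes % 60 === 0) return `${minutes / 60}h`;
  return `${minutes}m`;
}

// Render every window the provider reported, regardless of its label.
function renderQuota(quota) {
  const lines = [`plan: ${quota.planType}`];
  for (const w of quota.windows ?? []) {
    // Assumption: resetsAt, when present, is a Unix timestamp in seconds.
    const reset =
      w.resetsAt == null
        ? ""
        : `, resets ${new Date(w.resetsAt * 1000).toISOString()}`;
    lines.push(
      `${w.label} (${formatWindowLength(w.windowMinutes)}): ${w.usedPercent}% used${reset}`
    );
  }
  return lines.join("\n");
}
```

With the Codex sample above, the two windows would be labelled `primary (5h)` and `secondary (7d)`; a provider reporting a single window or four windows would go through the same loop.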

§5.3 Semantics adds SHOULD-sum relations across the breakdown buckets.
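
The PR text does not quote the exact relations, so the following is only a sketch of how a reader of usage.jsonl might check them. It assumes each total should equal the sum of its breakdown parts (input total = uncached + cached-read + cache-write, cache-write = 5m + 1h, output total = visible + reasoning). `SUM_RELATIONS` and `checkBreakdownSums` are hypothetical names. Because the relations are SHOULD rather than MUST, the sketch returns warnings instead of throwing.

```javascript
// Assumed relations, not quoted from the spec: each total should equal
// the sum of whichever of its breakdown parts the record carries.
const SUM_RELATIONS = [
  {
    total: "inputTokens",
    parts: ["inputUncachedTokens", "inputCachedReadTokens", "inputCacheWriteTokens"],
  },
  {
    total: "inputCacheWriteTokens",
    parts: ["inputCacheWriteEphemeral5mTokens", "inputCacheWriteEphemeral1hTokens"],
  },
  {
    total: "outputTokens",
    parts: ["outputVisibleTokens", "outputReasoningTokens"],
  },
];

// Returns a list of human-readable warnings; an empty list means every
// relation that could be checked held.
function checkBreakdownSums(record) {
  const warnings = [];
  for (const { total, parts } of SUM_RELATIONS) {
    const present = parts.filter((p) => typeof record[p] === "number");
    // Nothing to compare if the total or all of the parts are absent.
    if (typeof record[total] !== "number" || present.length === 0) continue;
    // Omitted OPTIONAL parts are treated as zero.
    const sum = present.reduce((acc, p) => acc + record[p], 0);
    if (sum !== record[total]) {
      warnings.push(`${total}=${record[total]} but parts sum to ${sum}`);
    }
  }
  return warnings;
}
```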

Implementation

transcripts/codex.mjs emits per-turn quota samples in the spec's windows shape directly (no lossy reshape).
transcript-to-usage.mjs emits every breakdown field the transcript carries plus the quota object.
usage-aggregator.mjs prefers the breakdown fields when present; falls back to the REQUIRED inputTokens/outputTokens totals when not.
bin/llm-cost.mjs quota printer is now generic over windows (renders every label the provider reports, not hard-coded to primary/secondary).
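
The prefer-breakdown, fall-back-to-totals behaviour described for the aggregator could look roughly like the sketch below. This is not the code in usage-aggregator.mjs; `inputBuckets` and `aggregateInput` are hypothetical names, and the `unsplit` bucket for records that carry only the REQUIRED total is a design choice made for this example.

```javascript
const INPUT_PARTS = [
  "inputUncachedTokens",
  "inputCachedReadTokens",
  "inputCacheWriteTokens",
];

// Split one record's input tokens into buckets (hypothetical helper).
function inputBuckets(record) {
  const hasBreakdown = INPUT_PARTS.some((k) => typeof record[k] === "number");
  if (!hasBreakdown) {
    // Fallback: only the REQUIRED total is known, so keep it unattributed.
    return { uncached: 0, cachedRead: 0, cacheWrite: 0, unsplit: record.inputTokens ?? 0 };
  }
  // Preferred path: use the breakdown, treating omitted fields as zero.
  return {
    uncached: record.inputUncachedTokens ?? 0,
    cachedRead: record.inputCachedReadTokens ?? 0,
    cacheWrite: record.inputCacheWriteTokens ?? 0,
    unsplit: 0,
  };
}

// Sum the buckets across a list of usage records.
function aggregateInput(records) {
  const totals = { uncached: 0, cachedRead: 0, cacheWrite: 0, unsplit: 0 };
  for (const record of records) {
    const b = inputBuckets(record);
    for (const key of Object.keys(totals)) totals[key] += b[key];
  }
  return totals;
}
```

Keeping a separate `unsplit` bucket avoids guessing how an older record's total divides between cached and uncached tokens. An aggregator could equally attribute it all to one bucket if its cost model calls for that.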

Framing fixes folded in

The broken "Symphony Telemetry Extension Spec" framing for the workspace convention was still on main. The per-issue-workspace requirement is in OpenAI Symphony's parent SPEC.md §4.1.4; there is no extension spec for it. This PR re-applies the correction that got lost when its prior worktree was torn down:

DEFAULT_CWD_PATTERN broadened to match both spec-default <system-temp>/symphony_workspaces/<ID> and the common in-repo <repo>/.symphony/workspaces/<ID>.
Issue-ID character class widened from [A-Z]+-\d+ (Linear-specific) to [A-Za-z0-9._-]+ (matches the spec's workspace_key sanitization rule).
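
As a rough illustration of the two changes above, the sketch below matches both workspace layouts and uses the widened issue-ID character class. The expression is a guess at the general shape, not the actual value of DEFAULT_CWD_PATTERN, and `issueIdFromCwd` is a hypothetical helper. It assumes POSIX-style `/` separators.

```javascript
// Widened issue-ID class from the PR description.
const ISSUE_ID = "[A-Za-z0-9._-]+";

// Illustrative only: accepts .../symphony_workspaces/<ID> and
// .../.symphony/workspaces/<ID>, optionally followed by deeper paths.
const WORKSPACE_CWD = new RegExp(
  `(?:^|/)(?:symphony_workspaces|\\.symphony/workspaces)/(${ISSUE_ID})(?:/|$)`
);

// Extract the issue ID from a working directory, or null if it is not
// inside a recognised workspace.
function issueIdFromCwd(cwd) {
  const match = WORKSPACE_CWD.exec(cwd);
  return match ? match[1] : null;
}
```

Under this sketch, `/tmp/symphony_workspaces/EPAC-1940` yields `EPAC-1940`, and an ID such as `gh_issue.12`, which the old `[A-Z]+-\d+` class would have rejected, is also accepted.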

README prose changes (root README + package README) — per the user's explicit guidance:

No prose claims that llm-cost-attribution REQUIRES any extension spec.
The usage.jsonl bake feature is presented as built-in; spec interop with other tools is mentioned only as an optional side-benefit.
The original OpenAI Symphony spec (https://github.com/openai/symphony/blob/main/SPEC.md) is cited correctly for the per-issue-workspace convention it actually requires.

End-to-end verification on real data

Re-backfilled the full 4,309 sessions / 5 GB of transcripts on this machine:

| | Before this PR | After this PR |
|---|---|---|
| `usage.jsonl` size | 83 MB | 125 MB (still 40× smaller than the 5 GB source) |
| `llm-cost EPAC-1940 --from-usage` output matches transcript-source? | partial (lost quota / cache split / reasoning) | identical |
| Query time | 0.3s | 0.3s |

Sample backfilled Codex record now includes:

{
  ...
  "inputUncachedTokens": 19975,
  "inputCachedReadTokens": 3456,
  "outputVisibleTokens": 135,
  "outputReasoningTokens": 60,
  "quota": {
    "planType": "pro",
    "windows": [
      { "label": "primary",   "windowMinutes": 300,   "usedPercent": 6, "resetsAt": 1778021087 },
      { "label": "secondary", "windowMinutes": 10080, "usedPercent": 7, "resetsAt": 1778548110 }
    ]
  }
}

Sample backfilled Claude record with cache writes:

{
  ...
  "inputUncachedTokens": 3,
  "inputCachedReadTokens": 0,
  "inputCacheWriteTokens": 45728,
  "inputCacheWriteEphemeral5mTokens": 45728
}

Test plan

33 of 33 tests pass (was 32) via node --test. Added: Symphony-spec-default cwd test, breakdown field tests, quota round-trip test. Updated: existing aggregator quota fixtures to the new spec shape.
node --check clean on every .mjs (including the new src/quota.mjs)
Backfilled 190,481 records and validated every one passes validateUsageRecord
llm-cost EPAC-1940 (transcripts) and llm-cost EPAC-1940 --from-usage <file> produce identical token totals, turn counts, model lists, wall-clock spans, AND quota readouts

Per-issue cost attribution loses too much signal if the bake-to-usage.jsonl path drops Codex quota readouts, Codex reasoning-output, and Claude cache-tier splits. The Symphony Coding-Agent Cost Telemetry Extension spec allows OPTIONAL field additions without bumping schemaVersion (per its own §6.3), so this PR adds them.

Spec additions (specs/symphony-cost-telemetry-extension/SPEC.md):

§5.2.1 Input-Token Breakdown
inputUncachedTokens, inputCachedReadTokens, inputCacheWriteTokens, inputCacheWriteEphemeral5mTokens, inputCacheWriteEphemeral1hTokens (the last two are Anthropic-only; non-Anthropic writers MUST omit)

§5.2.2 Output-Token Breakdown
outputVisibleTokens, outputReasoningTokens (reasoning is Codex/o-series-only; other providers MUST omit)

§5.2.3 Quota Sample
quota: { planType, windows: [{label, windowMinutes, usedPercent, resetsAt?}] }
Generic shape: Codex uses `primary` (5h) and `secondary` (7d) labels, but any provider with any number of windows fits the shape.

§5.3 Semantics
SHOULD relations for the breakdown sums.

Package implementation:

- transcripts/codex.mjs now emits per-turn quota samples in the spec shape directly (no lossy reshape later).
- transcript-to-usage.mjs emits every breakdown field the transcript carries plus the quota object.
- usage-aggregator.mjs prefers breakdown fields when present and falls back to inputTokens/outputTokens totals when not.
- bin/llm-cost.mjs's quota printer is now generic over windows (renders every label the provider reports, not hard-coded to primary/secondary).

Other fixes folded in (the broken "Symphony Telemetry Extension Spec" framing for the workspace convention was still on main; the per-issue-workspace requirement is in OpenAI Symphony's parent SPEC.md §4.1.4, not an extension):

- DEFAULT_CWD_PATTERN broadened to match both the spec-default `<system-temp>/symphony_workspaces/<ID>` and the common in-repo `<repo>/.symphony/workspaces/<ID>` workspace.root settings.
- Issue-ID character class widened from [A-Z]+-\d+ (Linear-only) to [A-Za-z0-9._-]+ (matches the spec's workspace_key sanitization rule).
- README sections rewritten: usage.jsonl bake is presented as a built-in feature of the package; spec interop is mentioned only as an optional side-benefit. No prose claims that any extension spec is required to use llm-cost.

End-to-end verified on real data (4,309 sessions / 5 GB transcripts):

- Backfill: 190,481 spec-compliant records in a 125 MB file (was 83 MB before the optional-field additions; the extra ~50% is the breakdown + quota payload). Still 40x smaller than the 5 GB source.
- Read-back: `llm-cost EPAC-1940 --from-usage <backfilled-file>` now produces an IDENTICAL output to the transcript-source path, including the Codex 5h/7d quota readout (58 -> 64% / 56 -> 57%), the cache-read 51M-token split, and the reasoning-output 18,649 split.
- 0.3s query time vs ~3min for transcript scan, unchanged.

33 tests pass (was 32): adds Symphony-spec-default cwd test, adds tests for each new breakdown/quota OPTIONAL field, updates existing aggregator quota fixtures to the new spec shape.

riddim-developer-bot Bot enabled auto-merge (squash) May 27, 2026 20:52

riddim-developer-bot Bot merged commit d940e83 into main May 27, 2026
2 checks passed

riddim-developer-bot Bot deleted the sunny/cost-spec-token-breakdown-and-quota branch May 27, 2026 20:52

riddim-developer-bot[bot] merged 1 commit into main from sunny/cost-spec-token-breakdown-and-quota
