feat(inference): add "Measured J per Token" metric (input + output) #393
Merged
…nominator)
Adds a third option to the gated "Measured Energy" dropdown group:
- Measured J per Token (J/total-token: system energy / (input + output))
Distinct from the existing "Measured J per Output Token", which divides only by output tokens (treating the prompt as free). Intended for workload-shape-fair comparisons, especially with prompt-heavy workloads like 8k/1k, where J/output-token is ~9x higher than J/total-token despite the same energy.
Wires the new field through the same plumbing as the existing measured-power metrics:
- packages/constants/src/metric-keys.ts: register joules_per_total_token
- packages/app/src/lib/benchmark-transform.ts: pass through (left undefined for legacy rows)
- packages/app/src/components/inference/types.ts: extend AggDataEntry, InferenceData, YAxisMetricKey, ChartDefinition
- packages/app/src/lib/chart-utils.ts: extend Y_AXIS_METRICS, createChartDataPoint, roofline union, markRooflinePoints
- packages/app/src/components/inference/inference-chart-config.json: add y_measuredJPerTotalToken to both chartTypes (roofline lower_right / lower_left)
- packages/app/src/components/inference/ui/ChartControls.tsx: add to the Measured Energy gated group
Companion runner-side change: SemiAnalysisAI/InferenceX@363e49c4 emits joules_per_total_token in every agg_<run>.json.
Tests: +3 covering the new field (presence, parallel independence from J/output-token, graceful absence on legacy rows). 1944/1944 vitest pass.
Claude finished @arygupt's task in 1m 46s —— View job Claude Code Review
🟢 LGTM — no blocking issues found The new
No other code paths in |
Summary
Adds a third option to the gated Measured Energy dropdown:
- Measured J per Token (J/total-token: system energy / (input + output))
Existing Measured J per Output Token stays — it's still a useful framing for "what does it cost to generate a token." The new metric answers a different question: "what does it cost to handle a token end-to-end."
Why the distinction matters
For an 8k1k workload (8K input, 1K output) the same system energy gets divided by 1024 tokens (output) vs 9216 tokens (total). J/output-token is ~9x higher than J/total-token despite identical real-world cost. Datacenter operators usually bill per total tokens handled, so J/total-token maps more cleanly to dollars-per-token.
For balanced workloads (1k1k) the ratio is closer to 2x.
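The arithmetic above can be checked with a small sketch. The type and function names here are hypothetical (they are not taken from the PR), and the energy figure is invented purely so the division comes out evenly; only the token counts (8192 + 1024 and 1024 + 1024) come from the workload shapes described above.

```typescript
// Hypothetical shape for illustration; not the PR's actual types.
interface RunTotals {
  systemEnergyJ: number; // total measured system energy for the run, in joules
  inputTokens: number;
  outputTokens: number;
}

// Output-only denominator: treats the prompt as free.
function joulesPerOutputToken(run: RunTotals): number {
  return run.systemEnergyJ / run.outputTokens;
}

// Total denominator: every token handled (input + output) counts.
function joulesPerTotalToken(run: RunTotals): number {
  return run.systemEnergyJ / (run.inputTokens + run.outputTokens);
}

// How much larger the output-only figure is for the same energy.
function outputToTotalRatio(run: RunTotals): number {
  return joulesPerOutputToken(run) / joulesPerTotalToken(run);
}

// Made-up energy value (9216 J), chosen only to keep the numbers round.
const promptHeavy: RunTotals = { systemEnergyJ: 9216, inputTokens: 8192, outputTokens: 1024 };
const balanced: RunTotals = { systemEnergyJ: 9216, inputTokens: 1024, outputTokens: 1024 };

console.log(outputToTotalRatio(promptHeavy)); // 9
console.log(outputToTotalRatio(balanced)); // 2
```

The ratio depends only on the workload shape, (input + output) / output, not on the energy itself, which is why the same run can look 9x more expensive under one framing than the other.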
Companion runner change
semianalysisai/InferenceX@363e49c4 on PR #1558 emits joules_per_total_token in every agg_<run>.json alongside the existing avg_power_w and joules_per_output_token.
Wiring
Same pattern as the existing measured-power fields:
- packages/constants/src/metric-keys.ts: register joules_per_total_token
- packages/app/src/lib/benchmark-transform.ts: pass through (left undefined for legacy rows)
- packages/app/src/components/inference/types.ts: extend AggDataEntry, InferenceData, YAxisMetricKey, ChartDefinition
- packages/app/src/lib/chart-utils.ts: extend Y_AXIS_METRICS, createChartDataPoint, calculateRoofline/computeAllRooflines yKey union, markRooflinePoints
- packages/app/src/components/inference/inference-chart-config.json: add y_measuredJPerTotalToken to both chartTypes (roofline lower_right on interactivity, lower_left on e2e)
- packages/app/src/components/inference/ui/ChartControls.tsx: add to the Measured Energy gated group
Graceful degradation
Same typeof === 'number' gate as the other measured-power fields:
- joules_per_total_token absent → those rows don't show on the J/total chart (correct)
Test plan
- pnpm typecheck: clean
- pnpm lint / pnpm fmt: clean (pre-commit hook passes)
- pnpm test:unit: 1944/1944 passing (+3 new tests covering the new field's presence, independence from J/output-token, graceful absence on legacy rows)
Note on overlay path
Per CLAUDE.md's overlay requirement: the new metric works on the ?unofficialrun= overlay path automatically because transformBenchmarkRows is shared between the official and overlay code paths. Once runner PR #1558 merges and a sweep produces an artifact with the new field, the overlay URL will display the new metric immediately.
Note
Low Risk
Additive UI and data plumbing for an optional telemetry field; no auth, billing, or breaking API changes. Main risk is empty charts until runner/ingest populates the new metric.
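The empty-chart risk follows from the typeof === 'number' gate: a row that lacks joules_per_total_token never gets the chart field, so it is simply not plotted on the new view. A minimal sketch of that behaviour, assuming heavily simplified row and point shapes: AggRow, ChartPoint, toChartPoint, plottableOnTotalChart, and the measuredJPerOutputToken field name are all hypothetical stand-ins, not the PR's actual AggDataEntry or createChartDataPoint code.

```typescript
// Hypothetical, simplified shapes; the real types carry many more fields.
interface AggRow {
  joules_per_output_token?: number;
  joules_per_total_token?: number; // absent on legacy rows
}

interface ChartPoint {
  measuredJPerOutputToken?: number; // assumed name, for illustration only
  measuredJPerTotalToken?: number;
}

// Copy each measured field only when it is really a number, so a
// missing (or null) value never turns into a plotted 0 or NaN.
function toChartPoint(row: AggRow): ChartPoint {
  const point: ChartPoint = {};
  if (typeof row.joules_per_output_token === 'number') {
    point.measuredJPerOutputToken = row.joules_per_output_token;
  }
  if (typeof row.joules_per_total_token === 'number') {
    point.measuredJPerTotalToken = row.joules_per_total_token;
  }
  return point;
}

// Points eligible for the J/total-token view: legacy rows drop out here.
function plottableOnTotalChart(rows: AggRow[]): ChartPoint[] {
  return rows
    .map(toChartPoint)
    .filter((p) => typeof p.measuredJPerTotalToken === 'number');
}

const legacyRow: AggRow = { joules_per_output_token: 9 };
const newRow: AggRow = { joules_per_output_token: 9, joules_per_total_token: 1 };

console.log(plottableOnTotalChart([legacyRow, newRow]).length); // 1
```

The two fields are gated independently, so a legacy row still appears on the J/output-token view while staying off the J/total-token one.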
Overview
Adds Measured J per Token (joules_per_total_token → measuredJPerTotalToken) as a third option in the gated Measured Energy Y-axis group, alongside avg power and J per output token. The value uses total tokens handled (input + output), which better matches billing-style “cost per token” than output-only J/tok on long-prompt workloads.
The change threads the field from METRIC_KEYS through benchmark-transform, inference types, createChartDataPoint (only when typeof joules_per_total_token === 'number'), roofline unions/marking, and both interactivity/e2e entries in inference-chart-config.json. Legacy rows without the field stay off the new chart view; unit tests cover presence, independence from J/output, and graceful omission.
Depends on runner emitting joules_per_total_token in aggregated JSON; until ingest has that field, the UI option exists but most historical points won’t plot.
Reviewed by Cursor Bugbot for commit 42fdf80.