feat(inference): measured-power Y-axis metrics on scatter chart by arygupt · Pull Request #375 · SemiAnalysisAI/InferenceX-app

arygupt · 2026-05-22T00:44:30Z

Summary

Adds two new options under a new Measured Energy dropdown group on both the vs. Interactivity and vs. End-to-end Latency charts:

Measured Avg Power per GPU (W) — Pareto envelope: lower_right (interactivity) / lower_left (e2e), matching "lower power at the same interactivity is more efficient"
Measured J per Output Token (J/tok) — same envelope directions

Distinct from the existing y_jTotal/y_jOutput/y_jInput which derive joules from each GPU's spec-sheet TDP. The new metrics are sourced from runner GPU telemetry averaged over the exact bench load window — see companion PR semianalysisai/InferenceX#1551 which adds aggregate_power.py and emits avg_power_w + joules_per_output_token into the agg JSON.

Wiring

File	Change
`packages/constants/src/metric-keys.ts`	Register `avg_power_w`, `joules_per_output_token` so the ETL auto-capture warning doesn't fire
`packages/app/src/lib/benchmark-transform.ts`	Pass the two raw fields through `rowToAggDataEntry`. Left `undefined` when absent so downstream code can distinguish "no measurement" from "0 W"
`packages/app/src/components/inference/types.ts`	Extend `AggDataEntry`, `InferenceData`, `YAxisMetricKey`, `ChartDefinition`
`packages/app/src/lib/chart-utils.ts`	Extend `Y_AXIS_METRICS`, `createChartDataPoint` (gated on `typeof === 'number'`), `calculateRoofline`/`computeAllRooflines` yKey union, `markRooflinePoints` init+mark blocks
`packages/app/src/components/inference/inference-chart-config.json`	Add `y_measured*` entries to both `chartType`s. `y_measuredAvgPower_roofline` is intentionally omitted — `ScatterGraph` falls through to `lower_right` by default, which matches the desired semantic.
`packages/app/src/components/inference/ui/ChartControls.tsx`	Add "Measured Energy" group to `METRIC_GROUPS`

Overlay path

The unofficial-run overlay path (?unofficialrun=…) is automatic — transformBenchmarkRows is shared between official and overlay rendering, so the new metrics flow to overlay points once #1551 merges and benchmarks emit the new fields. Per CLAUDE.md the code-wiring requirement is satisfied; Cypress E2E covering the overlay path is a follow-up to add once real data lands in the DB so the mocked artifact JSON can mirror a real payload shape.

Graceful degradation (audited end-to-end)

Site does not break when rows are missing the new fields. The protection points:

rowToAggDataEntry — assigns undefined rather than 0 so the absence is preserved through the data pipeline.
createChartDataPoint — typeof === 'number' gate omits the derived measuredAvgPower / measuredJPerOutputToken fields entirely when the raw value is absent.
useChartData (useChartData.ts:301-305) — filteredData.some((d) => metricKey in d) short-circuits to an empty array when no row has the metric, so the chart renders cleanly empty rather than throwing.
useChartData mapping (useChartData.ts:305) — .filter((d) => metricKey in d) drops any individual row missing the field before the y-axis remap.
processOverlayChartData (utils.ts:104) — same metricKey in d filter for the overlay path.
ScatterGraph Pareto front — runs only on the already-filtered point set, so the roofline is never poisoned by missing-data points.

For rows without measured-power data (historical runs, runs predating the runner PR, runs where the SMI poller didn't start), the new chart options simply omit those points — the existing TDP-derived y_jTotal/y_jOutput/y_jInput stay visible as a comparable fallback. There's a test guarding the typeof === 'number' gate so a future refactor to truthiness check doesn't silently drop legitimate zero-value measurements.

Design notes

typeof === 'number' gate, not truthiness. Preserves legitimate zero-value measurements (e.g. an idle GPU) while still treating undefined as "no measurement." Test in chart-utils.test.ts enforces this.
No new D3 / chart-rendering code. The new metrics piggyback on the existing selectedYAxisMetric plumbing — every place that already handles tpPerGpu / jTotal / etc. now also handles measuredAvgPower / measuredJPerOutputToken.

Test plan

pnpm typecheck — clean
pnpm lint — 0 warnings, 0 errors
pnpm test:unit — 1921/1921 passing (+7 new tests covering rowToAggDataEntry pass-through, createChartDataPoint field gating, zero-value preservation, missing-field handling)
Pre-commit hook (lint + format + typecheck) — passes
Dev-server smoke test — confirmed "Measured Energy" group label and both metric labels are present in the served JS bundle at /_next/static/chunks/
Code-path audit for the "missing field" case — see "Graceful degradation" above; every code path that touches the new fields either filters first or is guarded
After #1551 merges and one benchmark seeds real data: verify the live chart renders measured points in the same shape as the existing TDP-derived ones
Add Cypress E2E covering official + overlay paths against the first real payload
Decide on ChartDisplay caveat subtitle ("Measured during load window via 1 Hz SMI poll") once visible with real data

Adds two new options under a new "Measured Energy" dropdown group on both the "vs. Interactivity" and "vs. E2E Latency" charts: - Measured Avg Power per GPU (W) — no roofline (no universal "better" direction) - Measured J per Output Token (J/tok) — roofline lower_right (interactivity) / lower_left (e2e) Distinct from the existing y_jTotal/y_jOutput/y_jInput which derive joules from each GPU's spec-sheet TDP. The new metrics are sourced from runner GPU telemetry averaged over the exact bench load window (see aggregate_power.py in semianalysisai/InferenceX). Wiring: - packages/constants/src/metric-keys.ts: register avg_power_w, joules_per_output_token in the canonical metric key set so the ETL auto-capture warning doesn't fire. - packages/app/src/lib/benchmark-transform.ts: pass the two raw fields through rowToAggDataEntry. Left undefined when absent so downstream code can distinguish "no measurement" from "0 W". - packages/app/src/components/inference/types.ts: extend AggDataEntry, InferenceData, YAxisMetricKey, and ChartDefinition. - packages/app/src/lib/chart-utils.ts: extend Y_AXIS_METRICS, createChartDataPoint (gated on typeof===number), calculateRoofline and computeAllRooflines yKey union, markRooflinePoints init+mark blocks. - packages/app/src/components/inference/inference-chart-config.json: add y_measured* entries to both chartTypes. - packages/app/src/components/inference/ui/ChartControls.tsx: add "Measured Energy" group to METRIC_GROUPS. The overlay (unofficial run) path is automatic — transformBenchmarkRows is shared between official and overlay rendering, so the new metrics flow to ?unofficialrun= URLs once the runner-side PR is merged and benchmarks ingest the new fields. For rows without measured-power data (historical runs, runs predating aggregate_power.py, runs where the SMI poller didn't start), points are simply omitted from the new charts — the existing TDP-derived y_jTotal/y_jOutput/y_jInput stay visible as a comparable fallback. Verification: - pnpm typecheck: clean - pnpm lint: 0 warnings, 0 errors - pnpm test:unit: 1921/1921 passing (+7 new tests covering rowToAggDataEntry pass-through, createChartDataPoint field gating, zero-value preservation, missing-field handling) - Dev-server smoke: confirmed "Measured Energy" group label and both metric labels are present in the served JS bundle at /_next/static/chunks/ Follow-up: Cypress E2E covering both the official path and ?unofficialrun= overlay path for the two new metrics, to be added once the runner PR ships real data to the DB. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

vercel · 2026-05-22T00:44:35Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
inferencemax-app	Ready	Preview, Comment	May 22, 2026 8:30pm

claude · 2026-05-22T00:44:47Z

Claude finished @arygupt's task in 4m 18s —— View job

🟢 LGTM — no blocking issues

Wiring is consistent across all six layers (constants, transform, types, chart-utils, config, controls). The graceful-degradation contract holds: useChartData.ts line 301-325 filters via metricKey in d, and the typeof === 'number' gate in createChartDataPoint keeps 'measuredAvgPower' in d false for legacy rows (rather than true with value: undefined). The "no roofline" choice for y_measuredAvgPower works because computeAllRooflines skips any metric without a _roofline entry. Test coverage for the new lib code is good — the zero-value preservation test (avg_power_w: 0) specifically guards the typeof === 'number' gate against a future refactor to truthiness.

Non-blocking observation

🟡 Historical Trends tab will render flat-zero lines for the new metrics
rowToLightweightPoint in packages/app/src/components/inference/hooks/useInterpolatedTrendData.ts:25-69 doesn't build measuredAvgPower / measuredJPerOutputToken. Because ChartControls is shared between Inference and Historical Trends, a user who selects "Measured Energy" on the Trends tab hits this path: extractMetric() returns null, interpolateMetricAtInteractivity coerces with ?? 0, the spline interpolates zeros, and every GPU's trend line renders at y=0 across all dates — a misleading flat line instead of "no data". Same latent issue already exists for y_inputTputPerMw / y_outputTputPerMw, so this PR doesn't introduce the pattern, just extends its surface. Worth either (a) adding the two measured fields to the lightweight builder once real data lands, or (b) filtering the "Measured Energy" group out of the trends-tab dropdown.

The deferred items in the PR description (ChartDisplay caveat subtitle, Cypress E2E against real payload, AI-chart prompt-template update) are appropriately blocked on #1551 emitting real data, so not a blocker for this PR.

· branch feat/measured-power-metrics

The previous comment claimed avg-power has no roofline because the field is omitted from the chart config. That's wrong: ScatterGraph.tsx:270-277 falls back to paretoFrontLowerRight when the direction is undefined, so the chart actually draws a lower_right roofline by default. The behavior happens to be sensible ("lower power at the same interactivity is more efficient"), so leaving it as-is and just fixing the comment to describe what the code actually does.

claude · 2026-05-22T02:04:02Z

Claude finished @arygupt's task in 2m 24s —— View job

Re-review: 🟢 LGTM

Only one new commit since the prior review: 19f9ab1 docs(inference): correct y_measuredAvgPower_roofline comment.

Comment-only change — packages/app/src/components/inference/types.ts:296-299. The author correctly noticed that the previous comment's claim ("no roofline for avg power") was wrong: ScatterGraph.tsx:270-277 falls back to paretoFrontLowerRight when _roofline is undefined, so a roofline line is drawn for y_measuredAvgPower. The new comment accurately describes the actual behavior, and the chosen fallback ("lower power at same interactivity = more efficient") is defensible.

Verified the doc claim:

ScatterGraph.tsx:262-277 — dir is undefined for y_measuredAvgPower, falls through the ternary chain to paretoFrontLowerRight. ✓
chart-utils.ts:616 — computeAllRooflines does still skip metrics without a direction, so per-point roof flags won't be set. Minor inconsistency (line drawn, no on-roofline markers) but it's pre-existing for any metric where config omits _roofline, not introduced here.

No behavior change, no test impact. Previous review's non-blocking observation about the Historical Trends tab still applies as a follow-up (verified useInterpolatedTrendData.ts:25-69 still doesn't build the new measured fields; extractMetric → ?? 0 coercion at line 109 would still produce flat-zero lines).

Check what changed since previous review
Verify the Historical Trends concern from prior review
Inspect overlay-path behavior with new metrics
Post re-review summary
· branch feat/measured-power-metrics

Wires the new "Measured Energy" chart options into the existing ↑↑↓↓ feature-gate so they can be toggled without a deploy. Two changes: 1. Extract useFeatureGate to a shared hook at lib/use-feature-gate.ts. It was previously private to tab-nav.tsx; three other components (gpu-power, submissions, feedback) read the same localStorage key and DOM events ad-hoc. Now there's one source of truth, and the key + event names are exported as named constants so other consumers can migrate when convenient. No behavior change for tab-nav. 2. ChartControls METRIC_GROUPS entries gain an optional `gated` flag. Inside the component, the group list, the metric→group map, and the dropdown options are derived via useMemo from useFeatureGate(), so toggling the gate flips visibility without a page reload. The Measured Energy group is marked `gated: true`; the rest stay as-is. Verified: - pnpm typecheck clean - pnpm test:unit 1921/1921 - Pre-commit hook (lint + format + typecheck) passes

claude · 2026-05-22T20:29:46Z

Claude finished @arygupt's task in 1m 59s —— View job

Re-review: 🟢 LGTM

One new commit since the prior review: aff6054 feat(inference): gate measured-energy dropdown behind feature flag.

What changed:

packages/app/src/lib/use-feature-gate.ts (new) — extracts the ↑↑↓↓ konami hook that was previously private to tab-nav.tsx. Exports FEATURE_GATE_KEY, FEATURE_GATE_UNLOCKED_EVENT, FEATURE_GATE_LOCKED_EVENT as named constants. Logic is byte-for-byte the prior in-file hook — pure refactor, no behavior change for tab-nav.
tab-nav.tsx — drops the inline hook and imports the shared one.
ChartControls.tsx — adds an optional gated flag to METRIC_GROUPS, marks "Measured Energy" as gated: true, and derives visibleGroups / metricGroupMap / groupedYAxisOptions via useMemo(…, [featureGateUnlocked]). The dropdown re-renders when the gate flips without a reload.

Verified:

Hook effects use stable deps (unlocked for the keydown effect, [] for the lock/unlock event listeners) — no stale-closure or listener-leak risk.
The three other consumers (gpu-power, submissions, feedback-viewer) still hard-code FEATURE_GATE_KEY and event-name string literals. Commit message confirms this is intentional ("can migrate when convenient") and the literals match the exported constants exactly. ✓
SSR-safe: useState(false) initial value plus localStorage read inside useEffect. Server and first-client render both produce the locked-default option list; hydration won't mismatch.
Default selectedYAxisMetric is y_tpPerGpu (InferenceContext.tsx:128), not a gated metric — fresh visitors are never stranded on a gated selection.
The previous review's Historical Trends flat-zero concern is effectively masked for normal users now that the new metrics are dropdown-hidden until the gate unlocks. The underlying useInterpolatedTrendData issue is still latent for insiders who unlock the gate, but that audience can be told to avoid the Trends tab — appropriately deferred.

No test for use-feature-gate.ts — but the logic is unchanged from the prior in-file implementation (which was also untested), so per the "only flag NEW code, not existing untested code" rule this is a refactor, not a new untested feature. Not blocking.

Minor (non-blocking) observation

🟡 If a user unlocks the gate, selects y_measuredAvgPower, then re-locks via the admin button in gpu-power/submissions/feedback-viewer, selectedYAxisMetric stays on the gated value while the dropdown filter hides it — the chart keeps rendering measured energy but the controls trigger shows a label without a matching SelectItem. Same goes for opening a shared ?i_metric=y_measuredAvgPower URL while locked. Not a crash; the user can switch to any non-gated metric. Worth a small reset-on-lock if you want belt-and-braces, but acceptable as-is for an insider-only feature.

Tasks

Identify changes since last review
Review the new feature-flag commit (aff6054)
Inspect use-feature-gate.ts (new file)
Inspect tab-nav.tsx changes
Verify ChartControls integration
Post summary
· branch feat/measured-power-metrics

vercel Bot deployed to Preview May 22, 2026 02:04 View deployment

vercel Bot deployed to Preview May 22, 2026 20:30 View deployment

arygupt mentioned this pull request May 22, 2026

feat(power): aggregate measured GPU power into agg result JSON SemiAnalysisAI/InferenceX#1558

Open

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(inference): measured-power Y-axis metrics on scatter chart#375

feat(inference): measured-power Y-axis metrics on scatter chart#375
arygupt wants to merge 3 commits into
masterfrom
feat/measured-power-metrics

arygupt commented May 22, 2026 •

edited

Loading

Uh oh!

vercel Bot commented May 22, 2026 •

edited

Loading

Uh oh!

claude Bot commented May 22, 2026 •

edited

Loading

Uh oh!

claude Bot commented May 22, 2026 •

edited

Loading

Uh oh!

claude Bot commented May 22, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

arygupt commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Wiring

Overlay path

Graceful degradation (audited end-to-end)

Design notes

Test plan

Uh oh!

vercel Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

claude Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🟢 LGTM — no blocking issues

Non-blocking observation

Uh oh!

claude Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Re-review: 🟢 LGTM

Uh oh!

claude Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Re-review: 🟢 LGTM

Minor (non-blocking) observation

Tasks

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

arygupt commented May 22, 2026 •

edited

Loading

vercel Bot commented May 22, 2026 •

edited

Loading

claude Bot commented May 22, 2026 •

edited

Loading

claude Bot commented May 22, 2026 •

edited

Loading

claude Bot commented May 22, 2026 •

edited

Loading