[NVIDIA] Reduce B200 Runs & add B200 FP4 Docker Script by kimbochen · Pull Request #35 · SemiAnalysisAI/InferenceX

kimbochen · 2025-09-21T19:17:28Z

Copied the commands from the non-TRT slurm scripts into the docker scripts
Reduced the B200 TP (tensor-parallel) size lists to [1, 8] for the baseline and low-latency scenarios
Removed the b200 labels from the NV slurm runners
Validating the B200 run here
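Reducing the TP lists shrinks the benchmark sweep to one run per (scenario, TP size) pair. The sketch below illustrates this under stated assumptions: the scenario-to-TP mapping and the `expand_runs` helper are hypothetical, and the repository's actual config keys and its previous TP lists are not shown on this page.

```python
from __future__ import annotations

# Hypothetical scenario -> TP-size mapping after this PR; the key names are
# illustrative, not the repository's actual config keys.
B200_TP_SIZES = {
    "baseline": [1, 8],
    "low_latency": [1, 8],
}


def expand_runs(tp_sizes: dict[str, list[int]]) -> list[tuple[str, int]]:
    """Flatten a scenario -> TP-size mapping into (scenario, tp) run pairs."""
    return [
        (scenario, tp)
        for scenario, sizes in tp_sizes.items()
        for tp in sizes
    ]


runs = expand_runs(B200_TP_SIZES)
for scenario, tp in runs:
    print(f"{scenario}: TP={tp}")
```

With two TP sizes per scenario, each scenario contributes exactly two B200 runs.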

… runs
Builds on PR #1558 (single-node measured-power) for multinode benchmarks via srt-slurm.

Pipeline:
- srt-slurm perfmon (per-node nvidia-smi sampling; PR #35 on NVIDIA/srt-slurm, layered on SemiAnalysisAI/srt-slurm:feat/inferencex-perfmon) → perf_samples_<host>.csv in outputs/<job>/logs/ on shared NFS
- launch_gb300-cw.sh exports GPU_METRICS_CSV_GLOB to $GITHUB_ENV
- process_result.py expands the glob and hands the list to aggregate_power.run()
- aggregate_power.py namespaces local GPU indices per source CSV stem, so each node's local indices 0..N-1 don't collide across nodes, and emits cluster-wide avg_power_w + joules_per_*_token
- InferenceX-app ETL auto-captures the numeric fields (no schema change)

Changes:
- utils/aggregate_power.py: widen csv_path to Path | Iterable[Path], keeping the original parameter name. Per-source GPU-id namespacing only kicks in when there are 2+ sources, so single-node num_gpus is unchanged. The CLI adds --csv-glob (Python-side glob, mutually exclusive with --csv).
- utils/process_result.py: bridge the GPU_METRICS_CSV_GLOB env var. The glob takes precedence over a single GPU_METRICS_CSV when both are set.
- runners/launch_gb300-cw.sh: point dynamo-sglang at our srt-slurm fork, append a `monitoring:` block to each recipe post-copy (idempotent), and write GPU_METRICS_CSV_GLOB to $GITHUB_ENV after the job for the downstream Process result step.
- 8 new multinode tests in test_aggregate_power.py (per-source namespacing, sub-second clock drift, asymmetric prefill/decode power, missing-CSV silent skip, backward-compat single-path-in-list, Iterable acceptance, E2E run with list). 3 new in test_process_result.py (glob aggregation, precedence over single CSV, empty-match falls through). 64/64 pass.

Verified the data format end-to-end on gb300 hardware: nvidia-smi inside the sglang container emits the columns aggregate_power.py needs (timestamp, gpu, power_w).
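The precedence rule described for process_result.py can be sketched as a small resolver. This is a hypothetical helper written for illustration: only the two environment-variable names come from the PR text, the function name is invented, and "empty-match falls through" is assumed here to mean falling back to GPU_METRICS_CSV.

```python
from __future__ import annotations

import glob
import os
from collections.abc import Mapping
from pathlib import Path


def resolve_gpu_metrics_csvs(env: Mapping[str, str] | None = None) -> list[Path]:
    """Pick the GPU-metrics CSV(s) to aggregate (hypothetical helper).

    GPU_METRICS_CSV_GLOB wins when it is set and matches at least one file.
    An unset or empty-matching glob falls through to GPU_METRICS_CSV.
    With neither available, return an empty list (no power metrics).
    """
    env = os.environ if env is None else env
    pattern = env.get("GPU_METRICS_CSV_GLOB")
    if pattern:
        # glob.glob returns matches in arbitrary order; sort for determinism.
        matches = sorted(glob.glob(pattern))
        if matches:
            return [Path(m) for m in matches]
    single = env.get("GPU_METRICS_CSV")
    return [Path(single)] if single else []
```

Sorting the matches keeps the per-node file order stable across runs, which makes downstream aggregation output reproducible.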
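The per-source GPU-id namespacing and the mutually exclusive --csv / --csv-glob CLI can be illustrated with a self-contained sketch. It assumes each CSV has exactly the `timestamp`, `gpu`, and `power_w` columns named above. It also assumes that cluster-wide average power is the sum of each GPU's mean sampled power and that energy per token is average power times wall-clock duration divided by tokens. Both definitions are assumptions, as are all function names other than the `csv_path` parameter. Unlike the real aggregate_power.py, which has a test for sub-second clock drift, this sketch ignores timestamp alignment entirely.

```python
from __future__ import annotations

import argparse
import csv
import glob
from collections import defaultdict
from collections.abc import Iterable
from pathlib import Path


def load_power_samples(csv_path: Path | Iterable[Path]) -> dict[str, list[float]]:
    """Collect power samples per GPU, namespacing GPU ids per source CSV.

    One source keeps the raw local id ("0", "1", ...), so a single-node GPU
    count is unchanged. With 2+ sources each id becomes "<csv stem>:<id>",
    so node A's GPU 0 and node B's GPU 0 stay distinct. Missing files are
    skipped silently.
    """
    paths = [csv_path] if isinstance(csv_path, Path) else list(csv_path)
    namespaced = len(paths) >= 2
    samples: dict[str, list[float]] = defaultdict(list)
    for path in paths:
        if not path.exists():
            continue
        with path.open(newline="") as f:
            # skipinitialspace tolerates "a, b, c" style CSV spacing.
            for row in csv.DictReader(f, skipinitialspace=True):
                gpu = row["gpu"].strip()
                key = f"{path.stem}:{gpu}" if namespaced else gpu
                samples[key].append(float(row["power_w"]))
    return dict(samples)


def cluster_avg_power_w(samples: dict[str, list[float]]) -> float:
    """Assumed definition: sum over GPUs of each GPU's mean sampled power."""
    return sum(sum(watts) / len(watts) for watts in samples.values() if watts)


def joules_per_token(avg_power_w: float, duration_s: float, tokens: int) -> float:
    """Assumed definition: energy (W x s = J) divided by tokens produced."""
    return avg_power_w * duration_s / tokens


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    """Exactly one of --csv or --csv-glob must be given."""
    parser = argparse.ArgumentParser(description="Aggregate GPU power samples")
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--csv", type=Path, help="a single GPU-metrics CSV")
    source.add_argument("--csv-glob", help="pattern expanded in Python, not by the shell")
    return parser.parse_args(argv)


def paths_from_args(args: argparse.Namespace) -> list[Path]:
    """Expand --csv-glob with Python's glob, or wrap the single --csv path."""
    if args.csv_glob is not None:
        return [Path(p) for p in sorted(glob.glob(args.csv_glob))]
    return [args.csv]
```

Expanding the glob in Python rather than in the shell means the pattern can be passed through an environment variable verbatim. It also behaves the same regardless of the calling shell's globbing options.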

kimbochen and others added 5 commits September 21, 2025 17:43

Copied slurm command to docker.

bda8cbb

Reduced B200 TP sizes.

131b2c9

Copied command from slurm script.

f51afa3

Copied command from slurm script.

75efa9b

Merge branch 'main' into reduce-b200-runs

5d95865

functionstackx changed the title ~~Reduce B200 Runs~~ Reduce B200 Runs & add B200 FP4 Docker Script Sep 22, 2025

functionstackx merged commit 27f29ac into main Sep 22, 2025

cquil11 added the NVIDIA label Apr 8, 2026

cquil11 changed the title ~~Reduce B200 Runs & add B200 FP4 Docker Script~~ [NVIDIA] Reduce B200 Runs & add B200 FP4 Docker Script Apr 8, 2026

arygupt mentioned this pull request May 27, 2026

feat(power): multinode measured-power aggregation #1574 (Open, 6 tasks)

functionstackx merged 5 commits into main from reduce-b200-runs

kimbochen commented Sep 21, 2025


3 participants
