[AMD][ROCM] dsv4-fp4-mi355x-atom: bump image, expand concurrency, simplify script#1311
Conversation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| start_gpu_monitor | ||
|
|
||
| set -x | ||
|
|
||
| BLOCK_SIZE=${BLOCK_SIZE:-16} | ||
| export ATOM_DSV4_SPARSE_ATTN_CHUNK_TOKENS=${ATOM_DSV4_SPARSE_ATTN_CHUNK_TOKENS:-256} | ||
| # --enforce-eager is required: ROCm/ATOM#650 (PR1 skeleton) has no CUDAGraph | ||
| # support yet (deferred to a follow-up PR). max-num-seqs is sized to the | ||
| # client concurrency with a floor at 4 — the ATOM default (512) makes the | ||
| # KV/GDN-mamba allocator overshoot the GPU budget ("GDN mamba tensor | ||
| # exceeds available KV budget"), and using 1 hangs warmup at 0% GPU. 4 | ||
| # is the minimum we've seen complete warmup successfully (also the PR's | ||
| # offline repro value). The PR1 kv_cache[:1,...] hardcode in | ||
| # deepseek_v4.py means any forward with batch>1 silently corrupts | ||
| # non-slot-0 lanes; eval (gsm8k) at conc>1 is the canary. | ||
| MAX_NUM_SEQS=$(( CONC < 4 ? 4 : CONC )) | ||
| MAX_NUM_BATCHED_TOKENS=${MAX_NUM_BATCHED_TOKENS:-$MAX_MODEL_LEN_VALUE} | ||
| export ATOM_DISABLE_MMAP=true | ||
| export ATOM_USE_TRITON_MOE=1 | ||
| python3 -m atom.entrypoints.openai_server \ | ||
| --model $MODEL \ | ||
| --server-port $PORT \ |
There was a problem hiding this comment.
🔴 The simplified script still references $BLOCK_SIZE, $MAX_NUM_SEQS, and $MAX_NUM_BATCHED_TOKENS in the python invocation (lines 45, 47, 48 of benchmarks/single_node/dsv4_fp4_mi355x_atom.sh), but this PR deleted the only assignments of those variables and nothing else in the harness defines them. With set -eo pipefail (no -u), they expand to empty strings, so the rendered argv becomes --block-size --enforce-eager --max-num-seqs --max-num-batched-tokens --trust-remote-code … and the server will fail to start on every job in the expanded conc=4–1024 sweep. Fix is to either restore BLOCK_SIZE=${BLOCK_SIZE:-16} / MAX_NUM_SEQS=$(( CONC < 4 ? 4 : CONC )) / MAX_NUM_BATCHED_TOKENS=${MAX_NUM_BATCHED_TOKENS:-$MAX_MODEL_LEN_VALUE} (mirroring the sister ATOM scripts), or drop the three flags from the python command line.
Extended reasoning...
What the bug is
This PR simplifies benchmarks/single_node/dsv4_fp4_mi355x_atom.sh by removing a large block of WIP overlay/setup code. As part of that simplification it also deleted three short variable assignments that were sitting just above set -x:
BLOCK_SIZE=${BLOCK_SIZE:-16}
MAX_NUM_SEQS=$(( CONC < 4 ? 4 : CONC ))
MAX_NUM_BATCHED_TOKENS=${MAX_NUM_BATCHED_TOKENS:-$MAX_MODEL_LEN_VALUE}However, the python invocation at the bottom of the simplified script still passes those three variables through (lines 45, 47, 48 of the new file):
--block-size $BLOCK_SIZE \
--enforce-eager \
--max-num-seqs $MAX_NUM_SEQS \
--max-num-batched-tokens $MAX_NUM_BATCHED_TOKENS \Why nothing rescues this
check_env_varsat the top only validatesMODEL/TP/CONC/ISL/OSL/RANDOM_RANGE_RATIO/RESULT_FILENAME/EP_SIZE/DP_ATTENTION— none of the three deleted variables are listed there, so the check does not surface them.- A grep across
benchmarks/,.github/configs/, and the workflow templates shows no other producer forBLOCK_SIZE/MAX_NUM_SEQS/MAX_NUM_BATCHED_TOKENS; the orchestrator never injects them into the environment. - The script uses
set -eo pipefailbut notset -u, so the undefined variables expand to empty strings silently. - Every sibling ATOM script that passes
--block-sizekeeps the local default (e.g.benchmarks/single_node/dsr1_fp4_mi355x_atom.sh:50,dsr1_fp8_mi355x_atom.sh:50,gptoss_fp4_mi355x_atom.sh:51all defineBLOCK_SIZE=${BLOCK_SIZE:-16}), confirming this is required wiring that was accidentally lost in the simplification.
Step-by-step proof
- CI runs the dsv4-fp4-mi355x-atom config from the diff:
{ tp: 8, ep: 1, conc-start: 4, conc-end: 1024 }. The orchestrator exportsMODEL,TP,CONC,ISL,OSL,RANDOM_RANGE_RATIO,RESULT_FILENAME,EP_SIZE,DP_ATTENTION— but not the three deleted vars. - The script runs
check_env_vars …which passes (BLOCK_SIZEetc. are not in its list). - Reaching the python invocation, bash expands the argv. With the three vars unset and no
set -u, the rendered argv after word-splitting is:python3 -m atom.entrypoints.openai_server \ --model deepseek-ai/DeepSeek-V4-Pro --server-port 8888 -tp 8 \ --kv_cache_dtype fp8 [--max-model-len 10240] [--enable-expert-parallel?] \ --block-size --enforce-eager \ --max-num-seqs --max-num-batched-tokens --trust-remote-code - ATOM's argparse parses
--block-size --enforce-eager:--block-sizeis int-typed, and argparse consumes the next token (--enforce-eager) as its value.int('--enforce-eager')raisesargparse.ArgumentTypeError/ValueError, the server exits non-zero before opening port 8888. wait_for_server_readyeither polls until timeout or detects the dead PID and aborts the job. Every (TP=8, conc=4..1024, ISL/OSL ∈ {1024/1024, 8192/1024}) cell of the new sweep hits exactly this path, so the entire sweep this PR is trying to enable cannot start.
Even under the most charitable argparse behavior (for string-typed flags), --max-num-seqs would be assigned the literal string --max-num-batched-tokens, and --max-num-batched-tokens would consume --trust-remote-code. The server still cannot start with such values, and --trust-remote-code would in turn be missing.
Addressing the refutations
The two refutations on bug_003 and bug_004 are simply duplicate-dedup callouts between the two reports (they describe the same defect at the same location). The synthesis agent has already merged them into this single report (merged_bug_003), so the dedup objection is satisfied. Neither refutation disputes the underlying defect.
Fix
Either:
- Restore the three defaults before
set -x(mirroringdsr1_fp4_mi355x_atom.sh):(Note thatBLOCK_SIZE=${BLOCK_SIZE:-16} MAX_NUM_SEQS=$(( CONC < 4 ? 4 : CONC )) MAX_NUM_BATCHED_TOKENS=${MAX_NUM_BATCHED_TOKENS:-10240}
MAX_MODEL_LEN_VALUEwas also removed in this PR, so the third default needs a literal or a recomputed value.) - Or drop the three flags from the python invocation entirely (matching the
qwen3.5-fp4-mi355x-atomsister script which does not use them) if the newrocm/atom-dev:nightly_202605101539image provides workable defaults.
| - dsv4-fp4-mi355x-atom | ||
| description: | ||
| - "Add DeepSeek-V4-Pro FP4 MI355X ATOM benchmark config; bump image to rocm/atom-dev:nightly_202605101539, expand concurrency range (conc 4–1024), and simplify runtime script" | ||
| pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/ |
There was a problem hiding this comment.
🟡 The new perf-changelog entry for dsv4-fp4-mi355x-atom has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/ with no PR number appended (perf-changelog.yaml:2344). It should be .../pull/1311 — every other entry in the file follows that pattern, and any downstream tooling that consumes pr-link will get a broken URL pointing to the PRs listing page instead of this PR.
Extended reasoning...
What the bug is
The new perf-changelog entry added by this PR ends with an incomplete pr-link value:
- config-keys:
- dsv4-fp4-mi355x-atom
description:
- "Add DeepSeek-V4-Pro FP4 MI355X ATOM benchmark config; bump image to rocm/atom-dev:nightly_202605101539, expand concurrency range (conc 4–1024), and simplify runtime script"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/The numeric PR ID is missing from the end of the URL.
Why existing patterns expect a number
Every other entry in perf-changelog.yaml follows the convention .../pull/<NUMBER>. For example, the entry immediately above (line 2338) is https://github.com/SemiAnalysisAI/InferenceX/pull/1308, and neighboring entries use /pull/1303, /pull/1304, /pull/1305, etc. The schema is uniform across the file, so consumers of this YAML will reasonably assume the value is a fully-formed URL pointing at a specific PR.
Impact
If a human clicks the link in a rendered changelog, they get the GitHub PR listing page for the repo rather than this PR. More importantly, any tooling that parses pr-link (changelog renderers, scripts that extract PR numbers via regex on the URL, dashboards that link to PRs) will either get a broken/empty PR ID or fall through to the listing page. The link silently points to the wrong place rather than failing loudly.
Step-by-step proof
- Open
perf-changelog.yamlat line 2344. - Observe the raw value:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/(URL ends with a trailing slash and no digits). - Compare to line 2338 (the previous entry):
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1308. - Visit the URL in a browser — you land on
https://github.com/SemiAnalysisAI/InferenceX/pulls(the listing), not on PR [AMD][ROCM] dsv4-fp4-mi355x-atom: bump image, expand concurrency, simplify script #1311. - Run a simple regex extractor like
url.rsplit('/', 1)[-1]on the value: it yields an empty string instead of1311.
How to fix
Append 1311 (this PR's number) to the URL:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1311…m server args Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25650584751 |
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25650688728 |
|
@seungrokj the amd sglang team is doing from conc 1 to 256, probably makes sense for ATOM to do that at least that too? feel free to expand the range even more if u want but we should do at least conc 1 to 256 from @1am9trash 's PR https://github.com/SemiAnalysisAI/InferenceX/pull/1300/changes |
|
hi @seungrokj even on conc=4, it is getting this error, can u take a look? |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25650857591 |
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25651910046 |
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25654329782 |
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25655408841 |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25668243296 |
@seungrokj whats the reason for chat template for non-mtp? +viz @Oseltamivir |
- Update atom-dev image to nightly_202605130853 - Expand conc-end from 256 to 512 for isl=1024 and isl=8192 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
@functionstackx @cquil11 e2e perf/accuracy https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25794276782 |
chunfangamd
left a comment
There was a problem hiding this comment.
The performance is outperforming SGLang now!
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25798735228 |
|
@seungrokj ci is failing, can u take a look? |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25798735228 |
should be okay by now... it's running |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25798735228 |
it is still failing, can u take a look? and if it is 1 flaky node, can u drain it? https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25813782165/job/75837243947?pr=1311
|
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25813782165 |
1 similar comment
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25813782165 |
|
@functionstackx @cquil11 |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25834577379 |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25834609567 |
|
cancelled https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25834633831 |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25834633831 |
Conflicts resolved: - .github/configs/amd-master.yaml (dsv4-fp4-mi355x-atom): took main's simplified single-range conc form from PR #1311 (we had the older discrete-point version) - .github/configs/nvidia-master.yaml (kimik2.5-int4-b200-vllm): kept our bump-rationale comment alongside main's v0.20.2 image (both sides agreed on the image, only the comment was new on ours) - .github/configs/nvidia-master.yaml (minimaxm2.5-fp8-{h100,h200}-vllm): took main's v0.20.2 image bumps (we still had v0.19.1) Cleanup: - Drop our .gitignore additions (the 'scripts/debug_*.sh' line) per review feedback -- match main - Drop docs/AGENTIC_TEST_COVERAGE.md and docs/AGENTIC_TEST_RESULTS.md (agent-generated planning slop, not load-bearing)
The earlier rebase silently dropped trailing whitespace from two unrelated entries (PRs #1311, #1322). The 'no deletions in perf-changelog' policy treats whitespace changes as deletions and failed setup. Rebuild perf-changelog by checking out main's exact bytes and re-appending only the PR #1394 entry.
* $Update gptoss-fp4-b200-vllm vLLM image to v0.20.2\n\nRef #1154\n\nCo-authored-by: Klaud Cold <Klaud-Cold@users.noreply.github.com> * fix(perf-changelog): restore trailing whitespace dropped by prior rebase The earlier rebase silently dropped trailing whitespace from two unrelated entries (PRs #1311, #1322). The 'no deletions in perf-changelog' policy treats whitespace changes as deletions and failed setup. Rebuild perf-changelog by checking out main's exact bytes and re-appending only the PR #1394 entry. --------- Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com> Co-authored-by: claude-fix-bot <claude-fix-bot@local> Co-authored-by: functionstackx <47992694+functionstackx@users.noreply.github.com>

Summary
rocm/atom-dev:nightly_202605101539dsv4_fp4_mi355x_atom.shby removing WIP workarounds that are no longer neededdsv4-fp4-mi355x-atomTest plan
🤖 Generated with Claude Code