Clean up DSv4 ATOM AITER PR2998 overlay #1260
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.
PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. Often, failures are just flakes, and re-running the failed jobs will fix them. If a re-run is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow
As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25246059530
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25246066354
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25246280222
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25246877759
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25257317536
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25257804526
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25259474874
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25292528536
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25293643828
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25298720926
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25299373037
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25306596030
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25333573911
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25335088310
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25335750558
Motivation
This recreates the DSv4 ATOM work from a clean branch, replacing the older incremental PR state with a smaller and easier-to-review diff.
The goal is to keep the useful progress from the previous branch while removing temporary performance experiments. The updated ATOM image still does not register `DeepseekV4ForCausalLM`, so this PR keeps ROCm/ATOM#650 only as the required DSv4 model skeleton/registration overlay, then applies ROCm/aiter#2998 for the sparse/indexer kernels.
This supersedes the previous DSv4 ATOM branch/PR: #1229
Progress From Main To This State
- [...] `main`, preserving the GPTOSS MI355X ATOM config-schema update from PR #1261 (Fix GPT-OSS ATOM config schema).
- [...] `rocm/atom:rocm7.2.2_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.2.post`.
- [...] `6a0ebb9730839b08287117a17b7d13007acd2d0b`, because the image currently fails startup with `KeyError: 'DeepseekV4ForCausalLM'` without it.
- [...] `deepseek_v4_pro`/`deepseek_v4_flash`; this handles the image path where config schema mapping leaves `hf_config.model_type == "deepseek_v3"` while `architectures == ["DeepseekV4ForCausalLM"]`.
- [...] `model_type='deepseek_v3' is not in per_req_cache_model_types` assertion while still preserving the silent-corruption guard.
- [...] `aa0c5b6d97ffc6d4d11b8172dc848239f229c863` for DSv4 sparse MQA sink and Indexer scorer/top-k implementations.
Technical Details
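The config mismatch noted in the progress list (`model_type` reported as `"deepseek_v3"` while `architectures` names `DeepseekV4ForCausalLM`) is the reason detection has to look at the architecture rather than the model type. A minimal Python sketch of that idea follows. The helper name `detect_dsv4_by_architecture` and the stand-in config objects are hypothetical and are not taken from ATOM; the actual patched checks may be structured differently.

```python
from types import SimpleNamespace

# Architecture string quoted in this PR description.
DSV4_ARCHITECTURE = "DeepseekV4ForCausalLM"


def detect_dsv4_by_architecture(hf_config) -> bool:
    """Hypothetical helper: decide DSv4-ness from `architectures` only.

    `model_type` is deliberately ignored, because the image's config schema
    mapping can leave it as "deepseek_v3" for a DSv4 checkpoint.
    """
    architectures = getattr(hf_config, "architectures", None) or []
    return DSV4_ARCHITECTURE in architectures


# Stand-in config objects for illustration (not real checkpoints).
remapped_v4 = SimpleNamespace(
    model_type="deepseek_v3",
    architectures=["DeepseekV4ForCausalLM"],
)
plain_v3 = SimpleNamespace(
    model_type="deepseek_v3",
    architectures=["DeepseekV3ForCausalLM"],
)

print(detect_dsv4_by_architecture(remapped_v4))  # True
print(detect_dsv4_by_architecture(plain_v3))     # False
```

Keying on `architectures` means a plain DeepSeek V3 config is still rejected, which is consistent with keeping the per-request cache guard in place for models that really are V3.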
- `benchmarks/single_node/dsv4_fp4_mi355x_atom.sh` installs the pinned ROCm/ATOM#650 DSv4 skeleton, verifies `DeepseekV4ForCausalLM` registration, installs the pinned ROCm/aiter#2998, and verifies the batched `dsv4_indexer_topk(seq_ids, kv_lens)` API.
- [...] `is_deepseek_v4`, `InputOutputProcessor.has_per_req_cache`, the ModelRunner per-request cache assertion, and V4 config/block-size detection to use V4 architecture detection where needed.
- `benchmarks/benchmark_lib.sh` keeps the generic profiling and DSv4 eval plumbing, caps DSv4 eval generation with `EVAL_DSV4_MAX_OUTPUT_TOKENS` defaulting to `1024`, and allows override via `EVAL_MAX_OUTPUT_TOKENS`.
- `utils/evals/gsm8k.yaml` now includes DSv4 EOS/role stop strings in addition to the existing `</s>` and `<|im_end|>` stops.
- [...] `ATOM_PROFILE_ARGS` when profiling is enabled.
- [...] `1k1k` `conc1` and `8k1k` `conc1`.
- `perf-changelog.yaml` records the cleaned DSv4 ATOM image + ATOM#650 skeleton + AITER#2998 overlay state at the end of the file.
Test Plan
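One way to sanity-check the eval token cap described under Technical Details is to model the lookup order explicitly. The sketch below is Python, whereas the real logic lives in `benchmarks/benchmark_lib.sh`. It assumes that `EVAL_MAX_OUTPUT_TOKENS` takes precedence over `EVAL_DSV4_MAX_OUTPUT_TOKENS`, which is one reading of "allows override"; the function name `resolve_dsv4_eval_cap` is hypothetical.

```python
from typing import Mapping

# Default cap stated in this PR description.
DEFAULT_DSV4_EVAL_CAP = 1024


def resolve_dsv4_eval_cap(env: Mapping[str, str]) -> int:
    """Hypothetical helper mirroring the assumed precedence.

    1. EVAL_MAX_OUTPUT_TOKENS (general override), if set and non-empty
    2. EVAL_DSV4_MAX_OUTPUT_TOKENS (DSv4-specific cap), if set and non-empty
    3. the default of 1024
    """
    for name in ("EVAL_MAX_OUTPUT_TOKENS", "EVAL_DSV4_MAX_OUTPUT_TOKENS"):
        value = env.get(name, "").strip()
        if value:
            return int(value)
    return DEFAULT_DSV4_EVAL_CAP


print(resolve_dsv4_eval_cap({}))                                       # 1024
print(resolve_dsv4_eval_cap({"EVAL_DSV4_MAX_OUTPUT_TOKENS": "512"}))   # 512
print(resolve_dsv4_eval_cap({"EVAL_MAX_OUTPUT_TOKENS": "2048",
                             "EVAL_DSV4_MAX_OUTPUT_TOKENS": "512"}))   # 2048
```

Passing `os.environ` as `env` would apply the same lookup to the current process environment.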
- `bash -n benchmarks/benchmark_lib.sh benchmarks/single_node/dsv4_fp4_mi355x_atom.sh benchmarks/single_node/gptoss_fp4_mi355x_atom.sh benchmarks/single_node/dsr1_fp4_mi355x_atom.sh benchmarks/single_node/dsr1_fp8_mi355x_atom.sh`
- `python utils/matrix_logic/generate_sweep_configs.py full-sweep --config-files .github/configs/amd-master.yaml --model-prefix dsv4 --framework atom --runner-type mi355x`
- `python utils/matrix_logic/generate_sweep_configs.py full-sweep --config-files .github/configs/amd-master.yaml --model-prefix gptoss --framework atom --runner-type mi355x`
- `python -m pytest utils/matrix_logic/ -v`
- [...] `25246877759` and confirmed raw `resps` continued past `#### 18` into a repeated junk suffix because configured stops were only `</s>`/`<|im_end|>` and `max_tokens` was 5376.
Test Result
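The junk-suffix failure noted in the test plan comes down to which stop strings cut the generation. The Python sketch below illustrates the effect of the extended stop list on a made-up response string; it is not output from the inspected run, and `truncate_at_first_stop` is a hypothetical helper rather than the eval harness's own stopping logic. The bidi control marker mentioned in this PR is left out because its exact value is not given here.

```python
# Stop strings as written in this PR description.
OLD_STOPS = ("</s>", "<|im_end|>")
DSV4_STOPS = OLD_STOPS + ("<|end▁of▁sentence|>", "<|User|>", "<|Assistant|>")


def truncate_at_first_stop(text: str, stops: tuple) -> str:
    """Cut `text` at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1 and idx < cut:
            cut = idx
    return text[:cut]


# Illustrative response only: an answer followed by an EOS marker and junk.
sample = "The answer is 18.\n#### 18<|end▁of▁sentence|><|User|>junk junk"

# With only the old stops, nothing matches and the junk suffix survives.
print(repr(truncate_at_first_stop(sample, OLD_STOPS)))
# With the DSv4 stops, the text ends right after the final answer.
print(repr(truncate_at_first_stop(sample, DSV4_STOPS)))
```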
- [...] `151 passed`.
- [...] `</s>`, `<|im_end|>`, `<|end▁of▁sentence|>`, `<|User|>`, `<|Assistant|>`, and the observed bidi control marker.
- [...] `6a0ebb9730839b08287117a17b7d13007acd2d0b`.
- [...] `aa0c5b6d97ffc6d4d11b8172dc848239f229c863`.
Submission Checklist