
Clean up DSv4 ATOM AITER PR2998 overlay #1260

Open
Oseltamivir wants to merge 26 commits into main from dsv4-atom-pr2998-clean

Oseltamivir commented May 2, 2026

Motivation

This recreates the DSv4 ATOM work from a clean branch, replacing the older incremental PR state with a smaller and easier-to-review diff.

The goal is to keep the useful progress from the previous branch while removing temporary performance experiments. The updated ATOM image still does not register DeepseekV4ForCausalLM, so this PR keeps ROCm/ATOM#650 only as the required DSv4 model skeleton/registration overlay, then applies ROCm/aiter#2998 for the sparse/indexer kernels.

This supersedes the previous DSv4 ATOM branch/PR: #1229

Progress From Main To This State

  • Rebased onto current main, preserving the GPTOSS MI355X ATOM config-schema update from PR #1261 (Fix GPT-OSS ATOM config schema).
  • Updated/kept DSv4 ATOM on rocm/atom:rocm7.2.2_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.2.post.
  • Added the minimal DSv4 skeleton overlay from ROCm/ATOM#650 ("feat(deepseek_v4): PR1 skeleton — end-to-end inference with triton MoE"), pinned to 6a0ebb9730839b08287117a17b7d13007acd2d0b, because the image currently fails startup with KeyError: 'DeepseekV4ForCausalLM' without it.
  • Added narrow V4 architecture compatibility for deepseek_v4_pro / deepseek_v4_flash; this handles the image path where config schema mapping leaves hf_config.model_type == "deepseek_v3" while architectures == ["DeepseekV4ForCausalLM"].
  • Patched both per-request cache slot allocation and the ModelRunner startup guard to recognize V4 by architecture, avoiding the model_type='deepseek_v3' is not in per_req_cache_model_types assertion while still preserving the silent-corruption guard.
  • Kept ROCm/aiter#2998 (Dsv4 sparse indexer) pinned to aa0c5b6d97ffc6d4d11b8172dc848239f229c863 for the DSv4 sparse MQA sink and Indexer scorer/top-k implementations.
  • Fixed DSv4 GSM8K eval stopping: the task now includes DSv4 EOS/role stop strings, DSv4 eval generation is capped by default, and lm-eval parsing defensively truncates returned completions at DSv4 stop/control markers if the backend returns the runaway suffix anyway.
  • Removed temporary AITER perf-stack cherry-picks, tuned fMoE/FlyDSL overlays, dense Indexer fast paths, server monitor wrappers, and shortened benchmark-prompt defaults.
  • Preserved profiling improvements so ATOM runs can emit Torch profiler traces and profile jobs can default to a one-step trace.
  • Preserved DSv4 eval support via the custom DSv4 prompt encoding path and completions endpoint handling.
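
The architecture-based detection described in the list above can be sketched roughly as follows. This is a minimal illustration, not the actual ATOM patch: the helper names looks_like_deepseek_v4 and needs_per_req_cache, the V4_ARCHITECTURES set, and the function signatures are all hypothetical. Only the architecture string and the model_type / architectures mismatch come from this PR.

```python
from types import SimpleNamespace

# Architecture name taken from the PR text; the set itself is illustrative.
V4_ARCHITECTURES = {"DeepseekV4ForCausalLM"}


def looks_like_deepseek_v4(hf_config) -> bool:
    """Detect V4 from `architectures` rather than from `model_type`.

    Hypothetical helper: it covers the case where schema mapping leaves
    model_type == "deepseek_v3" on a config that is really V4.
    """
    architectures = getattr(hf_config, "architectures", None) or []
    return any(name in V4_ARCHITECTURES for name in architectures)


def needs_per_req_cache(hf_config, per_req_cache_model_types) -> bool:
    """Accept a config if its model_type is listed OR it is V4 by architecture.

    Configs that match neither condition still return False, so a guard
    built on this check keeps rejecting unknown models.
    """
    if getattr(hf_config, "model_type", None) in per_req_cache_model_types:
        return True
    return looks_like_deepseek_v4(hf_config)


# The mismatch described in the PR: V4 architecture, V3 model_type.
mismatched = SimpleNamespace(
    model_type="deepseek_v3",
    architectures=["DeepseekV4ForCausalLM"],
)
plain_v3 = SimpleNamespace(
    model_type="deepseek_v3",
    architectures=["DeepseekV3ForCausalLM"],
)
```

With these assumptions, the mismatched config passes the check even though "deepseek_v3" is not in the allowed model types, while the plain V3 config is still rejected.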

Technical Details

  • benchmarks/single_node/dsv4_fp4_mi355x_atom.sh installs the pinned ROCm/ATOM#650 DSv4 skeleton, verifies DeepseekV4ForCausalLM registration, installs the pinned ROCm/aiter#2998 sparse indexer, and verifies the batched dsv4_indexer_topk(seq_ids, kv_lens) API.
  • The ATOM overlay patch updates is_deepseek_v4, InputOutputProcessor.has_per_req_cache, the ModelRunner per-request cache assertion, and V4 config/block-size detection to use V4 architecture detection where needed.
  • benchmarks/benchmark_lib.sh keeps the generic profiling and DSv4 eval plumbing, caps DSv4 eval generation with EVAL_DSV4_MAX_OUTPUT_TOKENS defaulting to 1024, and allows override via EVAL_MAX_OUTPUT_TOKENS.
  • utils/evals/gsm8k.yaml now includes DSv4 EOS/role stop strings in addition to the existing </s> and <|im_end|> stops.
  • ATOM benchmark launchers pass ATOM_PROFILE_ARGS when profiling is enabled.
  • DSv4 ATOM remains at the conservative single-concurrency marker from main: 1k1k conc 1 and 8k1k conc 1.
  • perf-changelog.yaml records the cleaned DSv4 ATOM image + ATOM#650 skeleton + AITER#2998 overlay state at the end of the file.
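
The eval token cap described above lives in shell in benchmarks/benchmark_lib.sh. The Python sketch below only illustrates one plausible precedence between the two variables. It assumes that EVAL_MAX_OUTPUT_TOKENS, when set, wins over EVAL_DSV4_MAX_OUTPUT_TOKENS; the function name resolve_dsv4_max_output_tokens is hypothetical.

```python
import os

# Default cap stated in the PR description.
DEFAULT_DSV4_MAX_OUTPUT_TOKENS = 1024


def resolve_dsv4_max_output_tokens(env=None) -> int:
    """Return the generation cap for DSv4 evals.

    Assumed precedence (not confirmed by the PR text):
      1. EVAL_MAX_OUTPUT_TOKENS, if set and non-empty
      2. EVAL_DSV4_MAX_OUTPUT_TOKENS, if set and non-empty
      3. the default of 1024
    """
    env = os.environ if env is None else env
    override = env.get("EVAL_MAX_OUTPUT_TOKENS")
    if override:
        return int(override)
    dsv4_cap = env.get("EVAL_DSV4_MAX_OUTPUT_TOKENS")
    if dsv4_cap:
        return int(dsv4_cap)
    return DEFAULT_DSV4_MAX_OUTPUT_TOKENS
```

Passing a plain dict as env makes the precedence easy to check without touching the real environment.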

Test Plan

  • bash -n benchmarks/benchmark_lib.sh benchmarks/single_node/dsv4_fp4_mi355x_atom.sh benchmarks/single_node/gptoss_fp4_mi355x_atom.sh benchmarks/single_node/dsr1_fp4_mi355x_atom.sh benchmarks/single_node/dsr1_fp8_mi355x_atom.sh
  • python utils/matrix_logic/generate_sweep_configs.py full-sweep --config-files .github/configs/amd-master.yaml --model-prefix dsv4 --framework atom --runner-type mi355x
  • python utils/matrix_logic/generate_sweep_configs.py full-sweep --config-files .github/configs/amd-master.yaml --model-prefix gptoss --framework atom --runner-type mi355x
  • python -m pytest utils/matrix_logic/ -v
  • Parsed downloaded eval artifact from run 25246877759 and confirmed raw resps continued past #### 18 into a repeated junk suffix because configured stops were only </s> / <|im_end|> and max_tokens was 5376.

Test Result

  • Shell syntax checks passed.
  • DSv4 ATOM config generation produced the conservative marker configs from main.
  • GPTOSS ATOM config generation passed and preserved current main's schema.
  • Matrix tests passed: 151 passed.
  • GSM8K YAML parsed with stop list: </s>, <|im_end|>, <|end▁of▁sentence|>, <|User|>, <|Assistant|>, and the observed bidi control marker.
  • Confirmed ROCm/ATOM#650 (DSv4 PR1 skeleton) is still open at 6a0ebb9730839b08287117a17b7d13007acd2d0b.
  • Confirmed ROCm/aiter#2998 (Dsv4 sparse indexer) is still open at aa0c5b6d97ffc6d4d11b8172dc848239f229c863.
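
The defensive truncation of returned completions at DSv4 stop markers could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in this PR: the function name truncate_at_stops is hypothetical, and the stop list contains only the five markers named in the test results (the observed bidi control marker is left out because its exact value is not given here).

```python
# Stop strings named in this PR's test results. The bidi control marker
# is intentionally omitted because its exact value is not stated.
DSV4_STOPS = [
    "</s>",
    "<|im_end|>",
    "<|end▁of▁sentence|>",
    "<|User|>",
    "<|Assistant|>",
]


def truncate_at_stops(text: str, stops=DSV4_STOPS) -> str:
    """Cut `text` at the earliest occurrence of any stop string.

    Returns the text unchanged when no stop string is present.
    """
    cut = len(text)
    for stop in stops:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


# A completion that runs past the answer into a junk suffix.
runaway = "The answer is 18.\n#### 18<|end▁of▁sentence|>junk junk junk"
cleaned = truncate_at_stops(runaway)
```

Cutting at the earliest match, rather than the first stop in list order, keeps the result independent of how the stop list is ordered.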

Submission Checklist


github-actions Bot commented May 2, 2026

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@Oseltamivir Oseltamivir mentioned this pull request May 2, 2026
Comment thread benchmarks/benchmark_lib.sh
Comment thread benchmarks/single_node/dsv4_fp4_mi355x_atom.sh Outdated
@Oseltamivir Oseltamivir force-pushed the dsv4-atom-pr2998-clean branch from 6f27021 to b239475 Compare May 2, 2026 06:43
