
[AMD][ROCM] dsv4-fp4-mi355x-atom: bump image, expand concurrency, simplify script#1311

Merged
seungrokj merged 21 commits into main from srok/atom_dsv4_fp4 on May 14, 2026


@seungrokj
Collaborator

@seungrokj seungrokj commented May 11, 2026

Summary

  • Bump image to rocm/atom-dev:nightly_202605101539
  • Expand concurrency range from single-sequence (conc=1) to conc 1–256
  • Simplify dsv4_fp4_mi355x_atom.sh by removing WIP workarounds that are no longer needed
  • Add perf-changelog entry for dsv4-fp4-mi355x-atom

Test plan

  • Verify benchmark runs at expanded concurrency range
  • Verify perf-changelog entry is correctly formatted

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@seungrokj seungrokj changed the title from "dsv4-fp4-mi355x-atom: bump image, expand concurrency, simplify script" to "[AMD][ROCM] dsv4-fp4-mi355x-atom: bump image, expand concurrency, simplify script" May 11, 2026
@seungrokj seungrokj added the AMD label May 11, 2026
Comment on lines 42 to 49
start_gpu_monitor

set -x

BLOCK_SIZE=${BLOCK_SIZE:-16}
export ATOM_DSV4_SPARSE_ATTN_CHUNK_TOKENS=${ATOM_DSV4_SPARSE_ATTN_CHUNK_TOKENS:-256}
# --enforce-eager is required: ROCm/ATOM#650 (PR1 skeleton) has no CUDAGraph
# support yet (deferred to a follow-up PR). max-num-seqs is sized to the
# client concurrency with a floor at 4 — the ATOM default (512) makes the
# KV/GDN-mamba allocator overshoot the GPU budget ("GDN mamba tensor
# exceeds available KV budget"), and using 1 hangs warmup at 0% GPU. 4
# is the minimum we've seen complete warmup successfully (also the PR's
# offline repro value). The PR1 kv_cache[:1,...] hardcode in
# deepseek_v4.py means any forward with batch>1 silently corrupts
# non-slot-0 lanes; eval (gsm8k) at conc>1 is the canary.
MAX_NUM_SEQS=$(( CONC < 4 ? 4 : CONC ))
MAX_NUM_BATCHED_TOKENS=${MAX_NUM_BATCHED_TOKENS:-$MAX_MODEL_LEN_VALUE}
export ATOM_DISABLE_MMAP=true
export ATOM_USE_TRITON_MOE=1
python3 -m atom.entrypoints.openai_server \
--model $MODEL \
--server-port $PORT \

🔴 The simplified script still references $BLOCK_SIZE, $MAX_NUM_SEQS, and $MAX_NUM_BATCHED_TOKENS in the python invocation (lines 45, 47, 48 of benchmarks/single_node/dsv4_fp4_mi355x_atom.sh), but this PR deleted the only assignments of those variables and nothing else in the harness defines them. With set -eo pipefail (no -u), they expand to empty strings, so the rendered argv becomes --block-size --enforce-eager --max-num-seqs --max-num-batched-tokens --trust-remote-code … and the server will fail to start on every job in the expanded conc=4–1024 sweep. Fix is to either restore BLOCK_SIZE=${BLOCK_SIZE:-16} / MAX_NUM_SEQS=$(( CONC < 4 ? 4 : CONC )) / MAX_NUM_BATCHED_TOKENS=${MAX_NUM_BATCHED_TOKENS:-$MAX_MODEL_LEN_VALUE} (mirroring the sister ATOM scripts), or drop the three flags from the python command line.

Extended reasoning...

What the bug is

This PR simplifies benchmarks/single_node/dsv4_fp4_mi355x_atom.sh by removing a large block of WIP overlay/setup code. As part of that simplification it also deleted three short variable assignments that were sitting just above set -x:

BLOCK_SIZE=${BLOCK_SIZE:-16}
MAX_NUM_SEQS=$(( CONC < 4 ? 4 : CONC ))
MAX_NUM_BATCHED_TOKENS=${MAX_NUM_BATCHED_TOKENS:-$MAX_MODEL_LEN_VALUE}

However, the python invocation at the bottom of the simplified script still passes those three variables through (lines 45, 47, 48 of the new file):

    --block-size $BLOCK_SIZE \
    --enforce-eager \
    --max-num-seqs $MAX_NUM_SEQS \
    --max-num-batched-tokens $MAX_NUM_BATCHED_TOKENS \

Why nothing rescues this

  • check_env_vars at the top only validates MODEL/TP/CONC/ISL/OSL/RANDOM_RANGE_RATIO/RESULT_FILENAME/EP_SIZE/DP_ATTENTION — none of the three deleted variables are listed there, so the check does not surface them.
  • A grep across benchmarks/, .github/configs/, and the workflow templates shows no other producer for BLOCK_SIZE / MAX_NUM_SEQS / MAX_NUM_BATCHED_TOKENS; the orchestrator never injects them into the environment.
  • The script uses set -eo pipefail but not set -u, so the undefined variables expand to empty strings silently.
  • Every sibling ATOM script that passes --block-size keeps the local default (e.g. benchmarks/single_node/dsr1_fp4_mi355x_atom.sh:50, dsr1_fp8_mi355x_atom.sh:50, gptoss_fp4_mi355x_atom.sh:51 all define BLOCK_SIZE=${BLOCK_SIZE:-16}), confirming this is required wiring that was accidentally lost in the simplification.
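
The grep described above can be approximated with a small check that lists variables a script reads but never assigns. This is only a sketch: `unassigned_variables`, the trimmed `SCRIPT` text, and the `PROVIDED` set are hypothetical stand-ins for illustration, not part of the harness, and the regexes cover only simple `$VAR` / `${VAR}` references and `NAME=` assignments.

```python
import re

# Hypothetical helper: list shell variables a script reads but never assigns.
ASSIGN_RE = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=", re.MULTILINE)
REF_RE = re.compile(r"\$\{?([A-Za-z_][A-Za-z0-9_]*)")


def unassigned_variables(script_text: str, provided: set[str]) -> list[str]:
    """Return referenced names that are neither assigned in the script nor provided."""
    assigned = set(ASSIGN_RE.findall(script_text))
    referenced = set(REF_RE.findall(script_text))
    return sorted(referenced - assigned - provided)


# Trimmed stand-in for the simplified script, not the real file.
SCRIPT = """\
set -eo pipefail
python3 -m atom.entrypoints.openai_server \\
    --model $MODEL \\
    --server-port $PORT \\
    --block-size $BLOCK_SIZE \\
    --enforce-eager \\
    --max-num-seqs $MAX_NUM_SEQS \\
    --max-num-batched-tokens $MAX_NUM_BATCHED_TOKENS
"""

# Names assumed (for this illustration only) to be set before the command runs.
PROVIDED = {"MODEL", "PORT", "TP", "CONC"}

missing = unassigned_variables(SCRIPT, PROVIDED)
print(missing)
```

Adding the line `BLOCK_SIZE=${BLOCK_SIZE:-16}` to the script text would remove `BLOCK_SIZE` from the reported list, since the name then counts as assigned.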

Step-by-step proof

  1. CI runs the dsv4-fp4-mi355x-atom config from the diff: { tp: 8, ep: 1, conc-start: 4, conc-end: 1024 }. The orchestrator exports MODEL, TP, CONC, ISL, OSL, RANDOM_RANGE_RATIO, RESULT_FILENAME, EP_SIZE, DP_ATTENTION — but not the three deleted vars.
  2. The script runs check_env_vars … which passes (BLOCK_SIZE etc. are not in its list).
  3. Reaching the python invocation, bash expands the argv. With the three vars unset and no set -u, the rendered argv after word-splitting is:
    python3 -m atom.entrypoints.openai_server \
      --model deepseek-ai/DeepSeek-V4-Pro --server-port 8888 -tp 8 \
      --kv_cache_dtype fp8 [--max-model-len 10240] [--enable-expert-parallel?] \
      --block-size --enforce-eager \
      --max-num-seqs --max-num-batched-tokens --trust-remote-code
    
  4. ATOM's argument parser sees --block-size followed immediately by --enforce-eager. Assuming a standard argparse parser, --enforce-eager is recognized as another option rather than taken as a value, so argparse reports that --block-size expects one argument and the server exits non-zero before opening port 8888.
  5. wait_for_server_ready either polls until timeout or detects the dead PID and aborts the job. Every (TP=8, conc=4..1024, ISL/OSL ∈ {1024/1024, 8192/1024}) cell of the new sweep hits exactly this path, so the entire sweep this PR is trying to enable cannot start.

The other two flags fail the same way regardless of their declared types: --max-num-seqs is followed directly by --max-num-batched-tokens, which in turn is followed directly by --trust-remote-code, so under standard argparse behavior neither receives a value. The server cannot start with this command line.
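
The failure can be reproduced without a GPU. The sketch below assumes the server uses Python's standard argparse; `build_parser` is a hypothetical stand-in that declares only the flags named in this comment, and the real ATOM parser may differ in types and defaults. However the parser words the error, the point is the non-zero exit.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the server CLI; the real ATOM parser may differ.
    parser = argparse.ArgumentParser(prog="server-sketch")
    parser.add_argument("--block-size", type=int, default=16)
    parser.add_argument("--enforce-eager", action="store_true")
    parser.add_argument("--max-num-seqs", type=int, default=512)
    parser.add_argument("--max-num-batched-tokens", type=int)
    parser.add_argument("--trust-remote-code", action="store_true")
    return parser


def exit_code_for(argv: list[str]) -> int:
    """Parse argv and return 0 on success or argparse's exit status on failure."""
    try:
        build_parser().parse_args(argv)
    except SystemExit as exc:
        return int(exc.code)
    return 0


# What bash passes when the three variables are unset and unquoted.
broken = ["--block-size", "--enforce-eager", "--max-num-seqs",
          "--max-num-batched-tokens", "--trust-remote-code"]
# The same command line with the values present.
fixed = ["--block-size", "16", "--enforce-eager", "--max-num-seqs", "4",
         "--max-num-batched-tokens", "10240", "--trust-remote-code"]

print(exit_code_for(broken))  # non-zero: a value-taking flag has no value
print(exit_code_for(fixed))
```

argparse writes its usage and error message to stderr before exiting, which is what would show up in the server log for each job.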

Addressing the refutations

The two refutations on bug_003 and bug_004 are simply duplicate-dedup callouts between the two reports (they describe the same defect at the same location). The synthesis agent has already merged them into this single report (merged_bug_003), so the dedup objection is satisfied. Neither refutation disputes the underlying defect.

Fix

Either:

  1. Restore the three defaults before set -x (mirroring dsr1_fp4_mi355x_atom.sh):
    BLOCK_SIZE=${BLOCK_SIZE:-16}
    MAX_NUM_SEQS=$(( CONC < 4 ? 4 : CONC ))
    MAX_NUM_BATCHED_TOKENS=${MAX_NUM_BATCHED_TOKENS:-10240}
    (Note that MAX_MODEL_LEN_VALUE was also removed in this PR, so the third default needs a literal or a recomputed value.)
  2. Or drop the three flags from the python invocation entirely (matching the qwen3.5-fp4-mi355x-atom sister script which does not use them) if the new rocm/atom-dev:nightly_202605101539 image provides workable defaults.
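
For reference, the three shell defaults in option 1 behave like the following Python sketch. `resolve_server_limits` is a hypothetical name, and the 10240 fallback is the literal suggested above rather than a value confirmed for the new image.

```python
from typing import Mapping


def resolve_server_limits(env: Mapping[str, str], max_model_len: int = 10240) -> dict[str, int]:
    """Mirror the three shell defaults from option 1 (illustrative only)."""
    conc = int(env["CONC"])
    return {
        # BLOCK_SIZE=${BLOCK_SIZE:-16}: use 16 when unset or empty.
        "block_size": int(env.get("BLOCK_SIZE") or 16),
        # MAX_NUM_SEQS=$(( CONC < 4 ? 4 : CONC )): floor of 4.
        "max_num_seqs": max(4, conc),
        # MAX_NUM_BATCHED_TOKENS=${MAX_NUM_BATCHED_TOKENS:-10240}
        "max_num_batched_tokens": int(env.get("MAX_NUM_BATCHED_TOKENS") or max_model_len),
    }


print(resolve_server_limits({"CONC": "1"}))
print(resolve_server_limits({"CONC": "256", "BLOCK_SIZE": "32"}))
```

The `or` fallback matches the `:-` form of shell parameter expansion, which substitutes the default both when the variable is unset and when it is set to an empty string.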

Comment thread perf-changelog.yaml Outdated
- dsv4-fp4-mi355x-atom
description:
- "Add DeepSeek-V4-Pro FP4 MI355X ATOM benchmark config; bump image to rocm/atom-dev:nightly_202605101539, expand concurrency range (conc 4–1024), and simplify runtime script"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/

🟡 The new perf-changelog entry for dsv4-fp4-mi355x-atom has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/ with no PR number appended (perf-changelog.yaml:2344). It should be .../pull/1311 — every other entry in the file follows that pattern, and any downstream tooling that consumes pr-link will get a broken URL pointing to the PRs listing page instead of this PR.

Extended reasoning...

What the bug is

The new perf-changelog entry added by this PR ends with an incomplete pr-link value:

- config-keys:
    - dsv4-fp4-mi355x-atom
  description:
    - "Add DeepSeek-V4-Pro FP4 MI355X ATOM benchmark config; bump image to rocm/atom-dev:nightly_202605101539, expand concurrency range (conc 4–1024), and simplify runtime script"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/

The numeric PR ID is missing from the end of the URL.

Why existing patterns expect a number

Every other entry in perf-changelog.yaml follows the convention .../pull/<NUMBER>. For example, the entry immediately above (line 2338) is https://github.com/SemiAnalysisAI/InferenceX/pull/1308, and neighboring entries use /pull/1303, /pull/1304, /pull/1305, etc. The schema is uniform across the file, so consumers of this YAML will reasonably assume the value is a fully-formed URL pointing at a specific PR.

Impact

If a human clicks the link in a rendered changelog, they get the GitHub PR listing page for the repo rather than this PR. More importantly, any tooling that parses pr-link (changelog renderers, scripts that extract PR numbers via regex on the URL, dashboards that link to PRs) will either get a broken/empty PR ID or fall through to the listing page. The link silently points to the wrong place rather than failing loudly.

Step-by-step proof

  1. Open perf-changelog.yaml at line 2344.
  2. Observe the raw value: pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/ (URL ends with a trailing slash and no digits).
  3. Compare to line 2338 (the previous entry): pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1308.
  4. Visit the URL in a browser — you land on https://github.com/SemiAnalysisAI/InferenceX/pulls (the listing), not on PR #1311.
  5. Run a simple extractor such as url.rsplit('/', 1)[-1] on the value: it yields an empty string instead of 1311.

How to fix

Append 1311 (this PR's number) to the URL:

  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1311
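
A check along these lines could catch the missing number before merge. It is a sketch only: `pr_number` and `PR_LINK_RE` are hypothetical names, and the pattern assumes every pr-link has the form https://github.com/<owner>/<repo>/pull/<NUMBER>, as the neighboring entries do.

```python
import re
from typing import Optional

# Assumed shape of a valid pr-link: a GitHub pull URL ending in digits.
PR_LINK_RE = re.compile(r"^https://github\.com/[^/]+/[^/]+/pull/(\d+)$")


def pr_number(pr_link: str) -> Optional[int]:
    """Return the PR number from a pr-link, or None if the URL is incomplete."""
    match = PR_LINK_RE.match(pr_link.strip())
    return int(match.group(1)) if match else None


print(pr_number("https://github.com/SemiAnalysisAI/InferenceX/pull/"))
print(pr_number("https://github.com/SemiAnalysisAI/InferenceX/pull/1311"))
```

Returning None for the incomplete URL lets a validation step fail loudly instead of passing an empty PR ID downstream.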

seungrokj and others added 2 commits May 11, 2026 13:27
…m server args

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
seungrokj and others added 2 commits May 11, 2026 13:41
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@functionstackx
Collaborator

@seungrokj the amd sglang team is doing conc 1 to 256, so it probably makes sense for ATOM to do at least that too? feel free to expand the range even more if u want, but we should do at least conc 1 to 256

from @1am9trash 's PR https://github.com/SemiAnalysisAI/InferenceX/pull/1300/changes

@functionstackx
Collaborator

hi @seungrokj even on conc=4, it is getting this error, can u take a look?

 File "/app/ATOM/atom/models/deepseek_v4.py", line 189, in v4_attention_with_output
    return self.forward_impl(x, positions)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/ATOM/atom/models/deepseek_v4.py", line 1500, in forward_impl
    compress_plans = attn_md.compress_plans
                     ^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'AttentionMetaData' object has no attribute 'compress_plans'
Process ModelRunner2/8:
Traceback (most recent call last):
  File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/app/ATOM/atom/model_engine/async_proc.py", line 113, in __init__
    self.busy_loop()
  File "/app/ATOM/atom/model_engine/async_proc.py", line 173, in busy_loop
    out = func(*args)
          ^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/ATOM/atom/model_engine/model_runner.py", line 2013, in capture_cudagraph
    model_output = self.model(
                   ^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/ATOM/atom/models/deepseek_v4.py", line 2499, in forward
    return self.model(input_ids, positions)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/ATOM/atom/utils/decorators.py", line 529, in __call__
    model_output = self.forward(*args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/ATOM/atom/models/deepseek_v4.py", line 2388, in forward
    def forward(
  File "/opt/venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 465, in __call__
    return super().__call__(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1181, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 936, in call_wrapped
    return self._wrapped_call(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 455, in __call__
    raise e
  File "/opt/venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 442, in __call__
    return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<eval_with_key>.124", line 505, in forward
    submod_1 = self.submod_1(getitem, s72, l_positions_, s80);  getitem = None
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 936, in call_wrapped
    return self._wrapped_call(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 455, in __call__
    raise e
  File "/opt/venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 442, in __call__
    return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<eval_with_key>.126", line 5, in forward
    v4_attention_with_output = torch.ops.aiter.v4_attention_with_output(result_2, l_positions_, 'layers.0.attn');  result_2 = l_positions_ = None
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/_ops.py", line 1209, in __call__
    return self._op(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/utils/_device.py", line 109, in __torch_function__
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/_ops.py", line 1209, in __call__
    return self._op(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/ATOM/atom/models/deepseek_v4.py", line 189, in v4_attention_with_output
    return self.forward_impl(x, positions)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/ATOM/atom/models/deepseek_v4.py", line 1500, in forward_impl
    compress_plans = attn_md.compress_plans
                     ^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'AttentionMetaData' object has no attribute 'compress_plans'
[aiter] import [module_rope_2c_cached_positions_fwd] under /app/aiter-test/aiter/jit/module_rope_2c_cached_positions_fwd.so
Process ModelRunner1/8: same traceback as ModelRunner2/8 above, ending in the same AttributeError.

Capturing bs=512, max_q_len=1:   0%|          | 0/11 [00:00<?, ?it/s]
Process ModelRunner0/8: same traceback as ModelRunner2/8 above, ending in the same AttributeError.
Process ModelRunner7/8: same traceback frames as ModelRunner2/8 above.

@functionstackx
Collaborator

also, we need to fix this chat template issue too. will get back with a patch asap https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25668243296/job/75346620974

@seungrokj what's the reason for the chat template for non-mtp? +viz @Oseltamivir

seungrokj and others added 2 commits May 13, 2026 18:36
- Update atom-dev image to nightly_202605130853
- Expand conc-end from 256 to 512 for isl=1024 and isl=8192

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Collaborator

@chunfangamd chunfangamd left a comment

The performance now exceeds SGLang's!

@functionstackx
Collaborator

@seungrokj CI is failing, can you take a look?

@seungrokj
Collaborator Author

@functionstackx @cquil11

  1. Disk was full -> I removed some files
  2. Docker pull issue -> logged in to Docker Hub again

Should be okay now... it's running

@functionstackx
Collaborator

> @functionstackx @cquil11
>
>   1. Disk was full -> I removed some files
>   2. Docker pull issue -> logged in to Docker Hub again
>
> Should be okay now... it's running

It is still failing, can you take a look? And if it is one flaky node, can you drain it? https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25813782165/job/75837243947?pr=1311

@seungrokj
Collaborator Author

@functionstackx @cquil11
Can you please merge this? (The previous failure was due to a Docker login issue.)
https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25813782165

Collaborator

@functionstackx functionstackx left a comment

lgtm

@seungrokj seungrokj merged commit b6faacd into main May 14, 2026
17 of 40 checks passed
@seungrokj seungrokj deleted the srok/atom_dsv4_fp4 branch May 14, 2026 00:51
cquil11 added a commit that referenced this pull request May 15, 2026
Conflicts resolved:
- .github/configs/amd-master.yaml (dsv4-fp4-mi355x-atom): took main's
  simplified single-range conc form from PR #1311 (we had the older
  discrete-point version)
- .github/configs/nvidia-master.yaml (kimik2.5-int4-b200-vllm): kept our
  bump-rationale comment alongside main's v0.20.2 image (both sides
  agreed on the image, only the comment was new on ours)
- .github/configs/nvidia-master.yaml (minimaxm2.5-fp8-{h100,h200}-vllm):
  took main's v0.20.2 image bumps (we still had v0.19.1)

Cleanup:
- Drop our .gitignore additions (the 'scripts/debug_*.sh' line) per
  review feedback -- match main
- Drop docs/AGENTIC_TEST_COVERAGE.md and docs/AGENTIC_TEST_RESULTS.md
  (agent-generated planning slop, not load-bearing)
functionstackx pushed a commit that referenced this pull request May 17, 2026
The earlier rebase silently dropped trailing whitespace from two
unrelated entries (PRs #1311, #1322). The 'no deletions in
perf-changelog' policy treats whitespace changes as deletions and
failed setup. Rebuild perf-changelog by checking out main's exact
bytes and re-appending only the PR #1394 entry.
functionstackx added a commit that referenced this pull request May 17, 2026
* $Update gptoss-fp4-b200-vllm vLLM image to v0.20.2\n\nRef #1154\n\nCo-authored-by: Klaud Cold <Klaud-Cold@users.noreply.github.com>

* fix(perf-changelog): restore trailing whitespace dropped by prior rebase

The earlier rebase silently dropped trailing whitespace from two
unrelated entries (PRs #1311, #1322). The 'no deletions in
perf-changelog' policy treats whitespace changes as deletions and
failed setup. Rebuild perf-changelog by checking out main's exact
bytes and re-appending only the PR #1394 entry.

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: claude-fix-bot <claude-fix-bot@local>
Co-authored-by: functionstackx <47992694+functionstackx@users.noreply.github.com>
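
The whitespace failure described in the commit message above can be illustrated with a small sketch. The repository's actual "no deletions in perf-changelog" check is not shown in this thread, so the following assumes, hypothetically, that it amounts to an append-only test on raw bytes. The function name `is_append_only` and the sample changelog contents are made up for illustration.

```python
def is_append_only(old: bytes, new: bytes) -> bool:
    """Return True when `new` keeps every byte of `old` and only adds to the end.

    Assumption: the policy compares raw bytes, so dropping a trailing
    space from an existing entry counts as a deletion.
    """
    return new.startswith(old)


# Hypothetical changelog contents; note the trailing space after "image".
main_version = b"- pr: 1311\n  note: bump image \n"

# A rebase that strips the trailing space changes an existing entry.
rebased = b"- pr: 1311\n  note: bump image\n- pr: 1394\n  note: new entry\n"

# Rebuilding from main's exact bytes and appending only the new entry.
rebuilt = main_version + b"- pr: 1394\n  note: new entry\n"

print(is_append_only(main_version, rebased))  # False
print(is_append_only(main_version, rebuilt))  # True
```

Under this assumption, starting from main's exact bytes and re-appending only the new entry is the simplest way to guarantee that no existing byte changes.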

3 participants