
0927 final #67

Merged

functionstackx merged 2 commits into main from amd-gpt-oss-0927final
Sep 28, 2025
Conversation

@seungrokj
Collaborator

@kimbochen @japarada @qcolombet

Please merge this.

Signed-off-by: seungrokjung <seungrok.jung@amd.com>
Signed-off-by: seungrokjung <seungrok.jung@amd.com>
@functionstackx functionstackx merged commit 079c016 into main Sep 28, 2025
@functionstackx functionstackx deleted the amd-gpt-oss-0927final branch September 28, 2025 19:03
Oseltamivir added a commit that referenced this pull request Apr 24, 2026
Replaces our hand-rolled 8k/1k DSV4-Pro vLLM disagg recipes with the
four topologies from NVIDIA/srt-slurm PR #71 (source fork:
alec-flowers/srt-slurm, branch aflowers/dsv4-pr67-pr68, pinned at
commit d60e3f1c). PR #71 supersedes PR #67, on which our original
8k/1k recipes were based, adding more topologies, a wider concurrency
sweep per recipe, new env vars, an explicit tokenizer-mode, and
CPU/DRAM expert offload.

We take everything except the offload changes:

  * launch_gb200-nv.sh clones alec-flowers/srt-slurm for dsv4 instead
    of NVIDIA/srt-slurm.
  * A runtime post-clone patch strips `offload-group-size`,
    `offload-num-in-group`, `offload-prefetch-step`, and the commented
    `# offload-params` line from all four 8k/1k recipes.
  * The same post-clone patch injects our `slurm.time_limit: 8:00:00` and
    `health_check: {max_attempts: 1440, interval_seconds: 10}` (a 4 h
    budget) so the recipes match our cold-cache Lustre load budget; a
    sketch of the patch follows this list.
  * Model-path alias changed from `deepseek-v4-pro` to `deepseekv4-fp4`
    to match PR #71 recipes' `model.path` field; 1k/1k local recipes
    updated to the same alias.
  * nvidia-master.yaml 8k/1k block rewritten: 4 search-space entries
    (1p1d-dep8-dep8, 3p1d-dep8-dep8, 3p1d-dep8-dep16, 6p1d-dep8-dep16),
    each run over the concurrency list [4, 8, 16, 32, 64, 256, 512,
    1024], for 4 × 8 = 32 total 8k/1k benchmark points across 4 cluster
    startups (enumerated in the second sketch below).
  * Obsolete local 8k/1k recipes under srt-slurm-recipes/vllm/deepseek-v4/8k1k/
    removed (superseded by the PR #71 upstream files).
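A minimal sketch of what that post-clone patch could look like. The
recipe glob, the flat YAML layout, and the append-style override are
assumptions for illustration, not lifted from launch_gb200-nv.sh:

```bash
#!/usr/bin/env bash
# Hypothetical post-clone patch; run from the srt-slurm clone root.
# The recipe path below is an assumed location, not the real one.
for recipe in recipes/vllm/deepseek-v4/8k1k/*.yaml; do
  # Drop the CPU/DRAM expert-offload knobs we do not take, including
  # the commented-out "# offload-params" line.
  sed -i \
    -e '/offload-group-size/d' \
    -e '/offload-num-in-group/d' \
    -e '/offload-prefetch-step/d' \
    -e '/# *offload-params/d' \
    "$recipe"

  # Append our overrides, assuming the recipes do not already define
  # these top-level keys. 1440 attempts x 10 s = 14400 s, i.e. the
  # 4 h health-check budget.
  cat >> "$recipe" <<'EOF'
slurm:
  time_limit: 8:00:00
health_check:
  max_attempts: 1440
  interval_seconds: 10
EOF
done
```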

The 1k/1k sweep is otherwise unchanged (2 matrix entries, 9 benchmark
points using the hand-rolled recipes; PR #71 has no 1k/1k equivalent).
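For orientation, the shape of the resulting 8k/1k sweep can be
enumerated with a throwaway loop (purely illustrative; this is not how
nvidia-master.yaml is consumed):

```bash
# 4 topologies x 8 concurrency levels = 32 benchmark points,
# one cluster startup per topology.
topologies="1p1d-dep8-dep8 3p1d-dep8-dep8 3p1d-dep8-dep16 6p1d-dep8-dep16"
concurrencies="4 8 16 32 64 256 512 1024"
n=0
for topo in $topologies; do
  for conc in $concurrencies; do
    n=$((n + 1))
    printf 'point %2d: topology=%s concurrency=%d\n' "$n" "$topo" "$conc"
  done
done
echo "total: $n benchmark points"   # prints: total: 32 benchmark points
```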
Oseltamivir added a commit that referenced this pull request Apr 25, 2026
* runners/launch_gb200-nv.sh: switch the recipe overlay step from
  `cp -r src dst` to `cp -rT src dst` (with an explicit `mkdir -p dst`
  first). Addresses the bot review nit at line 144: `cp -r src dst`
  works today only because the upstream sa-submission-q2-2026 branch
  has no `recipes/vllm/deepseek-v4/` directory; if upstream ever ships
  one, `cp -r` would nest the copy as
  `recipes/vllm/deepseek-v4/deepseek-v4/...` and CONFIG_FILE in
  nvidia-master.yaml would silently resolve to the upstream stub.
  `-T` overlays unconditionally (see the sketch after this list).

* perf-changelog.yaml: refresh the dsv4-fp4-gb200-dynamo-vllm entry's
  description. The previous wording referenced "8k1k, 7p1d-dep8-dep16"
  and "Mirrors NVIDIA/srt-slurm PR #67", which is stale after the move
  to a 1k/1k sweep with TEP low-conc (mirrored from PR #71) plus two
  hand-rolled mid/high topologies. It also fixes the directory
  reference (the recipes moved to
  benchmarks/multi_node/srt-slurm-recipes/ during the cleanup pass).
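A sketch of the overlay fix, with placeholder paths (`SRC_RECIPES` and
`DST_RECIPES` stand in for the launcher's real paths and variable
names):

```bash
# Placeholder paths; the launcher's actual variables differ.
SRC_RECIPES=benchmarks/multi_node/srt-slurm-recipes/vllm/deepseek-v4
DST_RECIPES=srt-slurm/recipes/vllm/deepseek-v4

# Before: if DST_RECIPES already exists (e.g. upstream starts shipping
# it), `cp -r` copies SRC_RECIPES *into* it, producing the nested
# .../deepseek-v4/deepseek-v4/... layout and leaving the upstream stub
# where CONFIG_FILE expects our recipes.
#cp -r "$SRC_RECIPES" "$DST_RECIPES"

# After: -T (--no-target-directory) treats DST_RECIPES as the
# destination itself, so our tree overlays the upstream one whether or
# not it already exists.
mkdir -p "$DST_RECIPES"
cp -rT "$SRC_RECIPES" "$DST_RECIPES"
```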
