Add ROCm Qwen3.5-9B smoke path #2
Merged
# ⭐ Feature

## Add Qwen3.5-9B ROCm smoke path

- Add one-command Qwen3.5-9B DAPO-Math smoke runner that restarts Ray from a clean state.
- Align Qwen3.5 runtime flags with the validated Qwen3-4B ROCm path: TE auto attention, BSHD, and disabled Dynamo/JIT fuser.
- Reduce the default Qwen3.5 run to a smaller smoke profile to avoid filling SGLang KV/logits memory during initial validation.

Made-with: Cursor

---

# 🐛 Bug Fix

## Install ROCm training dependencies in Dockerfile

- Install ROCm TransformerEngine 2.4.0 wheels for Megatron TE specs.
- Install flash-linear-attention for Qwen3.5 GatedDeltaNet initialization.

---

# 📝 Documentation

## Record Qwen3.5 ROCm bring-up results

- Document FLA import resolution, SGLang rollout device ordinal fix, and the full-profile rollout OOM finding.
- Record that the smoke reached all service registration and step-0 training/logprob flow.

---

# 🎨 Style

## Apply pre-commit formatting

- Apply import ordering, doc formatting, script copyright, and formatting fixes from pre-commit.
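The "restarts Ray from a clean state" step can be sketched as a stop-then-start sequence. This is a minimal illustration and not the contents of the PR's runner script: the helper names, the GPU count, and the node address are hypothetical, and only the `ray stop --force` and `ray start --head` commands themselves come from the Ray CLI. The sketch defaults to a dry run so it can be inspected without a Ray installation.

```python
import shlex
import subprocess


def restart_ray_commands(num_gpus: int, node_ip: str) -> list[list[str]]:
    """Build the commands for a clean single-node Ray restart (hypothetical helper)."""
    return [
        # Stop any Ray processes left over from a previous run.
        ["ray", "stop", "--force"],
        # Start a fresh head node with an explicit address and GPU count.
        ["ray", "start", "--head", f"--node-ip-address={node_ip}", f"--num-gpus={num_gpus}"],
    ]


def restart_ray(num_gpus: int, node_ip: str, dry_run: bool = True) -> list[str]:
    """Return the rendered commands; execute them only when dry_run is False."""
    rendered = []
    for cmd in restart_ray_commands(num_gpus, node_ip):
        rendered.append(shlex.join(cmd))
        if not dry_run:
            # check=True aborts the smoke run if Ray fails to stop or start.
            subprocess.run(cmd, check=True)
    return rendered


if __name__ == "__main__":
    # Illustrative values only; the real GPU count and address depend on the host.
    for line in restart_ray(num_gpus=8, node_ip="127.0.0.1"):
        print(line)
```

Stopping Ray before every launch means a smoke run never attaches to workers or GPU allocations left behind by an earlier failed attempt.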
Resolve AMD runner conflicts after the Qwen3-4B ROCm PR landed on main. Keep the Qwen3.5 smoke runner changes while preserving the main branch's Qwen3-4B baseline, and avoid hardcoded private IP defaults in local smoke wrappers.

Made-with: Cursor
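One way to avoid a hardcoded private IP default is to resolve the address at launch time. The sketch below is an assumption about how such detection could work, written in Python for illustration; the PR's wrappers are shell scripts and their actual logic is not shown here. The function name is hypothetical, and `MASTER_ADDR` is assumed as the override variable because it is the common `torch.distributed` convention.

```python
import os
import socket


def detect_master_addr(env_var: str = "MASTER_ADDR") -> str:
    """Pick a master address without baking a private IP into the script."""
    # An explicit override always wins, so cluster launchers stay in control.
    explicit = os.environ.get(env_var)
    if explicit:
        return explicit
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            # connect() on a UDP socket only selects a route; no packet is sent.
            # 192.0.2.1 is a reserved documentation address (TEST-NET-1).
            sock.connect(("192.0.2.1", 9))
            return sock.getsockname()[0]
    except OSError:
        # No usable route (for example an isolated container): use loopback.
        return "127.0.0.1"


if __name__ == "__main__":
    print(detect_master_addr())
```

The loopback fallback keeps single-node smoke runs working on hosts without an outbound route, while multi-node launches can still set the variable explicitly.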
Code Review
This pull request introduces a one-command smoke runner for Qwen3.5-9B on ROCm and updates the documentation with validation and troubleshooting details. Key changes include updating the ROCm Dockerfile to install TransformerEngine and flash-linear-attention, implementing dynamic master address detection in shell scripts, and optimizing rollout parameters for smoke testing. Review feedback identified redundant path entries in the PYTHONPATH environment variable within the newly added shell scripts.
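The redundant `PYTHONPATH` entries flagged in the review could be removed with an order-preserving deduplication. The following is a minimal sketch in Python rather than the shell used by the PR's scripts; the function name and the example paths are hypothetical.

```python
import os


def dedupe_path_entries(value: str, sep: str = os.pathsep) -> str:
    """Drop empty and repeated entries while keeping first-seen order."""
    seen = set()
    kept = []
    for entry in value.split(sep):
        # Order matters for import resolution, so keep the first occurrence.
        if entry and entry not in seen:
            seen.add(entry)
            kept.append(entry)
    return sep.join(kept)


if __name__ == "__main__":
    # Hypothetical paths, used only to show the behaviour.
    print(dedupe_path_entries("/workspace/a:/workspace/b:/workspace/a", sep=":"))
```

Keeping the first occurrence preserves the existing import precedence, so deduplication does not change which copy of a module is picked up.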
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
# 🐛 Bug Fix

## Avoid debug logging failures during tiny smoke runs

- Skip rollout metric logging when the transient training debug buffer does not include response length metadata yet.
- Keep training from failing before the rollout/logprob path has completed.

Made-with: Cursor

---

# ⭐ Feature

## Tighten Qwen3.5-9B smoke defaults

- Use a smaller global batch for the reduced smoke profile while preserving GRPO grouping constraints.
- Record the latest validation findings in the AMD runbook.
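The logging fix described above amounts to treating missing response length metadata as "nothing to log" rather than as an error. A minimal sketch of that guard follows; the function name, the `response_lengths` key, and the metric names are hypothetical and are not taken from the PR's code.

```python
import logging
from typing import Any, Mapping, Optional

logger = logging.getLogger(__name__)


def summarize_rollout_lengths(
    debug_buffer: Mapping[str, Any],
    key: str = "response_lengths",  # hypothetical metadata key
) -> Optional[dict]:
    """Return rollout length metrics, or None when the metadata is not there yet."""
    lengths = debug_buffer.get(key)
    if not lengths:
        # Missing or empty metadata: skip logging instead of raising.
        logger.debug("No %s in debug buffer; skipping rollout metrics", key)
        return None
    return {
        "response_len/mean": sum(lengths) / len(lengths),
        "response_len/max": max(lengths),
    }


if __name__ == "__main__":
    print(summarize_rollout_lengths({}))
    print(summarize_rollout_lengths({"response_lengths": [2, 4]}))
```

Returning `None` lets the caller skip the metric write for that step, so a diagnostic path cannot abort training before the rollout/logprob flow has completed.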
Summary

- Add `amd/run_qwen35-9b.sh` wrapper for Qwen3.5-9B DAPO-Math ROCm smoke runs.
- Install ROCm TransformerEngine and `flash-linear-attention` in the ROCm Dockerfile so Megatron TE and Qwen3.5 `GatedDeltaNet` can initialize.
- Align Qwen3.5 runtime settings with the validated Qwen3-4B ROCm path: TE `auto` attention, BSHD, disabled Dynamo/JIT fuser, and 1-GPU SGLang rollout engines.

Validation

- `pre-commit run --all-files --show-diff-on-failure` passes.
- `bash amd/run_qwen35-9b.sh` restarted Ray cleanly and launched the direct runner.
- Qwen3.5 `GatedDeltaNet` initialized after installing `flash-linear-attention`.
- Observed `FusedAttention backend (sub-backend 1)` during Megatron logprob/training.

Notes
The full-style Qwen3.5 rollout settings progressed much further than before but eventually hit HIP OOM in SGLang logits allocation while full token usage reached ~1.0. This PR keeps the bring-up work and sets safer defaults for the next smoke attempt.
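A rough way to see why logits allocation is sensitive to the rollout profile: a dense logits tensor needs one value per token per vocabulary entry, so its size grows as tokens × vocabulary size × bytes per element. The sketch below only shows that arithmetic. The token count, vocabulary size, and dtype width are illustrative assumptions, not measurements from this PR or actual Qwen3.5 configuration values.

```python
def logits_gib(num_tokens: int, vocab_size: int, bytes_per_element: int = 4) -> float:
    """Size in GiB of a dense [num_tokens, vocab_size] logits tensor."""
    return num_tokens * vocab_size * bytes_per_element / 2**30


if __name__ == "__main__":
    # Illustrative numbers only: 8192 tokens, a 150,000-entry vocabulary, float32.
    print(f"{logits_gib(8192, 150_000):.2f} GiB")
    # Halving the token count halves the logits allocation.
    print(f"{logits_gib(4096, 150_000):.2f} GiB")
```

Under these assumptions a single logits tensor takes several GiB, which competes with a KV cache that is already close to full. That is consistent with smaller smoke defaults lowering the risk of OOM.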
Made with Cursor