
Primus docker release/v26.2#579

Merged
wenxie-amd merged 13 commits into main from release/v26.2 on Mar 13, 2026

Conversation

@clairesonglee
Contributor

No description provided.

vidushi8 and others added 8 commits February 12, 2026 19:13
…32B Configs for MI300X & MI355X (#556)

YF: Only SFT-related config and doc changes; bypassing unit CI tests

## Summary

This PR introduces post-training documentation and updates Qwen3 32B
model configuration files to support AMD MI300X and MI355X accelerators.

---

## Changes

### 📘 Documentation

- **Added `posttraining.md`**
  - New comprehensive guide for post-training workflows
  - Covers setup instructions, configuration details, and usage examples

- **Updated `docs/README.md`**
  - Added a new section referencing post-training documentation
  - Improved documentation organization and navigation

---

### ⚙️ Configuration Updates

- **Updated Qwen3_32B model YAML configs**
  - Added/modified configurations optimized for:
    - MI300X
    - MI355X
  - Adjusted parameters for compatibility and stable execution

---

## Validation

- Verified updated configs load and execute successfully on MI300X and
MI355X environments
- Confirmed documentation links and structure render correctly

---

## Checklist

- [x] Added `posttraining.md`
- [x] Updated `docs/README.md`
- [x] Modified Qwen3_32B YAML configs
- [x] Verified changes locally
Co-authored-by: Mingyu Yang <Mingyu.Yang@amd.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: Kailash Gogineni <gkailashnath1998@gmail.com>
Co-authored-by: HuangWei-95 <Wei.Huang4@amd.com>
Co-authored-by: HuangWei-95 <weihuan@amd.com>
Co-authored-by: Xiaoming-AMD <Xiaoming.Peng@amd.com>
Co-authored-by: WangLingxun <linxwang@amd.com>
…578)

Expand projection.md with memory projection and performance details.
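A rough sketch of the kind of memory projection such a document typically covers: model-state memory for mixed-precision Adam training, where fp16 weights, fp16 gradients, and fp32 master weights plus fp32 Adam moments come to roughly 16 bytes per parameter. The function names and the 16-byte constant are illustrative assumptions, not necessarily what projection.md uses, and activation memory is ignored here.

```python
# Illustrative sketch (assumed names/constants, not projection.md's actual
# formulas): per-GPU model-state memory for mixed-precision Adam training.
# fp16 weights (2B) + fp16 grads (2B) + fp32 master weights (4B)
# + fp32 Adam momentum (4B) + fp32 Adam variance (4B) ~= 16 bytes/param.

def model_state_bytes(num_params: int, bytes_per_param: int = 16) -> int:
    """Total bytes of weights + gradients + optimizer states."""
    return num_params * bytes_per_param

def per_gpu_gib(num_params: int, data_parallel_shards: int = 1) -> float:
    """GiB per GPU, assuming ZeRO-style sharding of model states
    across `data_parallel_shards` ranks (activations excluded)."""
    return model_state_bytes(num_params) / data_parallel_shards / 2**30
```

For example, a 1-billion-parameter model under these assumptions needs about `per_gpu_gib(10**9)` ≈ 14.9 GiB of model states on a single rank before activations are counted.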
Comment on lines +31 to +34:

```python
from megatron.core.tensor_parallel import (
    InferenceLayerNormColumnParallelLinear,
    InferenceRowParallelLinear,
)
```
```yaml
work_group: ${PRIMUS_TEAM:amd}
user_name: ${PRIMUS_USER:root}
exp_name: ${PRIMUS_EXP_NAME:mamba_370M-pretrain}
workspace: ${PRIMUS_WORKSPACE:./output}
```
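The `${VAR:default}` placeholders above follow a common environment-variable substitution convention: use the variable's value when set, otherwise fall back to the inline default. A minimal resolver sketch (the regex and function name are illustrative, not Primus's actual implementation):

```python
import os
import re

# Matches ${NAME:default} placeholders such as ${PRIMUS_TEAM:amd}.
# Illustrative only -- not the real Primus config loader.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*):([^}]*)\}")

def resolve(value: str) -> str:
    """Expand each placeholder to its environment variable's value,
    falling back to the inline default when the variable is unset."""
    return _PLACEHOLDER.sub(
        lambda m: os.environ.get(m.group(1), m.group(2)), value
    )
```

With `PRIMUS_TEAM` unset, `resolve("${PRIMUS_TEAM:amd}")` yields `"amd"`; exporting `PRIMUS_TEAM=ml` first yields `"ml"`.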
Please add mamba and zebra model training to the unit tests in `tests/trainer/test_megatron_trainer.py`.
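The requested coverage could look roughly like the sketch below: iterate a trainer smoke test over the new model configs. `build_trainer_from_config` and the `zebra-pretrain` config name are assumed stand-ins; the real helper names in `tests/trainer/test_megatron_trainer.py` may differ.

```python
# Hypothetical sketch of the requested unit-test coverage. The helper and
# the "zebra-pretrain" config name are assumptions, not the repo's real API.

NEW_MODEL_CONFIGS = [
    "mamba_370M-pretrain",  # config name shown in the diff above
    "zebra-pretrain",       # assumed config name
]

def build_trainer_from_config(config_name: str) -> dict:
    # Stand-in: the real test would load the YAML config and
    # construct a MegatronTrainer from it.
    return {"exp_name": config_name, "built": True}

def test_new_models_build():
    for cfg in NEW_MODEL_CONFIGS:
        trainer = build_trainer_from_config(cfg)
        assert trainer["built"], f"trainer failed to build for {cfg}"
```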

vidushi8 and others added 4 commits March 5, 2026 19:51
…581)

Hook Megatron's validate_args alongside parse_args so Primus-injected
arguments are validated consistently, and run additional ROCm-specific
argument checks during initialization.
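The hook pattern described in that commit can be sketched as follows. The stub functions stand in for Megatron's real `parse_args`/`validate_args` (not imported here), and `rocm_specific_checks` is an assumed name for the extra ROCm validation; this shows the ordering, not the actual Primus code.

```python
# Sketch of hooking validate_args alongside parse_args (assumed names;
# the stubs stand in for megatron.training.arguments.parse_args/validate_args).

def megatron_parse_args(argv):
    # Stub: the real function builds an argparse namespace from argv.
    return {"tensor_model_parallel_size": 1, "use_rocm": True}

def megatron_validate_args(args):
    # Stub: the real function cross-checks Megatron's own arguments.
    assert args["tensor_model_parallel_size"] >= 1

def rocm_specific_checks(args):
    # Hypothetical extra validation run during initialization on ROCm.
    if args.get("use_rocm") and args["tensor_model_parallel_size"] > 8:
        raise ValueError("tensor parallel size exceeds a single node")

def parse_and_validate(argv):
    # Key point of the commit: validation runs right after parsing,
    # so injected arguments are checked consistently with Megatron's own.
    args = megatron_parse_args(argv)
    megatron_validate_args(args)
    rocm_specific_checks(args)
    return args
```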
@wenxie-amd wenxie-amd merged commit 7f1dae4 into main Mar 13, 2026
5 checks passed


6 participants