
Primus docker release/v26.2#579

Merged
wenxie-amd merged 13 commits into main from release/v26.2 on Mar 13, 2026

Conversation

@clairesonglee
Contributor

No description provided.

vidushi8 and others added 8 commits February 12, 2026 19:13
…32B Configs for MI300X & MI355X (#556)

YF: Only SFT-related config and doc changes; bypassing unit CI tests

## Summary

This PR introduces post-training documentation and updates Qwen3 32B
model configuration files to support AMD MI300X and MI355X accelerators.

---

## Changes

### 📘 Documentation

- **Added `posttraining.md`**
  - New comprehensive guide for post-training workflows
  - Covers setup instructions, configuration details, and usage examples

- **Updated `docs/README.md`**
  - Added a new section referencing post-training documentation
  - Improved documentation organization and navigation

---

### ⚙️ Configuration Updates

- **Updated Qwen3_32B model YAML configs**
  - Added/modified configurations optimized for:
    - MI300X
    - MI355X
  - Adjusted parameters for compatibility and stable execution

---

## Validation

- Verified updated configs load and execute successfully on MI300X and
MI355X environments
- Confirmed documentation links and structure render correctly

---

## Checklist

- [x] Added `posttraining.md`
- [x] Updated `docs/README.md`
- [x] Modified Qwen3_32B YAML configs
- [x] Verified changes locally
Co-authored-by: Mingyu Yang <Mingyu.Yang@amd.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: Kailash Gogineni <gkailashnath1998@gmail.com>
Co-authored-by: HuangWei-95 <Wei.Huang4@amd.com>
Co-authored-by: HuangWei-95 <weihuan@amd.com>
Co-authored-by: Xiaoming-AMD <Xiaoming.Peng@amd.com>
Co-authored-by: WangLingxun <linxwang@amd.com>
…578)

Expand projection.md with memory projection and performance details.
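A rough sketch of the kind of memory projection such a document typically covers: model-state memory for mixed-precision Adam training, where fp16 weights, fp16 gradients, and fp32 master weights plus fp32 Adam moments come to roughly 16 bytes per parameter. The function names and the 16-byte constant are illustrative assumptions, not necessarily what projection.md uses, and activation memory is ignored here.

```python
# Illustrative sketch (assumed names/constants, not projection.md's actual
# formulas): per-GPU model-state memory for mixed-precision Adam training.
# fp16 weights (2B) + fp16 grads (2B) + fp32 master weights (4B)
# + fp32 Adam momentum (4B) + fp32 Adam variance (4B) ~= 16 bytes/param.

def model_state_bytes(num_params: int, bytes_per_param: int = 16) -> int:
    """Total bytes of weights + gradients + optimizer states."""
    return num_params * bytes_per_param

def per_gpu_gib(num_params: int, data_parallel_shards: int = 1) -> float:
    """GiB per GPU, assuming ZeRO-style sharding of model states
    across `data_parallel_shards` ranks (activations excluded)."""
    return model_state_bytes(num_params) / data_parallel_shards / 2**30
```

For example, a 1-billion-parameter model under these assumptions needs about `per_gpu_gib(10**9)` ≈ 14.9 GiB of model states on a single rank before activations are counted.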
Comment on lines +31 to +34:

```python
from megatron.core.tensor_parallel import (
    InferenceLayerNormColumnParallelLinear,
    InferenceRowParallelLinear,
)
```
```yaml
work_group: ${PRIMUS_TEAM:amd}
user_name: ${PRIMUS_USER:root}
exp_name: ${PRIMUS_EXP_NAME:mamba_370M-pretrain}
workspace: ${PRIMUS_WORKSPACE:./output}
```
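The `${VAR:default}` placeholders above follow a common environment-variable substitution convention: use the variable's value when set, otherwise fall back to the inline default. A minimal resolver sketch (the regex and function name are illustrative, not Primus's actual implementation):

```python
import os
import re

# Matches ${NAME:default} placeholders such as ${PRIMUS_TEAM:amd}.
# Illustrative only -- not the real Primus config loader.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*):([^}]*)\}")

def resolve(value: str) -> str:
    """Expand each placeholder to its environment variable's value,
    falling back to the inline default when the variable is unset."""
    return _PLACEHOLDER.sub(
        lambda m: os.environ.get(m.group(1), m.group(2)), value
    )
```

With `PRIMUS_TEAM` unset, `resolve("${PRIMUS_TEAM:amd}")` yields `"amd"`; exporting `PRIMUS_TEAM=ml` first yields `"ml"`.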
Please add mamba and zebra model training to the unit tests in `tests/trainer/test_megatron_trainer.py`.
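The requested coverage could look roughly like the sketch below: iterate a trainer smoke test over the new model configs. `build_trainer_from_config` and the `zebra-pretrain` config name are assumed stand-ins; the real helper names in `tests/trainer/test_megatron_trainer.py` may differ.

```python
# Hypothetical sketch of the requested unit-test coverage. The helper and
# the "zebra-pretrain" config name are assumptions, not the repo's real API.

NEW_MODEL_CONFIGS = [
    "mamba_370M-pretrain",  # config name shown in the diff above
    "zebra-pretrain",       # assumed config name
]

def build_trainer_from_config(config_name: str) -> dict:
    # Stand-in: the real test would load the YAML config and
    # construct a MegatronTrainer from it.
    return {"exp_name": config_name, "built": True}

def test_new_models_build():
    for cfg in NEW_MODEL_CONFIGS:
        trainer = build_trainer_from_config(cfg)
        assert trainer["built"], f"trainer failed to build for {cfg}"
```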

vidushi8 and others added 4 commits March 5, 2026 19:51
…581)

Hook Megatron's validate_args alongside parse_args so Primus-injected
arguments are validated consistently, and run additional ROCm-specific
argument checks during initialization.
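The hook pattern described in that commit can be sketched as follows. The stub functions stand in for Megatron's real `parse_args`/`validate_args` (not imported here), and `rocm_specific_checks` is an assumed name for the extra ROCm validation; this shows the ordering, not the actual Primus code.

```python
# Sketch of hooking validate_args alongside parse_args (assumed names;
# the stubs stand in for megatron.training.arguments.parse_args/validate_args).

def megatron_parse_args(argv):
    # Stub: the real function builds an argparse namespace from argv.
    return {"tensor_model_parallel_size": 1, "use_rocm": True}

def megatron_validate_args(args):
    # Stub: the real function cross-checks Megatron's own arguments.
    assert args["tensor_model_parallel_size"] >= 1

def rocm_specific_checks(args):
    # Hypothetical extra validation run during initialization on ROCm.
    if args.get("use_rocm") and args["tensor_model_parallel_size"] > 8:
        raise ValueError("tensor parallel size exceeds a single node")

def parse_and_validate(argv):
    # Key point of the commit: validation runs right after parsing,
    # so injected arguments are checked consistently with Megatron's own.
    args = megatron_parse_args(argv)
    megatron_validate_args(args)
    rocm_specific_checks(args)
    return args
```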
@wenxie-amd wenxie-amd merged commit 7f1dae4 into main Mar 13, 2026
5 checks passed


6 participants