Primus v26.3 #159

Merged
gargrahul merged 14 commits into ROCm:develop from gargrahul:gargrahul/primus_v26.3 on May 23, 2026

Conversation

@gargrahul
Collaborator

The Primus v26.3 release introduces the following new models and upgrades previously supported models:

- Qwen 3 30B BF16/FP8
- Qwen 3 235B BF16/FP8
- GPT OSS 20B BF16/FP8
- GPT OSS 120B BF16/FP8

Copilot AI review requested due to automatic review settings May 20, 2026 21:23
Contributor

Copilot AI left a comment

Pull request overview

Updates MAD’s Primus integration to align with the Primus v26.3 release, adding new Megatron-LM training targets (Qwen3 30B/235B, GPT-OSS 20B/120B) and bumping Docker base images/documentation to the newer container stack.

Changes:

  • Add new Primus Megatron-LM model repos + datatype support logic (BF16/FP8 where applicable), and extend benchmark parsing to recognize these models.
  • Introduce a setup-time patch step to add training-log metrics summarization to Primus’ primus-cli-direct.sh.
  • Bump training Docker base images to rocm/primus:v26.3 and refresh benchmark README component versions.
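The "datatype support logic" mentioned in the first bullet can be pictured as a lookup from model to allowed precisions. The sketch below is only an illustration under stated assumptions: the dictionary keys are the labels from the PR description, not the actual selectors used in `scripts/primus/megatron-lm/run.sh`, and `SUPPORTED_DTYPES` and `is_supported` are hypothetical names.

```python
# Hypothetical datatype support matrix. The labels are taken from the PR
# description; the real selectors in run.sh may be named differently.
SUPPORTED_DTYPES = {
    "Qwen 3 30B": {"BF16", "FP8"},
    "Qwen 3 235B": {"BF16", "FP8"},
    "GPT OSS 20B": {"BF16", "FP8"},
    "GPT OSS 120B": {"BF16", "FP8"},
}


def is_supported(model: str, dtype: str) -> bool:
    """Return True if the (model, dtype) pair appears in the matrix."""
    # Unknown models map to an empty set, so they are never supported.
    return dtype.upper() in SUPPORTED_DTYPES.get(model, set())


print(is_supported("Qwen 3 30B", "fp8"))    # True
print(is_supported("GPT OSS 120B", "FP4"))  # False
```

Keeping the matrix as data rather than as branching logic would make it straightforward to add a model or restrict a datatype in one place, though the PR itself may structure this differently.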

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| `scripts/pytorch_train/run.sh` | Ensure certain post-train model repos explicitly run with BF16. |
| `scripts/primus/pytorch_train/primus_pytorch_benchmark_report.sh` | Adjust device→config mapping logic for pretrain benchmarks. |
| `scripts/primus/megatron-lm/run.sh` | Add new model repo selectors + datatype support matrix updates. |
| `scripts/primus/megatron-lm/primus_megatron-lm_benchmark_setup.sh` | Apply an inline patch to Primus to parse/append training metrics summaries. |
| `scripts/primus/megatron-lm/primus_megatron-lm_benchmark_report.sh` | Add new models and refine benchmark execution behaviors. |
| `scripts/primus/megatron-lm/primus_megatron-lm_benchmark_report.py` | Extend log parsing eligibility list for new models. |
| `models.json` | Register new model repos for MAD runs (including skip-arch for GPT-OSS-120B). |
| `docker/pytorch_train.ubuntu.amd.Dockerfile` | Bump base image to rocm/primus:v26.3. |
| `docker/primus_pytorch_train.ubuntu.amd.Dockerfile` | Bump base image to rocm/primus:v26.3. |
| `docker/primus_megatron_train.ubuntu.amd.Dockerfile` | Bump base image to rocm/primus:v26.3. |
| `benchmark/pytorch_train/README.md` | Update component versions listed for the training container. |
| `benchmark/megatron_lm/README.md` | Update component versions and document some new supported models + examples. |

Comment thread scripts/primus/megatron-lm/primus_megatron-lm_benchmark_setup.sh Outdated
Comment thread scripts/primus/megatron-lm/primus_megatron-lm_benchmark_setup.sh Outdated
Comment thread scripts/primus/megatron-lm/primus_megatron-lm_benchmark_report.sh
Comment thread benchmark/pytorch_train/README.md
Comment thread benchmark/megatron_lm/README.md
Comment thread benchmark/megatron_lm/README.md
added examples for multi-node training of mixtral 8x22B and llama3.1-405B. Also made some other changes.
Copilot AI review requested due to automatic review settings May 21, 2026 01:35
Contributor

Copilot AI left a comment

Copilot was unable to review this pull request because the user who requested the review is ineligible. To be eligible to request a review, you need a paid Copilot license, or your organization must enable Copilot code review.

added multi-node training examples
Copilot AI review requested due to automatic review settings May 21, 2026 02:00
Contributor

Copilot AI left a comment

Copilot was unable to review this pull request because the user who requested the review is ineligible. To be eligible to request a review, you need a paid Copilot license, or your organization must enable Copilot code review.

Copilot AI review requested due to automatic review settings May 21, 2026 16:22
Contributor

Copilot AI left a comment

Copilot was unable to review this pull request because the user who requested the review is ineligible. To be eligible to request a review, you need a paid Copilot license, or your organization must enable Copilot code review.

Copilot AI review requested due to automatic review settings May 21, 2026 16:35
vidushi8 previously approved these changes May 21, 2026
Contributor

@vidushi8 vidushi8 left a comment

LGTM

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 8 comments.

Comment thread scripts/primus/megatron-lm/primus_megatron-lm_benchmark_setup.sh Outdated
Comment thread scripts/primus/megatron-lm/primus_megatron-lm_benchmark_setup.sh Outdated
Comment thread benchmark/megatron_lm/README.md
Comment thread benchmark/megatron_lm/README.md
Comment thread benchmark/megatron_lm/README.md Outdated
Comment thread benchmark/megatron_lm/README.md
Comment thread benchmark/megatron_lm/README.md Outdated
Comment thread benchmark/pytorch_train/README.md Outdated
@vidushi8 vidushi8 requested a review from clairesonglee May 21, 2026 16:40
amd-fuyuajin previously approved these changes May 21, 2026
Contributor

@amd-fuyuajin amd-fuyuajin left a comment

Made Copilot suggested changes. Added multi-node training examples. Ready to merge.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@amd-fuyuajin amd-fuyuajin dismissed stale reviews from vidushi8 and themself via e5ad91f May 21, 2026 17:03
Copilot AI review requested due to automatic review settings May 21, 2026 17:03
Contributor

Copilot AI left a comment

Copilot was unable to review this pull request because the user who requested the review is ineligible. To be eligible to request a review, you need a paid Copilot license, or your organization must enable Copilot code review.

Copilot AI review requested due to automatic review settings May 21, 2026 17:18
Contributor

Copilot AI left a comment

Copilot was unable to review this pull request because the user who requested the review is ineligible. To be eligible to request a review, you need a paid Copilot license, or your organization must enable Copilot code review.

- The docker container hosts verified coomit `e16b27b` from [Primus repository](https://github.com/AMD-AGI/Primus/tree/e16b27bf6c1b2798f38848fc574fee60d9a9b902).
+ The docker container hosts verified commit `e16b27b` from [Primus repository](https://github.com/AMD-AGI/Primus/tree/e16b27bf6c1b2798f38848fc574fee60d9a9b902).
Contributor

@peterjunpark peterjunpark May 21, 2026

Suggested change
- The docker container hosts verified commit `e16b27b` from [Primus repository](https://github.com/AMD-AGI/Primus/tree/e16b27bf6c1b2798f38848fc574fee60d9a9b902).
+ The docker container hosts verified commit `43a6e0` from [Primus repository](https://github.com/AMD-AGI/Primus/tree/release/v26.3).

Should this be 43a6e006c419697208295c5523b99070e8198ad9? That's the head of the release branch https://github.com/AMD-AGI/Primus/commits/release/v26.3/
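A quick way to sanity-check that an abbreviated hash in the README corresponds to a given full commit hash is a prefix comparison. The sketch below is illustrative only: `matches_commit` is a hypothetical helper, and a prefix match does not prove the abbreviation is unique within the repository (Git itself would have to confirm that).

```python
import re

# Full hash quoted in the review comment as the head of release/v26.3.
RELEASE_HEAD = "43a6e006c419697208295c5523b99070e8198ad9"


def matches_commit(short: str, full: str) -> bool:
    """Return True if `short` is a hex prefix of the 40-character hash `full`."""
    short, full = short.lower(), full.lower()
    if not re.fullmatch(r"[0-9a-f]{40}", full):
        raise ValueError("full hash must be 40 hexadecimal characters")
    # Reject anything that is not a plausible abbreviation (4 to 40 hex digits).
    if not re.fullmatch(r"[0-9a-f]{4,40}", short):
        return False
    return full.startswith(short)


print(matches_commit("43a6e0", RELEASE_HEAD))   # True
print(matches_commit("e16b27b", RELEASE_HEAD))  # False
```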

- ## 2. Configurations in Yaml Script (`examples/megatron/configs/`)
+ ## 2. Configurations in yaml files (`examples/megatron/configs/`)

Primus defines training yaml for each model inside [examples/megatron/configs/](https://github.com/AMD-AGI/Primus/tree/e16b27bf6c1b2798f38848fc574fee60d9a9b902/examples/megatron/configs) repository. For example, use `examples/megatron/configs/llama3.1_8B-pretrain.yaml` for updating llama3.1_8B training parameters. Other yaml for the supported model can be found with `examples/megatron/configs/${MODEL_NAME}-pretrain.yaml` naming convention in this repository.
Contributor

@peterjunpark peterjunpark May 21, 2026

Suggested change
- Primus defines training yaml for each model inside [examples/megatron/configs/](https://github.com/AMD-AGI/Primus/tree/e16b27bf6c1b2798f38848fc574fee60d9a9b902/examples/megatron/configs) repository. For example, use `examples/megatron/configs/llama3.1_8B-pretrain.yaml` for updating llama3.1_8B training parameters. Other yaml for the supported model can be found with `examples/megatron/configs/${MODEL_NAME}-pretrain.yaml` naming convention in this repository.
+ Primus defines training yaml for each model inside [examples/megatron/configs/](https://github.com/AMD-AGI/Primus/tree/release/v26.3) repository. For example, use `examples/megatron/configs/llama3.1_8B-pretrain.yaml` for updating llama3.1_8B training parameters. Other yaml for the supported model can be found with `examples/megatron/configs/${MODEL_NAME}-pretrain.yaml` naming convention in this repository.
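
The `${MODEL_NAME}-pretrain.yaml` naming convention quoted in the README text can be expressed as a small path builder. This is a minimal sketch, assuming the model name is used verbatim as the file-name stem; `pretrain_config_path` is a hypothetical helper and not part of Primus or MAD.

```python
from pathlib import PurePosixPath

# Directory named in the README text under review.
CONFIG_DIR = PurePosixPath("examples/megatron/configs")


def pretrain_config_path(model_name: str) -> PurePosixPath:
    """Build the config path following the ${MODEL_NAME}-pretrain.yaml convention."""
    # Guard against empty names and path separators sneaking into the file name.
    if not model_name or "/" in model_name:
        raise ValueError(f"invalid model name: {model_name!r}")
    return CONFIG_DIR / f"{model_name}-pretrain.yaml"


print(pretrain_config_path("llama3.1_8B"))
# examples/megatron/configs/llama3.1_8B-pretrain.yaml
```

Whether a file with the resulting name actually exists for a given model depends on the Primus revision checked out in the container.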

@gargrahul gargrahul merged commit 1b4ec50 into ROCm:develop May 23, 2026

5 participants