Build Triton from source and bump Primus-Turbo by kyle-256 · Pull Request #694 · AMD-AGI/Primus

kyle-256 · 2026-04-28T05:37:44Z

Summary

Compile Triton from source (release/3.7.x @ 88b227e) inside the docker image
Wire a new TRITON_COMMIT build-arg through ci.yaml so both the main and JAX docker builds receive it
Bump PRIMUS_TURBO_COMMIT to ecb7e5c in ci.yaml and benchmark.yaml

Copilot

Pull request overview

Builds Triton from a pinned source commit during Docker image build and updates workflow-pinned Primus-Turbo commit references.

Changes:

Add TRITON_COMMIT build-arg and build/install Triton from triton-lang/triton (release/3.7.x @ pinned commit) in the Dockerfile.
Wire TRITON_COMMIT through .github/workflows/ci.yaml so both PyTorch and JAX image builds receive it.
Bump PRIMUS_TURBO_COMMIT in CI and benchmark workflows.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
.github/workflows/docker/Dockerfile	Adds Triton build-from-source step parameterized by `TRITON_COMMIT`.
.github/workflows/ci.yaml	Passes `TRITON_COMMIT` into Docker builds; bumps `PRIMUS_TURBO_COMMIT`.
.github/workflows/benchmark.yaml	Bumps `PRIMUS_TURBO_COMMIT` used by benchmark workflow.

Copilot · 2026-04-28T07:40:28Z

+  PRIMUS_TURBO_COMMIT: ecb7e5cad1f5f86e5719f4afbe5e644b702d0aa9
+  PRIMUS_TURBO_AITER_COMMIT: 857f4d15775a29af153a2c68a2f8e8a8d696c986


PR description mentions bumping PRIMUS_TURBO_COMMIT, but this change also updates PRIMUS_TURBO_AITER_COMMIT. Please either update the PR description to include the AITER bump (and why) or revert this env change if it’s unintended.

Copilot · 2026-04-28T07:40:28Z

-          submodules: "recursive"
          path: Primus-Turbo
          ref: ${{ env.PRIMUS_TURBO_COMMIT }}
+      - name: Init Primus-Turbo submodules (full depth)


The step name says “(full depth)”, but actions/checkout here uses the default shallow clone and git submodule update is not given any depth options. Either rename the step to avoid implying full history, or set fetch-depth: 0 (and, if needed, configure submodule depth explicitly) to match the intent.

Suggested change

-      - name: Init Primus-Turbo submodules (full depth)
+      - name: Init Primus-Turbo submodules

Copilot · 2026-04-30T01:03:37Z

+      - name: Init Primus-Turbo submodules (full clone)
+        working-directory: Primus-Turbo
+        run: |
+          rm -rf 3rdparty/composable_kernel .git/modules/3rdparty/composable_kernel
+          git submodule sync --recursive
+          git submodule update --init --recursive


This job now runs git submodule commands in the checked-out Primus-Turbo directory, but safe.directory is only configured later after moving the repo to /tmp. On runners where the workspace ownership differs (common on self-hosted/containerized runners), git submodule ... can fail with "dubious ownership". Configure safe.directory (or run the submodule init/update) after moving to /tmp/Primus-Turbo where safe.directory is already set.

Build Triton from triton-lang/triton release/3.7.x branch in the Docker image

Bump PRIMUS_TURBO_COMMIT to ecb7e5c in ci.yaml and benchmark.yaml, and pass TRITON_COMMIT=88b227e to the main and jax docker builds so the Dockerfile's `git checkout ${TRITON_COMMIT}` step has a value.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Primus-Turbo ecb7e5c pins AITER_COMMIT to 857f4d1 in setup.py and calls aiter._flash_attn_forward() with an `out=` kwarg only the new aiter accepts. Bumping the CI pin matches Turbo's expectation.
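A pin mismatch like this can be caught before CI runs with a small consistency check. The sketch below is illustrative only and is not part of this PR: `extract_pin` and `pins_agree` are hypothetical helpers, and the `AITER_COMMIT = "..."` line format assumed for setup.py is a guess, since the PR only says the pin lives there.

```python
import re
from typing import Optional


def extract_pin(text: str, name: str) -> Optional[str]:
    """Return the first hex commit assigned to `name` with `=` or `:`."""
    pattern = rf"\b{re.escape(name)}\b\s*[:=]\s*[\"']?([0-9a-f]{{7,40}})"
    match = re.search(pattern, text)
    return match.group(1) if match else None


def pins_agree(a: Optional[str], b: Optional[str]) -> bool:
    """True when one SHA is a prefix of the other (short vs. full form)."""
    return bool(a) and bool(b) and (a.startswith(b) or b.startswith(a))


# Illustrative inputs. The workflow line is taken from this PR's diff;
# the setup.py line format is an assumption.
workflow_text = "  PRIMUS_TURBO_AITER_COMMIT: 857f4d15775a29af153a2c68a2f8e8a8d696c986\n"
setup_py_text = 'AITER_COMMIT = "857f4d1"\n'

ci_pin = extract_pin(workflow_text, "PRIMUS_TURBO_AITER_COMMIT")
turbo_pin = extract_pin(setup_py_text, "AITER_COMMIT")
print(pins_agree(ci_pin, turbo_pin))
```

Comparing by prefix lets a short SHA in one file match the full 40-character SHA in the other.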

actions/checkout@v4 fetches submodules with --depth=1, which fails when Primus-Turbo's pinned composable_kernel SHA is not at the tip of CK's default branch. Drop submodules: recursive from the action and run `git submodule update --init --recursive` ourselves.

The self-hosted JAX runner caches Primus-Turbo across runs, and a prior --depth=1 submodule fetch leaves a shallow CK clone in .git/modules/3rdparty/composable_kernel/. The new Turbo pin (78ae3835) lives on a non-default branch, so the cached shallow clone cannot resolve it and `git submodule update` reports "Unable to find current revision". Remove the cached submodule and its modules-dir before re-init so git performs a full fetch.

Bump Primus-Turbo to current main (ef5b58e) and set PRIMUS_TURBO_ATTN_V3_ATOMIC_FP32=1 in run-unittest-torch to work around the gfx942 sbhd backward guard added in #275 that rejects bf16+sbhd when is_v3_atomic_fp32=False. Drop the override once the upstream loosen_atomic_fp16_constraint fix lands in Primus-Turbo main.

Stock torchtitan qwen3 Attention transposes q/k/v from BSHD to BHSD before calling self.inner_attention (PyTorch SDPA contract). When PrimusTubroConverter swaps inner_attention for TurboAttention / flash_attn_func, those callees expect q/k/v with logical shape [B, S, H, D] (format encoded as strides; see primus_turbo.pytorch.ops.attention.attention_utils._infer_qkv_format).

The BHSD-shape input mismatches the BSHD-shape contract: aiter misinterprets the seq/heads axes (computing attention along the wrong axis) and the registered fake / real op shapes diverge under torch.compile, surfacing as an inductor stride assertion on attention_aiter_forward_impl for qwen3 (test_qwen3_0_6B fails with "expected size 16==16, stride 128==524288 at dim=1").

Mirror the LLaMA3 / LLaMA4 / DeepSeek-V3 Primus overrides: keep q/k/v in BSHD (drop the BHSD transpose) and let TurboAttention handle GQA natively (no repeat_kv needed). Patch qwen3.model.model.Attention via the existing torchtitan.primus_turbo.turbo_attention setup hook.

Verified locally:
- tests/trainer/test_torchtitan_trainer.py 12/12 pass (incl. test_qwen3_0_6B / test_qwen3_1_7B / test_qwen3_32B)
- tests/trainer/test_megatron_trainer.py 17/17 pass
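The two stride values in the inductor assertion can be reproduced with plain arithmetic. The sketch below is not Primus or torchtitan code: the helper names are hypothetical, H=16 and D=128 are read off the error text, S=4096 is inferred from 524288 / 128 rather than stated in the PR, and B=1 is arbitrary.

```python
def contiguous_strides(shape):
    """Row-major strides, in elements, for a freshly allocated tensor."""
    strides, step = [], 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def transpose(shape, strides, a, b):
    """Swap two axes the way a view-only transpose does: no data is moved."""
    shape, strides = list(shape), list(strides)
    shape[a], shape[b] = shape[b], shape[a]
    strides[a], strides[b] = strides[b], strides[a]
    return tuple(shape), tuple(strides)


# Assumed sizes: H and D come from the error text, S is inferred, B is arbitrary.
B, S, H, D = 1, 4096, 16, 128

bshd_shape = (B, S, H, D)
bshd_strides = contiguous_strides(bshd_shape)

# Same memory viewed with the seq and heads axes swapped (BSHD -> BHSD view).
view_shape, view_strides = transpose(bshd_shape, bshd_strides, 1, 2)

# A tensor allocated directly in BHSD order.
bhsd_strides = contiguous_strides(view_shape)

print(view_shape, view_strides)
print(view_shape, bhsd_strides)
```

Under these assumptions both layouts report size 16 at dim 1, but the stride there is 128 for the transposed view and 524288 for the contiguous BHSD tensor, the same pair of numbers that appears in the assertion message.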

Copilot

Pull request overview

Copilot reviewed 5 out of 7 changed files in this pull request and generated 4 comments.

Copilot AI review requested due to automatic review settings April 28, 2026 05:37

kyle-256 requested review from Xiaoming-AMD, limou102 and wenxie-amd as code owners April 28, 2026 05:37

Copilot started reviewing on behalf of kyle-256 April 28, 2026 05:39

Copilot AI reviewed Apr 28, 2026

Comment thread .github/workflows/docker/Dockerfile

Comment thread .github/workflows/docker/Dockerfile

wenxie-amd previously approved these changes Apr 28, 2026

kyle-256 dismissed wenxie-amd’s stale review via 3086192 April 28, 2026 07:21

Copilot AI review requested due to automatic review settings April 28, 2026 07:35

Copilot started reviewing on behalf of kyle-256 April 28, 2026 07:37

Copilot AI reviewed Apr 28, 2026

Copilot AI review requested due to automatic review settings April 30, 2026 00:57

Copilot started reviewing on behalf of kyle-256 April 30, 2026 00:59

Copilot AI reviewed Apr 30, 2026

kyle-256 and others added 6 commits May 6, 2026 05:44

Add Triton compilation from source to Dockerfile

984085a

Build Triton from triton-lang/triton release/3.7.x branch in the Docker image

Sync PRIMUS_TURBO_AITER_COMMIT with Primus-Turbo ecb7e5c

49336c0

Primus-Turbo ecb7e5c pins AITER_COMMIT to 857f4d1 in setup.py and calls aiter._flash_attn_forward() with an `out=` kwarg only the new aiter accepts. Bumping the CI pin matches Turbo's expectation.

Copilot AI review requested due to automatic review settings May 6, 2026 05:46

kyle-256 force-pushed the dev/kyle_update_triton branch from 822a1d7 to 23744d2 May 6, 2026 05:46

Copilot started reviewing on behalf of kyle-256 May 6, 2026 05:48

Copilot AI reviewed May 6, 2026

Comment thread .github/workflows/docker/Dockerfile

Comment thread .github/workflows/docker/Dockerfile

kyle-256 force-pushed the dev/kyle_update_triton branch from 23744d2 to 85a2a24 May 6, 2026 11:28

Copilot AI review requested due to automatic review settings May 7, 2026 03:41

kyle-256 force-pushed the dev/kyle_update_triton branch from 85a2a24 to 4289676 May 7, 2026 03:41

Copilot started reviewing on behalf of kyle-256 May 7, 2026 03:43

Copilot AI reviewed May 7, 2026

Comment thread .github/workflows/ci.yaml

Comment thread .github/workflows/ci.yaml

Comment thread primus/backends/torchtitan/patches/turbo/attention_patches.py

Comment thread .github/workflows/benchmark.yaml

Merge branch 'main' into dev/kyle_update_triton

1e7a7d2

wenxie-amd approved these changes May 7, 2026

wenxie-amd merged commit 9a52f9a into main May 7, 2026
6 checks passed

cursor Bot mentioned this pull request May 29, 2026

[Primus weekly report] 2026-W22 #736

Draft

3 participants