Add validation for separate GEMM and all-scatter operations in example 20 #234
Merged
mawad-amd merged 5 commits into muhosama/all-scatter-gemm-separatev2 on Oct 13, 2025
…e 20 Co-authored-by: neoblizz <9790745+neoblizz@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add validation for example 20 operations" to "Add validation for separate GEMM and all-scatter operations in example 20" on Oct 13, 2025
mawad-amd reviewed on Oct 13, 2025
…per-rank columns Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
mawad-amd approved these changes on Oct 13, 2025
Merged commit c18aded into muhosama/all-scatter-gemm-separatev2
3 checks passed
Problem
Example 20 (20_gemm_all_scatter_independent) performs GEMM and all-scatter as separate, independent operations, but the validation only checked the GEMM result and did not properly validate the all-scatter communication. Additionally, the communication operation could not use tensor dimensions different from those of the GEMM operation.
Solution
This PR adds comprehensive validation for both operations and support for separate tensor dimensions.
New validation function
Added validate_all_scatter() in examples/common/validation.py to validate all-scatter communication patterns:
- the output has shape (M, N × world_size)
- each rank's data occupies columns rank × N to (rank + 1) × N
Updated example 20
New command-line arguments:
- --m_comm: Number of rows for communication tensor (defaults to m)
- --n_comm: Total number of columns for communication tensor (defaults to n)
Separate validation:
- GEMM: A @ B == C
- success_gemm and success_comm fields for detailed reporting
Example usage:
Implementation details
The validation correctly handles the all-scatter pattern where:
The n_comm argument represents the total number of columns (consistent with n semantics for GEMM), and internally n_comm_local = n_comm // world_size is computed for per-rank columns. The all-scatter kernel takes n_comm_local and produces an m_comm × n_comm tensor that is replicated across all ranks.
The validate_all_scatter() function is reusable and can be adopted by other examples that perform all-scatter operations.
Testing
- ruff check and ruff format
Fixes #233
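The column layout described under Implementation details can be illustrated with a minimal sketch. This is not the code in examples/common/validation.py: the helper names resolve_comm_dims and check_all_scatter are hypothetical, plain Python lists stand in for device tensors, and the tolerance-based comparison is an assumption.

```python
def resolve_comm_dims(m, n, world_size, m_comm=None, n_comm=None):
    """Hypothetical helper: communication dims fall back to the GEMM dims."""
    m_comm = m if m_comm is None else m_comm
    n_comm = n if n_comm is None else n_comm
    # Per-rank column count, as described in the PR text.
    n_comm_local = n_comm // world_size
    return m_comm, n_comm, n_comm_local


def check_all_scatter(local_blocks, gathered, atol=1e-6):
    """Hypothetical check of one rank's all-scatter output.

    local_blocks[r] is rank r's (M x N) input as a list of rows;
    gathered is the (M x N*world_size) output held by a single rank.
    """
    world_size = len(local_blocks)
    m = len(local_blocks[0])
    n = len(local_blocks[0][0])

    # Shape check: M rows, N * world_size columns.
    if len(gathered) != m:
        return False
    if any(len(row) != n * world_size for row in gathered):
        return False

    # Rank r's block must sit in columns [r * N, (r + 1) * N).
    for rank, block in enumerate(local_blocks):
        lo = rank * n
        for i in range(m):
            for j in range(n):
                if abs(gathered[i][lo + j] - block[i][j]) > atol:
                    return False
    return True


if __name__ == "__main__":
    world_size = 4
    m_comm, n_comm, n_local = resolve_comm_dims(4, 8, world_size)
    # Assumption for the demo: rank r fills its block with the value r.
    blocks = [[[float(r)] * n_local for _ in range(m_comm)]
              for r in range(world_size)]
    gathered = [[v for r in range(world_size) for v in blocks[r][i]]
                for i in range(m_comm)]
    print(check_all_scatter(blocks, gathered))
```

Because the output is replicated across all ranks, the same check would be run on every rank's copy. Note that the floor division silently drops columns when n_comm is not a multiple of world_size; whether example 20 guards against that is not stated here.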