Skip to content

[mlir][xegpu] Refine layout assignment in XeGPU SIMT distribution. #142687

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 53 commits into
base: main
Choose a base branch
from

Conversation

charithaintc
Copy link
Contributor

@charithaintc charithaintc commented Jun 3, 2025

Changes:

  • Decouple layout propagation from subgroup distribution and move it to an independent pass.
  • Refine layout assignment to handle control-flow ops correctly (scf.for).
  • Refine test cases.

@charithaintc charithaintc marked this pull request as ready for review June 5, 2025 21:15
@charithaintc
Copy link
Contributor Author

@adam-smnk sorry to bother you. :-P would you have some time to give me a general review on this or approve? :-)

nullptr);
terminator.getSuccessorRegions(operands, successors);

for (mlir::RegionSuccessor &successor : successors) {
Copy link
Contributor

@chencha3 chencha3 Jun 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this also handle the block arguments of forOp, which is also handled by BranchOpInterface? updateBranchOpInterface maybe not needed in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great point. I checked this again and seems like you are right. BranchOpInterface is doing some redundant work here.

I removed updateBranchOpInterface and also added some comment with example to clarify the logic.

one caveat is that we still need to a check for region ops and filter them out (done in updateOp).

Copy link
Contributor

@adam-smnk adam-smnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments
General structure looks good, I'll have to leave details to @chencha3 but I'll try to make another pass later

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General formatting comment: please add more line breaks (both for IR and checks with CHECK-SAME). The longest line right now is almost 500 characters wide which really impacts readability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. I added more line breaks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, a small cleanup could be done here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found its not very severe in this case. so keep unchanged for now.

})
.Case<mlir::FunctionOpInterface>(
[&](mlir::FunctionOpInterface funcOp) {
r = updateFunctionOpInterface(builder, funcOp,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For completeness, are function calls covered too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently no. we don't expect function calls inside a kernel in my understanding.

Copy link
Contributor

@chencha3 chencha3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some nit suggestions.

/// clang-format on
/// In this example, at scf.yield, control-flow can transfer to successor
/// regions. One is the ^bb0 (for loop body) and the other is the scf.for op
/// itself (yield the results). So we update both the block arguments of the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may need to double check. My understanding is that the other is not scf.for itself, it is the region right following the scf.for.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after out discussion I updated the comment again. thanks for the suggestions.

@@ -12,6 +12,8 @@
#include "mlir/Dialect/GPU/IR/GPUDialect.h"
#include "mlir/Dialect/GPU/Utils/DistributionUtils.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: some headers seem not needed. Could you help to clean them? make the header includes as less as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks. I cleaned up unused headers shown by vscode. let me know if there is a better way to find this out.

//
//===----------------------------------------------------------------------===//

#include "mlir/Analysis/DataFlow/ConstantPropagationAnalysis.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clean up un-related header includes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants