-
Notifications
You must be signed in to change notification settings - Fork 14k
[mlir][xegpu] Refine layout assignment in XeGPU SIMT distribution. #142687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@adam-smnk sorry to bother you. :-P would you have some time to give me a general review on this or approve? :-) |
nullptr); | ||
terminator.getSuccessorRegions(operands, successors); | ||
|
||
for (mlir::RegionSuccessor &successor : successors) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this also handle the block arguments of forOp, which is also handled by BranchOpInterface? updateBranchOpInterface
maybe not needed in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great point. I checked this again and seems like you are right. BranchOpInterface is doing some redundant work here.
I removed updateBranchOpInterface
and also added some comment with example to clarify the logic.
one caveat is that we still need to a check for region ops and filter them out (done in updateOp).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments
General structure looks good, I'll have to leave details to @chencha3 but I'll try to make another pass later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General formatting comment: please add more line breaks (both for IR and checks with CHECK-SAME
). The longest line right now is almost 500 characters wide which really impacts readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point. I added more line breaks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, a small cleanup could be done here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found its not very severe in this case. so keep unchanged for now.
}) | ||
.Case<mlir::FunctionOpInterface>( | ||
[&](mlir::FunctionOpInterface funcOp) { | ||
r = updateFunctionOpInterface(builder, funcOp, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For completeness, are function calls covered too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currently no. we don't expect function calls inside a kernel in my understanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with some nit suggestions.
/// clang-format on | ||
/// In this example, at scf.yield, control-flow can transfer to successor | ||
/// regions. One is the ^bb0 (for loop body) and the other is the scf.for op | ||
/// itself (yield the results). So we update both the block arguments of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may need to double check. My understanding is that the other is not scf.for itself, it is the region right following the scf.for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after out discussion I updated the comment again. thanks for the suggestions.
@@ -12,6 +12,8 @@ | |||
#include "mlir/Dialect/GPU/IR/GPUDialect.h" | |||
#include "mlir/Dialect/GPU/Utils/DistributionUtils.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: some headers seem not needed. Could you help to clean them? make the header includes as less as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks. I cleaned up unused headers shown by vscode. let me know if there is a better way to find this out.
// | ||
//===----------------------------------------------------------------------===// | ||
|
||
#include "mlir/Analysis/DataFlow/ConstantPropagationAnalysis.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clean up un-related header includes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
Changes: