-
Notifications
You must be signed in to change notification settings - Fork 14.6k
[mlir][vector] Support complete folding in single pass for vector.insert/vector.extract #142124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[mlir][vector] Support complete folding in single pass for vector.insert/vector.extract #142124
Conversation
…ert/vector.extract After successfully converting dynamic indices to static indices, continue folding instead of returning early, allowing subsequent fold operations to be executed.
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-mlir-tensor @llvm/pr-subscribers-mlir Author: Yang Bai (yangtetris) ChangesDescriptionThis patch improves the folding efficiency of MotivationSince the
If we use
But this is not the optimal result. Full diff: https://github.com/llvm/llvm-project/pull/142124.diff 1 Files Affected:
diff --git a/mlir/lib/Dialect/Vector/IR/VectorOps.cpp b/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
index 890a5e9e5c9b4..2e0c917b2139d 100644
--- a/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
+++ b/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
@@ -2062,6 +2062,7 @@ static Value extractInsertFoldConstantOp(OpType op, AdaptorType adaptor,
if (opChange) {
op.setStaticPosition(staticPosition);
op.getOperation()->setOperands(operands);
+ // Return the original result to indicate an in-place folding happened.
return op.getResult();
}
return {};
@@ -2148,8 +2149,8 @@ OpFoldResult ExtractOp::fold(FoldAdaptor adaptor) {
// Fold `arith.constant` indices into the `vector.extract` operation. Make
// sure that patterns requiring constant indices are added after this fold.
SmallVector<Value> operands = {getVector()};
- if (auto val = extractInsertFoldConstantOp(*this, adaptor, operands))
- return val;
+ auto inplaceFolded = extractInsertFoldConstantOp(*this, adaptor, operands);
+
if (auto res = foldPoisonIndexInsertExtractOp(
getContext(), adaptor.getStaticPosition(), kPoisonIndex))
return res;
@@ -2171,7 +2172,8 @@ OpFoldResult ExtractOp::fold(FoldAdaptor adaptor) {
return val;
if (auto val = foldScalarExtractFromFromElements(*this))
return val;
- return OpFoldResult();
+
+ return inplaceFolded;
}
namespace {
@@ -3150,8 +3152,8 @@ OpFoldResult vector::InsertOp::fold(FoldAdaptor adaptor) {
// Fold `arith.constant` indices into the `vector.insert` operation. Make
// sure that patterns requiring constant indices are added after this fold.
SmallVector<Value> operands = {getValueToStore(), getDest()};
- if (auto val = extractInsertFoldConstantOp(*this, adaptor, operands))
- return val;
+ auto inplaceFolded = extractInsertFoldConstantOp(*this, adaptor, operands);
+
if (auto res = foldPoisonIndexInsertExtractOp(
getContext(), adaptor.getStaticPosition(), kPoisonIndex))
return res;
@@ -3161,7 +3163,7 @@ OpFoldResult vector::InsertOp::fold(FoldAdaptor adaptor) {
return res;
}
- return {};
+ return inplaceFolded;
}
//===----------------------------------------------------------------------===//
|
The existing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please provide a test for this?
Thanks for fixing this problem! For testing, would it make sense to add a pass similar to |
I just added two tests in |
I happened to see an existing pass that implements what you said. Do you think it is a good choice to use TestConstantFold for folding tests? |
I was not aware of that pass. It looks like pretty focus on folding constants only. I'm also surprised that we don't have a pass to test the op folder in isolation. @joker-eph, @River707, @jpienaar, do you know? |
I don't see a focus on constant for this pass: it only has some special handling to cleanup constants after it's done as far as I can tell. Otherwise it is a single traversal applying the folder. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! LGTM.
I've confirmed that this does as expected by running
mlir-opt input.mlir
mlir-opt input.mlir -test-constant-fold
mlir-opt input.mlir -test-constant-fold -test-constant-fold
before and after.
The 'constant' in the pass and test file name are indeed a bit confusing, but probably not worth the churn of changing at this point.
Thanks!
To me, this raises the broader question:
If the goal is to verify that the fold happens in a single pass of the folder, that’s effectively testing an implementation detail or driver behaviour - and as far as I know, we don’t typically test that. Instead, our tests tend to focus on simple input vs. output IR comparison.
I’m not particularly in favor of using that. It works in this case, but it was designed for a slightly different purpose. Using it here would just introduce inconsistency in how we test folding, and nothing prevents someone from later refactoring it to run the folder multiple times, which would break the test's assumptions. In my view, we’re simply missing folding tests for llvm-project/mlir/test/Dialect/Vector/canonicalize.mlir Lines 3265 to 3289 in b4b86a7
Notably, there’s no test that shows This would also be easier to catch if:
At a minimum, I recommend moving the new tests to "canonicalize.mlir", so we at least maintain consistency in where we test folding behaviour. |
I agree that this introduces inconsistency about where we test the fold methods.
Can you please say what 'test diffs' are here? Is that running
Do you mean duplicate them, once in canonicalize.mlir and once in constant-fold.mlir? Having it only in canonicalize.mlir would not test this PR, because
Hmm I'm not sure. For a call to |
I meant IR diffs - that is, MLIR before vs after running a canonicalization or folding pass. Since this change doesn’t alter the final IR, the diffs should be empty (i.e., no changes to existing test output would be required). From what I understand, the new tests here illustrate a scenario where the change is beneficial (only one invocation of the folder is required), but the final IR is already correct, so the net IR diff is zero. (Please correct me if I'm mistaken.)
In my view, neither "canonicalize.mlir" nor "constant-fold.mlir" is appropriate for testing how many times the folder is invoked - and as far as I know, MLIR doesn’t currently test that sort of internal behavior. If we do want to verify “single-pass folding” explicitly, we should introduce dedicated infra for that - for example, a purpose-built test pass like the one Diego proposed earlier. That’s why I’m hesitant to use
No - I’d suggest having them only in canonicalize.mlir.
That’s true - but if the goal is to verify that the final IR is correct, then calling the folder multiple times isn’t a problem. If the goal is instead to confirm that the fold happens on the first attempt, then I agree that requires more precise infra. However, we don’t test that kind of behavior elsewhere today, so I’d argue it’s not necessary to block this PR on that. One possible compromise:
To clarify - my main concern is with reusing If verifying that the folder runs only once is considered essential, then I’d suggest renaming |
That all makes sense to me. I'm onboard with the proposals. Either (1) the proposed compromise (where a first PR adds the new tests in canonicalize.mlir) or (2) the refactoring, where first the |
Thank you for your valuable comment.
This is also my point. My project is an example - due to compilation time considerations, I hope to use
This makes sense to me. I hadn't considered these risks before.
I personally lean more towards the second option. But rather than refactoring |
That's fine with me, thanks for being flexible! On a related note, rather than creating a C++ pass (which would impact MLIR compilation time), why not try creating a TD Op instead? There are already Ops for applying canonicalisation patterns: You would probably have to introduce something new though, from transform.apply_patterns:
That's the opposite of what you need. @ftynse , is there a good TD mechanism to apply a pattern/folder exactly once? |
In my view, this issue and its fix state that canonicalization is not equivalent to folding when it comes to applying only folding patterns, which some of us (or at least me) were not aware of. I believe the solution should go in the direction of introducing a new That said, I don't think we should block this fix or request a major shift in the canonicalization/folding testing in MLIR within this PR. Adding a test to WDYAT? |
I find this framing a bit weird: the "how many times the folder is invoked" is a canonicalizer question, however the behavior of the folder outside of canonicalization is interesting. A simple walk + fold behavior is something that is worth it to the user.
What is the design goal for this pass? We can also ensure that the |
That's a good point! That should give us the behavior we want to test and keep things integrated with the rest of the canonicalization tests. |
It seems this approach requires some modifications to the GreedyPatternRewriteDriver. Through local testing, I found that while (!worklist.empty()) {
auto *op = worklist.pop();
if (succeeded(op->fold(foldResults))) {
// in-place fold
if (foldResults.empty())
addToWorklist(op);
}
} So |
If the expectation is that the folding of an operation happens in one shot and the result of a successful folding is the same operation, subsequent foldings on that operation should be no-op. Therefore, we shouldn't be adding that op back to the folding worklist. Could you try changing the |
I tried removing Take the following MLIR example:
In this case, the I’m wondering if it makes sense to add a new field like |
@yangtetris, thank you for all the investigation! Overall, the discussion in this PR has grown a bit beyond its original scope - figuring out how to ensure that we fold only once seems like a separate task, and probably deserves its own focused discussion. In situations like this, I think it's completely valid to create a GitHub issue and link to it from the relevant part of the code. You might consider filing a ticket along the lines of:
For this PR, I’d suggest following Mehdi’s recommendation:
This path would address both Mehdi’s original request (re: testing) and my later concern that To summarise:
How does it sound? Thanks again for all the discussion! |
@banach-space Thank you for the summary and your insights.
I agree that we should open a separate issue to specifically address the "fold once" concern.
That approach works well for me. Should I rename If there are no further suggestions from anyone in this discussion, I plan to commit the new changes within the next day or two. |
You can do it all in this PR |
@banach-space @dcaballe @joker-eph |
✅ With the latest revision this PR passed the C/C++ code formatter. |
Can you run |
Fixed, thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Many thanks for seeing this through and for bearing with us, this is a much appreciated contribution.
@yangtetris Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues. How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again. If you don't get any reports, no action is required from you. Your changes are working as expected, well done! |
Description
This patch improves the folding efficiency of
vector.insert
andvector.extract
operations by not returning early after successfully converting dynamic indices to static indices.This PR also renames the test pass
TestConstantFold
toTestSingleFold
and adds comprehensive documentation explaining the single-pass folding behavior.Motivation
Since the
OpBuilder::createOrFold
function only callsfold
once, the currentfold
methods ofvector.insert
andvector.extract
may leave the op in a state that can be folded further. For example, consider the following un-folded IR:If we use
createOrFold
to create thevector.extract
op, then the result will be:But this is not the optimal result.
createOrFold
should have returned%e1
.The reason is that the execution of fold returns immediately after
extractInsertFoldConstantOp
, causing subsequent folding logics to be skipped.