Skip to content

[mlir][vector] Support complete folding in single pass for vector.insert/vector.extract #142124

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

yangtetris
Copy link
Contributor

@yangtetris yangtetris commented May 30, 2025

Description

This patch improves the folding efficiency of vector.insert and vector.extract operations by not returning early after successfully converting dynamic indices to static indices.

This PR also renames the test pass TestConstantFold to TestSingleFold and adds comprehensive documentation explaining the single-pass folding behavior.

Motivation

Since the OpBuilder::createOrFold function only calls fold once, the current fold methods of vector.insert and vector.extract may leave the op in a state that can be folded further. For example, consider the following un-folded IR:

%v1 = vector.insert %e1, %v0 [0] : f32 into vector<128xf32>
%c0 = arith.constant 0 : index
%e2 = vector.extract %v1[%c0] : f32 from vector<128xf32>

If we use createOrFold to create the vector.extract op, then the result will be:

%v1 = vector.insert %e1, %v0 [127] : f32 into vector<128xf32>
%e2 = vector.extract %v1[0] : f32 from vector<128xf32>

But this is not the optimal result. createOrFold should have returned %e1.
The reason is that the execution of fold returns immediately after extractInsertFoldConstantOp, causing subsequent folding logics to be skipped.

…ert/vector.extract

After successfully converting dynamic indices to static indices, continue folding
instead of returning early, allowing subsequent fold operations to be executed.
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented May 30, 2025

@llvm/pr-subscribers-mlir-tensor
@llvm/pr-subscribers-mlir-core
@llvm/pr-subscribers-mlir-tosa
@llvm/pr-subscribers-mlir-vector

@llvm/pr-subscribers-mlir

Author: Yang Bai (yangtetris)

Changes

Description

This patch improves the folding efficiency of vector.insert and vector.extract operations by not returning early after successfully converting dynamic indices to static indices.

Motivation

Since the OpBuilder::createOrFold function only calls fold once, the current fold methods of vector.insert and vector.extract may leave the op in a state that can be folded further. For example, consider the following un-folded IR:

%v1 = vector.insert %e1, %v0 [0] : f32 into vector&lt;128xf32&gt;
%c0 = arith.constant 0 : index
%e2 = vector.extract %v1[%c0] : f32 from vector&lt;128xf32&gt;

If we use createOrFold to create the vector.extract op, then the result will be:

%v1 = vector.insert %e1, %v0 [127] : f32 into vector&lt;128xf32&gt;
%e2 = vector.extract %v1[0] : f32 from vector&lt;128xf32&gt;

But this is not the optimal result. createOrFold should have returned %e1.
The reason is that the execution of fold returns immediately after extractInsertFoldConstantOp, causing subsequent folding logics to be skipped.


Full diff: https://github.com/llvm/llvm-project/pull/142124.diff

1 Files Affected:

  • (modified) mlir/lib/Dialect/Vector/IR/VectorOps.cpp (+8-6)
diff --git a/mlir/lib/Dialect/Vector/IR/VectorOps.cpp b/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
index 890a5e9e5c9b4..2e0c917b2139d 100644
--- a/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
+++ b/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
@@ -2062,6 +2062,7 @@ static Value extractInsertFoldConstantOp(OpType op, AdaptorType adaptor,
   if (opChange) {
     op.setStaticPosition(staticPosition);
     op.getOperation()->setOperands(operands);
+    // Return the original result to indicate an in-place folding happened.
     return op.getResult();
   }
   return {};
@@ -2148,8 +2149,8 @@ OpFoldResult ExtractOp::fold(FoldAdaptor adaptor) {
   // Fold `arith.constant` indices into the `vector.extract` operation. Make
   // sure that patterns requiring constant indices are added after this fold.
   SmallVector<Value> operands = {getVector()};
-  if (auto val = extractInsertFoldConstantOp(*this, adaptor, operands))
-    return val;
+  auto inplaceFolded = extractInsertFoldConstantOp(*this, adaptor, operands);
+
   if (auto res = foldPoisonIndexInsertExtractOp(
           getContext(), adaptor.getStaticPosition(), kPoisonIndex))
     return res;
@@ -2171,7 +2172,8 @@ OpFoldResult ExtractOp::fold(FoldAdaptor adaptor) {
     return val;
   if (auto val = foldScalarExtractFromFromElements(*this))
     return val;
-  return OpFoldResult();
+
+  return inplaceFolded;
 }
 
 namespace {
@@ -3150,8 +3152,8 @@ OpFoldResult vector::InsertOp::fold(FoldAdaptor adaptor) {
   // Fold `arith.constant` indices into the `vector.insert` operation. Make
   // sure that patterns requiring constant indices are added after this fold.
   SmallVector<Value> operands = {getValueToStore(), getDest()};
-  if (auto val = extractInsertFoldConstantOp(*this, adaptor, operands))
-    return val;
+  auto inplaceFolded = extractInsertFoldConstantOp(*this, adaptor, operands);
+
   if (auto res = foldPoisonIndexInsertExtractOp(
           getContext(), adaptor.getStaticPosition(), kPoisonIndex))
     return res;
@@ -3161,7 +3163,7 @@ OpFoldResult vector::InsertOp::fold(FoldAdaptor adaptor) {
     return res;
   }
 
-  return {};
+  return inplaceFolded;
 }
 
 //===----------------------------------------------------------------------===//

@yangtetris
Copy link
Contributor Author

The existing Dialect/Vector/canonicalize.mlir file doesn't seem to be a good place to add test for this change, as canonicalization performs repeated folding, making it impossible to verify whether the optimization occurs in a single pass. Any suggestions on where/how to test this would be appreciated. Thanks!

Copy link
Collaborator

@joker-eph joker-eph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please provide a test for this?

@dcaballe
Copy link
Contributor

Thanks for fixing this problem!

For testing, would it make sense to add a pass similar to --canonicalize that just invokes tryToFold once for every op? That would be very helpful to separate folding testing from canonicalization testing.

@yangtetris
Copy link
Contributor Author

Can you please provide a test for this?

I just added two tests in test/Dialect/Vector/constant-fold.mlir. Please take another look.

@yangtetris
Copy link
Contributor Author

For testing, would it make sense to add a pass similar to --canonicalize that just invokes tryToFold once for every op? That would be very helpful to separate folding testing from canonicalization testing.

I happened to see an existing pass that implements what you said. Do you think it is a good choice to use TestConstantFold for folding tests?

@dcaballe
Copy link
Contributor

dcaballe commented Jun 3, 2025

I was not aware of that pass. It looks like pretty focus on folding constants only. I'm also surprised that we don't have a pass to test the op folder in isolation. @joker-eph, @River707, @jpienaar, do you know?

@joker-eph
Copy link
Collaborator

I don't see a focus on constant for this pass: it only has some special handling to cleanup constants after it's done as far as I can tell. Otherwise it is a single traversal applying the folder.

Copy link
Contributor

@dcaballe dcaballe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, LGTM!

@dcaballe dcaballe requested a review from newling June 5, 2025 17:38
Copy link
Contributor

@newling newling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! LGTM.

I've confirmed that this does as expected by running

mlir-opt input.mlir
mlir-opt input.mlir -test-constant-fold
mlir-opt input.mlir -test-constant-fold -test-constant-fold

before and after.

The 'constant' in the pass and test file name are indeed a bit confusing, but probably not worth the churn of changing at this point.

@banach-space
Copy link
Contributor

Thanks!

The existing Dialect/Vector/canonicalize.mlir file doesn't seem to be a good place to add test for this change, as canonicalization performs repeated folding, making it impossible to verify whether the optimization occurs in a single pass.

To me, this raises the broader question:

  • "How do we test these patterns?".

If the goal is to verify that the fold happens in a single pass of the folder, that’s effectively testing an implementation detail or driver behaviour - and as far as I know, we don’t typically test that. Instead, our tests tend to focus on simple input vs. output IR comparison.

I happened to see an existing pass that implements what you said. Do you think it is a good choice to use TestConstantFold for folding tests?

I’m not particularly in favor of using that. It works in this case, but it was designed for a slightly different purpose. Using it here would just introduce inconsistency in how we test folding, and nothing prevents someone from later refactoring it to run the folder multiple times, which would break the test's assumptions.

In my view, we’re simply missing folding tests for vector.insert / vector.extract in "canonicalize.mlir". This becomes clear when you look at the two cases here:

// -----
// CHECK-LABEL: @fold_extract_constant_indices
// CHECK-SAME: %[[ARG:.*]]: vector<32x1xi32>) -> i32 {
// CHECK: %[[RES:.*]] = vector.extract %[[ARG]][0, 0] : i32 from vector<32x1xi32>
// CHECK: return %[[RES]] : i32
func.func @fold_extract_constant_indices(%arg : vector<32x1xi32>) -> i32 {
%0 = arith.constant 0 : index
%1 = vector.extract %arg[%0, %0] : i32 from vector<32x1xi32>
return %1 : i32
}
// -----
// CHECK-LABEL: @fold_insert_constant_indices
// CHECK-SAME: %[[ARG:.*]]: vector<4x1xi32>) -> vector<4x1xi32> {
// CHECK: %[[VAL:.*]] = arith.constant 1 : i32
// CHECK: %[[RES:.*]] = vector.insert %[[VAL]], %[[ARG]] [0, 0] : i32 into vector<4x1xi32>
// CHECK: return %[[RES]] : vector<4x1xi32>
func.func @fold_insert_constant_indices(%arg : vector<4x1xi32>) -> vector<4x1xi32> {
%0 = arith.constant 0 : index
%1 = arith.constant 1 : i32
%res = vector.insert %1, %arg[%0, %0] : i32 into vector<4x1xi32>
return %res : vector<4x1xi32>
}

Notably, there’s no test that shows vector.insert / vector.extract being folded away entirely - like the examples you're adding here (unless I missed sth). Once such tests exist, a change like yours would ideally be validated by the absence of test diffs, which is more robust.

This would also be easier to catch if:

  • we clustered all folding tests for vector.insert and vector.extract together, and/or
  • moved them to a dedicated file.

At a minimum, I recommend moving the new tests to "canonicalize.mlir", so we at least maintain consistency in where we test folding behaviour.

@newling
Copy link
Contributor

newling commented Jun 9, 2025

I agree that this introduces inconsistency about where we test the fold methods.

Once such tests exist, a change like yours would ideally be validated by the absence of test diffs, which is more robust.

Can you please say what 'test diffs' are here? Is that running --canonicalize --debug and checking how many times the folder is called?

At a minimum, I recommend moving the new tests to "canonicalize.mlir", so we at least maintain consistency in where we test folding behaviour.

Do you mean duplicate them, once in canonicalize.mlir and once in constant-fold.mlir? Having it only in canonicalize.mlir would not test this PR, because --canonicalize will call the op's folder multiple times.

If the goal is to verify that the fold happens in a single pass of the folder, that’s effectively testing an implementation detail or driver behaviour

Hmm I'm not sure. For a call to createOrFold in a pass that walk the IR just once, the difference is observable, so not something I'd call an implementation detail.

@banach-space
Copy link
Contributor

Can you please say what 'test diffs' are here?

I meant IR diffs - that is, MLIR before vs after running a canonicalization or folding pass. Since this change doesn’t alter the final IR, the diffs should be empty (i.e., no changes to existing test output would be required).

From what I understand, the new tests here illustrate a scenario where the change is beneficial (only one invocation of the folder is required), but the final IR is already correct, so the net IR diff is zero. (Please correct me if I'm mistaken.)

Is that running --canonicalize --debug and checking how many times the folder is called?

In my view, neither "canonicalize.mlir" nor "constant-fold.mlir" is appropriate for testing how many times the folder is invoked - and as far as I know, MLIR doesn’t currently test that sort of internal behavior.

If we do want to verify “single-pass folding” explicitly, we should introduce dedicated infra for that - for example, a purpose-built test pass like the one Diego proposed earlier.

That’s why I’m hesitant to use -test-constant-fold for this. It happens to run the folder once today, but that’s not its design goal, and that behaviour could change at any time.

Do you mean duplicate them, once in canonicalize.mlir and once in constant-fold.mlir?

No - I’d suggest having them only in canonicalize.mlir.

Having it only in canonicalize.mlir would not test this PR, because --canonicalize will call the op's folder multiple times.

That’s true - but if the goal is to verify that the final IR is correct, then calling the folder multiple times isn’t a problem.

If the goal is instead to confirm that the fold happens on the first attempt, then I agree that requires more precise infra. However, we don’t test that kind of behavior elsewhere today, so I’d argue it’s not necessary to block this PR on that.

One possible compromise:

  • Extract the tests into a separate PR (they should pass even without this change)
  • In this PR, you could point to those tests and explain that the new logic ensures the same result is reached without requiring multiple fold iterations.

To clarify - my main concern is with reusing TestConstantFold for something it wasn’t designed for.

If verifying that the folder runs only once is considered essential, then I’d suggest renaming TestConstantFold and adding some documentation to clarify its (updated) intended purpose. That way, future contributors will better understand what the pass is meant to test and why it behaves the way it does.

@newling
Copy link
Contributor

newling commented Jun 9, 2025

That all makes sense to me. I'm onboard with the proposals. Either (1) the proposed compromise (where a first PR adds the new tests in canonicalize.mlir) or (2) the refactoring, where first the TestConstantFold pass gets renamed and documented.

@yangtetris
Copy link
Contributor Author

Thank you for your valuable comment.

For a call to createOrFold in a pass that walk the IR just once, the difference is observable, so not something I'd call an implementation detail.

This is also my point. My project is an example - due to compilation time considerations, I hope to use createOrFold to eliminate useless vector ops early, rather than using canonicalization or similar passes later. So this change will affect my resulting IR.

Using it here would just introduce inconsistency in how we test folding, and nothing prevents someone from later refactoring it to run the folder multiple times, which would break the test's assumptions.

This makes sense to me. I hadn't considered these risks before.

Either (1) the proposed compromise (where a first PR adds the new tests in canonicalize.mlir) or (2) the refactoring, where first the TestConstantFold pass gets renamed and documented.

I personally lean more towards the second option. But rather than refactoring TestConstantFold, I would prefer to use a new pass whose functionality only includes calling the folder once. The reason is that TestConstantFold includes the functionality of deleting unused ops, which might confuse the role of the folder. Moreover, this functionality is already depended upon by many tests, so I cannot directly remove it. @banach-space @newling @dcaballe Is it ok for you?

@banach-space
Copy link
Contributor

I personally lean more towards the second option.

That's fine with me, thanks for being flexible!

On a related note, rather than creating a C++ pass (which would impact MLIR compilation time), why not try creating a TD Op instead? There are already Ops for applying canonicalisation patterns:

You would probably have to introduce something new though, from transform.apply_patterns:

greedily applies the specified patterns to the body of the targeted op until a fixpoint was reached.

That's the opposite of what you need. @ftynse , is there a good TD mechanism to apply a pattern/folder exactly once?

@dcaballe
Copy link
Contributor

In my view, this issue and its fix state that canonicalization is not equivalent to folding when it comes to applying only folding patterns, which some of us (or at least me) were not aware of. I believe the solution should go in the direction of introducing a new -fold pass, similar to the -canonicalize one, and ensuring that we test both approaches as needed, MLIR-wide.

That said, I don't think we should block this fix or request a major shift in the canonicalization/folding testing in MLIR within this PR. Adding a test to canonicalize.mlir or using the constant fold approach should be enough to land this fix. However, we should file a bug to revisit canonicalization/folding testing across MLIR and reference this PR. This is something that may need some broader discussion.

WDYAT?

@joker-eph
Copy link
Collaborator

joker-eph commented Jun 10, 2025

is appropriate for testing how many times the folder is invoked - and as far as I know, MLIR doesn’t currently test that sort of internal behavior.

I find this framing a bit weird: the "how many times the folder is invoked" is a canonicalizer question, however the behavior of the folder outside of canonicalization is interesting. A simple walk + fold behavior is something that is worth it to the user.

That’s why I’m hesitant to use -test-constant-fold for this. It happens to run the folder once today, but that’s not its design goal, and that behaviour could change at any time.

What is the design goal for this pass?
As far as I know this is only "Test operation constant folding", which we can easily refine to define whatever we want right now. Including option for walking in various orders.
I don't see the need for a new pass right now, this pass seems appropriate to me, just need to be extended / documenting the behavior we expect to test.

We can also ensure that the -max-iterations option from the canonicalization pass is respect for folding and use this for the test. This is how we test some convergence properties of the greedy driver already.

@dcaballe
Copy link
Contributor

We can also ensure that the -max-iterations option from the canonicalization pass is respect for folding and use this for the test.

That's a good point! That should give us the behavior we want to test and keep things integrated with the rest of the canonicalization tests.

@yangtetris
Copy link
Contributor Author

We can also ensure that the -max-iterations option from the canonicalization pass is respect for folding and use this for the test. This is how we test some convergence properties of the greedy driver already.

It seems this approach requires some modifications to the GreedyPatternRewriteDriver. Through local testing, I found that fold is invoked multiple times even when "-max-iterations" is set to 1. The current implementation contains code similar to:

while (!worklist.empty()) {
    auto *op = worklist.pop();
    if (succeeded(op->fold(foldResults))) {
        // in-place fold
        if (foldResults.empty())
            addToWorklist(op);
    }
}

So vector.extract/insert ops could be processed multiple times within a single iteration.

@dcaballe
Copy link
Contributor

If the expectation is that the folding of an operation happens in one shot and the result of a successful folding is the same operation, subsequent foldings on that operation should be no-op. Therefore, we shouldn't be adding that op back to the folding worklist. Could you try changing the GreedyPatternRewriteDriver code accordingly and see if testing passes? That should probably be in a different PR.

@yangtetris
Copy link
Contributor Author

yangtetris commented Jun 16, 2025

Could you try changing the GreedyPatternRewriteDriver code accordingly and see if testing passes? That should probably be in a different PR.

I tried removing addToWorklist(op);. It works for the new tests introduced in this PR. However, simply deleting it breaks an existing pass, SimplifyAffineStructures, which depends on MultiOpPatternRewriteDriver(a derived class of GreedyPatternRewriteDriver without the outermost loop).

Take the following MLIR example:

func.func @test_gaussian_elimination_empty_set0() {
  affine.for %arg0 = 1 to 10 {
    affine.for %arg1 = 1 to 100 {
      // CHECK-NOT: affine.if
      affine.if affine_set<(d0, d1) : (2 == 0)>(%arg0, %arg1) {
        func.call @external() : () -> ()
      }
    }
  }
  return
}

In this case, the affine.if op needs to be processed again after being folded.

I’m wondering if it makes sense to add a new field like bool recursiveProcessModifiedOp to GreedyRewriteConfig to handle this scenario?

@banach-space
Copy link
Contributor

@yangtetris, thank you for all the investigation!

Overall, the discussion in this PR has grown a bit beyond its original scope - figuring out how to ensure that we fold only once seems like a separate task, and probably deserves its own focused discussion.

In situations like this, I think it's completely valid to create a GitHub issue and link to it from the relevant part of the code. You might consider filing a ticket along the lines of:

  • "[mlir] How to force the canonicalizer to only fold once?"

For this PR, I’d suggest following Mehdi’s recommendation:

I don't see the need for a new pass right now, this pass seems appropriate to me, just need to be extended / documenting the behavior we expect to test.

This path would address both Mehdi’s original request (re: testing) and my later concern that TestConstantFold was being misused. I'd suggest renaming it to something like TestSingleFoldAndEraseDeadConstants. If that’s too much of a mouthful, maybe just TestSingleFold, and then clarify in the .cpp file that dead constants are also removed.

To summarise:

  • Rename and document TestConstantFold.
  • Move tests for this specific change to a dedicated file,. e.g. single-fold.mlir (please add comments to make sure the intent is clear - this will be the first such test file).
  • [nice-to-have] Create a GitHub issue to continue the discussion on finding better ways to "fold only once".

How does it sound? Thanks again for all the discussion!

@yangtetris
Copy link
Contributor Author

yangtetris commented Jun 17, 2025

@banach-space Thank you for the summary and your insights.

Overall, the discussion in this PR has grown a bit beyond its original scope - figuring out how to ensure that we fold only once seems like a separate task, and probably deserves its own focused discussion.

I agree that we should open a separate issue to specifically address the "fold once" concern.

For this PR, I’d suggest following Mehdi’s recommendation:

I don't see the need for a new pass right now, this pass seems appropriate to me, just need to be extended / documenting the behavior we expect to test.

That approach works well for me. Should I rename TestConstantFold within this PR or create a separate PR for that? I’m not entirely sure about the usual LLVM practice in this regard.

If there are no further suggestions from anyone in this discussion, I plan to commit the new changes within the next day or two.

@joker-eph
Copy link
Collaborator

You can do it all in this PR

@yangtetris
Copy link
Contributor Author

@banach-space @dcaballe @joker-eph
I‘ve renamed the TestConstantFold Pass and moved the tests to a dedicated file according to our discussion. Please help check if this change meets your expectations, thank you :)

Copy link

github-actions bot commented Jun 18, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@joker-eph
Copy link
Collaborator

Can you run git clang-format on this?

@yangtetris
Copy link
Contributor Author

Can you run git clang-format on this?

Fixed, thanks.

Copy link
Contributor

@banach-space banach-space left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Many thanks for seeing this through and for bearing with us, this is a much appreciated contribution.

@dcaballe dcaballe merged commit fe3933d into llvm:main Jun 18, 2025
7 checks passed
Copy link

@yangtetris Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

@yangtetris yangtetris deleted the support_complete_fold_for_vector_insert_extract branch June 23, 2025 09:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants