
[mlir][scf]Fix scf.forall inlining: add shared outputs #132197

Open · Prakhar-Dixit wants to merge 1 commit into main
Conversation

Prakhar-Dixit (Contributor) commented Mar 20, 2025

Fixes #108164

This patch fixes a crash in the scf-forall-to-for conversion pass by ensuring that the replacement vector used during inlining contains both the induction variables and the shared outputs. Previously, only the induction variables were passed, causing a mismatch with the expected number of block arguments in the forall op’s body. The fix concatenates the shared outputs (retrieved via getOutputs()) with the induction variables and then replaces the forall op with its shared outputs, preserving the intended semantics without introducing regressions.

Minimal Example IR:

module {
  func.func @parallel_insert_slice(%arg0: tensor<100xf32>) -> tensor<100xf32> {
    %c100 = arith.constant 100 : index
    %res = scf.forall (%i) in (%c100) shared_outs(%s = %arg0) -> (tensor<100xf32>) {
      %t = "test.foo"() : () -> tensor<100xf32>
      scf.forall.in_parallel {
        tensor.parallel_insert_slice %t into %s[%i] [100] [1] : tensor<100xf32> into tensor<100xf32>
      }
    }
    return %res : tensor<100xf32>
  }
}
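
With this patch, the conversion lowers the example above to roughly the following (a sketch mirroring the CHECK lines in the added test); the loop nest carries no iter_args, and the forall result is simply replaced by the shared output:

func.func @parallel_insert_slice(%arg0: tensor<100xf32>) -> tensor<100xf32> {
  %c100 = arith.constant 100 : index
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  scf.for %i = %c0 to %c100 step %c1 {
    %t = "test.foo"() : () -> tensor<100xf32>
    // The tensor.parallel_insert_slice from the in_parallel terminator is dropped.
  }
  return %arg0 : tensor<100xf32>
}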

llvmbot (Member) commented Mar 20, 2025

@llvm/pr-subscribers-mlir-scf

@llvm/pr-subscribers-mlir

Author: Prakhar Dixit (Prakhar-Dixit)


Full diff: https://github.com/llvm/llvm-project/pull/132197.diff

2 Files Affected:

  • (modified) mlir/lib/Dialect/SCF/Transforms/ForallToFor.cpp (+5-2)
  • (modified) mlir/test/Dialect/SCF/forall-to-for.mlir (+23)
diff --git a/mlir/lib/Dialect/SCF/Transforms/ForallToFor.cpp b/mlir/lib/Dialect/SCF/Transforms/ForallToFor.cpp
index a2f03f1e1056e..a1df366cef132 100644
--- a/mlir/lib/Dialect/SCF/Transforms/ForallToFor.cpp
+++ b/mlir/lib/Dialect/SCF/Transforms/ForallToFor.cpp
@@ -40,12 +40,15 @@ mlir::scf::forallToForLoop(RewriterBase &rewriter, scf::ForallOp forallOp,
   SmallVector<Value> ivs = llvm::map_to_vector(
       loopNest.loops, [](scf::ForOp loop) { return loop.getInductionVar(); });
 
+  SmallVector<Value> replacementVals = ivs;
+  for (Value shared : forallOp.getOutputs())
+    replacementVals.push_back(shared);
   Block *innermostBlock = loopNest.loops.back().getBody();
   rewriter.eraseOp(forallOp.getBody()->getTerminator());
   rewriter.inlineBlockBefore(forallOp.getBody(), innermostBlock,
                              innermostBlock->getTerminator()->getIterator(),
-                             ivs);
-  rewriter.eraseOp(forallOp);
+                             replacementVals);
+  rewriter.replaceOp(forallOp, forallOp.getOutputs());
 
   if (results) {
     llvm::move(loopNest.loops, std::back_inserter(*results));
diff --git a/mlir/test/Dialect/SCF/forall-to-for.mlir b/mlir/test/Dialect/SCF/forall-to-for.mlir
index e7d183fb9d2b5..17598a154fefd 100644
--- a/mlir/test/Dialect/SCF/forall-to-for.mlir
+++ b/mlir/test/Dialect/SCF/forall-to-for.mlir
@@ -55,3 +55,26 @@ func.func @nested(%ub1: index, %ub2: index, %ub3: index, %ub4: index) {
   }
   return
 }
+
+// -----
+
+  func.func @parallel_insert_slice(%arg0: tensor<100xf32>) -> tensor<100xf32> {
+    %c100 = arith.constant 100 : index
+    %res = scf.forall (%i) in (%c100) shared_outs(%s = %arg0) -> (tensor<100xf32>) {
+      %t = "test.foo"() : () -> tensor<100xf32>
+      scf.forall.in_parallel {
+        tensor.parallel_insert_slice %t into %s[%i] [100] [1] : tensor<100xf32> into tensor<100xf32>
+      }
+    }
+    return %res : tensor<100xf32>
+  }
+// CHECK-LABEL:   func.func @parallel_insert_slice(
+// CHECK-SAME:      %[[VAL_0:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: tensor<100xf32>) -> tensor<100xf32> {
+// CHECK:           %[[VAL_1:.*]] = arith.constant 100 : index
+// CHECK:           %[[VAL_2:.*]] = arith.constant 0 : index
+// CHECK:           %[[VAL_3:.*]] = arith.constant 1 : index
+// CHECK:           scf.for %[[VAL_4:.*]] = %[[VAL_2]] to %[[VAL_1]] step %[[VAL_3]] {
+// CHECK:             %[[VAL_5:.*]] = "test.foo"() : () -> tensor<100xf32>
+// CHECK:           }
+// CHECK:           return %[[VAL_0]] : tensor<100xf32>
+// CHECK:         }
\ No newline at end of file

@Prakhar-Dixit changed the title from "Fix scf.forall inlining: add shared outputs" to "[mlir][scf]Fix scf.forall inlining: add shared outputs" on Mar 20, 2025
Prakhar-Dixit (Contributor, Author)

Could you please review this?
I am unable to add reviewers. @CoTinker

// CHECK: %[[VAL_2:.*]] = arith.constant 0 : index
// CHECK: %[[VAL_3:.*]] = arith.constant 1 : index
// CHECK: scf.for %[[VAL_4:.*]] = %[[VAL_2]] to %[[VAL_1]] step %[[VAL_3]] {
// CHECK: %[[VAL_5:.*]] = "test.foo"() : () -> tensor<100xf32>
Member

Where did the parallel_insert_slice go? I think this pass is incorrect. It should have replaced the parallel_insert_slice with insert_slice.

// CHECK: scf.for %[[VAL_4:.*]] = %[[VAL_2]] to %[[VAL_1]] step %[[VAL_3]] {
// CHECK: %[[VAL_5:.*]] = "test.foo"() : () -> tensor<100xf32>
// CHECK: }
// CHECK: return %[[VAL_0]] : tensor<100xf32>
matthias-springer (Member), Mar 21, 2025

The result of the scf.for should have been used here. It looks like the generated loop nest does not even have a result/iter_args. The issue that you are fixing here was probably an undocumented limitation of this pass, not necessarily a bug: shared_outs are generally not supported, which made the implementation a bit easier.

But it would be nice to support shared_outs.

Member

Basically, instead of dropping the terminator of the scf.forall loop, you have to replace it with tensor.insert_slice and yield the result. Also, the loop nest that this pass is generating must have an iter_arg (and result); one per shared_out.
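
A rough sketch of what that suggested lowering could look like for the example above (hypothetical output, assuming one iter_arg per shared_out and the in_parallel terminator rewritten to tensor.insert_slice plus scf.yield):

func.func @parallel_insert_slice(%arg0: tensor<100xf32>) -> tensor<100xf32> {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %c100 = arith.constant 100 : index
  // One iter_arg per shared_out carries the tensor across iterations.
  %res = scf.for %i = %c0 to %c100 step %c1 iter_args(%iter = %arg0) -> (tensor<100xf32>) {
    %t = "test.foo"() : () -> tensor<100xf32>
    // The parallel_insert_slice becomes a plain insert_slice into the iter_arg ...
    %updated = tensor.insert_slice %t into %iter[%i] [100] [1] : tensor<100xf32> into tensor<100xf32>
    // ... whose result is yielded so the loop produces the final tensor.
    scf.yield %updated : tensor<100xf32>
  }
  return %res : tensor<100xf32>
}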
