Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[flang][OpenMP] Hoist reduction info from nested loop ops to parent teams ops #132003

Merged
merged 3 commits into from
Mar 21, 2025

Conversation

ergawy
Copy link
Member

@ergawy ergawy commented Mar 19, 2025

Fixes a bug in reductions when converting teams loop constructs with reduction clauses.

According to the spec (v5.2, p340, 36):

The effect of the reduction clause is as if it is applied to all leaf
constructs that permit the clause, except for the following constructs:
* ....
* The teams construct, when combined with the loop construct.

Therefore, for a combined directive similar to: !$omp teams loop reduction(...), the earlier stages of the compiler assign the reduction clauses only to the loop leaf and not to the teams leaf.

On the other hand, if we have a combined construct similar to: !$omp teams distribute parallel do, the reduction clauses are assigned both to the teams and the do leaves. We need to match this behavior when we convert teams op with a nested loop op since the target set of constructs/ops will be incorrect without moving the reductions up to the teams op as well.

This PR introduces a pattern that does exactly this. Given the following input:

omp.teams {
  omp.loop reduction(@red_sym %red_op -> %red_arg : !fir.ref<i32>) {
    omp.loop_nest ... {
      ...
    }
  }
}

this pattern updates the omp.teams op in-place to:

omp.teams reduction(@red_sym %red_op -> %teams_red_arg : !fir.ref<i32>) {
  omp.loop reduction(@red_sym %teams_red_arg -> %red_arg : !fir.ref<i32>) {
    omp.loop_nest ... {
      ...
    }
  }
}

Note the following:

  • The nested omp.loop is not rewritten by this pattern, this happens through GenericLoopConversionPattern.
  • The reduction info are cloned from the nested omp.loop op to the parent omp.teams op.
  • The reduction operand of the omp.loop op is updated to be the new reduction block argument of the omp.teams op.

@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir flang:openmp labels Mar 19, 2025
@llvmbot
Copy link
Member

llvmbot commented Mar 19, 2025

@llvm/pr-subscribers-flang-openmp

@llvm/pr-subscribers-flang-fir-hlfir

Author: Kareem Ergawy (ergawy)

Changes

Fixes a bug in reductions when converting teams loop constructs with reduction clauses.

According to the spec (v5.2, p340, 36):

The effect of the reduction clause is as if it is applied to all leaf
constructs that permit the clause, except for the following constructs:
* ....
* The teams construct, when combined with the loop construct.

Therefore, for a combined directive similar to: !$omp teams loop reduction(...), the earlier stages of the compiler assign the reduction clauses only to the loop leaf and not to the teams leaf.

On the other hand, if we have a combined construct similar to: !$omp teams distribute parallel do, the reduction clauses are assigned both to the teams and the do leaves. We need to match this behavior when we convert teams op with a nested loop op since the target set of constructs/ops will be incorrect without moving the reductions up to the teams op as well.

This PR introduces a pattern that does exactly this. Given the following input:

omp.teams {
  omp.loop reduction(@<!-- -->red_sym %red_op -&gt; %red_arg : !fir.ref&lt;i32&gt;) {
    omp.loop_nest ... {
      ...
    }
  }
}

this pattern updates the omp.teams op in-place to:

omp.teams reduction(@<!-- -->red_sym %red_op -&gt; %teams_red_arg : !fir.ref&lt;i32&gt;) {
  omp.loop reduction(@<!-- -->red_sym %teams_red_arg -&gt; %red_arg : !fir.ref&lt;i32&gt;) {
    omp.loop_nest ... {
      ...
    }
  }
}

Note the following:

  • The nested omp.loop is not rewritten by this pattern, this happens through GenericLoopConversionPattern.
  • The reduction info are cloned from the nested omp.loop op to the parent omp.teams op.
  • The reduction operand of the omp.loop op is updated to be the new reduction block argument of the omp.teams op.

Full diff: https://github.com/llvm/llvm-project/pull/132003.diff

2 Files Affected:

  • (modified) flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp (+122-1)
  • (modified) flang/test/Lower/OpenMP/loop-directive.f90 (+2-2)
diff --git a/flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp b/flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp
index b0014a3aced6b..38ae90519c5cd 100644
--- a/flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp
+++ b/flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp
@@ -343,6 +343,115 @@ class GenericLoopConversionPattern
   }
 };
 
+/// According to the spec (v5.2, p340, 36):
+///
+/// ```
+/// The effect of the reduction clause is as if it is applied to all leaf
+/// constructs that permit the clause, except for the following constructs:
+/// * ....
+/// * The teams construct, when combined with the loop construct.
+/// ```
+///
+/// Therefore, for a combined directive similar to: `!$omp teams loop
+/// reduction(...)`, the earlier stages of the compiler assign the `reduction`
+/// clauses only to the `loop` leaf and not to the `teams` leaf.
+///
+/// On the other hand, if we have a combined construct similar to: `!$omp teams
+/// distribute parallel do`, the `reduction` clauses are assigned both to the
+/// `teams` and the `do` leaves. We need to match this behavior when we convert
+/// `teams` op with a nested `loop` op since the target set of constructs/ops
+/// will be incorrect without moving the reductions up to the `teams` op as
+/// well.
+///
+/// This pattern does exactly this. Given the following input:
+/// ```
+/// omp.teams {
+///   omp.loop reduction(@red_sym %red_op -> %red_arg : !fir.ref<i32>) {
+///     omp.loop_nest ... {
+///       ...
+///     }
+///   }
+/// }
+/// ```
+/// this pattern updates the `omp.teams` op in-place to:
+/// ```
+/// omp.teams reduction(@red_sym %red_op -> %teams_red_arg : !fir.ref<i32>) {
+///   omp.loop reduction(@red_sym %teams_red_arg -> %red_arg : !fir.ref<i32>) {
+///     omp.loop_nest ... {
+///       ...
+///     }
+///   }
+/// }
+/// ```
+///
+/// Note the following:
+/// * The nested `omp.loop` is not rewritten by this pattern, this happens
+///   through `GenericLoopConversionPattern`.
+/// * The reduction info are cloned from the nested `omp.loop` op to the parent
+///   `omp.teams` op.
+/// * The reduction operand of the `omp.loop` op is updated to be the **new**
+///   reduction block argument of the `omp.teams` op.
+class ReductionsMoverConverPattern
+    : public mlir::OpConversionPattern<mlir::omp::TeamsOp> {
+public:
+  using mlir::OpConversionPattern<mlir::omp::TeamsOp>::OpConversionPattern;
+
+  static mlir::omp::LoopOp
+  tryToFindNestedLoopWithReuctions(mlir::omp::TeamsOp teamsOp) {
+    assert(!teamsOp.getRegion().empty() &&
+           teamsOp.getRegion().getBlocks().size() == 1);
+
+    mlir::Block &teamsBlock = *teamsOp.getRegion().begin();
+    auto loopOpIter = llvm::find_if(teamsBlock, [](mlir::Operation &op) {
+      auto nestedLoopOp = llvm::dyn_cast<mlir::omp::LoopOp>(&op);
+
+      if (!nestedLoopOp)
+        return false;
+
+      return !nestedLoopOp.getReductionVars().empty();
+    });
+
+    if (loopOpIter == teamsBlock.end())
+      return nullptr;
+
+    // TODO return error if more than one loop op is nested. We need to
+    // coalesce reductions in this case.
+    return llvm::cast<mlir::omp::LoopOp>(loopOpIter);
+  }
+
+  mlir::LogicalResult
+  matchAndRewrite(mlir::omp::TeamsOp teamsOp, OpAdaptor adaptor,
+                  mlir::ConversionPatternRewriter &rewriter) const override {
+    mlir::omp::LoopOp nestedLoopOp = tryToFindNestedLoopWithReuctions(teamsOp);
+
+    rewriter.modifyOpInPlace(teamsOp, [&]() {
+      teamsOp.setReductionMod(nestedLoopOp.getReductionMod());
+      teamsOp.getReductionVarsMutable().assign(nestedLoopOp.getReductionVars());
+      teamsOp.setReductionByref(nestedLoopOp.getReductionByref());
+      teamsOp.setReductionSymsAttr(nestedLoopOp.getReductionSymsAttr());
+
+      auto blockArgIface =
+          llvm::cast<mlir::omp::BlockArgOpenMPOpInterface>(*teamsOp);
+      unsigned reductionArgsStart = blockArgIface.getPrivateBlockArgsStart() +
+                                    blockArgIface.numPrivateBlockArgs();
+      llvm::SmallVector<mlir::Value> newLoopOpReductionOperands;
+
+      for (auto [idx, reductionVar] :
+           llvm::enumerate(nestedLoopOp.getReductionVars())) {
+        mlir::BlockArgument newTeamsOpReductionBlockArg =
+            teamsOp.getRegion().insertArgument(reductionArgsStart + idx,
+                                               reductionVar.getType(),
+                                               reductionVar.getLoc());
+        newLoopOpReductionOperands.push_back(newTeamsOpReductionBlockArg);
+      }
+
+      nestedLoopOp.getReductionVarsMutable().assign(newLoopOpReductionOperands);
+    });
+
+    return mlir::success();
+  }
+};
+
 class GenericLoopConversionPass
     : public flangomp::impl::GenericLoopConversionPassBase<
           GenericLoopConversionPass> {
@@ -357,11 +466,23 @@ class GenericLoopConversionPass
 
     mlir::MLIRContext *context = &getContext();
     mlir::RewritePatternSet patterns(context);
-    patterns.insert<GenericLoopConversionPattern>(context);
+    patterns.insert<ReductionsMoverConverPattern, GenericLoopConversionPattern>(
+        context);
     mlir::ConversionTarget target(*context);
 
     target.markUnknownOpDynamicallyLegal(
         [](mlir::Operation *) { return true; });
+
+    target.addDynamicallyLegalOp<mlir::omp::TeamsOp>(
+        [](mlir::omp::TeamsOp teamsOp) {
+          // If teamsOp's reductions are already populated, then the op is
+          // legal. Additionally, the op is legal if it does not nest a LoopOp
+          // with reductions.
+          return !teamsOp.getReductionVars().empty() ||
+                 ReductionsMoverConverPattern::tryToFindNestedLoopWithReuctions(
+                     teamsOp) == nullptr;
+        });
+
     target.addDynamicallyLegalOp<mlir::omp::LoopOp>(
         [](mlir::omp::LoopOp loopOp) {
           return mlir::failed(
diff --git a/flang/test/Lower/OpenMP/loop-directive.f90 b/flang/test/Lower/OpenMP/loop-directive.f90
index 123dfffa350d7..61ac59d577c63 100644
--- a/flang/test/Lower/OpenMP/loop-directive.f90
+++ b/flang/test/Lower/OpenMP/loop-directive.f90
@@ -318,12 +318,12 @@ subroutine loop_parallel_bind_reduction
 subroutine loop_teams_loop_reduction
   implicit none
   integer :: x, i
-  ! CHECK: omp.teams {
+  ! CHECK: omp.teams reduction(@add_reduction_i32 %{{.*}}#0 -> %[[TEAMS_RED_ARG:.*]] : !fir.ref<i32>) {
   ! CHECK:   omp.parallel
   ! CHECK-SAME: private(@{{[^[:space:]]+}} %{{[^[:space:]]+}}#0 -> %[[PRIV_ARG:[^[:space:]]+]] : !fir.ref<i32>) {
   ! CHECK:      omp.distribute {
   ! CHECK:        omp.wsloop
-  ! CHECK-SAME:     reduction(@add_reduction_i32 %{{.*}}#0 -> %[[RED_ARG:.*]] : !fir.ref<i32>) {
+  ! CHECK-SAME:     reduction(@add_reduction_i32 %[[TEAMS_RED_ARG]] -> %[[RED_ARG:.*]] : !fir.ref<i32>) {
   ! CHECK-NEXT:     omp.loop_nest {{.*}} {
   ! CHECK-NEXT:       hlfir.declare %[[PRIV_ARG]] {uniq_name = "_QF{{.*}}Ei"}
   ! CHECK-NEXT:       hlfir.declare %[[RED_ARG]] {uniq_name = "_QF{{.*}}Ex"}

@@ -343,6 +343,115 @@ class GenericLoopConversionPattern
}
};

/// According to the spec (v5.2, p340, 36):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR needs more testing, working on that. I opened at this point for early feedback in case there are major issues with the approach taken.

Copy link
Contributor

@jsjodin jsjodin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach seems okay to me, but I need to look at the spec more. The following paragraph is a bit tricky imo.

@ergawy ergawy force-pushed the users/ergawy/fix_teams_loop_reduction branch from 21b30b7 to 37f2f8e Compare March 20, 2025 04:13
@mjklemm
Copy link
Contributor

mjklemm commented Mar 20, 2025

The semantics for teams loop reduction(op:var) clearly needs to be that the reduction clause not only affects the loop construct, but that it performs a reduction operation across the parallel regions as well as the league of teams. So, hoisting the reduction operation from omp.loop to omp.teams makes perfect sense. I wonder, though, if this should be done when the MLIR code is first generated and not later during an MLIR pass.

@ergawy
Copy link
Member Author

ergawy commented Mar 20, 2025

I wonder, though, if this should be done when the MLIR code is first generated and not later during an MLIR pass.

If we do that earlier, I think this will be confusing since it will be a direct violation of what the standard stipulates in the part quoted in the commit message. On the other hand, doing this in the pass keeps the weirdness (special handling) of loop directives self-contained in one place, I think it will be easier to connect the dots this way.

ergawy added 2 commits March 20, 2025 03:53
… `teams` ops

Fixes a bug in reductions when converting `teams loop` constructs with
`reduction` clauses.

According to the spec (v5.2, p340, 36):

```
The effect of the reduction clause is as if it is applied to all leaf
constructs that permit the clause, except for the following constructs:
* ....
* The teams construct, when combined with the loop construct.
```

Therefore, for a combined directive similar to: `!$omp teams loop
reduction(...)`, the earlier stages of the compiler assign the `reduction`
clauses only to the `loop` leaf and not to the `teams` leaf.

On the other hand, if we have a combined construct similar to: `!$omp teams
distribute parallel do`, the `reduction` clauses are assigned both to the
`teams` and the `do` leaves. We need to match this behavior when we convert
`teams` op with a nested `loop` op since the target set of constructs/ops
will be incorrect without moving the reductions up to the `teams` op as
well.

This PR introduces a pattern that does exactly this. Given the following input:
```
omp.teams {
  omp.loop reduction(@red_sym %red_op -> %red_arg : !fir.ref<i32>) {
    omp.loop_nest ... {
      ...
    }
  }
}
```
this pattern updates the `omp.teams` op in-place to:
```
omp.teams reduction(@red_sym %red_op -> %teams_red_arg : !fir.ref<i32>) {
  omp.loop reduction(@red_sym %teams_red_arg -> %red_arg : !fir.ref<i32>) {
    omp.loop_nest ... {
      ...
    }
  }
}
```

Note the following:
* The nested `omp.loop` is not rewritten by this pattern, this happens
  through `GenericLoopConversionPattern`.
* The reduction info are cloned from the nested `omp.loop` op to the parent
  `omp.teams` op.
* The reduction operand of the `omp.loop` op is updated to be the **new**
  reduction block argument of the `omp.teams` op.
@ergawy
Copy link
Member Author

ergawy commented Mar 20, 2025

I am wondering, howerver, what we should do in these 2 scearios:

subroutine scenario_1
  implicit none
  integer :: x, i

  !$omp loop reduction(+: x) bind(teams)
  DO i = 1, 5
    call foo()
  END DO
end subroutine

The above scenario is problematic because, AFAICT, this is allowed by the spec (it is allowed by flang atm) but there is no construct to hoist the reductions to.

subroutine scenario_2
  implicit none
  integer :: x, i

  !$omp teams loop reduction(+: x)
  DO i = 1, 5
    call foo()
  END DO
end subroutine

This one is easier, I think we just hoist the reductions to the teams construct and skip them when converting loop to distribute later on (note that loop cannot be mapped to parallel for here since its loop nest contains a call to a foreign function). So we need to extend this PR to handle reduction skipping part.

@mjklemm @jsjodin please take a look and let me know what you think.

@ergawy ergawy changed the title [flang][OpenMP] Clone reduction info from nested loop ops to parent teams ops [flang][OpenMP] Hoist reduction info from nested loop ops to parent teams ops Mar 20, 2025
@ergawy ergawy force-pushed the users/ergawy/fix_teams_loop_reduction branch from 37f2f8e to 05a12c0 Compare March 20, 2025 09:22
@ergawy
Copy link
Member Author

ergawy commented Mar 21, 2025

I think we can look at the above 2 scenarios in later PRs if that is ok with you.

@mjklemm
Copy link
Contributor

mjklemm commented Mar 21, 2025

I am wondering, howerver, what we should do in these 2 scearios:

subroutine scenario_1
  implicit none
  integer :: x, i

  !$omp loop reduction(+: x) bind(teams)
  DO i = 1, 5
    call foo()
  END DO
end subroutine

I do not see how we could possibly attach the reduction to the teams directive. It may be somewhere. And I would not read the specification that way. Because the section on combined constructs mandates the extension to both leaves, teams and loop. But we do not have a combined construct here, so no further action needed.

A similar case would be do reduction(+:sum) on an orphaned parallel region. Simply does not work.

Copy link
Contributor

@jsjodin jsjodin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@dpalermo dpalermo merged commit c031579 into main Mar 21, 2025
11 checks passed
@dpalermo dpalermo deleted the users/ergawy/fix_teams_loop_reduction branch March 21, 2025 19:12
@llvm-ci
Copy link
Collaborator

llvm-ci commented Mar 21, 2025

LLVM Buildbot has detected a new failure on builder premerge-monolithic-windows running on premerge-windows-1 while building flang at step 8 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/35/builds/8415

Here is the relevant piece of the build log for the reference
Step 8 (test-build-unified-tree-check-all) failure: test (failure)
...
cl : Command line warning D9025 : overriding '/D_GLIBCXX_ASSERTIONS' with '/U_GLIBCXX_ASSERTIONS'
[52/150] Linking CXX static library lib\flang_rt.runtime.static_dbg.lib
[53/150] Linking CXX executable tools\clang\unittests\Analysis\ClangAnalysisTests.exe
[54/150] Linking CXX executable tools\clang\unittests\Analysis\FlowSensitive\ClangAnalysisFlowSensitiveTests.exe
[55/150] Linking CXX executable tools\clang\unittests\ASTMatchers\Dynamic\DynamicASTMatchersTests.exe
[56/150] Linking CXX executable tools\clang\unittests\Rewrite\RewriteTests.exe
[57/150] Linking CXX executable tools\clang\unittests\AST\ByteCode\InterpTests.exe
[58/150] Linking CXX executable bin\mlir-capi-translation-test.exe
[59/150] Linking CXX static library lib\flang_rt.runtime.dynamic_dbg.lib
[60/150] Preparing lit tests
FAILED: utils/lit/CMakeFiles/prepare-check-lit 
cmd.exe /C "cd /D C:\ws\buildbot\premerge-monolithic-windows\build\utils\lit && C:\BuildTools\Common7\IDE\CommonExtensions\Microsoft\CMake\CMake\bin\cmake.exe -E remove_directory C:/ws/buildbot/premerge-monolithic-windows/build/utils/lit/tests && C:\BuildTools\Common7\IDE\CommonExtensions\Microsoft\CMake\CMake\bin\cmake.exe -E copy_directory C:/ws/buildbot/premerge-monolithic-windows/llvm-project/llvm/utils/lit/tests C:/ws/buildbot/premerge-monolithic-windows/build/utils/lit/tests && C:\BuildTools\Common7\IDE\CommonExtensions\Microsoft\CMake\CMake\bin\cmake.exe -E copy C:/ws/buildbot/premerge-monolithic-windows/build/utils/lit/lit.site.cfg C:/ws/buildbot/premerge-monolithic-windows/build/utils/lit/tests"
Error removing directory "C:/ws/buildbot/premerge-monolithic-windows/build/utils/lit/tests".
[61/150] Linking CXX static library lib\flang_rt.runtime.lib
[62/150] Linking CXX executable tools\polly\unittests\Isl\IslTests.exe
[63/150] Linking CXX executable tools\polly\unittests\Flatten\FlattenTests.exe
[64/150] Linking CXX executable tools\polly\unittests\Support\ISLToolsTests.exe
[65/150] Linking CXX executable tools\clang\unittests\CrossTU\CrossTUTests.exe
[66/150] Linking CXX executable tools\clang\unittests\ASTMatchers\ASTMatchersTests.exe
[67/150] Linking CXX executable tools\flang\unittests\Optimizer\FlangOptimizerTests.exe
[68/150] Linking CXX executable tools\clang\unittests\Tooling\Syntax\SyntaxTests.exe
[69/150] Linking CXX executable tools\clang\unittests\Serialization\SerializationTests.exe
[70/150] Linking CXX executable tools\clang\unittests\Index\IndexTests.exe
[71/150] Linking CXX executable tools\clang\unittests\StaticAnalyzer\StaticAnalysisTests.exe
[72/150] Linking CXX executable tools\clang\unittests\Support\ClangSupportTests.exe
[73/150] Linking CXX executable tools\clang\unittests\Sema\SemaTests.exe
[74/150] Linking CXX executable tools\clang\unittests\AST\ASTTests.exe
[75/150] Linking CXX executable tools\clang\unittests\CodeGen\ClangCodeGenTests.exe
[76/150] Linking CXX executable tools\clang\tools\extra\clangd\unittests\ClangdTests.exe
[77/150] Linking CXX executable tools\polly\unittests\DeLICM\DeLICMTests.exe
[78/150] Linking CXX executable tools\mlir\unittests\Target\LLVM\MLIRTargetLLVMTests.exe
[79/150] Linking CXX executable tools\polly\unittests\ScheduleOptimizer\ScheduleOptimizerTests.exe
[80/150] Linking CXX executable tools\polly\unittests\ScopPassManager\ScopPassManagerTests.exe
[81/150] Linking CXX executable unittests\Analysis\AnalysisTests.exe
[82/150] Linking CXX executable bin\mlir-capi-pass-test.exe
[83/150] Linking CXX executable tools\clang\unittests\Frontend\FrontendTests.exe
[84/150] Linking CXX executable bin\mlir-capi-ir-test.exe
[85/150] Linking CXX executable bin\mlir-capi-execution-engine-test.exe
[86/150] Linking CXX executable tools\clang\unittests\Driver\ClangDriverTests.exe
[87/150] Linking CXX executable tools\lld\unittests\AsLibELF\LLDAsLibELFTests.exe
[88/150] Linking CXX executable tools\lld\unittests\AsLibAll\LLDAsLibAllTests.exe
[89/150] Linking CXX executable tools\clang\unittests\Tooling\ToolingTests.exe
[90/150] Linking CXX executable tools\clang\unittests\Interpreter\ClangReplInterpreterTests.exe
[91/150] Linking CXX executable tools\flang\unittests\Frontend\FlangFrontendTests.exe
ninja: build stopped: subcommand failed.

searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 22, 2025
… `teams` ops (llvm#132003)

Fixes a bug in reductions when converting `teams loop` constructs with
`reduction` clauses.

According to the spec (v5.2, p340, 36):

```
The effect of the reduction clause is as if it is applied to all leaf
constructs that permit the clause, except for the following constructs:
* ....
* The teams construct, when combined with the loop construct.
```

Therefore, for a combined directive similar to: `!$omp teams loop
reduction(...)`, the earlier stages of the compiler assign the
`reduction` clauses only to the `loop` leaf and not to the `teams` leaf.

On the other hand, if we have a combined construct similar to: `!$omp
teams distribute parallel do`, the `reduction` clauses are assigned both
to the `teams` and the `do` leaves. We need to match this behavior when
we convert `teams` op with a nested `loop` op since the target set of
constructs/ops will be incorrect without moving the reductions up to the
`teams` op as well.

This PR introduces a pattern that does exactly this. Given the following
input:
```
omp.teams {
  omp.loop reduction(@red_sym %red_op -> %red_arg : !fir.ref<i32>) {
    omp.loop_nest ... {
      ...
    }
  }
}
```
this pattern updates the `omp.teams` op in-place to:
```
omp.teams reduction(@red_sym %red_op -> %teams_red_arg : !fir.ref<i32>) {
  omp.loop reduction(@red_sym %teams_red_arg -> %red_arg : !fir.ref<i32>) {
    omp.loop_nest ... {
      ...
    }
  }
}
```

Note the following:
* The nested `omp.loop` is not rewritten by this pattern, this happens
through `GenericLoopConversionPattern`.
* The reduction info are cloned from the nested `omp.loop` op to the
parent `omp.teams` op.
* The reduction operand of the `omp.loop` op is updated to be the
**new** reduction block argument of the `omp.teams` op.
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 23, 2025
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 24, 2025
… `teams` ops (llvm#132003)

Fixes a bug in reductions when converting `teams loop` constructs with
`reduction` clauses.

According to the spec (v5.2, p340, 36):

```
The effect of the reduction clause is as if it is applied to all leaf
constructs that permit the clause, except for the following constructs:
* ....
* The teams construct, when combined with the loop construct.
```

Therefore, for a combined directive similar to: `!$omp teams loop
reduction(...)`, the earlier stages of the compiler assign the
`reduction` clauses only to the `loop` leaf and not to the `teams` leaf.

On the other hand, if we have a combined construct similar to: `!$omp
teams distribute parallel do`, the `reduction` clauses are assigned both
to the `teams` and the `do` leaves. We need to match this behavior when
we convert `teams` op with a nested `loop` op since the target set of
constructs/ops will be incorrect without moving the reductions up to the
`teams` op as well.

This PR introduces a pattern that does exactly this. Given the following
input:
```
omp.teams {
  omp.loop reduction(@red_sym %red_op -> %red_arg : !fir.ref<i32>) {
    omp.loop_nest ... {
      ...
    }
  }
}
```
this pattern updates the `omp.teams` op in-place to:
```
omp.teams reduction(@red_sym %red_op -> %teams_red_arg : !fir.ref<i32>) {
  omp.loop reduction(@red_sym %teams_red_arg -> %red_arg : !fir.ref<i32>) {
    omp.loop_nest ... {
      ...
    }
  }
}
```

Note the following:
* The nested `omp.loop` is not rewritten by this pattern, this happens
through `GenericLoopConversionPattern`.
* The reduction info are cloned from the nested `omp.loop` op to the
parent `omp.teams` op.
* The reduction operand of the `omp.loop` op is updated to be the
**new** reduction block argument of the `omp.teams` op.
ergawy added a commit that referenced this pull request Mar 25, 2025
…ed to `distribute`

Follow-up to #132003, in particular, see
#132003 (comment).

This PR extends reduction support for `loop` directives. Consider the
following scenario:
```fortran
subroutine bar
  implicit none
  integer :: x, i

  !$omp teams loop reduction(+: x)
  DO i = 1, 5
    call foo()
  END DO
end subroutine
```
Note the following:
* According to the spec, the `reduction` clause will be attached to
  `loop` during earlier stages in the compiler.
* Additionally, `loop` cannot be mapped to `distribute parallel for`
  due to the call to a foreign function inside the loop's body.
* Therefore, `loop` must be mapped to `distribute`.
* However, `distribute` does not have `reduction` clauses.
* As a result, we have to move the `reduction`s from the `loop` to its
  parent `teams` directive, which is what is done by this PR.
@ergawy
Copy link
Member Author

ergawy commented Mar 25, 2025

I do not see how we could possibly attach the reduction to the teams directive.

Not saying we should. I am just saying this scenario is allowed by the spec (or at least by flang atm) and therefore we should handle it somehow. Maybe one option would be to just drop the reductions in this case and serialize the loop! But I think we should find a solution for this case.

The 2nd scenario is handled in #132920.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:fir-hlfir flang:openmp flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants