-
Notifications
You must be signed in to change notification settings - Fork 13.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[flang][OpenMP] Hoist reduction info from nested loop
ops to parent teams
ops
#132003
Conversation
@llvm/pr-subscribers-flang-openmp @llvm/pr-subscribers-flang-fir-hlfir Author: Kareem Ergawy (ergawy) ChangesFixes a bug in reductions when converting According to the spec (v5.2, p340, 36):
Therefore, for a combined directive similar to: On the other hand, if we have a combined construct similar to: This PR introduces a pattern that does exactly this. Given the following input:
this pattern updates the
Note the following:
Full diff: https://github.com/llvm/llvm-project/pull/132003.diff 2 Files Affected:
diff --git a/flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp b/flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp
index b0014a3aced6b..38ae90519c5cd 100644
--- a/flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp
+++ b/flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp
@@ -343,6 +343,115 @@ class GenericLoopConversionPattern
}
};
+/// According to the spec (v5.2, p340, 36):
+///
+/// ```
+/// The effect of the reduction clause is as if it is applied to all leaf
+/// constructs that permit the clause, except for the following constructs:
+/// * ....
+/// * The teams construct, when combined with the loop construct.
+/// ```
+///
+/// Therefore, for a combined directive similar to: `!$omp teams loop
+/// reduction(...)`, the earlier stages of the compiler assign the `reduction`
+/// clauses only to the `loop` leaf and not to the `teams` leaf.
+///
+/// On the other hand, if we have a combined construct similar to: `!$omp teams
+/// distribute parallel do`, the `reduction` clauses are assigned both to the
+/// `teams` and the `do` leaves. We need to match this behavior when we convert
+/// `teams` op with a nested `loop` op since the target set of constructs/ops
+/// will be incorrect without moving the reductions up to the `teams` op as
+/// well.
+///
+/// This pattern does exactly this. Given the following input:
+/// ```
+/// omp.teams {
+/// omp.loop reduction(@red_sym %red_op -> %red_arg : !fir.ref<i32>) {
+/// omp.loop_nest ... {
+/// ...
+/// }
+/// }
+/// }
+/// ```
+/// this pattern updates the `omp.teams` op in-place to:
+/// ```
+/// omp.teams reduction(@red_sym %red_op -> %teams_red_arg : !fir.ref<i32>) {
+/// omp.loop reduction(@red_sym %teams_red_arg -> %red_arg : !fir.ref<i32>) {
+/// omp.loop_nest ... {
+/// ...
+/// }
+/// }
+/// }
+/// ```
+///
+/// Note the following:
+/// * The nested `omp.loop` is not rewritten by this pattern, this happens
+/// through `GenericLoopConversionPattern`.
+/// * The reduction info are cloned from the nested `omp.loop` op to the parent
+/// `omp.teams` op.
+/// * The reduction operand of the `omp.loop` op is updated to be the **new**
+/// reduction block argument of the `omp.teams` op.
+class ReductionsMoverConverPattern
+ : public mlir::OpConversionPattern<mlir::omp::TeamsOp> {
+public:
+ using mlir::OpConversionPattern<mlir::omp::TeamsOp>::OpConversionPattern;
+
+ static mlir::omp::LoopOp
+ tryToFindNestedLoopWithReuctions(mlir::omp::TeamsOp teamsOp) {
+ assert(!teamsOp.getRegion().empty() &&
+ teamsOp.getRegion().getBlocks().size() == 1);
+
+ mlir::Block &teamsBlock = *teamsOp.getRegion().begin();
+ auto loopOpIter = llvm::find_if(teamsBlock, [](mlir::Operation &op) {
+ auto nestedLoopOp = llvm::dyn_cast<mlir::omp::LoopOp>(&op);
+
+ if (!nestedLoopOp)
+ return false;
+
+ return !nestedLoopOp.getReductionVars().empty();
+ });
+
+ if (loopOpIter == teamsBlock.end())
+ return nullptr;
+
+ // TODO return error if more than one loop op is nested. We need to
+ // coalesce reductions in this case.
+ return llvm::cast<mlir::omp::LoopOp>(loopOpIter);
+ }
+
+ mlir::LogicalResult
+ matchAndRewrite(mlir::omp::TeamsOp teamsOp, OpAdaptor adaptor,
+ mlir::ConversionPatternRewriter &rewriter) const override {
+ mlir::omp::LoopOp nestedLoopOp = tryToFindNestedLoopWithReuctions(teamsOp);
+
+ rewriter.modifyOpInPlace(teamsOp, [&]() {
+ teamsOp.setReductionMod(nestedLoopOp.getReductionMod());
+ teamsOp.getReductionVarsMutable().assign(nestedLoopOp.getReductionVars());
+ teamsOp.setReductionByref(nestedLoopOp.getReductionByref());
+ teamsOp.setReductionSymsAttr(nestedLoopOp.getReductionSymsAttr());
+
+ auto blockArgIface =
+ llvm::cast<mlir::omp::BlockArgOpenMPOpInterface>(*teamsOp);
+ unsigned reductionArgsStart = blockArgIface.getPrivateBlockArgsStart() +
+ blockArgIface.numPrivateBlockArgs();
+ llvm::SmallVector<mlir::Value> newLoopOpReductionOperands;
+
+ for (auto [idx, reductionVar] :
+ llvm::enumerate(nestedLoopOp.getReductionVars())) {
+ mlir::BlockArgument newTeamsOpReductionBlockArg =
+ teamsOp.getRegion().insertArgument(reductionArgsStart + idx,
+ reductionVar.getType(),
+ reductionVar.getLoc());
+ newLoopOpReductionOperands.push_back(newTeamsOpReductionBlockArg);
+ }
+
+ nestedLoopOp.getReductionVarsMutable().assign(newLoopOpReductionOperands);
+ });
+
+ return mlir::success();
+ }
+};
+
class GenericLoopConversionPass
: public flangomp::impl::GenericLoopConversionPassBase<
GenericLoopConversionPass> {
@@ -357,11 +466,23 @@ class GenericLoopConversionPass
mlir::MLIRContext *context = &getContext();
mlir::RewritePatternSet patterns(context);
- patterns.insert<GenericLoopConversionPattern>(context);
+ patterns.insert<ReductionsMoverConverPattern, GenericLoopConversionPattern>(
+ context);
mlir::ConversionTarget target(*context);
target.markUnknownOpDynamicallyLegal(
[](mlir::Operation *) { return true; });
+
+ target.addDynamicallyLegalOp<mlir::omp::TeamsOp>(
+ [](mlir::omp::TeamsOp teamsOp) {
+ // If teamsOp's reductions are already populated, then the op is
+ // legal. Additionally, the op is legal if it does not nest a LoopOp
+ // with reductions.
+ return !teamsOp.getReductionVars().empty() ||
+ ReductionsMoverConverPattern::tryToFindNestedLoopWithReuctions(
+ teamsOp) == nullptr;
+ });
+
target.addDynamicallyLegalOp<mlir::omp::LoopOp>(
[](mlir::omp::LoopOp loopOp) {
return mlir::failed(
diff --git a/flang/test/Lower/OpenMP/loop-directive.f90 b/flang/test/Lower/OpenMP/loop-directive.f90
index 123dfffa350d7..61ac59d577c63 100644
--- a/flang/test/Lower/OpenMP/loop-directive.f90
+++ b/flang/test/Lower/OpenMP/loop-directive.f90
@@ -318,12 +318,12 @@ subroutine loop_parallel_bind_reduction
subroutine loop_teams_loop_reduction
implicit none
integer :: x, i
- ! CHECK: omp.teams {
+ ! CHECK: omp.teams reduction(@add_reduction_i32 %{{.*}}#0 -> %[[TEAMS_RED_ARG:.*]] : !fir.ref<i32>) {
! CHECK: omp.parallel
! CHECK-SAME: private(@{{[^[:space:]]+}} %{{[^[:space:]]+}}#0 -> %[[PRIV_ARG:[^[:space:]]+]] : !fir.ref<i32>) {
! CHECK: omp.distribute {
! CHECK: omp.wsloop
- ! CHECK-SAME: reduction(@add_reduction_i32 %{{.*}}#0 -> %[[RED_ARG:.*]] : !fir.ref<i32>) {
+ ! CHECK-SAME: reduction(@add_reduction_i32 %[[TEAMS_RED_ARG]] -> %[[RED_ARG:.*]] : !fir.ref<i32>) {
! CHECK-NEXT: omp.loop_nest {{.*}} {
! CHECK-NEXT: hlfir.declare %[[PRIV_ARG]] {uniq_name = "_QF{{.*}}Ei"}
! CHECK-NEXT: hlfir.declare %[[RED_ARG]] {uniq_name = "_QF{{.*}}Ex"}
|
@@ -343,6 +343,115 @@ class GenericLoopConversionPattern | |||
} | |||
}; | |||
|
|||
/// According to the spec (v5.2, p340, 36): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR needs more testing, working on that. I opened at this point for early feedback in case there are major issues with the approach taken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach seems okay to me, but I need to look at the spec more. The following paragraph is a bit tricky imo.
21b30b7
to
37f2f8e
Compare
The semantics for |
If we do that earlier, I think this will be confusing since it will be a direct violation of what the standard stipulates in the part quoted in the commit message. On the other hand, doing this in the pass keeps the weirdness (special handling) of |
… `teams` ops Fixes a bug in reductions when converting `teams loop` constructs with `reduction` clauses. According to the spec (v5.2, p340, 36): ``` The effect of the reduction clause is as if it is applied to all leaf constructs that permit the clause, except for the following constructs: * .... * The teams construct, when combined with the loop construct. ``` Therefore, for a combined directive similar to: `!$omp teams loop reduction(...)`, the earlier stages of the compiler assign the `reduction` clauses only to the `loop` leaf and not to the `teams` leaf. On the other hand, if we have a combined construct similar to: `!$omp teams distribute parallel do`, the `reduction` clauses are assigned both to the `teams` and the `do` leaves. We need to match this behavior when we convert `teams` op with a nested `loop` op since the target set of constructs/ops will be incorrect without moving the reductions up to the `teams` op as well. This PR introduces a pattern that does exactly this. Given the following input: ``` omp.teams { omp.loop reduction(@red_sym %red_op -> %red_arg : !fir.ref<i32>) { omp.loop_nest ... { ... } } } ``` this pattern updates the `omp.teams` op in-place to: ``` omp.teams reduction(@red_sym %red_op -> %teams_red_arg : !fir.ref<i32>) { omp.loop reduction(@red_sym %teams_red_arg -> %red_arg : !fir.ref<i32>) { omp.loop_nest ... { ... } } } ``` Note the following: * The nested `omp.loop` is not rewritten by this pattern, this happens through `GenericLoopConversionPattern`. * The reduction info are cloned from the nested `omp.loop` op to the parent `omp.teams` op. * The reduction operand of the `omp.loop` op is updated to be the **new** reduction block argument of the `omp.teams` op.
I am wondering, howerver, what we should do in these 2 scearios: subroutine scenario_1
implicit none
integer :: x, i
!$omp loop reduction(+: x) bind(teams)
DO i = 1, 5
call foo()
END DO
end subroutine The above scenario is problematic because, AFAICT, this is allowed by the spec (it is allowed by flang atm) but there is no construct to hoist the reductions to. subroutine scenario_2
implicit none
integer :: x, i
!$omp teams loop reduction(+: x)
DO i = 1, 5
call foo()
END DO
end subroutine This one is easier, I think we just hoist the reductions to the @mjklemm @jsjodin please take a look and let me know what you think. |
loop
ops to parent teams
opsloop
ops to parent teams
ops
37f2f8e
to
05a12c0
Compare
I think we can look at the above 2 scenarios in later PRs if that is ok with you. |
I do not see how we could possibly attach the reduction to the A similar case would be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/35/builds/8415 Here is the relevant piece of the build log for the reference
|
… `teams` ops (llvm#132003) Fixes a bug in reductions when converting `teams loop` constructs with `reduction` clauses. According to the spec (v5.2, p340, 36): ``` The effect of the reduction clause is as if it is applied to all leaf constructs that permit the clause, except for the following constructs: * .... * The teams construct, when combined with the loop construct. ``` Therefore, for a combined directive similar to: `!$omp teams loop reduction(...)`, the earlier stages of the compiler assign the `reduction` clauses only to the `loop` leaf and not to the `teams` leaf. On the other hand, if we have a combined construct similar to: `!$omp teams distribute parallel do`, the `reduction` clauses are assigned both to the `teams` and the `do` leaves. We need to match this behavior when we convert `teams` op with a nested `loop` op since the target set of constructs/ops will be incorrect without moving the reductions up to the `teams` op as well. This PR introduces a pattern that does exactly this. Given the following input: ``` omp.teams { omp.loop reduction(@red_sym %red_op -> %red_arg : !fir.ref<i32>) { omp.loop_nest ... { ... } } } ``` this pattern updates the `omp.teams` op in-place to: ``` omp.teams reduction(@red_sym %red_op -> %teams_red_arg : !fir.ref<i32>) { omp.loop reduction(@red_sym %teams_red_arg -> %red_arg : !fir.ref<i32>) { omp.loop_nest ... { ... } } } ``` Note the following: * The nested `omp.loop` is not rewritten by this pattern, this happens through `GenericLoopConversionPattern`. * The reduction info are cloned from the nested `omp.loop` op to the parent `omp.teams` op. * The reduction operand of the `omp.loop` op is updated to be the **new** reduction block argument of the `omp.teams` op.
…o parent `teams` ops (llvm#132003)" This reverts commit 9e63b90.
… `teams` ops (llvm#132003) Fixes a bug in reductions when converting `teams loop` constructs with `reduction` clauses. According to the spec (v5.2, p340, 36): ``` The effect of the reduction clause is as if it is applied to all leaf constructs that permit the clause, except for the following constructs: * .... * The teams construct, when combined with the loop construct. ``` Therefore, for a combined directive similar to: `!$omp teams loop reduction(...)`, the earlier stages of the compiler assign the `reduction` clauses only to the `loop` leaf and not to the `teams` leaf. On the other hand, if we have a combined construct similar to: `!$omp teams distribute parallel do`, the `reduction` clauses are assigned both to the `teams` and the `do` leaves. We need to match this behavior when we convert `teams` op with a nested `loop` op since the target set of constructs/ops will be incorrect without moving the reductions up to the `teams` op as well. This PR introduces a pattern that does exactly this. Given the following input: ``` omp.teams { omp.loop reduction(@red_sym %red_op -> %red_arg : !fir.ref<i32>) { omp.loop_nest ... { ... } } } ``` this pattern updates the `omp.teams` op in-place to: ``` omp.teams reduction(@red_sym %red_op -> %teams_red_arg : !fir.ref<i32>) { omp.loop reduction(@red_sym %teams_red_arg -> %red_arg : !fir.ref<i32>) { omp.loop_nest ... { ... } } } ``` Note the following: * The nested `omp.loop` is not rewritten by this pattern, this happens through `GenericLoopConversionPattern`. * The reduction info are cloned from the nested `omp.loop` op to the parent `omp.teams` op. * The reduction operand of the `omp.loop` op is updated to be the **new** reduction block argument of the `omp.teams` op.
…ed to `distribute` Follow-up to #132003, in particular, see #132003 (comment). This PR extends reduction support for `loop` directives. Consider the following scenario: ```fortran subroutine bar implicit none integer :: x, i !$omp teams loop reduction(+: x) DO i = 1, 5 call foo() END DO end subroutine ``` Note the following: * According to the spec, the `reduction` clause will be attached to `loop` during earlier stages in the compiler. * Additionally, `loop` cannot be mapped to `distribute parallel for` due to the call to a foreign function inside the loop's body. * Therefore, `loop` must be mapped to `distribute`. * However, `distribute` does not have `reduction` clauses. * As a result, we have to move the `reduction`s from the `loop` to its parent `teams` directive, which is what is done by this PR.
Not saying we should. I am just saying this scenario is allowed by the spec (or at least by flang atm) and therefore we should handle it somehow. Maybe one option would be to just drop the reductions in this case and serialize the loop! But I think we should find a solution for this case. The 2nd scenario is handled in #132920. |
Fixes a bug in reductions when converting
teams loop
constructs withreduction
clauses.According to the spec (v5.2, p340, 36):
Therefore, for a combined directive similar to:
!$omp teams loop reduction(...)
, the earlier stages of the compiler assign thereduction
clauses only to theloop
leaf and not to theteams
leaf.On the other hand, if we have a combined construct similar to:
!$omp teams distribute parallel do
, thereduction
clauses are assigned both to theteams
and thedo
leaves. We need to match this behavior when we convertteams
op with a nestedloop
op since the target set of constructs/ops will be incorrect without moving the reductions up to theteams
op as well.This PR introduces a pattern that does exactly this. Given the following input:
this pattern updates the
omp.teams
op in-place to:Note the following:
omp.loop
is not rewritten by this pattern, this happens throughGenericLoopConversionPattern
.omp.loop
op to the parentomp.teams
op.omp.loop
op is updated to be the new reduction block argument of theomp.teams
op.