-
Notifications
You must be signed in to change notification settings - Fork 14.1k
[flang][OpenMP] Skip runtime mapping with no offload targets #144534
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
When no offload targets are specified flang will ignore "target" constructs, but not "target data" constructs. This patch makes the behavior consistent across all offload-related operations. While ignoring "target" may produce semantically incorrect code, it may still be a useful debugging tool.
@llvm/pr-subscribers-mlir-openmp @llvm/pr-subscribers-flang-openmp Author: Krzysztof Parzyszek (kparzysz) ChangesWhen no offload targets are specified flang will ignore "target" constructs, but not "target data" constructs. This patch makes the behavior consistent across all offload-related operations. While ignoring "target" may produce semantically incorrect code, it may still be a useful debugging tool. Patch is 30.76 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/144534.diff 5 Files Affected:
diff --git a/flang/test/Lower/ignore-target-data.f90 b/flang/test/Lower/ignore-target-data.f90
new file mode 100644
index 0000000000000..a2182ee0f5b0b
--- /dev/null
+++ b/flang/test/Lower/ignore-target-data.f90
@@ -0,0 +1,20 @@
+!RUN: %flang_fc1 -emit-llvm -fopenmp %s -o - | FileCheck %s
+
+!Make sure that there are no calls to the mapper.
+
+!CHECK-NOT: call{{.*}}__tgt_target_data_begin_mapper
+!CHECK-NOT: call{{.*}}__tgt_target_data_end_mapper
+
+program test
+
+call f(1, 2)
+
+contains
+
+subroutine f(x, y)
+ integer :: x, y
+ !$omp target data map(tofrom: x, y)
+ x = x + y
+ !$omp end target data
+end subroutine
+end
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 90ce06a0345c0..5985f2b0f5621 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -4378,6 +4378,9 @@ convertOmpTargetData(Operation *op, llvm::IRBuilderBase &builder,
llvm::OpenMPIRBuilder *ompBuilder = moduleTranslation.getOpenMPBuilder();
llvm::OpenMPIRBuilder::TargetDataInfo info(/*RequiresDevicePointerInfo=*/true,
/*SeparateBeginEndCalls=*/true);
+ bool isTargetDevice = ompBuilder->Config.isTargetDevice();
+ bool isOffloadEntry =
+ isTargetDevice || !ompBuilder->Config.TargetTriples.empty();
LogicalResult result =
llvm::TypeSwitch<Operation *, LogicalResult>(op)
@@ -4402,6 +4405,8 @@ convertOmpTargetData(Operation *op, llvm::IRBuilderBase &builder,
.Case([&](omp::TargetEnterDataOp enterDataOp) -> LogicalResult {
if (failed(checkImplementationStatus(*enterDataOp)))
return failure();
+ if (!isOffloadEntry)
+ return success();
if (auto ifVar = enterDataOp.getIfExpr())
ifCond = moduleTranslation.lookupValue(ifVar);
@@ -4422,6 +4427,8 @@ convertOmpTargetData(Operation *op, llvm::IRBuilderBase &builder,
.Case([&](omp::TargetExitDataOp exitDataOp) -> LogicalResult {
if (failed(checkImplementationStatus(*exitDataOp)))
return failure();
+ if (!isOffloadEntry)
+ return success();
if (auto ifVar = exitDataOp.getIfExpr())
ifCond = moduleTranslation.lookupValue(ifVar);
@@ -4442,6 +4449,8 @@ convertOmpTargetData(Operation *op, llvm::IRBuilderBase &builder,
.Case([&](omp::TargetUpdateOp updateDataOp) -> LogicalResult {
if (failed(checkImplementationStatus(*updateDataOp)))
return failure();
+ if (!isOffloadEntry)
+ return success();
if (auto ifVar = updateDataOp.getIfExpr())
ifCond = moduleTranslation.lookupValue(ifVar);
@@ -4467,6 +4476,8 @@ convertOmpTargetData(Operation *op, llvm::IRBuilderBase &builder,
if (failed(result))
return failure();
+ if (!isOffloadEntry)
+ return success();
using InsertPointTy = llvm::OpenMPIRBuilder::InsertPointTy;
MapInfoData mapData;
diff --git a/mlir/test/Target/LLVMIR/omptarget-llvm.mlir b/mlir/test/Target/LLVMIR/omptarget-llvm.mlir
index 971bea2068544..e6ea3aaeec656 100644
--- a/mlir/test/Target/LLVMIR/omptarget-llvm.mlir
+++ b/mlir/test/Target/LLVMIR/omptarget-llvm.mlir
@@ -1,15 +1,17 @@
// RUN: mlir-translate -mlir-to-llvmir -split-input-file %s | FileCheck %s
-llvm.func @_QPopenmp_target_data() {
- %0 = llvm.mlir.constant(1 : i64) : i64
- %1 = llvm.alloca %0 x i32 {bindc_name = "i", in_type = i32, operand_segment_sizes = array<i32: 0, 0>, uniq_name = "_QFopenmp_target_dataEi"} : (i64) -> !llvm.ptr
- %2 = omp.map.info var_ptr(%1 : !llvm.ptr, i32) map_clauses(tofrom) capture(ByRef) -> !llvm.ptr {name = ""}
- omp.target_data map_entries(%2 : !llvm.ptr) {
- %3 = llvm.mlir.constant(99 : i32) : i32
- llvm.store %3, %1 : i32, !llvm.ptr
- omp.terminator
+module attributes {omp.target_triples = ["amdgcn-amd-amdhsa"]} {
+ llvm.func @_QPopenmp_target_data() {
+ %0 = llvm.mlir.constant(1 : i64) : i64
+ %1 = llvm.alloca %0 x i32 {bindc_name = "i", in_type = i32, operand_segment_sizes = array<i32: 0, 0>, uniq_name = "_QFopenmp_target_dataEi"} : (i64) -> !llvm.ptr
+ %2 = omp.map.info var_ptr(%1 : !llvm.ptr, i32) map_clauses(tofrom) capture(ByRef) -> !llvm.ptr {name = ""}
+ omp.target_data map_entries(%2 : !llvm.ptr) {
+ %3 = llvm.mlir.constant(99 : i32) : i32
+ llvm.store %3, %1 : i32, !llvm.ptr
+ omp.terminator
+ }
+ llvm.return
}
- llvm.return
}
// CHECK: @.offload_sizes = private unnamed_addr constant [1 x i64] [i64 4]
@@ -38,23 +40,25 @@ llvm.func @_QPopenmp_target_data() {
// -----
-llvm.func @_QPopenmp_target_data_region(%0 : !llvm.ptr) {
- %1 = llvm.mlir.constant(1023 : index) : i64
- %2 = llvm.mlir.constant(0 : index) : i64
- %3 = llvm.mlir.constant(1024 : index) : i64
- %4 = llvm.mlir.constant(1 : index) : i64
- %5 = omp.map.bounds lower_bound(%2 : i64) upper_bound(%1 : i64) extent(%3 : i64) stride(%4 : i64) start_idx(%4 : i64)
- %6 = omp.map.info var_ptr(%0 : !llvm.ptr, !llvm.array<1024 x i32>) map_clauses(from) capture(ByRef) bounds(%5) -> !llvm.ptr {name = ""}
- omp.target_data map_entries(%6 : !llvm.ptr) {
- %7 = llvm.mlir.constant(99 : i32) : i32
- %8 = llvm.mlir.constant(1 : i64) : i64
- %9 = llvm.mlir.constant(1 : i64) : i64
- %10 = llvm.mlir.constant(0 : i64) : i64
- %11 = llvm.getelementptr %0[0, %10] : (!llvm.ptr, i64) -> !llvm.ptr, !llvm.array<1024 x i32>
- llvm.store %7, %11 : i32, !llvm.ptr
- omp.terminator
+module attributes {omp.target_triples = ["amdgcn-amd-amdhsa"]} {
+ llvm.func @_QPopenmp_target_data_region(%0 : !llvm.ptr) {
+ %1 = llvm.mlir.constant(1023 : index) : i64
+ %2 = llvm.mlir.constant(0 : index) : i64
+ %3 = llvm.mlir.constant(1024 : index) : i64
+ %4 = llvm.mlir.constant(1 : index) : i64
+ %5 = omp.map.bounds lower_bound(%2 : i64) upper_bound(%1 : i64) extent(%3 : i64) stride(%4 : i64) start_idx(%4 : i64)
+ %6 = omp.map.info var_ptr(%0 : !llvm.ptr, !llvm.array<1024 x i32>) map_clauses(from) capture(ByRef) bounds(%5) -> !llvm.ptr {name = ""}
+ omp.target_data map_entries(%6 : !llvm.ptr) {
+ %7 = llvm.mlir.constant(99 : i32) : i32
+ %8 = llvm.mlir.constant(1 : i64) : i64
+ %9 = llvm.mlir.constant(1 : i64) : i64
+ %10 = llvm.mlir.constant(0 : i64) : i64
+ %11 = llvm.getelementptr %0[0, %10] : (!llvm.ptr, i64) -> !llvm.ptr, !llvm.array<1024 x i32>
+ llvm.store %7, %11 : i32, !llvm.ptr
+ omp.terminator
+ }
+ llvm.return
}
- llvm.return
}
// CHECK: @.offload_sizes = private unnamed_addr constant [1 x i64] [i64 4096]
@@ -85,50 +89,52 @@ llvm.func @_QPopenmp_target_data_region(%0 : !llvm.ptr) {
// -----
-llvm.func @_QPomp_target_enter_exit(%1 : !llvm.ptr, %3 : !llvm.ptr) {
- %4 = llvm.mlir.constant(1 : i64) : i64
- %5 = llvm.alloca %4 x i32 {bindc_name = "dvc", in_type = i32, operandSegmentSizes = array<i32: 0, 0>, uniq_name = "_QFomp_target_enter_exitEdvc"} : (i64) -> !llvm.ptr
- %6 = llvm.mlir.constant(1 : i64) : i64
- %7 = llvm.alloca %6 x i32 {bindc_name = "i", in_type = i32, operandSegmentSizes = array<i32: 0, 0>, uniq_name = "_QFomp_target_enter_exitEi"} : (i64) -> !llvm.ptr
- %8 = llvm.mlir.constant(5 : i32) : i32
- llvm.store %8, %7 : i32, !llvm.ptr
- %9 = llvm.mlir.constant(2 : i32) : i32
- llvm.store %9, %5 : i32, !llvm.ptr
- %10 = llvm.load %7 : !llvm.ptr -> i32
- %11 = llvm.mlir.constant(10 : i32) : i32
- %12 = llvm.icmp "slt" %10, %11 : i32
- %13 = llvm.load %5 : !llvm.ptr -> i32
- %14 = llvm.mlir.constant(1023 : index) : i64
- %15 = llvm.mlir.constant(0 : index) : i64
- %16 = llvm.mlir.constant(1024 : index) : i64
- %17 = llvm.mlir.constant(1 : index) : i64
- %18 = omp.map.bounds lower_bound(%15 : i64) upper_bound(%14 : i64) extent(%16 : i64) stride(%17 : i64) start_idx(%17 : i64)
- %map1 = omp.map.info var_ptr(%1 : !llvm.ptr, !llvm.array<1024 x i32>) map_clauses(to) capture(ByRef) bounds(%18) -> !llvm.ptr {name = ""}
- %19 = llvm.mlir.constant(511 : index) : i64
- %20 = llvm.mlir.constant(0 : index) : i64
- %21 = llvm.mlir.constant(512 : index) : i64
- %22 = llvm.mlir.constant(1 : index) : i64
- %23 = omp.map.bounds lower_bound(%20 : i64) upper_bound(%19 : i64) extent(%21 : i64) stride(%22 : i64) start_idx(%22 : i64)
- %map2 = omp.map.info var_ptr(%3 : !llvm.ptr, !llvm.array<512 x i32>) map_clauses(exit_release_or_enter_alloc) capture(ByRef) bounds(%23) -> !llvm.ptr {name = ""}
- omp.target_enter_data if(%12) device(%13 : i32) map_entries(%map1, %map2 : !llvm.ptr, !llvm.ptr)
- %24 = llvm.load %7 : !llvm.ptr -> i32
- %25 = llvm.mlir.constant(10 : i32) : i32
- %26 = llvm.icmp "sgt" %24, %25 : i32
- %27 = llvm.load %5 : !llvm.ptr -> i32
- %28 = llvm.mlir.constant(1023 : index) : i64
- %29 = llvm.mlir.constant(0 : index) : i64
- %30 = llvm.mlir.constant(1024 : index) : i64
- %31 = llvm.mlir.constant(1 : index) : i64
- %32 = omp.map.bounds lower_bound(%29 : i64) upper_bound(%28 : i64) extent(%30 : i64) stride(%31 : i64) start_idx(%31 : i64)
- %map3 = omp.map.info var_ptr(%1 : !llvm.ptr, !llvm.array<1024 x i32>) map_clauses(from) capture(ByRef) bounds(%32) -> !llvm.ptr {name = ""}
- %33 = llvm.mlir.constant(511 : index) : i64
- %34 = llvm.mlir.constant(0 : index) : i64
- %35 = llvm.mlir.constant(512 : index) : i64
- %36 = llvm.mlir.constant(1 : index) : i64
- %37 = omp.map.bounds lower_bound(%34 : i64) upper_bound(%33 : i64) extent(%35 : i64) stride(%36 : i64) start_idx(%36 : i64)
- %map4 = omp.map.info var_ptr(%3 : !llvm.ptr, !llvm.array<512 x i32>) map_clauses(exit_release_or_enter_alloc) capture(ByRef) bounds(%37) -> !llvm.ptr {name = ""}
- omp.target_exit_data if(%26) device(%27 : i32) map_entries(%map3, %map4 : !llvm.ptr, !llvm.ptr)
- llvm.return
+module attributes {omp.target_triples = ["amdgcn-amd-amdhsa"]} {
+ llvm.func @_QPomp_target_enter_exit(%1 : !llvm.ptr, %3 : !llvm.ptr) {
+ %4 = llvm.mlir.constant(1 : i64) : i64
+ %5 = llvm.alloca %4 x i32 {bindc_name = "dvc", in_type = i32, operandSegmentSizes = array<i32: 0, 0>, uniq_name = "_QFomp_target_enter_exitEdvc"} : (i64) -> !llvm.ptr
+ %6 = llvm.mlir.constant(1 : i64) : i64
+ %7 = llvm.alloca %6 x i32 {bindc_name = "i", in_type = i32, operandSegmentSizes = array<i32: 0, 0>, uniq_name = "_QFomp_target_enter_exitEi"} : (i64) -> !llvm.ptr
+ %8 = llvm.mlir.constant(5 : i32) : i32
+ llvm.store %8, %7 : i32, !llvm.ptr
+ %9 = llvm.mlir.constant(2 : i32) : i32
+ llvm.store %9, %5 : i32, !llvm.ptr
+ %10 = llvm.load %7 : !llvm.ptr -> i32
+ %11 = llvm.mlir.constant(10 : i32) : i32
+ %12 = llvm.icmp "slt" %10, %11 : i32
+ %13 = llvm.load %5 : !llvm.ptr -> i32
+ %14 = llvm.mlir.constant(1023 : index) : i64
+ %15 = llvm.mlir.constant(0 : index) : i64
+ %16 = llvm.mlir.constant(1024 : index) : i64
+ %17 = llvm.mlir.constant(1 : index) : i64
+ %18 = omp.map.bounds lower_bound(%15 : i64) upper_bound(%14 : i64) extent(%16 : i64) stride(%17 : i64) start_idx(%17 : i64)
+ %map1 = omp.map.info var_ptr(%1 : !llvm.ptr, !llvm.array<1024 x i32>) map_clauses(to) capture(ByRef) bounds(%18) -> !llvm.ptr {name = ""}
+ %19 = llvm.mlir.constant(511 : index) : i64
+ %20 = llvm.mlir.constant(0 : index) : i64
+ %21 = llvm.mlir.constant(512 : index) : i64
+ %22 = llvm.mlir.constant(1 : index) : i64
+ %23 = omp.map.bounds lower_bound(%20 : i64) upper_bound(%19 : i64) extent(%21 : i64) stride(%22 : i64) start_idx(%22 : i64)
+ %map2 = omp.map.info var_ptr(%3 : !llvm.ptr, !llvm.array<512 x i32>) map_clauses(exit_release_or_enter_alloc) capture(ByRef) bounds(%23) -> !llvm.ptr {name = ""}
+ omp.target_enter_data if(%12) device(%13 : i32) map_entries(%map1, %map2 : !llvm.ptr, !llvm.ptr)
+ %24 = llvm.load %7 : !llvm.ptr -> i32
+ %25 = llvm.mlir.constant(10 : i32) : i32
+ %26 = llvm.icmp "sgt" %24, %25 : i32
+ %27 = llvm.load %5 : !llvm.ptr -> i32
+ %28 = llvm.mlir.constant(1023 : index) : i64
+ %29 = llvm.mlir.constant(0 : index) : i64
+ %30 = llvm.mlir.constant(1024 : index) : i64
+ %31 = llvm.mlir.constant(1 : index) : i64
+ %32 = omp.map.bounds lower_bound(%29 : i64) upper_bound(%28 : i64) extent(%30 : i64) stride(%31 : i64) start_idx(%31 : i64)
+ %map3 = omp.map.info var_ptr(%1 : !llvm.ptr, !llvm.array<1024 x i32>) map_clauses(from) capture(ByRef) bounds(%32) -> !llvm.ptr {name = ""}
+ %33 = llvm.mlir.constant(511 : index) : i64
+ %34 = llvm.mlir.constant(0 : index) : i64
+ %35 = llvm.mlir.constant(512 : index) : i64
+ %36 = llvm.mlir.constant(1 : index) : i64
+ %37 = omp.map.bounds lower_bound(%34 : i64) upper_bound(%33 : i64) extent(%35 : i64) stride(%36 : i64) start_idx(%36 : i64)
+ %map4 = omp.map.info var_ptr(%3 : !llvm.ptr, !llvm.array<512 x i32>) map_clauses(exit_release_or_enter_alloc) capture(ByRef) bounds(%37) -> !llvm.ptr {name = ""}
+ omp.target_exit_data if(%26) device(%27 : i32) map_entries(%map3, %map4 : !llvm.ptr, !llvm.ptr)
+ llvm.return
+ }
}
// CHECK: @.offload_sizes = private unnamed_addr constant [2 x i64] [i64 4096, i64 2048]
@@ -205,18 +211,20 @@ llvm.func @_QPomp_target_enter_exit(%1 : !llvm.ptr, %3 : !llvm.ptr) {
// -----
-llvm.func @_QPopenmp_target_use_dev_ptr() {
- %0 = llvm.mlir.constant(1 : i64) : i64
- %a = llvm.alloca %0 x !llvm.ptr : (i64) -> !llvm.ptr
- %map1 = omp.map.info var_ptr(%a : !llvm.ptr, !llvm.ptr) map_clauses(from) capture(ByRef) -> !llvm.ptr {name = ""}
- %map2 = omp.map.info var_ptr(%a : !llvm.ptr, !llvm.ptr) map_clauses(from) capture(ByRef) -> !llvm.ptr {name = ""}
- omp.target_data map_entries(%map1 : !llvm.ptr) use_device_ptr(%map2 -> %arg0 : !llvm.ptr) {
- %1 = llvm.mlir.constant(10 : i32) : i32
- %2 = llvm.load %arg0 : !llvm.ptr -> !llvm.ptr
- llvm.store %1, %2 : i32, !llvm.ptr
- omp.terminator
+module attributes {omp.target_triples = ["amdgcn-amd-amdhsa"]} {
+ llvm.func @_QPopenmp_target_use_dev_ptr() {
+ %0 = llvm.mlir.constant(1 : i64) : i64
+ %a = llvm.alloca %0 x !llvm.ptr : (i64) -> !llvm.ptr
+ %map1 = omp.map.info var_ptr(%a : !llvm.ptr, !llvm.ptr) map_clauses(from) capture(ByRef) -> !llvm.ptr {name = ""}
+ %map2 = omp.map.info var_ptr(%a : !llvm.ptr, !llvm.ptr) map_clauses(from) capture(ByRef) -> !llvm.ptr {name = ""}
+ omp.target_data map_entries(%map1 : !llvm.ptr) use_device_ptr(%map2 -> %arg0 : !llvm.ptr) {
+ %1 = llvm.mlir.constant(10 : i32) : i32
+ %2 = llvm.load %arg0 : !llvm.ptr -> !llvm.ptr
+ llvm.store %1, %2 : i32, !llvm.ptr
+ omp.terminator
+ }
+ llvm.return
}
- llvm.return
}
// CHECK: @.offload_sizes = private unnamed_addr constant [1 x i64] [i64 8]
@@ -249,18 +257,20 @@ llvm.func @_QPopenmp_target_use_dev_ptr() {
// -----
-llvm.func @_QPopenmp_target_use_dev_addr() {
- %0 = llvm.mlir.constant(1 : i64) : i64
- %a = llvm.alloca %0 x !llvm.ptr : (i64) -> !llvm.ptr
- %map = omp.map.info var_ptr(%a : !llvm.ptr, !llvm.ptr) map_clauses(from) capture(ByRef) -> !llvm.ptr {name = ""}
- %map2 = omp.map.info var_ptr(%a : !llvm.ptr, !llvm.ptr) map_clauses(from) capture(ByRef) -> !llvm.ptr {name = ""}
- omp.target_data map_entries(%map : !llvm.ptr) use_device_addr(%map2 -> %arg0 : !llvm.ptr) {
- %1 = llvm.mlir.constant(10 : i32) : i32
- %2 = llvm.load %arg0 : !llvm.ptr -> !llvm.ptr
- llvm.store %1, %2 : i32, !llvm.ptr
- omp.terminator
+module attributes {omp.target_triples = ["amdgcn-amd-amdhsa"]} {
+ llvm.func @_QPopenmp_target_use_dev_addr() {
+ %0 = llvm.mlir.constant(1 : i64) : i64
+ %a = llvm.alloca %0 x !llvm.ptr : (i64) -> !llvm.ptr
+ %map = omp.map.info var_ptr(%a : !llvm.ptr, !llvm.ptr) map_clauses(from) capture(ByRef) -> !llvm.ptr {name = ""}
+ %map2 = omp.map.info var_ptr(%a : !llvm.ptr, !llvm.ptr) map_clauses(from) capture(ByRef) -> !llvm.ptr {name = ""}
+ omp.target_data map_entries(%map : !llvm.ptr) use_device_addr(%map2 -> %arg0 : !llvm.ptr) {
+ %1 = llvm.mlir.constant(10 : i32) : i32
+ %2 = llvm.load %arg0 : !llvm.ptr -> !llvm.ptr
+ llvm.store %1, %2 : i32, !llvm.ptr
+ omp.terminator
+ }
+ llvm.return
}
- llvm.return
}
// CHECK: @.offload_sizes = private unnamed_addr constant [1 x i64] [i64 8]
@@ -291,17 +301,19 @@ llvm.func @_QPopenmp_target_use_dev_addr() {
// -----
-llvm.func @_QPopenmp_target_use_dev_addr_no_ptr() {
- %0 = llvm.mlir.constant(1 : i64) : i64
- %a = llvm.alloca %0 x i32 : (i64) -> !llvm.ptr
- %map = omp.map.info var_ptr(%a : !llvm.ptr, i32) map_clauses(tofrom) capture(ByRef) -> !llvm.ptr {name = ""}
- %map2 = omp.map.info var_ptr(%a : !llvm.ptr, i32) map_clauses(tofrom) capture(ByRef) -> !llvm.ptr {name = ""}
- omp.target_data map_entries(%map : !llvm.ptr) use_device_addr(%map2 -> %arg0 : !llvm.ptr) {
- %1 = llvm.mlir.constant(10 : i32) : i32
- llvm.store %1, %arg0 : i32, !llvm.ptr
- omp.terminator
+module attributes {omp.target_triples = ["amdgcn-amd-amdhsa"]} {
+ llvm.func @_QPopenmp_target_use_dev_addr_no_ptr() {
+ %0 = llvm.mlir.constant(1 : i64) : i64
+ %a = llvm.alloca %0 x i32 : (i64) -> !llvm.ptr
+ %map = omp.map.info var_ptr(%a : !llvm.ptr, i32) map_clauses(tofrom) capture(ByRef) -> !llvm.ptr {name = ""}
+ %map2 = omp.map.info var_ptr(%a : !llvm.ptr, i32) map_clauses(tofrom) capture(ByRef) -> !llvm.ptr {name = ""}
+ omp.target_data map_entries(%map : !llvm.ptr) use_device_addr(%map2 -> %arg0 : !llvm.ptr) {
+ %1 = llvm.mlir.constant(10 : i32) : i32
+ llvm.store %1, %arg0 : i32, !llvm.ptr
+ omp.terminator
+ }
+ llvm.return
}
- llvm.return
}
// CHECK: @.offload_sizes = private unnamed_addr constant [1 x i64] [i64 4]
@@ -331,23 +343,25 @@ llvm.func @_QPopenmp_target_use_dev_addr_no_ptr() {
// -----
-llvm.func @_QPopenmp_target_use_dev_addr_nomap() {
- %0 = llvm.mlir.constant(1 : i64) : i64
- %a = llvm.alloca %0 x !llvm.ptr : (i64) -> !llvm.ptr
- %1 = llvm.mlir.constant(1 : i64) : i64
- %b = llvm.alloca %0 x !llvm.ptr : (i64) -> !llvm.ptr
- %map = omp.map.info var_ptr(%b : !llvm.ptr, !llvm.ptr) map_clauses(from) capture(ByRef) -> !llvm.ptr {name = ""}
- %map2 = omp.map.info var_ptr(%a : !llvm.ptr, !llvm.ptr) map_clauses(tofrom) capture(ByRef) -> !llvm.ptr {name = ""}
- omp.target_data map_entries(%map : !llvm.ptr) use_device_addr(%map2 -> %arg0 : !llvm.ptr) {
- %2 = llvm.mlir.constant(10 : i32) : i32
- %3 = llvm.load %arg0 : !llvm.ptr -> !llvm.ptr
- llvm.store %2, %3 : i32, !llvm.ptr
- %4 = llvm.mlir.constant(20 : i32) : i32
- %5 = llvm.load %b : !llvm.ptr -> !llvm.ptr
- llvm.store %4, %5 : i32, !llvm.ptr
- omp.terminator
+module attributes {omp.target_triples = ["amdgcn-amd-amdhsa"]} {
+ llvm.func @_QPopenmp_target_use_dev_addr_nomap() {
+ %0 = llvm.mlir.constant(1 : i64) : i64
+ %a = llvm.alloca %0 x !llvm.ptr : (i64) -> !llvm.ptr
+ %1 = llvm.mlir.constant(1 : i64) : i64
+ %b = llvm.alloca %0 x !llvm.ptr : (i64) -> !llvm.ptr
+ %map = omp.map.info var_ptr(%b : !llvm.ptr, !llvm.ptr) map_clauses(from) capture(ByRef) -> !llvm.ptr {name = ""}
+ %map2 = omp.map.info var_ptr(%a : !llvm.ptr, !llvm.ptr) map_clauses(tofrom) capture(ByRef) -> !llvm.ptr {name = ""}
+ omp.target_data map_entries(%map : !llvm.ptr) use_device_addr(%map2 -> %arg0 : !llvm.ptr) {
+ %2 = llvm.mlir.constant(10 : i32) : i32
+ %3 = llvm.load %arg0 : !llvm.ptr -> !llvm.ptr
+ llvm.store %2, %3 : i32, !llvm.ptr
+ %4 = llvm.mlir.constant(20 : i32) : i32
+ %5 = llvm.load %b : !llvm.ptr -> !llvm.ptr
+ llvm.store %4, %5 : i32, !llvm.ptr
+ omp.terminator
+ }
+ llvm.return
}
- llvm.return
}
// CHECK: @.offload_sizes = private unnamed_addr constant [2 x i64] [i64 8, i64 0]
@@ -387,25 +401,27 @@ llvm.func @_QPopenmp_target_use_dev_addr_nomap() {
// -----
-llvm.func @_QPopenmp_target_use_dev_both() {
- %0 = llvm.mlir.constant(1 : i64) : i64
- %a = llvm.alloca %0 x !llvm.ptr : (i64) -> !llvm.ptr
- %1 = llvm.mlir.constant(1 : i64) : i64
- %b = llvm.alloca %0 x !llvm.ptr : (i64) -> !llvm.ptr
- %map = omp.map.info var_ptr(%a : !llvm.ptr, !llvm.ptr) map_clauses(tofrom) capture(ByRef) -> !llvm.ptr {name = ""}
- %map1...
[truncated]
|
In the 3 mlir tests I added a module with an "omp.target_triples" attribute to make sure that the expected code is still generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you Krzysztof!
When no offload targets are specified flang will ignore "target" constructs, but not "target data" constructs.
To point something out, this is not exactly what happens. Host implementations of target constructs will still be produced regardless of whether offload targets were specified. The difference when they aren't is that no kernel launches (i.e. __tgt_target_kernel
plus kernel argument structure setup) are produced in the host, but rather host fallback code is branched to directly.
Having said that, I can't imagine why we would need to keep "target data"-related runtime calls when there's no offloading target, so the approach sounds good to me. Maybe it would be good for someone more familiar with current map support to confirm the removal of __tgt_target_data_{begin,end}_mapper
is fine when running only on the host (CC: @agozillon, @TIFitis).
if (!isOffloadEntry) | ||
return success(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think it would be more readable if we checked this condition and exited early before this TypeSwitch
, rather than checking on each case and then later after it was executed.
Also, shouldn't we still process the body of omp.target_data
? That code still should run on the host, AFAIU.
if (!isOffloadEntry && !isa<omp::TargetDataOp>(op))
return success();
LogicalResult result = llvm::TypeSwitch...;
if (failed(result))
return failure();
using InsertPointTy = ...
Well... That was the intent here. I dug up an old patch I had, so it's possible that something is missing. I'll take a closer look at it. |
When no offload targets are specified flang will ignore "target" constructs, but not "target data" constructs. This patch makes the behavior consistent across all offload-related operations.
While ignoring "target" may produce semantically incorrect code, it may still be a useful debugging tool.