Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[flang] Use fir.declare/fir.dummy_scope for TBAA tags attachments. #92472

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

vzakhari
Copy link
Contributor

With MLIR inlining (e.g. flang-new -mmlir -inline-all=true)
the current TBAA tags attachment is suboptimal, because
we may lose information about the callee's dummy arguments
(by bypassing fir.declare in AliasAnalysis::getSource).
This is a conservative first step to improve the situation.
This patch makes AddAliasTagsPass to account for fir.dummy_scope
hierarchy after MLIR inlining and use it to place the TBAA tags
into TBAA trees corresponding to different function scopes.
The pass uses special mode of AliasAnalysis to find the instantiation
point of a Fortran variable (a [hl]fir.decalre) when searching
for the source of a memory reference. In this mode, AliasAnalysis
will always stop at fir.declare operations that have dummy_scope
operands - there should not be a reason to past throught it
for the purpose of TBAA tags attachment.

With MLIR inlining (e.g. `flang-new -mmlir -inline-all=true`)
the current TBAA tags attachment is suboptimal, because
we may lose information about the callee's dummy arguments
(by bypassing fir.declare in AliasAnalysis::getSource).
This is a conservative first step to improve the situation.
This patch makes AddAliasTagsPass to account for fir.dummy_scope
hierarchy after MLIR inlining and use it to place the TBAA tags
into TBAA trees corresponding to different function scopes.
The pass uses special mode of AliasAnalysis to find the instantiation
point of a Fortran variable (a [hl]fir.decalre) when searching
for the source of a memory reference. In this mode, AliasAnalysis
will always stop at fir.declare operations that have dummy_scope
operands - there should not be a reason to past throught it
for the purpose of TBAA tags attachment.
@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir flang:codegen labels May 16, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented May 16, 2024

@llvm/pr-subscribers-flang-fir-hlfir

@llvm/pr-subscribers-flang-codegen

Author: Slava Zakharin (vzakhari)

Changes

With MLIR inlining (e.g. flang-new -mmlir -inline-all=true)
the current TBAA tags attachment is suboptimal, because
we may lose information about the callee's dummy arguments
(by bypassing fir.declare in AliasAnalysis::getSource).
This is a conservative first step to improve the situation.
This patch makes AddAliasTagsPass to account for fir.dummy_scope
hierarchy after MLIR inlining and use it to place the TBAA tags
into TBAA trees corresponding to different function scopes.
The pass uses special mode of AliasAnalysis to find the instantiation
point of a Fortran variable (a [hl]fir.decalre) when searching
for the source of a memory reference. In this mode, AliasAnalysis
will always stop at fir.declare operations that have dummy_scope
operands - there should not be a reason to past throught it
for the purpose of TBAA tags attachment.


Patch is 47.47 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/92472.diff

8 Files Affected:

  • (modified) flang/include/flang/Optimizer/Analysis/AliasAnalysis.h (+15-1)
  • (modified) flang/include/flang/Optimizer/Analysis/TBAAForest.h (+14)
  • (modified) flang/lib/Optimizer/Analysis/AliasAnalysis.cpp (+39-4)
  • (modified) flang/lib/Optimizer/CodeGen/TBAABuilder.cpp (+32)
  • (modified) flang/lib/Optimizer/Transforms/AddAliasTags.cpp (+94-7)
  • (added) flang/test/Transforms/tbaa-with-dummy-scope.fir (+238)
  • (modified) flang/test/Transforms/tbaa.fir (+22-14)
  • (modified) flang/test/Transforms/tbaa2.fir (+8-6)
diff --git a/flang/include/flang/Optimizer/Analysis/AliasAnalysis.h b/flang/include/flang/Optimizer/Analysis/AliasAnalysis.h
index 40fd1705115a0..8cb6e92e41d97 100644
--- a/flang/include/flang/Optimizer/Analysis/AliasAnalysis.h
+++ b/flang/include/flang/Optimizer/Analysis/AliasAnalysis.h
@@ -120,6 +120,17 @@ struct AliasAnalysis {
       /// Source definition of a value.
       SourceUnion u;
 
+      /// A value definition denoting the place where the corresponding
+      /// source variable was instantiated by the front-end.
+      /// Currently, it is the result of [hl]fir.declare of the source,
+      /// if we can reach it.
+      /// It helps to identify the scope where the corresponding variable
+      /// was defined in the original Fortran source, e.g. when MLIR
+      /// inlining happens an inlined fir.declare of the callee's
+      /// dummy argument identifies the scope where the source
+      /// may be treated as a dummy argument.
+      mlir::Value instantiationPoint;
+
       /// Whether the source was reached following data or box reference
       bool isData{false};
     };
@@ -168,7 +179,10 @@ struct AliasAnalysis {
   mlir::ModRefResult getModRef(mlir::Operation *op, mlir::Value location);
 
   /// Return the memory source of a value.
-  Source getSource(mlir::Value);
+  /// If getInstantiationPoint is true, the search for the source
+  /// will stop at [hl]fir.declare if it represents a dummy
+  /// argument declaration (i.e. it has the dummy_scope operand).
+  Source getSource(mlir::Value, bool getInstantiationPoint = false);
 };
 
 inline bool operator==(const AliasAnalysis::Source::SourceOrigin &lhs,
diff --git a/flang/include/flang/Optimizer/Analysis/TBAAForest.h b/flang/include/flang/Optimizer/Analysis/TBAAForest.h
index 619ed4939c51c..f95fcffd96e93 100644
--- a/flang/include/flang/Optimizer/Analysis/TBAAForest.h
+++ b/flang/include/flang/Optimizer/Analysis/TBAAForest.h
@@ -92,6 +92,20 @@ class TBAAForrest {
     }
     return getFuncTree(func.getSymNameAttr());
   }
+  // Returns the TBAA tree associated with the scope enclosed
+  // within the given function. With MLIR inlining, there may
+  // be multiple scopes within a single function. It is the caller's
+  // responsibility to provide unique name for the scope.
+  // If the scope string is empty, returns the TBAA tree for the
+  // "root" scope of the given function.
+  inline const TBAATree &getFuncTreeWithScope(mlir::func::FuncOp func,
+                                              llvm::StringRef scope) {
+    mlir::StringAttr name = func.getSymNameAttr();
+    if (!scope.empty())
+      name = mlir::StringAttr::get(name.getContext(),
+                                   llvm::Twine(name) + " - " + scope);
+    return getFuncTree(name);
+  }
 
 private:
   const TBAATree &getFuncTree(mlir::StringAttr symName) {
diff --git a/flang/lib/Optimizer/Analysis/AliasAnalysis.cpp b/flang/lib/Optimizer/Analysis/AliasAnalysis.cpp
index 9d0d706a85c5e..354291cc1d84c 100644
--- a/flang/lib/Optimizer/Analysis/AliasAnalysis.cpp
+++ b/flang/lib/Optimizer/Analysis/AliasAnalysis.cpp
@@ -30,8 +30,15 @@ using namespace mlir;
 
 static bool isDummyArgument(mlir::Value v) {
   auto blockArg{mlir::dyn_cast<mlir::BlockArgument>(v)};
-  if (!blockArg)
+  if (!blockArg) {
+    auto defOp = v.getDefiningOp();
+    if (defOp) {
+      if (auto declareOp = mlir::dyn_cast<fir::DeclareOp>(defOp))
+        if (declareOp.getDummyScope())
+          return true;
+    }
     return false;
+  }
 
   auto *owner{blockArg.getOwner()};
   return owner->isEntryBlock() &&
@@ -105,6 +112,9 @@ bool AliasAnalysis::Source::isRecordWithPointerComponent() const {
 }
 
 AliasResult AliasAnalysis::alias(Value lhs, Value rhs) {
+  // TODO: alias() has to be aware of the function scopes.
+  // After MLIR inlining, the current implementation may
+  // not recognize non-aliasing entities.
   auto lhsSrc = getSource(lhs);
   auto rhsSrc = getSource(rhs);
   bool approximateSource = lhsSrc.approximateSource || rhsSrc.approximateSource;
@@ -242,7 +252,8 @@ getAttrsFromVariable(fir::FortranVariableOpInterface var) {
   return attrs;
 }
 
-AliasAnalysis::Source AliasAnalysis::getSource(mlir::Value v) {
+AliasAnalysis::Source AliasAnalysis::getSource(mlir::Value v,
+                                               bool getInstantiationPoint) {
   auto *defOp = v.getDefiningOp();
   SourceKind type{SourceKind::Unknown};
   mlir::Type ty;
@@ -254,6 +265,7 @@ AliasAnalysis::Source AliasAnalysis::getSource(mlir::Value v) {
   bool followingData = !isBoxRef;
   mlir::SymbolRefAttr global;
   Source::Attributes attributes;
+  mlir::Value instantiationPoint;
   while (defOp && !breakFromLoop) {
     ty = defOp->getResultTypes()[0];
     llvm::TypeSwitch<Operation *>(defOp)
@@ -344,6 +356,21 @@ AliasAnalysis::Source AliasAnalysis::getSource(mlir::Value v) {
             breakFromLoop = true;
             return;
           }
+          if (getInstantiationPoint) {
+            // Fetch only the innermost instantiation point.
+            if (!instantiationPoint)
+              instantiationPoint = op->getResult(0);
+
+            if (op.getDummyScope()) {
+              // Do not track past DeclareOp that has the dummy_scope
+              // operand. This DeclareOp is known to represent
+              // a dummy argument for some runtime instantiation
+              // of a procedure.
+              type = SourceKind::Argument;
+              breakFromLoop = true;
+              return;
+            }
+          }
           // TODO: Look for the fortran attributes present on the operation
           // Track further through the operand
           v = op.getMemref();
@@ -382,9 +409,17 @@ AliasAnalysis::Source AliasAnalysis::getSource(mlir::Value v) {
     }
 
   if (type == SourceKind::Global) {
-    return {{global, followingData}, type, ty, attributes, approximateSource};
+    return {{global, instantiationPoint, followingData},
+            type,
+            ty,
+            attributes,
+            approximateSource};
   }
-  return {{v, followingData}, type, ty, attributes, approximateSource};
+  return {{v, instantiationPoint, followingData},
+          type,
+          ty,
+          attributes,
+          approximateSource};
 }
 
 } // namespace fir
diff --git a/flang/lib/Optimizer/CodeGen/TBAABuilder.cpp b/flang/lib/Optimizer/CodeGen/TBAABuilder.cpp
index a21384e8d5946..7a3f3b5b65d2d 100644
--- a/flang/lib/Optimizer/CodeGen/TBAABuilder.cpp
+++ b/flang/lib/Optimizer/CodeGen/TBAABuilder.cpp
@@ -52,6 +52,38 @@ TBAABuilder::TBAABuilder(MLIRContext *context, bool applyTBAA,
                          bool forceUnifiedTree)
     : enableTBAA(applyTBAA && !disableTBAA),
       trees(/*separatePerFunction=*/perFunctionTBAATrees && !forceUnifiedTree) {
+  // TODO: the TBAA tags created here are rooted in the root scope
+  // of the enclosing function. This does not work best with MLIR inlining.
+  // A better approach is to root them according to the scopes they belong to
+  // and that were used by AddAliasTagsPass to create TBAA tags before
+  // the CodeGen. For example:
+  //   subroutine caller(a, b, ptr)
+  //     real, target :: a(:), b(:)
+  //     integer, pointer :: ptr(:)
+  //     call callee(a, b, ptr)
+  //   end
+  //   subroutine callee(a, b, ptr)
+  //     real :: a(:), b(:)
+  //     integer, pointer :: ptr(:)
+  //     do i=...
+  //       a(ptr(i)) = b(ptr(i))
+  //     end do
+  //   end
+  //
+  // When callee is inlined, the dummy arguments 'a' and 'b' will
+  // be rooted in TBAA tree corresponding to the `call callee` call site,
+  // saying that the references to 'a' and 'b' cannot alias each other.
+  // These tags will be created by AddAliasTagsPass, but it will not be able
+  // to create any tags for 'ptr' references.
+  // During the CodeGen, we create 'any data access' tags for the
+  // 'ptr' acceses. If they are rooted within the root scope of `caller`,
+  // they end up in a different TBAA tree with the 'a' and 'b' access
+  // tags, so 'ptr', 'a' and 'b' references MayAlias. Moreover,
+  // the box access of 'ptr' will also be in a different TBAA tree
+  // with 'a' and 'b' tags, meaning they can also alias.
+  // This will prevent LLVM vectorization even with memory conflict checks.
+  // It seems that we'd better move all TBAA tags assignment to
+  // AddAliasTagsPass, which can at least rely on the dummy arguments scopes.
   if (!enableTBAA)
     return;
 }
diff --git a/flang/lib/Optimizer/Transforms/AddAliasTags.cpp b/flang/lib/Optimizer/Transforms/AddAliasTags.cpp
index caece93840043..3cd6b5057b6d1 100644
--- a/flang/lib/Optimizer/Transforms/AddAliasTags.cpp
+++ b/flang/lib/Optimizer/Transforms/AddAliasTags.cpp
@@ -17,9 +17,11 @@
 #include "flang/Optimizer/Dialect/FIRDialect.h"
 #include "flang/Optimizer/Dialect/FirAliasTagOpInterface.h"
 #include "flang/Optimizer/Transforms/Passes.h"
+#include "mlir/IR/Dominance.h"
 #include "mlir/Pass/Pass.h"
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/StringRef.h"
+#include "llvm/ADT/Twine.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/raw_ostream.h"
@@ -54,10 +56,12 @@ namespace {
 /// Shared state per-module
 class PassState {
 public:
+  PassState(mlir::DominanceInfo &domInfo) : domInfo(domInfo) {}
   /// memoised call to fir::AliasAnalysis::getSource
   inline const fir::AliasAnalysis::Source &getSource(mlir::Value value) {
     if (!analysisCache.contains(value))
-      analysisCache.insert({value, analysis.getSource(value)});
+      analysisCache.insert(
+          {value, analysis.getSource(value, /*getInstantiationPoint=*/true)});
     return analysisCache[value];
   }
 
@@ -65,13 +69,72 @@ class PassState {
   inline const fir::TBAATree &getFuncTree(mlir::func::FuncOp func) {
     return forrest[func];
   }
+  inline const fir::TBAATree &getFuncTreeWithScope(mlir::func::FuncOp func,
+                                                   fir::DummyScopeOp scope) {
+    auto &scopeMap = scopeNames.at(func);
+    return forrest.getFuncTreeWithScope(func, scopeMap.lookup(scope));
+  }
+
+  void processFunctionScopes(mlir::func::FuncOp func);
+  fir::DummyScopeOp getDeclarationScope(fir::DeclareOp declareOp);
 
 private:
+  mlir::DominanceInfo &domInfo;
   fir::AliasAnalysis analysis;
   llvm::DenseMap<mlir::Value, fir::AliasAnalysis::Source> analysisCache;
   fir::TBAAForrest forrest;
+  // Unique names for fir.dummy_scope operations within
+  // the given function.
+  llvm::DenseMap<mlir::func::FuncOp,
+                 llvm::DenseMap<fir::DummyScopeOp, std::string>>
+      scopeNames;
+  // A map providing a vector of fir.dummy_scope operations
+  // for the given function. The vectors are sorted according
+  // to the dominance information.
+  llvm::DenseMap<mlir::func::FuncOp, llvm::SmallVector<fir::DummyScopeOp, 16>>
+      sortedScopeOperations;
 };
 
+// Process fir.dummy_scope operations in the given func:
+// sort them according to the dominance information, and
+// associate a unique (within the current function) scope name
+// with each of them.
+void PassState::processFunctionScopes(mlir::func::FuncOp func) {
+  if (scopeNames.contains(func))
+    return;
+
+  auto &scopeMap = scopeNames.getOrInsertDefault(func);
+  auto &scopeOps = sortedScopeOperations.getOrInsertDefault(func);
+  func.walk([&](fir::DummyScopeOp op) { scopeOps.push_back(op); });
+  llvm::stable_sort(scopeOps, [&](const fir::DummyScopeOp &op1,
+                                  const fir::DummyScopeOp &op2) {
+    return domInfo.properlyDominates(&*op1, &*op2);
+  });
+  unsigned scopeId = 0;
+  for (auto scope : scopeOps) {
+    if (scopeId != 0) {
+      std::string name = (llvm::Twine("Scope ") + llvm::Twine(scopeId)).str();
+      LLVM_DEBUG(llvm::dbgs() << "Creating scope '" << name << "':\n"
+                              << scope << "\n");
+      scopeMap.insert({scope, std::move(name)});
+    }
+    ++scopeId;
+  }
+}
+
+// For the given fir.declare returns the dominating fir.dummy_scope
+// operation.
+fir::DummyScopeOp PassState::getDeclarationScope(fir::DeclareOp declareOp) {
+  auto func = declareOp->getParentOfType<mlir::func::FuncOp>();
+  assert(func && "fir.declare does not have parent func.func");
+  auto &scopeOps = sortedScopeOperations.at(func);
+  for (auto II = scopeOps.rbegin(), IE = scopeOps.rend(); II != IE; ++II) {
+    if (domInfo.dominates(&**II, &*declareOp))
+      return *II;
+  }
+  return nullptr;
+}
+
 class AddAliasTagsPass : public fir::impl::AddAliasTagsBase<AddAliasTagsPass> {
 public:
   void runOnOperation() override;
@@ -85,6 +148,9 @@ class AddAliasTagsPass : public fir::impl::AddAliasTagsBase<AddAliasTagsPass> {
 } // namespace
 
 static fir::DeclareOp getDeclareOp(mlir::Value arg) {
+  if (auto declare =
+          mlir::dyn_cast_or_null<fir::DeclareOp>(arg.getDefiningOp()))
+    return declare;
   for (mlir::Operation *use : arg.getUsers())
     if (fir::DeclareOp declare = mlir::dyn_cast<fir::DeclareOp>(use))
       return declare;
@@ -94,7 +160,7 @@ static fir::DeclareOp getDeclareOp(mlir::Value arg) {
 /// Get the name of a function argument using the "fir.bindc_name" attribute,
 /// or ""
 static std::string getFuncArgName(mlir::Value arg) {
-  // first try getting the name from the hlfir.declare
+  // first try getting the name from the fir.declare
   if (fir::DeclareOp declare = getDeclareOp(arg))
     return declare.getUniqName().str();
 
@@ -139,6 +205,23 @@ void AddAliasTagsPass::runOnAliasInterface(fir::FirAliasTagOpInterface op,
     return;
   }
 
+  // Process the scopes, if not processed yet.
+  state.processFunctionScopes(func);
+
+  fir::DummyScopeOp scopeOp;
+  if (auto declVal = source.origin.instantiationPoint) {
+    // If the source is a dummy argument within some fir.dummy_scope,
+    // then find the corresponding innermost scope to be used for finding
+    // the right TBAA tree.
+    auto declareOp =
+        mlir::dyn_cast_or_null<fir::DeclareOp>(declVal.getDefiningOp());
+    assert(declareOp && "Instantiation point must be fir.declare");
+    if (auto dummyScope = declareOp.getDummyScope())
+      scopeOp = mlir::cast<fir::DummyScopeOp>(dummyScope.getDefiningOp());
+    if (!scopeOp)
+      scopeOp = state.getDeclarationScope(declareOp);
+  }
+
   mlir::LLVM::TBAATagAttr tag;
   // TBAA for dummy arguments
   if (enableDummyArgs &&
@@ -147,7 +230,8 @@ void AddAliasTagsPass::runOnAliasInterface(fir::FirAliasTagOpInterface op,
                << "Found reference to dummy argument at " << *op << "\n");
     std::string name = getFuncArgName(source.origin.u.get<mlir::Value>());
     if (!name.empty())
-      tag = state.getFuncTree(func).dummyArgDataTree.getTag(name);
+      tag = state.getFuncTreeWithScope(func, scopeOp)
+                .dummyArgDataTree.getTag(name);
     else
       LLVM_DEBUG(llvm::dbgs().indent(2)
                  << "WARN: couldn't find a name for dummy argument " << *op
@@ -161,7 +245,7 @@ void AddAliasTagsPass::runOnAliasInterface(fir::FirAliasTagOpInterface op,
     const char *name = glbl.getRootReference().data();
     LLVM_DEBUG(llvm::dbgs().indent(2) << "Found reference to global " << name
                                       << " at " << *op << "\n");
-    tag = state.getFuncTree(func).globalDataTree.getTag(name);
+    tag = state.getFuncTreeWithScope(func, scopeOp).globalDataTree.getTag(name);
 
     // TBAA for SourceKind::Direct
   } else if (enableDirect &&
@@ -172,7 +256,8 @@ void AddAliasTagsPass::runOnAliasInterface(fir::FirAliasTagOpInterface op,
       const char *name = glbl.getRootReference().data();
       LLVM_DEBUG(llvm::dbgs().indent(2) << "Found reference to direct " << name
                                         << " at " << *op << "\n");
-      tag = state.getFuncTree(func).directDataTree.getTag(name);
+      tag =
+          state.getFuncTreeWithScope(func, scopeOp).directDataTree.getTag(name);
     } else {
       // SourceKind::Direct is likely to be extended to cases which are not a
       // SymbolRefAttr in the future
@@ -193,7 +278,8 @@ void AddAliasTagsPass::runOnAliasInterface(fir::FirAliasTagOpInterface op,
     if (name) {
       LLVM_DEBUG(llvm::dbgs().indent(2) << "Found reference to allocation "
                                         << name << " at " << *op << "\n");
-      tag = state.getFuncTree(func).allocatedDataTree.getTag(*name);
+      tag = state.getFuncTreeWithScope(func, scopeOp)
+                .allocatedDataTree.getTag(*name);
     } else {
       LLVM_DEBUG(llvm::dbgs().indent(2)
                  << "WARN: couldn't find a name for allocation " << *op
@@ -219,7 +305,8 @@ void AddAliasTagsPass::runOnOperation() {
   // Instead this pass stores state per mlir::ModuleOp (which is what MLIR
   // thinks the pass operates on), then the real work of the pass is done in
   // runOnAliasInterface
-  PassState state;
+  auto &domInfo = getAnalysis<mlir::DominanceInfo>();
+  PassState state(domInfo);
 
   mlir::ModuleOp mod = getOperation();
   mod.walk(
diff --git a/flang/test/Transforms/tbaa-with-dummy-scope.fir b/flang/test/Transforms/tbaa-with-dummy-scope.fir
new file mode 100644
index 0000000000000..738692814cde5
--- /dev/null
+++ b/flang/test/Transforms/tbaa-with-dummy-scope.fir
@@ -0,0 +1,238 @@
+// RUN: fir-opt --fir-add-alias-tags --split-input-file %s | FileCheck %s
+
+// subroutine test(x, y)
+//   real, target :: x, y
+//   x = y ! the load/store do not have TBAA due to TARGET
+//   call inner(x, y) ! the inlined load/store go to Scope 1
+//   call inner(x, y) ! the inlined load/store go to Scope 2
+// contains
+//   subroutine inner(x, y)
+//     real :: x, y
+//     x = y
+//   end subroutine inner
+// end subroutine test
+
+// CHECK: #[[$ATTR_0:.+]] = #llvm.tbaa_root<id = "Flang function root test1 - Scope 1">
+// CHECK: #[[$ATTR_1:.+]] = #llvm.tbaa_root<id = "Flang function root test1 - Scope 2">
+// CHECK: #[[$ATTR_2:.+]] = #llvm.tbaa_type_desc<id = "any access", members = {<#tbaa_root, 0>}>
+// CHECK: #[[$ATTR_3:.+]] = #llvm.tbaa_type_desc<id = "any access", members = {<#tbaa_root1, 0>}>
+// CHECK: #[[$ATTR_4:.+]] = #llvm.tbaa_type_desc<id = "any data access", members = {<#tbaa_type_desc, 0>}>
+// CHECK: #[[$ATTR_5:.+]] = #llvm.tbaa_type_desc<id = "any data access", members = {<#tbaa_type_desc1, 0>}>
+// CHECK: #[[$ATTR_6:.+]] = #llvm.tbaa_type_desc<id = "dummy arg data", members = {<#tbaa_type_desc2, 0>}>
+// CHECK: #[[$ATTR_7:.+]] = #llvm.tbaa_type_desc<id = "dummy arg data", members = {<#tbaa_type_desc3, 0>}>
+// CHECK: #[[$ATTR_8:.+]] = #llvm.tbaa_type_desc<id = "dummy arg data/_QFtestFinnerEy", members = {<#tbaa_type_desc4, 0>}>
+// CHECK: #[[$ATTR_9:.+]] = #llvm.tbaa_type_desc<id = "dummy arg data/_QFtestFinnerEx", members = {<#tbaa_type_desc4, 0>}>
+// CHECK: #[[$ATTR_10:.+]] = #llvm.tbaa_type_desc<id = "dummy arg data/_QFtestFinnerEy", members = {<#tbaa_type_desc5, 0>}>
+// CHECK: #[[$ATTR_11:.+]] = #llvm.tbaa_type_desc<id = "dummy arg data/_QFtestFinnerEx", members = {<#tbaa_type_desc5, 0>}>
+// CHECK: #[[$ATTR_12:.+]] = #llvm.tbaa_tag<base_type = #tbaa_type_desc6, access_type = #tbaa_type_desc6, offset = 0>
+// CHECK: #[[$ATTR_13:.+]] = #llvm.tbaa_tag<base_type = #tbaa_type_desc7, access_type = #tbaa_type_desc7, offset = 0>
+// CHECK: #[[$ATTR_14:.+]] = #llvm.tbaa_tag<base_type = #tbaa_type_desc8, access_type = #tbaa_type_desc8, offset = 0>
+// CHECK: #[[$ATTR_15:.+]] = #llvm.tbaa_tag<base_type = #tbaa_type_desc9, access_type = #tbaa_type_desc9, offset = 0>
+// CHECK:   func.func @test1(
+// CHECK:           %[[VAL_5:.*]] = fir.load %{{.*}} : !fir.ref<f32>
+// CHECK:           fir.store %{{.*}} : !fir.ref<f32>
+// CHECK:           %[[VAL_6:.*]] = fir.dummy_scope : !fir.dscope
+// CHECK:           %[[VAL_9:.*]] = fir.load %{{.*}} {tbaa = [#[[$ATTR_12]]]} : !fir.ref<f32>
+// CHECK:           fir.store %{{.*}} {tbaa = [#[[$ATTR_13]]]} : !fir.ref<f32>
+// CHECK:           %[[VAL_10:.*]] = fir.dummy_scope : !fir.dscope
+// CHECK:           %[[VAL_13:.*]] = fir.load %{{.*}} {tbaa = [#[[$ATTR_14]]]} : !fir.ref<f32>
+// CHECK:           fir.store %{{.*}} {tbaa = [#[[$ATTR_15]]]} : !fir.ref<f32>
+func.func @test1(%arg0: !fir.ref<f32> {fir.bindc_name = "x", fir.target}, %arg1: !fir.ref<f32> {fir.bindc_name = "y", fir.target}) {
+  %scope_out = fir.dummy_scope : !fir.dscope
+  %0 = fir.declare %arg0 dummy_scope %scope_out {fortran_attrs = #fir.var_attrs<target>, uniq_name = "_QFtestEx"} : (!f...
[truncated]

JumpMasterJJ added a commit that referenced this pull request May 17, 2024
…ion (#92571)

This is same as #90578 with an
added fix. This PR updated tests of etime intrinsic due to Lowering
changes for assigning dummy_scope to hlfir.declare. Referring to
#92472 and
#90989
@vzakhari
Copy link
Contributor Author

Friendly ping...

Copy link
Contributor

@tblah tblah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this. Sorry about the delay, I somehow forgot to press "Submit review"

if (defOp) {
if (auto declareOp = mlir::dyn_cast<fir::DeclareOp>(defOp))
if (declareOp.getDummyScope())
return true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will work for TBAA because of the separate trees.

I'm worried users of fir::AliasAnalysis::alias might get confused here. It will see a dummy argument and then say it doesn't alias, but that dummy argument will definitely alias with something in the function into which it was inlined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand the concern. AliasAnalysis::alias works on two memory references that are being accessed at a certain point in MLIR - from my point of view, the reply is only valid for these two memory references at this point in MLIR.

Copy link
Contributor

@tblah tblah May 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly modifying the sample from the test so that %arg0 is now a local allocation:

func.func @test1(%arg1: !fir.ref<f32> {fir.bindc_name = "y", fir.target}) {
  %alloca = fir.alloca i32
  %scope_out = fir.dummy_scope : !fir.dscope
  %0 = fir.declare %alloca {uniq_name = "_QFtestEx"} : (!fir.ref<f32>, !fir.dscope) -> !fir.ref<f32>
  %1 = fir.declare %arg1 dummy_scope %scope_out {fortran_attrs = #fir.var_attrs<target>, uniq_name = "_QFtestEy"} : (!fir.ref<f32>, !fir.dscope) -> !fir.ref<f32>
  %2 = fir.load %1 : !fir.ref<f32>
  fir.store %2 to %0 : !fir.ref<f32>
  %scope_in1 = fir.dummy_scope : !fir.dscope
  %3 = fir.declare %0 dummy_scope %scope_in1 {uniq_name = "_QFtestFinnerEx"} : (!fir.ref<f32>, !fir.dscope) -> !fir.ref<f32>
  %4 = fir.declare %1 dummy_scope %scope_in1 {uniq_name = "_QFtestFinnerEy"} : (!fir.ref<f32>, !fir.dscope) -> !fir.ref<f32>
  %5 = fir.load %4 : !fir.ref<f32>
  fir.store %5 to %3 : !fir.ref<f32>
  %scope_in2 = fir.dummy_scope : !fir.dscope
  %6 = fir.declare %0 dummy_scope %scope_in2 {uniq_name = "_QFtestFinnerEx"} : (!fir.ref<f32>, !fir.dscope) -> !fir.ref<f32>
  %7 = fir.declare %1 dummy_scope %scope_in2 {uniq_name = "_QFtestFinnerEy"} : (!fir.ref<f32>, !fir.dscope) -> !fir.ref<f32>
  %8 = fir.load %7 : !fir.ref<f32>
  fir.store %8 to %6 : !fir.ref<f32>
  return
}

%3 and %0 should alias. But I think the current logic will determine that %3 is a dummy argument due to its alias scope. Dummy arguments do not alias with local allocations and so we would incorrectly assume that they do not alias.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh hang on. getInstantiationPoint will be false and so getSource will track through the fir.declare in my example.

getSource does use isDummyArgument in other places though, and I am worried about a similar confusion. For example, if an unknown operation leads to an unknown source, then the above mechanism could occur (I think maybe this could happen when using mlir inlining inside of an openmp or openacc container operation).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usage of isDummyArgument at line 307 allows propagating the source search past the indirection (the load). My change does allow propagating past fir.declare (when getInstantiationPoint is false) in this case, but I cannot figure out an example where this would produce incorrect results in the end.

The isDummyArgument usage at the end of getSource should also not make a difference when getInstantionPoint is false, because we will just pass through any fir.declare we meet on the path.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am worried about L372.

If for some reason we have an unknown source (e.g. due to a block argument or some unsupported operation), we will return that it is an argument if isDummyArgument returns true. That would then follow the logic I outlined above in fir::AliasAnalysis::alias leading to strong assumptions that it doesn't alias with other things in the same function.

Maybe isDummyArgument needs something like getInstantiationPoint to control whether inlined dummy arguments are treated as dummy arguments. Alternatively fir::AliasAnalysis::alias needs to understand dummy scopes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for being patient with me, Tom!

Please note that isDummyArgument at L374 can only return true that it was not returning before my changes if v is fir.declare with a dummy_scope. But that is impossible, because we recognize fir.declare as a pass-through operation, so if we meet fir.declare during the walk, v will be set to its memref that is not fir.declare. This is true when getInstatiationPoint is false, i.e. for the default/normal usage of FIR alias analysis.

When getInstantiationPoint is true, we set the SourceKind::Argument when we meet fir.declare with a dummy_scope and exit right away. I guess it does make sense for the users of the alias analysis to use isDummyArgument in this case, since the source type is already known. I just wanted to make sure isDummyArgument returns consistent information in this case.

Do you prefer removing the new code in isDummyArgument?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your explanation and for baring with me while I figure out that this is correct. I think this is good as it is.

Comment on lines +85 to +86
// It seems that we'd better move all TBAA tags assignment to
// AddAliasTagsPass, which can at least rely on the dummy arguments scopes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that box loads and stores are implicit in FIR. There is no operation to tag until after codegen.

I would be happy to see an end to this so that most uses of !fir.box<> become !fir.ref<!fir.box<>>. It makes IR a bit more verbose but I think it would make it a lot more obvious at a glance where we are generating unnecessary pointer nesting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the box loads/stores for SSA boxes are implicit. Since the loads/stores only make problems after lowering to LLVM dialect, we can try to attack it from multiple directions:

  • A local box load/store may be well disambiguated by LLVM alias analysis (without any TBAA) if we make sure that LLVM sees that its address is not escaping. For example, this means that when an SSA box is passed to a function call, the resulting LLVM pointer argument has nocapture attribute.
  • The SSA boxes that are function arguments, are converted to pointer function arguments, but these pointer arguments, by construction, are know not to alias anything within the function scope, so the corresponding function arguments may all have noalias attribute.

Just these two approaches may resolve a lot of LLVM aliasing issues for the SSA boxes. If this is not enough we can also think about attaching TBAA tags to operations that produce unique SSA boxes (e.g. embox/rebox) and read from SSA boxes (e.g. box_addr), and then propagate these tags to the corresponding loads/stores in the CodeGen. I am not in favor of this, because we will have to add TBAA attribute to many more FIR operations.

I think nocapture and noalias should work for many cases of SSA boxes, and for the explicit loads/stores of the boxes we can attach TBAA descriptor member tags in AddAliasTagsPass (with proper scoping) and propagate them in the CodeGen.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your idea sounds good to me

@Renaud-K Renaud-K requested a review from jdenny-ornl May 22, 2024 19:11
Copy link
Contributor Author

@vzakhari vzakhari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review, Tom!

I think I did not get your concern about isDummyArgument - could you please elaborate?

if (defOp) {
if (auto declareOp = mlir::dyn_cast<fir::DeclareOp>(defOp))
if (declareOp.getDummyScope())
return true;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand the concern. AliasAnalysis::alias works on two memory references that are being accessed at a certain point in MLIR - from my point of view, the reply is only valid for these two memory references at this point in MLIR.

Comment on lines +85 to +86
// It seems that we'd better move all TBAA tags assignment to
// AddAliasTagsPass, which can at least rely on the dummy arguments scopes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the box loads/stores for SSA boxes are implicit. Since the loads/stores only make problems after lowering to LLVM dialect, we can try to attack it from multiple directions:

  • A local box load/store may be well disambiguated by LLVM alias analysis (without any TBAA) if we make sure that LLVM sees that its address is not escaping. For example, this means that when an SSA box is passed to a function call, the resulting LLVM pointer argument has nocapture attribute.
  • The SSA boxes that are function arguments, are converted to pointer function arguments, but these pointer arguments, by construction, are know not to alias anything within the function scope, so the corresponding function arguments may all have noalias attribute.

Just these two approaches may resolve a lot of LLVM aliasing issues for the SSA boxes. If this is not enough we can also think about attaching TBAA tags to operations that produce unique SSA boxes (e.g. embox/rebox) and read from SSA boxes (e.g. box_addr), and then propagate these tags to the corresponding loads/stores in the CodeGen. I am not in favor of this, because we will have to add TBAA attribute to many more FIR operations.

I think nocapture and noalias should work for many cases of SSA boxes, and for the explicit loads/stores of the boxes we can attach TBAA descriptor member tags in AddAliasTagsPass (with proper scoping) and propagate them in the CodeGen.

Copy link
Contributor

@Renaud-K Renaud-K left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me.
Thanks for addressing other reviewers' comments.

Copy link
Contributor

@tblah tblah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for all of the explanations. LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:codegen flang:fir-hlfir flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants