Skip to content

[analyzer] Introduce the check::BlockEntrance checker callback #140924

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

balazs-benics-sonarsource
Copy link
Contributor

Tranersing the CFG blocks of a function is a fundamental operation. Many C++ constructs can create splits in the control-flow, such as if, for, and similar control structures or ternary expressions, gnu conditionals, gotos, switches and possibly more.

Checkers should be able to get notifications about entering or leaving a CFG block of interest.

Note that in the ExplodedGraph there is always a BlockEntrance ProgramPoint right after the BlockEdge ProgramPoint. I considered naming this callback check::BlockEdge, but then that may leave the observer of the graph puzzled to see BlockEdge points followed more BlockEdge nodes describing the same CFG transition. This confusion could also apply to Bug Report Visitors too.

Because of this, I decided to hook BlockEntrance ProgramPoints instead. The same confusion applies here, but I find this still a better place TBH. There would only appear only one BlockEntrance ProgramPoint in the graph if no checkers modify the state or emit a bug report. Otherwise they modify some GDM (aka. State) thus create a new ExplodedNode with the same BlockEntrance ProgramPoint in the graph.

CPP-6484

Tranersing the CFG blocks of a function is a fundamental operation.
Many C++ constructs can create splits in the control-flow, such as
`if`, `for`, and similar control structures or ternary expressions,
gnu conditionals, gotos, switches and possibly more.

Checkers should be able to get notifications about entering or leaving a
CFG block of interest.

Note that in the ExplodedGraph there is always a BlockEntrance
ProgramPoint right after the BlockEdge ProgramPoint.
I considered naming this callback check::BlockEdge, but then that may
leave the observer of the graph puzzled to see BlockEdge points followed
more BlockEdge nodes describing the same CFG transition.
This confusion could also apply to Bug Report Visitors too.

Because of this, I decided to hook BlockEntrance ProgramPoints instead.
The same confusion applies here, but I find this still a better place
TBH. There would only appear only one BlockEntrance ProgramPoint in the
graph if no checkers modify the state or emit a bug report.
Otherwise they modify some GDM (aka. State) thus create a new
ExplodedNode with the same BlockEntrance ProgramPoint in the graph.

CPP-6484
@llvmbot llvmbot added the clang Clang issues not falling into any other category label May 21, 2025
@llvmbot
Copy link
Member

llvmbot commented May 21, 2025

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-static-analyzer-1

Author: Balázs Benics (balazs-benics-sonarsource)

Changes

Tranersing the CFG blocks of a function is a fundamental operation. Many C++ constructs can create splits in the control-flow, such as if, for, and similar control structures or ternary expressions, gnu conditionals, gotos, switches and possibly more.

Checkers should be able to get notifications about entering or leaving a CFG block of interest.

Note that in the ExplodedGraph there is always a BlockEntrance ProgramPoint right after the BlockEdge ProgramPoint. I considered naming this callback check::BlockEdge, but then that may leave the observer of the graph puzzled to see BlockEdge points followed more BlockEdge nodes describing the same CFG transition. This confusion could also apply to Bug Report Visitors too.

Because of this, I decided to hook BlockEntrance ProgramPoints instead. The same confusion applies here, but I find this still a better place TBH. There would only appear only one BlockEntrance ProgramPoint in the graph if no checkers modify the state or emit a bug report. Otherwise they modify some GDM (aka. State) thus create a new ExplodedNode with the same BlockEntrance ProgramPoint in the graph.

CPP-6484


Patch is 23.05 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/140924.diff

10 Files Affected:

  • (modified) clang/include/clang/StaticAnalyzer/Core/Checker.h (+20)
  • (modified) clang/include/clang/StaticAnalyzer/Core/CheckerManager.h (+13)
  • (modified) clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h (+4)
  • (modified) clang/lib/StaticAnalyzer/Checkers/CheckerDocumentation.cpp (+14-1)
  • (modified) clang/lib/StaticAnalyzer/Core/CheckerManager.cpp (+45-5)
  • (modified) clang/lib/StaticAnalyzer/Core/CoreEngine.cpp (+18-7)
  • (modified) clang/lib/StaticAnalyzer/Core/ExprEngine.cpp (+13)
  • (modified) clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp (+3-3)
  • (added) clang/unittests/StaticAnalyzer/BlockEntranceCallbackTest.cpp (+283)
  • (modified) clang/unittests/StaticAnalyzer/CMakeLists.txt (+1)
diff --git a/clang/include/clang/StaticAnalyzer/Core/Checker.h b/clang/include/clang/StaticAnalyzer/Core/Checker.h
index a54c5bee612f6..1b348dcce5ea7 100644
--- a/clang/include/clang/StaticAnalyzer/Core/Checker.h
+++ b/clang/include/clang/StaticAnalyzer/Core/Checker.h
@@ -221,6 +221,22 @@ class Bind {
   }
 };
 
+class BlockEntrance {
+  template <typename CHECKER>
+  static void _checkBlockEntrance(void *Checker,
+                                  const clang::BlockEntrance &Entrance,
+                                  CheckerContext &C) {
+    ((const CHECKER *)Checker)->checkBlockEntrance(Entrance, C);
+  }
+
+public:
+  template <typename CHECKER>
+  static void _register(CHECKER *checker, CheckerManager &mgr) {
+    mgr._registerForBlockEntrance(CheckerManager::CheckBlockEntranceFunc(
+        checker, _checkBlockEntrance<CHECKER>));
+  }
+};
+
 class EndAnalysis {
   template <typename CHECKER>
   static void _checkEndAnalysis(void *checker, ExplodedGraph &G,
@@ -548,6 +564,8 @@ class CheckerProgramPointTag : public SimpleProgramPointTag {
 template <typename CHECK1, typename... CHECKs>
 class Checker : public CHECK1, public CHECKs..., public CheckerBase {
 public:
+  using BlockEntrance = clang::BlockEntrance;
+
   template <typename CHECKER>
   static void _register(CHECKER *checker, CheckerManager &mgr) {
     CHECK1::_register(checker, mgr);
@@ -558,6 +576,8 @@ class Checker : public CHECK1, public CHECKs..., public CheckerBase {
 template <typename CHECK1>
 class Checker<CHECK1> : public CHECK1, public CheckerBase {
 public:
+  using BlockEntrance = clang::BlockEntrance;
+
   template <typename CHECKER>
   static void _register(CHECKER *checker, CheckerManager &mgr) {
     CHECK1::_register(checker, mgr);
diff --git a/clang/include/clang/StaticAnalyzer/Core/CheckerManager.h b/clang/include/clang/StaticAnalyzer/Core/CheckerManager.h
index 03ffadd346d0b..b5fefdb75401d 100644
--- a/clang/include/clang/StaticAnalyzer/Core/CheckerManager.h
+++ b/clang/include/clang/StaticAnalyzer/Core/CheckerManager.h
@@ -376,6 +376,12 @@ class CheckerManager {
                           const Stmt *S, ExprEngine &Eng,
                           const ProgramPoint &PP);
 
+  /// Run checkers after taking a control flow edge.
+  void runCheckersForBlockEntrance(ExplodedNodeSet &Dst,
+                                   const ExplodedNodeSet &Src,
+                                   const BlockEntrance &Entrance,
+                                   ExprEngine &Eng) const;
+
   /// Run checkers for end of analysis.
   void runCheckersForEndAnalysis(ExplodedGraph &G, BugReporter &BR,
                                  ExprEngine &Eng);
@@ -528,6 +534,9 @@ class CheckerManager {
   using CheckBindFunc =
       CheckerFn<void(SVal location, SVal val, const Stmt *S, CheckerContext &)>;
 
+  using CheckBlockEntranceFunc =
+      CheckerFn<void(const BlockEntrance &, CheckerContext &)>;
+
   using CheckEndAnalysisFunc =
       CheckerFn<void (ExplodedGraph &, BugReporter &, ExprEngine &)>;
 
@@ -589,6 +598,8 @@ class CheckerManager {
 
   void _registerForBind(CheckBindFunc checkfn);
 
+  void _registerForBlockEntrance(CheckBlockEntranceFunc checkfn);
+
   void _registerForEndAnalysis(CheckEndAnalysisFunc checkfn);
 
   void _registerForBeginFunction(CheckBeginFunctionFunc checkfn);
@@ -695,6 +706,8 @@ class CheckerManager {
 
   std::vector<CheckBindFunc> BindCheckers;
 
+  std::vector<CheckBlockEntranceFunc> BlockEntranceCheckers;
+
   std::vector<CheckEndAnalysisFunc> EndAnalysisCheckers;
 
   std::vector<CheckBeginFunctionFunc> BeginFunctionCheckers;
diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
index b8a4dcbc727a6..6370586e218ef 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
@@ -321,6 +321,10 @@ class ExprEngine {
                                NodeBuilderWithSinks &nodeBuilder,
                                ExplodedNode *Pred);
 
+  void runCheckersForBlockEntrance(const NodeBuilderContext &BldCtx,
+                                   const BlockEntrance &Entrance,
+                                   ExplodedNode *Pred, ExplodedNodeSet &Dst);
+
   /// ProcessBranch - Called by CoreEngine. Used to generate successor nodes by
   /// processing the 'effects' of a branch condition. If the branch condition
   /// is a loop condition, IterationsCompletedInLoop is the number of completed
diff --git a/clang/lib/StaticAnalyzer/Checkers/CheckerDocumentation.cpp b/clang/lib/StaticAnalyzer/Checkers/CheckerDocumentation.cpp
index 153a1b1acbfa1..8bfe677693362 100644
--- a/clang/lib/StaticAnalyzer/Checkers/CheckerDocumentation.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/CheckerDocumentation.cpp
@@ -16,7 +16,6 @@
 #include "clang/StaticAnalyzer/Core/Checker.h"
 #include "clang/StaticAnalyzer/Core/CheckerManager.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
-#include "clang/StaticAnalyzer/Core/PathSensitive/ProgramStateTrait.h"
 
 using namespace clang;
 using namespace ento;
@@ -46,6 +45,7 @@ class CheckerDocumentation
           check::EndAnalysis,
           check::EndFunction,
           check::EndOfTranslationUnit,
+          check::BlockEntrance,
           check::Event<ImplicitNullDerefEvent>,
           check::LiveSymbols,
           check::Location,
@@ -166,6 +166,19 @@ class CheckerDocumentation
   /// check::Bind
   void checkBind(SVal Loc, SVal Val, const Stmt *S, CheckerContext &) const {}
 
+  /// Called after a CFG edge is taken within a function.
+  ///
+  /// This callback can be used to obtain information about potential branching
+  /// points or any other constructs that involve traversing a CFG edge.
+  /// Note that when inlining a call, there is no CFG edge between the caller
+  /// and the callee. One will only see the edge between the entry block and
+  /// the body of the function once inlined.
+  ///
+  /// \param E The ProgramPoint that describes the transition.
+  ///
+  /// check::BlockEntrance
+  void checkBlockEntrance(const BlockEntrance &E, CheckerContext &) const {}
+
   /// Called whenever a symbol becomes dead.
   ///
   /// This callback should be used by the checkers to aggressively clean
diff --git a/clang/lib/StaticAnalyzer/Core/CheckerManager.cpp b/clang/lib/StaticAnalyzer/Core/CheckerManager.cpp
index 7ae86f133904b..b77ead60af258 100644
--- a/clang/lib/StaticAnalyzer/Core/CheckerManager.cpp
+++ b/clang/lib/StaticAnalyzer/Core/CheckerManager.cpp
@@ -42,11 +42,11 @@ bool CheckerManager::hasPathSensitiveCheckers() const {
   return IfAnyAreNonEmpty(
       StmtCheckers, PreObjCMessageCheckers, ObjCMessageNilCheckers,
       PostObjCMessageCheckers, PreCallCheckers, PostCallCheckers,
-      LocationCheckers, BindCheckers, EndAnalysisCheckers,
-      BeginFunctionCheckers, EndFunctionCheckers, BranchConditionCheckers,
-      NewAllocatorCheckers, LiveSymbolsCheckers, DeadSymbolsCheckers,
-      RegionChangesCheckers, PointerEscapeCheckers, EvalAssumeCheckers,
-      EvalCallCheckers, EndOfTranslationUnitCheckers);
+      LocationCheckers, BindCheckers, BlockEntranceCheckers,
+      EndAnalysisCheckers, BeginFunctionCheckers, EndFunctionCheckers,
+      BranchConditionCheckers, NewAllocatorCheckers, LiveSymbolsCheckers,
+      DeadSymbolsCheckers, RegionChangesCheckers, PointerEscapeCheckers,
+      EvalAssumeCheckers, EvalCallCheckers, EndOfTranslationUnitCheckers);
 }
 
 void CheckerManager::reportInvalidCheckerOptionValue(
@@ -420,6 +420,42 @@ void CheckerManager::runCheckersForBind(ExplodedNodeSet &Dst,
   expandGraphWithCheckers(C, Dst, Src);
 }
 
+namespace {
+struct CheckBlockEntranceContext {
+  using CheckBlockEntranceFunc = CheckerManager::CheckBlockEntranceFunc;
+  using CheckersTy = std::vector<CheckBlockEntranceFunc>;
+
+  const CheckersTy &Checkers;
+  const BlockEntrance &Entrance;
+  ExprEngine &Eng;
+
+  CheckBlockEntranceContext(const CheckersTy &Checkers,
+                            const BlockEntrance &Entrance, ExprEngine &Eng)
+      : Checkers(Checkers), Entrance(Entrance), Eng(Eng) {}
+
+  auto checkers_begin() const { return Checkers.begin(); }
+  auto checkers_end() const { return Checkers.end(); }
+
+  void runChecker(CheckBlockEntranceFunc CheckFn, NodeBuilder &Bldr,
+                  ExplodedNode *Pred) {
+    llvm::TimeTraceScope TimeScope(
+        checkerScopeName("BlockEntrance", CheckFn.Checker));
+    CheckerContext C(Bldr, Eng, Pred, Entrance.withTag(CheckFn.Checker));
+    CheckFn(Entrance, C);
+  }
+};
+
+} // namespace
+
+void CheckerManager::runCheckersForBlockEntrance(ExplodedNodeSet &Dst,
+                                                 const ExplodedNodeSet &Src,
+                                                 const BlockEntrance &Entrance,
+                                                 ExprEngine &Eng) const {
+  CheckBlockEntranceContext C(BlockEntranceCheckers, Entrance, Eng);
+  llvm::TimeTraceScope TimeScope{"CheckerManager::runCheckersForBlockEntrance"};
+  expandGraphWithCheckers(C, Dst, Src);
+}
+
 void CheckerManager::runCheckersForEndAnalysis(ExplodedGraph &G,
                                                BugReporter &BR,
                                                ExprEngine &Eng) {
@@ -877,6 +913,10 @@ void CheckerManager::_registerForBind(CheckBindFunc checkfn) {
   BindCheckers.push_back(checkfn);
 }
 
+void CheckerManager::_registerForBlockEntrance(CheckBlockEntranceFunc checkfn) {
+  BlockEntranceCheckers.push_back(checkfn);
+}
+
 void CheckerManager::_registerForEndAnalysis(CheckEndAnalysisFunc checkfn) {
   EndAnalysisCheckers.push_back(checkfn);
 }
diff --git a/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp b/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp
index bedb11f8b94a5..bdc2704cc424e 100644
--- a/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp
+++ b/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp
@@ -306,26 +306,37 @@ void CoreEngine::HandleBlockEdge(const BlockEdge &L, ExplodedNode *Pred) {
       }
     }
 
+    ExplodedNodeSet CheckerNodes;
+    BlockEntrance BE(L.getSrc(), L.getDst(), Pred->getLocationContext());
+    ExprEng.runCheckersForBlockEntrance(BuilderCtx, BE, Pred, CheckerNodes);
+
     // Process the final state transition.
-    ExprEng.processEndOfFunction(BuilderCtx, Pred, RS);
+    for (ExplodedNode *P : CheckerNodes) {
+      ExprEng.processEndOfFunction(BuilderCtx, P, RS);
+    }
 
     // This path is done. Don't enqueue any more nodes.
     return;
   }
 
   // Call into the ExprEngine to process entering the CFGBlock.
-  ExplodedNodeSet dstNodes;
   BlockEntrance BE(L.getSrc(), L.getDst(), Pred->getLocationContext());
-  NodeBuilderWithSinks nodeBuilder(Pred, dstNodes, BuilderCtx, BE);
-  ExprEng.processCFGBlockEntrance(L, nodeBuilder, Pred);
+  ExplodedNodeSet DstNodes;
+  NodeBuilderWithSinks NodeBuilder(Pred, DstNodes, BuilderCtx, BE);
+  ExprEng.processCFGBlockEntrance(L, NodeBuilder, Pred);
 
   // Auto-generate a node.
-  if (!nodeBuilder.hasGeneratedNodes()) {
-    nodeBuilder.generateNode(Pred->State, Pred);
+  if (!NodeBuilder.hasGeneratedNodes()) {
+    NodeBuilder.generateNode(Pred->State, Pred);
+  }
+
+  ExplodedNodeSet CheckerNodes;
+  for (auto *N : DstNodes) {
+    ExprEng.runCheckersForBlockEntrance(BuilderCtx, BE, N, CheckerNodes);
   }
 
   // Enqueue nodes onto the worklist.
-  enqueue(dstNodes);
+  enqueue(CheckerNodes);
 }
 
 void CoreEngine::HandleBlockEntrance(const BlockEntrance &L,
diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
index 1afd4b52eb354..992e451d33081 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
@@ -2619,6 +2619,19 @@ void ExprEngine::processCFGBlockEntrance(const BlockEdge &L,
   }
 }
 
+void ExprEngine::runCheckersForBlockEntrance(const NodeBuilderContext &BldCtx,
+                                             const BlockEntrance &Entrance,
+                                             ExplodedNode *Pred,
+                                             ExplodedNodeSet &Dst) {
+  llvm::PrettyStackTraceFormat CrashInfo(
+      "Processing block entrance B%d -> B%d",
+      Entrance.getPreviousBlock()->getBlockID(),
+      Entrance.getBlock()->getBlockID());
+  currBldrCtx = &BldCtx;
+  getCheckerManager().runCheckersForBlockEntrance(Dst, Pred, Entrance, *this);
+  currBldrCtx = nullptr;
+}
+
 //===----------------------------------------------------------------------===//
 // Branch processing.
 //===----------------------------------------------------------------------===//
diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp
index b91954eda19b8..fa8e669b6bb2f 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp
@@ -711,9 +711,10 @@ void ExprEngine::VisitLogicalExpr(const BinaryOperator* B, ExplodedNode *Pred,
   }
 
   ExplodedNode *N = Pred;
-  while (!N->getLocation().getAs<BlockEntrance>()) {
+  while (!N->getLocation().getAs<BlockEdge>()) {
     ProgramPoint P = N->getLocation();
-    assert(P.getAs<PreStmt>()|| P.getAs<PreStmtPurgeDeadSymbols>());
+    assert(P.getAs<PreStmt>() || P.getAs<PreStmtPurgeDeadSymbols>() ||
+           P.getAs<BlockEntrance>());
     (void) P;
     if (N->pred_size() != 1) {
       // We failed to track back where we came from.
@@ -729,7 +730,6 @@ void ExprEngine::VisitLogicalExpr(const BinaryOperator* B, ExplodedNode *Pred,
     return;
   }
 
-  N = *N->pred_begin();
   BlockEdge BE = N->getLocation().castAs<BlockEdge>();
   SVal X;
 
diff --git a/clang/unittests/StaticAnalyzer/BlockEntranceCallbackTest.cpp b/clang/unittests/StaticAnalyzer/BlockEntranceCallbackTest.cpp
new file mode 100644
index 0000000000000..729c6d6ebf3ba
--- /dev/null
+++ b/clang/unittests/StaticAnalyzer/BlockEntranceCallbackTest.cpp
@@ -0,0 +1,283 @@
+//===- unittests/StaticAnalyzer/BlockEntranceCallbackTest.cpp -------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "CheckerRegistration.h"
+#include "clang/Analysis/AnalysisDeclContext.h"
+#include "clang/Analysis/ProgramPoint.h"
+#include "clang/StaticAnalyzer/Core/BugReporter/BugReporter.h"
+#include "clang/StaticAnalyzer/Core/BugReporter/BugType.h"
+#include "clang/StaticAnalyzer/Core/Checker.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/ProgramState_Fwd.h"
+#include "clang/StaticAnalyzer/Frontend/AnalysisConsumer.h"
+#include "clang/StaticAnalyzer/Frontend/CheckerRegistry.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/StringExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/FormatVariadic.h"
+#include "llvm/Support/raw_ostream.h"
+#include "gtest/gtest.h"
+
+#include <initializer_list>
+
+using namespace clang;
+using namespace ento;
+
+namespace {
+
+class BlockEntranceCallbackTester : public Checker<check::BlockEntrance> {
+  const BugType Bug{this, "BlockEntranceTester"};
+
+public:
+  void checkBlockEntrance(const BlockEntrance &Entrance,
+                          CheckerContext &C) const {
+    ExplodedNode *Node = C.generateNonFatalErrorNode(C.getState());
+    if (!Node)
+      return;
+
+    const auto *FD =
+        cast<FunctionDecl>(C.getLocationContext()->getStackFrame()->getDecl());
+
+    std::string Description = llvm::formatv(
+        "Within '{0}' B{1} -> B{2}", FD->getIdentifier()->getName(),
+        Entrance.getPreviousBlock()->getBlockID(),
+        Entrance.getBlock()->getBlockID());
+    auto Report =
+        std::make_unique<PathSensitiveBugReport>(Bug, Description, Node);
+    C.emitReport(std::move(Report));
+  }
+};
+
+void registerBlockEntranceTester(CheckerManager &Mgr) {
+  Mgr.registerChecker<BlockEntranceCallbackTester>();
+}
+
+bool shouldRegisterBlockEntranceTester(const CheckerManager &) { return true; }
+
+void addBlockEntranceTester(AnalysisASTConsumer &AnalysisConsumer,
+                            AnalyzerOptions &AnOpts) {
+  AnOpts.CheckersAndPackages = {{"test.BlockEntranceTester", true}};
+  AnalysisConsumer.AddCheckerRegistrationFn([](CheckerRegistry &Registry) {
+    Registry.addChecker(
+        &registerBlockEntranceTester, &shouldRegisterBlockEntranceTester,
+        "test.BlockEntranceTester", "EmptyDescription", "EmptyDocsUri",
+        /*IsHidden=*/false);
+  });
+}
+
+llvm::SmallVector<StringRef> parseEachDiag(StringRef Diags) {
+  llvm::SmallVector<StringRef> Fragments;
+  llvm::SplitString(Diags, Fragments, "\n");
+  llvm::for_each(Fragments, [](StringRef &Fragment) {
+    Fragment.consume_front("test.BlockEntranceTester: ");
+  });
+  llvm::sort(Fragments);
+  return Fragments;
+}
+
+bool runChecker(const std::string &Code, std::string &Diags) {
+  std::string RawDiags;
+  bool Res =
+      runCheckerOnCode<addBlockEntranceTester>(Code, RawDiags,
+                                               /*OnlyEmitWarnings=*/true);
+  llvm::raw_string_ostream OS(Diags);
+  llvm::interleave(parseEachDiag(RawDiags), OS, "\n");
+  return Res;
+}
+
+std::string expected(SmallVector<StringRef> Diags) {
+  llvm::sort(Diags);
+  std::string Result;
+  llvm::raw_string_ostream OS(Result);
+  llvm::interleave(Diags, OS, "\n");
+  return Result;
+}
+
+TEST(BlockEntranceTester, FromEntryToExit) {
+  constexpr auto Code = R"cpp(
+  void top() {
+    // empty
+  })cpp";
+
+  std::string Diags;
+  EXPECT_TRUE(runChecker(Code, Diags));
+  EXPECT_EQ(expected({"Within 'top' B1 -> B0"}), Diags);
+}
+
+TEST(BlockEntranceTester, SingleOpaqueIfCondition) {
+  constexpr auto Code = R"cpp(
+  bool coin();
+  int glob;
+  void top() {
+    if (coin()) {
+      glob = 1;
+    } else {
+      glob = 2;
+    }
+    glob = 3;
+  })cpp";
+
+  std::string Diags;
+  EXPECT_TRUE(runChecker(Code, Diags));
+  EXPECT_EQ(expected({
+                "Within 'top' B1 -> B0",
+                "Within 'top' B2 -> B1",
+                "Within 'top' B3 -> B1",
+                "Within 'top' B4 -> B2",
+                "Within 'top' B4 -> B3",
+                "Within 'top' B5 -> B4",
+            }),
+            Diags);
+  // entry true                   exit
+  //  B5 -------> B4 --> B2 --> B1 --> B0
+  //  |                         ^
+  //  | false                   |
+  //  v                         |
+  //  B3 -----------------------+
+}
+
+TEST(BlockEntranceTester, TrivialIfCondition) {
+  constexpr auto Code = R"cpp(
+  bool coin();
+  int glob;
+  void top() {
+    int cond = true;
+    if (cond) {
+      glob = 1;
+    } else {
+      glob = 2;
+    }
+    glob = 3;
+  })cpp";
+
+  std::string Diags;
+  EXPECT_TRUE(runChecker(Code, Diags));
+  EXPECT_EQ(expected({
+                "Within 'top' B1 -> B0",
+                "Within 'top' B3 -> B1",
+                "Within 'top' B4 -> B3",
+                "Within 'top' B5 -> B4",
+            }),
+            Diags);
+  // entry  true                         exit
+  // B5 ----------> B4 --> B3 --> B1 --> B0
+}
+
+TEST(BlockEntranceTester, AcrossFunctions) {
+  constexpr auto Code = R"cpp(
+  bool coin();
+  int glob;
+  void nested() { glob = 1; }
+  void top() {
+    glob = 0;
+    nested();
+    glob = 2;
+  })cpp";
+
+  std::string Diags;
+  EXPECT_TRUE(runChecker(Code, Diags));
+  EXPECT_EQ(
+      expected({
+          // Going from the "top" entry artificial node to the "top" body.
+          // Ideally, we shouldn't observe this edge because it's artificial.
+          "Within 'top' B2 -> B1",
+
+          // We encounter the call to "nested()" in the "top" body, thus we have
+          // a "CallEnter" node, but importantly, we also elide the transition
+          // to the "entry" node of "nested()".
+          // We only see the edge from the "nested()" entry to the "nested()"
+          // body:
+          "Within 'nested' B2 -> B1",
+
+          // Once we return from "nested()", we transition to the "exit" node of
+          // "nested()":
+          "Within 'nested' B1 -> B0",
+
+          // We will eventually return to the "top" body, thus we transition to
+          // its "exit" node:
...
[truncated]

Copy link
Contributor

@NagyDonat NagyDonat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to review this commit within a few days -- please ping me if I accidentally forget.

I have one immediate question about this area:

Traversing the CFG blocks of a function is a fundamental operation. Many C++ constructs can create splits in the control-flow, such as if, for, and similar control structures or ternary expressions, gnu conditionals, gotos, switches and possibly more.

What is the relationship of this new callback with the BranchCondition callback which is evaluated in ProcessBranch where the checker splits the execution path into multiple branches? It would be nice if you could document the difference between these two callbacks.

@Xazax-hun
Copy link
Collaborator

What is the relationship of this new callback with the BranchCondition callback

+1, I am also interested to learn what is the main motivation behind the new callback. Do you have some example use cases in mind? I think users might get confused which one to use.

@balazs-benics-sonarsource
Copy link
Contributor Author

balazs-benics-sonarsource commented May 22, 2025

What is the relationship of this new callback with the BranchCondition callback which is evaluated in ProcessBranch where the checker splits the execution path into multiple branches? It would be nice if you could document the difference between these two callbacks.

Great question. They are for different purposes, but one should prefer BranchCondition over the strictly more generic BlockEntrance callback. State splits work as in any other checkers. If you split the state in both callback, you will end up with 4 splits for the cases that are if, for, while, statements with conditions and 2 paths for every other control-flow edge, like entering the first basic block from the Entry CFG node, or following a break, continue or goto (unconditional) CFG edge.
EDIT: But remember, these splits happen at different program points in the graph, thus there won't be a single node with all those 4 splits, but rather the branch condition node will split, and then once we reach the block entrance we split again. Just like in any other checker.
Documented this now in c2bd657, and added a unit test for the case when both callbacks are defined in 910abb3.

What is the relationship of this new callback with the BranchCondition callback

+1, I am also interested to learn what is the main motivation behind the new callback. Do you have some example use cases in mind? I think users might get confused which one to use.

My motivation is to drive enabling or disabling an internal checker by entering or leaving a certain CFG block.
Concretely, if a short-circuiting operator leads to the guarded CFG block (aka. the true branch for the && operator and the false branch for the || operator), enable a checker to detect certain constructs in a guarded block. And once we are leaving the guarded block disable the checker. I hope it clarifies the motivation and demonstrates the usefulness of such a callback.

I believe that there could be other similar use cases of this.

@balazs-benics-sonarsource
Copy link
Contributor Author

I know it didn't actually pass a week for a ping, but let me know if its on the horizon.

Copy link
Contributor

@NagyDonat NagyDonat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finished reviewing the non-test code and overall it LGTM, but I added two minor questions as inline comments. I'll review the tests tomorrow. Edit: I reviewed them today.

Copy link
Contributor

@NagyDonat NagyDonat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also read the tests and the change LGTM if you move the explanations from the very helpful review comments https://github.com/llvm/llvm-project/pull/140924/files#r2107471659 and https://github.com/llvm/llvm-project/pull/140924/files#r2107491565 to source code comments.

In the tests I felt that it'd be a bit hard to decipher the meaning of the block identifiers like B1 etc. -- but when I re-read the file I noticed that you included the very nice helper function dumpCFGAndEgraph (IIUC) for those who will need to debug broken cases in this test file 😄 Perhaps it would be even nicer if you included a commented out call to that function with "// NOTE: Uncomment this if you want to decipher the meaning of 'B0', 'B1', ..." to make its existence and role more obvious.

@balazs-benics-sonarsource
Copy link
Contributor Author

In the tests I felt that it'd be a bit hard to decipher the meaning of the block identifiers like B1 etc. -- but when I re-read the file I noticed that you included the very nice helper function dumpCFGAndEgraph (IIUC) for those who will need to debug broken cases in this test file 😄 Perhaps it would be even nicer if you included a commented out call to that function with "// NOTE: Uncomment this if you want to decipher the meaning of 'B0', 'B1', ..." to make its existence and role more obvious.

Changed the debugging interface a bit, and added comments explaining it in f339eb1

@balazs-benics-sonarsource balazs-benics-sonarsource merged commit 104f5d1 into llvm:main May 27, 2025
11 checks passed
@balazs-benics-sonarsource balazs-benics-sonarsource deleted the bb/upstream-check-block-entrance branch May 27, 2025 08:11
@llvm-ci

This comment was marked as off-topic.

@llvm-ci

This comment was marked as off-topic.

@llvm-ci

This comment was marked as off-topic.

Copy link
Contributor

@NagyDonat NagyDonat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@steakhal Thanks for the updates, I'm completely satisfied with them.

I don't see any connection between this commit and the buildbot failures 🤔 ... they are probably unrelated.

@balazs-benics-sonarsource
Copy link
Contributor Author

@steakhal Thanks for the updates, I'm completely satisfied with them.

I don't see any connection between this commit and the buildbot failures 🤔 ... they are probably unrelated.

About 90% of the time they are unrelated. I don't usually put a confirmation to these messages.

sivan-shani pushed a commit to sivan-shani/llvm-project that referenced this pull request Jun 3, 2025
…140924)

Tranersing the CFG blocks of a function is a fundamental operation. Many
C++ constructs can create splits in the control-flow, such as `if`,
`for`, and similar control structures or ternary expressions, gnu
conditionals, gotos, switches and possibly more.

Checkers should be able to get notifications about entering or leaving a
CFG block of interest.

Note that in the ExplodedGraph there is always a BlockEntrance
ProgramPoint right after the BlockEdge ProgramPoint. I considered naming
this callback check::BlockEdge, but then that may leave the observer of
the graph puzzled to see BlockEdge points followed more BlockEdge nodes
describing the same CFG transition. This confusion could also apply to
Bug Report Visitors too.

Because of this, I decided to hook BlockEntrance ProgramPoints instead.
The same confusion applies here, but I find this still a better place
TBH. There would only appear only one BlockEntrance ProgramPoint in the
graph if no checkers modify the state or emit a bug report. Otherwise
they modify some GDM (aka. State) thus create a new ExplodedNode with
the same BlockEntrance ProgramPoint in the graph.

CPP-6484
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:static analyzer clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants