Skip to content

[UBSan] Support src:*=sanitize for multiple ignorelists. #141640

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

qinkunbao
Copy link
Member

@qinkunbao qinkunbao commented May 27, 2025

See: #139128 and #140529 for the background.

The introduction of these new tests (ubsan-src-ignorelist-category.test) -fsanitize-ignorelist=%t/src.ignorelist -fsanitize-ignorelist=%t/src.ignorelist.contradict9 in this PR will not lead to failures in the previous implementation (without this PR). This is because the existing logic distinguishes between Sections in different ignorelists, even if their names are identical. The order of these Sections is preserved using a vector.

qinkunbao added 2 commits May 27, 2025 17:32
Created using spr 1.3.6
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels May 27, 2025
@llvmbot
Copy link
Member

llvmbot commented May 27, 2025

@llvm/pr-subscribers-llvm-support

@llvm/pr-subscribers-clang

Author: Qinkun Bao (qinkunbao)

Changes

See: #139128 for the background.


Full diff: https://github.com/llvm/llvm-project/pull/141640.diff

4 Files Affected:

  • (modified) clang/include/clang/Basic/SanitizerSpecialCaseList.h (+9-5)
  • (modified) clang/lib/Basic/NoSanitizeList.cpp (+6-3)
  • (modified) clang/lib/Basic/SanitizerSpecialCaseList.cpp (+9-8)
  • (modified) clang/test/CodeGen/ubsan-src-ignorelist-category.test (+11)
diff --git a/clang/include/clang/Basic/SanitizerSpecialCaseList.h b/clang/include/clang/Basic/SanitizerSpecialCaseList.h
index 25d518e7128cf..a19392399bfd7 100644
--- a/clang/include/clang/Basic/SanitizerSpecialCaseList.h
+++ b/clang/include/clang/Basic/SanitizerSpecialCaseList.h
@@ -19,6 +19,7 @@
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Support/SpecialCaseList.h"
 #include <memory>
+#include <utility>
 #include <vector>
 
 namespace llvm {
@@ -44,20 +45,23 @@ class SanitizerSpecialCaseList : public llvm::SpecialCaseList {
                  StringRef Category = StringRef()) const;
 
   // Query ignorelisted entries if any bit in Mask matches the entry's section.
-  // Return 0 if not found. If found, return the line number (starts with 1).
-  unsigned inSectionBlame(SanitizerMask Mask, StringRef Prefix, StringRef Query,
-                          StringRef Category = StringRef()) const;
+  // Return (0,0 if not found. If found, return the file index number and the
+  // line number (FileIdx, LineNo) (all start with 1).
+  std::pair<unsigned, unsigned>
+  inSectionBlame(SanitizerMask Mask, StringRef Prefix, StringRef Query,
+                 StringRef Category = StringRef()) const;
 
 protected:
   // Initialize SanitizerSections.
   void createSanitizerSections();
 
   struct SanitizerSection {
-    SanitizerSection(SanitizerMask SM, SectionEntries &E)
-        : Mask(SM), Entries(E) {};
+    SanitizerSection(SanitizerMask SM, SectionEntries &E, unsigned idx)
+        : Mask(SM), Entries(E), FileIdx(idx) {};
 
     SanitizerMask Mask;
     SectionEntries &Entries;
+    unsigned FileIdx;
   };
 
   std::vector<SanitizerSection> SanitizerSections;
diff --git a/clang/lib/Basic/NoSanitizeList.cpp b/clang/lib/Basic/NoSanitizeList.cpp
index 4feef5c6ea052..c58a67971dfb6 100644
--- a/clang/lib/Basic/NoSanitizeList.cpp
+++ b/clang/lib/Basic/NoSanitizeList.cpp
@@ -44,14 +44,17 @@ bool NoSanitizeList::containsFunction(SanitizerMask Mask,
 
 bool NoSanitizeList::containsFile(SanitizerMask Mask, StringRef FileName,
                                   StringRef Category) const {
-  unsigned NoSanLine = SSCL->inSectionBlame(Mask, "src", FileName, Category);
+  auto [NoSanFileIdx, NoSanLine] =
+      SSCL->inSectionBlame(Mask, "src", FileName, Category);
   if (NoSanLine == 0)
     return false;
-  unsigned SanLine = SSCL->inSectionBlame(Mask, "src", FileName, "sanitize");
+  auto [SanFileIdx, SanLine] =
+      SSCL->inSectionBlame(Mask, "src", FileName, "sanitize");
   // If we have two cases such as `src:a.cpp=sanitize` and `src:a.cpp`, the
   // current entry override the previous entry.
   if (SanLine > 0)
-    return NoSanLine > SanLine;
+    return std::make_pair(NoSanFileIdx, NoSanLine) >
+           std::make_pair(SanFileIdx, SanLine);
   return true;
 }
 
diff --git a/clang/lib/Basic/SanitizerSpecialCaseList.cpp b/clang/lib/Basic/SanitizerSpecialCaseList.cpp
index 4508d705eb43a..0b2c0f9c4478a 100644
--- a/clang/lib/Basic/SanitizerSpecialCaseList.cpp
+++ b/clang/lib/Basic/SanitizerSpecialCaseList.cpp
@@ -49,28 +49,29 @@ void SanitizerSpecialCaseList::createSanitizerSections() {
 #undef SANITIZER
 #undef SANITIZER_GROUP
 
-    SanitizerSections.emplace_back(Mask, S.Entries);
+    SanitizerSections.emplace_back(Mask, S.Entries, S.FileIdx);
   }
 }
 
 bool SanitizerSpecialCaseList::inSection(SanitizerMask Mask, StringRef Prefix,
                                          StringRef Query,
                                          StringRef Category) const {
-  return inSectionBlame(Mask, Prefix, Query, Category);
+  auto [FileIdx, LineNo] = inSectionBlame(Mask, Prefix, Query, Category);
+  return FileIdx;
 }
 
-unsigned SanitizerSpecialCaseList::inSectionBlame(SanitizerMask Mask,
-                                                  StringRef Prefix,
-                                                  StringRef Query,
-                                                  StringRef Category) const {
+std::pair<unsigned, unsigned>
+SanitizerSpecialCaseList::inSectionBlame(SanitizerMask Mask, StringRef Prefix,
+                                         StringRef Query,
+                                         StringRef Category) const {
   for (auto it = SanitizerSections.crbegin(); it != SanitizerSections.crend();
        ++it) {
     if (it->Mask & Mask) {
       unsigned lineNum =
           SpecialCaseList::inSectionBlame(it->Entries, Prefix, Query, Category);
       if (lineNum > 0)
-        return lineNum;
+        return {it->FileIdx, lineNum};
     }
   }
-  return 0;
+  return {0, 0};
 }
diff --git a/clang/test/CodeGen/ubsan-src-ignorelist-category.test b/clang/test/CodeGen/ubsan-src-ignorelist-category.test
index 55967ec77c836..0ff2ea5c456a1 100644
--- a/clang/test/CodeGen/ubsan-src-ignorelist-category.test
+++ b/clang/test/CodeGen/ubsan-src-ignorelist-category.test
@@ -12,6 +12,9 @@
 // RUN: %clang_cc1 -triple x86_64-linux-gnu -fsanitize=signed-integer-overflow -fsanitize-ignorelist=%t/src.ignorelist.contradict6 -emit-llvm %t/test1.c -o - | FileCheck %s --check-prefixes=CHECK1,SANITIZE
 // RUN: %clang_cc1 -triple x86_64-linux-gnu -fsanitize=signed-integer-overflow -fsanitize-ignorelist=%t/src.ignorelist.contradict7 -emit-llvm %t/test1.c -o - | FileCheck %s --check-prefixes=CHECK1,IGNORE
 // RUN: %clang_cc1 -triple x86_64-linux-gnu -fsanitize=signed-integer-overflow -fsanitize-ignorelist=%t/src.ignorelist.contradict8 -emit-llvm %t/test1.c -o - | FileCheck %s --check-prefixes=CHECK1,SANITIZE
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -fsanitize=signed-integer-overflow -fsanitize-ignorelist=%t/src.ignorelist.contradict8 -fsanitize-ignorelist=%t/src.ignorelist.contradict9 -emit-llvm %t/test1.c -o - | FileCheck %s --check-prefixes=CHECK1,IGNORE
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -fsanitize=signed-integer-overflow -fsanitize-ignorelist=%t/src.ignorelist.contradict9 -fsanitize-ignorelist=%t/src.ignorelist.contradict8 -emit-llvm %t/test1.c -o - | FileCheck %s --check-prefixes=CHECK1,SANITIZE
+
 
 // Verify ubsan only emits checks for files in the allowlist
 
@@ -77,6 +80,14 @@ src:*/tes*1.c=sanitize
 src:*/te*t1.c
 src:*/t*st1.c=sanitize
 
+//--- src.ignorelist.contradict9
+src:*
+src:*/test1.c=sanitize
+src:*/test1.c
+src:*/test1.c=sanitize
+src:*/te*t1.c
+src:*/test*.c
+
 
 //--- test1.c
 // CHECK1-LABEL: define dso_local i32 @add

Created using spr 1.3.6
@qinkunbao qinkunbao requested a review from vitalybuka May 27, 2025 17:43
Created using spr 1.3.6
Copy link
Collaborator

@vitalybuka vitalybuka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with comments

@qinkunbao qinkunbao changed the base branch from users/qinkunbao/spr/main.ubsan-support-srcsanitize-for-multiple-ignorelists to main May 28, 2025 04:19
Copy link

github-actions bot commented May 28, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@qinkunbao
Copy link
Member Author

Looks like conflicts resolving got wrong. I will fix them and address the comments tomorrow

qinkunbao added 6 commits May 28, 2025 04:43
Created using spr 1.3.6
Created using spr 1.3.6
Created using spr 1.3.6
Created using spr 1.3.6
Created using spr 1.3.6
@qinkunbao qinkunbao merged commit 45b874b into main May 28, 2025
8 of 9 checks passed
@qinkunbao qinkunbao deleted the users/qinkunbao/spr/ubsan-support-srcsanitize-for-multiple-ignorelists branch May 28, 2025 15:12
// Return NotFound (0,0) if not found. If found, return the file index number
// and the line number (FileIdx, LineNo) (FileIdx starts with 1 and LineNo
// starts with 0).
std::pair<unsigned, unsigned>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this is almost an anti-pattern, using std::pair like this. We have two opaque values that we now have to go back to the function to understand what they mean.

If you had a type (class) with two members (FileIdx and LineNo) then it would make sense at each use as opposed to first and second which is not informative.

I actually don't see the values explicitly used but I am assuming they will be in a later change.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this is almost an anti-pattern

LLVM code is full of them, and here they should not be exposed beyond NoSanitizeList, so I ignore on review.
But I will not argue if you are asking to improve.

It's used for comparison "return NoSan > San" in NoSanitizeList::containsFile.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't insist, but if we won't actually (maybe I am misunderstanding) use the value stored in the pair then a flag makes way more sense. Otherwise, having to use first and second is just prone to error.

I do get that this is local and that we (llvm) is not free of anti-patterns but we should strive to do better in new code. One because future new users might use this to learn and outside developers (even LLMs) will learn this is a good pattern and propagate its use.

sivan-shani pushed a commit to sivan-shani/llvm-project that referenced this pull request Jun 3, 2025
See: llvm#139128 and
llvm#140529 for the background.

The introduction of these new tests (ubsan-src-ignorelist-category.test)
`-fsanitize-ignorelist=%t/src.ignorelist
-fsanitize-ignorelist=%t/src.ignorelist.contradict9` in this PR will not
lead to failures in the previous implementation (without this PR). This
is because the existing logic distinguishes between Sections in
different ignorelists, even if their names are identical. The order of
these Sections is preserved using a `vector`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category llvm:support
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants