Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[clang][DependencyScanning] Track modules that resolve from "shared" locations #130634

Merged
merged 6 commits into from
Mar 17, 2025

Conversation

cyndyishida
Copy link
Member

That patch tracks whether all the file & module dependencies of a module resolve to a sysroot location. This information will later be queried by build systems for determining where to store the accompanying pcms.

That patch tracks whether all the file & module dependencies of a module
resolve to a sysroot location. This information will later be queried by
build systems for determining where to store the accompanying pcms.
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Mar 10, 2025
@llvmbot
Copy link
Member

llvmbot commented Mar 10, 2025

@llvm/pr-subscribers-clang

Author: Cyndy Ishida (cyndyishida)

Changes

That patch tracks whether all the file & module dependencies of a module resolve to a sysroot location. This information will later be queried by build systems for determining where to store the accompanying pcms.


Full diff: https://github.com/llvm/llvm-project/pull/130634.diff

4 Files Affected:

  • (modified) clang/include/clang/Tooling/DependencyScanning/ModuleDepCollector.h (+7)
  • (modified) clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp (+23-2)
  • (added) clang/test/ClangScanDeps/modules-in-sysroot.c (+107)
  • (modified) clang/tools/clang-scan-deps/ClangScanDeps.cpp (+2)
diff --git a/clang/include/clang/Tooling/DependencyScanning/ModuleDepCollector.h b/clang/include/clang/Tooling/DependencyScanning/ModuleDepCollector.h
index 20fb4de6a2a73..6187f0168e6d9 100644
--- a/clang/include/clang/Tooling/DependencyScanning/ModuleDepCollector.h
+++ b/clang/include/clang/Tooling/DependencyScanning/ModuleDepCollector.h
@@ -114,6 +114,10 @@ struct ModuleDeps {
   /// Whether this is a "system" module.
   bool IsSystem;
 
+  /// Whether this is a module where it's dependencies resolve within the
+  /// sysroot.
+  bool IsInSysroot;
+
   /// The path to the modulemap file which defines this module.
   ///
   /// This can be used to explicitly build this module. This file will
@@ -219,6 +223,9 @@ class ModuleDepCollectorPP final : public PPCallbacks {
                               llvm::DenseSet<const Module *> &AddedModules);
   void addAffectingClangModule(const Module *M, ModuleDeps &MD,
                           llvm::DenseSet<const Module *> &AddedModules);
+
+  /// Add discovered module dependency for the given module.
+  void addClangModule(const Module *M, const ModuleID ID, ModuleDeps &MD);
 };
 
 /// Collects modular and non-modular dependencies of the main file by attaching
diff --git a/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp b/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
index 36b75c1016cd8..86eda34472cf0 100644
--- a/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
+++ b/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
@@ -698,6 +698,15 @@ ModuleDepCollectorPP::handleTopLevelModule(const Module *M) {
 
   MD.ID.ModuleName = M->getFullModuleName();
   MD.IsSystem = M->IsSystem;
+
+  // Start off with the assumption that this module is in the sysroot when there
+  // is a sysroot provided. As more dependencies are discovered, check if those
+  // come from the provided sysroot.
+  const StringRef CurrSysroot = MDC.ScanInstance.getHeaderSearchOpts().Sysroot;
+  MD.IsInSysroot =
+      !CurrSysroot.empty() &&
+      (llvm::sys::path::root_directory(CurrSysroot) != CurrSysroot);
+
   // For modules which use export_as link name, the linked product that of the
   // corresponding export_as-named module.
   if (!M->UseExportAsModuleLinkName)
@@ -739,6 +748,11 @@ ModuleDepCollectorPP::handleTopLevelModule(const Module *M) {
   MDC.ScanInstance.getASTReader()->visitInputFileInfos(
       *MF, /*IncludeSystem=*/true,
       [&](const serialization::InputFileInfo &IFI, bool IsSystem) {
+        auto FullFilePath = ASTReader::ResolveImportedPath(
+            PathBuf, IFI.UnresolvedImportedFilename, MF->BaseDirectory);
+        if (MD.IsInSysroot)
+          MD.IsInSysroot = FullFilePath->starts_with(CurrSysroot);
+        PathBuf.resize_for_overwrite(256);
         if (!(IFI.TopLevel && IFI.ModuleMap))
           return;
         if (IFI.UnresolvedImportedFilenameAsRequested.ends_with(
@@ -835,6 +849,13 @@ void ModuleDepCollectorPP::addAllSubmoduleDeps(
   });
 }
 
+void ModuleDepCollectorPP::addClangModule(const Module *M, const ModuleID ID,
+                                          ModuleDeps &MD) {
+  MD.ClangModuleDeps.push_back(ID);
+  if (MD.IsInSysroot)
+    MD.IsInSysroot = MDC.ModularDeps[M]->IsInSysroot;
+}
+
 void ModuleDepCollectorPP::addModuleDep(
     const Module *M, ModuleDeps &MD,
     llvm::DenseSet<const Module *> &AddedModules) {
@@ -843,7 +864,7 @@ void ModuleDepCollectorPP::addModuleDep(
         !MDC.isPrebuiltModule(Import)) {
       if (auto ImportID = handleTopLevelModule(Import->getTopLevelModule()))
         if (AddedModules.insert(Import->getTopLevelModule()).second)
-          MD.ClangModuleDeps.push_back(*ImportID);
+          addClangModule(Import->getTopLevelModule(), *ImportID, MD);
     }
   }
 }
@@ -867,7 +888,7 @@ void ModuleDepCollectorPP::addAffectingClangModule(
         !MDC.isPrebuiltModule(Affecting)) {
       if (auto ImportID = handleTopLevelModule(Affecting))
         if (AddedModules.insert(Affecting).second)
-          MD.ClangModuleDeps.push_back(*ImportID);
+          addClangModule(Affecting, *ImportID, MD);
     }
   }
 }
diff --git a/clang/test/ClangScanDeps/modules-in-sysroot.c b/clang/test/ClangScanDeps/modules-in-sysroot.c
new file mode 100644
index 0000000000000..d96aa69c0e8f4
--- /dev/null
+++ b/clang/test/ClangScanDeps/modules-in-sysroot.c
@@ -0,0 +1,107 @@
+// This test verifies modules that are entirely comprised from sysroot inputs are captured in
+// dependency information.
+
+// The first compilation verifies that transitive dependencies on non-sysroot input are captured.
+// The second compilation verifies that external paths are resolved when a vfsoverlay is applied when considering sysroot-ness.
+
+// RUN: rm -rf %t
+// RUN: split-file %s %t
+// RUN: sed -e "s|DIR|%/t|g" %t/compile-commands.json.in > %t/compile-commands.json
+// RUN: sed -e "s|DIR|%/t|g" %t/overlay.json.template > %t/overlay.json
+// RUN: clang-scan-deps -compilation-database %t/compile-commands.json \
+// RUN:   -j 1 -format experimental-full > %t/deps.db
+// RUN: cat %t/deps.db | sed 's:\\\\\?:/:g' | FileCheck %s -DPREFIX=%/t
+
+// CHECK:   "modules": [
+// CHECK-NEXT:     {
+// CHECK:            "is-in-sysroot": true,
+// CHECK:            "name": "A"
+
+// Verify that there are no more occurances of sysroot.
+// CHECK-NOT:            "is-in-sysroot"
+
+// CHECK:            "name": "A"
+// CHECK:            "USE_VFS"
+// CHECK:            "name": "B"
+// CHECK:            "name": "C"
+// CHECK:            "name": "D"
+// CHECK:            "name": "NotInSDK"
+
+//--- compile-commands.json.in
+[
+{
+  "directory": "DIR",
+  "command": "clang -c DIR/client.m -isysroot DIR/MacOSX.sdk -I DIR/BuildDir -fmodules -fmodules-cache-path=DIR/module-cache -fimplicit-module-maps",
+  "file": "DIR/client.m"
+},
+{
+  "directory": "DIR",
+  "command": "clang -c DIR/client.m -isysroot DIR/MacOSX.sdk  -ivfsoverlay DIR/overlay.json -DUSE_VFS -I DIR/BuildDir -fmodules -fmodules-cache-path=DIR/module-cache -fimplicit-module-maps",
+  "file": "DIR/client.m"
+}
+]
+
+//--- overlay.json.template
+{
+  "version": 0,
+  "case-sensitive": "false",
+  "roots": [
+    {
+          "external-contents": "DIR/local/A/A_vfs.h",
+          "name": "DIR/MacOSX.sdk/usr/include/A/A_vfs.h",
+          "type": "file"
+    }
+  ]
+}
+
+//--- MacOSX.sdk/usr/include/A/module.modulemap
+module A {
+  umbrella "."
+}
+
+//--- MacOSX.sdk/usr/include/A/A.h
+#ifdef USE_VFS
+#include <A/A_vfs.h>
+#endif 
+typedef int A_t;
+
+//--- local/A/A_vfs.h
+typedef int typeFromVFS;
+
+//--- MacOSX.sdk/usr/include/B/module.modulemap
+module B [system] {
+  umbrella "."
+}
+
+//--- MacOSX.sdk/usr/include/B/B.h
+#include <C/C.h>
+typedef int B_t;
+
+//--- MacOSX.sdk/usr/include/C/module.modulemap
+module C [system] {
+  umbrella "."
+}
+
+//--- MacOSX.sdk/usr/include/C/C.h
+#include <D/D.h>
+
+//--- MacOSX.sdk/usr/include/D/module.modulemap
+module D [system] {
+  umbrella "."
+}
+
+// Simulate a header that will be resolved in a local directory, from a sysroot header.
+//--- MacOSX.sdk/usr/include/D/D.h
+#include <HeaderNotFoundInSDK.h>
+
+//--- BuildDir/module.modulemap
+module NotInSDK [system] {
+  umbrella "."
+}
+
+//--- BuildDir/HeaderNotFoundInSDK.h
+typedef int local_t;
+
+//--- client.m
+#include <A/A.h>
+#include <B/B.h>
diff --git a/clang/tools/clang-scan-deps/ClangScanDeps.cpp b/clang/tools/clang-scan-deps/ClangScanDeps.cpp
index 3bdeb461e4bfa..f5946b30fb84d 100644
--- a/clang/tools/clang-scan-deps/ClangScanDeps.cpp
+++ b/clang/tools/clang-scan-deps/ClangScanDeps.cpp
@@ -471,6 +471,8 @@ class FullDeps {
         for (auto &&ModID : ModuleIDs) {
           auto &MD = Modules[ModID];
           JOS.object([&] {
+            if (MD.IsInSysroot)
+              JOS.attribute("is-in-sysroot", MD.IsInSysroot);
             JOS.attributeArray("clang-module-deps",
                                toJSONSorted(JOS, MD.ClangModuleDeps));
             JOS.attribute("clang-modulemap-file",

@jansvoboda11
Copy link
Contributor

I don't know what the desired properties of this feature are exactly, but I'm concerned that this does not guarantee the resulting command line only refers to inputs in the sysroot. Imagine that we run this without the header search and VFS pruning optimizations - the command line will likely refer to unused project-specific source/build directories. Even with those optimizations on, I'm not convinced we have great guarantees regarding the sysroot-only nature of the command line. I would like to see stronger guarantee.
One way of doing this is going through the *Options classes that contribute to CompilerInvocation and adding an API that allows for iteration through all input/output paths. We could use it to check the prefixes of all paths exhaustively (in addition to the logic you already implemented).

@cyndyishida
Copy link
Member Author

but I'm concerned that this does not guarantee the resulting command line only refers to inputs in the sysroot. Imagine that we run this without the header search and VFS pruning optimizations - the command line will likely refer to unused project-specific source/build directories.

I would think that if ModuleDepCollector only found dependency inputs that resolve to sysroot locations, the command line for building the specific module wouldn't matter. Of course, I would also expect that Precompile module invocations don't point to paths they don't need, but wouldn't be required for tracking whether a module is in sysroot. Is there a concern that the command lines (e.g. header search paths with different files populated) between scanning and compiling could result in different input files being resolved?

One way of doing this is going through the *Options classes that contribute to CompilerInvocation and adding an API that allows for iteration through all input/output paths.

Do you have an intuition of whether this would be noticeably expensive for the scanner?

Copy link
Collaborator

@benlangmuir benlangmuir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a similar concern to @jansvoboda11 that we probably need to check the invocation paths as well.

I would think that if ModuleDepCollector only found dependency inputs that resolve to sysroot locations, the command line for building the specific module wouldn't matter.

One concern I have is if we have user search paths, the contents of those paths can change after scanning in a way that silently affects the compilation. Another more theoretical issue is that in general the set of module input files has no guarantee of being complete -- it's derived from the source manager and we don't necessarily have a mechanism to ensure every file that impacted the module was captured.

Copy link
Contributor

@Bigcheese Bigcheese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think this looks good. I'm not sure I share the same concerns. The situation this covers is how likely is it that the build system is going to determine this needs to be rebuilt, and that happens (in most cases) purely based on the reported dependencies. So negative deps or unlisted deps don't change anything here.

@cyndyishida
Copy link
Member Author

I have a similar concern to @jansvoboda11 that we probably need to check the invocation paths as well.

The situation this covers is how likely is it that the build system is going to determine this needs to be rebuilt, and that happens (in most cases) purely based on the reported dependencies.

I can add the checks for inputs through the compiler invocations, with the hope that it will allow for more pcm sharing across build plans (e.g. projects) depending on the same modules. I am a little worried that a more conservative check would mean fewer modules that otherwise could safely be in a shared location don't end up there due to other limitations (e.g. we don't prune out an option that is safe to), but I don't have a sense if that is a practical concern.

* We need to track more than just sysroot, so start with sysroot and
  resource directory as a starting point, but allow the ability to
capture more as they are discovered in the future.

* Check a subset of the compiler invocation of module compilations for
  determining if a module can be shared.
* When a module has a prebuilt module dependency, consider is not
  shareable for now.
@cyndyishida cyndyishida changed the title [clang][DependencyScanning] Track modules that resolve from sysroot. [clang][DependencyScanning] Track modules that resolve from "shared" locations Mar 17, 2025
Copy link
Contributor

@Bigcheese Bigcheese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm now, just a request to change the name.

@cyndyishida cyndyishida merged commit 584f8cc into llvm:main Mar 17, 2025
6 of 9 checks passed
@cyndyishida cyndyishida deleted the eng/PR-145776839 branch March 17, 2025 22:33
cyndyishida added a commit to cyndyishida/llvm-project that referenced this pull request Mar 17, 2025
…locations (llvm#130634)

That patch tracks whether all the file & module dependencies of a module
resolve to a stable location. This information will later be queried by
build systems for determining where to store the accompanying pcms.

(cherry picked from commit 584f8cc)
cyndyishida added a commit to cyndyishida/llvm-project that referenced this pull request Mar 18, 2025
…locations (llvm#130634)

That patch tracks whether all the file & module dependencies of a module
resolve to a stable location. This information will later be queried by
build systems for determining where to store the accompanying pcms.

(cherry picked from commit 584f8cc)
cyndyishida added a commit to swiftlang/llvm-project that referenced this pull request Mar 19, 2025
…locations (llvm#130634)

That patch tracks whether all the file & module dependencies of a module
resolve to a stable location. This information will later be queried by
build systems for determining where to store the accompanying pcms.

(cherry picked from commit 584f8cc)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants