Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[clang][DependencyScanning] Track dependencies from prebuilt modules to determine IsInStableDir #132237

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

cyndyishida
Copy link
Member

When a module is being scanned, it can depend on modules that have already been built from a pch dependency. When this happens, the pcm files are reused for the module dependencies. When this is the case, check if input files recorded from the PCMs come from the provided stable directories transitively since the scanner will not have access to the full set of file dependencies from prebuilt modules.

determine IsInStableDir

When a module is being scanned, it can depend on modules that have
already been built from a pch dependency. When this happens, the
pcm files are reused for the module dependencies. When this is the case,
check if input files recorded from the PCMs come from the provided stable directories transitively,
since the scanner will not have access to the full set of file dependencies from prebuilt modules.
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:modules C++20 modules and Clang Header Modules labels Mar 20, 2025
@llvmbot
Copy link
Member

llvmbot commented Mar 20, 2025

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-modules

Author: Cyndy Ishida (cyndyishida)

Changes

When a module is being scanned, it can depend on modules that have already been built from a pch dependency. When this happens, the pcm files are reused for the module dependencies. When this is the case, check if input files recorded from the PCMs come from the provided stable directories transitively since the scanner will not have access to the full set of file dependencies from prebuilt modules.


Patch is 24.53 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/132237.diff

6 Files Affected:

  • (modified) clang/include/clang/Serialization/ASTReader.h (+11)
  • (modified) clang/include/clang/Tooling/DependencyScanning/ModuleDepCollector.h (+54-5)
  • (modified) clang/lib/Serialization/ASTReader.cpp (+5-1)
  • (modified) clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp (+93-18)
  • (modified) clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp (+51-38)
  • (modified) clang/test/ClangScanDeps/prebuilt-modules-in-stable-dirs.c (+19-5)
diff --git a/clang/include/clang/Serialization/ASTReader.h b/clang/include/clang/Serialization/ASTReader.h
index 2779b3d1cf2ea..9e360c9409b89 100644
--- a/clang/include/clang/Serialization/ASTReader.h
+++ b/clang/include/clang/Serialization/ASTReader.h
@@ -237,6 +237,17 @@ class ASTReaderListener {
     return true;
   }
 
+  /// Overloaded member function of \c visitInputFile that should
+  /// be defined when the input file contains both the virtual and external
+  /// paths, for example when deserializing input files from AST files.
+  ///
+  /// \returns true to continue receiving the next input file, false to stop.
+  virtual bool visitInputFile(StringRef FilenameAsRequested, StringRef Filename,
+                              bool isSystem, bool isOverridden,
+                              bool isExplicitModule) {
+    return true;
+  }
+
   /// Returns true if this \c ASTReaderListener wants to receive the
   /// imports of the AST file via \c visitImport, false otherwise.
   virtual bool needsImportVisitation() const { return false; }
diff --git a/clang/include/clang/Tooling/DependencyScanning/ModuleDepCollector.h b/clang/include/clang/Tooling/DependencyScanning/ModuleDepCollector.h
index 422202caddfd4..95fcb25aca450 100644
--- a/clang/include/clang/Tooling/DependencyScanning/ModuleDepCollector.h
+++ b/clang/include/clang/Tooling/DependencyScanning/ModuleDepCollector.h
@@ -33,6 +33,7 @@ namespace dependencies {
 
 class DependencyActionController;
 class DependencyConsumer;
+class PrebuiltModuleASTAttrs;
 
 /// Modular dependency that has already been built prior to the dependency scan.
 struct PrebuiltModuleDep {
@@ -46,6 +47,47 @@ struct PrebuiltModuleDep {
         ModuleMapFile(M->PresumedModuleMapFile) {}
 };
 
+/// Attributes loaded from AST files of prebuilt modules collected prior to
+/// ModuleDepCollector creation.
+using PrebuiltModulesAttrsMap = llvm::StringMap<PrebuiltModuleASTAttrs>;
+class PrebuiltModuleASTAttrs {
+public:
+  /// When a module is discovered to not be in stable directories, traverse &
+  /// update all modules that depend on it.
+  void
+  updateDependentsNotInStableDirs(PrebuiltModulesAttrsMap &PrebuiltModulesMap);
+
+  /// Read-only access to whether the module is made up of dependencies in
+  /// stable directories.
+  bool isInStableDir() const { return IsInStableDirs; }
+
+  /// Read-only access to vfs map files.
+  const llvm::StringSet<> &getVFS() const { return VFSMap; }
+
+  /// Update the VFSMap to the one discovered from serializing the AST file.
+  void setVFS(llvm::StringSet<> &&VFS) { VFSMap = std::move(VFS); }
+
+  /// Add a direct dependent module file, so it can be updated if the current
+  /// module is from stable directores.
+  void addDependent(StringRef ModuleFile) {
+    ModuleFileDependents.insert(ModuleFile);
+  }
+
+  /// Update whether the prebuilt module resolves entirely in a stable
+  /// directories.
+  void setInStableDir(bool V = false) {
+    // Cannot reset attribute once it's false.
+    if (!IsInStableDirs)
+      return;
+    IsInStableDirs = V;
+  }
+
+private:
+  llvm::StringSet<> VFSMap;
+  bool IsInStableDirs = true;
+  std::set<StringRef> ModuleFileDependents;
+};
+
 /// This is used to identify a specific module.
 struct ModuleID {
   /// The name of the module. This may include `:` for C++20 module partitions,
@@ -170,8 +212,6 @@ struct ModuleDeps {
       BuildInfo;
 };
 
-using PrebuiltModuleVFSMapT = llvm::StringMap<llvm::StringSet<>>;
-
 class ModuleDepCollector;
 
 /// Callback that records textual includes and direct modular includes/imports
@@ -241,7 +281,7 @@ class ModuleDepCollector final : public DependencyCollector {
                      CompilerInstance &ScanInstance, DependencyConsumer &C,
                      DependencyActionController &Controller,
                      CompilerInvocation OriginalCI,
-                     PrebuiltModuleVFSMapT PrebuiltModuleVFSMap);
+                     const PrebuiltModulesAttrsMap PrebuiltModulesASTMap);
 
   void attachToPreprocessor(Preprocessor &PP) override;
   void attachToASTReader(ASTReader &R) override;
@@ -261,8 +301,9 @@ class ModuleDepCollector final : public DependencyCollector {
   DependencyConsumer &Consumer;
   /// Callbacks for computing dependency information.
   DependencyActionController &Controller;
-  /// Mapping from prebuilt AST files to their sorted list of VFS overlay files.
-  PrebuiltModuleVFSMapT PrebuiltModuleVFSMap;
+  /// Mapping from prebuilt AST filepaths to their attributes referenced during
+  /// dependency collecting.
+  const PrebuiltModulesAttrsMap PrebuiltModulesASTMap;
   /// Path to the main source file.
   std::string MainFile;
   /// Hash identifying the compilation conditions of the current TU.
@@ -338,6 +379,14 @@ void resetBenignCodeGenOptions(frontend::ActionKind ProgramAction,
 bool isPathInStableDir(const ArrayRef<StringRef> Directories,
                        const StringRef Input);
 
+/// Determine if options collected from a module's
+/// compilation can safely be considered as stable.
+///
+/// \param Directories Paths known to be in a stable location. e.g. Sysroot.
+/// \param HSOpts Header search options derived from the compiler invocation.
+bool areOptionsInStableDir(const ArrayRef<StringRef> Directories,
+                           const HeaderSearchOptions &HSOpts);
+
 } // end namespace dependencies
 } // end namespace tooling
 } // end namespace clang
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index 2c73501821bff..025b5fa9162aa 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -5834,9 +5834,13 @@ bool ASTReader::readASTFileControlBlock(
           break;
         case INPUT_FILE:
           bool Overridden = static_cast<bool>(Record[3]);
+          size_t FilenameLen = ModuleDir.size() + Record[7] + 1;
           auto Filename = ResolveImportedPath(PathBuf, Blob, ModuleDir);
+          StringRef FilenameAsRequested = Filename->substr(0, FilenameLen);
+          StringRef ExternalFilename = Filename->substr(FilenameLen);
           shouldContinue = Listener.visitInputFile(
-              *Filename, isSystemFile, Overridden, /*IsExplicitModule=*/false);
+              FilenameAsRequested, ExternalFilename, isSystemFile, Overridden,
+              /*IsExplicitModule=*/false);
           break;
         }
         if (!shouldContinue)
diff --git a/clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp b/clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp
index 697f26ee5d12f..2f7fff3d8dd0c 100644
--- a/clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp
+++ b/clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp
@@ -89,37 +89,99 @@ static bool checkHeaderSearchPaths(const HeaderSearchOptions &HSOpts,
 
 using PrebuiltModuleFilesT = decltype(HeaderSearchOptions::PrebuiltModuleFiles);
 
-/// A listener that collects the imported modules and optionally the input
-/// files.
+/// A listener that collects the imported modules and the input
+/// files. While visiting, collect vfsoverlays and file inputs that determine
+/// whether prebuilt modules fully resolve in stable directories.
 class PrebuiltModuleListener : public ASTReaderListener {
 public:
   PrebuiltModuleListener(PrebuiltModuleFilesT &PrebuiltModuleFiles,
                          llvm::SmallVector<std::string> &NewModuleFiles,
-                         PrebuiltModuleVFSMapT &PrebuiltModuleVFSMap,
+                         PrebuiltModulesAttrsMap &PrebuiltModulesASTMap,
                          const HeaderSearchOptions &HSOpts,
-                         const LangOptions &LangOpts, DiagnosticsEngine &Diags)
+                         const LangOptions &LangOpts, DiagnosticsEngine &Diags,
+                         const llvm::SmallVector<StringRef> &StableDirs)
       : PrebuiltModuleFiles(PrebuiltModuleFiles),
         NewModuleFiles(NewModuleFiles),
-        PrebuiltModuleVFSMap(PrebuiltModuleVFSMap), ExistingHSOpts(HSOpts),
-        ExistingLangOpts(LangOpts), Diags(Diags) {}
+        PrebuiltModulesASTMap(PrebuiltModulesASTMap), ExistingHSOpts(HSOpts),
+        ExistingLangOpts(LangOpts), Diags(Diags), StableDirs(StableDirs) {}
 
   bool needsImportVisitation() const override { return true; }
+  bool needsInputFileVisitation() override { return true; }
+  bool needsSystemInputFileVisitation() override { return true; }
 
+  /// Accumulate the modules are transitively depended on by the initial
+  /// prebuilt module.
   void visitImport(StringRef ModuleName, StringRef Filename) override {
     if (PrebuiltModuleFiles.insert({ModuleName.str(), Filename.str()}).second)
       NewModuleFiles.push_back(Filename.str());
+
+    if (PrebuiltModulesASTMap.try_emplace(Filename).second)
+      PrebuiltModulesASTMap[Filename].setInStableDir(!StableDirs.empty());
+
+    if (auto It = PrebuiltModulesASTMap.find(CurrentFile);
+        It != PrebuiltModulesASTMap.end() && CurrentFile != Filename)
+      PrebuiltModulesASTMap[Filename].addDependent(It->getKey());
+  }
+
+  /// For each input file discovered, check whether it's external path is in a
+  /// stable directory. Traversal is stopped if the current module is not
+  /// considered stable.
+  bool visitInputFile(StringRef FilenameAsRequested, StringRef ExternalFilename,
+                      bool isSystem, bool isOverridden,
+                      bool isExplicitModule) override {
+    if (StableDirs.empty())
+      return false;
+    if (!PrebuiltModulesASTMap.contains(CurrentFile) ||
+        !PrebuiltModulesASTMap[CurrentFile].isInStableDir())
+      return false;
+
+    const StringRef FileToUse =
+        ExternalFilename.empty() ? FilenameAsRequested : ExternalFilename;
+
+    PrebuiltModulesASTMap[CurrentFile].setInStableDir(
+        isPathInStableDir(StableDirs, FileToUse));
+    return PrebuiltModulesASTMap[CurrentFile].isInStableDir();
   }
 
+  /// Update which module that is being actively traversed.
   void visitModuleFile(StringRef Filename,
                        serialization::ModuleKind Kind) override {
+    // If the CurrentFile is not
+    // considered stable, update any of it's transitive dependents.
+    if (PrebuiltModulesASTMap.contains(CurrentFile) &&
+        !PrebuiltModulesASTMap[CurrentFile].isInStableDir())
+      PrebuiltModulesASTMap[CurrentFile].updateDependentsNotInStableDirs(
+          PrebuiltModulesASTMap);
     CurrentFile = Filename;
   }
 
+  /// Check the header search options for a given module when considering
+  /// if the module comes from stable directories.
+  bool ReadHeaderSearchOptions(const HeaderSearchOptions &HSOpts,
+                               StringRef ModuleFilename,
+                               StringRef SpecificModuleCachePath,
+                               bool Complain) override {
+    if (PrebuiltModulesASTMap.try_emplace(CurrentFile).second)
+      PrebuiltModulesASTMap[CurrentFile].setInStableDir(!StableDirs.empty());
+
+    if (PrebuiltModulesASTMap[CurrentFile].isInStableDir())
+      PrebuiltModulesASTMap[CurrentFile].setInStableDir(
+          areOptionsInStableDir(StableDirs, HSOpts));
+
+    return false;
+  }
+
+  /// Accumulate vfsoverlays used to build these prebuilt modules.
   bool ReadHeaderSearchPaths(const HeaderSearchOptions &HSOpts,
                              bool Complain) override {
     std::vector<std::string> VFSOverlayFiles = HSOpts.VFSOverlayFiles;
-    PrebuiltModuleVFSMap.insert(
-        {CurrentFile, llvm::StringSet<>(VFSOverlayFiles)});
+
+    if (PrebuiltModulesASTMap.try_emplace(CurrentFile).second)
+      PrebuiltModulesASTMap[CurrentFile].setInStableDir(!StableDirs.empty());
+
+    PrebuiltModulesASTMap[CurrentFile].setVFS(
+        llvm::StringSet<>(VFSOverlayFiles));
+
     return checkHeaderSearchPaths(
         HSOpts, ExistingHSOpts, Complain ? &Diags : nullptr, ExistingLangOpts);
   }
@@ -127,11 +189,12 @@ class PrebuiltModuleListener : public ASTReaderListener {
 private:
   PrebuiltModuleFilesT &PrebuiltModuleFiles;
   llvm::SmallVector<std::string> &NewModuleFiles;
-  PrebuiltModuleVFSMapT &PrebuiltModuleVFSMap;
+  PrebuiltModulesAttrsMap &PrebuiltModulesASTMap;
   const HeaderSearchOptions &ExistingHSOpts;
   const LangOptions &ExistingLangOpts;
   DiagnosticsEngine &Diags;
   std::string CurrentFile;
+  const llvm::SmallVector<StringRef> &StableDirs;
 };
 
 /// Visit the given prebuilt module and collect all of the modules it
@@ -139,13 +202,23 @@ class PrebuiltModuleListener : public ASTReaderListener {
 static bool visitPrebuiltModule(StringRef PrebuiltModuleFilename,
                                 CompilerInstance &CI,
                                 PrebuiltModuleFilesT &ModuleFiles,
-                                PrebuiltModuleVFSMapT &PrebuiltModuleVFSMap,
+                                PrebuiltModulesAttrsMap &PrebuiltModulesASTMap,
                                 DiagnosticsEngine &Diags) {
+
+  // Gather the set of stable directories to use as transitive dependencies are
+  // discovered.
+  llvm::SmallVector<StringRef> StableDirs;
+  std::string SysrootToUse(CI.getHeaderSearchOpts().Sysroot);
+  if (!SysrootToUse.empty() &&
+      (llvm::sys::path::root_directory(SysrootToUse) != SysrootToUse))
+    StableDirs = {SysrootToUse, CI.getHeaderSearchOpts().ResourceDir};
+
   // List of module files to be processed.
   llvm::SmallVector<std::string> Worklist;
-  PrebuiltModuleListener Listener(ModuleFiles, Worklist, PrebuiltModuleVFSMap,
+
+  PrebuiltModuleListener Listener(ModuleFiles, Worklist, PrebuiltModulesASTMap,
                                   CI.getHeaderSearchOpts(), CI.getLangOpts(),
-                                  Diags);
+                                  Diags, StableDirs);
 
   Listener.visitModuleFile(PrebuiltModuleFilename,
                            serialization::MK_ExplicitModule);
@@ -368,16 +441,18 @@ class DependencyScanningAction : public tooling::ToolAction {
     auto *FileMgr = ScanInstance.createFileManager(FS);
     ScanInstance.createSourceManager(*FileMgr);
 
-    // Store the list of prebuilt module files into header search options. This
-    // will prevent the implicit build to create duplicate modules and will
-    // force reuse of the existing prebuilt module files instead.
-    PrebuiltModuleVFSMapT PrebuiltModuleVFSMap;
+    // Store a mapping of prebuilt module files and their properties like header
+    // search options. This will prevent the implicit build to create duplicate
+    // modules and will force reuse of the existing prebuilt module files
+    // instead.
+    PrebuiltModulesAttrsMap PrebuiltModulesASTMap;
+
     if (!ScanInstance.getPreprocessorOpts().ImplicitPCHInclude.empty())
       if (visitPrebuiltModule(
               ScanInstance.getPreprocessorOpts().ImplicitPCHInclude,
               ScanInstance,
               ScanInstance.getHeaderSearchOpts().PrebuiltModuleFiles,
-              PrebuiltModuleVFSMap, ScanInstance.getDiagnostics()))
+              PrebuiltModulesASTMap, ScanInstance.getDiagnostics()))
         return false;
 
     // Create the dependency collector that will collect the produced
@@ -407,7 +482,7 @@ class DependencyScanningAction : public tooling::ToolAction {
     case ScanningOutputFormat::Full:
       MDC = std::make_shared<ModuleDepCollector>(
           Service, std::move(Opts), ScanInstance, Consumer, Controller,
-          OriginalInvocation, std::move(PrebuiltModuleVFSMap));
+          OriginalInvocation, std::move(PrebuiltModulesASTMap));
       ScanInstance.addDependencyCollector(MDC);
       break;
     }
diff --git a/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp b/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
index d715ef874e002..45e70bdcf68c5 100644
--- a/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
+++ b/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
@@ -39,10 +39,20 @@ const std::vector<std::string> &ModuleDeps::getBuildArguments() {
   return std::get<std::vector<std::string>>(BuildInfo);
 }
 
+void PrebuiltModuleASTAttrs::updateDependentsNotInStableDirs(
+    PrebuiltModulesAttrsMap &PrebuiltModulesMap) {
+  setInStableDir();
+  for (const auto Dep : ModuleFileDependents) {
+    if (!PrebuiltModulesMap[Dep].isInStableDir())
+      return;
+    PrebuiltModulesMap[Dep].updateDependentsNotInStableDirs(PrebuiltModulesMap);
+  }
+}
+
 static void
 optimizeHeaderSearchOpts(HeaderSearchOptions &Opts, ASTReader &Reader,
                          const serialization::ModuleFile &MF,
-                         const PrebuiltModuleVFSMapT &PrebuiltModuleVFSMap,
+                         const PrebuiltModulesAttrsMap &PrebuiltModulesASTMap,
                          ScanningOptimizations OptimizeArgs) {
   if (any(OptimizeArgs & ScanningOptimizations::HeaderSearch)) {
     // Only preserve search paths that were used during the dependency scan.
@@ -87,11 +97,13 @@ optimizeHeaderSearchOpts(HeaderSearchOptions &Opts, ASTReader &Reader,
           } else {
             // This is not an implicitly built module, so it may have different
             // VFS options. Fall back to a string comparison instead.
-            auto VFSMap = PrebuiltModuleVFSMap.find(MF->FileName);
-            if (VFSMap == PrebuiltModuleVFSMap.end())
+            auto PrebuiltModulePropIt =
+                PrebuiltModulesASTMap.find(MF->FileName);
+            if (PrebuiltModulePropIt == PrebuiltModulesASTMap.end())
               return;
             for (std::size_t I = 0, E = VFSOverlayFiles.size(); I != E; ++I) {
-              if (VFSMap->second.contains(VFSOverlayFiles[I]))
+              if (PrebuiltModulePropIt->second.getVFS().contains(
+                      VFSOverlayFiles[I]))
                 VFSUsage[I] = true;
             }
           }
@@ -157,32 +169,6 @@ static void optimizeCWD(CowCompilerInvocation &BuildInvocation, StringRef CWD) {
   }
 }
 
-/// Check a subset of invocation options to determine whether the current
-/// context can safely be considered as stable.
-static bool areOptionsInStableDir(CowCompilerInvocation &BuildInvocation,
-                                  const ArrayRef<StringRef> StableDirs) {
-  const auto &HSOpts = BuildInvocation.getHeaderSearchOpts();
-  assert(isPathInStableDir(StableDirs, HSOpts.Sysroot) &&
-         "Sysroots differ between module dependencies and current TU");
-
-  assert(isPathInStableDir(StableDirs, HSOpts.ResourceDir) &&
-         "ResourceDirs differ between module dependencies and current TU");
-
-  for (const auto &Entry : HSOpts.UserEntries) {
-    if (!Entry.IgnoreSysRoot)
-      continue;
-    if (!isPathInStableDir(StableDirs, Entry.Path))
-      return false;
-  }
-
-  for (const auto &SysPrefix : HSOpts.SystemHeaderPrefixes) {
-    if (!isPathInStableDir(StableDirs, SysPrefix.Prefix))
-      return false;
-  }
-
-  return true;
-}
-
 static std::vector<std::string> splitString(std::string S, char Separator) {
   SmallVector<StringRef> Segments;
   StringRef(S).split(Segments, Separator, /*MaxSplit=*/-1, /*KeepEmpty=*/false);
@@ -257,6 +243,29 @@ bool dependencies::isPathInStableDir(const ArrayRef<StringRef> Directories,
   });
 }
 
+bool dependencies::areOptionsInStableDir(const ArrayRef<StringRef> Directories,
+                                         const HeaderSearchOptions &HSOpts) {
+  assert(isPathInStableDir(Directories, HSOpts.Sysroot) &&
+         "Sysroots differ between module dependencies and current TU");
+
+  assert(isPathInStableDir(Directories, HSOpts.ResourceDir) &&
+         "ResourceDirs differ between module dependencies and current TU");
+
+  for (const auto &Entry : HSOpts.UserEntries) {
+    if (!Entry.IgnoreSysRoot)
+      continue;
+    if (!isPathInStableDir(Directories, Entry.Path))
+      return false;
+  }
+
+  for (const auto &SysPrefix : HSOpts.SystemHeaderPrefixes) {
+    if (!isPathInStableDir(Directories, SysPrefix.Prefix))
+      return false;
+  }
+
+  return true;
+}
+
 static CowCompilerInvocation
 makeCommonInvocationForModuleBuild(CompilerInvocation CI) {
   CI.resetNonModularOptions();
@@ -825,7 +834,7 @@ ModuleDepCollectorPP::handleTopLevelModule(const Module *M) {
                      ScanningOptimizations::VFS)))
               optimizeHeaderSearchOpts(BuildInvocation.getMutHeaderSearchOpts(),
                                        *MDC.ScanInstance.getASTReader(), *MF,
-    ...
[truncated]

@cyndyishida
Copy link
Member Author

From the previous PR:

This implementation adds some overhead for two reasons.

It deserializes out more content from prebuilt pcm files.
When a module path is fully realized, it traverses down into the leaf dependency and updates whether they are in the sysroot.
I measured this overhead on a large-ish objc project that uses pch's and toy examples against the macos sdk. I did not observe a noticeable difference.

@cyndyishida
Copy link
Member Author

ping

Comment on lines 5837 to 5840
size_t FilenameLen = ModuleDir.size() + Record[7] + 1;
auto Filename = ResolveImportedPath(PathBuf, Blob, ModuleDir);
StringRef FilenameAsRequested = Filename->substr(0, FilenameLen);
StringRef ExternalFilename = Filename->substr(FilenameLen);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain how this is supposed to work? IIRC Blob is the concatenation of the as-requested name and the name, so this doesn't make sense to me. (To be frank, even the original code seems incorrect.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC Blob is the concatenation of the as-requested name and the name.

That is correct.

Basically Record[7] encodes the length of the basename (or filename without the full path) of the vfs version of the file. https://github.com/llvm/llvm-project/blob/main/clang/lib/Serialization/ASTWriter.cpp#L1902

ModuleDir is the directory of the vfs-version of the directory those vfs mapped files live in. https://github.com/llvm/llvm-project/pull/132237/files#diff-c61a3cce4bfa099b5af032fa83cbf1563f0af4bf58dc112b39571d74b6b681c1R5784

With that combination I derived the full length of of the vfs file path. The external path is the rest.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, got it.

What's confusing to me is that Filename is essentially "<ModuleDir>/<FilenameAsRequested><Filename>", which doesn't mean anything. And ExternalFilename is just <Filename>, which is unresolved.

Why don't we split Blob like we do elsewhere (Blob.substr(0, AsRequestedLength) and Blob.substr(AsRequestedLength)) and call ResolveImportedPath() on both?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied your suggestion except for calling ResolveImportedPath on both parts of the substring. Let me know if I am missing something, but the external path should always be absolute, so I think it would be unnecessary. It also reduces some logic of needing to copy strings since

The return value must go out of scope before the next call to \c ResolveImportedPath.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of these go through PreparePathForOutput() in ASTWriter, so both of these can be relative.

* Also add a check for absolute paths in `isPathInStableDir`
Comment on lines +5859 to +5866
std::string ResolvedFilenameStr;
if (!UnresolvedFilename.empty()) {
SmallString<0> FilenameBuf;
FilenameBuf.reserve(256);
auto ResolvedFilename =
ResolveImportedPath(FilenameBuf, UnresolvedFilename, ModuleDir);
ResolvedFilenameStr = ResolvedFilename->str();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The visitor API makes it sound like it'll pass Filename even if it's the same as FilenameAsRequested (which is what I'd expect it to), but here we pass an empty string. Why?

Also, we're performing unnecessary allocations. This matters with the amount of files the scanner is processing. The ResolvedFilenameAsRequested->str() allocation is unnecessary when UnresolvedFilename is empty (i.e. same as UnresolvedFilenameAsRequested) - you can just reuse the same buffer for both.

When the two paths differ, shuffling data between std::string and SmallString is also unnecessary.

I have something like this in mind:

auto [UnresolvedFilenameAsRequested, UnresolvedFilename] =
    getUnresolvedInputFilenames(Record, Blob);
auto TmpPathBufRef = ResolveImportedPath(
    PathBuf, UnresolvedFilenameAsRequested, ModuleDir); // no allocation (amortized)
StringRef FilenameAsRequested = *TmpPathBufRef;
std::string FilenameAsRequestedBuf;
StringRef Filename;
if (UnresolvedFilename.empty())
  Filename = FilenameAsRequested;
else {
  FilenameAsRequestedBuf = FilenameAsRequested->str(); // allocation (often elided)
  FilenameAsRequested = FilenameAsRequestedBuf;
  TmpPathBufRef =
      ResolveImportedPath(PathBuf, UnresolvedFilename, ModuleDir); // no allocation (amortized)
  Filename = *TmpPathBufRef;
}
shouldContinue = Listener.visitInputFile(
    FilenameAsRequested, Filename, isSystemFile, Overridden,
    /*IsExplicitModule=*/false);

Alternatively, we can have two "global" PathBuf strings so that we can amortize even the often-elided allocation:

auto [UnresolvedFilenameAsRequested, UnresolvedFilename] =
    getUnresolvedInputFilenames(Record, Blob);
auto FilenameAsRequestedBuf = ResolveImportedPath(
    PathBuf1, UnresolvedFilenameAsRequested, ModuleDir); // no allocation (amortized)
std::optional<TemporarilyOwnedStringRef> FilenameBuf;
StringRef Filename;
if (UnresolvedFilename.empty())
  Filename = *FilenameAsRequestedBuf;
else {
  FilenameBuf = ResolveImportedPath(
    PathBuf2, UnresolvedFilename, ModuleDir); // no allocation (amortized)
  Filename = **FilenameBuf;
}
shouldContinue = Listener.visitInputFile(
    *FilenameAsRequestedBuf, Filename, isSystemFile, Overridden,
    /*IsExplicitModule=*/false);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants