Import based on compilation outputs (#53)
This PR adds the capability to import specific output files from a compilation. The goals are to improve performance and to experiment with using a "global shared index" while remote caching a subset of it for each library. The GitHub issue bazelbuild/rules_swift#561 has some broader discussion of this.

Index while building is performant in clang and swift because it
conditionally writes data based on the files already present in a global
index cache, which is reused across all clang/swift invocations in the
build. The 3-5% performance overhead reported in the "Index while building"
whitepaper hinged on the fact that it would be using a global index cache.
With the per-library index that rules_swift uses to implement this feature,
"Index while building" increased build times by 300% in my testing.

Longer term, flags could be added to clang and swift indexing to make them
aware of remote compilation and caching; this is a quick hack in case that
never happens.

Remote build possibilities:

The plan is to use a global index during compilation to fix compilation
performance and to minimize the O(M imports * N modules) disk usage. I'd
then use this code to import a subset of the global index material into
the remote cache by putting it in `bazel-out` at `swift_library`'s
"index-store" path. Eventually, a feature following the same pattern can
be added to rules_cc for .m and .cpp files.
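A toy model of the global-cache argument and the O(M imports * N modules) disk usage, not code from index-import or from the compilers: each library's compile emits one record per module it imports plus one for its own sources, and record files are named by a hash of their contents, so a writer can skip any record whose name is already in the store. The library names and the use of `std::hash` as the content hash are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: library name -> contents of the records its compile
// emits (its own sources plus the modules it imports).
using RecordsByLibrary = std::map<std::string, std::vector<std::string>>;

// Record files are named after a hash of their contents. std::hash is only a
// stand-in for whatever hash the real index store uses.
std::string recordName(const std::string &contents) {
  return std::to_string(std::hash<std::string>{}(contents));
}

// Per-library index stores: every library writes all of its records into a
// store of its own, so a record shared by N libraries is written N times.
std::size_t recordsWrittenPerLibrary(const RecordsByLibrary &libraries) {
  std::size_t written = 0;
  for (const auto &library : libraries) {
    std::set<std::string> store;  // this library's private store
    for (const auto &contents : library.second) {
      if (store.insert(recordName(contents)).second) {
        ++written;
      }
    }
  }
  return written;
}

// One global store reused by every invocation: a record is written only if a
// record with the same name is not already present.
std::size_t recordsWrittenShared(const RecordsByLibrary &libraries) {
  std::set<std::string> store;
  std::size_t written = 0;
  for (const auto &library : libraries) {
    for (const auto &contents : library.second) {
      if (store.insert(recordName(contents)).second) {
        ++written;
      }
    }
  }
  return written;
}
```

For three libraries that each import the same two modules, the per-library stores hold 9 records while the shared store holds 5. In general the first count grows like M × N, while the second only grows by each library's own records.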

Local build possibilities:

I plan to import `swift_library` remote cache hits from `bazel-out` into
Xcode's index.

However, importing output files is a useful capability even if that
doesn't play out: if we orient index-import around output files, it never
has to enumerate and process an entire index again. Another build-related
program, run as an aspect or via BEP, could read the outputs and then
invoke index-import with them.
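As a sketch of that workflow, a wrapper fed by an aspect or by BEP could turn a list of compiler outputs into an index-import command line. The `-import-output-file` option and the positional `<input-indexstores>... <output-indexstore>` order come from the diff below; the helper name and example paths are hypothetical, and `-remap` arguments (no longer required once `cl::OneOrMore` is dropped) are omitted.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: builds the argv for one index-import invocation that
// imports only the units belonging to the given compiler output files.
std::vector<std::string> buildIndexImportArgs(
    const std::vector<std::string> &outputFiles,
    const std::vector<std::string> &inputIndexStores,
    const std::string &outputIndexStore) {
  std::vector<std::string> args{"index-import"};
  // -import-output-file is a cl::list, so it may be repeated once per file.
  for (const auto &file : outputFiles) {
    args.push_back("-import-output-file=" + file);
  }
  // Positional arguments: input index stores first, output store last.
  args.insert(args.end(), inputIndexStores.begin(), inputIndexStores.end());
  args.push_back(outputIndexStore);
  return args;
}
```

index-import looks each unit up from its output path (`getUnitPathForOutputFile(unitDirectory, normalizePath(path), ...)` in the diff), so a wrapper like this should presumably pass the same output paths the compiler was given.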
jerrymarino authored Mar 12, 2021
1 parent 44fbe01 commit d90eb46
Showing 1 changed file with 90 additions and 39 deletions: index-import.cpp
@@ -24,14 +24,16 @@ using namespace clang;
 using namespace clang::index;
 using namespace clang::index::writer;

-static cl::list<std::string> PathRemaps("remap", cl::OneOrMore,
+static cl::list<std::string> PathRemaps("remap",
                                         cl::desc("Path remapping substitution"),
                                         cl::value_desc("regex=replacement"));
 static cl::alias PathRemapsAlias("r", cl::aliasopt(PathRemaps));

 static cl::list<std::string> InputIndexPaths(cl::Positional, cl::OneOrMore,
                                              cl::desc("<input-indexstores>"));

+static cl::list<std::string> RemapFilePaths("import-output-file",
+                                            cl::desc("import-output-file="));
 static cl::opt<std::string> OutputIndexPath(cl::Positional, cl::Required,
                                             cl::desc("<output-indexstore>"));

@@ -175,16 +177,49 @@ static bool isUnitUpToDate(StringRef unitsPath, StringRef outputFile,
   return *isUpToDateOpt;
 }

+// Append the path of a record inside of an index
+void appendInteriorRecordPath(StringRef RecordName,
+                              SmallVectorImpl<char> &PathBuf) {
+  // To avoid putting a huge number of files into the records directory, it creates
+  // subdirectories based on the last 2 characters from the hash.
+  // Note: the actual record name is a function of the bits in the record
+  StringRef hash2chars = RecordName.substr(RecordName.size()-2);
+  sys::path::append(PathBuf, hash2chars);
+  sys::path::append(PathBuf, RecordName);
+}
+
+static bool cloneRecord(StringRef from, StringRef to) {
+  // Two record files of the same name are guaranteed to have the same
+  // contents, because the filename contains a hash of its contents. If the
+  // destination record file already exists, no action needs to be taken.
+  if (fs::exists(to)) {
+    return true;
+  }
+
+  // With swift-5.1, fs::copy_file supports cloning. Until then, use copyfile.
+  int failed =
+      copyfile(from.str().c_str(), to.str().c_str(), nullptr, COPYFILE_CLONE);
+
+  // In parallel mode we might be racing against other threads trying to create
+  // the same record. To handle this, just silently drop file exists errors.
+  return (failed == 0 || errno == EEXIST);
+}
+
+
 // Returns None if the Unit file is already up to date
 static Optional<IndexUnitWriter>
-remapUnit(StringRef outputUnitsPath, StringRef inputUnitPath,
+importUnit(StringRef outputUnitsPath, StringRef inputUnitPath,
+           StringRef outputRecordsPath, StringRef inputRecordsPath,
           const std::unique_ptr<IndexUnitReader> &reader,
           const Remapper &remapper, FileManager &fileMgr,
           ModuleNameScope &moduleNames) {
   // The set of remapped paths.
   auto workingDir = remapper.remap(reader->getWorkingDirectory());
   auto outputFile = remapper.remap(reader->getOutputFile());

+  // Cloning records when we've got an output records path
+  const auto cloneDepRecords = !outputRecordsPath.empty();
+
   if (Incremental) {
     // Check if the unit file is already up to date
     SmallString<256> remappedOutputFilePath;
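`appendInteriorRecordPath` shards records into subdirectories named after the last two characters of the record name. A standalone approximation using `std::filesystem` instead of LLVM's `sys::path::append` on a `SmallVectorImpl<char>`; the helper name and the record name in the example are made up:

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Mirrors the layout produced by appendInteriorRecordPath in the diff:
// <recordsDir>/<last 2 characters of the name>/<name>.
fs::path interiorRecordPath(const fs::path &recordsDir,
                            const std::string &recordName) {
  // substr(size() - 2) is only meaningful for names of at least 2 characters.
  assert(recordName.size() >= 2);
  return recordsDir / recordName.substr(recordName.size() - 2) / recordName;
}
```

Because the shard is derived from the name alone, the input and output stores use the same relative path for a given record, which is what lets `importUnit` compute both paths from `info.UnitOrRecordName`.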
@@ -214,6 +249,11 @@ remapUnit(StringRef outputUnitsPath, StringRef inputUnitPath,
       sysrootPath, moduleNames.getModuleInfo);

   reader->foreachDependency([&](const IndexUnitReader::DependencyInfo &info) {
+    SmallString<128> inputRecordPath;
+    SmallString<128> outputRecordPath;
+    SmallString<128> outputRecordInterDir;
+    std::error_code createRecordDirFailed;
+
     const auto name = info.UnitOrRecordName;
     const auto moduleNameRef = moduleNames.getReference(info.ModuleName);
     const auto isSystem = info.IsSystem;
@@ -237,6 +277,21 @@ remapUnit(StringRef outputUnitsPath, StringRef inputUnitPath,
       break;
     }
     case IndexUnitReader::DependencyKind::Record:
+      if (cloneDepRecords) {
+        sys::path::append(outputRecordPath, outputRecordsPath);
+        appendInteriorRecordPath(info.UnitOrRecordName, outputRecordPath);
+
+        // Compute/create the new interior directory by dropping the file name
+        outputRecordInterDir = outputRecordPath;
+        sys::path::remove_filename(outputRecordInterDir);
+        createRecordDirFailed = fs::create_directory(outputRecordInterDir);
+        if (createRecordDirFailed && createRecordDirFailed != std::errc::file_exists) {
+          errs() << "error: failed create output record dir" << outputRecordInterDir << "\n";
+        }
+        sys::path::append(inputRecordPath, inputRecordsPath);
+        appendInteriorRecordPath(info.UnitOrRecordName, inputRecordPath);
+        cloneRecord(StringRef(inputRecordPath), StringRef(outputRecordPath));
+      }
       writer.addRecordFile(name, file, isSystem, moduleNameRef);
       break;
     case IndexUnitReader::DependencyKind::File:
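The `DependencyKind::Record` branch in this hunk creates the shard directory in the output store, treats an already existing directory as fine, and then clones the record; an existing destination record counts as success because record names are content hashes. A portable approximation with `std::filesystem`, assuming a plain copy is an acceptable stand-in for macOS `copyfile` with `COPYFILE_CLONE` (the helper name is hypothetical):

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: copy one record to <outputRecords>/<shard>/<name>.
// Returns true if the destination record exists when the call finishes.
bool cloneRecordInto(const fs::path &inputRecord, const fs::path &outputRecord) {
  std::error_code ec;
  const fs::path shardDir = outputRecord.parent_path();
  if (!shardDir.empty()) {
    // No error is reported when the directory already exists, so concurrent
    // importers creating the same shard do not fail each other.
    fs::create_directories(shardDir, ec);
    if (ec) {
      return false;
    }
  }
  // Same name implies same contents, so an existing file is left untouched.
  fs::copy_file(inputRecord, outputRecord, fs::copy_options::skip_existing, ec);
  if (!ec) {
    return true;
  }
  // Another thread may have created the record in the meantime; what matters
  // is whether it is present now.
  std::error_code existsError;
  return fs::exists(outputRecord, existsError);
}
```

`skip_existing` makes a repeated import of the same record a no-op, which matches the "same name, same contents" reasoning in `cloneRecord`.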
@@ -259,23 +314,6 @@ remapUnit(StringRef outputUnitsPath, StringRef inputUnitPath,
   return writer;
 }

-static bool cloneRecord(StringRef from, StringRef to) {
-  // Two record files of the same name are guaranteed to have the same
-  // contents, because the filename contains a hash of its contents. If the
-  // destination record file already exists, no action needs to be taken.
-  if (fs::exists(to)) {
-    return true;
-  }
-
-  // With swift-5.1, fs::copy_file supports cloning. Until then, use copyfile.
-  int failed =
-      copyfile(from.str().c_str(), to.str().c_str(), nullptr, COPYFILE_CLONE);
-
-  // In parallel mode we might be racing against other threads trying to create
-  // the same record. To handle this, just silently drop file exists errors.
-  return (failed == 0 || errno == EEXIST);
-}
-
 static bool cloneRecords(StringRef recordsDirectory,
                          const std::string &inputIndexPath,
                          const std::string &outputIndexPath) {
@@ -341,45 +379,32 @@ static bool remapIndex(const Remapper &remapper,
   path::append(recordsDirectory, InputIndexPath, "v5", "records");
   SmallString<256> outputUnitDirectory;
   path::append(outputUnitDirectory, OutputIndexPath, "v5", "units");
+  SmallString<256> outputRecordsDirectory;
+  path::append(outputRecordsDirectory, OutputIndexPath, "v5", "records");

   if (not fs::is_directory(unitDirectory) ||
       not fs::is_directory(recordsDirectory)) {
     errs() << "error: invalid index store directory " << InputIndexPath << "\n";
     return false;
   }
-  bool success = true;

-  if (not cloneRecords(recordsDirectory, InputIndexPath, outputIndexPath)) {
-    success = false;
-  }
-
+  bool success = true;
   FileSystemOptions fsOpts;
   FileManager fileMgr{fsOpts};

-  std::error_code dirError;
-  fs::directory_iterator dir{unitDirectory, dirError};
-  fs::directory_iterator end;
-  while (dir != end && !dirError) {
-    const auto unitPath = dir->path();
-    dir.increment(dirError);
-
-    if (unitPath.empty()) {
-      // The directory iterator returns a single empty path, ignore it.
-      continue;
-    }
-
+  auto handleUnitPath = [&](StringRef unitPath, StringRef outputRecordsPath_) {
     std::string unitReadError;
     auto reader = IndexUnitReader::createWithFilePath(unitPath, unitReadError);
     if (not reader) {
       errs() << "error: failed to read unit file " << unitPath << "\n"
              << unitReadError;
       success = false;
-      continue;
+      return;
     }

     ModuleNameScope moduleNames;
-    auto writer = remapUnit(outputUnitDirectory, unitPath, reader, remapper,
-                            fileMgr, moduleNames);
+    auto writer = importUnit(outputUnitDirectory, unitPath, outputRecordsPath_,
+                             recordsDirectory, reader, remapper, fileMgr, moduleNames);

     if (writer.hasValue()) {
       std::string unitWriteError;
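In this hunk the body of the old `while` loop becomes the `handleUnitPath` lambda so that two callers can share it: the loop over the whole units directory and the loop over `-import-output-file` paths. Inside a lambda `continue` is no longer valid, so an early `return` takes its place, and failures are recorded through the captured `success` flag. A reduced sketch of that pattern, where `processUnit` and `importUnits` are hypothetical stand-ins for the read/import/write work and for `remapIndex`:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for reading one unit; here a unit "fails to read"
// when its path is empty.
bool processUnit(const std::string &unitPath) { return !unitPath.empty(); }

// Returns false if any handled unit failed, like remapIndex's `success`.
bool importUnits(const std::vector<std::string> &selectedUnits,
                 const std::vector<std::string> &allUnits) {
  bool success = true;

  // The former loop body, capturing `success` by reference.
  auto handleUnitPath = [&](const std::string &unitPath) {
    if (!processUnit(unitPath)) {
      success = false;
      return;  // was `continue;` while this code was the body of a loop
    }
    // On success the real code goes on to import and write the unit.
  };

  // Output-file mode: only the requested units, then stop.
  if (!selectedUnits.empty()) {
    for (const auto &unit : selectedUnits) {
      handleUnitPath(unit);
    }
    return success;
  }

  // Whole-index mode: every unit, as the original while loop did.
  for (const auto &unit : allUnits) {
    handleUnitPath(unit);
  }
  return success;
}
```

Keeping a single lambda means both modes report read failures the same way, and the early `return` in output-file mode is what keeps the rest of the index from being touched.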
@@ -389,6 +414,32 @@ static bool remapIndex(const Remapper &remapper,
         success = false;
       }
     }
+  };
+
+  // Map over the file paths that the user provided
+  if (RemapFilePaths.size()) {
+    for (auto & path : RemapFilePaths) {
+      SmallString<256> outPath;
+      getUnitPathForOutputFile(unitDirectory, normalizePath(path), outPath, fileMgr);
+      handleUnitPath(outPath.c_str(), outputRecordsDirectory);
+    }
+    return success;
+  }
+
+  // This batch clones records in the entire index. If we're importing individual
+  // ouput files we don't want this.
+  if (not cloneRecords(recordsDirectory, InputIndexPath, outputIndexPath)) {
+    success = false;
+  }
+
+  // Process and map the entire index directory
+  std::error_code dirError;
+  fs::directory_iterator dir{unitDirectory, dirError};
+  fs::directory_iterator end;
+  while (dir != end && !dirError) {
+    const auto unitPath = dir->path();
+    dir.increment(dirError);
+    handleUnitPath(unitPath, "");
+  }
   }

   if (dirError) {
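The practical difference between the two modes is which records reach the output store. With `-import-output-file`, records are cloned one dependency at a time inside `importUnit` for the selected units only; otherwise `cloneRecords` batch copies the whole records directory. A small model of that difference, where `IndexStore` and `recordsToClone` are hypothetical and the unit-to-record data is made up rather than read from a real index:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of an index store: the record files in v5/records and,
// for each unit, the record names it lists as dependencies.
struct IndexStore {
  std::set<std::string> records;
  std::map<std::string, std::vector<std::string>> units;
};

// Which records end up in the output store for a given invocation.
std::set<std::string> recordsToClone(const IndexStore &store,
                                     const std::vector<std::string> &selectedUnits) {
  // Whole-index mode: the entire records directory is copied.
  if (selectedUnits.empty()) {
    return store.records;
  }
  // Output-file mode: only dependencies of the selected units are cloned.
  std::set<std::string> cloned;
  for (const auto &unit : selectedUnits) {
    auto it = store.units.find(unit);
    if (it == store.units.end()) {
      continue;  // no such unit: the real tool would fail to read it
    }
    cloned.insert(it->second.begin(), it->second.end());
  }
  return cloned;
}
```

In output-file mode the cost scales with the dependencies of the requested units rather than with the size of the whole (possibly global) index.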