Get rid of Hash::dummy from BinaryCacheStore #3935

Merged
129 changes: 80 additions & 49 deletions src/libstore/binary-cache-store.cc
@@ -142,17 +142,10 @@ struct FileSource : FdSource
}
};

void BinaryCacheStore::addToStore(const ValidPathInfo & info, Source & narSource,
RepairFlag repair, CheckSigsFlag checkSigs)
ref<const ValidPathInfo> BinaryCacheStore::addToStoreCommon(
Source & narSource, RepairFlag repair, CheckSigsFlag checkSigs,
std::function<ValidPathInfo(HashResult)> mkInfo)
{
assert(info.narSize);

if (!repair && isValidPath(info.path)) {
// FIXME: copyNAR -> null sink
narSource.drain();
return;
}

auto [fdTemp, fnTemp] = createTempFile();

AutoDelete autoDelete(fnTemp);
@@ -162,23 +155,24 @@ void BinaryCacheStore::addToStore(const ValidPathInfo & info, Source & narSource
/* Read the NAR simultaneously into a CompressionSink+FileSink (to
write the compressed NAR to disk), into a HashSink (to get the
NAR hash), and into a NarAccessor (to get the NAR listing). */
HashSink fileHashSink(htSHA256);
HashSink fileHashSink { htSHA256 };
std::shared_ptr<FSAccessor> narAccessor;
HashSink narHashSink { htSHA256 };
{
FdSink fileSink(fdTemp.get());
Review comment (Member):

This is beyond the scope of this PR, but since we're revisiting these store API methods it's worth noting that we could optimize away the tmpfile if we know a little bit more in advance. Although optimizing away a tmpfile seems unimpressive, it changes the time taken from O(sum(steps)) to O(max(steps)), which is significant when compression and upload take similar amounts of time.

When the file is known to be small (low hanging fruit)

Add a size parameter to addToStoreCommon or use a fancy sink that only writes to file when it crosses a limit. This is similar to what LocalStore::addToStoreFromDump does.

When we know the nar hash in advance

For http binary caches this does require us to change the binary cache filenames to match uncompressed hashes, which seems to be equivalent and can only result in one-time duplication in existing caches when new paths are uploaded.
I don't know yet how IPFS caches fit into this picture, but if those can compress after hashing, this would be beneficial.
Another reason to do this is so we don't need to compress before we can decide to reuse an available nar file.

In this case it does make sense to have both addToStore(const ValidPathInfo & info, .....) and addToStore(....., std::function<ValidPathInfo(HashResult)>), where the former can have a default implementation in terms of the latter.
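
To make the "fancy sink that only writes to file when it crosses a limit" idea above concrete, here is a minimal sketch (inside namespace nix, as in binary-cache-store.cc). SpillSink and its threshold handling are made up for this example; the Sink, createTempFile and writeFull signatures are approximated from this codebase and may differ slightly:

#include "serialise.hh" // Sink
#include "util.hh"      // AutoCloseFD, Path, createTempFile, writeFull
#include <optional>
#include <string>

/* Hypothetical SpillSink: buffer the NAR in memory and only create the
   temporary file once a size threshold is crossed, so small NARs never
   touch the disk. */
struct SpillSink : Sink
{
    size_t threshold;
    std::string buffer;            /* used while below the threshold */
    std::optional<AutoCloseFD> fd; /* created lazily on spill */
    Path fnTemp;

    SpillSink(size_t threshold) : threshold(threshold) { }

    void operator () (const unsigned char * data, size_t len) override
    {
        if (!fd && buffer.size() + len > threshold) {
            /* Spill point: move the buffered prefix to a temp file. */
            auto tmp = createTempFile();
            fd = std::move(tmp.first);
            fnTemp = tmp.second;
            writeFull(fd->get(), buffer);
            buffer.clear();
        }
        if (fd)
            writeFull(fd->get(), data, len);
        else
            buffer.append((const char *) data, len);
    }
};

addToStoreCommon could then upload straight from buffer when fd was never created, and fall back to the current temp-file path otherwise.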

Reply (Member):

I decided not to try to optimize away the temporary file for small NARs because the overhead is likely to be insignificant compared to stuff like HTTP requests.

TeeSink teeSink(fileSink, fileHashSink);
auto compressionSink = makeCompressionSink(compression, teeSink);
TeeSource teeSource(narSource, *compressionSink);
TeeSink teeSinkCompressed { fileSink, fileHashSink };
auto compressionSink = makeCompressionSink(compression, teeSinkCompressed);
TeeSink teeSinkUncompressed { *compressionSink, narHashSink };
TeeSource teeSource { narSource, teeSinkUncompressed };
narAccessor = makeNarAccessor(teeSource);
compressionSink->finish();
fileSink.flush();
}

auto now2 = std::chrono::steady_clock::now();

auto info = mkInfo(narHashSink.finish());
auto narInfo = make_ref<NarInfo>(info);
narInfo->narSize = info.narSize;
narInfo->narHash = info.narHash;
narInfo->compression = compression;
auto [fileHash, fileSize] = fileHashSink.finish();
narInfo->fileHash = fileHash;
@@ -300,6 +294,41 @@ void BinaryCacheStore::addToStore(const ValidPathInfo & info, Source & narSource
writeNarInfo(narInfo);

stats.narInfoWrite++;

return narInfo;
}

void BinaryCacheStore::addToStore(const ValidPathInfo & info, Source & narSource,
RepairFlag repair, CheckSigsFlag checkSigs)
{
if (!repair && isValidPath(info.path)) {
// FIXME: copyNAR -> null sink
Review comment (Member):

We don't need to parse the NAR to determine the end if we make the caller responsible for ending narSource. That's what addCAToStore is doing.

Suggested change:
- // FIXME: copyNAR -> null sink
+ // FIXME: make sure all callers truncate `narSource`

nix-store --import comes to mind. It will have to parse the NAR because the import/export format doesn't have a way to determine the end by simpler means.

Reply (Member, author):

I think I agree it's better to make the caller responsible, but I'm a bit wary of changing the direction of this FIXME, as it and this code already existed; I just moved it here.

I'll let @edolstra decide :).
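
(An aside, not part of the diff: a minimal sketch of the "copyNAR -> null sink" direction the FIXME describes, i.e. parse exactly one NAR and discard it instead of draining narSource to exhaustion. NullSink is a hypothetical helper for this example, and the exact Sink signature may differ between Nix versions.)

/* Hypothetical replacement for the narSource.drain() call below: */
struct NullSink : Sink
{
    void operator () (const unsigned char * data, size_t len) override { /* discard */ }
};

NullSink nullSink;
copyNAR(narSource, nullSink); /* stops exactly at the end of the NAR */
return;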

narSource.drain();
return;
}

addToStoreCommon(narSource, repair, checkSigs, {[&](HashResult nar) {
/* FIXME reinstate these, once we can correctly do hash modulo sink as
needed. We need to throw here in case we uploaded a corrupted store path. */
// assert(info.narHash == nar.first);
// assert(info.narSize == nar.second);
return info;
}});
}

StorePath BinaryCacheStore::addToStoreFromDump(Source & dump, const string & name,
FileIngestionMethod method, HashType hashAlgo, RepairFlag repair)
{
if (method != FileIngestionMethod::Recursive || hashAlgo != htSHA256)
unsupported("addToStoreFromDump");
return addToStoreCommon(dump, repair, CheckSigs, [&](HashResult nar) {
ValidPathInfo info {
makeFixedOutputPath(method, nar.first, name),
nar.first,
};
info.narSize = nar.second;
return info;
})->path;
}

bool BinaryCacheStore::isValidPathUncached(const StorePath & storePath)
@@ -367,50 +396,52 @@ void BinaryCacheStore::queryPathInfoUncached(const StorePath & storePath,
StorePath BinaryCacheStore::addToStore(const string & name, const Path & srcPath,
FileIngestionMethod method, HashType hashAlgo, PathFilter & filter, RepairFlag repair)
{
// FIXME: some cut&paste from LocalStore::addToStore().
/* FIXME: Make BinaryCacheStore::addToStoreCommon support
non-recursive+sha256 so we can just use the default
implementation of this method in terms of addToStoreFromDump. */

/* Read the whole path into memory. This is not a very scalable
method for very large paths, but `copyPath' is mainly used for
small files. */
StringSink sink;
std::optional<Hash> h;
HashSink sink { hashAlgo };
if (method == FileIngestionMethod::Recursive) {
dumpPath(srcPath, sink, filter);
h = hashString(hashAlgo, *sink.s);
} else {
auto s = readFile(srcPath);
dumpString(s, sink);
h = hashString(hashAlgo, s);
readFile(srcPath, sink);
}
auto h = sink.finish().first;

ValidPathInfo info {
makeFixedOutputPath(method, *h, name),
Hash::dummy, // Will be fixed in addToStore, which recomputes nar hash
};

auto source = StringSource { *sink.s };
addToStore(info, source, repair, CheckSigs);

return std::move(info.path);
auto source = sinkToSource([&](Sink & sink) {
dumpPath(srcPath, sink, filter);
});
return addToStoreCommon(*source, repair, CheckSigs, [&](HashResult nar) {
ValidPathInfo info {
makeFixedOutputPath(method, h, name),
nar.first,
};
info.narSize = nar.second;
info.ca = FixedOutputHash {
.method = method,
.hash = h,
};
return info;
})->path;
}

StorePath BinaryCacheStore::addTextToStore(const string & name, const string & s,
const StorePathSet & references, RepairFlag repair)
{
ValidPathInfo info {
computeStorePathForText(name, s, references),
Hash::dummy, // Will be fixed in addToStore, which recomputes nar hash
};
info.references = references;

if (repair || !isValidPath(info.path)) {
StringSink sink;
dumpString(s, sink);
auto source = StringSource { *sink.s };
addToStore(info, source, repair, CheckSigs);
}

return std::move(info.path);
auto textHash = hashString(htSHA256, s);
auto path = makeTextPath(name, textHash, references);

if (!repair && isValidPath(path))
return path;

auto source = StringSource { s };
return addToStoreCommon(source, repair, CheckSigs, [&](HashResult nar) {
ValidPathInfo info { path, nar.first };
info.narSize = nar.second;
info.ca = TextHash { textHash };
info.references = references;
return info;
})->path;
}

ref<FSAccessor> BinaryCacheStore::getFSAccessor()
7 changes: 7 additions & 0 deletions src/libstore/binary-cache-store.hh
@@ -72,6 +72,10 @@ private:

void writeNarInfo(ref<NarInfo> narInfo);

ref<const ValidPathInfo> addToStoreCommon(
Source & narSource, RepairFlag repair, CheckSigsFlag checkSigs,
std::function<ValidPathInfo(HashResult)> mkInfo);

public:

bool isValidPathUncached(const StorePath & path) override;
@@ -85,6 +89,9 @@ public:
void addToStore(const ValidPathInfo & info, Source & narSource,
RepairFlag repair, CheckSigsFlag checkSigs) override;

StorePath addToStoreFromDump(Source & dump, const string & name,
FileIngestionMethod method, HashType hashAlgo, RepairFlag repair) override;

StorePath addToStore(const string & name, const Path & srcPath,
FileIngestionMethod method, HashType hashAlgo,
PathFilter & filter, RepairFlag repair) override;
4 changes: 1 addition & 3 deletions src/libstore/store-api.hh
@@ -454,9 +454,7 @@ public:
// FIXME: remove?
virtual StorePath addToStoreFromDump(Source & dump, const string & name,
FileIngestionMethod method = FileIngestionMethod::Recursive, HashType hashAlgo = htSHA256, RepairFlag repair = NoRepair)
{
throw Error("addToStoreFromDump() is not supported by this store");
}
{ unsupported("addToStoreFromDump"); }

/* Like addToStore, but the contents written to the output path is
a regular file containing the given string. */