release-19.2: batcheval: use write-batches to apply small SSTables #41768

dt · 2019-10-21T16:25:57Z

Backport 1/1 commits from #41705.

/cc @cockroachdb/release

Adding very small SSTables is inefficient. The benefit of adding the raw SST, namely avoiding per-key overhead, is small for small SSTs, while the fixed costs of adding a file (triggering flushes, increasing the number of files that need to be compacted, etc.) stay the same. Below some size, a file can therefore be more expensive per key than writing its contents via the regular write path.
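To make the per-key argument concrete, here is a toy cost model in Go. It is only a sketch: the function names and the cost numbers are made up for illustration and are not measurements or code from this patch. It assumes file ingestion costs a fixed amount per file plus a small per-key amount, while the regular write path costs a larger per-key amount with no fixed part.

```go
package main

import "fmt"

// fileCostPerKey returns the amortized per-key cost of ingesting numKeys keys
// as one file: the fixed per-file cost spread over the keys, plus the (small)
// per-key cost of the file path. All units are arbitrary and hypothetical.
func fileCostPerKey(fixedFileCost, perKeyFileCost float64, numKeys int) float64 {
	return fixedFileCost/float64(numKeys) + perKeyFileCost
}

// cheaperAsWrites reports whether the regular write path is cheaper per key
// than adding a file, under this toy model.
func cheaperAsWrites(fixedFileCost, perKeyFileCost, perKeyWriteCost float64, numKeys int) bool {
	return perKeyWriteCost < fileCostPerKey(fixedFileCost, perKeyFileCost, numKeys)
}

func main() {
	// Illustrative numbers only: break-even is at 1000/n + 1 = 5, i.e. n = 250.
	const fixed, fileKey, writeKey = 1000.0, 1.0, 5.0
	for _, n := range []int{10, 100, 1000, 10000} {
		fmt.Printf("keys=%d file/key=%.2f write/key=%.2f writesCheaper=%t\n",
			n, fileCostPerKey(fixed, fileKey, n), writeKey,
			cheaperAsWrites(fixed, fileKey, writeKey, n))
	}
}
```

With these made-up constants the write path wins below 250 keys and the file path wins above it; the real break-even depends on the storage engine and workload.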

This patch gives the caller control of how SSTs are ingested — either via the usual direct AddFile or by constructing a normal write-batch instead.

Requests marked for write-based ingestion can additionally skip the back-pressure mechanisms added for SSTable additions: since we’re not creating files directly, the normal write back pressure still applies, and these requests don’t risk bloating the file count with tiny files.
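The control flow described above can be sketched roughly as follows. This is a self-contained toy, not the code in this patch: `ingestRequest`, `ingestAsWrites`, `engine`, and the `backpressureWaits` counter are all hypothetical names standing in for the real request, flag, storage engine, and SSTable back-pressure mechanism.

```go
package main

import "fmt"

// kv is one key/value pair taken from the SST payload.
type kv struct{ key, value string }

// ingestRequest is a hypothetical stand-in for a bulk-ingest request.
type ingestRequest struct {
	kvs            []kv
	ingestAsWrites bool // hypothetical flag: caller asks for the write-batch path
}

// engine is a toy storage engine that records how data arrived.
type engine struct {
	data              map[string]string
	files             int // number of files added directly
	backpressureWaits int // times the SSTable-specific back pressure was consulted
}

func newEngine() *engine { return &engine{data: map[string]string{}} }

// addFile models direct file ingestion: it goes through the file-specific
// back pressure and increases the file count.
func (e *engine) addFile(kvs []kv) {
	e.backpressureWaits++
	e.files++
	for _, p := range kvs {
		e.data[p.key] = p.value
	}
}

// applyBatch models the regular write path: no new file, and no
// SSTable-specific back pressure.
func (e *engine) applyBatch(kvs []kv) {
	for _, p := range kvs {
		e.data[p.key] = p.value
	}
}

// ingest picks the path the caller asked for and reports which one it used.
func ingest(e *engine, req ingestRequest) string {
	if req.ingestAsWrites {
		e.applyBatch(req.kvs)
		return "write-batch"
	}
	e.addFile(req.kvs)
	return "add-file"
}

func main() {
	e := newEngine()
	small := ingestRequest{kvs: []kv{{"a", "1"}}, ingestAsWrites: true}
	large := ingestRequest{kvs: []kv{{"b", "2"}, {"c", "3"}}}
	fmt.Println(ingest(e, small), ingest(e, large))
	fmt.Println("files:", e.files, "backpressure waits:", e.backpressureWaits, "keys:", len(e.data))
}
```

In this sketch both paths leave the same keys in the engine; only the file count and the back-pressure bookkeeping differ, which is the distinction the description draws.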

Release note: none.

cockroach-teamcity · 2019-10-21T16:26:05Z


Adding very small SSTables is inefficient. The benefit of adding the raw SST, namely avoiding per-key overhead, is small for small SSTs, while the fixed costs of adding a file (triggering flushes, increasing the number of files that need to be compacted, etc.) stay the same. Below some size, a file can therefore be more expensive per key than writing its contents via the regular write path.

This patch gives the caller control of how SSTs are ingested, either via the usual direct AddFile or by constructing a normal write-batch instead, within the existing AddSSTable API.

An alternative approach would be to have the caller choose a different API if they don't want to add an actual SSTable, e.g. switch to Put or WriteBatch. However, the AddSSTable method does much more than just "add an SSTable", and much of the ingestion pipeline is built around those specific semantics: ingesting keys with arbitrary timestamps, key-by-key collision detection, cheaper MVCC stats support, etc. Switching between API methods would mean duplicating some of those semantics into other methods, keeping them in sync, and asking the client to keep track of what is supported by which methods.

Instead, allowing the existing method to simply change how it writes its result lets small batches continue to use it, without being forced to incur the cost of writing a small file. The result is that we have just one bulk-ingest KV API, which provides the specific semantics the bulk-ingestion pipeline needs, regardless of how it writes its result to the storage engine.

Requests marked for write-based ingestion can additionally skip the back-pressure mechanisms added for SSTable additions: since we’re not creating files directly, the normal write back pressure still applies, and these requests don’t risk bloating the file count with tiny files.

Release note (performance improvement): bulk-ingestion


dt requested review from ajwerner, nvanbenschoten and petermattis October 21, 2019 16:25

dt force-pushed the backport19.2-41705 branch from e814ee3 to 7db6142 Compare October 21, 2019 17:38

dt force-pushed the backport19.2-41705 branch from 7db6142 to f938107 Compare October 21, 2019 17:45

ajwerner approved these changes Oct 21, 2019


dt merged commit de88463 into cockroachdb:release-19.2 Oct 21, 2019

dt deleted the backport19.2-41705 branch October 21, 2019 20:19
