release-19.2: batcheval: use write-batches to apply small SSTables #41768

Merged
merged 1 commit into from
Oct 21, 2019

@dt dt commented Oct 21, 2019

Backport 1/1 commits from #41705.

/cc @cockroachdb/release


Adding very small SSTables is not great: the benefit of adding the raw SST, avoiding per-key overhead, is small for small SSTs, while the fixed costs of adding a file stay the same: triggering flushes, adding to the number of files that need to be compacted, etc. When files become too small, they can actually be more expensive, per key, than just writing their contents via the regular write path.

This patch gives the caller control of how SSTs are ingested — either via the usual direct AddFile or by constructing a normal write-batch instead.

Requests marked for write-based ingestion can additionally skip the back-pressure mechanisms added for SSTable additions: since they do not create files directly, the normal write back-pressure still applies, and they do not risk pushing the file count toward its limits with tiny files.

Release note: none.

Adding very small SSTables is not great: the benefit of adding the raw
SST, avoiding per-key overhead, is small for small SSTs, while the fixed
costs of adding a file stay the same: triggering flushes, adding to the
number of files that need to be compacted, etc. When files become too
small, they can actually be more expensive, per key, than just writing
their contents via the regular write path.
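
The trade-off described above can be illustrated with a toy cost model. This is only a sketch: the constants are made-up units rather than measurements, and the function names are hypothetical, not part of the patch.

```go
package main

import "fmt"

// All constants are illustrative, made-up units (assumptions, not measurements).
const (
	fixedFileCost   = 1000.0 // hypothetical one-time cost of adding a file (flushes, extra compaction work)
	perKeyFileCost  = 0.1    // hypothetical marginal per-key cost when ingesting a file
	perKeyWriteCost = 1.0    // hypothetical per-key cost of the regular write path
)

// perKeyCostAsFile amortizes the fixed file cost over the keys in the SST.
func perKeyCostAsFile(numKeys int) float64 {
	return fixedFileCost/float64(numKeys) + perKeyFileCost
}

// cheaperAsWrites reports whether, under this toy model, the regular write
// path has a lower per-key cost than adding the SST as a file.
func cheaperAsWrites(numKeys int) bool {
	return perKeyWriteCost < perKeyCostAsFile(numKeys)
}

func main() {
	for _, n := range []int{10, 1000, 100000} {
		fmt.Printf("keys=%d perKeyAsFile=%.3f perKeyAsWrite=%.3f asWrites=%t\n",
			n, perKeyCostAsFile(n), perKeyWriteCost, cheaperAsWrites(n))
	}
}
```

With these assumed constants the fixed cost dominates for small files, so the write path wins until the file holds enough keys to amortize it.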

This patch gives the caller control of how SSTs are ingested within the
existing AddSSTable API: either via the usual direct AddFile or by
constructing a normal write-batch instead.

An alternative approach would be to have the caller choose a different
API if they don't want to add an actual SSTable, e.g. switch to Put or
WriteBatch. However, the AddSSTable method does much more than just
"add an SSTable", and much of the ingestion pipeline is built around
those specific semantics: ingesting keys with arbitrary timestamps,
key-by-key collision detection, cheaper MVCC stats support, etc.
Switching between API methods would mean duplicating some of those
semantics into other methods, keeping them in sync, and asking the
client to keep track of what is supported by which methods. Instead,
allowing the existing method to simply change how it writes its result
lets small batches continue to use it, without being forced to incur
the cost of writing a small file.

The result is we have just one bulk-ingest KV API, which provides the
specific semantics the bulk-ingestion pipeline needs, regardless of
how it writes its result to the storage engine.

Requests marked for write-based ingestion can additionally skip the
back-pressure mechanisms added for SSTable additions: since they do not
create files directly, the normal write back-pressure still applies,
and they do not risk pushing the file count toward its limits with tiny
files.
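
A minimal sketch of the idea, assuming a caller-set flag on the request. The types, the `ingestAsWrites` field name, and the toy engine are hypothetical stand-ins for illustration, not the actual code in this patch. For simplicity the file path returns an error when over the file limit, whereas a real back-pressure mechanism would more likely make the request wait.

```go
package main

import (
	"errors"
	"fmt"
)

// kv is a single key/value pair decoded from an SST (simplified).
type kv struct{ key, value string }

// addSSTRequest is a hypothetical stand-in for the request; ingestAsWrites
// is an assumed name for the caller-controlled flag.
type addSSTRequest struct {
	kvs            []kv
	ingestAsWrites bool
}

// engine is a toy storage engine that tracks a file count and written keys.
type engine struct {
	files    int
	maxFiles int
	data     map[string]string
}

var errBackpressure = errors.New("too many files: SST addition must wait")

// evalAddSST applies the request either as ordinary writes or as a new file.
func evalAddSST(e *engine, req addSSTRequest) error {
	if req.ingestAsWrites {
		// Write path: no file is created, so the file-count check is skipped.
		for _, p := range req.kvs {
			e.data[p.key] = p.value
		}
		return nil
	}
	// File path: subject to the file-count limit in this toy model.
	if e.files >= e.maxFiles {
		return errBackpressure
	}
	e.files++
	for _, p := range req.kvs {
		e.data[p.key] = p.value
	}
	return nil
}

func main() {
	e := &engine{maxFiles: 1, data: map[string]string{}}
	big := addSSTRequest{kvs: []kv{{"a", "1"}, {"b", "2"}}}
	small := addSSTRequest{kvs: []kv{{"c", "3"}}, ingestAsWrites: true}

	fmt.Println(evalAddSST(e, big), e.files)   // file added
	fmt.Println(evalAddSST(e, small), e.files) // written as a batch, file count unchanged
	fmt.Println(evalAddSST(e, big))            // file path is now over the limit
}
```

Both paths go through the same entry point, so the caller keeps one API and only changes how the result reaches the engine.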

Release note (performance improvement): bulk-ingestion

@ajwerner ajwerner left a comment

:lgtm:

@dt dt merged commit de88463 into cockroachdb:release-19.2 Oct 21, 2019
@dt dt deleted the backport19.2-41705 branch October 21, 2019 20:19