-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage/engine,libroach: 1x write amplification on ingest sst #34886
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Were you able to measure a perf benefit or otherwise verify that this reduces disk traffic during IMPORT? Or perhaps I'd be satisfied if we could have a unit test that this is working properly. I'm not quite sure how that would work, but perhaps generate an SST and ingest it and verify that the SST is unmodified. Somehow. Maybe.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @dt, @petermattis, and @tbg)
c-deps/libroach/db.cc, line 700 at r1 (raw file):
// only safe to disable the seqno if older versions of RocksDB (<5.16) will // not be used to read these SSTs. // Before editing a file to insert a seq_no, RocksDB also checks the option
Nit: add a blank line between paragraphs.
a4a221a
to
f7c78bb
Compare
Realized it wasn't working because rocks checks the allow_modification flag before it checks if it is actually using seqno at all. Added a |
@petermattis turns out we did actually have a unit test this that should have started failing if the original change had worked as intended -- I found (and updated it) when I fixed it. Here's the sst apply vs copy with and without this change enabled: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Minor nit remaining.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @dt and @tbg)
c-deps/libroach/db.cc, line 700 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Nit: add a blank line between paragraphs.
Ping.
c-deps/libroach/include/libroach.h, line 412 at r2 (raw file):
// it would have tried to modify any of the files' sequence numbers rather than // editing the files in place. If skip_seqno is true, it will skip the // seq_no entirely -- this is only safe if this will only ever be read by Rocks
This is a bit confusing. skip_seqno
indicates that the global seqno doesn't need to be written into the sstable. The global seqno is written into the LSM state, just outside of the sstable. I'd prefer to keep this parameter named as write_global_seqno
, and to clarify the commentary.
This uses he new flag added in facebook/rocksdb#4172 to avoid the global_seq_no edits and the SST copying they forced us to do to preserve the raft log. Passing this flag has to be gated on a cluster version that is defined well after the upgrade to rocks 5.17 -- using the flag requires that older versions of RocksDB (< 5.16) will never be used to read, as they will still be looking for and using the seq_no in the file. Release note (performance improvement): reduce write-amplification during bulk-loading (IMPORT and RESTORE)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @petermattis and @tbg)
c-deps/libroach/db.cc, line 700 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Ping.
Done.
c-deps/libroach/include/libroach.h, line 412 at r2 (raw file):
Previously, petermattis (Peter Mattis) wrote…
This is a bit confusing.
skip_seqno
indicates that the global seqno doesn't need to be written into the sstable. The global seqno is written into the LSM state, just outside of the sstable. I'd prefer to keep this parameter named aswrite_global_seqno
, and to clarify the commentary.
Done.
bors r+ |
34806: util/mon: Downgrade "bytes usage increases" log message r=knz a=bdarnell This message can be logged frequently in certain workloads, and its appearance in the info log is not actionable since there is no way to tie it back to the query that caused it. I see no reason to have these messages enabled by default. Release note: None 34886: storage/engine,libroach: 1x write amplification on ingest sst r=dt a=dt This uses the new flag added in facebook/rocksdb#4172 to avoid needing to make global_seq_no-related edits to SSTs, and thus avoid the SST copying those edits forced us to do to preserve the raft log immutability when hard-linking directly to side load SSTs. This is gated on a cluster version that is defined well after the upgrade to rocks 5.17, since using the flag requires that no older versions of RocksDB (<5.16), will ever be used to read these SSTs, as they will lack the seq_no that the older versions need for correctness. Release note (performance improvement): reduce write-amplification during bulk-loading (IMPORT and RESTORE) 34902: sql: ensure spans get anonymized in plan collection r=knz a=knz Fixes #34889. Release note (bug fix): The logical plans collected for display in the web UI now also hide the details of which key ranges are scanned in table lookups. Co-authored-by: Ben Darnell <ben@bendarnell.com> Co-authored-by: David Taylor <tinystatemachine@gmail.com> Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
Build succeeded |
This uses the new flag added in facebook/rocksdb#4172 to avoid needing to make global_seq_no-related edits to SSTs, and thus avoid the SST copying those edits forced us to do to preserve the raft log immutability when hard-linking directly to side load SSTs.
This is gated on a cluster version that is defined well after the upgrade to rocks 5.17, since using the flag requires that no older versions of RocksDB (<5.16),
will ever be used to read these SSTs, as they will lack the seq_no that the older versions need for correctness.
Release note (performance improvement): reduce write-amplification during bulk-loading (IMPORT and RESTORE)