Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage/engine,libroach: 1x write amplification on ingest sst #34886

Merged
merged 1 commit into from
Feb 25, 2019

Conversation

dt
Copy link
Member

@dt dt commented Feb 14, 2019

This uses the new flag added in facebook/rocksdb#4172 to avoid needing to make global_seq_no-related edits to SSTs, and thus avoid the SST copying those edits forced us to do to preserve the raft log immutability when hard-linking directly to side load SSTs.

This is gated on a cluster version that is defined well after the upgrade to rocks 5.17, since using the flag requires that no older versions of RocksDB (<5.16),
will ever be used to read these SSTs, as they will lack the seq_no that the older versions need for correctness.

Release note (performance improvement): reduce write-amplification during bulk-loading (IMPORT and RESTORE)

@dt dt requested review from tbg, petermattis and a team February 14, 2019 00:15
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@dt dt requested a review from a team February 14, 2019 01:24
Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Were you able to measure a perf benefit or otherwise verify that this reduces disk traffic during IMPORT? Or perhaps I'd be satisfied if we could have a unit test that this is working properly. I'm not quite sure how that would work, but perhaps generate an SST and ingest it and verify that the SST is unmodified. Somehow. Maybe.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dt, @petermattis, and @tbg)


c-deps/libroach/db.cc, line 700 at r1 (raw file):

  // only safe to disable the seqno if older versions of RocksDB (<5.16) will
  // not be used to read these SSTs.
  // Before editing a file to insert a seq_no, RocksDB also checks the option

Nit: add a blank line between paragraphs.

@dt dt force-pushed the rocks-no-seq-no branch 2 times, most recently from a4a221a to f7c78bb Compare February 21, 2019 19:29
@dt
Copy link
Member Author

dt commented Feb 21, 2019

Realized it wasn't working because rocks checks the allow_modification flag before it checks if it is actually using seqno at all. Added a || skip_seqno to the allow_modification flag indeed, it seems to be working as expected. On a very fast MBP SSD that is otherwise unloaded, the speedup on a direct (overlapping) import of ~1.7gb of tpcc data looks like about 5%, but past experience is that using available disk IO is a bigger deal on more throughput constrained disks (e.g. ebs) or with other traffic contending for disk io. I observed the process wrote about 1.5gb less, but with overlapping import, the heavy recompactions dominate total write traffic so it only worked out to about a 20% reduction in total write traffic.

@dt
Copy link
Member Author

dt commented Feb 21, 2019

@petermattis turns out we did actually have a unit test this that should have started failing if the original change had worked as intended -- I found (and updated it) when I fixed it. Here's the sst apply vs copy with and without this change enabled:
screen shot 2019-02-21 at 2 37 12 pm

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Minor nit remaining.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @dt and @tbg)


c-deps/libroach/db.cc, line 700 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Nit: add a blank line between paragraphs.

Ping.


c-deps/libroach/include/libroach.h, line 412 at r2 (raw file):

// it would have tried to modify any of the files' sequence numbers rather than
// editing the files in place. If skip_seqno is true, it will skip the
// seq_no entirely -- this is only safe if this will only ever be read by Rocks

This is a bit confusing. skip_seqno indicates that the global seqno doesn't need to be written into the sstable. The global seqno is written into the LSM state, just outside of the sstable. I'd prefer to keep this parameter named as write_global_seqno, and to clarify the commentary.

This uses he new flag added in
facebook/rocksdb#4172 to avoid the global_seq_no
edits and the SST copying they forced us to do to preserve the raft log.

Passing this flag has to be gated on a cluster version that is defined
well after the upgrade to rocks 5.17 -- using the flag requires that
older versions of RocksDB (< 5.16) will never be used to read, as they
will still be looking for and using the seq_no in the file.

Release note (performance improvement): reduce write-amplification during bulk-loading (IMPORT and RESTORE)
Copy link
Member Author

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @petermattis and @tbg)


c-deps/libroach/db.cc, line 700 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Ping.

Done.


c-deps/libroach/include/libroach.h, line 412 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

This is a bit confusing. skip_seqno indicates that the global seqno doesn't need to be written into the sstable. The global seqno is written into the LSM state, just outside of the sstable. I'd prefer to keep this parameter named as write_global_seqno, and to clarify the commentary.

Done.

@dt
Copy link
Member Author

dt commented Feb 25, 2019

bors r+

craig bot pushed a commit that referenced this pull request Feb 25, 2019
34806: util/mon: Downgrade "bytes usage increases" log message r=knz a=bdarnell

This message can be logged frequently in certain workloads, and its
appearance in the info log is not actionable since there is no way to
tie it back to the query that caused it. I see no reason to have
these messages enabled by default.

Release note: None

34886: storage/engine,libroach: 1x write amplification on ingest sst r=dt a=dt

This uses the new flag added in facebook/rocksdb#4172 to avoid needing to make global_seq_no-related edits to SSTs, and thus avoid the SST copying those edits forced us to do to preserve the raft log immutability when hard-linking directly to side load SSTs.

This is gated on a cluster version that is defined well after the upgrade to rocks 5.17, since using the flag requires that no older versions of RocksDB (<5.16),
will ever be used to read these SSTs, as they will lack the seq_no that the older versions need for correctness.

Release note (performance improvement): reduce write-amplification during bulk-loading (IMPORT and RESTORE)

34902: sql: ensure spans get anonymized in plan collection r=knz a=knz

Fixes #34889.

Release note (bug fix): The logical plans collected for display in the
web UI now also hide the details of which key ranges are scanned in
table lookups.

Co-authored-by: Ben Darnell <ben@bendarnell.com>
Co-authored-by: David Taylor <tinystatemachine@gmail.com>
Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented Feb 25, 2019

Build succeeded

@craig craig bot merged commit 4ffed77 into cockroachdb:master Feb 25, 2019
@dt dt deleted the rocks-no-seq-no branch February 26, 2019 15:06
@danhhz danhhz mentioned this pull request Feb 26, 2019
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants