storage/engine/rocksdb: set optimize_filters_for_hits #10085

Merged


@petermattis (Collaborator) commented Oct 19, 2016

Do not create bloom filters for the last level (i.e. the largest level
which contains data in the LSM store). Setting this option reduces the
size of the bloom filters by 10x, since the last level holds roughly 90%
of the keys. This is significant given that bloom filters require
1.25 bytes (10 bits) per key, which can translate into gigabytes of
memory given typical key and value sizes. The downside is that bloom
filters will only be usable on the higher levels, but that seems
acceptable. We typically see read amplification of 5-6x on clusters
(i.e. there are 5-6 levels of sstables), which means we'll achieve
80-90% of the benefit of having bloom filters on every level for only
10% of the memory cost.
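A back-of-envelope model of the memory argument above, offered only as a sketch. It assumes (these are assumptions, not measurements from this PR) that each level is exactly 10x larger than the one above it, matching RocksDB's default `max_bytes_for_level_multiplier` of 10, that keys are spread evenly by size, and that every filtered key costs 10 bits. The helper names `level_key_fractions` and `filter_bytes` and the one-billion-key figure are hypothetical.

```python
"""Rough model of bloom filter memory with and without a last-level filter.

Hypothetical assumptions: every level is exactly `multiplier` times larger
than the level above it, keys are spread evenly by size, and each filtered
key costs `bits_per_key` bits of filter.
"""


def level_key_fractions(num_levels, multiplier=10):
    """Fraction of all keys stored in each level, from L1 down to the last level."""
    sizes = [multiplier ** i for i in range(num_levels)]
    total = sum(sizes)
    return [s / total for s in sizes]


def filter_bytes(total_keys, num_levels, bits_per_key=10,
                 skip_last_level=False, multiplier=10):
    """Approximate bloom filter memory in bytes for the whole LSM tree."""
    fractions = level_key_fractions(num_levels, multiplier)
    if skip_last_level:
        # Mimics optimize_filters_for_hits: no filter for the bottom level.
        fractions = fractions[:-1]
    filtered_keys = total_keys * sum(fractions)
    return filtered_keys * bits_per_key / 8


if __name__ == "__main__":
    keys = 1_000_000_000  # hypothetical key count
    for levels in (5, 6):
        full = filter_bytes(keys, levels)
        trimmed = filter_bytes(keys, levels, skip_last_level=True)
        covered = (levels - 1) / levels  # levels that still have a filter
        print(f"{levels} levels: {full / 1e9:.2f} GB -> {trimmed / 1e9:.3f} GB "
              f"({trimmed / full:.1%} of the memory), "
              f"filters on {covered:.0%} of levels")
```

Treating every level as equally valuable is a crude proxy for "benefit"; under that proxy, 5 levels keep filters on 80% of levels and 6 levels on about 83%, while filter memory drops to about 10% in both cases.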

There is no significant impact on the `block_writer` performance tests. The
most affected benchmark is:

name                             old time/op   new time/op   delta
MVCCGet1Version8Bytes_RocksDB-8   47.7µs ± 1%   53.3µs ± 2%  +11.69%  (p=0.000 n=10+10)

This delta is somewhat misleading, though, because the "old" data
contained 4 sstables while the new data contains 5.
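As RocksDB's option documentation describes it, `optimize_filters_for_hits` tunes the filters for lookups where the key is found; dropping the last-level filter only costs extra reads for keys that are absent. The sketch below is an illustrative model of that trade-off, not RocksDB's actual read path. It assumes one overlapping sstable per level, a false-positive rate of roughly 1% for a 10-bits-per-key filter, and that a present key lives in the last level; the function `expected_block_reads` is hypothetical. It does not attempt to reproduce the benchmark delta above, which also depends on the CPU cost of probing each sstable's index and filter blocks.

```python
def expected_block_reads(num_levels, key_found, fp_rate=0.01,
                         last_level_has_filter=True):
    """Expected data-block reads for one point lookup (hypothetical model).

    Assumes one sstable per level overlaps the key, a present key lives in
    the last level, and a bloom filter wrongly passes an absent key with
    probability fp_rate.
    """
    reads = 0.0
    # Upper levels never contain the key, so a block is read only when
    # their filter returns a false positive.
    for _ in range(num_levels - 1):
        reads += fp_rate
    if key_found:
        # A hit must read the block holding the key, filter or not.
        reads += 1.0
    elif last_level_has_filter:
        # A miss is usually rejected by the last-level filter.
        reads += fp_rate
    else:
        # Without a last-level filter, a miss reads a block to learn the
        # key is absent.
        reads += 1.0
    return reads


if __name__ == "__main__":
    for levels in (4, 5):
        for found in (True, False):
            with_filter = expected_block_reads(levels, found)
            without = expected_block_reads(levels, found,
                                           last_level_has_filter=False)
            kind = "hit" if found else "miss"
            print(f"{levels} levels, {kind}: {with_filter:.2f} reads with "
                  f"last-level filter, {without:.2f} without")
```

Under these assumptions a hit costs the same with or without the last-level filter, while a miss goes from a fraction of a read to a full block read, which is the trade-off the option accepts.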

Fixes #10050



@petermattis force-pushed the pmattis/optimize-filters-for-hits branch from 3aaff79 to 0f82611 on October 19, 2016 16:10
@petermattis (Collaborator, Author) commented:

Cc @cockroachdb/stability

@tbg (Member) commented Oct 19, 2016

LGTM

@petermattis force-pushed the pmattis/optimize-filters-for-hits branch from 0f82611 to 919f219 on October 19, 2016 16:27
@petermattis merged commit fb0252d into cockroachdb:master on Oct 19, 2016
@petermattis deleted the pmattis/optimize-filters-for-hits branch on October 19, 2016 16:50