db: sstable output size during compactions should consider range tombstones #167
Comments
I was thinking the iterator underlying compaction iterator could present the range tombstone start keys inline with the point keys. When compaction iterator encounters such a key, it would pass the corresponding end key to Will take a look at what's been done already and whether this is feasible given the current code structure.
I tried out @petermattis's original suggestion today: It is pretty close to correct but this edge case is troubling me:
Is the concern that there is no other key being output in the compaction? I think there needs to be a final call into
Yes, there is no other key to pass to
Previously range tombstones could cause output files that would later undergo excessively large compactions due to overlap with the grandparent level. This happened because only point keys were considered for cutting output files according to grandparent overlap. Thus if a range tombstone extended over a region devoid of point keys, compaction would never split files within that region. To fix this we now split output files at grandparent boundaries regardless of the presence of point keys. That means a region covered by a range tombstone and devoid of point keys can be split at grandparent boundaries if needed to prevent future huge compactions. Fixes cockroachdb#167.
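The splitting rule described in the commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not Pebble's actual implementation: `fileMeta`, `splitPoints`, and `maxOverlap` are hypothetical names, keys are plain strings compared bytewise, and the grandparent files are assumed to be sorted and non-overlapping.

```go
package main

import "fmt"

// fileMeta is a hypothetical stand-in for the metadata of a grandparent-level
// sstable: inclusive key bounds plus an on-disk size.
type fileMeta struct {
	smallest, largest string
	size              uint64
}

// splitPoints returns the keys at which an output sstable carrying the range
// tombstone [start, end) could be cut so that no piece overlaps more than
// maxOverlap bytes of grandparent files. No point keys are consulted; the
// cuts are derived from grandparent boundaries alone.
// Assumption: grandparents is sorted by key and its files do not overlap.
func splitPoints(start, end string, grandparents []fileMeta, maxOverlap uint64) []string {
	var cuts []string
	var overlapped uint64
	for _, f := range grandparents {
		if f.largest < start {
			continue // file lies entirely before the tombstone
		}
		if f.smallest >= end {
			break // file lies entirely at or after the exclusive end key
		}
		// Cut before this file if adding it would exceed the budget. Never
		// cut an empty piece: a single oversized file is accepted as is.
		if overlapped > 0 && overlapped+f.size > maxOverlap {
			cuts = append(cuts, f.smallest)
			overlapped = 0
		}
		overlapped += f.size
	}
	return cuts
}

func main() {
	grandparents := []fileMeta{
		{"a", "c", 64}, {"d", "f", 64}, {"g", "i", 64}, {"j", "l", 64},
	}
	// A tombstone over [a, k) with no point keys underneath it.
	fmt.Println(splitPoints("a", "k", grandparents, 128)) // [g]
	fmt.Println(splitPoints("a", "k", grandparents, 64))  // [d g j]
}
```

Each cut key would become the exclusive end of one tombstone fragment and the start of the next, so the output files stay within the overlap budget even across a region that contains no point keys.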
See facebook/rocksdb#3977 for the similar issue in RocksDB. The TL;DR is that the sstable output size logic does not truncate range tombstones. As a result, a single output sstable can cover an excessively large number of sstables in the level below, which later leads to an excessively large compaction. There is a comment in the code about this:
I'm not clear on exactly how I'd want to incorporate range tombstones in the shouldStopBefore decision. See the RocksDB issues for a problematic scenario. It is possible that we need to truncate and split range tombstones to avoid creating sstables which overlap too much in the grandparent level.
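To make the problematic scenario concrete, the sketch below compares how many grandparent bytes an output sstable overlaps when its bounds come from point keys alone versus when an untruncated range tombstone extends them. It is an illustration only: `fileMeta` and `overlapBytes` are hypothetical names, not Pebble or RocksDB APIs, and the file sizes are made up.

```go
package main

import "fmt"

// fileMeta is a hypothetical stand-in for a grandparent-level sstable:
// inclusive key bounds plus an on-disk size.
type fileMeta struct {
	smallest, largest string
	size              uint64
}

// overlapBytes sums the sizes of the grandparent files whose key range
// intersects the inclusive range [smallest, largest].
func overlapBytes(smallest, largest string, grandparents []fileMeta) uint64 {
	var total uint64
	for _, f := range grandparents {
		if f.largest < smallest || f.smallest > largest {
			continue // no intersection with this file
		}
		total += f.size
	}
	return total
}

func main() {
	grandparents := []fileMeta{
		{"a", "c", 64}, {"d", "f", 64}, {"g", "i", 64}, {"j", "l", 64},
	}
	// Bounds derived from point keys a..b only.
	fmt.Println(overlapBytes("a", "b", grandparents)) // 64
	// Bounds extended to k by an untruncated range tombstone.
	fmt.Println(overlapBytes("a", "k", grandparents)) // 256
}
```

A stop-before check that only watches point keys sees the first figure, while the file that is actually written has the second, which is why the later compaction can be far larger than intended.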