Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

db: implement SingleDelete #222

Closed
petermattis opened this issue Aug 14, 2019 · 3 comments

Comments

@petermattis
Copy link
Collaborator

@petermattis petermattis commented Aug 14, 2019

The SingleDelete operation is similar to the regular Delete operation, but with slightly different semantics. While Delete deletes all of the earlier versions of a key. SingleDelete deletes only the previous version. These semantics can be tricky to use, but they can provide a significant performance improvement.

Delete is often described as laying down a "tombstone". That tombstone shadows all of the earlier versions of a key. During compactions, the deletion tombstone needs to be propagated down to lower levels until it reaches a level where the system can determine there are no older versions of the key (this usually occurs when the tombstone reaches the bottom level).

A common pattern is to have a key that is written exactly once and then deleted exactly once. When that pattern occurs, compacting the deletion tombstone down to the bottom level is excessive. As soon as a compaction sees both the SingleDelete operation and the Set operation for a key, it can elide both from the compaction output. SingleDelete can be viewed as "antimatter" which combusts when it touches a Set, leaving no trace of either. The benefit can be significant. For example, if a Set("a") is followed quickly by SingleDelete("a"), the Set and SingleDelete can combine and be elided when the memtable is flushed and avoid ever writing the SingleDelete to disk (other than in the WAL).

The primary complexity to implementing SingleDelete is in adding the logic to support it in compactionIter. An adjacent SingleDelete and Set operation should cause no entry to be elided, but we have to pay attention to whether those operations straddle a snapshot stripe. RocksDB says that combining SingleDelete and Delete/Merge is undefined. That feels a bit unsatisfactory. A SingleDelete adjacent to a Delete should result in a Delete. ASingleDelete adjacent to a Merge could also result in a Delete (i.e. we transform the SingleDelete into a full Delete).

@hueypark

This comment has been minimized.

Copy link
Contributor

@hueypark hueypark commented Aug 15, 2019

Hi, @petermattis may I implement SingleDelete?
I'm not working on this project fulltime and this issue takes a slightly longer time, But Pebble for CockroachDB is a great idea and I want to contribute to this project.

@petermattis

This comment has been minimized.

Copy link
Collaborator Author

@petermattis petermattis commented Aug 15, 2019

@hueypark go for it. CockroachDB doesn’t currently use SingleDelete, but we’ve played around with doing so. Let me know if you have any questions as you get into the implementation. You’ll probably want to look at RocksDB to see if there are any gotchas (e.g. look at their test cases).

@hueypark

This comment has been minimized.

Copy link
Contributor

@hueypark hueypark commented Aug 15, 2019

@petermattis Thanks! I go for it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.