
storage: investigate whether facebook/rocksdb#6973 should be backported #50182

Closed
petermattis opened this issue Jun 13, 2020 · 5 comments
Labels
C-investigation Further steps needed to qualify. C-label will change.

Comments

@petermattis
Collaborator

facebook/rocksdb#6973

Fix a bug that causes iterator to return wrong result in a rare data race

The bug fixed in #1816 is applicable to the iterator too. If a put and a DB flush happen just between the iterator reading the latest sequence number and acquiring the super version, an empty result for the key or an older value can be returned, which is wrong.

@blathers-crl

blathers-crl bot commented Jun 13, 2020

Hi @petermattis, I've guessed the C-ategory of your issue and suitably labeled it. Please re-label if inaccurate.

While you're here, please consider adding an A- label to help keep our repository tidy.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Jun 13, 2020
@petermattis petermattis added this to To Do (this milestone) in (Deprecated) Storage Jun 13, 2020
@petermattis petermattis added C-investigation Further steps needed to qualify. C-label will change. and removed C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. labels Jun 13, 2020
@jbowens
Collaborator

jbowens commented Jun 16, 2020

It looks like a very unlikely race but worthwhile to backport. I think the crux of the race is that if you grab a sequence number before grabbing the superversion/readState, some keys visible at the sequence number you grabbed may have already been overwritten with more recent sequence numbers, causing you to think they didn't exist at all at your sequence number.

  • Thread A writes Set(k, v1), updating the memtable. Thread A tries to read the value it just wrote and grabs a sequence number x to read from, but gets descheduled before it can acquire a reference to the super version.
  • Thread B does a Set(k, v2) at sequence number x+1 and flushes the memtable.
  • Thread A wakes up and grabs the new superversion. When it looks for the key k, it only finds k at sequence number x+1 (the flush was free to drop k@x, since no snapshot was registered at x), so it returns that the key doesn’t exist.

Pebble doesn’t have the same problem for Get or NewIter, because we grab the read state before we pick a sequence number to read from. I think there’s a related (very unlikely) race with NewSnapshot, though, because we grab the snapshot’s sequence number before we lock d.mu in order to append to the snapshot list. If d.mu is contended and we get really unlucky, it seems possible for new keys to be flushed and the versions the snapshot needs eventually dropped by compaction, if the flush/compaction grabs the snapshot list before NewSnapshot appends its sequence number.
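
To make the ordering concrete, here is a minimal Go sketch of the racy shape described above. The type and field names (hypotheticalDB, visibleSeqNum, mu.snapshots) are illustrative stand-ins, not Pebble's actual identifiers:

```go
package sketch

import (
	"sync"
	"sync/atomic"
)

// Illustrative stand-ins for the relevant DB internals.
type Snapshot struct{ seqNum uint64 }

type hypotheticalDB struct {
	visibleSeqNum uint64 // latest published sequence number, updated by writers
	mu            struct {
		sync.Mutex
		snapshots []*Snapshot // list consulted by flushes/compactions
	}
}

// NewSnapshotRacy shows the problematic ordering: the sequence number is read
// before the snapshot is published under d.mu, leaving a window in which a
// concurrent Set at seq+1 plus a flush/compaction can drop the versions this
// snapshot needs.
func (d *hypotheticalDB) NewSnapshotRacy() *Snapshot {
	seq := atomic.LoadUint64(&d.visibleSeqNum) // 1. pick a sequence number

	// <-- window: a flush/compaction can read d.mu.snapshots here, see no
	//     snapshot at `seq`, and elide older versions of recently-written keys.

	d.mu.Lock()
	s := &Snapshot{seqNum: seq}
	d.mu.snapshots = append(d.mu.snapshots, s) // 2. published too late
	d.mu.Unlock()
	return s
}
```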

@petermattis
Collaborator Author

I think there’s a related (very unlikely) race with NewSnapshot, though, because we grab the snapshot’s sequence number before we lock d.mu in order to append to the snapshot list. If d.mu is contended and we get really unlucky, it seems possible for new keys to be flushed and the versions the snapshot needs eventually dropped by compaction, if the flush/compaction grabs the snapshot list before NewSnapshot appends its sequence number.

Yeah, I think there is a problem here. I think we need to grab the sequence number atomically with adding the snapshot, which means grabbing the sequence number while holding DB.mu. Seems feasible to write a test which would demonstrate this badness.
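
A sketch of that fix, reusing the hypothetical types from the sketch above (the actual change is in the Pebble commits referenced below): read the sequence number only while d.mu is held, so the snapshot is in d.mu.snapshots before any later flush or compaction consults the list.

```go
// NewSnapshotFixed reads the sequence number while holding d.mu. Because
// flushes and compactions consult d.mu.snapshots while holding d.mu, none can
// slip in between picking the sequence number and publishing the snapshot.
func (d *hypotheticalDB) NewSnapshotFixed() *Snapshot {
	d.mu.Lock()
	defer d.mu.Unlock()
	s := &Snapshot{seqNum: atomic.LoadUint64(&d.visibleSeqNum)}
	d.mu.snapshots = append(d.mu.snapshots, s)
	return s
}
```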

jbowens added a commit to jbowens/pebble that referenced this issue Jun 16, 2020
Fix a race in NewSnapshot that could cause recently-written keys to be
missing. Previously, NewSnapshot's reading of the current sequence
number and appending to the database's snapshot list was not atomic.
This meant that a concurrent write and flush or compaction might cause
keys at the snapshot number to be dropped. This commit locks d.mu before
reading the current sequence number, which prevents flushes and
compactions from reading d.mu.snapshots before we've updated it.

Also, this commit adds a new test case to exercise this race. The test
case doesn't reliably fail before the fix, but run under `stress` it
fails within the first batch of runs.

Discovered as a part of investigating cockroachdb/cockroach#50182.
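
For illustration, a rough sketch of the kind of stress test the commit message describes, written against Pebble's public API (Open, Set, Flush, NewSnapshot); the actual test added in the commit may be structured differently:

```go
package sketch

import (
	"fmt"
	"sync"
	"testing"

	"github.com/cockroachdb/pebble"
	"github.com/cockroachdb/pebble/vfs"
)

// A writer goroutine keeps overwriting the key and forcing flushes while the
// test writes a value, takes a snapshot, and checks that the snapshot sees
// *some* value for the key. Before the fix, run under stress, the snapshot
// could occasionally report the key as missing.
func TestSnapshotSeesPriorWrite(t *testing.T) {
	db, err := pebble.Open("", &pebble.Options{FS: vfs.NewMem()})
	if err != nil {
		t.Fatal(err)
	}
	defer db.Close()

	var wg sync.WaitGroup
	done := make(chan struct{})
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 0; ; i++ {
			select {
			case <-done:
				return
			default:
			}
			_ = db.Set([]byte("k"), []byte(fmt.Sprintf("other-%d", i)), pebble.NoSync)
			_ = db.Flush() // race flushes against NewSnapshot below
		}
	}()
	defer func() {
		close(done)
		wg.Wait()
	}()

	for i := 0; i < 1000; i++ {
		if err := db.Set([]byte("k"), []byte(fmt.Sprintf("v-%d", i)), pebble.NoSync); err != nil {
			t.Fatal(err)
		}
		snap := db.NewSnapshot()
		// The key was written before the snapshot was taken, so the snapshot
		// must see a value for it -- possibly a newer concurrent overwrite,
		// but never "not found".
		_, closer, err := snap.Get([]byte("k"))
		if err == pebble.ErrNotFound {
			t.Fatalf("iteration %d: key missing at snapshot", i)
		} else if err != nil {
			t.Fatal(err)
		}
		closer.Close()
		snap.Close()
	}
}
```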
jbowens added a commit to cockroachdb/pebble that referenced this issue Jun 16, 2020
@jbowens
Collaborator

jbowens commented Jun 22, 2020

@petermattis How far back do you think we should backport the RocksDB bug?

@petermattis
Collaborator Author

Responded verbally already. Responding here for posterity: 19.1, 19.2, and 20.1.

@jbowens jbowens closed this as completed Jun 23, 2020
(Deprecated) Storage automation moved this from To Do (this milestone) to Done (milestone C) Jun 23, 2020