New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: don't corrupt unsafe value buffer in replayTransactionalWrite #52234
storage: don't corrupt unsafe value buffer in replayTransactionalWrite #52234
Conversation
Fixes cockroachdb#49482. Fixes cockroachdb#50386. Fixes cockroachdb#50683. This commit fixes a bug in replayTransactionalWrite that resulted in incorrect "transaction %s with sequence %d has a different value %+v after recomputing from what was written" errors being thrown when write requests were reissued, often due to transaction refreshes after partially successful batches. The bug was caused by an unsafe byte buffer acquired through `Iterator.UnsafeValue` being used after its iterator had been moved. Moving an iterator invalidates and potentially corrupts unsafe memory. This caused a sanity check later in the function to occasionally fail. I had a hard time reproducing this in a unit test. Even with very large keys, very large values, interspersed compactions, and a durable Pebble instance, I couldn't create a situation where moving the iterator actually caused corruption in value previously retrieved using `iter.UnsafeValue`. Maybe I'm missing a trick there. Regardless, this certainly seems to fix the failures observed in cockroachdb#50683, as I can no longer reproduce it under stress. This will need to be backported to release-20.1, release-19.2, and release-19.1, as all three release branches are susceptible to the spurious errors that this bug can result in. Release note (bug fix): Large write requests no longer have a chance of erroneously throwing a "transaction with sequence has a different value" error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice find! I bet that wasn't fun to track down.
Maybe I'm missing a trick there. Regardless, this
certainly seems to fix the failures observed in #50683, as I can no
longer reproduce it under stress.
Unsafe values returned by Pebble and RocksDB point into cache memory. (Unsafe keys are different in that they point to temporary buffers because the keys need to be prefix decompressed). The cache memory is very unlikely to be reused in a short time period unless the cache is small. Perhaps try a test where the cache is zero size, or a test where there is concurrent pressure on the cache. Or we could just agree that this was definitely a bug and move on.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @itsbilal and @nvanbenschoten)
pkg/storage/mvcc.go, line 1398 at r1 (raw file):
// calculated value on this replay is the same as the one we've previously // written. if !bytes.Equal(value, writtenValue) {
Could you instead move this error check about the if valueFn != nil
block? That does change the error handling somewhat , but it would avoid having to copy writtenValue
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Perhaps try a test where the cache is zero size
I tried this out but still wasn't able to get a repro. I do wonder if there is anything we can do to uncover this form of bug through our existing test suite. For instance, maybe we could intentionally poison unsafe key and value buffers whenever an iterator is moved under race builds.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @itsbilal and @petermattis)
pkg/storage/mvcc.go, line 1398 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Could you instead move this error check about the
if valueFn != nil
block? That does change the error handling somewhat , but it would avoid having to copywrittenValue
.
When valueFn != nil
, we use valueFn
to determine the value
to compare against. So I don't think this is possible, unless I'm misunderstanding your suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this out but still wasn't able to get a repro. I do wonder if there is anything we can do to uncover this form of bug through our existing test suite. For instance, maybe we could intentionally poison unsafe key and value buffers whenever an iterator is moved under race builds.
Hmm, we do that somewhere. Perhaps in Pebble. Let me track it down.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @itsbilal and @nvanbenschoten)
pkg/storage/mvcc.go, line 1398 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
When
valueFn != nil
, we usevalueFn
to determine thevalue
to compare against. So I don't think this is possible, unless I'm misunderstanding your suggestion.
Nope, you understood it. My suggestion was bad. I was looking for places where we set writtenValue
, but neglected to notice that we set value
within that block of code. I think the current change is as good as it gets.
I'm interested to see that. I wonder if applying this to CRDB's test suite could catch any misuses in CRDB itself. This might be a good knob for Pebble to expose. bors r+ |
I just took a look right now and couldn't find what I thought was there. Perhaps I'm misremembering. This does sound like a useful idea. I'll take another look later and if I don't find anything I'll file an issue. |
Build succeeded: |
Fixes #49482.
Fixes #50386.
Fixes #50683.
This commit fixes a bug in replayTransactionalWrite that resulted in
incorrect "transaction %s with sequence %d has a different value %+v
after recomputing from what was written" errors being thrown when write
requests were reissued, often due to transaction refreshes after
partially successful batches. The bug was caused by an unsafe byte
buffer acquired through
Iterator.UnsafeValue
being used after itsiterator had been moved. Moving an iterator invalidates and potentially
corrupts unsafe memory. This caused a sanity check later in the function
to occasionally fail.
I had a hard time reproducing this in a unit test. Even with very large
keys, very large values, interspersed compactions, and a durable Pebble
instance, I couldn't create a situation where moving the iterator
actually caused corruption in value previously retrieved using
iter.UnsafeValue
. Maybe I'm missing a trick there. Regardless, thiscertainly seems to fix the failures observed in #50683, as I can no
longer reproduce it under stress.
This will need to be backported to release-20.1, release-19.2, and
release-19.1, as all three release branches are susceptible to the
spurious errors that this bug can result in.
Release note (bug fix): Large write requests no longer have a chance
of erroneously throwing a "transaction with sequence has a different
value" error.