-
Notifications
You must be signed in to change notification settings - Fork 449
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
db: add CanDeterministicallySingleDelete #3077
Conversation
92ea423
to
3f35cf1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @jbowens)
iterator.go
line 1921 at r1 (raw file):
// InternalNext advances internal Iterator state forward to expose the // InternalKeyKind of the next internal key with a user key equal to Key().
So InternalNext
behaves as if nothing was actually done to the iterator, yes?
This looks fine.
We may have to do something about stability of the key/value pair already returned, if we want the caller to be able to use them after calling InternalNext
. I guess the key is already stable since we are not fiddling with i.key
. And cloning the LazyValue
does feel unnecessary since CockroachDB eagerly extracts the value and parses the MVCCMetadata
proto. In which case we could just document the instability of the value.
3f35cf1
to
ee67608
Compare
0bbfed2
to
c870aeb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is RFAL
Reviewable status: 0 of 9 files reviewed, 1 unresolved discussion (waiting on @sumeerbhola)
iterator.go
line 1921 at r1 (raw file):
Previously, sumeerbhola wrote…
So
InternalNext
behaves as if nothing was actually done to the iterator, yes?
This looks fine.We may have to do something about stability of the key/value pair already returned, if we want the caller to be able to use them after calling
InternalNext
. I guess the key is already stable since we are not fiddling withi.key
. And cloning theLazyValue
does feel unnecessary since CockroachDB eagerly extracts the value and parses theMVCCMetadata
proto. In which case we could just document the instability of the value.
Done.
c870aeb
to
000d39d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 9 of 9 files at r2, all commit messages.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @jbowens)
iterator.go
line 1931 at r2 (raw file):
// InternalNext, unlike all other Iterator methods, exposes internal LSM state. // InternalNext advances the Iterator's internal iterator to the next shadowed // key with the a user key equal to Key(). When a key is overwritten or deleted,
nit: ... with a user ...
iterator.go
line 1935 at r2 (raw file):
// allows the caller to see whether an obsolete internal key exists with the // current Key(), and what it's key kind is. Note that the existence of an // internal key is nondeterministic and depedent on internal LSM state. These
dependent
iterator.go
line 1956 at r2 (raw file):
return InternalNextExhausted, base.InternalKeyKindInvalid } i.lastPositioningOp = unknownLastPositionOp
I was thinking that we will lose the monotonic seek optimization during intent resolution of a batch of intents because of this, but that may be ok. But then I noticed the loop with the call to ltIter.NextEngineKey
in https://github.com/cockroachdb/cockroach/blob/e23591614d7ae4447a1d788c0e23e68a24018878/pkg/storage/mvcc.go#L4912 which suggests we already don't hit that optimization.
iterator.go
line 1965 at r2 (raw file):
// variant so that the caller does not need to check i.iter.Error() // in the common case that the next internal key has a new user key. if i.err = firstError(i.err, i.iter.Error()); i.err != nil {
isn't i.err
guaranteed to be nil prior to this assignment, so we could do i.err = i.iter.Error()
?
metamorphic/ops.go
line 1294 at r2 (raw file):
_, _ = i.InternalNext() // The return value of an InternalNext is dependent on internal LSM state // and non-deterministic, so we don't record it. Including the operation
should we also have a test option that makes iterInternalNextOp
a noop?
metamorphic/ops.go
line 1298 at r2 (raw file):
// not change the result of any other Iterator operation that should be // deterministic, regardless of the outcome of InternalNext. h.Recordf("%s // %v", o, i.Error())
why is the error deterministic?
Is it because the only errors we happen to encounter here in the metamorphic test are caused by i.err
already being true prior to the call to i.InternalNext()
or errors.New("switching from reverse to forward via peeking is prohibited")
?
testdata/iter_histories/internal_next
line 109 at r2 (raw file):
flush ----
Can you add a comment stating that we have two versions for each user key.
000d39d
to
3b05204
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: 6 of 9 files reviewed, 7 unresolved discussions (waiting on @sumeerbhola)
iterator.go
line 1935 at r2 (raw file):
Previously, sumeerbhola wrote…
dependent
Done.
iterator.go
line 1956 at r2 (raw file):
Previously, sumeerbhola wrote…
I was thinking that we will lose the monotonic seek optimization during intent resolution of a batch of intents because of this, but that may be ok. But then I noticed the loop with the call to
ltIter.NextEngineKey
in https://github.com/cockroachdb/cockroach/blob/e23591614d7ae4447a1d788c0e23e68a24018878/pkg/storage/mvcc.go#L4912 which suggests we already don't hit that optimization.
Hmm, I don't see any reason why we couldn't leave lastPositioningOp
untouched, but these things can be subtle.
iterator.go
line 1965 at r2 (raw file):
Previously, sumeerbhola wrote…
isn't
i.err
guaranteed to be nil prior to this assignment, so we could doi.err = i.iter.Error()
?
Yeah, Done.
metamorphic/ops.go
line 1294 at r2 (raw file):
Previously, sumeerbhola wrote…
should we also have a test option that makes
iterInternalNextOp
a noop?
I think we should omit it because it allows us to assert that errors are deterministic. Some of the InternalNext branches are already just about noops, and the metamorphic tests are good at constructing a variety of internal LSM states that'll hit them.
metamorphic/ops.go
line 1298 at r2 (raw file):
Previously, sumeerbhola wrote…
why is the error deterministic?
Is it because the only errors we happen to encounter here in the metamorphic test are caused byi.err
already being true prior to the call toi.InternalNext()
orerrors.New("switching from reverse to forward via peeking is prohibited")
?
Yeah, that's right. Added a comment here.
testdata/iter_histories/internal_next
line 109 at r2 (raw file):
Previously, sumeerbhola wrote…
Can you add a comment stating that we have two versions for each user key.
Done.
3b05204
to
1dd1490
Compare
6af9173
to
fa1f4d6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sumeerbhola — I reworked this to keep the "internal next" operation internal, instead exposing a purpose-built function pebble.CanDeterministicallySingleDelete
that also handles the knowledge of what internal key kind sequences are okay. PTAL!
Reviewable status: 0 of 9 files reviewed, 7 unresolved discussions (waiting on @sumeerbhola)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 9 of 9 files at r3, all commit messages.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @jbowens and @nvanbenschoten)
iterator.go
line 2862 at r3 (raw file):
// // If CanDeterministicallySingleDelete returns true AND the key at the iterator // position is not modified between the creation ofi the Iterator and the commit
of
iterator.go
line 2864 at r3 (raw file):
// position is not modified between the creation ofi the Iterator and the commit // of a batch containing a SingleDelete over the key, then the caller can be // assured that SingleDelete is equivalent to Delete.
How about:
... is equivalent to Delete on the local engine. For batches that will be replicated, CanDeterministicallySingleDelete
can return true on one engine, but it may not be true on another engine, since the first engine may have had multiple SETs collapsed into one.
iterator.go
line 2880 at r3 (raw file):
// SINGLEDEL; SET; SINGLEDEL; SET; // is not deterministic, because the first SET may consume the // second SINGLEDEL resulting in two SETs shadowed by one SINGLEDEL.
It is safe because we have SETWITHDEL. And hopefully we did a similar check when writing the SINGLEDEL so we don't have to continue now.
e14bd8c
to
aaaebc5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: 8 of 9 files reviewed, 10 unresolved discussions (waiting on @nvanbenschoten and @sumeerbhola)
iterator.go
line 2862 at r3 (raw file):
Previously, sumeerbhola wrote…
of
Done.
iterator.go
line 2864 at r3 (raw file):
Previously, sumeerbhola wrote…
How about:
... is equivalent to Delete on the local engine. For batches that will be replicated,CanDeterministicallySingleDelete
can return true on one engine, but it may not be true on another engine, since the first engine may have had multiple SETs collapsed into one.
Done.
iterator.go
line 2880 at r3 (raw file):
Previously, sumeerbhola wrote…
It is safe because we have SETWITHDEL. And hopefully we did a similar check when writing the SINGLEDEL so we don't have to continue now.
Ah, right!
b941a86
to
ee4f553
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 6 of 9 files at r3, 2 of 2 files at r4, all commit messages.
Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @jbowens and @sumeerbhola)
-- commits
line 18 at r4:
Is the thinking that we will add calls to CanDeterministicallySingleDelete
in testing builds of CockroachDB? Behind a buildutil.CrdbTestBuild
?
If so, where will we add those? Just during evaluation of ResolveIntent(Range) requests above Raft on the leaseholder? Or also before applying WriteBatches
that contain SingleDelete
operations on all replicas below Raft?
The second check would be more comprehensive, but also more difficult to introduce without adding logic to iterate over every WriteBatch during raft application to search for SingleDelete
operations. It makes me wonder whether this kind of assertion would be more natural to push into Pebble. For example, I could imagine a mode where Pebble could be configured to check for determinism whenever applying any SingleDelete operation. We might not want to be reading on the apply path though, so that may not work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @nvanbenschoten and @sumeerbhola)
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Is the thinking that we will add calls to
CanDeterministicallySingleDelete
in testing builds of CockroachDB? Behind abuildutil.CrdbTestBuild
?If so, where will we add those? Just during evaluation of ResolveIntent(Range) requests above Raft on the leaseholder? Or also before applying
WriteBatches
that containSingleDelete
operations on all replicas below Raft?The second check would be more comprehensive, but also more difficult to introduce without adding logic to iterate over every WriteBatch during raft application to search for
SingleDelete
operations. It makes me wonder whether this kind of assertion would be more natural to push into Pebble. For example, I could imagine a mode where Pebble could be configured to check for determinism whenever applying any SingleDelete operation. We might not want to be reading on the apply path though, so that may not work.
The intention is to call it in production builds just in evaluation. We expect the regression introduced to be small, because we already advance over these internal keys in NextEngineKey. Although this function doesn’t change the externally-visible iterator position, it does reduce the work the subsequent Next needs to perform.
Long-term I would like to explore pushing the single deletion decision point into Pebble (some thoughts in cockroachdb/cockroach#114492). I think it’d be nice to either completely obscure the single deletion optimization to the user, or make it simply a suggestion that Pebble may ignore if it’s unsafe or too expensive to prove it’s safe. Doing so might regress in terms of resource utilization per intent resolved since we need to introduce a read somewhere there wasn’t previously, but that seems like it should be offset by its wider applicability, like the raft log.
Previously, jbowens (Jackson Owens) wrote…
To be clear, we intend to call it in production during evaluation only as an invariant assertion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @nvanbenschoten and @sumeerbhola)
iterator.go
line 2880 at r3 (raw file):
Previously, jbowens (Jackson Owens) wrote…
Ah, right!
On second thought, maybe we should still check beneath the observed SINGLEDEL because in the context of MVCCResolveWriteIntentRange
we're going to iterate over those internal keys during the next call to NextEngineKey
anyways: may as well validate that the already-written SINGLEDEL obeyed invariants too?
Add a new specialized CanDeterministicallySingleDelete function that allows a user to inspect hidden internal keys within the current user key to determine whether the behavior of a SingleDelete of the current Key would be deterministic, assuming no writes are made to the key between Iterator creation and commit of the SingleDelete. The CanDeterministicallySingleDelete func may only be called on an Iterator oriented in the forward direction and does not change the external Iterator position. During intent resolution, CockroachDB will write a SINGLEDEL tombstone to delete the intent if the intent's value indicates that it must have only been written once. Subtle logic that cuts across the stack makes this optimization possible, and one possible explanation for a recent instance of replica divergence (cockroachdb/cockroach#114421) is a bug in this logic. We anticipate using CanDeterministicallySingleDelete within CockroachDB's intent resolution to validate this logic at runtime. There is some potential for this function to drive the decision to use SingleDelete when creating batches for local Engine application only. Informs cockroachdb/cockroach#114492.
ee4f553
to
be24b2e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r5.
Reviewable status: all files reviewed (commit messages unreviewed), 11 unresolved discussions (waiting on @nvanbenschoten)
TFTR! |
Add a new specialized CanDeterministicallySingleDelete function that allows a
user to inspect hidden internal keys within the current user key to determine
whether the behavior of a SingleDelete of the current Key would be
deterministic, assuming no writes are made to the key between Iterator creation
and commit of the SingleDelete. The CanDeterministicallySingleDelete func may
only be called on an Iterator oriented in the forward direction and does not
change the external Iterator position.
During intent resolution, CockroachDB will write a SINGLEDEL tombstone to
delete the intent if the intent's value indicates that it must have only been
written once. Subtle logic that cuts across the stack makes this optimization
possible, and one possible explanation for a recent instance of replica
divergence (cockroachdb/cockroach#114421) is a bug in this logic. We anticipate
using CanDeterministicallySingleDelete within CockroachDB's intent resolution
to validate this logic at runtime.
There is some potential for this function to drive the decision to use
SingleDelete when creating batches for local Engine application only.
Informs cockroachdb/cockroach#114492.