New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: Splits get write locks on the entire span #14992
Conversation
Reviewed 1 of 1 files at r1. Comments from Reviewable |
Might be good to test more directly, though. Review status: all files reviewed at latest revision, all discussions resolved, some commit checks pending. Comments from Reviewable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can confirm that this removes the flake.
This was meant to go in as a part of cockroachdb#14833. De-flakes TestUnsplittableRange. Fixes cockroachdb#14881
Added a unit test. |
I was thinking we'd test it at a higher level, specifically inducing the troublesome race. I'm OK with this, too. Reviewed 1 of 1 files at r2. Comments from Reviewable |
A higher-level test for this would be tricky to write, because we don't have any hooks in place to detect when the Put has been blocked by the command queue. |
@bdarnell I was thinking about this in relation to #16075 (comment) and it's pretty unfortunate that splits need to declare write locks on their left and right-hand sides. Naively I would expect that a read lock would be sufficient to block all writes on the range that may concurrently mess with the stats, but I assume we're running into issues with #14342 (writes in the future affecting the stats). If that is, in fact, the issue, we might consider adjusting the perceived timestamp of the split in the eyes of the |
You mean putting the split in the command queue as a read at the maximum possible timestamp, but still using its real timestamp for the timestamp cache when the operation completes? I think that would work, although it feels hacky. In that case then reads to the LHS would be uninterrupted, and reads to the RHS would be allowed except for the ones in flight as the split occurs (since they would no longer be addressed to the correct range). |
In one sense, I agree, but it also seems logically consistent to say that the operation on the LHS is a read that needs to depend on all writes and therefore operates at the maximum timestamp. Do we have any notion of this to deal with inline values? I may also be blowing this out of proportion, but it seems like blocking all reads to a range during splits as serious enough to warrant some special logic.
How is that handled now? Are reads to the RHS which are waiting in the cmdQueue during the split all rejected with a |
Reads always depend on all writes in their past; it's weird for a read to depend on writes in its future.
Currently, we apply the command's timestamp to all of the keys it touches, even if they have inline values.
Yes, exactly. We might be able to allow the RHS reads through as long as everything is properly synchronized, so that there is a smaller window of disruption. I don't think that's safe today, though. |
@bdarnell In the process of reworking the CommandQueue, I noticed that the logic from #14342 is disabled when a command enters the CommandQueue with an empty timestamp. This means that we could simply pass an empty timestamp and avoid the "maximum possible timestamp" hack we were discussing above. Once we do that, we can declare a read span for the entire range and allow concurrent reads to go through (although the locking on the RHS might be tricky). Think it's worth pursuing this? |
That sounds plausible. The question is just whether that behavior is something we want to commit to in the command queue - when, aside from this trick, would we ever have both timestamped and untimestamped access to the same keys? |
It's actually something we're already committed to and I think it makes sense. We already do exactly this for locally-scoped keys. Providing an empty timestamp to the command queue is basically a way of saying "don't treat this as an MVCC operation".
Currently, I don't believe so. It doesn't seem like a big leap to me though. |
This was meant to go in as a part of #14833.
De-flakes TestUnsplittableRange.
Fixes #14881