Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachpb: spell out how observed timestamps pertain to follower reads #51555

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

aayushshah15
Copy link
Contributor

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@aayushshah15
Copy link
Contributor Author

It took me a while to understand the observed timestamp property and come to the observation in the patch. Please let me know if it's incorrect. However, given that this stuff is so tricky, I felt it was worthwhile including some talk around follower reads & observed timestamps here.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this stuff is confusing. I'm very glad to see that you're thinking about it.

What you have here seems correct to me, but I'm hoping we can improve it a bit more. You say that "the reduced uncertainty window only pertains to requests served by the leaseholder and not ones served by followers" which seems to be both accurate and also not saying all that much because of how ambiguous it is. What do you mean by "only pertains"? We currently call limitTxnMaxTimestamp during follower reads and use observed timestamps off of the follower's clock. I think that's just straight-up wrong. It sounds like you've come to the same conclusion. But then the question becomes – can we use any observed timestamp when performing a follower read? Does an observed timestamp off the current leaseholder replica tell us anything? I think so, as long as we still forward the observed timestamp by the start of the current lease, but we should think through that more.

I'd also generalize the comment a little. The max offset vs. closed timestamp duration part is interesting, but it seems too specific. That doesn't really change anything about the fact that an observed timestamp from a follower replica tells us nothing about causality between a read and values in the range that the read might observe, right?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)

@aayushshah15 aayushshah15 force-pushed the observed_timestamps_follower_reads branch from 29af488 to 7fa5923 Compare July 21, 2020 15:18
@aayushshah15
Copy link
Contributor Author

What do you mean by "only pertains"?

I improved the language around this, should be clearer now.

It sounds like you've come to the same conclusion.

Yup, but see the section on follower reads at the end. Currently, with the target duration being greater than max offset, at the time these follower read requests are served (which would be more than closed_timestamp_target_duration in the future), the observed timestamp of such a non-leaseholder store would be greater than read_timestamp + max_offset. I suppose the sentiment I was trying to convey was "an observed timestamp from a follower store does say something about causality of these reads with the range's values, though nothing useful".

Does an observed timestamp off the current leaseholder replica tell us anything?

Makes sense, but I see that you have a TODO saying exactly this:

// TODO(nvanbenschoten): This should use the lease's node id.

Do you remember if there was a reason you didn't make that change?

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad we're improving this comment.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @andreimatei)


pkg/roachpb/data.proto, line 396 at r1 (raw file):

  // There are two invariants necessary for this property to hold:
  //
  // 1. a leaseholder's clock must always be equal to or greater than the

I find this paragraph pretty confusing; let's take the opportunity to refine it.
To understand it you have to connect a bunch of dots. It's talking about the hazard where intents have been written at timestamp 10, txn commits at ts 20, and someone observes ts 15 before the intents are finally written at ts 20. The observer of 15 might ignore the values @ 20. We protect against this hazard by ensuring that nobody can observe 15 if the writing txn committed.
It'd be good for the paragraph to spell this out, and also to have some words about specifically how we bump all the clocks before committing. Does the txnCommitter do that? The EndTxn batch can have the CanForwardReadTimestamp bit set, in which case the txn's timestamp can be decided when evaluating this batch, so presumably the bumping of timestamp needs to happen on the return path, before telling the client that the writer succeeded. But I don't see it... How does this work @nvanbenschoten ?


pkg/roachpb/data.proto, line 402 at r1 (raw file):

  // committing, even after writing intents on remote Ranges. To accommodate
  // this situation, transactions ensure that at the time of their commit, any
  // leaseholder for a Range that contains one of its intent has an HLC clock

s/its intent/its intents

since you're taking the blame anyway

But are you sure you want to take the blame on this paragraph? I don't think you're changed anything.


pkg/roachpb/data.proto, line 406 at r1 (raw file):

  // TODO(nvanbenschoten): This is violated by txn refreshes. See #36431.
  //
  // 2. a store's clock must always be equal to or greater than the timestamp of

I find it confusing to talk about stores here. Let's just talk about replicas (of which the leaseholder is one).


pkg/roachpb/data.proto, line 449 at r1 (raw file):

  // on a copy of the slice.
  //
  // ### Follower Reads:

I'd note here somewhere that commonly follower reads have no uncertainty - when coming from the as of system time syntax. Which means that they can observe the "causal inversion" phenomenon (seeing the reply post but not the original post in a forum thread example).


pkg/roachpb/data.proto, line 451 at r1 (raw file):

  // ### Follower Reads:
  //
  // If the store serving a request weren't a leaseholder for the range, we

I think there's something funny with the first statement. s/weren't/isn't ?


pkg/roachpb/data.proto, line 453 at r1 (raw file):

  // If the store serving a request weren't a leaseholder for the range, we
  // can't always guarantee that invariant 2 stated above in our formalization
  // is upheld. If the closed timestamp duration is greater than the max offset,

I don't think I really follow what we're talking about here. "invariant 2" is about leases and leaseholder. It always holds. You want to talk about the followers clocks and what properties they have, right?. It seems to me that this has nothing to do with the invariants above, or the property that they give us (which is also about the leaseholder). I'd like to re-read this paragraph when it's put in that light -> how the followers' clocks differ from the leaseholders'.
No?

@aayushshah15
Copy link
Contributor Author


pkg/roachpb/data.proto, line 453 at r1 (raw file):

"invariant 2" is about leases and leaseholder

My intention was to make invariant 2 a little more general than this, which is why I changed leaseholders to stores in that statement. Maybe a way to improve the explanation would be to juxtapose how it needs to be upheld for follower vs leaseholder reads:

For reads to be served by the leaseholder of a range, all we need to do is ensure that, upon acquiring a lease for a range, the new leaseholder forwards its local clock to be greater than or equal to the previous leaseholder's clock when it stopped serving writes. 
For reads to be served by a follower of a range, we need to ensure that its clock is above _both_ the current & previous leaseholders of the range.

Does that sound better?

The way I see it, invariant 1 only concerns leaseholders whereas invariant 2 should be more general.

@aayushshah15 aayushshah15 force-pushed the observed_timestamps_follower_reads branch from 7fa5923 to d66424e Compare July 23, 2020 19:57
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you remember if there was a reason you didn't make that change?

No, I don't think there was a good reason for not addressing it.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @andreimatei)


pkg/roachpb/data.proto, line 396 at r1 (raw file):

It'd be good for the paragraph to spell this out, and also to have some words about specifically how we bump all the clocks before committing. Does the txnCommitter do that?

No, no one does that. That's why #36431 is broken. This is referenced in the TODO below.


pkg/roachpb/data.proto, line 402 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

s/its intent/its intents

since you're taking the blame anyway

But are you sure you want to take the blame on this paragraph? I don't think you're changed anything.

Agreed, probably best to fix the typo without moving the rest of the paragraph in order to not mess up the blame. It's useful to be able to quickly get back to the commit/PR that made a change, especially when the change is discussing something like this and may provide valuable context.

Also, I think this was formatted such that it read well in godoc. Does the outdent help that?


pkg/roachpb/data.proto, line 406 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I find it confusing to talk about stores here. Let's just talk about replicas (of which the leaseholder is one).

+1


pkg/roachpb/data.proto, line 453 at r1 (raw file):
I don't get why we're making it more general and then saying that it doesn't hold on anything other than the leaseholder. That's not really making it more general, is it?

For reads to be served by a follower of a range, we need to ensure that its clock is above both the current & previous leaseholders of the range.

But this isn't true, is it? How could we enforce this?


pkg/roachpb/data.proto, line 449 at r2 (raw file):

  // on a copy of the slice.
  //
  // ### Follower Reads:

nit: move this section above Usage.


pkg/roachpb/data.proto, line 451 at r2 (raw file):

  // ### Follower Reads:
  //
  // If the store serving a request isn't a leaseholder for the range, we

I'd still avoid talking about "stores". What you mean here is "replica".


pkg/roachpb/data.proto, line 457 at r2 (raw file):

as the observed timestamp of such a store ... will be above the max offset

What does this mean? It's comparing a timestamp against a duration.


pkg/roachpb/data.proto, line 463 at r2 (raw file):

  // that we can only limit the upper bound of our uncertainty interval for read
  // requests served by a leaseholder and *not* for the ones served by
  // followers.

I'm still a little confused why we're comparing max offset vs. follower read duration. Neither situation seems to tell us anything all that interesting. Maybe the insight just hasn't clicked for me yet.

@tbg tbg added the X-noremind Bots won't notify about PRs with X-noremind label May 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
X-noremind Bots won't notify about PRs with X-noremind
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants