
kvfollowerreadsccl: implement txn retries for nearest_only=True #68969

Merged

Conversation


@otan otan commented Aug 16, 2021

See commit for details.

@otan otan requested review from nvanbenschoten and a team August 16, 2021 00:19
@cockroach-teamcity
This change is Reviewable


otan commented Aug 16, 2021

i'm a little iffy about the schema changes, so would appreciate a look from @cockroachdb/sql-schema on that.

@otan otan force-pushed the schema-unavailability-problem branch 2 times, most recently from 0f33fbb to 0306380 Compare August 16, 2021 01:15
@otan otan requested a review from rytaft August 16, 2021 01:15

otan commented Aug 16, 2021

also @rytaft, if you could look at the opt changes and let me know what you think about whether we can do better on the AOST matching clause here if outer != in (see the test change needed in as_of to make this happen)

worked around this

@otan otan force-pushed the schema-unavailability-problem branch from 0306380 to e990692 Compare August 16, 2021 05:33
@ajwerner ajwerner left a comment


Is there a design for how this is supposed to work regarding schema descriptor timestamps and bounded staleness time ranges?

Historical descriptors have a stricter safety policy than current descriptors for which you actively hold a lease. To use historical descriptors, you must use them during their durable validity interval (between their ModificationTime and the successor descriptor's ModificationTime, exclusive). This is because we don't track the moment at which we've stopped assuming that the predecessor version might be active.

Can you help me understand some of the timestamps in play here? My assumption is that the ReadTimestamp on the Txn is set to the minimum timestamp that the query is allowed to see (i.e. the value computed from the AOST clause). Let's think about this in terms of a timeline. Imagine the user specifies that their query should use a timestamp no older than a, and now is g.

Now we need to plan the query with some descriptor versions. Imagine our query uses two descriptors and those descriptors have gone through some schema changes during the time interval. Namely d1 saw a change at c and at e and d2 saw changes at b, d, and f.

--time-->
abc  d   e   f  g // timestamp labels
[...............] // interval for bounded staleness query, g == now
..|......|....... // version validity intervals for two versions of descriptor d1
.|...|.......|... // version validity intervals for two versions of descriptor d2

So, there should exist the following distinct plans as far as the optimizer is concerned:
[a, b), [b, c), [c, d), [d, e), [e, f), and [f, g).

When I look back at the RFC, there's some language about using follower reads and a similar lookup protocol to find descriptors and then to try to make that cohere here. I don't think I ever internalized this part of the proposal at all and assumed it would just use leases and other cached historical descriptors given we rely on system tables remaining available.

So, in my idealized world, in the face of a schema change, we'd

  1. Plan at the present time
  2. Look at the descriptors used to plan and bound the minimum timestamp to the earliest modification time of any descriptors used.

In this case it would mean that we'd end up changing the minimum timestamp from a to f for the purpose of execution and we'd use g for planning.

I may be misunderstanding things and am more than happy to get on calls.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @rytaft)


otan commented Aug 16, 2021

When I look back at the RFC, there's some language about using follower reads and a similar lookup protocol to find descriptors and then to try to make that cohere here. I don't think I ever internalized this part of the proposal at all and assumed it would just use leases and other cached historical descriptors given we rely on system tables remaining available.

Ah, yeah... that's because this used to be that we would give up on looking at the past due to the "schema unavailability" problem, but we changed it to what you saw when we thought we could change this: #67837

I probably should have circulated the RFC change when we did this.

Can you help me understand some timestamps in play here?

I think the pseudocode made sense for conceptualising what was happening, but in actuality (aka when I made this work), we do something like this:

a.....b.....c    
|.....|.....

Assume (a) is a bounded staleness bound, (b) is a schema modification, and (c) is "current time".

  • The code right now will attempt a bounded staleness read between [a, c], but grabs the latest schema and realises it can only service [b, c]. It attempts a read, and if nearest_only=True and the follower is not past b, it will error. This is what is implemented as of kvfollowerreadsccl: plug bounded staleness into the execution engine #68967, and follows your "idealised world"
  • We added this in this PR: with nearest_only=True, if we detect that we "jumped forward" because of a schema modification, we will retry, but set the cap at b. The bounded staleness read will attempt to read at [a, b), assuming the schema is read from the point in time b.Prev(). The planner side will use the descriptor at time b.Prev() to do the planning.

This can continue if there are more schema changes, in which case we always do [max(bounded_staleness_timestamp, schema_modification_time), min(now, point before next schema change)].
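The bound arithmetic described above can be sketched as follows; integer timestamps stand in for hlc.Timestamp, and the function names are illustrative, not from the PR:

```go
package main

import "fmt"

// firstAttempt: the read spans [minBound, now], but fetching the latest
// schema forces the minimum forward to the schema's modification time.
func firstAttempt(minBound, modTime, now int) (lo, hi int) {
	if modTime > minBound {
		minBound = modTime // the "jump forward" that can trip nearest_only=True
	}
	return minBound, now
}

// retryAttempt: after a nearest_only=True failure, keep the original
// minimum bound but cap the maximum just before the schema change,
// i.e. at b.Prev() (modeled here as modTime-1).
func retryAttempt(origMin, modTime int) (lo, hi int) {
	return origMin, modTime - 1
}

func main() {
	// a=10 (staleness bound), b=20 (schema change), c=30 (now).
	lo, hi := firstAttempt(10, 20, 30)
	fmt.Println(lo, hi) // 20 30: errors if no follower has caught up to b
	lo, hi = retryAttempt(10, 20)
	fmt.Println(lo, hi) // 10 19: retry reads [a, b) with the schema at b.Prev()
}
```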

@ajwerner ajwerner left a comment


Thanks, I feel a lot better now.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @otan, and @rytaft)


pkg/kv/txn.go, line 1108 at r1 (raw file):

			err := errors.AssertionFailedf("%s: ba=%s, txn=%s", s, ba.String(), txn.String())
			err = errors.WithContextTags(err, ctx)
			panic(err)

are we happy with this new panic behavior? Is this a lot better than turning it into a roachpb error? It should retain its assertion-failure nature as it goes through that round of boxing and unboxing.


pkg/sql/conn_executor.go, line 2191 at r7 (raw file):

	var minTSErr *roachpb.MinTimestampBoundUnsatisfiableError
	if errors.As(err, &minTSErr) {

total nit: you can scope this variable to this conditional with:

if minTSErr := (*roachpb.MinTimestampBoundUnsatisfiableError)(nil); errors.As(err, &minTSErr) {
    // ...

pkg/sql/conn_executor.go, line 2198 at r7 (raw file):

				// a minimum timestamp during a bounded staleness read to be greater
				// than the maximum staleness bound we put up.
				err = errors.Wrapf(

assertion failure?


pkg/sql/catalog/descs/collection.go, line 109 at r7 (raw file):

	// nearest_only=True, in which we want a schema read that should be
	// no older than maxTimestampBound.
	maxTimestampBound *hlc.Timestamp

Assuming we need all of this, thoughts on just sticking a maxTimestampBoundDeadlineHolder here and then detecting whether that thing has a zero timestamp as opposed to allocating a new one on each use?


pkg/sql/catalog/descs/collection.go, line 122 at r7 (raw file):

// SetMaxTimestampBound sets the maximum timestamp to read schemas at.
func (tc *Collection) SetMaxTimestampBound(maxTimestampBound *hlc.Timestamp) {

I'm still unclear on why we need this. Reasons seem to be:

  1. we want to avoid actually using the transaction to go read from the store for the descriptor in the case we don't have a lease
  2. we want to sidestep the UpdateDeadline call
  3. we want to use this timestamp as opposed to the transaction's read timestamp

Taking them backwards, what is the ReadTimestamp on one of these transactions? What does happen on UpdateDeadline on one of these transactions? Can you tell that you have one of these transactions just from the *kv.Txn object?

@otan otan left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @nvanbenschoten, and @rytaft)


pkg/kv/txn.go, line 1108 at r1 (raw file):

Previously, ajwerner wrote…

are we happy with this new panic behavior? Is this a lot better than turning it into a roachpb error? It should retain its assertion-failure nature as it goes through that round of boxing and unboxing.

cc @nvanbenschoten


pkg/sql/conn_executor.go, line 2191 at r7 (raw file):

Previously, ajwerner wrote…
	var minTSErr *roachpb.MinTimestampBoundUnsatisfiableError
	if errors.As(err, &minTSErr) {

total nit: you can scope this variable to this conditional with:

if minTSErr := (*roachpb.MinTimestampBoundUnsatisfiableError)(nil); errors.As(err, &minTSErr) {
    // ...

Done.


pkg/sql/catalog/descs/collection.go, line 122 at r7 (raw file):

Previously, ajwerner wrote…

I'm still unclear on why we need this. Reasons seem to be:

  1. we want to avoid actually using the transaction to go read from the store for the descriptor in the case we don't have a lease
  2. we want to sidestep the UpdateDeadline call
  3. we want to use this timestamp as opposed to the transaction's read timestamp

Taking them backwards, what is the ReadTimestamp on one of these transactions? What does happen on UpdateDeadline on one of these transactions? Can you tell that you have one of these transactions just from the *kv.Txn object?

I was told by @nvanbenschoten not to put things on kv.Txn, which is how it ended up where it is.
ReadTimestamp doesn't change at the moment in a staleness read - should it? I tried that, and a bunch of other things complained based on assertions that it should not be set during a max staleness read. I've been assuming (based on the APIs given) that it's not important.

Your reasons are correct.

@otan otan force-pushed the schema-unavailability-problem branch from e990692 to 8c9e19d Compare August 16, 2021 23:38
@rytaft rytaft left a comment


The non-kv, non-schema stuff :lgtm:

Reviewed 1 of 18 files at r1, 3 of 3 files at r4, 1 of 2 files at r5, 1 of 19 files at r8, 5 of 10 files at r9, 10 of 13 files at r10, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @nvanbenschoten, and @otan)


pkg/sql/conn_executor.go, line 2409 at r10 (raw file):

	// as there was a schema update in between. When retrying, we want to keep the
	// same minimum timestamp for the AOST read, but set the maximum timestamp
	// to the point just before our failed read to ensure we don't try read data

nit: try read -> try to read


pkg/sql/exec_util.go, line 1170 at r10 (raw file):

	DistSQLReceiverPushCallbackFactory func(query string) func(rowenc.EncDatumRow, *execinfrapb.ProducerMetadata)

	// OnTxnRetry, if sets, will be called if there is a transaction retry.

nit: if sets -> if set


pkg/sql/sem/tree/as_of.go, line 93 at r10 (raw file):

	NearestOnly bool
	// If this is a bounded staleness read with nearest_only=True, this is set
	// when we failed to satisfy a bounded staleness with a nearby replica as

nit: satisfy a bounded staleness read

@otan otan force-pushed the schema-unavailability-problem branch 2 times, most recently from ba20676 to e90277b Compare August 17, 2021 08:43
@otan otan marked this pull request as ready for review August 17, 2021 08:43
@otan otan requested a review from a team as a code owner August 17, 2021 08:43
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Aug 17, 2021
@ajwerner ajwerner left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @nvanbenschoten, and @rytaft)


pkg/sql/catalog/descs/collection.go, line 122 at r7 (raw file):

Previously, otan (Oliver Tan) wrote…

I was told by @nvanbenschoten to not put things on kv.Txn, which is how it ended up here where it is.
ReadTimestamp doesn't change at the moment in a staleness read - should it? I tried that and a bunch of other things complained based on assertions made that it should not be set during a max staleness read. I've been assuming (based on the APIs given) that it's not important.

Your reasons are correct.

Sounds good. I'm cool with this. My sincere and heartfelt hope is to soon rework the dependency logic in this here Collection object so that it is constructed (or reset, I suppose) with a binding to the (abstracted) transaction and we remove the txn parameter from all of these methods. Seems unlikely to make this release, so let's march forward with this and then, when that refactor arrives, we can set this up at construction time in a more direct manner than MaybeUpdateDeadline.

@nvanbenschoten nvanbenschoten left a comment


Reviewed 7 of 13 files at r10, 1 of 3 files at r11, 33 of 33 files at r12, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @otan)


pkg/ccl/kvccl/kvfollowerreadsccl/testdata/boundedstaleness/single_row, line 145 at r12 (raw file):

# successfully. Note the schema changes several times, hence the multiple events
# output. Note on the first read, event 7 is not a follower read as the query
# requires reading a descriptor which does not utilize a follower read.

Can we ignore the descriptor lookup? Or somehow ensure that it's cached at this point? This seems like it will be a source of flakiness. I also don't understand why the descriptor lookup results in a local read, but not a follower read. Have we ensured that the descriptor's leaseholder is on n2?


pkg/sql/conn_executor.go, line 2208 at r12 (raw file):

				)
			}
			if aost.Timestamp.Less(minTSErr.MinTimestampBound.Prev()) {

Could you add a comment explaining the .Prev() here? It must have something to do with an inclusive/exclusive translation somewhere else, right?


pkg/sql/conn_executor.go, line 2412 at r12 (raw file):

	// same minimum timestamp for the AOST read, but set the maximum timestamp
	// to the point just before our failed read to ensure we don't try to read
	//data which may be after the schema change when we retry.

nit: missing space at the beginning of this line


pkg/sql/conn_executor.go, line 2415 at r12 (raw file):

	var minTSErr *roachpb.MinTimestampBoundUnsatisfiableError
	if err := ex.extraTxnState.autoRetryReason; err != nil && errors.As(err, &minTSErr) {
		bound := minTSErr.MinTimestampBound.Prev()

Would things be easier to reason about if max_timestamp_bound was exclusive instead of inclusive? We can make that change in the KV API if necessary.


pkg/sql/catalog/descs/descriptor.go, line 212 at r12 (raw file):

		ctx, tc.deadlineHolder(txn), parentID, parentSchemaID, name)
	if err != nil {

nit: stray new line


pkg/sql/catalog/descs/leased_descriptors.go, line 59 at r12 (raw file):

	ctx context.Context, deadline hlc.Timestamp,
) error {
	return nil

Do we need to use the deadline even for bounded staleness queries? Right now we aren't passing a MaxTimestampBound for bounded staleness queries that have not yet retried, but I think this may be faulty. If the bounded staleness read is operating on the current schema, gets delayed, and then chooses a dynamic timestamp beyond the end of the validity period for this schema, won't bad things happen?

cc. @ajwerner


pkg/sql/catalog/descs/leased_descriptors.go, line 141 at r12 (raw file):

		// limitations of AcquireByName for details.
		// Note we never should read from store during a bounded staleness read,
		// as it is safe to return the schema as non-existent.

What is the effect of this? Does this mean that if the schema exists but is not cached, we won't observe it and will incorrectly claim that it does not exist?


pkg/sql/sem/tree/as_of.go, line 96 at r12 (raw file):

	// have no followers with an up-to-date schema.
	// In this case, we want a read that is >= Timestamp and <= MaxTimestampBound.
	MaxTimestampBound *hlc.Timestamp

nit: we generally use the zero value of hlc.Timestamp to represent an optional timestamp. There's an IsEmpty method you can use to check whether the timestamp is present or not. No reason to incur runtime cost for this.


pkg/kv/txn.go, line 1108 at r1 (raw file):

Previously, otan (Oliver Tan) wrote…

cc @nvanbenschoten

We could turn this into an error. Done in #69046.

@otan otan left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @nvanbenschoten)


pkg/ccl/kvccl/kvfollowerreadsccl/testdata/boundedstaleness/single_row, line 145 at r12 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Can we ignore the descriptor lookup? Or somehow ensure that it's cached at this point? This seems like it will be a source of flakiness. I also don't understand why the descriptor lookup results in a local read, but not a follower read. Have we ensured that the descriptor's leaseholder is on n2?

it's because kv.OnlyFollowerReads ensures everything is a follower read, including the descriptor lookup.
i've changed this to wait for an instance of follower read before continuing so we have the cache entry.


pkg/sql/conn_executor.go, line 2208 at r12 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Could you add a comment explaining the .Prev() here? It must have something to do with an inclusive/exclusive translation somewhere else, right?

ah, it's not needed here.


pkg/sql/conn_executor.go, line 2415 at r12 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Would things be easier to reason about it max_timestamp_bound was exclusive instead of inclusive? We can make that change in the KV API if necessary.

still needs the Prev() for the schema descriptor lookup, so i prefer it this way tbh.


pkg/sql/catalog/descs/leased_descriptors.go, line 59 at r12 (raw file):

If the bounded staleness read is operating on the current schema, gets delayed, and then chooses a dynamic timestamp beyond the end of the validity period for this schema, won't bad things happen?

isn't that ok? we would truncate the min timestamp for that read anyway right, and retry would take care of the rest? unless i'm misunderstanding.


pkg/sql/catalog/descs/leased_descriptors.go, line 141 at r12 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

What is the effect of this? Does this mean that if the schema exists but is not cached, we won't observe it and will incorrectly claim that it does not exist?

The schema doesn't exist; we're falling back to the KV read in case it does. But for bounded staleness, we should avoid that because that only observes the "latest" read.

@otan otan force-pushed the schema-unavailability-problem branch from e90277b to 0cd5f95 Compare August 17, 2021 22:18
@nvanbenschoten nvanbenschoten left a comment


Reviewed 9 of 9 files at r13, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @otan)


pkg/sql/catalog/descs/leased_descriptors.go, line 59 at r12 (raw file):

Previously, otan (Oliver Tan) wrote…

If the bounded staleness read is operating on the current schema, gets delayed, and then chooses a dynamic timestamp beyond the end of the validity period for this schema, won't bad things happen?

isn't that ok? we would truncate the min timestamp for that read anyway right, and retry would take care of the rest? unless i'm misunderstanding.

We would truncate the min timestamp, but what about the max timestamp? I'm concerned that we have a lease on a descriptor from time 20 to time 30 and we have a min_timestamp_bound of 10. We bump the min_timestamp_bound to 20, but what if we end up evaluating the bounded staleness read at time 35? What if a schema change or two landed between 30 and 35? Could that lead to incoherence between the table data and the schema we use to decode that data?

The TL;DR is that I think we may need to always set a max timestamp bound.


pkg/sql/catalog/descs/leased_descriptors.go, line 141 at r12 (raw file):

Previously, otan (Oliver Tan) wrote…

The schema doesn't exist; we're falling back to the KV read in case it does. But for bounded staleness, we should avoid that because that only observes the "latest" read.

I see, this is about dropped/truncated tables, not newly created ones. This makes sense then. Returning that the schema is non-existent is the "least stale" result possible. Makes sense.


pkg/sql/sem/tree/as_of.go, line 96 at r12 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: we generally use the zero value of hlc.Timestamp to represent an optional timestamp. There's an IsEmpty method you can use to check whether the timestamp is present or not. No reason to incur runtime cost for this.

follow-up nit: mention that the value can be set to the zero value when a query has no maximum timestamp bound.


otan commented Aug 18, 2021


pkg/sql/catalog/descs/leased_descriptors.go, line 59 at r12 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We would truncate the min timestamp, but what about the max timestamp? I'm concerned that we have a lease on a descriptor from time 20 to time 30 and we have a min_timestamp_bound of 10. We bump the min_timestamp_bound to 20, but what if we end up evaluating the bounded staleness read at time 35? What if a schema change or two landed between 30 and 35? Could that lead to incoherence between the table data and the schema we use to decode that data?

The TL;DR is that I think we may need to always set a max timestamp bound.

Not sure I understand, this controls what time the schema should be read.

For the first read, if we have a min_timestamp_bound of 10 in your example, max is not set, so we try to read from latest (and this code is not used). If we have something between 30-35, then the min_timestamp_bound will be reset to something in 30-35 (say 30).
If that fails, we have set a min_timestamp_bound of say 30 in the read because of the bump, then max timestamp bound is set to 29.
The next time, we always read the schema from time 29. We then use this deadline holder. Does the deadline matter in this case when we know there is a schema that is later?

@nvanbenschoten nvanbenschoten left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @otan)


pkg/sql/catalog/descs/leased_descriptors.go, line 59 at r12 (raw file):

Previously, otan (Oliver Tan) wrote…

Not sure I understand, this controls what time the schema should be read.

For the first read, if we have a min_timestamp_bound of 10 in your example, max is not set, so we try to read from latest (and this code is not used). If we have something between 30-35, then the min_timestamp_bound will be reset to something in 30-35 (say 30).
If that fails, we have set a min_timestamp_bound of say 30 in the read because of the bump, then max timestamp bound is set to 29.
The next time, we always read the schema from time 29. We then use this deadline holder. Does the deadline matter in this case when we know there is a schema that is later?

Isn't UpdateDeadline used to ensure a transaction does not commit beyond the validity of the schema descriptors that it is using? If I'm not wildly off on that, then I'm thinking that we need the same form of protection for bounded staleness reads, even on their first (the "latest") attempt. If the first attempt is performed without any max_timestamp_bound at all, then how do we ensure that it doesn't dynamically choose a timestamp well in the future and beyond its descriptor's expiration?


otan commented Aug 18, 2021


pkg/sql/catalog/descs/leased_descriptors.go, line 59 at r12 (raw file):

then how do we ensure that it doesn't dynamically choose a timestamp well in the future and beyond its descriptor's expiration?

I'm not as learned in the schema code, but we don't use the deadline holder in this case (it falls back to the existing leasing logic). See:

func (tc *Collection) deadlineHolder(txn *kv.Txn) deadlineHolder {
	if tc.maxTimestampBoundDeadlineHolder.maxTimestampBound.IsEmpty() {
		return txn
	}
	return &tc.maxTimestampBoundDeadlineHolder
}


otan commented Aug 18, 2021


pkg/sql/catalog/descs/leased_descriptors.go, line 59 at r12 (raw file):

Previously, otan (Oliver Tan) wrote…

then how do we ensure that it doesn't dynamically choose a timestamp well in the future and beyond its descriptor's expiration?

I'm not as learned in the schema code, but we don't use the deadline holder in this case (it falls back to the existing leasing logic). See:

func (tc *Collection) deadlineHolder(txn *kv.Txn) deadlineHolder {
	if tc.maxTimestampBoundDeadlineHolder.maxTimestampBound.IsEmpty() {
		return txn
	}
	return &tc.maxTimestampBoundDeadlineHolder
}

sorry, we don't use this deadline holder in that case (we fall back to the txn one), which updates the deadline.

@otan otan force-pushed the schema-unavailability-problem branch from 0cd5f95 to 7a8e6bc Compare August 18, 2021 05:02
@nvanbenschoten nvanbenschoten left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @nvanbenschoten, and @otan)


pkg/sql/catalog/descs/leased_descriptors.go, line 59 at r12 (raw file):

Previously, otan (Oliver Tan) wrote…

sorry, we don't use this deadline holder in that case (we fall back to the txn one), which updates the deadline.

Got it, I think that's ok, but then we need to do some extra work in kv.Txn to respect the deadline when NegotiateAndSend is used. That should be fine and I should have realized that's how this was going to work earlier. We'll just use the same max_timestamp_bound mechanism with some code in NegotiateAndSend like:

if txn.mu.deadline != nil {
    ba.BoundedStaleness.MaxTimestampBound.Backward(*txn.mu.deadline)
}

I'll open an issue for this, though I won't get to it until next week.
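For context on the Backward call in the snippet above: hlc.Timestamp's Backward ratchets the receiver down to the earlier of two timestamps. A minimal sketch with a stand-in type (wall time only, for illustration):

```go
package main

import "fmt"

// Timestamp is a stand-in for hlc.Timestamp, tracking wall time only.
type Timestamp struct{ WallTime int64 }

// Backward ratchets the receiver down to the earlier of the two
// timestamps, mirroring hlc.Timestamp.Backward.
func (t *Timestamp) Backward(s Timestamp) {
	if s.WallTime < t.WallTime {
		*t = s
	}
}

func main() {
	maxBound := Timestamp{WallTime: 35} // negotiated read timestamp cap
	deadline := Timestamp{WallTime: 30} // descriptor lease expiration
	maxBound.Backward(deadline)
	fmt.Println(maxBound.WallTime) // 30: never negotiate past the lease validity
}
```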


otan commented Aug 18, 2021


pkg/sql/catalog/descs/leased_descriptors.go, line 59 at r12 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Got it, I think that's ok, but then we need to do some extra work in kv.Txn to respect the deadline when NegotiateAndSend is used. That should be fine and I should have realized that's how this was going to work earlier. We'll just use the same max_timestamp_bound mechanism with some code in NegotiateAndSend like:

if txn.mu.deadline != nil {
    ba.BoundedStaleness.MaxTimestampBound.Backward(*txn.mu.deadline)
}

I'll open an issue for this, though I won't get to it until next week.

sounds good, yeah it smelt as though it would have affected the nearest_only=False case as well.
if this LGTM and is the last comment i'm going to merge before you go :D

@nvanbenschoten nvanbenschoten left a comment


:lgtm:

Reviewed 1 of 1 files at r14, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @otan)

This implements the schema unavailability workaround for bounded
staleness reads, for the case where we had to bump the AOST min
timestamp bound because we fetched a schema descriptor which was more
recent than that bound. In this case, we look to retry, using an older
schema descriptor each time we go back.

We utilize the existing machinery around transaction retries to make
this work. Between retries, we keep track of a `MaxStalenessBound`, so
that on each retry we always read at a lower bound.

There are a number of changes around retries needed to make this work:
* We preserve AOST between transaction retries. This is needed to
  prevent a case where the timestamp bound keeps progressing.
* with_min_timestamp / with_max_staleness now always return the context
  AS OF SYSTEM TIME. This is required as checks elsewhere require this
  to be equal to the evaluated time, even on txn retries.
* The schema descriptor code has to know the `MaxStalenessBound`. As
  such, we store this in the `Collection` object which is persisted
  inside `extraTxnState` and is preserved between retries.

Release note (sql change): Bounded staleness reads now retry
transactions when nearest_only=True and a schema change is detected
which may prevent a follower read from being served.
@otan otan force-pushed the schema-unavailability-problem branch from 7a8e6bc to f5bdee0 Compare August 18, 2021 06:12

otan commented Aug 18, 2021

cheers!

bors r=nvanbenschoten,ajwerner,rytaft

@craig

craig bot commented Aug 18, 2021

Build succeeded:

@craig craig bot merged commit b16c86d into cockroachdb:master Aug 18, 2021
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Aug 27, 2021
Instead of panicking. Discussed in cockroachdb#68969.

Release justification: change to new functionality
craig bot pushed a commit that referenced this pull request Aug 28, 2021
69046: kv: return error from checkNegotiateAndSendPreconditions r=nvanbenschoten a=nvanbenschoten

Instead of panicking. Discussed in #68969.

Release justification: change to new functionality

69273: sql: crdb_internal.reset_sql_stats() now resets persisted SQL Stats  r=maryliag a=Azhng

Depends on #68401 and #68434

Previously, crdb_internal.reset_sql_stats() builtin only resets
cluster-wide in-memory sql stats.
This patch updated the builtin to be able to reset persisted
sql stats as well.

Release justification: category 4

Release note (sql change): crdb_internal.reset_sql_stats() now resets
 persisted SQL Stats.



Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Azhng <archer.xn@gmail.com>