sql/kv: bounded staleness reads #67562

Open
12 of 14 tasks
nvanbenschoten opened this issue Jul 13, 2021 · 1 comment
Labels
A-kv-transactions: Relating to MVCC and the transactional model.
A-multiregion: Related to multi-region
A-sql-execution: Relating to SQL execution.
A-sql-optimizer: SQL logical planning and optimizations.
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-multiregion
T-sql-queries: SQL Queries Team

Comments

nvanbenschoten added the C-enhancement, A-sql-optimizer, A-kv-transactions, A-sql-execution, and A-multiregion labels Jul 13, 2021
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jul 19, 2021
Closes cockroachdb#67549.
Touches cockroachdb#67562.

This commit introduces a new QueryResolvedTimestampRequest type, which is the
first step towards implementing bounded staleness reads. This new request type
requests the resolved timestamp of the key span it is issued over.

The resolved timestamp of a key span is defined as the minimum of all closed
timestamps across the key span (there can be multiple if the key span touches
multiple ranges) along with the timestamp immediately preceding each intent
in the key span. Because the closed timestamp increases monotonically on all
ranges and also blocks the creation of new intents at timestamps below it,
the resolved timestamp over a given key span also increases monotonically.

However, within a given range, the closed timestamp and the set of intents
are both properties of the specific replica consulted. This means that two
replicas in the same range may report different resolved timestamps at the
same point in time, depending on how far they have each caught up on their
range's Raft log. As a result, the resolved timestamp is only guaranteed to
increase monotonically if the same replica or set of replicas are consulted
each time.

The expectation is that a CONSISTENT read at or below a key span's resolved
timestamp will never block on replication or on conflicting transactions. For
this to be guaranteed, the read must be issued to the same replica or set of
replicas (for multi-range reads) that were consulted when computing the key
span's resolved timestamp.

The resolved timestamp of a key span is a sibling concept to the resolved
timestamp of a rangefeed, which is defined in:
  pkg/kv/kvserver/rangefeed/resolved_timestamp.go
Whereas the resolved timestamp of a rangefeed refers to a timestamp below
which no future updates will be published on the rangefeed, the resolved
timestamp of a key span refers to a timestamp below which no future state
modifications that could change the result of read requests will be made.
Both concepts rely on some notion of immutability, but the former imparts
this property on a stream of events while the latter imparts this property
on materialized state.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jul 23, 2021
Closes cockroachdb#67549.
Touches cockroachdb#67562.

This commit introduces a new QueryResolvedTimestampRequest type, which is the
first step towards implementing bounded staleness reads. This new request type
requests a resolved timestamp for the key span it is issued over.

A resolved timestamp for a key span is a timestamp at or below which all
future reads within the span are guaranteed to produce the same results, i.e.
at which MVCC history has become immutable. The most up-to-date such bound
can be computed for a key span contained in a single range by taking the
minimum of the leaseholder's closed timestamp and the timestamp preceding the
earliest intent present on the range that overlaps with the key span of
interest. This optimum timestamp is nondecreasing over time, since the closed
timestamp will not regress and since it also prevents intents at lower
timestamps from being created. Follower replicas can also provide a resolved
timestamp, though it may not be the most recent one due to replication delay.
However, a given follower replica will similarly produce a nondecreasing
sequence of resolved timestamps.

QueryResolvedTimestampRequest returns a resolved timestamp for the input key
span by returning the minimum of all replicas contacted in order to cover the
key span. This means that repeated invocations of this operation will be
guaranteed nondecreasing only if routed to the same replicas.
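
The computation described above can be sketched in Go. This is a minimal illustration under stated assumptions, not the code from the commit: `Timestamp`, `replicaState`, `resolvedTimestamp`, and `spanResolvedTimestamp` are hypothetical names, and timestamps are reduced to plain integers so that "the timestamp preceding an intent" is simply the intent timestamp minus one.

```go
package main

import "fmt"

// Timestamp is a simplified stand-in for an MVCC timestamp (hypothetical;
// it ignores any logical component a real timestamp may carry).
type Timestamp int64

// replicaState is a hypothetical view of what one consulted replica knows
// about the part of the key span that it serves.
type replicaState struct {
	closedTS  Timestamp   // closed timestamp known to this replica
	intentTSs []Timestamp // timestamps of intents overlapping the key span
}

// resolvedTimestamp returns the minimum of the replica's closed timestamp and
// the timestamp immediately preceding its earliest overlapping intent.
func (r replicaState) resolvedTimestamp() Timestamp {
	resolved := r.closedTS
	for _, ts := range r.intentTSs {
		if prev := ts - 1; prev < resolved {
			resolved = prev
		}
	}
	return resolved
}

// spanResolvedTimestamp combines the per-replica results for a key span that
// touches several ranges by taking their minimum. It returns false if no
// replicas were consulted.
func spanResolvedTimestamp(replicas []replicaState) (Timestamp, bool) {
	if len(replicas) == 0 {
		return 0, false
	}
	lowest := replicas[0].resolvedTimestamp()
	for _, r := range replicas[1:] {
		if ts := r.resolvedTimestamp(); ts < lowest {
			lowest = ts
		}
	}
	return lowest, true
}

func main() {
	replicas := []replicaState{
		{closedTS: 100, intentTSs: []Timestamp{120, 95}}, // held back to 94 by the intent at 95
		{closedTS: 90}, // no intents, so the closed timestamp is the bound
	}
	ts, ok := spanResolvedTimestamp(replicas)
	fmt.Println(ts, ok) // prints "90 true"
}
```

Because each input is a property of the specific replica consulted, calling `spanResolvedTimestamp` with a different set of replicas may yield a lower value than an earlier call, which is why the result is only guaranteed nondecreasing when the same replicas are used.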

A CONSISTENT read at or below a key span's resolved timestamp will never
block on replication or on conflicting transactions. However, as can be
inferred from the previous paragraph, for this to be guaranteed, the read
must be issued to the same replica or set of replicas (for multi-range reads)
that were consulted when computing the key span's resolved timestamp.

A resolved timestamp for a key span is a sibling concept to a resolved timestamp
for a rangefeed, which is defined in:
  pkg/kv/kvserver/rangefeed/resolved_timestamp.go
Whereas a resolved timestamp for a rangefeed refers to a timestamp below
which no future updates will be published on the rangefeed, a resolved
timestamp for a key span refers to a timestamp below which no future state
modifications that could change the result of read requests will be made.
Both concepts rely on some notion of immutability, but the former imparts
this property on a stream of events while the latter imparts this property
on materialized state.

This commit does not begin using the new QueryResolvedTimestampRequest. Its
use will begin in a follow-up commit that implements the "Server-side
negotiation fast-path". See the bounded staleness RFC for details.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jul 26, 2021
Closes cockroachdb#67549.
Touches cockroachdb#67562.

This commit introduces a new QueryResolvedTimestampRequest type, which is the
first step towards implementing bounded staleness reads. This new request type
requests a resolved timestamp for the key span it is issued over.

A resolved timestamp for a key span is a timestamp at or below which all
future reads within the span are guaranteed to produce the same results, i.e.
at which MVCC history has become immutable. The most up-to-date such bound
can be computed for a key span contained in a single range by taking the
minimum of the leaseholder's closed timestamp and the timestamp preceding the
earliest intent present on the range that overlaps with the key span of
interest. This optimum timestamp is nondecreasing over time, since the closed
timestamp will not regress and since it also prevents intents at lower
timestamps from being created. Follower replicas can also provide a resolved
timestamp, though it may not be the most recent one due to replication delay.
However, a given follower replica will similarly produce a nondecreasing
sequence of resolved timestamps.

QueryResolvedTimestampRequest returns a resolved timestamp for the input key
span by returning the minimum of all replicas contacted in order to cover the
key span. This means that repeated invocations of this operation will be
guaranteed nondecreasing only if routed to the same replicas.

A CONSISTENT read at or below a key span's resolved timestamp will never
block on replication or on conflicting transactions. However, as can be
inferred from the previous paragraph, for this to be guaranteed, the read
must be issued to the same replica or set of replicas (for multi-range reads)
that were consulted when computing the key span's resolved timestamp.

A resolved timestamp for a key span is a sibling concept to a resolved timestamp
for a rangefeed, which is defined in:
  pkg/kv/kvserver/rangefeed/resolved_timestamp.go
Whereas a resolved timestamp for a rangefeed refers to a timestamp below
which no future updates will be published on the rangefeed, a resolved
timestamp for a key span refers to a timestamp below which no future state
modifications that could change the result of read requests will be made.
Both concepts rely on some notion of immutability, but the former imparts
this property on a stream of events while the latter imparts this property
on materialized state.

This commit does not begin using the new QueryResolvedTimestampRequest. Its
use will begin in a follow-up commit that implements the "Server-side
negotiation fast-path". See the bounded staleness RFC for details.
craig bot pushed a commit that referenced this issue Jul 26, 2021
66782: jobs: add support for SHOW CREATE SCHEDULE command r=annezhu98 a=annezhu98

Before this change, there was no command to show SQL statements used to create scheduled jobs. This commit allows users to view create statements for scheduled jobs. There are currently two options for this command: 

- `SHOW CREATE SCHEDULE <schedule_id>`: show the create statement for a scheduled job.
- `SHOW CREATE ALL SCHEDULES`: show the create statements for all scheduled jobs.

Example usage:

```
 > SHOW CREATE SCHEDULE 123;
      schedule_id     |                                                   create_statement
 ---------------------+------------------------------------------------------------------------------------------------------------------------
        123           | CREATE SCHEDULE 'core_schedule_label' FOR BACKUP DATABASE defaultdb INTO 'nodelocal://1/my_backup' RECURRING '@daily'
 (1 row)
```

Example execution results:

![image](https://user-images.githubusercontent.com/29808757/126368662-7fb6a5d8-a7c1-4906-be14-b4ab4679c689.png)
![image (1)](https://user-images.githubusercontent.com/29808757/126368672-f8bd1ee7-107b-4821-9f7f-2874a9ae0e55.png)
![image (2)](https://user-images.githubusercontent.com/29808757/126368676-7b0dc939-d9a4-4f41-a67e-7de2a2d7ea86.png)
![image](https://user-images.githubusercontent.com/29808757/126369040-df35ebaf-19c2-4f92-aff6-e5fd8f276986.png)



Resolves: #58372

Release note (sql change): added SHOW CREATE SCHEDULES command to view SQL statements used to create existing schedules

67725: kv: introduce QueryResolvedTimestamp request r=tbg,irfansharif a=nvanbenschoten

Closes #67549.
Touches #67562.

This commit introduces a new `QueryResolvedTimestampRequest` type, which is the first step towards implementing bounded staleness reads. This new request type requests a resolved timestamp for the key span it is issued over.

A resolved timestamp for a key span is a timestamp at or below which all future reads within the span are guaranteed to produce the same results, i.e. at which MVCC history has become immutable. The most up-to-date such bound can be computed for a key span contained in a single range by taking the minimum of the leaseholder's closed timestamp and the timestamp preceding the earliest intent present on the range that overlaps with the key span of interest. This optimum timestamp is nondecreasing over time, since the closed timestamp will not regress and since it also prevents intents at lower timestamps from being created. Follower replicas can also provide a resolved timestamp, though it may not be the most recent one due to replication delay. However, a given follower replica will similarly produce a nondecreasing sequence of resolved timestamps.

QueryResolvedTimestampRequest returns a resolved timestamp for the input key span by returning the minimum of all replicas contacted in order to cover the key span. This means that repeated invocations of this operation will be guaranteed nondecreasing only if routed to the same replicas.

A CONSISTENT read at or below a key span's resolved timestamp will never block on replication or on conflicting transactions. However, as can be inferred from the previous paragraph, for this to be guaranteed, the read must be issued to the same replica or set of replicas (for multi-range reads) that were consulted when computing the key span's resolved timestamp.

A resolved timestamp for a key span is a sibling concept to a resolved timestamp for a rangefeed, which is defined in `pkg/kv/kvserver/rangefeed/resolved_timestamp.go`. Whereas a resolved timestamp for a rangefeed refers to a timestamp below which no future updates will be published on the rangefeed, a resolved timestamp for a key span refers to a timestamp below which no future state modifications that could change the result of read requests will be made. Both concepts rely on some notion of immutability, but the former imparts this property on a stream of events while the latter imparts this property on materialized state.

This commit does not begin using the new QueryResolvedTimestampRequest. Its use will begin in a follow-up commit that implements the "Server-side negotiation fast-path". See the bounded staleness RFC for details.

68041: tree: OPERATOR pretty printing changes r=rafiss a=otan

See individual commits for details.

Resolves #68035.

Co-authored-by: Anne Zhu <anne.zhu@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jul 28, 2021
Touches cockroachdb#67562.

This commit introduces a new RoutingPolicy configuration that lives on a
BatchRequest header. A request's routing policy specifies how the
request should be routed to the replicas of its target range(s) by the
DistSender. There are initially two routing policies:
```
enum RoutingPolicy {
  // LEASEHOLDER means that the DistSender should route the request to the
  // leaseholder replica(s) of its target range(s).
  LEASEHOLDER = 0;
  // NEAREST means that the DistSender should route the request to the
  // nearest replica(s) of its target range(s).
  NEAREST = 1;
}
```

The default policy is `LEASEHOLDER`.

Routing policies allow us to stop overloading the use of the
ReadConsistency enum to dictate both how the client should route a
request to a server and which kinds of requests should be eligible to be
served by a given replica. Routing policies are a client-side only
concept. They do not dictate which replicas in a range are eligible to
serve the request, only which replicas are considered as targets by the
DistSender, and in which order. A request that is routed to an
ineligible replica (a function of request type, timestamp, and read
consistency) will be rejected by that replica and the DistSender will
target another replica in the range.

As discussed in cockroachdb#67725 (review),
we will likely need to introduce a third routing policy called
`SINGLE_REPLICA` to address cockroachdb#67554. This policy would be accompanied by
a ReplicaDescriptor and would specify that a given request must be sent
to that replica and DistSender should throw an error if the replica is
not part of its cached range descriptor. This is important to ensure
that a QueryResolvedTimestampRequest and its follow-up ScanRequest are
both sent to the same replica.
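
To make the split between client-side routing and replica-side eligibility concrete, here is a rough Go sketch. It is an illustration only and not the DistSender implementation: the `replica` struct, its `latencyMS` field, and the `orderTargets` and `send` helpers are hypothetical, and the exact ordering used for non-leaseholder replicas under `LEASEHOLDER` is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// RoutingPolicy mirrors the two enum values from the commit message.
type RoutingPolicy int

const (
	Leaseholder RoutingPolicy = iota // LEASEHOLDER = 0, the default
	Nearest                          // NEAREST = 1
)

// replica is a hypothetical, simplified replica descriptor.
type replica struct {
	id          int
	latencyMS   int // assumed client-side latency estimate
	leaseholder bool
}

// orderTargets returns the replicas in the order a client would try them.
// The policy only affects ordering; it never removes a replica.
func orderTargets(policy RoutingPolicy, replicas []replica) []replica {
	out := append([]replica(nil), replicas...)
	sort.SliceStable(out, func(i, j int) bool {
		if policy == Leaseholder && out[i].leaseholder != out[j].leaseholder {
			return out[i].leaseholder
		}
		return out[i].latencyMS < out[j].latencyMS
	})
	return out
}

// send tries each target in order and moves on whenever a replica rejects
// the request as ineligible. It returns the ID of the serving replica.
func send(targets []replica, eligible func(replica) bool) (int, error) {
	for _, r := range targets {
		if eligible(r) {
			return r.id, nil
		}
	}
	return 0, errors.New("no eligible replica")
}

func main() {
	replicas := []replica{
		{id: 1, latencyMS: 40, leaseholder: true},
		{id: 2, latencyMS: 5},
		{id: 3, latencyMS: 20},
	}
	// A request that only the leaseholder may serve is still routed with
	// NEAREST: the two followers reject it and the client falls through.
	onlyLeaseholder := func(r replica) bool { return r.leaseholder }
	served, err := send(orderTargets(Nearest, replicas), onlyLeaseholder)
	fmt.Println(served, err) // prints "1 <nil>"
}
```

The `eligible` callback stands in for the server-side decision, which the commit message describes as a function of request type, timestamp, and read consistency.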
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jul 28, 2021
Touches cockroachdb#67562.

This commit adds support for a subset of non-transactional batch
requests to perform follower reads. Specifically, it makes those that do
not rely on their timestamp being set from the server's clock eligible.

This condition is necessary because if a follower with a lagging clock
sets its timestamp and this then allows the follower to evaluate the
batch as a follower read, then the batch might miss past writes served
at higher timestamps on the leaseholder.
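
The eligibility condition can be sketched as follows. This is a simplified illustration, not the check added by the commit: the `batch` struct, its fields, and `canServeOnFollower` are hypothetical, and the comparison against a closed timestamp is an assumption about how a follower decides that a read is safe to serve.

```go
package main

import "fmt"

// batch is a hypothetical, simplified non-transactional read batch.
type batch struct {
	// timestampSetByClient is false when the client left the timestamp empty
	// and expects the serving replica to assign one from its own clock.
	timestampSetByClient bool
	timestamp            int64
}

// canServeOnFollower reports whether a follower whose closed timestamp is
// closedTS may evaluate the batch as a follower read.
func canServeOnFollower(b batch, closedTS int64) bool {
	if !b.timestampSetByClient {
		// A lagging follower clock could pick a timestamp below writes that
		// the leaseholder has already served, so such batches stay ineligible.
		return false
	}
	return b.timestamp <= closedTS
}

func main() {
	fmt.Println(canServeOnFollower(batch{timestampSetByClient: true, timestamp: 90}, 100)) // prints "true"
	fmt.Println(canServeOnFollower(batch{}, 100))                                          // prints "false"
}
```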
otan pushed a commit to otan-cockroach/cockroach that referenced this issue Jul 29, 2021
Fixes cockroachdb#67551.
Fixes cockroachdb#67552.
Fixes cockroachdb#67553.
Touches cockroachdb#67562.

Bounded-staleness read orchestration consists of two phases -
negotiation and execution. Negotiation determines the timestamp to run
the query at in order to ensure that the read will not block on
replication or on conflicting transactions. Execution then uses this
timestamp to run the read request.

This commit implements the bounded staleness server-side negotiation
fast-path. This fast-path allows a bounded staleness read request that
lands on a single range to perform its negotiation phase and execution
phase in a single RPC.

The server-side negotiation fast-path provides two benefits:
1. it avoids two network hops in the common case where a bounded
   staleness read is targeting a single range. This is an important
   performance optimization for single-row point lookups.
2. it provides stronger guarantees around minimizing staleness during
   bounded staleness reads. Bounded staleness reads that hit the
   server-side fast-path use their target replica's most up-to-date
   resolved timestamp, so they are as fresh as possible. Bounded
   staleness reads that miss the fast-path and perform explicit
   negotiation (see below) consult a cache, so they may use an
   out-of-date, suboptimal resolved timestamp, as long as it is fresh
   enough to satisfy the staleness bound of the request.

The commit then uses this new functionality to implement the
`(*Txn).NegotiateAndSend` method detailed in the bounded staleness RFC.
`NegotiateAndSend` is a specialized version of `Send` that is capable of
orchestrating a bounded-staleness read through a transaction, given a
read-only BatchRequest with a `min_timestamp_bound` set in its Header.
If the method returns successfully, the transaction will have been given
a fixed timestamp equal to the timestamp that the read-only request was
evaluated at.
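
The two orchestration paths can be sketched in Go as below. This is a toy model under stated assumptions and not the `(*Txn).NegotiateAndSend` implementation: `rangeInfo`, its fields, the lowercase `negotiateAndSend` function, and the error returned when the bound cannot be met are all hypothetical, and the actual read is omitted so that only the timestamp selection is shown.

```go
package main

import (
	"errors"
	"fmt"
)

// rangeInfo is a hypothetical view of one range touched by the read.
type rangeInfo struct {
	resolvedTS int64 // the target replica's current resolved timestamp
	cachedTS   int64 // a possibly stale resolved timestamp held in a client cache
}

// errBound is an assumed error for a bound that cannot be satisfied.
var errBound = errors.New("min timestamp bound cannot be satisfied")

// negotiateAndSend picks the timestamp a bounded staleness read would run at
// and returns it, mimicking the fixed timestamp given to the transaction.
func negotiateAndSend(ranges []rangeInfo, minTimestampBound int64) (int64, error) {
	if len(ranges) == 0 {
		return 0, errors.New("no ranges to read from")
	}
	var ts int64
	if len(ranges) == 1 {
		// Fast path: negotiation and execution share one RPC, so the replica
		// can use its most up-to-date resolved timestamp.
		ts = ranges[0].resolvedTS
	} else {
		// Explicit negotiation: take the minimum of the cached values, which
		// may be older than what the replicas could currently offer.
		ts = ranges[0].cachedTS
		for _, r := range ranges[1:] {
			if r.cachedTS < ts {
				ts = r.cachedTS
			}
		}
	}
	if ts < minTimestampBound {
		return 0, errBound
	}
	return ts, nil
}

func main() {
	// Single range: the fast path uses the fresh resolved timestamp.
	fmt.Println(negotiateAndSend([]rangeInfo{{resolvedTS: 100, cachedTS: 80}}, 70)) // prints "100 <nil>"
	// Two ranges: explicit negotiation uses the minimum cached value.
	fmt.Println(negotiateAndSend([]rangeInfo{{100, 80}, {95, 75}}, 70)) // prints "75 <nil>"
}
```

In the second call the read runs at 75 even though both replicas could serve 95, which matches the point above that explicit negotiation may settle for a suboptimal timestamp as long as it satisfies the staleness bound.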
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Aug 2, 2021
Touches cockroachdb#67562.

This commit adds support for a subset of non-transactional batch
requests to perform follower reads. Specifically, it makes those that do
not rely on their timestamp being set from the server's clock eligible.

This condition is necessary because if a follower with a lagging clock
sets its timestamp and this then allows the follower to evaluate the
batch as a follower read, then the batch might miss past writes served
at higher timestamps on the leaseholder.
craig bot pushed a commit that referenced this issue Aug 2, 2021
68192: kv: permit some non-transactional batches to perform follower reads r=nvanbenschoten a=nvanbenschoten

Touches #67562.

This commit adds support for a subset of non-transactional batch requests to perform follower reads. Specifically, it makes those that do not rely on their timestamp being set from the server's clock eligible.

This condition is necessary because if a follower with a lagging clock sets its timestamp and this then allows the follower to evaluate the batch as a follower read, then the batch might miss past writes served at higher timestamps on the leaseholder.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Aug 2, 2021
Touches cockroachdb#67562.

This commit introduces a new RoutingPolicy configuration that lives on a
BatchRequest header. A request's routing policy specifies how the
request should be routed to the replicas of its target range(s) by the
DistSender. There are initially two routing policies:
```
enum RoutingPolicy {
  // LEASEHOLDER means that the DistSender should route the request to the
  // leaseholder replica(s) of its target range(s).
  LEASEHOLDER = 0;
  // NEAREST means that the DistSender should route the request to the
  // nearest replica(s) of its target range(s).
  NEAREST = 1;
}
```

The default policy is `LEASEHOLDER`.

Routing policies allow us to stop overloading the use of the
ReadConsistency enum to dictate both how the client should route a
request to a server and which kinds of requests should be eligible to be
served by a given replica. Routing policies are a client-side only
concept. They do not dictate which replicas in a range are eligible to
serve the request, only which replicas are considered as targets by the
DistSender, and in which order. A request that is routed to an
ineligible replica (a function of request type, timestamp, and read
consistency) will be rejected by that replica and the DistSender will
target another replica in the range.

As discussed in cockroachdb#67725 (review),
we will likely need to introduce a third routing policy called
`SINGLE_REPLICA` to address cockroachdb#67554. This policy would be accompanied by
a ReplicaDescriptor and would specify that a given request must be sent
to that replica and DistSender should throw an error if the replica is
not part of its cached range descriptor. This is important to ensure
that a QueryResolvedTimestampRequest and its follow-up ScanRequest are
both sent to the same replica.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Aug 2, 2021
Fixes cockroachdb#67551.
Fixes cockroachdb#67552.
Fixes cockroachdb#67553.
Touches cockroachdb#67562.

Bounded-staleness read orchestration consists of two phases -
negotiation and execution. Negotiation determines the timestamp to run
the query at in order to ensure that the read will not block on
replication or on conflicting transactions. Execution then uses this
timestamp to run the read request.

This commit implements the bounded staleness server-side negotiation
fast-path. This fast-path allows a bounded staleness read request that
lands on a single range to perform its negotiation phase and execution
phase in a single RPC.

The server-side negotiation fast-path provides two benefits:
1. it avoids two network hops in the common case where a bounded
   staleness read is targeting a single range. This is an important
   performance optimization for single-row point lookups.
2. it provides stronger guarantees around minimizing staleness during
   bounded staleness reads. Bounded staleness reads that hit the
   server-side fast-path use their target replica's most up-to-date
   resolved timestamp, so they are as fresh as possible. Bounded
   staleness reads that miss the fast-path and perform explicit
   negotiation (see below) consult a cache, so they may use an
   out-of-date, suboptimal resolved timestamp, as long as it is fresh
   enough to satisfy the staleness bound of the request.

The commit then uses this new functionality to implement the
`(*Txn).NegotiateAndSend` method detailed in the bounded staleness RFC.
`NegotiateAndSend` is a specialized version of `Send` that is capable of
orchestrating a bounded-staleness read through a transaction, given a
read-only BatchRequest with a `min_timestamp_bound` set in its Header.
If the method returns successfully, the transaction will have been given
a fixed timestamp equal to the timestamp that the read-only request was
evaluated at.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Aug 5, 2021
Touches cockroachdb#67562.

This commit introduces a new RoutingPolicy configuration that lives on a
BatchRequest header. A request's routing policy specifies how the
request should be routed to the replicas of its target range(s) by the
DistSender. There are initially two routing policies:
```
enum RoutingPolicy {
  // LEASEHOLDER means that the DistSender should route the request to the
  // leaseholder replica(s) of its target range(s).
  LEASEHOLDER = 0;
  // NEAREST means that the DistSender should route the request to the
  // nearest replica(s) of its target range(s).
  NEAREST = 1;
}
```

The default policy is `LEASEHOLDER`.

Routing policies allow us to stop overloading the use of the
ReadConsistency enum to dictate both how the client should route a
request to a server and which kinds of requests should be eligible to be
served by a given replica. Routing policies are a client-side only
concept. They do not dictate which replicas in a range are eligible to
serve the request, only which replicas are considered as targets by the
DistSender, and in which order. A request that is routed to an
ineligible replica (a function of request type, timestamp, and read
consistency) will be rejected by that replica and the DistSender will
target another replica in the range.

As discussed in cockroachdb#67725 (review),
we will likely need to introduce a third routing policy called
`SINGLE_REPLICA` to address cockroachdb#67554. This policy would be accompanied by
a ReplicaDescriptor and would specify that a given request must be sent
to that replica and DistSender should throw an error if the replica is
not part of its cached range descriptor. This is important to ensure
that a QueryResolvedTimestampRequest and its follow-up ScanRequest are
both sent to the same replica.
craig bot pushed a commit that referenced this issue Aug 5, 2021
68191: kv: introduce request RoutingPolicy configuration r=nvanbenschoten a=nvanbenschoten

Half of #67551.
Touches #67562.

This commit introduces a new RoutingPolicy configuration that lives on a BatchRequest header. A request's routing policy specifies how the request should be routed to the replicas of its target range(s) by the DistSender. There are initially two routing policies:
```
enum RoutingPolicy {
  // LEASEHOLDER means that the DistSender should route the request to the
  // leaseholder replica(s) of its target range(s).
  LEASEHOLDER = 0;
  // NEAREST means that the DistSender should route the request to the
  // nearest replica(s) of its target range(s).
  NEAREST = 1;
}
```

The default policy is `LEASEHOLDER`.

Routing policies allow us to stop overloading the use of the ReadConsistency enum to dictate both how the client should route a request to a server and which kinds of requests should be eligible to be served by a given replica. Routing policies are a client-side only concept. They do not dictate which replicas in a range are eligible to serve the request, only which replicas are considered as targets by the DistSender, and in which order. A request that is routed to an ineligible replica (a function of request type, timestamp, and read consistency) will be rejected by that replica and the DistSender will target another replica in the range.

As discussed in #67725 (review), we will likely need to introduce a third routing policy called `SINGLE_REPLICA` to address #67554. This policy would be accompanied by a ReplicaDescriptor and would specify that a given request must be sent to that replica and DistSender should throw an error if the replica is not part of its cached range descriptor. This is important to ensure that a QueryResolvedTimestampRequest and its follow-up ScanRequest are both sent to the same replica.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
sajjadrizvi pushed a commit to sajjadrizvi/cockroach that referenced this issue Aug 10, 2021
Touches cockroachdb#67562.

This commit introduces a new RoutingPolicy configuration that lives on a
BatchRequest header. A request's routing policy specifies how the
request should be routed to the replicas of its target range(s) by the
DistSender. There are initially two routing policies:
```
enum RoutingPolicy {
  // LEASEHOLDER means that the DistSender should route the request to the
  // leaseholder replica(s) of its target range(s).
  LEASEHOLDER = 0;
  // NEAREST means that the DistSender should route the request to the
  // nearest replica(s) of its target range(s).
  NEAREST = 1;
}
```

The default policy is `LEASEHOLDER`.

Routing policies allow us to stop overloading the use of the
ReadConsistency enum to dictate both how the client should route a
request to a server and which kinds of requests should be eligible to be
served by a given replica. Routing policies are a client-side only
concept. They do not dictate which replicas in a range are eligible to
serve the request, only which replicas are considered as targets by the
DistSender, and in which order. A request that is routed to an
ineligible replica (a function of request type, timestamp, and read
consistency) will be rejected by that replica and the DistSender will
target another replica in the range.

As discussed in cockroachdb#67725 (review),
we will likely need to introduce a third routing policy called
`SINGLE_REPLICA` to address cockroachdb#67554. This policy would be accompanied by
a ReplicaDescriptor and would specify that a given request must be sent
to that replica and DistSender should throw an error if the replica is
not part of its cached range descriptor. This is important to ensure
that a QueryResolvedTimestampRequest and its follow-up ScanRequest are
both sent to the same replica.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Aug 10, 2021
Fixes cockroachdb#67551.
Fixes cockroachdb#67552.
Fixes cockroachdb#67553.
Touches cockroachdb#67562.

Bounded-staleness read orchestration consists of two phases:
negotiation and execution. Negotiation determines the timestamp to run
the query at in order to ensure that the read will not block on
replication or on conflicting transactions. Execution then uses this
timestamp to run the read request.

This commit implements the bounded staleness server-side negotiation
fast-path. This fast-path allows a bounded staleness read request that
lands on a single range to perform its negotiation phase and execution
phase in a single RPC.

The server-side negotiation fast-path provides two benefits:
1. it avoids two network hops in the common case where a bounded
   staleness read is targeting a single range. This is an important
   performance optimization for single-row point lookups.
2. it provides stronger guarantees around minimizing staleness during
   bounded staleness reads. Bounded staleness reads that hit the
   server-side fast-path use their target replica's most up-to-date
   resolved timestamp, so they are as fresh as possible. Bounded
   staleness reads that miss the fast-path and perform explicit
   negotiation (see below) consult a cache, so they may use an
   out-of-date, suboptimal resolved timestamp, as long as it is fresh
   enough to satisfy the staleness bound of the request.

The commit then uses this new functionality to implement the
`(*Txn).NegotiateAndSend` method detailed in the bounded staleness RFC.
`NegotiateAndSend` is a specialized version of `Send` that is capable of
orchestrating a bounded-staleness read through a transaction, given a
read-only BatchRequest with a `min_timestamp_bound` set in its Header.
If the method returns successfully, the transaction will have been given
a fixed timestamp equal to the timestamp that the read-only request was
evaluated at.
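
The two phases can be sketched in Go as follows. This is an illustrative model, not the code in this commit: timestamps are plain integers, `rangeState`, `resolvedTimestamp`, and `negotiateAndRead` are hypothetical names, and the error returned when the bound cannot be met is a generic placeholder. The resolved timestamp follows the definition given earlier in this issue: the minimum of the closed timestamps across the key span and the timestamp immediately preceding each intent.

```go
package main

import "errors"

// timestamp is a simplified integer stand-in for an MVCC timestamp.
type timestamp int64

// rangeState is a hypothetical summary of one range in the key span.
type rangeState struct {
	closedTS  timestamp
	intentTSs []timestamp // timestamps of intents in the key span
}

// resolvedTimestamp is the negotiation phase: the minimum of all closed
// timestamps and of the timestamp just below each intent.
func resolvedTimestamp(ranges []rangeState) (timestamp, error) {
	if len(ranges) == 0 {
		return 0, errors.New("key span covers no ranges")
	}
	resolved := ranges[0].closedTS
	for _, r := range ranges {
		if r.closedTS < resolved {
			resolved = r.closedTS
		}
		for _, its := range r.intentTSs {
			if prev := its - 1; prev < resolved {
				resolved = prev
			}
		}
	}
	return resolved, nil
}

// negotiateAndRead negotiates a timestamp, checks it against the minimum
// timestamp bound, and then runs the execution phase at that timestamp.
// It returns the read result and the timestamp the read was evaluated at.
func negotiateAndRead(ranges []rangeState, minBound timestamp, read func(timestamp) string) (string, timestamp, error) {
	ts, err := resolvedTimestamp(ranges)
	if err != nil {
		return "", 0, err
	}
	if ts < minBound {
		return "", 0, errors.New("resolved timestamp is below the minimum timestamp bound")
	}
	return read(ts), ts, nil
}
```

In this model the server-side fast-path corresponds to calling `negotiateAndRead` with a single `rangeState` on the replica itself, which is why both phases fit in one RPC.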
craig bot pushed a commit that referenced this issue Aug 17, 2021
68194: kv: implement server-side negotiation fast-path and Txn.NegotiateAndSend r=nvanbenschoten a=nvanbenschoten

Fixes #67551.
Fixes #67552.
Fixes #67553.
Touches #67562.

Bounded-staleness read orchestration consists of two phases: negotiation and execution. Negotiation determines the timestamp to run the query at in order to ensure that the read will not block on replication or on conflicting transactions. Execution then uses this timestamp to run the read request.

This commit implements the bounded staleness server-side negotiation fast-path. This fast-path allows a bounded staleness read request that lands on a single range to perform its negotiation phase and execution phase in a single RPC.

The server-side negotiation fast-path provides two benefits:
1. it avoids two network hops in the common case where a bounded staleness read is targeting a single range. This is an important performance optimization for single-row point lookups.
2. it provides stronger guarantees around minimizing staleness during bounded staleness reads. Bounded staleness reads that hit the server-side fast-path use their target replica's most up-to-date resolved timestamp, so they are as fresh as possible. Bounded staleness reads that miss the fast-path and perform explicit negotiation (see below) consult a cache, so they may use an out-of-date, suboptimal resolved timestamp, as long as it is fresh enough to satisfy the staleness bound of the request.

The commit then uses this new functionality to implement the `(*Txn).NegotiateAndSend` method detailed in the bounded staleness RFC. `NegotiateAndSend` is a specialized version of `Send` that is capable of orchestrating a bounded-staleness read through a transaction, given a read-only BatchRequest with a `min_timestamp_bound` set in its Header. If the method returns successfully, the transaction will have been given a fixed timestamp equal to the timestamp that the read-only request was evaluated at.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@otan otan removed their assignment Aug 24, 2021
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Oct 5, 2021
…ollower-reads` suite

Related to cockroachdb#67562.

This commit adds the following two roachtest variants:
- `follower-reads/survival=zone/locality=regional/reads=bounded-staleness/insufficient-quorum`
- `follower-reads/survival=region/locality=regional/reads=bounded-staleness/insufficient-quorum`

These two tests are similar to the other `follower-reads` variants in that they
perform follower reads across 3 different regions on a table in a multi-region
database. In this case, both variants perform bounded staleness reads on a
REGIONAL table using the `with_max_staleness` option. The new addition to this
test is that these variants kill the database's primary region and assert that
bounded staleness reads remain available from outside of that region. This
directly tests one of the major benefits of bounded staleness reads in a way
that we had not yet done in an end-to-end system test like this.

Release note: None
Release justification: testing only
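
For readers unfamiliar with the `with_max_staleness` option, the following Go sketch builds the kind of statement such a test could issue. It is an assumption-laden illustration, not the roachtest code: the table name `kv`, the columns `k` and `v`, and the helper `boundedStalenessQuery` are hypothetical.

```go
package main

import "fmt"

// boundedStalenessQuery returns a single-row point lookup that may be
// served from a nearby replica as long as the data is at most
// maxStalenessSeconds old. The table and column names are hypothetical.
func boundedStalenessQuery(maxStalenessSeconds int) string {
	return fmt.Sprintf(
		"SELECT v FROM kv AS OF SYSTEM TIME with_max_staleness('%ds') WHERE k = $1",
		maxStalenessSeconds,
	)
}
```

The lookup key is left as the `$1` placeholder so that it can be bound by the SQL driver rather than interpolated into the string.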
craig bot pushed a commit that referenced this issue Oct 5, 2021
70716: sql: fix timezone formatting for GMT offsets r=otan a=RichardJCai

Release note (sql change): If the time zone is set in a GMT offset,
for example +7 or -11, the timezone will be formatted as
<+07>-07 and <-11>+11 respectively instead of +7, -11.

This most notably shows up when doing SHOW TIME ZONE.
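
The formatting rule in this release note can be reproduced with a short Go sketch. This is not the code from the PR: `formatGMTOffset` is a hypothetical helper that handles whole-hour offsets only, and it is written solely to show how the two examples above are derived (the name in angle brackets keeps the sign of the offset, and the trailing part inverts it).

```go
package main

import "fmt"

// formatGMTOffset renders a whole-hour GMT offset in the form described
// in the release note, for example 7 becomes "<+07>-07".
func formatGMTOffset(hours int) string {
	return fmt.Sprintf("<%+03d>%+03d", hours, -hours)
}
```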

71122: roachtest: add bounded staleness `insufficient-quorum` variants to `follower-reads` suite r=nvanbenschoten a=nvanbenschoten

Related to #67562.

This commit adds the following two roachtest variants:
- `follower-reads/survival=zone/locality=regional/reads=bounded-staleness/insufficient-quorum`
- `follower-reads/survival=region/locality=regional/reads=bounded-staleness/insufficient-quorum`

These two tests are similar to the other `follower-reads` variants in that they
perform follower reads across 3 different regions on a table in a multi-region
database. In this case, both variants perform bounded staleness reads on a
REGIONAL table using the `with_max_staleness` option. The new addition to this
test is that these variants kill the database's primary region and assert that
bounded staleness reads remain available from outside of that region. This
directly tests one of the major benefits of bounded staleness reads in a way
that we had not yet done in an end-to-end system test like this.

71143: storage: remove unused SSTableInfo types r=nicktrav a=jbowens

Release note: None

Co-authored-by: richardjcai <caioftherichard@gmail.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Jackson Owens <jackson@cockroachlabs.com>
@github-actions

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@rytaft rytaft removed their assignment Aug 28, 2023
@yuzefovich yuzefovich added the T-sql-queries SQL Queries Team label May 2, 2024
Projects
Status: Backlog