
sql/opt: add implicit SELECT FOR UPDATE support for UPSERT statements #50180

Closed
nvanbenschoten opened this issue Jun 13, 2020 · 0 comments · Fixed by #53132
Assignees
Labels
A-sql-optimizer SQL logical planning and optimizations. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@nvanbenschoten (Member)

Address this TODO:

```go
// shouldApplyImplicitLockingToUpsertInput determines whether or not the
// builder should apply a FOR UPDATE row-level locking mode to the initial
// row scan of an UPSERT statement.
//
// TODO(nvanbenschoten): implement this method to match on appropriate Upsert
// expression trees and apply a row-level locking mode.
func (b *Builder) shouldApplyImplicitLockingToUpsertInput(ups *memo.UpsertExpr) bool {
	return false
}
```

Specifically, UPSERT statements can occasionally hit a fast-path where they perform a blind PutRequest. However, those that can't (e.g. because they write only a subset of columns or need to update a secondary index) must perform an initial row scan, just like UPDATE statements. We should apply a FOR UPDATE row-level locking mode to the initial row scan of an UPSERT statement, in accordance with the enable_implicit_select_for_update session variable.
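The decision described above can be sketched as a small Go program. This is a hypothetical, simplified illustration, not the actual optimizer code: the `upsertShape` struct and `shouldLockInitialScan` function stand in for the real `memo.UpsertExpr` matching.

```go
package main

import "fmt"

// upsertShape is a hypothetical, simplified stand-in for the planner's view
// of an UPSERT; it is not the actual memo.UpsertExpr from the issue.
type upsertShape struct {
	blindPut             bool // fast path: blind PutRequest, no initial row scan
	sessionImplicitSFUOn bool // enable_implicit_select_for_update
}

// shouldLockInitialScan mirrors the decision described above: only UPSERTs
// that cannot take the blind-put fast path perform an initial row scan, and
// only that scan can be upgraded to FOR UPDATE, subject to the session
// variable.
func shouldLockInitialScan(u upsertShape) bool {
	if u.blindPut {
		return false // there is no initial scan to lock
	}
	return u.sessionImplicitSFUOn
}

func main() {
	fmt.Println(shouldLockInitialScan(upsertShape{sessionImplicitSFUOn: true})) // true
}
```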

@nvanbenschoten nvanbenschoten added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-sql-optimizer SQL logical planning and optimizations. labels Jun 13, 2020
@nvanbenschoten nvanbenschoten self-assigned this Jun 13, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jun 25, 2020
…anges

Fixes cockroachdb#50202.

In cockroachdb#50202, we saw that we would accidentally allow batches to be split
on range boundaries and continue to carry the `CanForwardReadTimestamp`
flag. This can lead to serializability violations under very specific
conditions:
1. the first operation that the transaction performs is a batch of
   locking read requests (due to implicit or explicit SFU)
2. this batch of locking reads spans multiple ranges
3. this batch of locking reads is issued in parallel by the DistSender
4. this locking read hits contention and is bumped on at least one of
   the ranges due to a WriteTooOld error
5. an unreplicated lock from one of the non-refreshed sub-batches is
   lost during a lease transfer.

It turns out that the `kv/contention` roachtest meets these requirements
perfectly when implicit SFU support is added to UPSERT statements: cockroachdb#50180.
It creates a tremendous amount of contention and issues a batch
of locking ScanRequests during a LookupJoin as its first operation. This
materializes as ConditionFailedErrors (which should be impossible) in
the CPuts that the UPSERT issues to maintain the table's secondary
index.

This commit fixes this bug by ensuring that if a batch is going to be
split across ranges and any of its requests would need to refresh on
read timestamp bumps, it does not have its CanForwardReadTimestamp flag
set. It would be incorrect to allow part of a batch to perform a
server-side refresh if another part of the batch might have returned a
different result at the higher timestamp, which is a fancy way of saying
that it needs to refresh because it is using optimistic locking. Such
behavior could cause a transaction to observe an inconsistent snapshot
and violate serializability.

Release note (bug fix): fix a rare bug where a multi-Range SELECT FOR
UPDATE statement containing an IN clause could fail to observe a
consistent snapshot and violate serializability.
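The rule the commit describes can be sketched as follows. This is a hedged, simplified illustration; the real logic lives in the DistSender and operates on roachpb batch headers and request flags, none of which appear here.

```go
package main

import "fmt"

// req is a hypothetical, simplified request descriptor.
type req struct {
	needsRefresh bool // e.g. a locking scan relying on optimistic locking
}

// canForwardAfterSplit applies the rule from the commit message: a batch that
// is split across range boundaries may keep its CanForwardReadTimestamp flag
// only if none of its requests would need to refresh on a read-timestamp
// bump. Otherwise one sub-batch could server-side refresh while another
// sub-batch's results would have differed at the higher timestamp, breaking
// the transaction's consistent snapshot.
func canForwardAfterSplit(splitAcrossRanges bool, reqs []req) bool {
	if !splitAcrossRanges {
		return true
	}
	for _, r := range reqs {
		if r.needsRefresh {
			return false
		}
	}
	return true
}

func main() {
	lockingScans := []req{{needsRefresh: true}, {needsRefresh: true}}
	fmt.Println(canForwardAfterSplit(true, lockingScans)) // false
}
```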
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jun 29, 2020
craig bot pushed a commit that referenced this issue Jun 30, 2020
50633: kv: unset CanForwardReadTimestamp flag on batches that spans ranges r=nvanbenschoten a=nvanbenschoten

Fixes #50202.

In #50202, we saw that we would accidentally allow batches to be split on range boundaries and continue to carry the `CanForwardReadTimestamp` flag. This can lead to serializability violations under very specific conditions:
1. the first operation that the transaction performs is a batch of locking read requests (due to implicit or explicit SFU)
2. this batch of locking reads spans multiple ranges
3. this batch of locking reads is issued in parallel by the DistSender
4. this locking read hits contention and is bumped on at least one of the ranges due to a WriteTooOld error
5. an unreplicated lock from one of the non-refreshed sub-batches is lost during a lease transfer.

It turns out that the `kv/contention` roachtest meets these requirements perfectly when implicit SFU support is added to UPSERT statements: #50180. It creates a tremendous amount of contention and issues a batch of locking ScanRequests during a LookupJoin as its first operation. This materializes as ConditionFailedErrors (which should be impossible) in the CPuts that the UPSERT issues to maintain the table's secondary index.

This PR fixes this bug by ensuring that if a batch is going to be split across ranges and any of its requests would need to refresh on read timestamp bumps, it does not have its CanForwardReadTimestamp flag set. It would be incorrect to allow part of a batch to perform a server-side refresh if another part of the batch might have returned a different result at the higher timestamp, which is a fancy way of saying that it needs to refresh because it is using optimistic locking. Such behavior could cause a transaction to observe an inconsistent snapshot and violate serializability.

It then adds support for locking scans to kvnemesis, which would have caught the bug fairly easily.

Finally, it fixes a KV API UX issue around locking scans and retry errors. Before this change, it was possible for a non-transactional locking scan (which itself doesn't make much sense) to hit a WriteTooOld retry error. This was caused by eager propagation of WriteTooOld errors from MVCC when FailOnMoreRecent was enabled for an MVCCScan. I'd appreciate it if @itsbilal could give that last commit a review.

Release note (bug fix): fix a rare bug where a multi-Range SELECT FOR UPDATE statement containing an IN clause could fail to observe a consistent snapshot and violate serializability.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jun 30, 2020
craig bot pushed a commit that referenced this issue Aug 20, 2020
50053: Script for the PublishRelease TC build configuration r=pbardea a=jlinder

Before: the script wasn't implemented.

Now:

Part of the new release process, this script

- tags the selected SHA to the provided name
- compiles the binaries and archive and uploads them to S3 as the
  versioned name
- uploads the docker image to docker.io/cockroachdb/cockroach
- pushes the tag to github.com/cockroachdb/cockroach
- pushes all the artifacts to their respective `latest` locations as
  appropriate

Release note: None

51567: kvserver: Allow rebalances between stores on the same nodes. r=lunevalex a=lunevalex

Closes #6782

This change modifies the replica_queue to allow rebalances between multiple stores within a single node. 
This is possible because we previously introduced atomic rebalances in #12768. 

The first step was to remove the constraints in the allocator that prevented same-node rebalances and to update the validation in the replica_queue to accept these rebalance proposals. One caveat: with 1x replication an atomic rebalance is not possible, so under this condition we now support adding multiple replicas of the range to the same node.

With the constraints removed, nothing in the allocator would prevent it from placing multiple replicas of a range on the same node across multiple stores. This is undesirable because, in a simple 3x replication scenario, a single node crash could render a whole range unavailable. The allocator uses locality tags to model failure domains, but a node was not itself considered a locality. It is thus natural to extend the failure-domain definition to the node and model it as a locality tier. Stores on the same node now factor into the diversity_score and repel each other, just as nodes in the same datacenter do in a multi-region setup.

Release note (performance improvement): This change removes the last roadblock to running CockroachDB with multiple stores (i.e. disks) per node. The allocation algorithm now supports intra-node rebalances, which means CRDB can fully utilize the additional stores on the same node.
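The locality-tier idea above can be sketched with a simplified diversity score. This is not the actual allocator code; the fraction-of-tiers formula and the tier values are illustrative assumptions (it assumes both localities have the same number of tiers). The key point from the PR text is modeled by appending the node as the innermost tier, so two stores on the same node are the least diverse pairing.

```go
package main

import "fmt"

// diversityScore returns how "far apart" two stores are in the locality
// hierarchy: 0 for identical localities (same node), up to 1 for localities
// that differ at the outermost tier. Assumes a and b have equal length.
func diversityScore(a, b []string) float64 {
	n := len(a)
	for i := 0; i < n; i++ {
		if a[i] != b[i] {
			return float64(n-i) / float64(n)
		}
	}
	return 0 // identical locality, i.e. two stores on the same node
}

func main() {
	// Tiers: [datacenter, node]; the node is modeled as the innermost tier.
	sameNode := diversityScore([]string{"us-east", "node1"}, []string{"us-east", "node1"})
	sameDC := diversityScore([]string{"us-east", "node1"}, []string{"us-east", "node2"})
	otherDC := diversityScore([]string{"us-east", "node1"}, []string{"us-west", "node3"})
	fmt.Println(sameNode, sameDC, otherDC) // 0 0.5 1
}
```

Because same-node pairs score lowest, the allocator's existing diversity machinery naturally repels replicas from sharing a node, the same way it repels them from sharing a datacenter.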

52754: importccl: speed up revert of IMPORT INTO empty table r=dt a=dt

When IMPORT INTO fails, it reverts the tables to their pre-IMPORT state.
Typically this requires running a RevertRange operation that finds the keys
written by the IMPORT amongst all the table data and deletes just those
keys. This is somewhat expensive: we must iterate the keys in the target
table and check each one to see whether it needs to be reverted.

Non-INTO style IMPORTs create the table into which they will IMPORT and
thus can just drop it wholesale on failure, instead of doing this expensive
revert. However INTO-style IMPORTs could use a similarly fast/cheap
wholesale delete *if they knew the table was empty* when the IMPORT was
started.

This change tracks which tables were empty when the IMPORT started and
then deletes, rather than reverts, the table span on failure.

Release note (performance improvement): Cleaning up after a failure during IMPORT INTO a table which was empty is now faster.
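The choice the change describes can be sketched as a two-way decision. The function and return values here are illustrative, not the actual importccl API.

```go
package main

import "fmt"

// cleanupStrategy picks the cleanup path described above: if the target
// table was empty when the IMPORT INTO started, its entire span can be
// deleted wholesale; otherwise the more expensive RevertRange must pick out
// only the keys the IMPORT wrote.
func cleanupStrategy(wasEmptyAtImportStart bool) string {
	if wasEmptyAtImportStart {
		return "delete table span" // cheap wholesale delete
	}
	return "revert range" // iterate keys, delete only those written by IMPORT
}

func main() {
	fmt.Println(cleanupStrategy(true))  // delete table span
	fmt.Println(cleanupStrategy(false)) // revert range
}
```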

53023: opt: add index acceleration support for ~ and && bounding box operators r=rytaft a=rytaft

This commit adds index acceleration support for the bounding box comparison
operators, `~` and `&&`. It maps `~` to Covers and `&&` to Intersects.

Release note (performance improvement): The ~ and && geospatial bounding
box operations can now benefit from index acceleration if one of the
operands is an indexed geometry column.
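The operator-to-relationship mapping stated in the commit text can be recorded as a small table. Only the mapping itself (`~` to Covers, `&&` to Intersects) comes from the source; expressing it as a Go map is illustrative.

```go
package main

import "fmt"

// boundingBoxIndexOp maps each bounding box comparison operator to the
// spatial relationship the index machinery uses to answer it.
var boundingBoxIndexOp = map[string]string{
	"~":  "Covers",
	"&&": "Intersects",
}

func main() {
	fmt.Println(boundingBoxIndexOp["~"], boundingBoxIndexOp["&&"]) // Covers Intersects
}
```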

53049: bulkio: Fix transaction semantics in job scheduler. r=miretskiy a=miretskiy

Fixes #53033
Fixes #52959 

Use transaction when querying for the schedules to run.
In addition, ensure that a single bad schedule does not cause
all of the previous work to be wasted by using transaction savepoints.


Release note: None

53132: sql/opt: add implicit SELECT FOR UPDATE support for UPSERT statements r=nvanbenschoten a=nvanbenschoten

Fixes #50180.

This commit adds implicit SELECT FOR UPDATE support for UPSERT statements with a VALUES clause. This should improve throughput and latency for contended UPSERT statements in much the same way that 435fa43 did for UPDATE statements. However, it only affects UPSERT statements into tables with multiple indexes, because UPSERT statements into single-index tables hit a fast-path where they perform a blind write without an initial row scan.

Conceptually, if we picture an UPSERT statement as the composition of a SELECT statement and an INSERT statement (with loosened semantics around existing rows) then this change performs the following transformation:
```
UPSERT t = SELECT FROM t + INSERT INTO t
=>
UPSERT t = SELECT FROM t FOR UPDATE + INSERT INTO t
```

I plan to test this out on a contended `indexes` workload at some point in the future.

Release note (sql change): UPSERT statements now acquire locks using the FOR UPDATE locking mode during their initial row scan, which improves performance for contended workloads when UPSERTing into tables with multiple indexes. This behavior is configurable using the enable_implicit_select_for_update session variable and the sql.defaults.implicit_select_for_update.enabled cluster setting.

Co-authored-by: James H. Linder <jamesl@cockroachlabs.com>
Co-authored-by: Alex Lunev <alexl@cockroachlabs.com>
Co-authored-by: David Taylor <tinystatemachine@gmail.com>
Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@craig craig bot closed this as completed in 471e856 Aug 20, 2020