sql: add non-trivial implementations of FOR UPDATE, FOR NO KEY UPDATE, FOR SHARE, FOR NO KEY SHARE #40205

Closed
jordanlewis opened this issue Aug 26, 2019 · 6 comments · Fixed by #45701
Labels: A-sql-pgcompat (Semantic compatibility with PostgreSQL), C-enhancement (Solution expected to add code/behavior + preserve backward-compat; pg compat issues are exception), X-anchored-telemetry (The issue number is anchored by telemetry references)

Comments

jordanlewis (Member) commented Aug 26, 2019

#40206 added parsing for the various row-level lock modes and treats them as no-ops. This is acceptable for CockroachDB, which only supports SERIALIZABLE transactions, because from a client's perspective there is no way to observe whether a particular transaction had to restart because of a conflict on the rows declared FOR UPDATE or because of some other conflict - the transaction must restart in either case.

This issue tracks improving this behavior for performance reasons. There's ongoing work in core to support at least some of these semantics.
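
For reference, these are the four PostgreSQL row-level locking clauses in question (table name hypothetical; semantics per PG's documentation):

```
-- Strongest: blocks concurrent UPDATE/DELETE and all other locking reads.
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;
-- Like FOR UPDATE, but does not conflict with FOR KEY SHARE readers.
SELECT * FROM accounts WHERE id = 1 FOR NO KEY UPDATE;
-- Shared: blocks UPDATE/DELETE and FOR [NO KEY] UPDATE, allows other shared readers.
SELECT * FROM accounts WHERE id = 1 FOR SHARE;
-- Weakest: blocks only DELETE and UPDATEs that modify key columns.
SELECT * FROM accounts WHERE id = 1 FOR KEY SHARE;
```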

@jordanlewis jordanlewis added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-sql-pgcompat Semantic compatibility with PostgreSQL X-anchored-telemetry The issue number is anchored by telemetry references. labels Aug 26, 2019
craig bot pushed a commit that referenced this issue Aug 27, 2019
40206: sql: add row lock modes; make FOR UPDATE a no-op r=jordanlewis a=jordanlewis

This commit adds parsing for the row locking modes: FOR UPDATE, FOR NO
KEY UPDATE, FOR SHARE, FOR KEY SHARE.

All return unimplemented errors, except for FOR UPDATE, which behaves as
a no-op.

Closes #6583.
Added #40205 to track the remaining kinds of row locking.

Release note (sql change): support parsing the FOR UPDATE modifier on
SELECT clauses, treating it as a no-op, since CockroachDB's transactions
only operate in SERIALIZABLE mode.

Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>
@jordanlewis jordanlewis changed the title sql: implement FOR NO KEY UPDATE, FOR SHARE, FOR NO KEY SHARE sql: add non-trivial implementations of FOR UPDATE, FOR NO KEY UPDATE, FOR SHARE, FOR NO KEY SHARE Aug 27, 2019
jordanlewis added a commit to jordanlewis/cockroach that referenced this issue Oct 2, 2019
The spreadsheet we discussed is unwieldy - hard to edit and impossible to keep
up to date. If we write down blacklists in code, then we can use an approach
like this to always have an up-to-date aggregation.

So far it seems like there's just a lot of unknowns to categorize still.

The output today:

```
=== RUN   TestBlacklists
 648: unknown                                                (unknown)
 493: cockroachdb#5807   (sql: Add support for TEMP tables)
 151: cockroachdb#17511  (sql: support stored procedures)
  86: cockroachdb#26097  (sql: make TIMETZ more pg-compatible)
  56: cockroachdb#10735  (sql: support SQL savepoints)
  55: cockroachdb#32552  (multi-dim arrays)
  55: cockroachdb#26508  (sql: restricted DDL / DML inside transactions)
  52: cockroachdb#32565  (sql: support optional TIME precision)
  39: cockroachdb#243    (roadmap: Blob storage)
  33: cockroachdb#26725  (sql: support postgres' API to handle blob storage (incl lo_creat, lo_from_bytea))
  31: cockroachdb#27793  (sql: support custom/user-defined base scalar (primitive) types)
  24: cockroachdb#12123  (sql: Can't drop and replace a table within a transaction)
  24: cockroachdb#26443  (sql: support user-defined schemas between database and table)
  20: cockroachdb#21286  (sql: Add support for geometric types)
  18: cockroachdb#6583   (sql: explicit lock syntax (SELECT FOR {SHARE,UPDATE} {skip locked,nowait}))
  17: cockroachdb#22329  (Support XA distributed transactions in CockroachDB)
  16: cockroachdb#24062  (sql: 32 bit SERIAL type)
  16: cockroachdb#30352  (roadmap:when CockroachDB  will support cursor?)
  12: cockroachdb#27791  (sql: support RANGE types)
   8: cockroachdb#40195  (pgwire: multiple active result sets (portals) not supported)
   8: cockroachdb#6130   (sql: add support for key watches with notifications of changes)
   5: Expected Failure                                       (unknown)
   5: cockroachdb#23468  (sql: support sql arrays of JSONB)
   5: cockroachdb#40854  (sql: set application_name from connection string)
   4: cockroachdb#35879  (sql: `default_transaction_read_only` should also accept 'on' and 'off')
   4: cockroachdb#32610  (sql: can't insert self reference)
   4: cockroachdb#40205  (sql: add non-trivial implementations of FOR UPDATE, FOR NO KEY UPDATE, FOR SHARE, FOR NO KEY SHARE)
   4: cockroachdb#35897  (sql: unknown function: pg_terminate_backend())
   4: cockroachdb#4035   (sql/pgwire: missing support for row count limits in pgwire)
   3: cockroachdb#27796  (sql: support user-defined DOMAIN types)
   3: cockroachdb#3781   (sql: Add Data Type Formatting Functions)
   3: cockroachdb#40476  (sql: support `FOR {UPDATE,SHARE} {SKIP LOCKED,NOWAIT}`)
   3: cockroachdb#35882  (sql: support other character sets)
   2: cockroachdb#10028  (sql: Support view queries with star expansions)
   2: cockroachdb#35807  (sql: INTERVAL output doesn't match PG)
   2: cockroachdb#35902  (sql: large object support)
   2: cockroachdb#40474  (sql: support `SELECT ... FOR UPDATE OF` syntax)
   1: cockroachdb#18846  (sql: Support CIDR column type)
   1: cockroachdb#9682   (sql: implement computed indexes)
   1: cockroachdb#31632  (sql: FK options (deferrable, etc))
   1: cockroachdb#24897  (sql: CREATE OR REPLACE VIEW)
   1: pass?                                                  (unknown)
   1: cockroachdb#36215  (sql: enable setting standard_conforming_strings to off)
   1: cockroachdb#32562  (sql: support SET LOCAL and txn-scoped session variable changes)
   1: cockroachdb#36116  (sql: psychopg: investigate how `'infinity'::timestamp` is presented)
   1: cockroachdb#26732  (sql: support the binary operator: <int> / <float>)
   1: cockroachdb#23299  (sql: support coercing string literals to arrays)
   1: cockroachdb#36115  (sql: psychopg: investigate if datetimetz is being returned instead of datetime)
   1: cockroachdb#26925  (sql: make the CockroachDB integer types more compatible with postgres)
   1: cockroachdb#21085  (sql: WITH RECURSIVE (recursive common table expressions))
   1: cockroachdb#36179  (sql: implicity convert date to timestamp)
   1: cockroachdb#36118  (sql: Cannot parse '24:00' as type time)
   1: cockroachdb#31708  (sql: support current_time)
```

Release justification: non-production change
Release note: None
jordanlewis added a commit to jordanlewis/cockroach that referenced this issue Oct 24, 2019
craig bot pushed a commit that referenced this issue Nov 7, 2019
41252: roachtest: add test that aggregates orm blacklist failures r=jordanlewis a=jordanlewis

Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>
@nvanbenschoten nvanbenschoten self-assigned this Jan 13, 2020
nvanbenschoten (Member) commented Jan 13, 2020

The issue lists four lock types, which are documented in PG's docs here. Here are a few takeaways from thinking about how these will play into CRDB in the future:

  1. CRDB does not distinguish between locking on a KV's key and locking its key and value. Because of that, I think it makes sense to upgrade each FOR NO KEY [SHARE/UPDATE] variant to its corresponding FOR [SHARE/UPDATE] locking mode. This is correct per PG's lock conflict table. The only false conflict it introduces is between FOR NO KEY UPDATE and FOR KEY SHARE. I don't think this is worth worrying about, especially because of the next point.
  2. We can leave the FOR SHARE locking mode as a no-op, as CRDB's default concurrency control scheme is already similar to this. We wouldn't want to upgrade these locking modes to FOR UPDATE and introduce synthetic conflicts between multiple FOR SHARE locking modes. This might cause deadlocks in user applications.
  3. If/when we eventually introduce read-locks at the KV level, the FOR SHARE locking mode should line up with their semantics.

In addition to supporting these locking modes, we'll also want to use the same approach that Postgres uses for these locking modes in UPDATE and DELETE statements. From Postgres' docs:

The FOR UPDATE lock mode is also acquired by any DELETE on a row, and also by an UPDATE that modifies the values of certain columns. Currently, the set of columns considered for the UPDATE case are those that have a unique index on them that can be used in a foreign key (so partial indexes and expressional indexes are not considered), but this may change in the future.

CockroachDB implements serializability, so this is not needed for correctness, but acquiring these locks implicitly in UPDATE and DELETE statements has the potential to:

  1. Reduce transaction retries.
  2. Improve performance by reducing thrashing and enforcing better queueing (see the sketch after this list).
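
A sketch of what that implicit acquisition buys (schema hypothetical; CRDB's exact behavior here is the subject of this issue, not settled):

```
BEGIN;
-- Explicit form: lock the row up front, then write it.
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;

-- With implicit FOR UPDATE locking, a bare UPDATE behaves like the pair
-- above: its initial row scan takes the same exclusive lock, so concurrent
-- writers queue behind it instead of triggering a retry later on.
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;
```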

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jan 15, 2020
Relates to cockroachdb#40205.

This change updates optbuilder to propagate row-level locking modes through the
*scope object to achieve the same locking clause scoping that Postgres contains.
The change propagates the locking mode all the way down to the ScanExpr level,
where a single locking mode becomes an optional property of a Scan.

With proper scoping rules, the change also improves upon the locking validation
introduced in cockroachdb#43887.

This completes the SELECT FOR [KEY] SHARE/UPDATE optbuilder interactions. The
next step will be to hook the ScanExpr locking property into the execbuilder.

Release note (sql change): More invalid usages of FOR UPDATE locking clauses
are now rejected by the SQL optimizer.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jan 16, 2020
sumeerbhola (Collaborator) commented:

  1. CRDB does not distinguish between locking on a KV's key and locking its key and value. Because of that, I think it makes sense to upgrade all FOR NO KEY [SHARE/UPDATE] variants to its corresponding FOR [SHARE/UPDATE] locking mode. This is correct per PG's lock conflict table. The only false conflict it introduces is between FOR NO KEY UPDATE and FOR NO KEY SHARE. I don't think this is worth worrying about, especially because of the next point.

nit: I assume you meant FOR KEY SHARE.

  2. We can leave the FOR SHARE locking mode as a no-op, as CRDB's default concurrency control scheme is already similar to this. We wouldn't want to upgrade these locking modes to FOR UPDATE and introduce synthetic conflicts between multiple FOR SHARE locking modes. This might cause deadlocks in user applications.
  3. If/when we eventually introduce read-locks at the KV level, the FOR SHARE locking mode should line up with their semantics.

I'm not clear on how "not worth worrying about" squares with bullet 3 -- if upgrading KEY SHARE to SHARE (for a transaction that is reading keys) and NO KEY UPDATE to UPDATE (for a transaction that is updating the non-key columns) can introduce deadlocks once FOR SHARE is implemented with locking, isn't that something to be concerned about?
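
To make the concern concrete, a hypothetical interleaving (invented schema) showing the synthetic conflict that upgrading FOR SHARE would introduce:

```
-- Statements interleaved across two transactions. Under Postgres
-- semantics none of these block, since shared locks are compatible.
-- txn 1:
SELECT * FROM t WHERE k = 1 FOR SHARE;
-- txn 2:
SELECT * FROM t WHERE k = 2 FOR SHARE;
-- txn 1 (if FOR SHARE is upgraded to FOR UPDATE, this blocks on txn 2):
SELECT * FROM t WHERE k = 2 FOR SHARE;
-- txn 2 (would block on txn 1 in turn: a deadlock Postgres never sees):
SELECT * FROM t WHERE k = 1 FOR SHARE;
```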

nvanbenschoten (Member) commented:

nit: I assume you meant FOR KEY SHARE.

Yes, updated.

isn't that something to be concerned about?

I had the same thought when writing this. There is a bit of a contradiction in this reasoning, as you pointed out. We don't want to weaken these locking modes any further because then they become no-ops and not useful to expose. However, we don't want to strengthen them because then that presents the possibility of creating unexpected deadlocks for unsuspecting users. We also likely don't want to actually support them properly as that would be a large undertaking.

The way I've been reconciling this is that in my mind there are two primary row-level locking classes (SHARE and UPDATE) and each class contains two locking modes differentiated by the KEY vs. NO KEY distinction. This secondary distinction feels like a Postgres implementation detail. For instance, there is no such secondary distinction in MySQL: https://dev.mysql.com/doc/refman/8.0/en/innodb-locking-reads.html. It also doesn't map to CockroachDB today.

So I think we have two choices here:

  1. Reject the KEY/NO KEY specifier and accept only FOR SHARE and FOR UPDATE. This isn't a terrible option, as I highly doubt people use KEY/NO KEY frequently in practice. However, it does prevent people who currently use these specifiers from running on CRDB without modifying their code.
  2. Ignore the KEY/NO KEY specifier and document that only the primary row-level locking classes are considered. This does create the opportunity for deadlock if a user has built their application in such a way that they use these clauses for correctness and not just performance. However, it does not force people who use the specifiers only for performance to modify their code. (Both options are sketched below.)
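
Concretely, the user-visible difference between the two options might look like this (illustrative only; error text invented):

```
-- Option 1: reject the specifier outright.
SELECT * FROM t WHERE k = 1 FOR NO KEY UPDATE;
-- ERROR: unimplemented; the client must rewrite the query.

-- Option 2: accept the syntax but collapse it onto its locking class.
SELECT * FROM t WHERE k = 1 FOR NO KEY UPDATE;
-- Silently behaves as FOR UPDATE; observable only as stronger conflicts
-- (and possible deadlocks) under contention.
```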

I think a strong argument could be made for both approaches. @rafiss I'm curious if you have an opinion here.

Also, to add a little more context here, KEY/NO KEY were added to PG in this commit: postgres/postgres@0ac5ad5 and the motivation was primarily so that they could be used internally by PG to improve concurrency between UPDATEs that did not modify the key-portion of a row and FK checks. Interestingly, our attempt to solve the same problem is through the use of column families. To this day, PG does still occasionally ignore the KEY/NO KEY specifiers and collapse locking modes along locking classes. This is most common when interfacing with foreign data wrappers: https://github.com/postgres/postgres/blob/1281a5c907b41e992a66deb13c3aa61888a62268/contrib/postgres_fdw/deparse.c#L1274.

rafiss (Collaborator) commented Jan 16, 2020

I did a very unscientific GitHub search to see how often the various syntaxes appear in public codebases:

SELECT FROM "FOR UPDATE" - 523,512
SELECT FROM "FOR NO KEY UPDATE" - 4,602
SELECT FROM "FOR SHARE" - 112,272
SELECT FROM "FOR KEY SHARE" - 8,796

That is useful data that suggests that many people probably would not be affected by either choice. But apart from that, another consideration is to figure out if any ORMs generate queries with the KEY/NO KEY specifiers. I would guess none do it by default, and those that do support it probably see similar proportions of usage as above. For what it's worth, looking at some of the ORMs we are focusing on right now: Hibernate does not support it but has an open issue for it, and Django also does not support it but has an open issue/PR. For both of these we are creating a CockroachDB dialect/adapter, so if these issues ever do get implemented we should still be able to avoid using the KEY/NO KEY specifiers with CockroachDB, if necessary.

So I think my vote is for option 1 -- rejecting the KEY/NO KEY specifiers, and also adding telemetry on how often the unimplemented syntax is used. I worry that ignoring them (even with documentation elsewhere) will not be obvious to users, and I would claim that deadlocks happening randomly at runtime are worse than syntax errors happening deterministically at runtime. Though I'm definitely open to being convinced otherwise, and my position isn't held very strongly.

I do want other stakeholders to weigh in. My understanding is that we usually only go with the "accept the syntax but ignore it" approach if the syntax in question has no meaning in CockroachDB and causes no behavior change... which arguably does apply in this case.

nvanbenschoten (Member) commented:

Thanks for gathering those numbers and finding references to this in Hibernate and Django Rafi!

I also have conflicted opinions here, but I'm actually leaning towards option 2 for the following reasons:

  • After looking at some of the uses on GitHub, it looks like this will come up almost exclusively in ORM test suites. Rejecting these modes will prevent some tests from succeeding when they otherwise could.
  • The GitHub search also confirmed that there is precedent for treating these two strengths the same. See: "This is still correct but less efficient (will conflict more often than required)."
  • The two ORM discussions you posted confirm that people look at these primarily as performance optimizations.
  • I suspect that any developer who would actually depend on the subtleties of these locking modes would be more comfortable reading our documentation on it than developers who observe them being rejected and don't understand the difference between the modes.
  • It's worth keeping in mind that we do have transaction deadlock detection, so this wouldn't cause an application to deadlock indefinitely. Instead, applications would get transaction aborted errors if their use of these locking modes does cause deadlocks.

Also, keep in mind that for the next release we won't even perform any FOR SHARE locking at all, so we don't need to resolve this immediately.

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jan 18, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jan 18, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jan 20, 2020
craig bot pushed a commit that referenced this issue Jan 20, 2020
44015: sql/opt/optbuilder: propagate FOR UPDATE locking in scope to ScanExprs r=nvanbenschoten a=nvanbenschoten

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jan 30, 2020
Relates to cockroachdb#40205.

This change introduces a new option to `MVCCGetOptions` and `MVCCScanOptions`
that causes reads to throw an error if they observe an MVCC version at a
timestamp above their read timestamp. Specifically, when the option is enabled,
a `WriteTooOldError` will be returned if a scan observes an MVCC version with a
timestamp above the read. Similarly, a `WriteIntentError` will be returned if a
scan observes another transaction's intent, even if that intent has a timestamp
above the scan's read timestamp.

This option will be used in the future by `ScanRequests` and `ReverseScanRequests`
that intend to acquire unreplicated key-level locks, which will power `SELECT FOR
UPDATE`. These scans do not want to ignore existing MVCC versions at higher timestamps
like traditional scans do. Instead, they want to throw errors and force their
transaction to increase its timestamp through either a refresh or a retry. This was
previously prototyped in 4a8e8dc.

Interestingly, this is not new logic to the MVCC layer. This behavior is exactly
the same as that of the initial key-value lookup performed during MVCC writes
(see `mvccPutInternal`). It's fitting that behavior needed for `SELECT FOR UPDATE`
would mirror that already exhibited by the read portion of read-write operations.
This also hints at an opportunity to potentially use this option to merge the two
implementations and get rid of custom logic like `mvccGetInternal` that only exists
on the write path. We'd need to be careful about doing so though, as this code is
heavily tuned.

Release note: None
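
At the SQL level, the intended effect of the new option can be sketched as follows (conceptual only; the exact error surfaced to clients may differ):

```
-- txn A begins and fixes its read timestamp t1.
BEGIN;
-- Meanwhile, txn B commits a newer version of k = 1 at t2 > t1.

-- A plain scan in txn A reads the version as of t1 and succeeds:
SELECT v FROM t WHERE k = 1;

-- A locking scan must not lock a stale version. With FailOnMoreRecent,
-- the scan hits a WriteTooOldError (or WriteIntentError for another
-- txn's intent), forcing txn A to refresh or retry at t >= t2:
SELECT v FROM t WHERE k = 1 FOR UPDATE;
```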
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jan 31, 2020
craig bot pushed a commit that referenced this issue Feb 1, 2020
44473: storage/engine: expose FailOnMoreRecent MVCC{Get/Scan}Option r=nvanbenschoten a=nvanbenschoten

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Feb 6, 2020
Relates to cockroachdb#40205.

This change adds FOR UPDATE locking mode information to the EXPLAIN
output of scanNodes. This is useful for testing that locking modes are
being correctly propagated when a FOR [KEY] UPDATE/SHARE locking clause
is specified. It will be even more useful for testing that locking clauses
are being implicitly applied to UPDATE, UPSERT, and DELETE statements when
we start performing that implicit transformation to their underlying scans.
craig bot pushed a commit that referenced this issue Feb 6, 2020
44790: sql: add FOR UPDATE locking modes to EXPLAIN output r=nvanbenschoten a=nvanbenschoten

We currently reject both non-standard locking wait-policies, so we can't easily
test that code. I'm open to suggestions to address that, but I also don't think it's
a very big issue at the moment and I don't think it makes sense to hold off on making
the change while we're already here. If/when we let non-standard locking wait policies
through the optimizer, we'll be able to add similar tests for them.
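
The rough shape of the new output (hypothetical table; layout approximate -- the point is the locking field attached to the scan node):

```
> EXPLAIN SELECT * FROM t WHERE k = 1 FOR UPDATE;
  tree | field            | description
  scan | table            | t@primary
       | locking strength | for update
```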

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Feb 27, 2020
Related to cockroachdb#41720.
Related to cockroachdb#44976.

This commit integrates the new concurrency package into the storage
package. Each Replica is given a concurrency manager, which replaces
its existing latch manager and txn wait queue. The change also uses
the concurrency manager to simplify the role of the intent resolver.
The intent resolver no longer directly handles WriteIntentErrors. As
a result, we are able to delete the contention queue entirely.

With this change, all requests are now sequenced through the concurrency
manager. When sequencing, latches are acquired and conflicting locks are
detected. If any locks are found, the requests wait in lock wait-queues
for the locks to be resolved. This is a major deviation from how things
currently work because today, even with the contention queue, requests
end up waiting for conflicting transactions to commit/abort in the
txnWaitQueue after at least one RPC. Now, requests wait directly next
to the intents/locks that they are waiting on and react directly to the
resolution of these intents/locks.

Once requests are sequenced by the concurrency manager, they are
theoretically fully isolated from all conflicting requests. However,
this is not strictly true today because we have not yet pulled all
replicated locks into the concurrency manager's lock table. We will
do so in a future change. Until then, the concurrency manager maintains
a notion of "intent discovery", which is integrated into the Replica-level
concurrency retry loop.

Performance numbers will be published shortly. This will be followed
by performance numbers using the SELECT FOR UPDATE locking (cockroachdb#40205)
improvements that this change enables.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Feb 28, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Mar 2, 2020
craig bot pushed a commit that referenced this issue Mar 2, 2020
45269: importccl: Parallelize avro import r=miretskiy a=miretskiy

Parallelize avro importer to improve its throughput (2.8x improvement).

Touches #40374.
Fixes #45097.

Release notes (performance): Faster avro import

45482: storage: integrate Concurrency Manager into Replica request path r=nvanbenschoten a=nvanbenschoten

45484: sql: simplify connection state machine - stop tracking retry intent r=andreimatei a=andreimatei

Before this patch, the SQL connection state machine had an optimization:
if a transaction that hadn't used "SAVEPOINT cockroach_restart"
encountered a retriable error that we can't auto-retry, then we'd
release the txn's locks eagerly and enter the Aborted state. As opposed
to transactions that had used the "SAVEPOINT cockroach_restart", which
go to RestartWait.
This optimization is a significant complication for the state machine,
so this patch is removing it. All transactions now go to RestartWait,
and wait for a ROLLBACK to release the locks.

On the flip side, doing "RELEASE SAVEPOINT cockroach_restart" and
"ROLLBACK SAVEPOINT cockroach_restart" now works even for transactions
that haven't explicitly declared that savepoint, which is nice. Although
I don't promise I'll keep it working.

Release note: None

Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Mar 4, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Mar 4, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Mar 6, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Mar 6, 2020
Closes cockroachdb#40205.
Informs cockroachdb#41720.

This change teaches the KV client and the KV API about unreplicated locks.
It then adds a KeyLocking mode to ScanRequest and ReverseScanRequest, which
allows their users to select the locking strength that they would like the
scan to use. This locking strength defaults to None, which corresponds to
the current behavior. However, some users will want to acquire locks on each
row scanned, which is now possible by setting the locking strength to a
stronger level. For now, only the Exclusive strength is supported.

The change then revisits SQL's row-level locking support, which is supported
all the way down to the row fetcher for implicit (e.g. UPDATE) and explicit
(e.g. SELECT ... FOR UPDATE) upgrade locking. The change uses the new
key-locking functionality in the KV API to hook up row-level locking,
completing the integration of SELECT FOR UPDATE with the KV layer and,
in particular, the new lock-table structure.

cockroachdb#43775 described the three main
benefits of this change:
- higher throughput under contention
- lower latency and improved fairness under contention
- a reduction in transaction retries under contention

I've revisited those results a few times in the last two months and seen that
the results continue to hold, and in some cases they have improved. I intend
to update this PR with a more complete analysis of its impact on those three
areas.

Release note (sql change): SELECT FOR UPDATE now hooks into a new
leaseholder-only locking mechanism. This allows the feature to be used
to improve performance of transactions that read, modify, and write
contended rows. Similarly, UPDATE statements now use this new
mechanism by default, meaning that their performance under contention is
improved. This is only enabled for UPDATE statements that can push their
filter all the way into their key-value scan. To determine whether an
UPDATE statement is implicitly using SELECT FOR UPDATE locking, look
for a "locking strength" field in the EXPLAIN output for the statement.