Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: deadlock between DDL in high-prio txns and descriptor leases #46414

Open
knz opened this issue Mar 23, 2020 · 8 comments
Open

sql: deadlock between DDL in high-prio txns and descriptor leases #46414

knz opened this issue Mar 23, 2020 · 8 comments

Comments

@knz
Copy link
Member

@knz knz commented Mar 23, 2020

summary: ROLLBACK TO SAVEPOINT can cause a deadlock in a high-priority txn that contains DDL, so CockroachDB is set to disallow it altogether for the time being.

Ideally, a better solution would enable lease acquisition to operate at an ever higher priority.

Details

Prior to #46170, we had a deadlock when rolling back a txn for a restart / for a savepoint if the txn contained DDL, because subsequent lease acquisitions would wait on the main SQL txn which encloses it.

(This is because lease acquisition uses a separate kv txn.)

With #46170 committed, the lease acquisition occurs at a higher priority, which guarnatees deadlock avoidance in the common case where DDL is ran at normal priority.

However the deadlock still exists when the DDL is executed at high priority too.

Solution

We need two "high priority" levels, where lease acquisition is at a level higher than anything a SQL txn can be set to.

@knz knz added this to Triage in SQL Schema via automation Mar 23, 2020
knz added a commit to knz/cockroach that referenced this issue Mar 23, 2020
See issue cockroachdb#46414

Because lease acquisition must operate at a high prio
than schema changes, and SQL "HIGH PRIORITY" operates at the same
prio as lease acquisition, we can't let ROLLBACK TO SAVEPOINT
after schema change in HIGH PRIORITY txns.

Release justification: fixes for high-priority or high-severity bugs in existing functionality

Release note (sql change): ROLLBACK TO SAVEPOINT (for either regular
savepoint or "restart savepoints" defined with `cockroach_restart`)
causes a "feature not supported" error after a DDL statement in a HIGH
PRIORITY transaction, in order to avoid a transaction deadlock. See
issue cockroachdb#46414 for details.
knz added a commit to knz/cockroach that referenced this issue Mar 23, 2020
See issue cockroachdb#46414

Because lease acquisition must operate at a high prio
than schema changes, and SQL "HIGH PRIORITY" operates at the same
prio as lease acquisition, we can't let ROLLBACK TO SAVEPOINT
after schema change in HIGH PRIORITY txns.

Release justification: fixes for high-priority or high-severity bugs in existing functionality

Release note (sql change): ROLLBACK TO SAVEPOINT (for either regular
savepoint or "restart savepoints" defined with `cockroach_restart`)
causes a "feature not supported" error after a DDL statement in a HIGH
PRIORITY transaction, in order to avoid a transaction deadlock. See
issue cockroachdb#46414 for details.
knz added a commit to knz/cockroach that referenced this issue Mar 23, 2020
See issue cockroachdb#46414

Because lease acquisition must operate at a high prio
than schema changes, and SQL "HIGH PRIORITY" operates at the same
prio as lease acquisition, we can't let ROLLBACK TO SAVEPOINT
after schema change in HIGH PRIORITY txns.

Release justification: fixes for high-priority or high-severity bugs in existing functionality

Release note (sql change): ROLLBACK TO SAVEPOINT (for either regular
savepoint or "restart savepoints" defined with `cockroach_restart`)
causes a "feature not supported" error after a DDL statement in a HIGH
PRIORITY transaction, in order to avoid a transaction deadlock. See
issue cockroachdb#46414 for details.
knz added a commit to knz/cockroach that referenced this issue Mar 23, 2020
See issue cockroachdb#46414

Because lease acquisition must operate at a high prio
than schema changes, and SQL "HIGH PRIORITY" operates at the same
prio as lease acquisition, we can't let ROLLBACK TO SAVEPOINT
after schema change in HIGH PRIORITY txns.

Release justification: fixes for high-priority or high-severity bugs in existing functionality

Release note (sql change): ROLLBACK TO SAVEPOINT (for either regular
savepoint or "restart savepoints" defined with `cockroach_restart`)
causes a "feature not supported" error after a DDL statement in a HIGH
PRIORITY transaction, in order to avoid a transaction deadlock. See
issue cockroachdb#46414 for details.
andreimatei added a commit to andreimatei/cockroach that referenced this issue Mar 24, 2020
See issue cockroachdb#46414

Because lease acquisition must operate at a high prio
than schema changes, and SQL "HIGH PRIORITY" operates at the same
prio as lease acquisition, we can't let ROLLBACK TO SAVEPOINT
after schema change in HIGH PRIORITY txns.

Release justification: fixes for high-priority or high-severity bugs in existing functionality

Release note (sql change): ROLLBACK TO SAVEPOINT (for either regular
savepoint or "restart savepoints" defined with `cockroach_restart`)
causes a "feature not supported" error after a DDL statement in a HIGH
PRIORITY transaction, in order to avoid a transaction deadlock. See
issue cockroachdb#46414 for details.
@jseldess
Copy link
Contributor

@jseldess jseldess commented Apr 13, 2020

@knz or @andreimatei, can either of you please provide a realistic example of when this deadlock would occur and a recommended workaround?

@jseldess
Copy link
Contributor

@jseldess jseldess commented Apr 13, 2020

Actually, I think I can use the example from one of the linked PRs. Does this look ok?

ROLLBACK TO SAVEPOINT in high-priority transactions containing DDL

Transactions with priority HIGH that contain DDL and ROLLBACK TO SAVEPOINT are not supported, as they could result in a deadlock. For example:

> BEGIN PRIORITY HIGH; SAVEPOINT s; CREATE TABLE t(x INT); ROLLBACK TO SAVEPOINT s;
ERROR: unimplemented: cannot use ROLLBACK TO SAVEPOINT in a HIGH PRIORITY transaction containing DDL
SQLSTATE: 0A000
HINT: You have attempted to use a feature that is not yet implemented.
See: https://github.com/cockroachdb/cockroach/issues/46414

Tracking Github Issue

@jseldess jseldess added the docs-done label Apr 13, 2020
@knz
Copy link
Member Author

@knz knz commented Apr 13, 2020

@jseldess I changed the behavior in PR #46415 - the deadlock is now avoided and reported as "unsupported feature". The relevant release note is this:

Release note (sql change): ROLLBACK TO SAVEPOINT (for either regular
savepoint or "restart savepoints" defined with cockroach_restart)
causes a "feature not supported" error after a DDL statement in a HIGH
PRIORITY transaction, in order to avoid a transaction deadlock. See
issue #46414 for details.

@knz
Copy link
Member Author

@knz knz commented Apr 13, 2020

Your text in #46414 (comment) is thus correct and adequate.

@jseldess
Copy link
Contributor

@jseldess jseldess commented Apr 13, 2020

Thanks, @knz.

@lucy-zhang lucy-zhang added this to Incoming in KV Backlog via automation May 5, 2020
@lucy-zhang lucy-zhang removed this from Needs Triage (4/28/2020) in SQL Schema May 5, 2020
@lucy-zhang
Copy link
Contributor

@lucy-zhang lucy-zhang commented May 5, 2020

@knz as I understand it, this mostly requires adding a new priority level for transactions for lease acquisition to use, so I'm moving it from SQL Schema to KV; let me know if you disagree.

@knz
Copy link
Member Author

@knz knz commented May 5, 2020

mostly requires adding a new priority level for transactions for lease acquisition to use

IT's both adding the new priority and teaching SQL how to use it. It's cross-team if you will. The question is "who should drive this". The current behavior is only visible to SQL users from the fact that certain operations will now be impossible (rejected) when issued within SQL transactions. Who's going to see that pain firsthand? I suppose it's going to be the SQL PM, not the KV PM.

Maybe you can assign this to the PM nearest to you to see what they want to do with it.

@andreimatei
Copy link
Member

@andreimatei andreimatei commented May 5, 2020

For the sake of the recording, one thing we've floated around is for KV to recognize the notion of dependent transactions, and incorporate that knowledge in the pushing protocol: If txn A depends on B and now B is trying to wait on A, don't actually wait. This seemed like a useful concept in general because we have other examples in the system of dependent transactions, and we've hit similar deadlocking issues with them too (e.g. resolving ranges, or any sort of "semi-sync" work).

@lunevalex lunevalex moved this from Incoming to Transactions in KV Backlog Jul 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
KV Backlog
Transactions
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants
You can’t perform that action at this time.