sql: DROP CONSTRAINT doesn't get rolled back correctly #47323
Labels
A-schema-changes
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
S-3-erroneous-edge-case
Database produces or stores erroneous data without visible error/warning, in rare edge cases.
T-sql-foundations
SQL Foundations Team (formerly SQL Schema + SQL Sessions)
The output below is from v20.1.0-beta.3, though v19.2.5 has basically the same problem. Create a table, insert some values, and add a check constraint:
As a reliable way to induce DROP CONSTRAINT to be rolled back, try to drop the constraint and create a unique index that will definitely fail, in the same transaction:
The constraint-related error gets swallowed, but it does show up in the logs:
At this point the constraint still shows up in SHOW CREATE TABLE but isn't enforced for new writes anymore, which is bad. (I think this is because at this point, the constraint only exists on the table descriptor as a mutation.)
Ultimately, I'm pretty sure that when we reverse the mutations during a rollback, we're not correctly changing the Validity of the constraint mutation from Dropping to Validating, analogously to how we change the mutation direction from DROP to ADD. We haven't caught this because we don't actually have a test for rolling back dropping a constraint, per se. The one good way we have of simulating this in a logic test is the approach above, where some other schema change in the transaction fails, but then that failure is the error that gets returned to the client. But it could become a schema changer test now that we have more testing knobs to use.
In 19.2 the async schema changer would keep trying and failing to process this permanently stuck mutation. In 20.1 it turns out the job fails and thus will never be retried, which is not the expected behavior and makes me worried about how we handle retries of rollbacks when there are transient errors. This needs more investigation and should probably get its own issue.
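To make the suspected missing step in the rollback concrete, here is a minimal sketch of what reversing a constraint mutation would need to do. It is not the actual CockroachDB code: the types Direction, Validity, and ConstraintMutation, the function reverseMutation, and the constraint name "c" are all hypothetical stand-ins invented for illustration. Only the Dropping to Validating and DROP to ADD transitions come from the description above.

```go
package main

import "fmt"

// Direction is a hypothetical stand-in for a mutation's direction.
type Direction int

const (
	Add Direction = iota
	Drop
)

// Validity is a hypothetical stand-in for a constraint's validity state.
type Validity int

const (
	Validated Validity = iota
	Validating
	Dropping
)

// ConstraintMutation is a hypothetical, simplified model of a constraint
// that exists on the table descriptor only as a mutation.
type ConstraintMutation struct {
	Name      string
	Direction Direction
	Validity  Validity
}

// reverseMutation models what a rollback would have to do to a mutation.
// Flipping only the direction is the suspected bug: a rolled-back
// DROP CONSTRAINT must also move its validity from Dropping back to
// Validating, otherwise the constraint is never enforced again.
func reverseMutation(m ConstraintMutation) ConstraintMutation {
	switch m.Direction {
	case Drop:
		m.Direction = Add
		if m.Validity == Dropping {
			m.Validity = Validating
		}
	case Add:
		// Validity handling for a rolled-back ADD is outside the scope
		// of this sketch; only the direction is flipped here.
		m.Direction = Drop
	}
	return m
}

func main() {
	m := ConstraintMutation{Name: "c", Direction: Drop, Validity: Dropping}
	r := reverseMutation(m)
	fmt.Println(r.Direction == Add, r.Validity == Validating)
}
```

A schema changer test for this bug would presumably assert something equivalent on the real descriptor after the rollback completes: the constraint mutation is back in the ADD direction and no longer marked Dropping.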
This turned up when I was stressing a branch with some fixes to the schema change job mutation for 20.1, hit a Leaked goroutine error in TestMigrateSchemaChanges, and started looking at the logs. I actually doubt that error is related to this bug, but it turns out that the schema change job migration, in its DROP CONSTRAINT subtest, hits this bug every single time.
Jira issue: CRDB-5018