Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: DROP CONSTRAINT doesn't get rolled back correctly #47323

Open
thoszhang opened this issue Apr 10, 2020 · 1 comment
Open

sql: DROP CONSTRAINT doesn't get rolled back correctly #47323

thoszhang opened this issue Apr 10, 2020 · 1 comment
Labels
A-schema-changes C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-erroneous-edge-case Database produces or stores erroneous data without visible error/warning, in rare edge cases. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@thoszhang
Copy link
Contributor

thoszhang commented Apr 10, 2020

The output below is from v20.1.0-beta.3, though v19.2.5 has basically the same problem. Create a table, insert some values, and add a check constraint:

root@:26257/defaultdb> create table t (a int, b int);
CREATE TABLE

Time: 7.392ms

root@:26257/defaultdb> insert into t values (1, 1), (1, 1);
INSERT 2

Time: 5.752ms

root@:26257/defaultdb> alter table t add constraint c check (a > 0);
ALTER TABLE

Time: 187.609ms

As a reliable way to induce DROP CONSTRAINT to be rolled back, try to drop the index and create a unique index which will definitely fail in the same transaction:

root@:26257/defaultdb> begin; alter table t drop constraint c; create unique index on t(b); commit;
ERROR: transaction committed but schema change aborted with error: (23505): duplicate key value (b)=(1) violates unique constraint "t_b_key"
SQLSTATE: XXA00
HINT: Some of the non-DDL statements may have committed successfully, but some of the DDL statement(s) failed.
Manual inspection may be required to determine the actual state of the database.
--
See: https://github.com/cockroachdb/cockroach/issues/42061

The constraint-related error gets swallowed, but it does show up in the logs:

W200410 06:08:04.359539 3411 sql/schema_changer.go:673  [n1,job=545352856875892737] error purging mutation: assertion failure
  - error with attached stack trace:
    github.com/cockroachdb/cockroach/pkg/sql/sqlbase.(*MutableTableDescriptor).MakeMutationComplete
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/sqlbase/structured.go:2985
    github.com/cockroachdb/cockroach/pkg/sql.(*SchemaChanger).done.func2
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/schema_changer.go:883
    github.com/cockroachdb/cockroach/pkg/sql.LeaseStore.PublishMultiple.func1
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/lease.go:417
    github.com/cockroachdb/cockroach/pkg/kv.(*DB).Txn.func1
    	/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:702
    github.com/cockroachdb/cockroach/pkg/kv.(*Txn).exec
    	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn.go:783
    github.com/cockroachdb/cockroach/pkg/kv.(*DB).Txn
    	/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:701
    github.com/cockroachdb/cockroach/pkg/sql.LeaseStore.PublishMultiple
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/lease.go:387
    github.com/cockroachdb/cockroach/pkg/sql.(*SchemaChanger).done
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/schema_changer.go:981
    github.com/cockroachdb/cockroach/pkg/sql.(*SchemaChanger).runStateMachineAndBackfill
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/schema_changer.go:1050
    github.com/cockroachdb/cockroach/pkg/sql.(*SchemaChanger).rollbackSchemaChange
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/schema_changer.go:664
    github.com/cockroachdb/cockroach/pkg/sql.(*SchemaChanger).handlePermanentSchemaChangeError
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/schema_changer.go:552
    github.com/cockroachdb/cockroach/pkg/sql.schemaChangeResumer.OnFailOrCancel
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/schema_changer.go:1683
    github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).stepThroughStateMachine
    	/go/src/github.com/cockroachdb/cockroach/pkg/jobs/registry.go:830
    github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).stepThroughStateMachine
    	/go/src/github.com/cockroachdb/cockroach/pkg/jobs/registry.go:794
    github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).resume.func1
    	/go/src/github.com/cockroachdb/cockroach/pkg/jobs/registry.go:899
    github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1
    	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:323
    runtime.goexit
    	/usr/local/go/src/runtime/asm_amd64.s:1357
  - error with embedded safe details: invalid constraint validity state: %d
    -- arg 1: <sqlbase.ConstraintValidity>
  - invalid constraint validity state: 3
while handling error: candidate pg code: 23505
  -
    (opaque error wrapper)
    type name: github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/errors/withstack/*withstack.withStack
    reportable 0:

    github.com/cockroachdb/cockroach/pkg/sql/row.NewUniquenessConstraintViolationError
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/row/errors.go:140
    github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*indexBackfiller).wrapDupError
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/indexbackfiller.go:135
    github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*indexBackfiller).flush
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/indexbackfiller.go:113
    github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*backfiller).mainLoop
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/backfiller.go:226
    github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*backfiller).doRun
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/backfiller.go:135
    github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*backfiller).Run
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/backfiller.go:120
    github.com/cockroachdb/cockroach/pkg/sql/flowinfra.(*FlowBase).Run
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/flowinfra/flow.go:370
    github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).Run
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:378
    github.com/cockroachdb/cockroach/pkg/sql.(*SchemaChanger).distBackfill.func4
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/backfill.go:870
    github.com/cockroachdb/cockroach/pkg/kv.(*DB).Txn.func1
    	/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:702
    github.com/cockroachdb/cockroach/pkg/kv.(*Txn).exec
    	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn.go:783
    github.com/cockroachdb/cockroach/pkg/kv.(*DB).Txn
    	/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:701
    github.com/cockroachdb/cockroach/pkg/sql.(*SchemaChanger).distBackfill
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/backfill.go:781
    github.com/cockroachdb/cockroach/pkg/sql.(*SchemaChanger).backfillIndexes
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/backfill.go:1266
    github.com/cockroachdb/cockroach/pkg/sql.(*SchemaChanger).runBackfill
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/backfill.go:260
    github.com/cockroachdb/cockroach/pkg/sql.(*SchemaChanger).runStateMachineAndBackfill
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/schema_changer.go:1045
    github.com/cockroachdb/cockroach/pkg/sql.(*SchemaChanger).exec
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/schema_changer.go:519
    github.com/cockroachdb/cockroach/pkg/sql.schemaChangeResumer.Resume.func1
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/schema_changer.go:1585
    github.com/cockroachdb/cockroach/pkg/sql.schemaChangeResumer.Resume
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/schema_changer.go:1639
    github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).stepThroughStateMachine
    	/go/src/github.com/cockroachdb/cockroach/pkg/jobs/registry.go:772
    github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).resume.func1
    	/go/src/github.com/cockroachdb/cockroach/pkg/jobs/registry.go:899
    github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1
    	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:323
    runtime.goexit
    	/usr/local/go/src/runtime/asm_amd64.s:1357
  - error with embedded safe details: duplicate key value (%s)=(%s) violates unique constraint %q
    -- arg 1: <string>
    -- arg 2: <string>
    -- arg 3: <string>
  - duplicate key value (b)=(1) violates unique constraint "t_b_key"

At this point the constraint still shows up in SHOW CREATE TABLE but isn't enforced for new writes anymore, which is bad. (I think this is because at this point, the constraint only exists on the table descriptor as a mutation.)

root@:26257/defaultdb> show create table t;
  table_name |          create_statement
-------------+--------------------------------------
  t          | CREATE TABLE t (
             |     a INT8 NULL,
             |     b INT8 NULL,
             |     FAMILY "primary" (a, b, rowid),
             |     CONSTRAINT c CHECK (a > 0)
             | )
(1 row)

Time: 107.266ms

root@:26257/defaultdb> insert into t values (-1, -1);
INSERT 1

Time: 3.633ms

root@:26257/defaultdb> select * from t;
  a  | b
-----+-----
   1 |  1
   1 |  1
  -1 | -1
(3 rows)

Time: 902µs

Ultimately, I'm pretty sure that when we reverse the mutations during a rollback, we're not correctly changing the Validity of the constraint mutation from Dropping to Validating, analogously to how we change the mutation direction from DROP to ADD. We haven't caught this because we don't actually have a test for rolling back dropping a constraint, per se. The one good way we have of simulating this in a logic test is the approach above, where some other schema change in the transaction fails, but then that failure is the error that gets returned to the client. But It could become a schema changer test now that we have more testing knobs to use.

In 19.2 the async schema changer would keep trying and failing to do process this permanently stuck mutation. In 20.1 it turns out the job fails and thus will never be retried, which is not the expected behavior and makes me worried about how we handle retries of rollbacks when there are transient errors. This needs more investigation and should probably get its own issue.

This turned up when I was stressing a branch with some fixes to the schema change job mutation for 20.1, hit a Leaked goroutine error in TestMigrateSchemaChanges, and started looking at the logs. I actually doubt that error is related to this bug, but it turns out that the schema change job migration, in its DROP CONSTRAINT subtest, hits this bug every single time.

Jira issue: CRDB-5018

@thoszhang thoszhang self-assigned this Apr 10, 2020
@thoszhang thoszhang added this to Triage in SQL Foundations via automation Apr 10, 2020
@thoszhang thoszhang moved this from Triage to 20.2 Backlog in SQL Foundations Apr 27, 2020
@thoszhang thoszhang moved this from 20.2 Backlog to Triage in SQL Foundations Apr 27, 2020
@thoszhang thoszhang added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label May 5, 2020
@thoszhang thoszhang moved this from Needs Triage (4/28/2020) to Backlog in SQL Foundations May 5, 2020
@thoszhang thoszhang added the S-3-erroneous-edge-case Database produces or stores erroneous data without visible error/warning, in rare edge cases. label May 5, 2020
@thoszhang thoszhang moved this from Backlog to 20.2 stability period: investigations in SQL Foundations Aug 27, 2020
@thoszhang thoszhang assigned ajwerner and unassigned thoszhang Sep 29, 2020
@thoszhang thoszhang moved this from 20.2 stability period: investigations to Backlog in SQL Foundations Oct 13, 2020
@jlinder jlinder added the T-sql-schema-deprecated Use T-sql-foundations instead label Jun 16, 2021
@postamar
Copy link
Contributor

postamar commented Mar 8, 2022

This is sadly still a thing. I'm hoping the declarative schema changer will make these irrelevant.

@postamar postamar moved this from Backlog to Declarative Schema Changer Graveyard in SQL Foundations Mar 8, 2022
@exalate-issue-sync exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-schema-changes C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-erroneous-edge-case Database produces or stores erroneous data without visible error/warning, in rare edge cases. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
SQL Foundations
  
Declarative Schema Changer Will Fix T...
Development

No branches or pull requests

4 participants