Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: use a CockroachDB-specific error code for schg errs upon COMMIT #37791

Merged
merged 1 commit into from Jun 3, 2019

Conversation

Projects
None yet
4 participants
@knz
Copy link
Member

commented May 24, 2019

This patch changes the error reported upon schema change failures
raised at COMMIT (multi-stmt txns) to a new CockroachDB-specific code
called "transaction committed with schema change failure", XXA00.

Previously, this situation used the extant PostgreSQL error code
"40003", "statement completion unknown", however that was misleading:
the situation is actually pretty clear and the statement is actually
known to have completed. The other situation
where CockroachDB produces "completion unknown" is when KV fails to
receive an acknowledgement for the txn commit.

Compare:

Actual transaction status Does the gateway know? Error code reported to client
EITHER fully committed OR fully aborted no "completion unknown"
BOTH partially committed AND partially aborted yes "transaction committed with schema change failure"

Using a separate error code ensures the situation is more noticeable
and can be documented separately.

After this patch:

root@127.0.0.1:19343/defaultdb> create table t(x int); insert into t(x) values (0);
INSERT 1

root@127.0.0.1:19343/defaultdb> begin;
> alter table t add column z int default 123;
> insert into t(x) values (1);
> alter table t add column y float as (1::float / x::float) stored;
> commit;
pq: transaction committed but schema change aborted with error: division by zero
HINT: Some of the non-DDL statements may have committed successfully,
but some of the DDL statement(s) failed.
Manual inspection may be required to determine the actual state of the database.

(in this specific example, the ALTER failed but the INSERT was
committed. There are two rows in the table.)

Some context for the reader not familiar with this situation (I had to
reverse engineer the code to understand this, so I may as well write
this down here for posterity): in CockroachDB, a processing error
raised during the post-commit processing of schema changes is
"interesting" in that it really violates the "A" in ACID.

At that point:

  • all the non-DDL effects of the txns are successfully committed (and
    cannot be rolled back any more);
  • some the DDL up to the one that triggered the error may be committed
    (author note: I am unsure -- not investigating that);
  • the last DDL that caused the error was aborted.

In effect, the transaction is both committed and aborted at the same
time.

Release note (sql change): CockroachDB now reports a dedicated SQL
error code in the particular situation where a multi-statement
transaction is COMMITTed but one of the schema change (DDL) operations
failed. As in previous versions, CockroachDB leaves the transaction
partly committed and partly aborted in that case; the new error code
aims to facilitate the recognition of this case in client code.

@knz knz requested review from andreimatei and lucy-zhang May 24, 2019

@knz knz requested review from cockroachdb/sql-execution-prs as code owners May 24, 2019

@knz knz added the docs-todo label May 24, 2019

@cockroach-teamcity

This comment has been minimized.

Copy link
Member

commented May 24, 2019

This change is Reviewable

@knz knz force-pushed the knz:20190524-txn-ddl-abort branch 5 times, most recently from 9d4e5a0 to 1436309 May 24, 2019

@andreimatei
Copy link
Member

left a comment

I've told you my opinion on this - I think the "statement completion unknown" code was better. The situation addressed here (a fairly esoteric edge case of having both DDL and non-DDL statements in a txn) doesn't seem worth-while to me for introducing a 2nd (server-side) code that means "your transaction might have been committed even though you're getting an error". Having one such code is probably complicated enough for a user. Particularly since it's us introducing this code, not Postgres, so it seems unreasonable to expect anybody to know about it. If we were ruling the world yet and control some drivers, etc, I'd probably think differently :)
My 2c.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @lucy-zhang)

@andreimatei
Copy link
Member

left a comment

Btw, I don't actually claim to know when Postgres uses the "statement completion unknown" code exactly. I believe I've tried to look into it in the past without much success. I think maybe Postgres doesn't use it at all and perhaps it's just for load-balancers.
This is only tangential to this PR, but if you happen to be interested in digging into that, I'd definitely be interested :)

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @lucy-zhang)

@lucy-zhang
Copy link
Contributor

left a comment

The code and tests look fine. I agree that this error is an entirely different type of error from our other use of "statement completion unknown", but aside from that, I don't have an opinion on whether it's worth it to create a new error code.

@knz I think you know this, but this is context for anyone else reading: The reason why DDL statements behave this way is that for some schema changes, including adding/dropping columns and indexes, most of the work is done asynchronously in a sequence of transactions after the user transaction has committed, even though the gateway node doesn't return results to the client until all the schema changes are finished. When one of these schema changes is "aborted" due to an error, what actually happens is that there are more transactions which undo the effects of the previous ones. (This will not, however, undo the effects of schema changes that were completed entirely in the user transaction, like RENAME COLUMN.)

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @lucy-zhang)

@knz

This comment has been minimized.

Copy link
Member Author

commented Jun 3, 2019

Thank you. I am going to go ahead and merge this (as approved by Lucy).
I disagree with Andrei's assessment that we do not need to introduce a separate code.

Andrei you wrote that a description of the new situation would be your transaction might have been committed even though you're getting an error. That is simply not true. The transaction has certainly committed and the client is getting an error. So some effects of the transactions were not aborted. This is a qualitatively different situation from the other case where the commit status is truly unknown.

@knz

This comment has been minimized.

Copy link
Member Author

commented Jun 3, 2019

bors r+

@knz

This comment has been minimized.

Copy link
Member Author

commented Jun 3, 2019

cc @rmloveland this may need some attention in docs.

sql: use a CockroachDB-specific error code for schg errs upon COMMIT
This patch changes the error reported upon schema change failures
raised at COMMIT (multi-stmt txns) to a new CockroachDB-specific code
called "transaction committed with schema change failure", XXA00.

Previously, this situation used the extant PostgreSQL error code
"40003", "statement completion unknown", however that was misleading:
the situation is actually pretty clear and the statement is actually
known to have completed. The other situation
where CockroachDB produces "completion unknown" is when KV fails to
receive an acknowledgement for the txn commit.

Compare:

| Actual transaction status                      | Does the gateway know? | Error code reported to client                      |
|------------------------------------------------|------------------------|----------------------------------------------------|
| EITHER fully committed OR fully aborted        | no                     | "completion unknown"                               |
| BOTH partially committed AND partially aborted | yes                    | "transaction committed with schema change failure" |

Using a separate error code ensures the situation is more noticeable
and can be documented separately.

After this patch:

```
root@127.0.0.1:19343/defaultdb> create table t(x int); insert into t(x) values (0);
INSERT 1

root@127.0.0.1:19343/defaultdb> begin;
> alter table t add column z int default 123;
> insert into t(x) values (1);
> alter table t add column y float as (1::float / x::float) stored;
> commit;
pq: transaction committed but schema change aborted with error: division by zero
HINT: Some of the non-DDL statements may have committed successfully,
but some of the DDL statement(s) failed.
Manual inspection may be required to determine the actual state of the database.
```

(in this specific example, the ALTER failed but the INSERT was
committed. There are two rows in the table.)

Some context for the reader not familiar with this situation (I had to
reverse engineer the code to understand this, so I may as well write
this down here for posterity): in CockroachDB, a processing error
raised during the post-commit processing of schema changes is
"interesting" in that it really violates the "A" in ACID.

At that point:

- all the non-DDL effects of the txns are successfully committed (and
  cannot be rolled back any more);
- some of the DDL up to the one that triggered the error may be committed
  too (unsure).
- the last DDL that caused the error was aborted.

In effect, the transaction is both committed and aborted at the same
time.

Release note (sql change): CockroachDB now reports a dedicated SQL
error code in the particular situation where a multi-statement
transaction is COMMITTed but one of the schema change (DDL) operations
failed. As in previous versions, CockroachDB leaves the transaction
partly committed and partly aborted in that case; the new error code
aims to facilitate the recognition of this case in client code.

@knz knz force-pushed the knz:20190524-txn-ddl-abort branch from 1436309 to 7987c3a Jun 3, 2019

@knz

This comment has been minimized.

Copy link
Member Author

commented Jun 3, 2019

bors r+

craig bot pushed a commit that referenced this pull request Jun 3, 2019

Merge #37791 #37914
37791: sql: use a CockroachDB-specific error code for schg errs upon COMMIT r=knz a=knz

This patch changes the error reported upon schema change failures
raised at COMMIT (multi-stmt txns) to a new CockroachDB-specific code
called "transaction committed with schema change failure", XXA00.

Previously, this situation used the extant PostgreSQL error code
"40003", "statement completion unknown", however that was misleading:
the situation is actually pretty clear and the statement is actually
known to have completed. The other situation
where CockroachDB produces "completion unknown" is when KV fails to
receive an acknowledgement for the txn commit.

Compare:

| Actual transaction status                      | Does the gateway know? | Error code reported to client                      |
|------------------------------------------------|------------------------|----------------------------------------------------|
| EITHER fully committed OR fully aborted        | no                     | "completion unknown"                               |
| BOTH partially committed AND partially aborted | yes                    | "transaction committed with schema change failure" |

Using a separate error code ensures the situation is more noticeable
and can be documented separately.


After this patch:

```
root@127.0.0.1:19343/defaultdb> create table t(x int); insert into t(x) values (0);
INSERT 1

root@127.0.0.1:19343/defaultdb> begin;
> alter table t add column z int default 123;
> insert into t(x) values (1);
> alter table t add column y float as (1::float / x::float) stored;
> commit;
pq: transaction committed but schema change aborted with error: division by zero
HINT: Some of the non-DDL statements may have committed successfully,
but some of the DDL statement(s) failed.
Manual inspection may be required to determine the actual state of the database.
```

(in this specific example, the ALTER failed but the INSERT was
committed. There are two rows in the table.)

Some context for the reader not familiar with this situation (I had to
reverse engineer the code to understand this, so I may as well write
this down here for posterity): in CockroachDB, a processing error
raised during the post-commit processing of schema changes is
"interesting" in that it really violates the "A" in ACID.

At that point:

- all the non-DDL effects of the txns are successfully committed (and
  cannot be rolled back any more);
- some the DDL up to the one that triggered the error may be committed
  (author note: I am unsure -- not investigating that);
- the last DDL that caused the error was aborted.

In effect, the transaction is both committed and aborted at the same
time.

Release note (sql change): CockroachDB now reports a dedicated SQL
error code in the particular situation where a multi-statement
transaction is COMMITTed but one of the schema change (DDL) operations
failed. As in previous versions, CockroachDB leaves the transaction
partly committed and partly aborted in that case; the new error code
aims to facilitate the recognition of this case in client code.

37914: opt: add PushSelectIntoWindow r=justinj a=justinj

This commit adds a rule to push select filters into a Window expression.
This is valid to do whenever a filter is bound by the partition columns
of the Window expression.

Release note: None

Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
Co-authored-by: Justin Jaffray <justin@cockroachlabs.com>
@craig

This comment has been minimized.

Copy link

commented Jun 3, 2019

Build succeeded

@craig craig bot merged commit 7987c3a into cockroachdb:master Jun 3, 2019

5 checks passed

GitHub CI (Cockroach) TeamCity build finished
Details
bors Build succeeded
Details
license/cla Contributor License Agreement is signed.
Details
security/snyk - Gopkg.lock (tim) No manifest changes detected
security/snyk - pkg/ui/package.json (tim) No manifest changes detected

@knz knz deleted the knz:20190524-txn-ddl-abort branch Jun 3, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.