Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL] ON CONFLICT (k) DO UPDATE can still throw conflict errors #1974

Closed
aphyr opened this issue Aug 6, 2019 · 3 comments

Comments

@aphyr
Copy link

commented Aug 6, 2019

With version 1.3.1.0, using Jepsen ed8ecfb9146816cb99e458c9cb92f59e19a7b78f, the append test uses insert ... on conflict (k) do update ... to insert or update records in place. This table has two columns: a primary integer key k, and a text v.

           :append
           (do (c/execute! conn
                           [(str "insert into " table " (k, v) values (?, ?) "
                                 "on conflict (k) do update set "
                                 "v = CONCAT(" table ".v, ',', ?)")
                            k (str v) (str v)])

You would think that when there exists a row with some key k, this would update the associated v using CONCAT, and indeed it does--some of the time. Other times, however, it throws:

ERROR: duplicate key value violates unique constraint "append1_pkey"

See, for instance, 20190806T095424.000-0400.zip

This happens whether we use on conflict (k) or explicitly specify the index viaon conflict on constraint append1_pkey", and it happens regardless of whether k is a primary key, or simply a unique column. You can reproduce it with any append test, e.g.

lein run test --os debian --version 1.3.1.0 --time-limit 300 -w ysql/append --concurrency 5n

@frozenspider frozenspider changed the title `on conflict (k) do update` can still throw conflict errors [YSQL] ON CONFLICT (k) DO UPDATE can still throw conflict errors Aug 6, 2019

@frozenspider frozenspider added this to To do in Jepsen Testing via automation Aug 6, 2019

@frozenspider frozenspider added this to To do in SQL Support via automation Aug 6, 2019

@aphyr

This comment has been minimized.

Copy link
Author

commented Aug 6, 2019

This might be related to #1611!

@frozenspider

This comment has been minimized.

Copy link
Contributor

commented Aug 7, 2019

Re-tested this under @nocaway 's recent partial fix for #1611 (commit ca3ebdb), this bug is still reproducible.
Logs: 20190807T154851.000Z.zip

@frozenspider

This comment has been minimized.

Copy link
Contributor

commented Aug 9, 2019

@nocaway discovered this bug to be unrelated to #1611
From his description:

The errors occurred whenever one or all of the following conditions occurred.

  • Client was disconnected and reconnected a few times (Server crashes?)
  • Transaction layer sends out two kinds of error messages
  • Error log also report transaction error.

ERROR log has

E0806 14:00:04.040017 47479 transaction_participant.cc:791] T b08b3de1c71a460088dc3f13408e796c P 271c085f0ca84a0d9ff3cbaa4c593f18: Transaction should not have metadata but present: b82c901b-a719-40c1-8119-a2cbab21a465

INFO log has transaction info

2019-08-06 13:57:20.058 UTC [3428] ERROR:  Operation failed. Try again.: Conflicts with higher priority transaction: 51539a97-818b-4369-8a06-9146b8915009
W0806 13:58:35.234104 16176 transaction.cc:747] 97ee01e9-6b88-4eb4-86b8-384b2a7896b9: Send heartbeat failed: Operation expired (yb/tablet/transaction_coordinator.cc:794): Transaction expired

@nocaway nocaway assigned spolitov and unassigned nocaway Aug 9, 2019

@bmatican bmatican added the area/docdb label Aug 13, 2019

@bmatican bmatican moved this from To do to In progress in Jepsen Testing Aug 14, 2019

spolitov added a commit that referenced this issue Aug 19, 2019
[#1974]: Propagate intents apply failure to client
Summary:
The transaction (T1) could be aborted while a transaction (T2) with higher priority was writing intents.
In this a case, T1 becomes invalid and could read invalid values, for instance, it could read values from T2,
and decide that a duplicate key value violates the unique constraint, which would not happen with non-aborted transaction.

From the point of view of consistency, there is no problem, because such aborted transaction would not be able to commit.

But it could provide an incorrect error message to the user, saying that constraint is violated instead of "Transaction aborted"

This diff addresses the issue by adding checks for aborted state, while the transaction is writing intents.

Provided fix should not affect performance, since all checks are done locally, i.e. without RPC calls.
Also added caching and preliminary rejection of the operation, so in the high contention scenario performance should only increase.

Test Plan: ybd --gtest_filter PgLibPqTest.OnConflict

Reviewers: alex, timur, mikhail

Reviewed By: timur

Subscribers: ybase, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D7063

@spolitov spolitov closed this Aug 19, 2019

Jepsen Testing automation moved this from In progress to Done Aug 19, 2019

SQL Support automation moved this from To do to Done Aug 19, 2019

spolitov added a commit that referenced this issue Aug 20, 2019
[#1974] Remove a superfluous check from WriteOperationCompletionCallback
Summary:
As shown by Jepsen log investigation, the initial version of D7063 ( ee7bc3e ) was enough to fix #1974.
The extra check that was added to WriteOperationCompletionCallback is superfluous and also contains a bug for the case where this operation is the first operation of the transaction on this tablet. In such a case, this check always decides that the transaction has been aborted,
while (obviously) some transactions don't get aborted. The reason the call to TransactionParticipant::CheckAborted returns an aborted
status for the first operation of a transaction on a particular tablet, in this case, is that the transaction metadata should be written by the operation
that ended up failing. Just to clarify, if an operation fails, we don't write metadata for this transaction on the transaction participant.
And the failure of the operation is what initiates this check that eventually calls CheckAborted.

Just as a reminder, absent metadata for a particular transaction id in TransactionParticipant is indistinguishable from the situation when this transaction that has been aborted and cleaned up.

This revision removes the extra check.

Test Plan: ybd --gtest_filter PgLibPqTest.OnConflict -n 100

Reviewers: alex, neha, mikhail

Reviewed By: mikhail

Subscribers: ybase, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D7090
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
6 participants
You can’t perform that action at this time.