With version 18.104.22.168, using Jepsen ed8ecfb9146816cb99e458c9cb92f59e19a7b78f, the append test uses insert ... on conflict (k) do update ... to insert or update records in place. This table has two columns: a primary integer key k, and a text v.
(do (c/execute! conn
      [(str "insert into " table " (k, v) values (?, ?) "
            "on conflict (k) do update set "
            "v = CONCAT(" table ".v, ',', ?)")
       k (str v) (str v)])
You would think that when there exists a row with some key k, this would update the associated v using CONCAT, and indeed it does--some of the time. Other times, however, it throws:
ERROR: duplicate key value violates unique constraint "append1_pkey"
This happens whether we use on conflict (k) or explicitly specify the index via on conflict on constraint append1_pkey, and it happens regardless of whether k is a primary key or simply a unique column. You can reproduce it with any append test, e.g.
lein run test --os debian --version 22.214.171.124 --time-limit 300 -w ysql/append --concurrency 5n
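For reference, the behavior the test expects from the upsert can be sketched against SQLite, which ships with Python. This is only an illustration of the intended semantics, not a YugabyteDB reproduction: it assumes SQLite 3.24 or newer (for on conflict ... do update), uses || in place of CONCAT, and the append helper is a hypothetical stand-in for the Clojure call above.

```python
import sqlite3


def append(conn, table, k, v):
    """Insert (k, v), or append ',v' to the existing value for k."""
    # Same statement shape as the test; SQLite spells CONCAT as `||`.
    conn.execute(
        f"insert into {table} (k, v) values (?, ?) "
        "on conflict (k) do update set v = v || ',' || ?",
        (k, str(v), str(v)),
    )


conn = sqlite3.connect(":memory:")
conn.execute("create table append1 (k integer primary key, v text)")

# Three appends to the same key: the first inserts, the rest update in place.
for v in (1, 2, 3):
    append(conn, "append1", 5, v)

row = conn.execute("select v from append1 where k = 5").fetchone()
print(row[0])  # 1,2,3
```

With a correct upsert, no append to an existing key should ever surface a duplicate key error; the conflict is supposed to be absorbed by the do update branch.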
@nocaway discovered this bug to be unrelated to #1611
From his description:
The errors occurred whenever one or more of the following conditions held:
The client was disconnected and reconnected a few times (server crashes?).
The transaction layer sent out two kinds of error messages.
The error log also reported a transaction error.
The ERROR log has:
E0806 14:00:04.040017 47479 transaction_participant.cc:791] T b08b3de1c71a460088dc3f13408e796c P 271c085f0ca84a0d9ff3cbaa4c593f18: Transaction should not have metadata but present: b82c901b-a719-40c1-8119-a2cbab21a465
A transaction (T1) could be aborted while a transaction (T2) with higher priority was writing intents.
In this case, T1 becomes invalid and could read invalid values. For instance, it could read values from T2
and decide that a duplicate key value violates the unique constraint, which would not happen with a non-aborted transaction.
From the point of view of consistency, there is no problem, because such an aborted transaction would not be able to commit.
But it could provide an incorrect error message to the user, saying that a constraint is violated instead of "Transaction aborted".
This diff addresses the issue by adding checks for aborted state, while the transaction is writing intents.
The provided fix should not affect performance, since all checks are done locally, i.e. without RPC calls.
It also adds caching and preliminary rejection of the operation, so performance in the high contention scenario should only improve.
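The scenario above can be illustrated with a toy model. This is a minimal sketch and not the actual YugabyteDB code: Txn, Tablet, insert, and the check_aborted flag are all hypothetical names, and the model collapses priorities, intents, and conflict resolution into a few lines. It only shows why consulting the local aborted state first changes which error the client sees.

```python
class TransactionAborted(Exception):
    pass


class DuplicateKey(Exception):
    pass


class Txn:
    def __init__(self, name):
        self.name = name
        self.aborted = False


class Tablet:
    def __init__(self):
        self.intents = {}  # key -> Txn holding a provisional write


def insert(tablet, txn, key, check_aborted):
    """Insert `key` for `txn`; fail if another transaction's write is visible."""
    holder = tablet.intents.get(key)
    if holder is not None and holder is not txn:
        # Modelled fix: look at local aborted state before trusting the read.
        if check_aborted and txn.aborted:
            raise TransactionAborted(txn.name)
        raise DuplicateKey(key)
    tablet.intents[key] = txn


def error_seen_by_t1(check_aborted):
    tablet = Tablet()
    t1, t2 = Txn("T1"), Txn("T2")
    # T2 has higher priority: it aborts T1 and its intent on key 5 is visible.
    t1.aborted = True
    tablet.intents[5] = t2
    try:
        insert(tablet, t1, 5, check_aborted)
    except (TransactionAborted, DuplicateKey) as e:
        return type(e).__name__
    return None


print(error_seen_by_t1(check_aborted=False))  # DuplicateKey
print(error_seen_by_t1(check_aborted=True))   # TransactionAborted
```

In both runs T1 fails and cannot commit, so consistency is unaffected; only the reported error differs.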
Test Plan: ybd --gtest_filter PgLibPqTest.OnConflict
Reviewers: alex, timur, mikhail
Reviewed By: timur
Subscribers: ybase, bogdan
Differential Revision: https://phabricator.dev.yugabyte.com/D7063
As shown by Jepsen log investigation, the initial version of D7063 (ee7bc3e) was enough to fix #1974.
The extra check that was added to WriteOperationCompletionCallback is superfluous and also contains a bug for the case where this operation is the first operation of the transaction on this tablet. In such a case, this check always decides that the transaction has been aborted,
while (obviously) some transactions don't get aborted. The reason the call to TransactionParticipant::CheckAborted returns an aborted
status for the first operation of a transaction on a particular tablet, in this case, is that the transaction metadata should be written by the operation
that ended up failing. Just to clarify, if an operation fails, we don't write metadata for this transaction on the transaction participant.
And the failure of the operation is what initiates this check that eventually calls CheckAborted.
Just as a reminder, absent metadata for a particular transaction id in TransactionParticipant is indistinguishable from the situation where this transaction has been aborted and cleaned up.
This revision removes the extra check.
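The false positive described above can be reproduced in a toy model. This is a sketch under stated assumptions, not the real TransactionParticipant: the Participant class and its apply, cleanup_aborted, and check_aborted methods are hypothetical, and the only property carried over from the description is that metadata is written by a successful operation and that missing metadata is read as "aborted".

```python
class Participant:
    """Toy stand-in for a per-tablet transaction participant."""

    def __init__(self):
        self.metadata = set()  # ids of transactions with metadata on this tablet

    def apply(self, txn_id, succeeds):
        # Metadata is only written when the operation succeeds.
        if succeeds:
            self.metadata.add(txn_id)
        return succeeds

    def cleanup_aborted(self, txn_id):
        # An aborted transaction's metadata is eventually removed.
        self.metadata.discard(txn_id)

    def check_aborted(self, txn_id):
        # Absent metadata cannot be told apart from "aborted and cleaned up".
        return txn_id not in self.metadata


p = Participant()

# Transaction "a": its first operation on this tablet fails, so no metadata
# exists yet. It is still live, but the check reports it as aborted.
p.apply("a", succeeds=False)
first_op_failed = p.check_aborted("a")

# Transaction "b": really aborted and cleaned up. Same answer as for "a".
p.apply("b", succeeds=True)
p.cleanup_aborted("b")
really_aborted = p.check_aborted("b")

# Transaction "c": an earlier operation succeeded, so a later failure does
# not trigger the false positive.
p.apply("c", succeeds=True)
p.apply("c", succeeds=False)
later_op_failed = p.check_aborted("c")

print(first_op_failed, really_aborted, later_op_failed)  # True True False
```

Only the first-operation case is misclassified, which matches the description of the bug in the extra check.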
Test Plan: ybd --gtest_filter PgLibPqTest.OnConflict -n 100
Reviewers: alex, neha, mikhail
Reviewed By: mikhail
Subscribers: ybase, bogdan
Differential Revision: https://phabricator.dev.yugabyte.com/D7090