Distributed deadlock after issuing a simple UPDATE statement #7477
Comments
Do you have a log containing the "deadlock detected" error too? As far as I know, that's an error coming from Postgres itself, which would indicate that it's actually not just a distributed deadlock, but a deadlock on the node itself (which we sometimes detect as a distributed deadlock, if our distributed deadlock detection triggers faster than Postgres's local deadlock detection). It's definitely possible for a single UPDATE statement to cause deadlocks on one node, due to locking rows in different orders: https://stackoverflow.com/a/10246052/2570866 But it indeed sounds like that wouldn't apply to the situation you're describing. If I understand correctly, you are not updating multiple rows with one statement? |
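As a hypothetical illustration of the scenario in that Stack Overflow answer (table t and column v are made up for this sketch), two concurrent multi-row UPDATEs can take their row locks in opposite orders:

-- Session A: may lock row 1 first, then row 2
UPDATE t SET v = v + 1 WHERE id IN (1, 2);
-- Session B, concurrently: may visit the rows in the opposite order,
-- locking row 2 first, then row 1
UPDATE t SET v = v + 1 WHERE id IN (1, 2);
-- If each session holds one row lock and waits for the other's,
-- Postgres aborts one of them with "deadlock detected".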
In general, to get to the bottom of these issues it's important to know which exact queries deadlock on each other and why. One way to do that is by disabling the deadlock detection and looking at the queries that block each other, but that is likely a bad idea on a production system. If you can somehow reproduce the issue in a testing environment, that would definitely be useful. |
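One sketch of such an inspection query, assuming PostgreSQL 9.6+ for pg_blocking_pids(), run on the node where the sessions hang:

-- For every blocked backend, show which backend blocks it and both queries
SELECT blocked.pid    AS blocked_pid,
       blocked.query  AS blocked_query,
       blocking.pid   AS blocking_pid,
       blocking.query AS blocking_query
FROM pg_stat_activity AS blocked
CROSS JOIN LATERAL unnest(pg_blocking_pids(blocked.pid)) AS b(pid)
JOIN pg_stat_activity AS blocking ON blocking.pid = b.pid;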
I can confirm that we are indeed updating a single row with a single statement. This is the log from worker 1 containing "deadlock detected", which mentions this occurring on worker 2 (log below):
Worker 2 log:
As for disabling deadlock detection, it should be possible to do on the current cluster, but we will have to schedule a time window for that, which is likely to take a few days. |
Hmm, okay, the exclusive lock on the advisory lock suggests that this is related to the way we serialize updates to reference tables. @agedemenli can you take a closer look? |
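To inspect those advisory locks on a worker, a minimal pg_locks query (standard Postgres, nothing Citus-specific assumed) could look like:

-- Show advisory locks and whether they are granted or still being waited on
SELECT pid, locktype, classid, objid, mode, granted
FROM pg_locks
WHERE locktype = 'advisory'
ORDER BY granted DESC, pid;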
create table table2 (id int primary key);
select create_reference_table('table2');
alter table table2 add constraint self_fkey foreign key (id) references table2(id);
insert into table2 values (1);
create table table3 (id int primary key);
SELECT create_reference_table('table3');
insert into table3 values (1);
CREATE TABLE public.table1 (
id int8 NOT NULL GENERATED ALWAYS AS IDENTITY,
column1 int2 NOT NULL,
column2 int2 NOT NULL,
offending_column timestamptz NULL,
column3 uuid NULL,
column4 uuid NULL,
column5 int2 NOT NULL DEFAULT 0,
column6 int8 NOT NULL,
column7 int8 NULL,
column8 int2 NOT NULL DEFAULT 0,
column9 text NOT NULL DEFAULT '',
column10 int8 NOT NULL,
CONSTRAINT table1_pk PRIMARY KEY (id),
CONSTRAINT table1_table2_fk FOREIGN KEY (column6) REFERENCES public.table2(id),
CONSTRAINT table1_table3_fk FOREIGN KEY (column10) REFERENCES public.table3(id)
);
SELECT create_reference_table('public.table1');
insert into table1 values (default,1,1,null,'1740a945-d9a1-4bf9-ba48-50620b5109ed','1740a945-d9a1-4bf9-ba48-50620b5109ed',1,1,1,1,'str',1);
I'll run this insert statement frequently for a while to see if I can reproduce it. |
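A sketch of how that could be driven under concurrency (the file name, host, and port are assumptions): save the INSERT above into insert.sql and replay it from several clients with pgbench:

# -n skips pgbench's own table setup; 10 clients hammer the insert for 60 seconds
pgbench -n -f insert.sql -c 10 -j 2 -T 60 'host=localhost port=5432 dbname=postgres'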
After disabling deadlock detection, we can observe that there are indeed several transactions that are completely stuck, all of them on the same exact query. Below is the output from querying the locks: |
Thanks for the lock information! I still can't reproduce the issue, and the best way to debug it is actually to first reproduce it. Any detail that helps with the reproduction would be appreciated. Could you please take a look at the schema snippet I shared above, just in case you have an improvement suggestion to make it work? |
Unfortunately I don't have more that could help with reproduction right now, as I am unable to reproduce this on another cluster with the same schema either. But I have looked into the locks that are leading to the deadlock, and one thing that seems very suspicious is the exclusive advisory lock that each stuck transaction is waiting on. Assuming I understand these types of locks correctly, the simple UPDATE should not behave this way. Could you confirm that this is NOT the intended locking behavior? |
It would be possible to grant access to the faulty cluster temporarily if it would help with the investigation of this issue. Let me know if this is something worth doing, or if local reproduction is needed to continue. |
Okay, we were able to reproduce locally and find a minimal reproducing example.

How to workaround

Don't run this problematic update query through your loadbalancer, but only through a single node, e.g. the coordinator.

How to reproduce

Create two files.

issue-7477.sql:

create table table2 (id int primary key);
select create_reference_table('table2');
insert into table2 values (1);
CREATE TABLE public.table1 (
id int8 NOT NULL,
offending_column timestamptz NULL,
column6 int8 NOT NULL,
CONSTRAINT table1_table2_fk FOREIGN KEY (column6) REFERENCES public.table2(id)
);
SELECT create_reference_table('public.table1');
insert into table1 values (1,null,1);

update.sql:

UPDATE table1 SET offending_column = '2024-01-31T00:00:00Z' where id = 1;

Then run:

./citus_dev make test --destroy --init-with issue-7477.sql
pgbench -f update.sql -c 10 -j 2 -T 200 -P 1 'host=localhost,localhost port=9701,9702 load_balance_hosts=random'

Worker logs are then full of deadlock issues. |
That is great to hear, good job! We will verify whether the workaround indeed works for us and will await a future version with the fix. Thank you for the cooperation. |
I can confirm that we have not experienced any more deadlocks since applying the workaround. Thank you. |
The order in which the locks are acquired depends on the worker initiating the UPDATE statement (first worker or others), which causes the deadlock. |
This PR changes the order in which the locks are acquired (for the target and reference tables) when a modify request is initiated from a worker node that is not the "FirstWorkerNode".

To prevent concurrent writes, locks are acquired on the first worker node for the replicated tables. When the update statement originates from the first worker node, it acquires the lock on the reference table(s) first, followed by the target table(s). However, if the update statement is initiated on another worker node, the lock requests are sent to the first worker in a different order. This PR unifies the modification order on the first worker node: with the third commit, independent of the node that received the request, the locks are acquired for the modified table and then the reference tables on the first node. The first commit shows a sample output for the test prior to the fix.

Fixes #7477

Co-authored-by: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
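The principle behind the fix is a single global lock order. As a hypothetical illustration using plain table locks (Citus actually serializes these writes differently, via advisory locks), every writer must take its locks in the same sequence:

-- Every session follows the same order: target table first, then reference tables
BEGIN;
LOCK TABLE public.table1 IN ROW EXCLUSIVE MODE; -- modified (target) table
LOCK TABLE public.table2 IN ROW EXCLUSIVE MODE; -- referenced reference table
-- ... perform the UPDATE ...
COMMIT;
-- If one session locked table2 before table1 while another did the reverse,
-- each could wait on the other forever: exactly the cycle this PR removes.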
(cherry picked from commit 8afa2d0)
Hello, we have been running into a puzzling situation with distributed deadlocks, and after confirmation from multiple sources that this shouldn't be occurring in our case, I have decided to file this bug report.
The situation currently is that a fairly simple query can sporadically trigger deadlocks on one of our tables, resulting in two possible error messages: "canceling the transaction since it was involved in a distributed deadlock" and "deadlock detected".

The problem first surfaced when issuing an UPDATE (see below for details) in an explicit transaction, followed by a SELECT that reads from a different table. But after trying to work around this issue, it was found out that this can be triggered even by using a single UPDATE statement in an implicit transaction, and also in a transaction after setting SET LOCAL citus.multi_shard_modify_mode TO 'sequential';.

We managed to trigger this by using different clients as well (our application, which uses libpq under the hood, and DBeaver).
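For reference, the sequential-mode variant that still deadlocked looked roughly like this (the id literal is a placeholder):

BEGIN;
SET LOCAL citus.multi_shard_modify_mode TO 'sequential';
UPDATE table1 SET offending_column = '2024-01-31T00:00:00Z' WHERE id = 1; -- placeholder id
COMMIT;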
Citus: 12.1.1 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
PostgreSQL: 16.1 (installed as 16.1, not upgraded from 16.0) + PostGIS 3.4.1
OS: Ubuntu 22.04.3 LTS
Installed packages:
Setup:
4 servers:
- 1 coordinator
- 2 workers (each in separate worker group)
- 1 HAProxy (used to load-balance over the two workers; we are using the query-from-any-node feature for our workload; a minimal config sketch follows after this list)
This is more or less an initial testing cluster, so standby replicas are currently not involved
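A minimal sketch of such an HAProxy setup (listen address, server names, and IPs are made up; this is not our actual config):

# Round-robin TCP load balancing across the two Citus workers
listen citus_workers
    bind *:5432
    mode tcp
    balance roundrobin
    server worker1 10.0.0.11:5432 check
    server worker2 10.0.0.12:5432 check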
Note that names of tables/columns/database/users have been adjusted, but originally do NOT contain anything that should cause issues (like very long strings).
The query in our normal workload is run in parallel periodically for each row (so each row is updated roughly every 5 seconds), but all of them are scoped down by providing the primary key of the row, and the updates for a given PK are never run at the same time. Currently the table is very small (20 rows), and we managed to reproduce this even by calling the query manually, without any other workload touching that table.
Query triggering the deadlock:
UPDATE table1 SET offending_column = '2024-01-31T00:00:00Z' where id = ANY-VALID-ID;
Schema of the offending table:
PostgreSQL log section with deadlocks occurring:
We are not experiencing any issues with other tables (60+ tables, both reference and distributed) and everything else in the cluster seems to work correctly.
I understand that the info provided is very likely not enough to find the issue and would appreciate guidance on collecting more relevant data. It has been recommended to use pg-lock-tracer, so we will attempt to use that to hopefully gain more insight.