Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race conditions cause a deadlock #258

Open
DenisRazinkin opened this issue May 21, 2020 · 0 comments
Open

Race conditions cause a deadlock #258

DenisRazinkin opened this issue May 21, 2020 · 0 comments

Comments

@DenisRazinkin
Copy link

DenisRazinkin commented May 21, 2020

OS: debian:9 (stretch), kernel 4.4.195
postgresql server version: 11.5
pglogical version: 2.3.1

Have two nodes, node1 192.168.152.140, node2 192.168.152.141
Table 'connections_stat' contains 1 000 000 rows.
Setup of node2:

select pglogical.create_node( node_name := 'node2',
          dsn := 'host=192.168.152.141 port=5432 dbname=my_db' ;

select pglogical.create_replication_set(
          set_name := 'replication_data',
          replicate_insert := true, replicate_update := true,
          replicate_delete := true, replicate_truncate := true);

select pglogical.replication_set_add_table(
            set_name := 'replication_data', relation := 'connections_stat',
            synchronize_data := true,
            columns := null,
            row_filter := null);

Setup of node1:

select  pglogical.create_node(
        node_name := 'node1',
        dsn := 'host=192.168.152.140 port=5432 dbname=my_db'
     );

Then open two terminals on node1 and run infinite bash loops.
Terminal 1:
while true; do date; /opt/itcs/bin/psql -d my_db -c "select pglogical.drop_subscription( 'node1_node2_data');"; done
Terminal 2:
while true; do date; /opt/itcs/bin/psql -d my_db -c "select pglogical.create_subscription( subscription_name := 'node1_node2_data', replication_sets := array['replication_data'], synchronize_data :=true, provider_dsn := 'host=192.168.152.141 port=5432 dbname=my_db');"; done;

When terminals are blocked run query to view blocked transactions:

select a.application_name, a.query, a.pid, l.granted, l.mode, transactionid
from pg_locks l
join pg_stat_activity a on a.pid = l.pid
where locktype = 'transactionid';
application_name			query								pid	granted	mode		transactionid
-----------------------------------------------------------------------------------------------------------------------------------------------------
psql					select pglogical.drop_subscription( 'node1_node2_data');	22761	true	ExclusiveLock	43894
pglogical apply 16384:2247520895	""								22763	false	ShareLock	43894
pglogical apply 16384:2247520895	""								22763	true	ExclusiveLock	43896

Backtrace for locked processes.
Pid 22763, apply worker:

#0  0x00007f8385dbf2e3 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1  0x000055ddc98eea48 in WaitEventSetWaitBlock (nevents=1, occurred_events=0x7ffce76d1850, cur_timeout=-1, set=0x55ddcaf267e8) at latch.c:1048
#2  WaitEventSetWait (set=set@entry=0x55ddcaf267e8, timeout=timeout@entry=-1, occurred_events=occurred_events@entry=0x7ffce76d1850, nevents=nevents@entry=1, wait_event_info=wait_event_info@entry=50331652) at latch.c:1000
#3  0x000055ddc98eee85 in WaitLatchOrSocket (latch=0x7f83842d4c94, wakeEvents=wakeEvents@entry=1, sock=sock@entry=-1, timeout=-1, timeout@entry=0, wait_event_info=50331652) at latch.c:385
#4  0x000055ddc98eef30 in WaitLatch (latch=<optimized out>, wakeEvents=wakeEvents@entry=1, timeout=timeout@entry=0, wait_event_info=<optimized out>) at latch.c:339
#5  0x000055ddc98fde4e in ProcSleep (locallock=locallock@entry=0x55ddcae73860, lockMethodTable=lockMethodTable@entry=0x55ddc9e86e80 <default_lockmethod>) at proc.c:1272
#6  0x000055ddc98f8c22 in WaitOnLock (locallock=locallock@entry=0x55ddcae73860, owner=owner@entry=0x55ddcae51fd0) at lock.c:1744
#7  0x000055ddc98f9efb in LockAcquireExtended (locktag=locktag@entry=0x7ffce76d1bb0, lockmode=lockmode@entry=5, sessionLock=sessionLock@entry=false, dontWait=dontWait@entry=false, reportMemoryError=reportMemoryError@entry=true, 
    locallockp=locallockp@entry=0x0) at lock.c:1026
#8  0x000055ddc98fa5b4 in LockAcquire (locktag=locktag@entry=0x7ffce76d1bb0, lockmode=lockmode@entry=5, sessionLock=sessionLock@entry=false, dontWait=dontWait@entry=false) at lock.c:689
#9  0x000055ddc98f82e1 in XactLockTableWait (xid=xid@entry=43894, rel=rel@entry=0x7f8386af2d88, ctid=ctid@entry=0x7ffce76d1d14, oper=oper@entry=XLTW_Update) at lmgr.c:610
#10 0x000055ddc968f458 in heap_update (relation=relation@entry=0x7f8386af2d88, otid=otid@entry=0x55ddcaf25fcc, newtup=newtup@entry=0x55ddcaf26488, cid=<optimized out>, crosscheck=crosscheck@entry=0x0, wait=wait@entry=true, 
    hufd=0x7ffce76d1fe0, lockmode=0x7ffce76d1fdc) at heapam.c:3874
#11 0x000055ddc9690c60 in simple_heap_update (relation=relation@entry=0x7f8386af2d88, otid=otid@entry=0x55ddcaf25fcc, tup=tup@entry=0x55ddcaf26488) at heapam.c:4614
#12 0x000055ddc96f0949 in CatalogTupleUpdate (heapRel=0x7f8386af2d88, otid=0x55ddcaf25fcc, tup=0x55ddcaf26488) at indexing.c:234
#13 0x00007f838564cc0e in set_subscription_sync_status (subid=2247520895, status=status@entry=117 'u') at pglogical_sync.c:1390
#14 0x00007f838564d78b in pglogical_sync_subscription (sub=<optimized out>) at pglogical_sync.c:904
#15 0x00007f838563ea43 in pglogical_apply_main (main_arg=<optimized out>) at pglogical_apply.c:1930
#16 0x000055ddc9894af9 in StartBackgroundWorker () at bgworker.c:834
#17 0x000055ddc98a0835 in do_start_bgworker (rw=0x55ddcae7a010) at postmaster.c:5714
#18 maybe_start_bgworkers () at postmaster.c:5927
#19 0x000055ddc98a1315 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5088
#20 <signal handler called>
#21 0x00007f8385db75e3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:84
#22 0x000055ddc96518f3 in ServerLoop () at postmaster.c:1671
#23 0x000055ddc98a245f in PostmasterMain (argc=3, argv=0x55ddcae50200) at postmaster.c:1380
#24 0x000055ddc96532bd in main (argc=3, argv=0x55ddcae50200) at main.c:228

Pid 22761, psql:

#0  0x00007f8385dbf2e3 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1  0x000055ddc98eea48 in WaitEventSetWaitBlock (nevents=1, occurred_events=0x7ffce76d2b90, cur_timeout=1000, set=0x55ddcaf2f900) at latch.c:1048
#2  WaitEventSetWait (set=set@entry=0x55ddcaf2f900, timeout=timeout@entry=1000, occurred_events=occurred_events@entry=0x7ffce76d2b90, nevents=nevents@entry=1, wait_event_info=wait_event_info@entry=117440512) at latch.c:1000
#3  0x000055ddc98eee85 in WaitLatchOrSocket (latch=0x7f838427a854, wakeEvents=wakeEvents@entry=25, sock=sock@entry=-1, timeout=timeout@entry=1000, wait_event_info=wait_event_info@entry=117440512) at latch.c:385
#4  0x000055ddc98eef30 in WaitLatch (latch=<optimized out>, wakeEvents=wakeEvents@entry=25, timeout=timeout@entry=1000, wait_event_info=wait_event_info@entry=117440512) at latch.c:339
#5  0x00007f83856475aa in pglogical_drop_subscription (fcinfo=<optimized out>) at pglogical_functions.c:607
#6  0x000055ddc97d21e2 in ExecInterpExpr (state=0x55ddcaf2dd38, econtext=0x55ddcaf2da60, isnull=<optimized out>) at execExprInterp.c:678
#7  0x000055ddc97fa5ab in ExecEvalExprSwitchContext (isNull=0x7ffce76d2d7f, econtext=0x55ddcaf2da60, state=0x55ddcaf2dd38) at ../../../src/include/executor/executor.h:313
#8  ExecProject (projInfo=0x55ddcaf2dd30) at ../../../src/include/executor/executor.h:347
#9  ExecResult (pstate=<optimized out>) at nodeResult.c:136
#10 0x000055ddc97d567a in ExecProcNode (node=0x55ddcaf2d950) at ../../../src/include/executor/executor.h:247
#11 ExecutePlan (execute_once=<optimized out>, dest=0x55ddcaf2c238, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x55ddcaf2d950, 
    estate=0x55ddcaf2d740) at execMain.c:1723
#12 standard_ExecutorRun (queryDesc=0x55ddcae792b0, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:364
#13 0x000055ddc991136c in PortalRunSelect (portal=portal@entry=0x55ddcaeec3d0, forward=forward@entry=true, count=0, count@entry=9223372036854775807, dest=dest@entry=0x55ddcaf2c238) at pquery.c:932
#14 0x000055ddc9912798 in PortalRun (portal=portal@entry=0x55ddcaeec3d0, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x55ddcaf2c238, 
    altdest=altdest@entry=0x55ddcaf2c238, completionTag=0x7ffce76d3000 "") at pquery.c:773
#15 0x000055ddc990e71b in exec_simple_query (query_string=0x55ddcae55580 "select pglogical.drop_subscription( 'node1_node2_data');") at postgres.c:1145
#16 0x000055ddc9910493 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x55ddcae7f488, dbname=<optimized out>, username=<optimized out>) at postgres.c:4182
#17 0x000055ddc96522c7 in BackendRun (port=0x55ddcae77e80) at postmaster.c:4358
#18 BackendStartup (port=0x55ddcae77e80) at postmaster.c:4030
#19 ServerLoop () at postmaster.c:1707
#20 0x000055ddc98a245f in PostmasterMain (argc=3, argv=0x55ddcae50200) at postmaster.c:1380
#21 0x000055ddc96532bd in main (argc=3, argv=0x55ddcae50200) at main.c:228
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant