MDEV-24966 Galera multi-master regression
After the merge of MDEV-24915, the 10.6 branch has regressions in handling
concurrent write load against two or more cluster nodes. These regressions may
surface as a cluster hang, node crashes or data inconsistency. In some test
scenarios, the only visible symptom may be that BF victim aborting happens
only through InnoDB lock wait timeout expiration. This results only in poor
performance (by default a 50 second hang for each BF conflict) and can be
somewhat difficult to diagnose.

This pull request contains the following fixes to handle concurrent write load
from multiple nodes:

In lock_wait_wsrep_kill(), the victim trx was expected to be only in
TRX_STATE_ACTIVE state. With the delayed BF conflict handling, the victim may
already have advanced into the pre-commit state. This is fixed by accepting
victims in both TRX_STATE_ACTIVE and TRX_STATE_PREPARED states.
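
The relaxed state check can be illustrated with a small standalone sketch. The
enum and the function is_bf_victim_candidate() are hypothetical stand-ins and
not the InnoDB definitions; only the two state names come from the text above.

```cpp
#include <cassert>

// Hypothetical stand-in for the transaction states; the real InnoDB enum
// has its own definition and is not reproduced here.
enum class TrxState { NotStarted, Active, Prepared, CommittedInMemory };

// Sketch of the relaxed check: a victim is eligible for BF abort when it is
// active or has already advanced to the prepared (pre-commit) state.
bool is_bf_victim_candidate(TrxState state) {
    return state == TrxState::Active || state == TrxState::Prepared;
}
```

With the old expectation, only TrxState::Active would have qualified, so a
victim that reached the prepared state during the delay would have been missed.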

The victim transaction may be in several different states at the time a lock
conflict is detected, and because MDEV-24915 delays BF aborting, the victim
may advance further before the actual BF abort takes place. The BF aborting
in MDEV-24915 did not wake the victim if it was waiting for some other lock
than the one blocking the high priority thread. This anomaly caused the
InnoDB lock wait timeout expiration delays and the poor performance symptom.
To fix this, lock_wait_wsrep_kill() now checks whether the victim is in lock
waiting state, and uses lock_cancel_waiting_and_release() to cancel this
lock wait.
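
A minimal model of this wake-up step, under simplifying assumptions: the Trx
struct, cancel_lock_wait() and kill_victim() below are hypothetical names that
only mimic the roles described above, and no real locking is involved.

```cpp
#include <cassert>

// Hypothetical, simplified transaction model; not the InnoDB trx_t.
struct Trx {
    bool waiting_for_lock = false;  // true while blocked on some lock
    bool wait_cancelled = false;    // set once the lock wait was cancelled
};

// Plays the role that lock_cancel_waiting_and_release() has in the text:
// end the victim's lock wait so that it wakes up.
void cancel_lock_wait(Trx& victim) {
    victim.waiting_for_lock = false;
    victim.wait_cancelled = true;
}

// Sketch of the kill step: if the victim is blocked on any lock, cancel that
// wait so it notices the abort right away instead of at timeout expiration.
// Returns true when a wait was cancelled.
bool kill_victim(Trx& victim) {
    if (!victim.waiting_for_lock) {
        return false;
    }
    cancel_lock_wait(victim);
    return true;
}
```

The point of the sketch is that the cancel happens for any lock wait of the
victim, not only for a wait on the lock that blocks the high priority thread.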

wsrep_bf_abort() checks whether the victim has an active transaction (in
wsrep-lib), and starts a new transaction if there was no active transaction
before. Due to late BF aborting, the victim may e.g. have failed in
certification and be already aborting or aborted at this stage. This has
caused problems in testing, where the BF aborter tries to BF abort itself.
The fix in wsrep_bf_abort() now skips the BF abort if the victim is aborting
or has aborted. The victim may also not have started a transaction yet in
wsrep context, but may have acquired MDL locks (due to DDL execution), and
this has caused the BF conflict. Such a case does not require aborting in
wsrep or replication provider state.
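
One way to read the cases above is as a small decision function. This is a
sketch under assumptions: the enums and decide_bf_abort() are hypothetical
and do not reflect the wsrep-lib API or the actual control flow of
wsrep_bf_abort().

```cpp
#include <cassert>

// Hypothetical victim states in wsrep context; wsrep-lib has its own,
// richer state machine.
enum class WsrepTrxState { NotStarted, Executing, Aborting, Aborted };

// Hypothetical outcomes of the decision.
enum class BfAbortDecision {
    Skip,                // victim is already aborting or aborted
    NoWsrepAbortNeeded,  // no wsrep transaction yet, e.g. only MDL locks held
    Abort                // victim has an active transaction: BF abort it
};

// Sketch of the decision described in the text above.
BfAbortDecision decide_bf_abort(WsrepTrxState victim_state) {
    switch (victim_state) {
    case WsrepTrxState::Aborting:
    case WsrepTrxState::Aborted:
        return BfAbortDecision::Skip;
    case WsrepTrxState::NotStarted:
        return BfAbortDecision::NoWsrepAbortNeeded;
    case WsrepTrxState::Executing:
        return BfAbortDecision::Abort;
    }
    return BfAbortDecision::Skip;  // not reached for valid enum values
}
```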

BF aborting could cause a BF-BF conflict scenario if the victim was already
aborted and had changed to a replayer, which has high priority as well. This
BF-BF conflict scenario is now avoided in lock_wait_wsrep(), which checks
whether the blocking lock holder is also high priority and is ordered before
the caller. In this situation the caller should wait for the lock.
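
The ordering rule can be sketched as a predicate. The Thd struct and its
seqno field are assumptions made for illustration (ordering is modelled here
as a plain sequence number comparison); the actual check in lock_wait_wsrep()
may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified thread descriptor; not the server's THD.
struct Thd {
    bool high_priority;   // true for BF (applier or replayer) threads
    std::int64_t seqno;   // assumed global ordering of the write set
};

// Sketch: a high priority requester should wait, rather than abort the
// holder, when the holder is also high priority and is ordered before it.
bool should_wait_for_holder(const Thd& requester, const Thd& holder) {
    return requester.high_priority && holder.high_priority &&
           holder.seqno < requester.seqno;
}
```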

The natural InnoDB deadlock resolution algorithm could pick a BF thread as
the deadlock victim. This is fixed by giving maximum weight to BF threads in
Deadlock::report().
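
A sketch of the weighting idea, assuming the resolver picks the transaction
with the lowest weight as the victim. TrxInfo, effective_weight() and
pick_deadlock_victim() are hypothetical names and not the InnoDB code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical summary of a transaction taking part in a deadlock cycle.
struct TrxInfo {
    std::uint64_t weight;  // cost of rolling the transaction back
    bool is_bf;            // true for high priority (BF) threads
};

// BF threads get the maximum possible weight so that an ordinary
// transaction in the same cycle always compares as lighter.
std::uint64_t effective_weight(const TrxInfo& trx) {
    return trx.is_bf ? std::numeric_limits<std::uint64_t>::max() : trx.weight;
}

// Returns the index of the transaction with the lowest effective weight.
// On ties the earliest entry wins. The cycle must not be empty.
std::size_t pick_deadlock_victim(const std::vector<TrxInfo>& cycle) {
    assert(!cycle.empty());
    std::size_t victim = 0;
    for (std::size_t i = 1; i < cycle.size(); ++i) {
        if (effective_weight(cycle[i]) < effective_weight(cycle[victim])) {
            victim = i;
        }
    }
    return victim;
}
```

In this model, a BF thread with a small raw weight is no longer chosen as long
as at least one non-BF transaction is part of the cycle.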

MDEV-24341 has changed execution paths in do_command(), and this affects the
execution of a BF aborted victim. This PR fixes one assert in do_command():
 DBUG_ASSERT(!thd->async_state.pending_ops())
which fired if the thd was BF aborted earlier. This assert is now changed
to allow pending_ops() if thd was BF aborted before.
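
The relaxed invariant can be written as a predicate. The Thd struct and its
fields below are hypothetical simplifications; the exact condition used in
do_command() is not shown in this commit message.

```cpp
#include <cassert>

// Hypothetical, simplified session descriptor; not the server's THD.
struct Thd {
    int pending_ops;   // number of outstanding asynchronous operations
    bool bf_aborted;   // true if the session was BF aborted earlier
};

// Sketch of the relaxed invariant: pending operations are tolerated only
// when the session has been BF aborted.
bool command_invariant_holds(const Thd& thd) {
    return thd.pending_ops == 0 || thd.bf_aborted;
}
```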

With these fixes, a long-running, highly conflicting write load can be run
against a two node cluster. If binlogging is configured, log_slave_updates
should also be set.
sjaakola authored and Jan Lindström committed Apr 13, 2021
1 parent f74704c commit a1e7038
Showing 13 changed files with 129 additions and 501 deletions.
1 change: 1 addition & 0 deletions mysql-test/suite/galera/r/galera_UK_conflict.result
@@ -119,6 +119,7 @@ SET debug_sync='RESET';
connection node_1;
SET GLOBAL wsrep_slave_threads = DEFAULT;
connection node_2;
SET SESSION wsrep_sync_wait=15;
SELECT * FROM t1;
f1 f2 f3
1 1 0
1 change: 1 addition & 0 deletions mysql-test/suite/galera/t/galera_UK_conflict.test
@@ -266,6 +266,7 @@ SET debug_sync='RESET';
SET GLOBAL wsrep_slave_threads = DEFAULT;

--connection node_2
SET SESSION wsrep_sync_wait=15;
SELECT * FROM t1;

# replicate some transactions, so that wsrep slave thread count can reach
3 changes: 3 additions & 0 deletions mysql-test/suite/galera_3nodes/r/galera_join_with_cc_A.result
@@ -1,6 +1,9 @@
connection node_2;
connection node_1;
connection node_1;
connection node_2;
connection node_3;
connection node_1;
CREATE TABLE t1 (pk INT PRIMARY KEY, node INT) ENGINE=innodb;
INSERT INTO t1 VALUES (1, 1);
connection node_2;
8 changes: 8 additions & 0 deletions mysql-test/suite/galera_3nodes/t/galera_join_with_cc_A.test
@@ -15,6 +15,12 @@
--let $galera_server_number = 3
--source include/galera_connect.inc

# Save original auto_increment_offset values.
--let $node_1=node_1
--let $node_2=node_2
--let $node_3=node_3
--source ../galera/include/auto_increment_offset_save.inc

--connection node_1
--let $wait_condition = SELECT VARIABLE_VALUE = 3 FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_cluster_size';
--source include/wait_condition.inc
@@ -260,3 +266,5 @@ call mtr.add_suppression("WSREP: Rejecting JOIN message from \(.*\): new State T

--connection node_3
call mtr.add_suppression("WSREP: Rejecting JOIN message from \(.*\): new State Transfer required.");

--source ../galera/include/auto_increment_offset_restore.inc
278 changes: 0 additions & 278 deletions mysql-test/suite/wsrep/r/variables_debug.result

This file was deleted.

7 changes: 0 additions & 7 deletions mysql-test/suite/wsrep/t/variables_debug.cnf

This file was deleted.
