fix a 3-way deadlock in galera_sr.galera-features#56
rarely (try --repeat 1000), the following happens:
* from wsrep_bf_abort (when a thread is being killed), wsrep-lib
starts streaming_rollback, which wants to call
convert_streaming_client_to_applier. wsrep_create_streaming_applier
creates a new THD(). All of this happens while the other THD is being
killed, that is, under LOCK_thd_kill and LOCK_thd_data. In particular,
THD::init() takes LOCK_global_system_variables under LOCK_thd_kill.
* updating @@wsrep_slave_threads takes LOCK_global_system_variables
and LOCK_wsrep_cluster_config (in that order) and invokes
wsrep_slave_threads_update(), which takes LOCK_wsrep_slave_threads.
* wsrep_replication_process() takes LOCK_wsrep_slave_threads and
invokes wsrep_close_applier(), which calls thd->set_killed(), which
in turn takes LOCK_thd_kill.
et voilà.
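
The cycle can be checked mechanically by treating every "takes B while
holding A" as an edge A -> B in a lock-order graph. The sketch below is
only an illustration, not server code: the helper names (build_graph,
has_lock_order_cycle, edges_before_fix, edges_after_fix) are
hypothetical, and edges_after_fix() assumes that, after the fix,
wsrep_slave_threads_update() no longer holds the system-variable
mutexes when it takes LOCK_wsrep_slave_threads.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// An edge "A -> B" means: some thread acquires B while already holding A.
using Edge = std::pair<std::string, std::string>;
using LockGraph = std::map<std::string, std::set<std::string>>;

LockGraph build_graph(const std::vector<Edge> &edges)
{
  LockGraph g;
  for (const auto &e : edges)
  {
    g[e.first].insert(e.second);
    g[e.second];  // make sure the target lock is a node as well
  }
  return g;
}

// Depth-first search; true if a cycle is reachable from `node`.
static bool dfs(const LockGraph &g, const std::string &node,
                std::set<std::string> &on_stack, std::set<std::string> &done)
{
  assert(g.count(node));              // build_graph() registers every lock
  if (on_stack.count(node)) return true;   // back edge: lock-order cycle
  if (done.count(node)) return false;      // already fully explored
  on_stack.insert(node);
  for (const auto &next : g.at(node))
    if (dfs(g, next, on_stack, done)) return true;
  on_stack.erase(node);
  done.insert(node);
  return false;
}

bool has_lock_order_cycle(const LockGraph &g)
{
  std::set<std::string> on_stack, done;
  for (const auto &kv : g)
    if (dfs(g, kv.first, on_stack, done)) return true;
  return false;
}

// Lock orders as described in the three bullets above.
std::vector<Edge> edges_before_fix()
{
  return {
    {"LOCK_thd_kill", "LOCK_global_system_variables"},             // THD::init() during BF abort
    {"LOCK_global_system_variables", "LOCK_wsrep_cluster_config"}, // SET @@wsrep_slave_threads
    {"LOCK_global_system_variables", "LOCK_wsrep_slave_threads"},  // wsrep_slave_threads_update()
    {"LOCK_wsrep_cluster_config", "LOCK_wsrep_slave_threads"},     // wsrep_slave_threads_update()
    {"LOCK_wsrep_slave_threads", "LOCK_thd_kill"},                 // wsrep_close_applier()
  };
}

// ASSUMPTION: the update function drops the sysvar mutexes before it
// takes LOCK_wsrep_slave_threads, so those two edges disappear.
std::vector<Edge> edges_after_fix()
{
  return {
    {"LOCK_thd_kill", "LOCK_global_system_variables"},
    {"LOCK_global_system_variables", "LOCK_wsrep_cluster_config"},
    {"LOCK_wsrep_slave_threads", "LOCK_thd_kill"},
  };
}
```

has_lock_order_cycle(build_graph(edges_before_fix())) reports a cycle,
while the same check on edges_after_fix() does not.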
As a fix I copied a workaround from wsrep_cluster_address_update()
to wsrep_slave_threads_update(). It seems to be safe: with the mutexes
released, a race condition is possible and a concurrent SET might
change wsrep_slave_threads, but wsrep_slave_threads_update() always
checks whether there is anything to do, so in that case the second
run is simply a no-op.
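
A minimal sketch of that pattern, using std::mutex stand-ins rather
than the server's mysql_mutex_t API. Every name here is hypothetical
(lock_sysvars_standin, lock_slave_threads_standin, slave_threads_var,
running_appliers, updates_applied, slave_threads_update_sketch); which
mutexes the real workaround releases is not spelled out above, so the
sketch releases a single stand-in mutex around the work.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

std::mutex lock_sysvars_standin;        // stand-in for the sysvar mutex held by SET
std::mutex lock_slave_threads_standin;  // stand-in for LOCK_wsrep_slave_threads

std::atomic<long> slave_threads_var{1}; // value written by SET; atomic because the
                                        // update reads it with the sysvar mutex dropped
long running_appliers = 1;              // protected by lock_slave_threads_standin
int updates_applied = 0;                // counts updates that actually did work

// Precondition: the caller holds lock_sysvars_standin, as a SET would.
// Returns false on success (a convention assumed for this sketch).
bool slave_threads_update_sketch()
{
  // Drop the caller's mutex so this thread never waits for the
  // applier mutex while holding the sysvar mutex.
  lock_sysvars_standin.unlock();
  {
    std::lock_guard<std::mutex> guard(lock_slave_threads_standin);
    const long wanted = slave_threads_var.load();
    // Re-check under the applier mutex: if a concurrent SET already
    // brought the appliers to the wanted count, there is nothing to do.
    if (wanted != running_appliers)
    {
      running_appliers = wanted;
      ++updates_applied;
    }
    assert(running_appliers == wanted);
  }
  // Restore the state the caller expects.
  lock_sysvars_standin.lock();
  return false;
}
```

Calling slave_threads_update_sketch() twice for the same value changes
the applier count only once. The second call finds nothing to do, which
is the property the safety argument above relies on.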