MDEV-17346 parallel slave start and stop races to workers disappeared
The bug shows up as a slave SQL thread hanging in
rpl_parallel_thread_pool::get_thread() when there are no slave worker
threads left to wake it.

The hang occurs because, during activation of the parallel slave worker
pool, the SQL thread being started could read the worker pool size
concurrently with pool deactivation. The SQL thread did not take the
protection needed to avoid this race when reading the size.

Fixed by making the SQL thread, at pool activation, first grab
the same lock that a potential deactivator takes, before it
accesses the pool size.
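
The locking idea described above can be illustrated with a minimal, standalone C++ sketch. It is not the MariaDB code: ToyPool, mark_busy, mark_not_busy, activate and deactivate are hypothetical names, and the sketch simplifies by resizing while still holding the busy mark, whereas the patch releases the mark before calling the resize function. The point it shows is only that the activator reads the pool size after taking the same "busy" mark a deactivator takes, so it can never observe a half-finished deactivation.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the worker pool (not rpl_parallel_thread_pool).
struct ToyPool {
  std::mutex mtx;
  std::condition_variable cond;
  bool busy = false;  // true while one thread inspects or resizes the pool
  int count = 0;      // number of worker threads; touched only while busy
};

// Wait until nobody else holds the busy mark, then take it.
void mark_busy(ToyPool &p) {
  std::unique_lock<std::mutex> lk(p.mtx);
  p.cond.wait(lk, [&p] { return !p.busy; });
  p.busy = true;
}

// Drop the busy mark and wake any waiting thread.
void mark_not_busy(ToyPool &p) {
  {
    std::lock_guard<std::mutex> lk(p.mtx);
    p.busy = false;
  }
  p.cond.notify_all();
}

// Activation: the size is read only while the busy mark is held, so a
// concurrent deactivate() has either fully finished or not yet started.
int activate(ToyPool &p, int wanted) {
  mark_busy(p);
  if (p.count == 0)
    p.count = wanted;  // simplification: resize under the same mark
  int n = p.count;
  mark_not_busy(p);
  return n;
}

// Deactivation: drops the worker count to zero under the busy mark.
void deactivate(ToyPool &p) {
  mark_busy(p);
  p.count = 0;
  mark_not_busy(p);
}
```

In this sketch a thread calling activate() while another is inside deactivate() blocks in mark_busy() until the count is consistently zero, and only then decides whether workers must be created.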
andrelkin committed Oct 8, 2018
1 parent 1eca495 commit f517d8c
Showing 1 changed file with 22 additions and 3 deletions.
25 changes: 22 additions & 3 deletions sql/rpl_parallel.cc
@@ -1617,13 +1617,32 @@ int rpl_parallel_resize_pool_if_no_slaves(void)
 }


+/**
+  Pool activation is preceded by taking a "lock" of pool_mark_busy
+  which guarantees the number of running slaves drops to zero atomically
+  with the number of pool workers.
+  This resolves the race between the function caller thread and one
+  that may be attempting to deactivate the pool.
+*/
 int
 rpl_parallel_activate_pool(rpl_parallel_thread_pool *pool)
 {
+  int rc= 0;
+
+  if ((rc= pool_mark_busy(pool, current_thd)))
+    return rc;                                  // killed
+
   if (!pool->count)
-    return rpl_parallel_change_thread_count(pool, opt_slave_parallel_threads,
-                                            0);
-  return 0;
+  {
+    pool_mark_not_busy(pool);
+    rc= rpl_parallel_change_thread_count(pool, opt_slave_parallel_threads,
+                                         0);
+  }
+  else
+  {
+    pool_mark_not_busy(pool);
+  }
+  return rc;
 }
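
The return paths of the patched function can be traced with a small mock, given here as a sketch under stated assumptions rather than the server code. MockPool and all mock_* functions are hypothetical. The mock also assumes that the resize function takes the busy mark itself, which would explain why the patch releases the mark before calling it. The busy_depth counter is only there to check that every path that took the mark also dropped it.

```cpp
#include <cassert>

// Hypothetical mock state; none of these names exist in MariaDB.
struct MockPool {
  int count = 0;        // current number of workers
  bool killed = false;  // simulates the caller being killed while waiting
  int busy_depth = 0;   // > 0 while the busy mark is held
};

// Returns nonzero if the caller was killed; the mark is then NOT held.
int mock_mark_busy(MockPool &p) {
  if (p.killed)
    return 1;
  ++p.busy_depth;
  return 0;
}

void mock_mark_not_busy(MockPool &p) { --p.busy_depth; }

// Assumption: the resize routine acquires and releases the mark itself.
int mock_change_thread_count(MockPool &p, int new_count) {
  if (int rc = mock_mark_busy(p))
    return rc;
  p.count = new_count;
  mock_mark_not_busy(p);
  return 0;
}

// Same decisions as the patched activation: bail out if killed, otherwise
// read the size under the mark, release it, and resize only if empty.
int mock_activate(MockPool &p, int wanted) {
  if (int rc = mock_mark_busy(p))
    return rc;  // killed
  bool empty = (p.count == 0);
  mock_mark_not_busy(p);
  return empty ? mock_change_thread_count(p, wanted) : 0;
}
```

With this mock, an empty pool is resized, a populated pool is left alone, and a killed caller returns the nonzero code without touching the pool or leaving the mark held.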

