Race in vnode worker pool #298

Closed
beerriot opened this Issue Apr 11, 2013 · 2 comments

@beerriot
Contributor

beerriot commented Apr 11, 2013

When a riak_core_vnode_worker finishes work, it sends checkin messages to both poolboy and riak_core_vnode_worker_pool. The latter maintains a queue of work to be handled when there's room in the pool. As soon as RCVWP gets the checkin message, it asks poolboy if there is a worker available (expecting that the worker just checked in will now be available).

The problem is that poolboy may receive RCVWP's message before receiving the worker's checkin message. If this happens, it will tell RCVWP that the pool is full. RCVWP then sticks in the 'queueing' state until it receives another checkin message from a worker. Since another checkin may never arrive, the pool may become frozen.
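
To make the interleaving concrete, here is a tiny self-contained simulation of the problem (hypothetical module and process names, nothing from the actual riak_core or poolboy code). Under normal scheduling the checkin almost always arrives first, which is exactly why a randomizing scheduler like PULSE is needed to hit the bad ordering:

```erlang
%% checkin_race_sketch.erl -- hypothetical illustration only.
%% The "worker" notifies both the pool and the manager; the manager's
%% checkout request may reach the pool before the worker's checkin does,
%% because Erlang only orders messages between a single sender/receiver pair.
-module(checkin_race_sketch).
-export([run/0]).

run() ->
    Pool    = spawn(fun() -> pool(0) end),        %% stands in for poolboy
    Manager = spawn(fun() -> manager(Pool) end),  %% stands in for RCVWP
    %% This process plays the finished worker: it checks in with the pool
    %% and tells the manager it has done so, mirroring the two checkin
    %% messages the real worker sends.
    Pool ! {checkin, self()},
    Manager ! worker_checked_in,
    timer:sleep(100).

pool(Available) ->
    receive
        {checkin, _Worker} ->
            pool(Available + 1);
        {checkout, From} when Available > 0 ->
            From ! {ok, worker},
            pool(Available - 1);
        {checkout, From} ->
            From ! full,                          %% the reply RCVWP gets stuck on
            pool(Available)
    end.

manager(Pool) ->
    receive
        worker_checked_in ->
            Pool ! {checkout, self()},            %% race: may beat the checkin
            receive
                {ok, worker} -> io:format("got a worker~n");
                full         -> io:format("told 'full'; queueing forever~n")
            end
    end.
```

Running checkin_race_sketch:run() by hand will almost always print "got a worker"; only an adversarial scheduler reliably drives it down the 'full' path.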

The test defined by worker_pool_pulse.erl on the bwf-pool-race branch of riak_core demonstrates this race. Under PULSE execution, the test will fail with deadlock.

In order to run the test, you will need the pulse_otp beams from https://github.com/Quviq/pulse_otp on your path. You will also need to compile poolboy with PULSE annotations - the bwf-pool-race branch of basho/poolboy provides this.

Since I think PULSE's graphical output is quite cool, I'll include it here:

[PULSE trace image: PULSE-prop-small-pool]

Key:

  • "root" (red): test process
  • "spawn_opt" (blue): riak_core_vnode_worker_pool
  • "spawn_opt1" (cyan): poolboy fsm
  • "spawn_opt2" (green): poolboy_sup
  • "spawn_opt3" (magenta): poolboy_worker/riak_core_vnode_worker

The problem is illustrated by the last four steps of "spawn_opt1", the poolboy process (cyan), and the last six steps of "spawn_opt", the riak_core_vnode_worker_pool process (blue). The $gen_all_state_event sent to spawn_opt from the last step of spawn_opt3 is one of the two checkin messages, and the $gen_sync_event sent to spawn_opt1 is the other. The $gen_event sent from spawn_opt to spawn_opt1 is the checkout message; since spawn_opt1 receives that message before the checkin is delivered, you can see it reply {_,full}. After that, everything is stuck waiting with no way forward.

Now fixing begins…

@ghost ghost assigned beerriot Apr 11, 2013

@beerriot
Contributor

beerriot commented Apr 12, 2013

A short update: the simple fix of making riak_core_vnode_worker_pool solely responsible for calling poolboy:checkin works for the happy path, but uncovers the exact same race when the worker dies. Both poolboy and RCVWP are monitoring the worker, and the 'DOWN' message and subsequent checkout request might arrive at the poolboy FSM in any order. Further test expansion and fixing underway…
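
The shape of the new race is the same, only with 'DOWN' messages instead of checkins. Another hypothetical sketch (again not the real code): two processes monitor the same worker, and nothing guarantees that the pool sees its own 'DOWN' before it sees the manager's follow-up checkout.

```erlang
%% down_race_sketch.erl -- hypothetical illustration only.
-module(down_race_sketch).
-export([run/0]).

run() ->
    Worker   = spawn(fun() -> receive stop -> ok end end),
    Pool     = spawn(fun() -> pool(Worker) end),
    _Manager = spawn(fun() -> manager(Worker, Pool) end),
    Worker ! stop,                                %% the worker goes away
    timer:sleep(100).

pool(Worker) ->
    erlang:monitor(process, Worker),
    receive
        {'DOWN', _Ref, process, Worker, _Reason} ->
            %% Good interleaving: we learn of the death first, so a
            %% replacement is ready before we answer the checkout.
            receive {checkout, From} -> From ! {ok, replacement} end;
        {checkout, From} ->
            %% Bad interleaving: the checkout beats our 'DOWN', the dead
            %% worker still looks busy, and we answer full.
            From ! full
    end.

manager(Worker, Pool) ->
    erlang:monitor(process, Worker),
    receive
        {'DOWN', _Ref, process, Worker, _Reason} ->
            Pool ! {checkout, self()},
            receive Reply -> io:format("pool replied: ~p~n", [Reply]) end
    end.
```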

beerriot pushed a commit that referenced this issue Apr 15, 2013

Bryan Fink
demonstrate the race between r_c_vnode_worker_pool and poolboy's fsm
As described in #298:

When a riak_core_vnode_worker finishes work, it sends checkin messages
to both poolboy and riak_core_vnode_worker_pool. The latter maintains a
queue of work to be handled when there's room in the pool. As soon as
RCVWP gets the checkin message, it asks poolboy if there is a worker
available (expecting that the worker just checked in will now be
available).

The problem is that poolboy may receive RCVWP's message before receiving
the worker's checkin message. If this happens, it will tell RCVWP that
the pool is full. RCVWP then sticks in the 'queueing' state until it
receives another checkin message from a worker. Since another checkin
may never arrive, the pool may become frozen.

Crashing workers create a race condition similar to the double-checkin
case, because 'DOWN' messages are delivered to both
riak_core_vnode_worker_pool and poolboy. RCVWP again asks poolboy to
checkout a worker (effectively immediately), which might happen before
poolboy receives its 'DOWN' and starts a replacement.

The test defined by worker_pool_pulse.erl demonstrates these races.
Under PULSE execution, the test will fail with deadlock. If it fails for
another reason (like timeout) you may have missed one of the
requirements described below.

In order to run the test, you will need the pulse_otp beams from
https://github.com/Quviq/pulse_otp on your path. The riak_core and
poolboy applications, as well as the worker_pool_pulse module, must also
be compiled with the 'PULSE' macro defined. The newly-added 'pulse' make
target will do this for you (and also run the test), but you will need
to start with a clean checkout (no beams built), or recompilation will
be skipped.

beerriot pushed a commit that referenced this issue Apr 15, 2013

Bryan Fink
fix races between riak_core_vnode_worker_pool and poolboy
This patch fixes both races demonstrated by the test in the previous
commit, and described in #298.

The fix for the checkin race is for the worker to only checkin to
riak_core_vnode_worker_pool, and allow riak_core_vnode_worker_pool to
decide when to checkin to poolboy. If there is outstanding work in the
queue, the worker will just be reused immediately instead of being
checked in.

The fix for the DOWN race is for the worker to send a 'worker_started'
message to riak_core_vnode_worker_pool, when the worker starts. While
the worker is sending this message, poolboy is blocking, so
riak_core_vnode_worker_pool can send poolboy a checkout message
immediately, and still be guaranteed that it will be received after
poolboy has put the worker in its pool.

Documentation about how this works, and about how
riak_core_vnode_worker_pool is acting as a queue wrapper around the
pool, is also included.
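
The checkin half of that fix boils down to something like the following sketch. The state fields and the send_work/2 helper are assumptions made up for illustration; poolboy:checkin/2 is the only real call:

```erlang
%% checkin_fix_sketch.erl -- rough illustration, not the riak_core code.
%% The worker reports only to the manager, and the manager alone decides
%% whether to hand the worker the next queued job or return it to poolboy.
-module(checkin_fix_sketch).
-export([handle_checkin/2]).

-record(state, {pool, queue = queue:new()}).

handle_checkin(Worker, State = #state{pool = Pool, queue = Q}) ->
    case queue:out(Q) of
        {{value, Work}, Rest} ->
            %% Outstanding work: reuse the worker immediately; it never goes
            %% back into poolboy, so no checkout (and no race) is needed.
            ok = send_work(Worker, Work),
            State#state{queue = Rest};
        {empty, _} ->
            %% Nothing queued: this is now the only process that ever checks
            %% workers in, so poolboy's view stays consistent with ours.
            ok = poolboy:checkin(Pool, Worker),
            State
    end.

%% Placeholder dispatch; the real pool hands work to a riak_core_vnode_worker.
send_work(Worker, Work) ->
    Worker ! {work, Work},
    ok.
```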
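
The 'worker_started' half rests on a property of blocking starts: a message that lands in a process's mailbox while it is blocked starting a worker is only handled after that start completes. A small hypothetical sketch of just that guarantee (not poolboy's actual startup code):

```erlang
%% worker_started_sketch.erl -- hypothetical illustration only.
-module(worker_started_sketch).
-export([run/0]).

run() ->
    Manager = self(),
    Pool = spawn(fun() -> pool(Manager) end),
    %% The new worker pings the manager during startup; the manager may
    %% send its checkout immediately, yet the pool only processes it after
    %% the blocking start has returned and the worker is in the pool.
    receive worker_started -> Pool ! {checkout, self()} end,
    receive Reply -> io:format("pool replied: ~p~n", [Reply]) end.

pool(Manager) ->
    Worker = start_worker_sync(Manager),   %% blocked until the worker is up
    receive
        {checkout, From} -> From ! {ok, Worker}
    end.

start_worker_sync(Manager) ->
    Parent = self(),
    Worker = spawn(fun() ->
                       Manager ! worker_started,   %% handshake to the manager
                       Parent ! {started, self()}, %% ack the blocking starter
                       worker_loop()
                   end),
    receive {started, Worker} -> Worker end.

worker_loop() ->
    receive _ -> worker_loop() end.
```

In the real fix the blocking step is poolboy starting its worker; here it is simulated with a selective receive on the starter's ack.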

@ghost ghost assigned engelsanchez and beerriot Apr 15, 2013

@beerriot
Contributor

beerriot commented Apr 15, 2013

Solved by #300

@beerriot beerriot closed this Apr 15, 2013

beerriot pushed a commit that referenced this issue Apr 30, 2013

Bryan Fink
cherry pick 5de7972 into 1.3 branch (backporting)
It was desired to have the riak_core_vnode_worker_pool race fix from
basho/riak_core#300 backported to the 1.3 branch. The parent commit
containing the regression tests for this race has been elided, since
it requires changes to poolboy that are not otherwise required to
fix the issue (so there is no need to roll a new poolboy version).

Original commit message:

fix races between riak_core_vnode_worker_pool and poolboy

This patch fixes both races demonstrated by the test in the previous
commit, and described in #298.

The fix for the checkin race is for the worker to only checkin to
riak_core_vnode_worker_pool, and allow riak_core_vnode_worker_pool to
decide when to checkin to poolboy. If there is outstanding work in the
queue, the worker will just be reused immediately instead of being
checked in.

The fix for the DOWN race is for the worker to send a 'worker_started'
message to riak_core_vnode_worker_pool, when the worker starts. While
the worker is sending this message, poolboy is blocking, so
riak_core_vnode_worker_pool can send poolboy a checkout message
immediately, and still be guaranteed that it will be received after
poolboy has put the worker in its pool.

Documentation about how this works, and about how
riak_core_vnode_worker_pool is acting as a queue wrapper around the
pool, is also included.