Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAMA deadlock with subscription mute #412

Closed
fquinner opened this issue Jun 30, 2020 · 1 comment
Closed

MAMA deadlock with subscription mute #412

fquinner opened this issue Jun 30, 2020 · 1 comment
Labels

Comments

@fquinner
Copy link
Collaborator

From mailing list:

Tick42rmds bridge implements tick42rmdsBridgeMamaSubscription_mute method which does call Shutdown() on bridge thread subscription object.
Intention there appears to be to mark bridge thread subscription object as shut down so that bridge thread ignores it and is not trying to deliver a callbacks on mama subscription objects being destroyed.
Unfortunately there is no proper locking there so that seldom a Mute() call returns before Shutdown() takes effect.
If this happens, bridge thread proceeds to deliver next message to mama subscription callback while mama subscription object is being destroyed by the thread returned from Mute().

Without significant redesign of tick42rmds this race can be closed by adding proper locking between tick42rmdsBridgeMamaSubscription_mute and bridge thread, so that bridge thread is guaranteed to see result of Shutdown() before Mute() call returns. Simple implementation would be to add a mutex shared between Shutdown() call and bridge thread methods issuing callbacks to mama subscriptions.

So far so good. Yet unfortunately openmama locking scheme needs to be modified for this to work due to a problem with locking order of MAMA_THROTTLE_DEFAULT throttle.
There are many places where MAMA_THROTTLE_DEFAULT lock is acquired. According to my analysis with valgrind/helgrind/drd acquisition of corresponding mutex must always be a leaf call in locking hierarchy to prevent ABBA-style deadlock.

If Mute() and Shutdown() are synchronized with a mutex, then the following would happen:

client thread destroying mama subscription eventually calls mamaSubscription_destroy
which calls mamaSubscription_deactivate()
[A] .. mamaSubscription_deactivate acquires lock [A] on throttle object
then calls mamaSubscription_deactivate_internal()
.. which calls impl->mBridgeImpl->bridgeMamaSubscriptionMute (impl->mSubscBridge)
which is tick42rmdsBridgeMamaSubscription_mute
[B] .. tick42rmdsBridgeMamaSubscription_mute acquires bridge thread subscription object lock [B]
XX client thread gets suspended here for some reason so it is not calling Shutdown() yet

tick42rmds bridge thread calls RMDSBridgeSubscription::OnMessage (or anything else which requires calling mamaSubscription_processMsg later)
[B] .. bridge thread acquires bridge thread subscription object lock [B]
bridge thread checks if Shutdown() is called on this object
XX .. since client thread is suspended, Shutdown() is not in effect
bridge thread calls mamaSubscription_processMsg() to deliver message to mama subscription
..
[A] imageRequest_stopWaitForResponse() is called which tries to acquire throttle lock [A]

I put in my interpretation of the fix in https://github.com/fquinner/OpenMAMA/tree/feature/subscription-throttle-lock-scope-reduction but looks like this would not fix the issue:

Testing this change will take some time, but I think this change would introduce a deadlock due to change in locking order:

wombatThrottle_sendQueuedMessage
[A] acquires list_lock (self->mMsgQueue)
and calls info->mActionCb() which is wrapped mamaSubscriptionImpl_completeMarketDataInitialisation()
[B] which acquires impl->mCreateDestroyLock

.. while your implementation does locking in B-A order.

My implementation of this reduced locked scope preserves AB locking order. In fact it looks very much like the following pseudo-code

{
acquire throttle lock
acquire mCreateDestroyLock
/* protecting impl from changes via throttle sendQueuedMessage callback invocation /
check if impl state is MAMA_SUBSCRIPTION_ACTIVATING
if so remove throttle action and set impl state to MAMA_SUBSCRIPTION_DEACTIVATED, so throttle sendQueuedMessage callback invocation would skip this impl
drop mCreateDestroyLock
drop throttle lock
}
/
do not acquire throttle lock, since throttle sendQueuedMessage will already skip impl not in MAMA_SUBSCRIPTION_ACTIVATING state*/
acquire mCreateDestroyLock
unconditionally call mute subscription before mamaSubscription_deactivate_internal has any chance
here goes original code to handle impl states (now the case of MAMA_SUBSCRIPTION_ACTIVATING is a no-op since we handled it above)
drop mCreateDestroyLock

So some more work to do.

@fquinner fquinner added the bug label Jun 30, 2020
fquinner added a commit to fquinner/OpenMAMA that referenced this issue Jul 8, 2020
The idea is to avoid deadlock by not holding onto the throttle
subscription lock while proceding to clean up a subscription object,
since this may cause potential deadlocking with third party bridges
which may require their own locking during muting.

Signed-off-by: Frank Quinn <fquinn@cascadium.io>
@fquinner
Copy link
Collaborator Author

Work on this was completed and no further issues were reported on the matter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant