-
-
Notifications
You must be signed in to change notification settings - Fork 305
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: race condition on join() in the redis broker. #481
Conversation
I think this likely doesn't get rid of the race. A better fix would be to change qsize so that it returns the size of the queue and its associated delay queue. Then we can replace the Re. testing, something like the following pseudocode might work: qsize_ev = threading.Event()
qsize_ev.set()
actor_ev = threading.Event()
old_qsize = broker.do_qsize
def qsize(*args, **kwargs):
qsize_ev.wait()
qsize_ev.clear()
res = old_qsize(*args, *kwargs)
actor_ev.set()
return res
broker.do_qsize = qsize
@actor
def a():
actor_ev.wait()
b.send_with_options(delay=1000)
qsize_ev.set() # this might need to be delayed somehow
@actor
def b():
pass
broker.join() The idea being to force an ordering of events that we know is wrong, specifially:
|
yes, the race is still there, but instead of not counting the message, it would be double counted (the delayed message in the delay queue and the normal message in the normal queue) fixing the issue from the That said I agree that's probably better to update qsize, since that would most likely get rid of the rage condition altogether. Thanks also for the test suggestion, I'll try working it and update the PR |
I've added a test for that specific regression, but I think we may as well improve qsize |
updated. the test now does not make much sense, but it continues to pass |
Thanks! I think we might as well drop the test at this point since that kind of ordering is no longer possible (and we're unlikely to ever regress). |
Maybe I can change it to check that qsize returns all the messages in the queue & delay queue |
btw @Bogdanp it seems that the ci on windows should be updated, since windows 2016 is being retired |
Updated the test. I think it's ready for a review |
doing a simple update breaks the libmemcached compile: see the action run on this commit CaselIT@39285f3 |
Yes, I'm afraid that's going to be stuck until someone can update https://github.com/ryansm1/libmemcached-win/tree/yshurik-win to work w/ 2019. |
I guess an option is to disable mamcached test on windows. In it doesn't compile chances are that people are not going to use it. I can open a PR of that's an acceptable solution |
@CaselIT would it be safe to install build the module from 1.13.0 to absorb this change? I'm getting a lot of issues with dramatiq that I suspect are resolved by this PR edit: using debian |
I don't think this patch is released on pypi yet |
@CaselIT Yeah I was planning on just building it from master offline if you don't think there is anything stopping this from being viable. I'm getting something related to this and getting errors the below and it seems to be related to when the queue is completely exhausted or something.
|
This fixes #480
I'm not sure how to test this, since the race condition is not easy to trigger. @Bogdanp do you have any suggestions?
Also looking at the stub broker it seems that it may have a similar race condition, but I've never triggered a similar issue with it, so I'm not sure if the change (swapping the order or the delay queue and normal one) is needed
dramatiq/dramatiq/brokers/stub.py
Lines 153 to 164 in 2d0ca25