This can be run with pytest using pytest -x. Note that the failure can be quite rare in some environments, so a single run may not be sufficient. Parallelizing with -n seems to help reproduce it, but is not required.
To better investigate the issue I've added some logs to the Redis broker join operation:
diff --git a/dramatiq/brokers/redis.py b/dramatiq/brokers/redis.py
index 0f37313..838595d 100644
--- a/dramatiq/brokers/redis.py
+++ b/dramatiq/brokers/redis.py
@@ -233,7 +233,10 @@ class RedisBroker(Broker):
size = 0
for name in (queue_name, dq_name(queue_name)):
- size += self.do_qsize(name)
+ current = self.do_qsize(name)
+ self.logger.debug('queue %s size %s', name, current)
+ size += current
+ self.logger.debug('join total size %s', size)
if size == 0:
return
With this change in place, a failure logs the following:
===================================================================== FAILURES ======================================================================
_____________________________________________________________________ test[300] _____________________________________________________________________
[gw1] win32 -- Python 3.8.12 path\to\python.exe
n = 300, wM = <dramatiq.worker.Worker object at 0x0000012105930070>, clean = None
@pytest.mark.parametrize("n", range(500))
def test(n, wM, clean):
bar.send_with_options(kwargs={"s": str(n)}, delay=10)
wM.broker.join(queue_name="default")
wM.join()
> assert pathlib.Path(f"file{n}.canary").exists()
E AssertionError: assert False
E + where False = <bound method Path.exists of WindowsPath('file300.canary')>()
E + where <bound method Path.exists of WindowsPath('file300.canary')> = WindowsPath('file300.canary').exists
E + where WindowsPath('file300.canary') = <class 'pathlib.Path'>('file300.canary')
E + where <class 'pathlib.Path'> = pathlib.Path
tests\a_test.py:70: AssertionError
----------------------------------------------------------------- Captured log call -----------------------------------------------------------------
DEBUG dramatiq.broker.RedisBroker:redis.py:184 Enqueueing message '0619fc48-c787-4125-999e-8d439321e0ed' on queue 'default.DQ'.
DEBUG dramatiq.broker.RedisBroker:redis.py:238 queue default size 0
DEBUG dramatiq.broker.RedisBroker:redis.py:238 queue default.DQ size 1
DEBUG dramatiq.broker.RedisBroker:redis.py:240 join total size 1
DEBUG dramatiq.worker.ConsumerThread(default.DQ):worker.py:320 Pushing message '0619fc48-c787-4125-999e-8d439321e0ed' onto delay queue.
DEBUG dramatiq.broker.RedisBroker:redis.py:184 Enqueueing message '0619fc48-c787-4125-999e-8d439321e0ed' on queue 'default'.
DEBUG dramatiq.broker.RedisBroker:redis.py:238 queue default size 0
DEBUG dramatiq.worker.ConsumerThread(default.DQ):worker.py:350 Acknowledging message '0619fc48-c787-4125-999e-8d439321e0ed'.
DEBUG dramatiq.broker.RedisBroker:redis.py:238 queue default.DQ size 0
DEBUG dramatiq.broker.RedisBroker:redis.py:240 join total size 0
============================================================== short test summary info ==============================================================
The race condition seems to be the following:
1. The message delay has elapsed and processing of the message is in progress in the worker thread.
2. In the main thread, the default queue size gets checked and returns 0.
3. The worker thread pushes the message to the default queue, then acks the message in the delay queue.
4. In the main thread, the delay queue size gets checked and returns 0.
5. The join returns, but a message is present in the default queue.
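The interleaving described above can be checked with a small, self-contained simulation. This is only a sketch under stated assumptions: it does not use dramatiq or Redis, the function name false_empty_interleavings is hypothetical, the queues are modelled as plain counters, and it assumes nothing consumes the message from the default queue while join is checking. It enumerates every ordering of the worker's two steps (enqueue on the default queue, then ack on the delay queue) against join's two size checks, and reports the orderings in which the summed size is 0 even though a message is still pending.

```python
from itertools import combinations

MAIN, DELAY = "default", "default.DQ"


def false_empty_interleavings(check_order):
    """Return the interleavings where join sums to 0 while a message is pending.

    check_order is the order in which the simulated join reads the queue sizes.
    This is an in-memory model, not dramatiq's actual implementation.
    """
    # The worker enqueues on the main queue first, then acks the delay queue.
    worker_steps = [("enqueue", MAIN), ("ack", DELAY)]
    found = []
    # Pick which 2 of the 4 time slots belong to the worker; the rest are checks.
    for worker_slots in combinations(range(4), 2):
        sizes = {MAIN: 0, DELAY: 1}  # one delayed message is waiting
        worker = iter(worker_steps)
        checks = iter(check_order)
        total = 0
        trace = []
        for slot in range(4):
            if slot in worker_slots:
                action, queue = next(worker)
                sizes[queue] += 1 if action == "enqueue" else -1
                trace.append(f"worker:{action}")
            else:
                queue = next(checks)
                total += sizes[queue]
                trace.append(f"join:check {queue}")
        # The message always ends up on the main queue, so a total of 0
        # means join would return too early.
        if total == 0:
            found.append(trace)
    return found


if __name__ == "__main__":
    print("default first:", false_empty_interleavings((MAIN, DELAY)))
    print("delay first:  ", false_empty_interleavings((DELAY, MAIN)))
```

In this model, checking the default queue first leaves exactly one bad interleaving, the one listed in the steps above, while checking the delay queue first leaves none: the delay queue can only read 0 after the ack, and by then the message has already been enqueued on the default queue.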
The fix seems to be simple: swap the order in which the queues are checked, so that the delay queue is checked before the default queue.
diff --git a/dramatiq/brokers/redis.py b/dramatiq/brokers/redis.py
index 0f37313..ce4aa98 100644
--- a/dramatiq/brokers/redis.py
+++ b/dramatiq/brokers/redis.py
@@ -232,7 +232,7 @@ class RedisBroker(Broker):
raise QueueJoinTimeout(queue_name)
size = 0
- for name in (queue_name, dq_name(queue_name)):
+ for name in (dq_name(queue_name), queue_name):
size += self.do_qsize(name)
if size == 0:
With this change in place I've run the test above 20 times without any failure.
I'm opening a PR with the suggested fix.
Issues
GitHub issues are for bugs. If you have questions, please ask them on the discussion board.
Checklist
What OS are you using?
windows 10 / ubuntu 20.04
What version of Dramatiq are you using?
1.12.0 and 1.12.3
What did you do?
A test in an automated pipeline tests that an actor enqueued with a delay is executed after the join of the broker returns.
What did you expect would happen?
The actor is correctly run
What happened?
Sometimes the actor does not run, because of a race condition.
The following example code reproduces the race condition: