tasks.py:

from celery import Celery
import time

app = Celery(
    "tasks",
    broker_url="sqs://",
    broker_transport_options={
        "region": "eu-west-1",
    },
    task_create_missing_queues=False,
)
app.conf.task_default_queue = "test-queue"


@app.task(queue="test-queue", bind=True)
def simulate_async_task(self, name):
    print(f"Starting {name} with id {self.request.id} ")
    time.sleep(3)
    print(f"Ending {name} with id {self.request.id} ")
Checklist
- I have verified that the issue exists against the main branch of Celery.
- I have read the relevant section in the contribution guide on reporting bugs.
- I have checked the issues list for similar or identical bug reports.
- I have checked the pull requests list for existing proposed fixes.
- I have checked the commit log to find out if the bug was already fixed in the main branch.
- I have included all related issues and possible duplicate issues in this issue (If there are none, check this box anyway).
Mandatory Debugging Information
- I have included the output of celery -A proj report in the issue (if you are not able to do this, then at least specify the Celery version affected).
- I have verified that the issue exists against the main branch of Celery.
- I have included the contents of pip freeze in the issue.
- I have included all the versions of all the external dependencies required to reproduce this bug.
Optional Debugging Information
- I have tried reproducing the issue on more than one Python version and/or implementation.
- I have tried reproducing the issue on more than one message broker and/or result backend.
- I have tried reproducing the issue on more than one version of the message broker and/or result backend.
- I have tried reproducing the issue with autoscaling, retries, ETA/Countdown & rate limits disabled.
- I have tried reproducing the issue after downgrading and/or upgrading Celery and its dependencies.
Related Issues and Possible Duplicates
Related Issues
Possible Duplicates
Environment & Settings
Celery version:
celery report
Output:

Steps to Reproduce
Required Dependencies
Python Packages
pip freeze
Output: https://gist.github.com/wcislo-saleor/969d036545879f28b24b5594dc9b49ab
Other Dependencies
SQS queue created with Terraform
Minimally Reproducible Test Case
Run run_test.py.

run_test.py

tasks.py (shown above)
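The contents of run_test.py were attached in the original issue and are not reproduced here; a minimal sketch of what such a driver script could look like, assuming it simply enqueues the 80 messages mentioned below (the task and queue match tasks.py above; everything else is illustrative):

from tasks import simulate_async_task

# Put 80 messages onto the SQS-backed "test-queue"; each task sleeps
# for 3 seconds, so several workers should drain the queue within a
# few minutes.
for i in range(80):
    simulate_async_task.delay(f"task-{i}")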
Expected Behavior
All messages added to SQS are processed within at most a few minutes after the test script starts (there are multiple workers and 80 messages to process, and each message takes 3 seconds to process).
Actual Behavior
With the given setup (number of workers and number of messages) I can reliably reproduce a situation where a few messages are not processed until the visibility timeout is reached.
Why does this matter? In our setup the SQS queue has a visibility timeout of 30 minutes. Celery workers are launched in Kubernetes, where they are autoscaled based on the number of messages in the queue. Once the number of messages in the queue gets low enough, most Celery workers are shut down (SIGTERM, initiating a warm shutdown). This usually provokes a situation where a certain number of messages are left in the queue and are processed only after the visibility timeout is reached.
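As a side note (not part of the original report), the visibility timeout the SQS transport assumes for re-delivery can be aligned with the queue's 30-minute setting through the broker transport options; a minimal sketch, assuming the value described above:

app.conf.broker_transport_options = {
    "region": "eu-west-1",
    # Illustrative: mirrors the queue's 30-minute visibility timeout,
    # which in our case is configured on the Terraform-managed queue.
    "visibility_timeout": 30 * 60,
}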
By adding print statements I compared the following:

1. Number of messages put onto the queue.
2. Number of tasks actually executed by the workers.
3. Number of received "ReceiveMessage" calls in AWS CloudTrail.

I can observe that when the issue is reproduced, number 1 and number 3 are the same, while number 2 is slightly lower. This tells me that Celery did ask to receive the messages from SQS but never got a chance to process them (neither to actually execute the task, nor to put them back onto the queue). When that happens, these messages sit in the SQS queue until the visibility timeout is reached, and only then are they taken off the queue and processed by a worker.
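The exact instrumentation is not included here; one way the executed-task count could be derived from the Starting/Ending print statements in tasks.py, shown only as a rough sketch (worker.log is a hypothetical file holding the collected worker output):

# Count how many tasks actually ran by counting "Ending ..." lines in
# the collected worker output, then compare against the number of
# messages put onto the queue and the ReceiveMessage calls in CloudTrail.
with open("worker.log") as f:
    executed = sum(1 for line in f if line.startswith("Ending "))
print(f"tasks executed: {executed}")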