RuntimeError: Set changed size during iteration with celery prefork #1774
Comments
Would be happy to get a PR which fixes the issue. |
@auvipy I'll give it a try :) |
On that exact point where we observe
I've inspected the content of
to
When I've done a shallow copy of I also tried to force redis to timeout while running cpu-heavy operations locally, but the worker returns the right error |
Hey everyone, quick update. Steps:
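    from datetime import timedelta
    from django.utils import timezone  # assumed: Django's timezone helper (Django is in the pip freeze)

    # `app` is the project's Celery application, defined elsewhere.
    @app.task
    def test_stress():
        curr_time = timezone.now()
        # Busy-loop for 60 seconds to keep the worker process CPU-bound.
        while curr_time + timedelta(seconds=60) > timezone.now():
            10 * 10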
Now that I can reliably reproduce it, I'll dig into this. |
In celery and kombu v5.3.1, we are constantly being hit by the error below.
There are around 4 workers distributed across 3 different servers, listening to the same queues in prefork mode, with multiple processes (4 to 16 processes each).
Most of the workers are daemonized using celeryd init-script. Some of them are manually spawned through CLI for debug monitoring.
All of them are connected to the same redis broker
Some workers simply throw the error below and stop working, randomly.
We've been forced to restart those every few hours.
This was happening before v5.3. Based on celery/celery#7162 we thought the issue would be fixed by that update, but unfortunately it was not.
[2023-07-27 01:28:05,206: CRITICAL/MainProcess] Unrecoverable error: RuntimeError('Set changed size during iteration')
Traceback (most recent call last):
File "/opt/Conversus/venv/lib/python3.8/site-packages/celery/worker/worker.py", line 202, in start
self.blueprint.start(self)
File "/opt/Conversus/venv/lib/python3.8/site-packages/celery/bootsteps.py", line 116, in start
step.start(parent)
File "/opt/Conversus/venv/lib/python3.8/site-packages/celery/bootsteps.py", line 365, in start
return self.obj.start()
File "/opt/Conversus/venv/lib/python3.8/site-packages/celery/worker/consumer/consumer.py", line 336, in start
blueprint.start(self)
File "/opt/Conversus/venv/lib/python3.8/site-packages/celery/bootsteps.py", line 116, in start
step.start(parent)
File "/opt/Conversus/venv/lib/python3.8/site-packages/celery/worker/consumer/consumer.py", line 726, in start
c.loop(*c.loop_args())
File "/opt/Conversus/venv/lib/python3.8/site-packages/celery/worker/loops.py", line 97, in asynloop
next(loop)
File "/opt/app/venv/lib/python3.8/site-packages/kombu/asynchronous/hub.py", line 310, in create_loop
for tick_callback in on_tick:
RuntimeError: Set changed size during iteration
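For readers unfamiliar with this error, here is a minimal sketch of the failure mode the traceback points at: a set of callbacks is iterated while one of the callbacks adds to that same set. This is not kombu's actual code. The names `make_on_tick`, `run_ticks_unsafe`, and `run_ticks_snapshot` are hypothetical and only illustrate the general pattern, along with the usual workaround of iterating over a snapshot.

```python
def make_on_tick():
    """Build a set of callbacks where one callback registers another (hypothetical setup)."""
    on_tick = set()

    def late_callback():
        pass

    def registering_callback():
        # Mutates the very set that is currently being iterated.
        on_tick.add(late_callback)

    on_tick.add(registering_callback)
    return on_tick


def run_ticks_unsafe(on_tick):
    # Iterates the live set: any add/remove during the loop makes the
    # iterator raise RuntimeError on its next step.
    for tick_callback in on_tick:
        tick_callback()


def run_ticks_snapshot(on_tick):
    # Iterates a snapshot taken up front, so callbacks may freely
    # mutate the original set; new callbacks run on the next pass.
    for tick_callback in tuple(on_tick):
        tick_callback()


if __name__ == "__main__":
    try:
        run_ticks_unsafe(make_on_tick())
    except RuntimeError as exc:
        print("unsafe loop failed:", exc)

    callbacks = make_on_tick()
    run_ticks_snapshot(callbacks)
    print("snapshot loop ok, callbacks registered:", len(callbacks))
```

Whether a snapshot is the right fix for the hub itself depends on what mutates `on_tick` in the first place. The sketch only shows why the exception appears and why copying before iterating sidesteps it.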
main branch of Celery.
contribution guide on reporting bugs.
for similar or identical bug reports.
for existing proposed fixes.
to find out if the bug was already fixed in the main branch.
in this issue (If there are none, check this box anyway).
Mandatory Debugging Information
celery -A proj report in the issue. (If you are not able to do this, then at least specify the Celery version affected.)
main branch of Celery.
pip freeze in the issue.
to reproduce this bug.
Optional Debugging Information
and/or implementation (3.10).
result backend (SQS) .
broker and/or result backend.
ETA/Countdown & rate limits disabled.
and/or upgrading Celery and its dependencies. (from v4.3.x to v5.3.1)
Related Issues and Possible Duplicates
Related Issues
RuntimeError: Set changed size during iteration with celery threads celery#7162
Possible Duplicates
RuntimeError: Set changed size during iteration with celery threads celery#7162
Environment & Settings
Celery version:
celery report Output:
Steps to Reproduce
Required Dependencies
Python Packages
pip freeze Output:
```
absl-py==1.4.0
amqp==5.1.1
asgiref==3.4.1
astunparse==1.6.3
backcall==0.2.0
backports.zoneinfo==0.2.1
beautifulsoup4==4.9.3
billiard==4.1.0
blis==0.7.9
botocore==1.30.0
cachetools==4.2.2
catalogue==2.0.6
celery==5.3.1
certifi==2021.10.8
charset-normalizer==2.0.7
clarifai==2.6.2
click==8.1.3
click-didyoumean==0.3.0
click-plugins==1.1.1
click-repl==0.2.0
colorama==0.4.3
colour==0.1.5
confection==0.0.3
configparser==3.8.1
corextopic==1.1
cx-Oracle==8.2.1
cycler==0.11.0
cymem==2.0.5
decorator==5.0.9
dill==0.3.4
distro==1.6.0
Django==3.2.19
docutils==0.15.2
elasticsearch==7.16.2
et-xmlfile==1.1.0
flashtext==2.7
flatbuffers==23.3.3
flower==1.2.0
ftfy==6.0.3
future==0.18.2
gast==0.3.3
grpcio==1.32.0
h5py==2.10.0
hickle==4.0.4
httplib2==0.19.1
humanize==3.13.1
idna==3.3
ijson==3.1.4
inflection==0.5.1
jedi==0.18.0
Jinja2==3.0.1
jmespath==0.10.0
joblib==1.0.1
jsonschema==2.6.0
kiwisolver==1.3.2
kombu==5.3.1
langcodes==3.3.0
libclang==16.0.0
lxml==4.6.3
Markdown==3.3.4
MarkupSafe==2.0.1
mongoengine==0.23.1
mysqlclient==2.0.3
networkx==2.6.3
numpy==1.22.2
opt-einsum==3.3.0
packaging==21.0
pandas==1.3.2
parso==0.8.2
pathy==0.6.0
pendulum==2.1.2
pexpect==4.8.0
Pillow==9.4.0
preshed==3.0.5
prompt-toolkit==3.0.20
protobuf==3.17.3
psycopg2-binary==2.9.1
ptyprocess==0.7.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycurl==7.44.1
pydantic==1.8.2
Pygments==2.10.0
pylibmc==1.6.3
pymemcache==3.5.0
pymongo==3.12.0
pyparsing==2.4.7
python-dateutil==2.8.2
python-Levenshtein==0.12.2
pytz==2022.5
pytzdata==2020.1
PyYAML==5.4.1
redis==3.5.3
regex==2022.10.31
requests==2.26.0
rsa==4.7.2
scipy==1.7.1
setuptools-rust==0.12.1
simplejson==3.17.5
six==1.15.0
smart-open==5.2.1
soupsieve==2.2.1
sqlparse==0.4.1
srsly==2.4.5
termcolor==1.1.0
thinc==8.1.5
threadpoolctl==2.2.0
ThreeScalePY==2.6.0
tokenizers==0.12.1
toml==0.10.2
tornado==6.1
tqdm==4.62.2
traitlets==5.1.0
treelib==1.6.1
typer==0.4.2
typing_extensions==4.1.1
tzdata==2023.3
uritemplate==3.0.1
urllib3==1.26.7
vine==5.0.0
wasabi==0.10.1
wcwidth==0.2.5
websocket-client==0.48.0
wrapt==1.12.1
zope.event==4.6
zope.interface==5.5.2
```
Other Dependencies
Minimally Reproducible Test Case
It's not that easily reproducible as it happens after running and processing a bunch of tasks for a few hours
Expected Behavior
Keep running the worker non-stop
Actual Behavior
At some point, the worker throws the error and gracefully stops all the processes, as it claims to be an Unrecoverable Error.