Celery worker aborts without any message #8310

Closed · 11 tasks done
jrief opened this issue Jun 12, 2023 · 8 comments

Comments


jrief commented Jun 12, 2023

Checklist

  • I have verified that the issue exists against the main branch of Celery.
  • This has already been asked to the discussions forum first.
  • I have read the relevant section in the contribution guide on reporting bugs.
  • I have checked the issues list for similar or identical bug reports.
  • I have checked the pull requests list for existing proposed fixes.
  • I have checked the commit log to find out if the bug was already fixed in the main branch.
  • I have included all related issues and possible duplicate issues in this issue (If there are none, check this box anyway).

Mandatory Debugging Information

  • I have included the output of celery -A proj report in the issue (if you are not able to do this, then at least specify the Celery version affected).
  • I have verified that the issue exists against the main branch of Celery.
  • I have included the contents of pip freeze in the issue.
  • I have included all the versions of all the external dependencies required to reproduce this bug.
celery -A uniweb report

software -> celery:5.2.7 (dawn-chorus) kombu:5.3.0 py:3.9.17
            billiard:3.6.4.0 redis:4.5.4
platform -> system:Linux arch:64bit
            kernel version:3.10.0-1160.88.1.el7.x86_64 imp:CPython
loader   -> celery.loaders.app.AppLoader
settings -> transport:sentinel results:redis://:**@django-cms-test-redis-headless:6379/7

Our setup is Celery with Redis over Sentinel in a Kubernetes cluster. After upgrading Redis from version 4.5.4 to 4.5.5, the Celery worker crashed without any message. The return code after crashing is 0.

The same image running on Docker-Swarm without Sentinel works flawlessly.
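
For context, a broker configured over Sentinel looks roughly like the sketch below; all host names, the master name and the result-backend URL are placeholders, not our actual settings:

from celery import Celery

# Minimal sketch of a Celery app using the Sentinel transport.
# Host names, the master name and the backend URL are placeholders.
app = Celery(
    "uniweb",
    broker="sentinel://sentinel-0:26379;sentinel://sentinel-1:26379;sentinel://sentinel-2:26379",
    backend="redis://redis-master:6379/7",
)
app.conf.broker_transport_options = {"master_name": "mymaster"}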

pip freeze gives (I removed irrelevant packages):

billiard==3.6.4.0
…
django-redis==5.2.0
…
hiredis==2.2.3
…
kombu==5.3.0
…
redis==4.5.5  # <= the culprit; downgrading to 4.5.4 makes Celery work fine (see below)
requests==2.31.0
…
urllib3==1.26.16
…
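
Until this is resolved, the workaround is simply to pin the client back to the last known-good release, e.g.:

pip install "redis==4.5.4"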

#8268 seems unrelated, because it has a different message.

auvipy (Member) commented Jun 12, 2023

Our setup is Celery with Redis over Sentinel in a Kubernetes cluster. After upgrading Redis from version 4.5.4 to 4.5.5 the Celery worker crashed without any message. Return code after crashing is 0.

Should we restrict that version of redis in the next point release?
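
For illustration, such a restriction would be an exclusion pin in the redis extra along these lines (the file path and surrounding version bounds are assumptions, not the actual change proposed for the point release):

# hypothetical entry in requirements/extras/redis.txt; exact bounds are assumptions
redis>=4.5.2,!=4.5.5,<5.0.0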

auvipy added this to the 5.3.x milestone Jun 12, 2023
jrief (Author) commented Jun 12, 2023

I should have mentioned redis-py. A hint about this would be useful; it took me almost a day to find the culprit. As I said, this only happens on K8s using Sentinel, not in a plain old Docker container.

auvipy mentioned this issue Jun 15, 2023
auvipy (Member) commented Jun 15, 2023

Are you OK with #8317?

sevdog (Contributor) commented Jun 19, 2023

What in redis is causing this crash? Is there an open issue on redis-py that could help?

jrief (Author) commented Jun 19, 2023

I'm sorry. I have no expertise to debug this.

From what I've seen in their changelog, a lot has changed in version 4.5.5.

sevdog (Contributor) commented Jun 20, 2023

I ask because on my system, with the following dependencies, I am not experiencing any problems with the workers (but I am not yet using Sentinel):

celery==5.2.7
kombu==5.2.4
redis==4.5.5
hiredis==2.2.3

Maybe a change in the 5.3 series of celery/kombu caused this issue?

Do you have any DEBUG-level logs of the crashes you experienced?
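
For reference, the worker can be started with debug logging to capture the failure (the app name is a placeholder):

celery -A proj worker --loglevel=DEBUG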

sevdog (Contributor) commented Jun 20, 2023

Found the problem:

celery.worker - start - CRITICAL - Unrecoverable error: TypeError("SentinelManagedConnection.read_response() got an unexpected keyword argument 'disconnect_on_error'")
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/celery/worker/worker.py", line 203, in start
    self.blueprint.start(self)
  File "/opt/venv/lib/python3.10/site-packages/celery/bootsteps.py", line 116, in start
    step.start(parent)
  File "/opt/venv/lib/python3.10/site-packages/celery/bootsteps.py", line 365, in start
    return self.obj.start()
  File "/opt/venv/lib/python3.10/site-packages/celery/worker/consumer/consumer.py", line 332, in start
    blueprint.start(self)
  File "/opt/venv/lib/python3.10/site-packages/celery/bootsteps.py", line 116, in start
    step.start(parent)
  File "/opt/venv/lib/python3.10/site-packages/celery/worker/consumer/consumer.py", line 628, in start
    c.loop(*c.loop_args())
  File "/opt/venv/lib/python3.10/site-packages/celery/worker/loops.py", line 97, in asynloop
    next(loop)
  File "/opt/venv/lib/python3.10/site-packages/kombu/asynchronous/hub.py", line 373, in create_loop
    cb(*cbargs)
  File "/opt/venv/lib/python3.10/site-packages/kombu/transport/redis.py", line 1336, in on_readable
    self.cycle.on_readable(fileno)
  File "/opt/venv/lib/python3.10/site-packages/kombu/transport/redis.py", line 566, in on_readable
    chan.handlers[type]()
  File "/opt/venv/lib/python3.10/site-packages/kombu/transport/redis.py", line 910, in _receive
    ret.append(self._receive_one(c))
  File "/opt/venv/lib/python3.10/site-packages/kombu/transport/redis.py", line 920, in _receive_one
    response = c.parse_response()
  File "/opt/venv/lib/python3.10/site-packages/redis/client.py", line 1542, in parse_response
    response = self._execute(conn, try_read)
  File "/opt/venv/lib/python3.10/site-packages/redis/client.py", line 1518, in _execute
    return conn.retry.call_with_retry(
  File "/opt/venv/lib/python3.10/site-packages/redis/retry.py", line 46, in call_with_retry
    return do()
  File "/opt/venv/lib/python3.10/site-packages/redis/client.py", line 1519, in <lambda>
    lambda: command(*args, **kwargs),
  File "/opt/venv/lib/python3.10/site-packages/redis/client.py", line 1540, in try_read
    return conn.read_response(disconnect_on_error=False)
TypeError: SentinelManagedConnection.read_response() got an unexpected keyword argument 'disconnect_on_error'

This was caused by redis/redis-py@c0833f6, which changed how redis-py handles connections internally; unfortunately, the Sentinel wrapper was not updated to reflect that change (see SentinelManagedConnection.read_response@4.5.5).

The issue is fixed on the master branch in redis/redis-py@35b7e09.
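
To illustrate the failure mode, here is a minimal sketch (the class and method names match the traceback, but the bodies are simplified stand-ins, not the actual redis-py code): the base class's read_response() gained the disconnect_on_error keyword in 4.5.5, while the Sentinel subclass kept the old signature, so any caller passing the new keyword blows up exactly as above.

class Connection:
    def read_response(self, disconnect_on_error=True):
        # the newer base class accepts the extra keyword argument
        return "OK"

class SentinelManagedConnection(Connection):
    def read_response(self):
        # the override still uses the pre-4.5.5 signature
        return super().read_response()

conn = SentinelManagedConnection()
try:
    # callers updated for the new keyword fail on the Sentinel subclass
    conn.read_response(disconnect_on_error=False)
except TypeError as exc:
    print(exc)  # ... got an unexpected keyword argument 'disconnect_on_error'

The referenced fix updates the subclass signature so the new keyword is accepted.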
