Celery does not start all amqp connections with heartbeats #7250
Comments
I have the same issue. RabbitMQ won't close dead connections, so millions of dead connections pile up and make RabbitMQ slow. If I set the heartbeat in the AMQP transport, RabbitMQ closes the connections because Celery does not send AMQP heartbeats on them. Is there a workaround for this?
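I have some new discoveries about my environment: with Pool = eventlet, heartbeat_check is registered at load time; with Pool = threads it is not, so I register it myself with a consumer bootstep:

from celery import Celery, bootsteps
app = Celery('proj')  # your existing app
HEARTBEAT = 60  # seconds; should match the negotiated heartbeat

class AMQPHeart(bootsteps.StartStopStep):
    requires = ('celery.worker.consumer:Connection',)  # needs the broker connection
    def start(self, c):
        tick = c.connection.heartbeat_check
        # run heartbeat_check(2) every HEARTBEAT / 2 seconds
        c.timer.call_repeatedly(HEARTBEAT / 2, tick, (2,))

# do not forget to add the step
app.steps['consumer'].add(AMQPHeart)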
I have exactly the same problem. Our RabbitMQ instance sometimes has more than 60k connections for only 300 consumers. For the same consumer, some connections have the correct default heartbeat set and some don't. Those connections are never closed properly and make the RMQ instance's RAM usage grow, leading to OOM/memory alarms. We had to write a bash script that periodically kills connections on the RMQ instance by pinging every IP:port associated with an RMQ connection. This is very annoying.
Hmm, the connection still dropped after adding this bootstep... but probably because I was looking at the publisher side. I had to override AMQP when initializing the Celery instance to make sure heartbeats are sent on the producer side as well:

app = Celery('myapp', amqp='myproj.amqp_util:AMQP')

myproj/amqp_util.py:
import traceback

from celery.app.amqp import AMQP as BaseAMQP
from celery.utils.timer2 import Timer
from kombu import Connection as BaseConnection, Consumer, Producer

HEARTBEAT_RATE = 10
DBG = False  # set to True to print every tick and every error


def _enable_amqheartbeats(timer, connection, rate=2.0):
    '''
    src: celery.worker.loops
    '''
    heartbeat_error = [None]
    if not connection:
        return heartbeat_error
    heartbeat = connection.get_heartbeat_interval()  # negotiated
    if not (heartbeat and connection.supports_heartbeats):
        return heartbeat_error

    def tick(rate):
        try:
            if DBG:
                print('[dbg_sent_hb]-----------------------------------------')
            connection.heartbeat_check(rate)
        except Exception as e:
            # heartbeat_error is passed by reference and can be updated;
            # no append here, the list should stay fixed at size 1
            if DBG:
                print(f'[heartbeat_check_exception]{e}')
                traceback.print_exc()
            heartbeat_error[0] = e

    timer.call_repeatedly(heartbeat / rate, tick, (rate,))
    return heartbeat_error


class Connection(BaseConnection):
    # celery.utils.timer2.Timer is a thread that fires its own entries;
    # kombu.asynchronous.timer.Timer only fires when an event loop drives it
    timer = Timer()

    def _ensure_connection(self, *a, **kw):
        # forward kombu's retry arguments instead of dropping them
        ret = super()._ensure_connection(*a, **kw)
        _enable_amqheartbeats(self.timer, self, HEARTBEAT_RATE / 2.)
        return ret


class AMQP(BaseAMQP):
    Connection = Connection
    Consumer = Consumer
    Producer = Producer
    #: compat alias to Connection
    BrokerConnection = Connection
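To make the guard in `_enable_amqheartbeats` concrete, here is a broker-free sketch that runs the same logic against stand-ins. `FakeConnection`, `RecordingTimer` and `enable_heartbeats` are hypothetical names, not kombu or Celery classes; they only mimic the members the helper touches (`get_heartbeat_interval()`, `supports_heartbeats`, `heartbeat_check()` and the timer's `call_repeatedly(secs, fun, args)`). It shows that a connection whose negotiated heartbeat is 0 never gets a tick scheduled, while a 60 s heartbeat with `rate=2` is checked every 30 s.

```python
class FakeConnection:
    """Hypothetical stand-in for a kombu Connection (no broker needed)."""

    supports_heartbeats = True

    def __init__(self, negotiated):
        self.negotiated = negotiated
        self.checks = []  # rates passed to heartbeat_check

    def get_heartbeat_interval(self):
        return self.negotiated

    def heartbeat_check(self, rate=2):
        self.checks.append(rate)


class RecordingTimer:
    """Hypothetical timer that records schedules instead of running them."""

    def __init__(self):
        self.scheduled = []  # (interval, fun, args)

    def call_repeatedly(self, secs, fun, args=()):
        self.scheduled.append((secs, fun, args))


def enable_heartbeats(timer, connection, rate=2.0):
    """Same guard as _enable_amqheartbeats, minus the error capture."""
    heartbeat = connection.get_heartbeat_interval()  # negotiated value
    if not (heartbeat and connection.supports_heartbeats):
        return False  # heartbeat 0: nothing is ever scheduled
    timer.call_repeatedly(heartbeat / rate, connection.heartbeat_check, (rate,))
    return True


timer = RecordingTimer()
no_hb, with_hb = FakeConnection(0), FakeConnection(60)
print(enable_heartbeats(timer, no_hb))    # False
print(enable_heartbeats(timer, with_hb))  # True
secs, fun, args = timer.scheduled[0]
fun(*args)  # simulate a single tick
print(secs, with_hb.checks)  # 30.0 [2.0]
```

With the override above, `rate` is `HEARTBEAT_RATE / 2.` (5.0), so a 60 s heartbeat would be checked every 12 s instead.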
In consumer/events.py:

def start(self, c):
    # flush events sent while connection was down.
    prev = self._close(c)
    dis = c.event_dispatcher = c.app.events.Dispatcher(
        c.connection_for_write(),
        ...

Is it necessary to pass a heartbeat to c.connection_for_write()? In kombu/connection.py there is a default_channel property; there, too, a connection is created without a heartbeat.
The AMQP heartbeat tick is started conditionally with
https://www.rabbitmq.com/docs/heartbeats That is, if the Celery process ends unexpectedly, all these client-side timer checks stop running, and the connection remains.
If the heartbeat on RabbitMQ were set properly for these connections, RabbitMQ would detect that your Celery process is gone and close the connections. What we can see in the first two screenshots is that the heartbeat interval is not set at all for some connections inside the same Celery worker: some have it, some don't (a heartbeat value of 0 means the connection is never checked and stays in RMQ memory even after the Celery worker is no longer up). We don't understand why the different connections established by the same Celery worker don't all respect the heartbeat interval given by the RabbitMQ instance. Fortunately, forcing it with
Maybe because the desired parameter is not passed everywhere a connection is opened.
It helped. Thank you very much!
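The "heartbeat 0" connections seen in the screenshots follow from how the interval is negotiated. The RabbitMQ heartbeats guide states the rule: if either side proposes 0, the greater value is used, otherwise the smaller one. As far as we can tell, py-amqp additionally forces 0 whenever the client itself asked for 0, so a connection opened without a client heartbeat ends up with none even though the server proposes 60 s, while the main consumer connection (which asks for `broker_heartbeat`, 120 s by default) settles on the server's 60 s. `negotiate_heartbeat` is a hypothetical sketch of that combined rule, not library code.

```python
def negotiate_heartbeat(client: float, server: float) -> float:
    """Hypothetical sketch of AMQP heartbeat negotiation (not library code).

    client: interval the client asks for (0 = none requested)
    server: interval the broker proposes in connection.tune
    """
    if client == 0 or server == 0:
        # RabbitMQ guide: if either value is 0, the greater one is used
        value = max(client, server)
    else:
        # otherwise the smaller one wins
        value = min(client, server)
    # assumption: like py-amqp, a client that asked for 0 disables heartbeats
    if client == 0:
        value = 0
    return value


print(negotiate_heartbeat(120, 60))  # main consumer connection: 60
print(negotiate_heartbeat(0, 60))    # e.g. a mingle connection: 0
```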
Environment & Settings
Celery version: 5.2.3 (dawn-chorus)
Steps to Reproduce
Required Dependencies
Python Packages
Other Dependencies
N/A
Minimally Reproducible Test Case
Expected Behavior
I think all of the RabbitMQ connections Celery creates should have AMQP heartbeats.
Why does this matter? When machines connected to RabbitMQ are terminated (more likely in an ephemeral cloud environment), their connections never close on the RabbitMQ side. Eventually the RabbitMQ service may run out of available connections or, worse, crash.
Actual Behavior
The main worker connection gets a 60 s heartbeat, but the supplemental worker connections (such as mingle) have no heartbeat.
Passing heartbeat in broker_transport_options applies the heartbeat to all connections.
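A likely reason this workaround reaches every connection: Celery passes `broker_transport_options` to each kombu `Connection` it creates, and, as far as we can tell, kombu's py-amqp transport builds its connection arguments with the transport options laid over the per-connection values, so a `heartbeat` key there overrides the 0 that mingle and the other supplemental connections would otherwise request. The sketch below mimics that merge with plain dicts; `build_connect_opts` is a hypothetical name, not a kombu function. In a project the setting itself is `app.conf.broker_transport_options = {'heartbeat': 60}`. This only negotiates the interval; something on the client still has to send heartbeats, or RabbitMQ closes the idle connections.

```python
def build_connect_opts(conn_heartbeat, transport_options):
    """Hypothetical mimic of how the py-amqp transport merges options.

    Per-connection values come first; transport_options are layered on
    top, so any key present there (e.g. 'heartbeat') wins.
    """
    return dict({'heartbeat': conn_heartbeat}, **(transport_options or {}))


# supplemental connection (e.g. mingle) created without a heartbeat
print(build_connect_opts(0, None))               # {'heartbeat': 0}
print(build_connect_opts(0, {'heartbeat': 60}))  # {'heartbeat': 60}
```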