Redis result backend connections leak #6819

Closed
11 of 18 tasks
ronlut opened this issue Jun 21, 2021 · 50 comments

@ronlut

ronlut commented Jun 21, 2021

Checklist

  • I have verified that the issue exists against the master branch of Celery.
  • This has already been asked to the discussion group first.
  • I have read the relevant section in the
    contribution guide
    on reporting bugs.
  • I have checked the issues list
    for similar or identical bug reports.
  • I have checked the pull requests list
    for existing proposed fixes.
  • I have checked the commit log
    to find out if the bug was already fixed in the master branch.
  • I have included all related issues and possible duplicate issues
    in this issue (If there are none, check this box anyway).

Mandatory Debugging Information

  • I have included the output of celery -A proj report in the issue.
    (if you are not able to do this, then at least specify the Celery
    version affected).
  • I have verified that the issue exists against the master branch of Celery.
  • I have included the contents of pip freeze in the issue.
  • I have included all the versions of all the external dependencies required
    to reproduce this bug.

Optional Debugging Information

  • I have tried reproducing the issue on more than one Python version
    and/or implementation.
  • I have tried reproducing the issue on more than one message broker and/or
    result backend.
  • I have tried reproducing the issue on more than one version of the message
    broker and/or result backend.
  • I have tried reproducing the issue on more than one operating system.
  • I have tried reproducing the issue on more than one workers pool.
  • I have tried reproducing the issue with autoscaling, retries,
    ETA/Countdown & rate limits disabled.
  • I have tried reproducing the issue after downgrading
    and/or upgrading Celery and its dependencies.

Related Issues and Possible Duplicates

Related Issues

- #4465

Possible Duplicates

  • None

Environment & Settings

Celery version:

celery report Output:

software -> celery:5.1.0 (sun-harmonics) kombu:5.1.0 py:3.8.8
            billiard:3.6.4.0 redis:3.5.3
platform -> system:Darwin arch:64bit
            kernel version:20.4.0 imp:CPython
loader   -> celery.loaders.app.AppLoader
settings -> transport:redis results:redis:///

broker_url: 'redis://localhost:6379//'
result_backend: 'redis:///'
task_queue_max_priority: 10
deprecated_settings: None

Steps to Reproduce

Required Dependencies

  • Minimal Python Version: N/A or Unknown
  • Minimal Celery Version: N/A or Unknown
  • Minimal Kombu Version: N/A or Unknown
  • Minimal Broker Version: N/A or Unknown
  • Minimal Result Backend Version: N/A or Unknown
  • Minimal OS and/or Kernel Version: N/A or Unknown
  • Minimal Broker Client Version: N/A or Unknown
  • Minimal Result Backend Client Version: N/A or Unknown

Python Packages

pip freeze Output:

amqp==5.0.6
billiard==3.6.4.0
celery==5.1.0
certifi==2021.5.30
chardet==4.0.0
click==7.1.2
click-didyoumean==0.0.3
click-plugins==1.1.1
click-repl==0.2.0
dnspython==1.16.0
eventlet==0.31.0
Flask==2.0.1
greenlet==1.1.0
idna==2.10
itsdangerous==2.0.1
Jinja2==3.0.1
kombu==5.1.0
MarkupSafe==2.0.1
prompt-toolkit==3.0.18
pytz==2021.1
redis==3.5.3
requests==2.25.1
six==1.16.0
urllib3==1.26.5
vine==5.0.0
wcwidth==0.2.5
Werkzeug==2.0.1

Other Dependencies

N/A

Minimally Reproducible Test Case

tasks.py

import time
from celery import Celery

app = Celery('tasks', backend='redis://', broker='redis://', task_queue_max_priority=10)


@app.task
def sleep():
    time.sleep(1)

web.py

from flask import Flask
from tasks import sleep
app = Flask(__name__)

@app.route('/')
def hello_world():
    result = sleep.apply_async()
    result.wait()
    return '', 200

if __name__ == '__main__':
    app.run(port=4444)

simulate.py

import requests
for i in range(20):
    requests.get("http://localhost:4444")
  1. Install the dependencies: pip install celery[redis] requests flask
  2. Start redis:
    docker run -p 6379:6379 --name redis redis
  3. Run celery: celery -A tasks worker --loglevel=INFO
  4. Run the web server: python web.py
  5. Connect to redis and check the client count:
    docker exec -it redis /bin/bash
    redis-cli
    info Clients

Note the connected_clients number

  6. Simulate requests: python simulate.py
  7. Run info Clients again in redis-cli (or use the Python helper shown after this list).
    Note the connected_clients number, which is now a lot higher.
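To watch the client count from Python instead of redis-cli, a small helper along these lines works (assuming redis-py is installed, as in the pip freeze above, and Redis runs on localhost:6379):

import time

import redis

r = redis.Redis()  # localhost:6379 by default
while True:
    # same figure as connected_clients in `redis-cli info Clients`; Ctrl-C to stop
    print("connected_clients:", r.info("clients")["connected_clients"])
    time.sleep(1)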

Expected Behavior

Connections close after getting/waiting for a result

Actual Behavior

I'm experiencing a very strange problem with redis connections staying open after each request (after each result.wait() or result.get()).

Help would be much appreciated.
Running Flask with threaded=False works around the issue.
Obviously this is simplified reproduction code; our real environment is gunicorn, eventlet, Flask, Redis, and Celery.
In production we reach 60k open connections to Redis quite fast, and we have had to restart our server a few times to clear the leaked connections.


@pomo-mondreganto

pomo-mondreganto commented Jun 21, 2021

I've just experienced the same issue. Setting redis_max_connections does not fix it. We're using gevent workers and RabbitMQ as the broker, so the leak is definitely in the result backend part. The only app writing to the affected Redis DB is Celery. Redis's CLIENT LIST looks like this (truncated):

id=28 addr=172.24.0.13:58812 fd=32 name= age=309 idle=295 flags=N db=1 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=exec
id=755 addr=172.24.0.13:60638 fd=437 name= age=9 idle=9 flags=N db=1 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=unsubscribe
id=756 addr=172.24.0.13:60640 fd=438 name= age=9 idle=9 flags=N db=1 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=exec
id=757 addr=172.24.0.13:60642 fd=439 name= age=9 idle=9 flags=N db=1 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=unsubscribe
id=758 addr=172.24.0.13:60644 fd=440 name= age=9 idle=9 flags=N db=1 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=unsubscribe
id=759 addr=172.24.0.13:60646 fd=441 name= age=9 idle=9 flags=N db=1 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=unsubscribe
id=760 addr=172.24.0.13:60648 fd=442 name= age=9 idle=9 flags=N db=1 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=unsubscribe
id=761 addr=172.24.0.13:60650 fd=443 name= age=9 idle=9 flags=N db=1 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=unsubscribe
id=762 addr=172.24.0.13:60652 fd=444 name= age=9 idle=9 flags=N db=1 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=unsubscribe
id=29 addr=172.24.0.13:58814 fd=33 name= age=309 idle=309 flags=N db=1 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=exec
id=30 addr=172.24.0.13:58816 fd=34 name= age=309 idle=309 flags=N db=1 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=exec
id=31 addr=172.24.0.13:58818 fd=35 name= age=309 idle=29 flags=N db=1 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=exec
id=20 addr=172.24.0.13:58792 fd=24 name= age=309 idle=230 flags=N db=1 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=unsubscribe
id=132 addr=172.24.0.13:59084 fd=91 name= age=270 idle=230 flags=N db=1 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=unsubscribe
id=133 addr=172.24.0.13:59086 fd=92 name= age=270 idle=230 flags=N db=1 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=unsubscribe
id=21 addr=172.24.0.13:58794 fd=25 name= age=309 idle=230 flags=N db=1 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=unsubscribe
id=468 addr=172.24.0.13:59928 fd=270 name= age=130 idle=129 flags=N db=1 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=unsubscribe
id=469 addr=172.24.0.13:59930 fd=271 name= age=130 idle=110 flags=N db=1 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=unsubscribe
id=470 addr=172.24.0.13:59932 fd=272 name= age=130 idle=130 flags=N db=1 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=exec
id=279 addr=172.24.0.13:59466 fd=176 name= age=210 idle=210 flags=N db=1 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=exec
id=280 addr=172.24.0.13:59468 fd=177 name= age=210 idle=210 flags=N db=1 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=exec
id=306 addr=172.24.0.13:59532 fd=179 name= age=191 idle=190 flags=N db=1 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=unsubscribe
id=353 addr=172.24.0.13:59648 fd=202 name= age=171 idle=110 flags=N db=1 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=unsubscribe
id=354 addr=172.24.0.13:59650 fd=203 name= age=170 idle=110 flags=N db=1 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=unsubscribe

P.S. Celery report output:

software -> celery:5.1.1 (sun-harmonics) kombu:5.1.0 py:3.9.5
            billiard:3.6.4.0 librabbitmq:2.0.0
platform -> system:Linux arch:64bit, ELF
            kernel version:5.10.25-linuxkit imp:CPython
loader   -> celery.loaders.app.AppLoader
settings -> transport:librabbitmq results:redis://:**@redis:6379/1

include: ['tasks.actions', 'tasks.handlers']
deprecated_settings: None
broker_url: 'amqp://forcad:********@rabbitmq:5672/forcad'
result_backend: 'redis://:********@redis:6379/1'
timezone: 'Europe/Moscow'
worker_prefetch_multiplier: 1
redis_socket_timeout: 10
redis_socket_keepalive: True
redis_retry_on_timeout: True
accept_content: ['pickle']
result_serializer: 'pickle'
task_serializer: 'pickle'
redis_max_connections: 10

@ronlut
Author

ronlut commented Jun 30, 2021

An update:
I tried making the oid and backend properties of Celery shared between all threads by using a double-checked locking pattern (if-lock-if), but I ran into trouble: I think other places also rely on backend and oid being per-thread.
I haven't had enough time to play with that further yet.

For now I have set the Redis idle connection timeout to 60 seconds, which kills all leaked connections after a minute.
To do that, in redis-cli: config set timeout 60.
Or use the equivalent mechanism in your cloud provider (a parameter group for AWS ElastiCache).
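A rough sketch of the if-lock-if (double-checked locking) idea mentioned above; the attribute and lock names are illustrative, not Celery's actual ones:

import threading

class Celery:
    def __init__(self):
        self._backend = None
        self._backend_lock = threading.Lock()

    def _get_backend(self):
        ...  # build the real result backend here

    @property
    def backend(self):
        # One backend shared by all threads via double-checked locking.
        if self._backend is None:              # fast path once initialised
            with self._backend_lock:
                if self._backend is None:      # re-check while holding the lock
                    self._backend = self._get_backend()
        return self._backend

As noted above, other parts of Celery assume backend (and oid) are per-thread, so this alone may break those assumptions.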

@pomo-mondreganto

Is there any progress on this?

@reederz

reederz commented Sep 15, 2021

5.0.2 is the latest version that doesn't have this leak (at least for me). Something changed after that.

@d0d0

d0d0 commented Oct 5, 2021

I agree with @reederz: the 5.0.2 version is fine, but after upgrading to 5.0.3 I am hitting the connection limit.

EDIT: definitely broken by PR #6416. I reverted it and connections no longer leak.

@auvipy
Member

auvipy commented Oct 5, 2021

@matusvalo

@d0d0

d0d0 commented Oct 6, 2021

Well, I played around with it and found out that if we change

diff --git a/celery/app/base.py b/celery/app/base.py
index a00d46513..cc244e77d 100644
--- a/celery/app/base.py
+++ b/celery/app/base.py
@@ -1243,7 +1243,7 @@ class Celery:
         """AMQP related functionality: :class:`~@amqp`."""
         return instantiate(self.amqp_cls, app=self)
 
-    @property
+    @cached_property
     def backend(self):
         """Current backend instance."""
         try:

connections no longer leak. I'm not sure if it is a valid fix, but at least it makes Celery usable with Redis again.

What do you think @matusvalo?
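To illustrate the difference, a rough standalone sketch (functools.cached_property stands in for the one Celery imports): threading.local yields a separate backend object per thread, while cached_property caches a single instance on the app that all threads share.

import threading
from functools import cached_property

class App:
    def __init__(self):
        self._local = threading.local()

    @property
    def per_thread_backend(self):
        # roughly what the @property version above does
        try:
            return self._local.backend
        except AttributeError:
            self._local.backend = object()  # stands in for _get_backend()
            return self._local.backend

    @cached_property
    def shared_backend(self):
        # the patched version: created once, cached on the instance
        return object()  # stands in for _get_backend()

app = App()
seen = {}

def probe(name):
    seen[name] = (app.per_thread_backend, app.shared_backend)

worker = threading.Thread(target=probe, args=("worker",))
worker.start()
worker.join()
probe("main")

assert seen["main"][0] is not seen["worker"][0]  # two distinct backends -> two sets of connections in real code
assert seen["main"][1] is seen["worker"][1]      # one backend shared across threads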

@auvipy
Member

auvipy commented Oct 6, 2021

celery/kombu@96ca00f was also added; you can try both.

@d0d0

d0d0 commented Oct 6, 2021

celery/kombu@96ca00f was also added; you can try both.

Well, it did not help; only the cached_property change for backend does.

@auvipy
Member

auvipy commented Oct 6, 2021

You can submit your proposed change as well, since it is solving the problem.

d0d0 added a commit to d0d0/celery that referenced this issue Oct 6, 2021
@matusvalo
Member

Let me have a look at the issue. I will get back ASAP when I find something.

@bright2227

The 'leak' is caused by Flask's threading mechanism.

When Flask runs with the default threaded=True, it creates a new thread to handle every incoming request.

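# CPython's socketserver.ThreadingMixIn, which Werkzeug's threaded dev server builds on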
class ThreadingMixIn:
    daemon_threads = False
    block_on_close = True
    _threads = _NoThreads()

    def process_request(self, request, client_address):
        if self.block_on_close:
            vars(self).setdefault('_threads', _Threads())
        t = threading.Thread(target = self.process_request_thread,
                             args = (request, client_address))
        t.daemon = self.daemon_threads
        self._threads.append(t)
        t.start()

For Celery versions 5.0.3 and above, the backend is created and used per thread. It uses threading.local to keep a separate backend for every thread.

@property
def backend(self):
    """Current backend instance."""
    try:
        return self._local.backend
    except AttributeError:
        self._local.backend = new_backend = self._get_backend()
        return new_backend

Because every request thread is new, Celery creates a new backend and new connections every time one of these requests sends a task.

In my opinion, I am not sure this is a problem with Celery. I had thought Flask used a thread pool to reuse threads.

@pomo-mondreganto

pomo-mondreganto commented Oct 24, 2021

@bright2227

As per my comment, the leak exists in a pure Celery setup with gevent workers, too, making it essentially a problem in Celery itself.

@matusvalo
Member

matusvalo commented Oct 24, 2021

I checked the backends and they seemed to be destroyed properly (the __del__ method was called), but I still was not able to find why the connections were not closed. I need to spend more time on the investigation.

@d0d0

d0d0 commented Oct 24, 2021

I checked the backends and they seemed to be destroyed properly (the __del__ method was called), but I still was not able to find why the connections were not closed. I need to spend more time on the investigation.

I was thinking: could it be caused by WSL? I am running Celery under WSL Ubuntu 20.04 (Windows Server 2019) using gevent as the pool.

@matusvalo
Member

matusvalo commented Oct 25, 2021

OK, I have checked the issue more deeply using the master branch of Celery, and here are my findings:

Running the reproducer over flask develop server

As mentioned by @bright2227, the Flask development server spawns a new thread for each HTTP request. This causes the creation of new RedisBackend and ResultConsumer instances, both of which are stored in thread-local storage. The ResultConsumer instance takes a connection from the Redis connection pool and keeps it allocated until it is destroyed. After a request is served, the thread is destroyed and with it the pointers to ResultConsumer and RedisBackend. Hence, both instances wait to be destroyed by the garbage collector; the GC destroys them after some time, and with them the Redis connection is returned to the connection pool. See the Redis connection information for 3 clients pushing data to Flask:

# omitted multiple runs of redis-cli with increasing connected_clients
matus@matus-debian:~$ redis-cli info Clients
# Clients
connected_clients:497
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:95
client_recent_max_output_buffer:0
blocked_clients:1
tracking_clients:0
clients_in_timeout_table:1

matus@matus-debian:~$ redis-cli info Clients
# Clients
connected_clients:77
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:95
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0

It is clearly seen that for some time the count rose to more than 400 connections, but after that the GC destroyed some of the unused instances and freed the connections back to the pool. To be honest, I am totally fine with this behaviour for the Flask development web server, since it is not intended to be used in production.
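One way to see that GC dependence directly is to add a debug route to the web.py reproducer that forces a collection inside the web process; connected_clients should drop right after it is hit (a sketch, the route name is arbitrary):

import gc

@app.route('/gc')
def force_gc():
    # Collects the RedisBackend/ResultConsumer instances left behind by
    # finished request threads; their Redis connections are freed with them.
    collected = gc.collect()
    return f'collected {collected} objects', 200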

Running the reproducer in gunicorn with threads

I have also checked a deployment with gunicorn using threads. I installed the latest stable version of gunicorn and ran web.py under gunicorn with the following command:

matus@matus-debian:~/dev/celery$ gunicorn --threads=5 web:app
[2021-10-25 22:26:44 +0200] [13162] [INFO] Starting gunicorn 20.1.0
[2021-10-25 22:26:44 +0200] [13162] [INFO] Listening at: http://127.0.0.1:8000 (13162)
[2021-10-25 22:26:44 +0200] [13162] [INFO] Using worker: gthread
[2021-10-25 22:26:44 +0200] [13169] [INFO] Booting worker with pid: 13169

I ran 3 parallel instances of simulate.py posting data indefinitely, and checked gunicorn with pstree to verify that the threads were spawned:

|-sshd---sshd---sshd---bash---screen---screen-+-bash---gunicorn---gunicorn---5*[{gunicorn}]
|                                             |-2*[bash---python]
|                                             |-bash---celery---4*[celery]
|                                             |-bash
|                                             `-bash---pstree

From the pstree snippet it can be seen that 5 threads were spawned to serve the requests. I waited for a longer time and the number of Redis connections stayed stable at 30:

matus@matus-debian:~$ redis-cli info Clients
# Clients
connected_clients:30
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:95
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0

So this deployment was running flawlessly. I also increased the load by lowering the sleep in the task:

@app.task
def sleep():
    time.sleep(0.01)

and the number of connections was still stable at 30, with 3 clients posting data to gunicorn. This deployment works 100% fine.

Hence, I was not able to reproduce any serious problem with connection leaks. I did not check the gevent case, so it is possible that this problem is specific to gevent deployments. @pomo-mondreganto @ronlut, could you provide a reproducer for the gevent case?

@bennullgraham

bennullgraham commented Nov 3, 2021

G'day folks, we've been seeing what I think is the same issue. We're doing daily reboots of our busiest production site to prevent hitting the OS file handle limit from all the Redis connections. This gets me out of bed each day with purpose.

Checking the Redis client list, we have thousands of old connections all with cmd=unsubscribe:

id=14458162  ...  age=70890  idle=70886  ...  events=r  cmd=unsubscribe  user=default
id=14460081  ...  age=61500  idle=61497  ...  events=r  cmd=unsubscribe  user=default
id=14463207  ...  age=56382  idle=56300  ...  events=r  cmd=unsubscribe  user=default
<snip>

These connections are all for the same Redis database, which contains keys of the pattern celery-task-meta-<uuid>.

There is a minimal repro over here with instructions: https://github.com/LivePreso/redis-leak

The issue is reproducible using gunicorn -k gevent but not gunicorn -k sync. It's also not reproducible on Celery 5.0.2 but becomes reproducible somewhere between 5.0.2 and 5.0.5 (the version we are currently using in production), and remains so on 5.1.2 (the version in the repro).

@bennullgraham

I've confirmed the repro at https://github.com/LivePreso/redis-leak still holds with the new 5.2.0 release.

@auvipy auvipy added this to the 5.2.x milestone Nov 16, 2021
@hiimdoublej-swag

Hello @matusvalo, I've created a new rebased PR (#7631) with the fix from #6895.
However, it introduced another problem, so maybe the #6895 fix isn't the right one.
I've put together a pure-Celery reproducible case here, without involving any web framework; it would be great if you could take a look and see if it gives you some ideas for fixing this bug.

@uzi0espil

uzi0espil commented Jul 27, 2022

I am facing the error but with a different setup. The Celery workers in my application run fine; however, if I chain two tasks to the main task, I receive this error. In addition, the error persists: the next time I run any task (even with no chained tasks) it still fails with the same error, and the only fix then is to restart Celery and Redis.

There are two strange things I noticed:

  1. If I chain only a single task to the main task, it works fine no matter how many times I run it.
  2. If I chain two tasks to the main task, the error only shows for the first task, while the other linked tasks run normally.

This is how I am chaining the tasks:

link_tasks  = []
link_tasks.append(task1.signature(args=(...)))
link_tasks.append(task2.signature(args=(...)))

main_task.apply_async((...),
                      kwargs=dict(...),
                      task_id=str(task.reference_id),
                      queue=queue,
                      priority=priority,
                      link_error=[error_handler.s()],
                      delivery_mode=2,
                      link=link_tasks or None)

I tried the following suggestions from above:

  • Setting the Redis timeout.
  • Reverting to Celery 5.0.2.

However, the error still persists.

@wochinge
Contributor

wochinge commented Aug 2, 2022

We've been running into the same problem since adding Redis as a result backend. A fix, or guidance on how to fix this, would be much appreciated.

@hiimdoublej-swag

hiimdoublej-swag commented Aug 3, 2022

@wochinge Can your fix pass the tests?
I had a fix similar to yours, but it didn't pass some tests in threaded environments.
That probably suggests we shouldn't be reusing the pubsub object from redis-py.
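For context, a rough standalone illustration of why idle cmd=unsubscribe clients can pile up: in redis-py, a pubsub object keeps a dedicated connection until it is explicitly closed (or garbage-collected), even after unsubscribing. The channel name below is only an example:

import redis

r = redis.Redis()
p = r.pubsub()
p.subscribe('celery-task-meta-example')    # the pubsub object gets its own connection
p.get_message(timeout=1)
p.unsubscribe('celery-task-meta-example')  # shows up as cmd=unsubscribe in CLIENT LIST
# Until close() is called (or the object is collected), that connection
# stays open on the Redis side.
p.close()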

@hiimdoublej-swag

@uzi0espil
I'd think that if the revert to 5.0.2 didn't work for you, then this is a different bug. None of my reproducible cases occurred with 5.0.2.

@wochinge
Contributor

wochinge commented Aug 3, 2022

@hiimdoublej-swag I want to investigate them today 👍🏻

@wochinge
Contributor

wochinge commented Aug 3, 2022

@hiimdoublej-swag Here are my findings so far.

I think the tests are not failing because of the change, but rather because the test itself is rather flaky. I ran test_multithread_producer (one of the failing tests) locally and it sometimes passes and sometimes doesn't.

We also don't think it's a connection leak; rather, since this PR, Redis connections are no longer shared between threads/greenlets/eventlets. Especially when using gevent/eventlet, this causes a spike in connections due to the larger pool size.

I've also received the warning below, which in my opinion tells us that redis-py is not as thread-safe as it should be:

[Screenshot from 2022-08-03 showing the redis-py warning]

Ideally it shouldn't be a big deal if we have one or two connections per eventlet/gevent green thread, but we still can't explain the spikes in our production system by that 😬

@wochinge
Contributor

wochinge commented Aug 3, 2022

In our Grafana logs we saw a huge spike in Redis connections after the tasks were finished. The connections were only closed once the results were deleted from the result backend, after the default result expiry time (1 day). Our quick fix is now to use ignore_result=True for all tasks where we don't require the result.
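For reference, the workaround looks roughly like this (assuming an existing Celery app named app, as in the tasks.py reproducer above):

@app.task(ignore_result=True)
def fire_and_forget():
    # no celery-task-meta-<id> key is written to the result backend for this task
    ...

# or globally, for every task:
# app.conf.task_ignore_result = True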

@auvipy
Member

auvipy commented Aug 4, 2022

Can you help improve the flaky test?

@wochinge
Contributor

wochinge commented Aug 4, 2022

I think the issue is that the test spawns so many threads, which are sometimes scheduled by the OS in a non-ideal way, so waiting times can be quite large and the test takes more time than normal. In my opinion it's not the most pressing issue, as the @flaky decoration (with the timeout / re-run) seems to work reasonably well.

I think my personal take-aways from the investigation yesterday are:

  • It's correct that the Redis connection is no longer shared between threads/green threads
  • It's very weird that we observed a spike in connections after a result is published and that these connections persist until the result expires on the result backend

@wochinge
Contributor

wochinge commented Aug 4, 2022

@uzi0espil I think you're running into this issue: #6963

@hiimdoublej-swag

It's very weird that we observed a spike in connections after a result is published and that these connections persist until the result expires on the result backend

I think this should be what we're targeting to fix, instead of trying to share the connections?

@wochinge
Contributor

wochinge commented Aug 9, 2022

Agree! I created a draft PR with an integration test which (I think) reproduces the problem: #7685

@Avamander

Our quick fix is now to use ignore_result=True for all tasks where we don't require the result.

All my tasks are like that, but even then Redis memory eventually grows to the point where it becomes unusable.

If Redis is periodically flushed (some data is lost, so that's bad), then at some point Redis's maximum connection limit is reached. If that limit is raised, the machine runs out of ports it can use for establishing connections.

At the moment I have resorted to restarting the entire stack periodically, after five days or after a billion tasks.

@jobec

jobec commented Aug 9, 2022

I bumped into the same issue. Celery 5.2.1, Kombu 5.2.2, gevent 21.8.0.

One sort-of-workaround seems to be setting the timeout setting in redis.conf (see here), which eventually closes the stale connections from Redis's side.

As mentioned above, this is how I bypass it at the moment. It's not perfect, but it keeps things from piling up and collapsing...

@auvipy
Member

auvipy commented Feb 7, 2023

I would like more feedback on #8058

@auvipy auvipy modified the milestones: 5.3.x, 5.3 Feb 9, 2023
@auvipy auvipy closed this as completed Feb 9, 2023
@hzc989

hzc989 commented Jul 12, 2023

I would like more feedback on #8058

After upgrading to 5.3.1 and setting result_backend_thread_safe=true, it works for us.

P.S. We've tried 4.4.7 and 5.2.7; both failed with the same problem.
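For anyone landing here later, that setting can be applied like this (a minimal sketch; requires Celery 5.3 or newer):

from celery import Celery

app = Celery('tasks', broker='redis://', backend='redis://')
app.conf.result_backend_thread_safe = True  # share one backend (and its connection pool) across threads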
