Celery rapidly leaking memory with Celery .get() on RPC backend #5344
This may already be resolved in #5332. I see that you already tried master, so it may be something else.
@thedrow Yeah, must be something else since I have tested against master.
I've done some further investigation.
Update: these are the objects that grow without bound when I switch from
Also, if I add
Yes, the fact that we use dictionaries to hold results isn't a good idea.
Great, that’ll certainly make diagnosing the memory leaks easier. In the meantime, how can I force Celery to purge the results from memory once I’ve handled them? I’m putting a bunch of Chains into a Group, so I need to force Celery to remove all references from all the tasks in all the Chains in the Group.
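For reference, a minimal sketch of the kind of cleanup being asked about here, assuming hypothetical task names and an rpc:// setup like the one in this issue. `forget()` support varies by backend and may not be available on rpc://, so this is illustrative rather than a confirmed fix:

```python
from celery import Celery, chain, group

app = Celery('proj', broker='pyamqp://guest@localhost//', backend='rpc://')

@app.task
def step_one(payload):
    return payload

@app.task
def step_two(payload):
    return payload

def run_batch(payloads):
    job = group(chain(step_one.s(p), step_two.s()) for p in payloads)
    result = job.apply_async()
    values = result.get()   # block until every chain in the group finishes
    result.forget()         # ask the backend to discard stored results (where supported)
    del result              # drop local references so they can be garbage collected
    return values
```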
I just ran on master (4.3.0rc2) with

Perhaps #5332 resolved the memory leak for some backends but for some reason not

Edit: I tested on another version of Celery (which I'm running in production). I wanted to give a summary of what I've found so far, in case it helps track down the issue:

Celery 4.1.0:
I am using the RedisSentinel backend with Celery 4.3.0rc2, and the issue seems to be caused by constantly growing

https://i.imgur.com/0z86v8Z.png

Strangely enough, calling
I managed to circumvent the issue by running the following after
@Sewci0 I just tried adding that. I added the following code:
And I get the following across successive loop iterations (edit: and
Are you polling for the status of a task before calling .get()?
@monstermac77 Can you try running that inside your task and let it run for a while, then upload the generated image here?
@Sewci0 Absolutely. Just did so.

Here's the output near the start of program execution (after the first loop iteration): https://imgur.com/a/3RNSZPZ

Here's the output after a while of execution (by this point, the process' memory had about quadrupled in size): https://imgur.com/a/I6QRcDu

Indeed, it seems like
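The exact diagnostic snippets exchanged above were not preserved in this thread. As a hedged sketch of the kind of objgraph/hpy instrumentation being described (requires `objgraph`, `guppy`/`guppy3`, and graphviz for the rendered image; the chosen sample object is an assumption):

```python
import objgraph
from guppy import hpy  # guppy3 on Python 3

heap = hpy()

def dump_memory_stats(iteration):
    print(f"--- loop {iteration} ---")
    objgraph.show_growth(limit=10)   # object types whose counts grew since the last call
    print(heap.heap())               # hpy heap summary, like the outputs pasted in this issue

    # Render back-references for one sampled object to see what is keeping it alive.
    # Picking the most recently created dict is just an illustrative heuristic.
    leaked = objgraph.by_type('dict')[-1]
    objgraph.show_backrefs([leaked], max_depth=5,
                           filename=f'backrefs-{iteration}.png')
```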
@monstermac77 |
@Sewci0 Yeah, I was looking at that.

Early iteration:

When memory isn't leaking (such as when I use

Also, when I try your other suggestion after
I get the following error:
@monstermac77 Also, all of those methods in your case should be run against the main Celery instance, which I guess is
@Sewci0 I added the import and

Just tried switching

I'm doing the following at the top of my file:
@monstermac77 How exactly are you initializing Celery and specifying the backend?
@Sewci0 In my producer, I run:
And in app.py I have:
When I run your suggestions on capital-C Celery
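The producer and app.py snippets referenced above were lost in formatting. For context, a hedged sketch of the layout being discussed, with broker/backend values assumed; the point raised earlier is that backend-inspection calls should go through the configured app instance (its `.backend`), not the `Celery` class imported from the library:

```python
# app.py (hypothetical reconstruction, values assumed)
from celery import Celery

app = Celery(
    'proj',
    broker='pyamqp://guest@broker-host:5672//',
    backend='rpc://',
)

# In the producer, inspect the backend of this instance, not of the Celery class:
print(type(app.backend), app.backend)
```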
@monstermac77
@Sewci0 Yeah, so to summarize, running

We were able to see that
It's been quite a while since the last post in this issue; was it ever resolved for you, @monstermac77? We appear to be hitting the same issue at the moment.
@Korijn Appreciate the check-in. Celery still sits at the core of a lot of what we do, but we've resorted to using memcached instead of RPC due to this memory-leak issue. That said, we haven't done any testing since these comments back in 2019, so I suppose it's possible that this was resolved in a later version of Celery. Hoping you were able to find this thread before you had torn out too much hair...
We tried this morning with Celery 5.0.2, and the leak in the RPC backend is still present. The Redis backend also leaks, unless we apply the workaround from #3813 (to call

Our tasks work on big chunks of data (medical imaging data), so memory grows out of control very quickly (~20 tasks is enough to reach a couple of gigabytes). It's unworkable for us.
@Korijn Unfortunately, the only workaround we've found is using memcached, which seems to be almost as fast as RPC but doesn't have the leak. We also found that Redis was slower than RPC for our use case, which is why we had been using RPC. Doesn't calling

We're actually at the opposite end of the spectrum, interestingly. We have hundreds of thousands of very small tasks running every minute, but still within an hour the kernel kills our process because of the memory leak.
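For anyone considering the same workaround: switching the result backend to memcached is a one-line configuration change. A sketch with an assumed host/port (requires a running memcached server plus pylibmc or python-memcached):

```python
from celery import Celery

app = Celery(
    'proj',
    broker='pyamqp://guest@broker-host:5672//',
    backend='cache+memcached://127.0.0.1:11211/',
)
```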
Interestingly/frustratingly enough, I am so far unable to reproduce this problem in a toy example with Celery 5.0.2, where I pass around huge strings (3 MB each).
I will say this: we had been using RPC for about 2 years with almost exactly the same workload before; when implementing Celery again for a very similar use case, we started seeing the memory leak occur. I spent no less than 30 hours trying to figure out what the difference was between the scripts that leaked and those that didn't (this was back when I opened this issue), even though both were using RPC.
Well, in the end I managed to reproduce the issue. It's due to our usage of ThreadPoolExecutor; I didn't realize Celery isn't thread-safe. See here for a minimal example: https://github.com/Korijn/celeryrepro

The reason we use ThreadPoolExecutor is to keep our main thread IO-free in a Starlette/asyncio setup. Sadly that isn't going to work, I suppose.

With

With
Thanks for labeling @auvipy, but I guess I wasn't clear: if you look at the reproduction README, I'm not using Celery's thread-pool worker pool but Python's built-in one, to try to avoid Celery's blocking IO in the main thread. In the reproduction I'm using the

I was using the RabbitMQ (amqp) broker and the rpc result backend.
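The full reproduction lives in the linked repository; as a rough sketch of the pattern being described (not the exact repro code, task and module names are hypothetical), the producer dispatches tasks and calls `.get()` from `ThreadPoolExecutor` worker threads instead of the main thread:

```python
from concurrent.futures import ThreadPoolExecutor
from proj.app import app, process   # hypothetical module layout and task

def call_and_wait(payload):
    # .delay() + .get() executed in a pool thread; Celery's result machinery
    # is not documented as thread-safe, which is the suspected leak trigger.
    return process.delay(payload).get(timeout=60)

executor = ThreadPoolExecutor(max_workers=8)

def handle_request(payload):
    # e.g. called from a Starlette endpoint to keep blocking IO off the main thread
    return executor.submit(call_and_wait, payload).result()
```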
@Korijn ah! It seems like thread safety could be the cause of our memory leak as well? Our use case that results in a runaway memory leak looks like this:
Where "datas" is length ~30 or so. We're using multiprocessing though, not multithreading, but I talked one of my coworkers who understands this stuff much better than me and he said that even though conventional wisdom has it that the memory space wouldn't be shared, the memory leak could still be happening with multiprocessing if Celery has a bug. Here's what he said:
Well, sadly, returning the Celery calls to the main thread resolved part of our leaks, but not all of them. Also, there is now IO happening in our main thread, which negatively affects our maximum number of concurrent users per server instance. So for us it's not yet in an acceptable state, and we're continuing to search for more root causes...
Please keep investigating this.
I never was able to resolve the issue, sadly. I ran out of time on the project.
Checklist
- I have included the output of celery -A proj report in the issue.
- I have included the output of pip freeze in the issue.
- I have verified that the issue exists against the master branch of Celery.
Environment & Settings
Celery version: 4.2.0
Report:
software -> celery:4.2.0 (windowlicker) kombu:4.3.0 py:2.7.3
billiard:3.6.0.0 py-amqp:2.4.1
platform -> system:Linux arch:64bit, ELF
kernel version:3.2.0-4-amd64 imp:CPython
loader -> celery.loaders.app.AppLoader
settings -> transport:pyamqp results:rpc://[ip]/
task_queues: <generator object at 0x28ce780>
broker_url: u'amqp://guest:********@[ip]:5672//'
result_backend: u'rpc://[ip]/'
Steps to Reproduce
The script that I've written to add tasks to my Celery queue is leaking memory (to the point where the kernel kills the process after 20 minutes). In this script, I'm just executing the same 300 tasks repeatedly, every 60 seconds (inside a while True loop).

The parameters passed to the task, makeGroupRequest(), are dictionaries containing strings, and according to hpy and objgraph, dicts and strings are also what's growing uncontrollably in memory (specifically, they grow after the .get() call). I've included the outputs of hpy below on successive iterations of the loop.

I've spent days on this, and I can't understand why memory would grow uncontrollably, considering nothing is re-used between loops and everything is overwritten. If I skip the sending/retrieval of tasks, the memory doesn't appear to leak (so the leak appears to be with Celery or some combination of my code/objects and Celery). An issue like this was mentioned in #3813, but I'm seeing the memory leak even though I'm not polling the results and am just calling .get(). Is there something that could be causing the task results to stick around in memory, even though I no longer have references to them?

Here is an outline of the code that's executing:
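The original outline did not survive formatting; below is a hedged reconstruction based on the description above (300 tasks dispatched as chains in a group every 60 seconds inside a while True loop, with dict-of-strings parameters). Everything except makeGroupRequest() is an assumption:

```python
import time
from celery import chain, group
from app import app, makeGroupRequest   # makeGroupRequest is the task named above

def build_params():
    # ~300 dictionaries of strings, rebuilt from scratch every iteration
    return [{'endpoint': f'/item/{i}', 'payload': 'x' * 100} for i in range(300)]

while True:
    job = group(chain(makeGroupRequest.s(params)) for params in build_params())
    result = job.apply_async()
    responses = result.get()    # memory reportedly grows after this call
    # ... handle responses ...
    del job, result, responses  # nothing is reused between iterations
    time.sleep(60)
```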
Here is the output of hpy on successive runs:

Loop 2:
Loop 3:
Python Packages