Memory leak on celery over Heroku (actually I don't think it's a celery issue), just can't figure out what's happening #3339
Comments
We've been experiencing the same issue recently, but using an amqp backend.
I'm not sure, but in my experience Python never releases memory back to the OS once allocated. Apparently the rationale is that releasing memory is expensive and Python will want to use it again. Try calling the task multiple times to see if the number keeps growing; if it does, add a max-tasks-per-child limit.
Btw, maxtasksperchild should be releasing the memory, since that will kill the child process. Are you sure it's the child process here that is consuming the memory?
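For reference, a minimal sketch of how that limit is set (Celery 3.x naming; the value 100 is just an example):

```python
# celeryconfig.py -- recycle each worker child process after 100 tasks,
# so whatever memory it accumulated is returned to the OS when it exits
CELERYD_MAX_TASKS_PER_CHILD = 100
```

The same thing can be passed on the command line as `celery worker --maxtasksperchild=100`.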
@codingjoe somewhat good to know I'm not the only one with this issue. 😫 @ask I tried the `CELERYD_MAX_TASKS_PER_CHILD` setting and it didn't stop the growth. Also, I'm sure it's related to a child process. What I did to ensure that:
As a result, the celery machine was stable during (1) (its memory usage stayed flat). And in fact, I think …
@filwaitman sorry to disappoint you. It was actually a task that was leaking memory, and it ran until the entire machine was killed by Heroku. You should get New Relic to profile your tasks. The problem might just be there.
@codingjoe got it. In my case I'm using a test task doing basically nothing. =(
That you only generate a list of integers in this task would suggest to me that this behavior is intrinsic to Python; Celery itself won't hold onto these numbers. The reason I suggested it is that the gc collect cycles can be too slow for the memory allocated by Python. With an explicit gc.collect between tasks you'd see the expected pattern, and process RSS usage stays constant. I was under the impression that an explicit collect shouldn't be necessary, but you could try something like:

```python
from celery import shared_task


@shared_task
def nothing_at_all():
    a = range(3000000)   # allocate a large throwaway list
    print a[0]
    del(a)  # <-- drop the reference count
    import gc
    gc.collect()  # force a full garbage collection before the task returns
```
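If putting gc.collect() inside every task feels too intrusive, the same idea can be wired up once for the whole worker (a sketch using Celery's task_postrun signal, not something Celery does by default):

```python
import gc

from celery.signals import task_postrun


@task_postrun.connect
def collect_after_task(**kwargs):
    # run a full collection in the worker child after every task finishes
    gc.collect()
```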
Oh, man. Of course. Anyway, I'm 99.9% sure this is not a celery issue (sounds like an infrastructure one). With this in mind, and since you guys have a lot of real issues to solve: do you want me to close it? I mean, I opened this in the hope you'd seen this before, and I don't wanna bother you (anymore) if this is not the case 😆
Closing for now.
@filwaitman: did you find a resolution to this? I'm seeing the same with Celery on Heroku.
@vesterbaek nope, I'm still facing this. I'm ignoring it because the project owner won't let me debug it properly ("the dev environment is too busy to let it get stuck debugging this"). 🙃
This blog post might help with debugging: http://chase-seibert.github.io/blog/2013/08/03/diagnosing-memory-leaks-python.html
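In the same spirit, a quick way to see whether a worker process itself keeps growing, using only the standard library (a rough sketch; on Linux ru_maxrss is reported in kilobytes and is a peak value, so it only ever goes up):

```python
import resource


def log_peak_rss(label):
    # peak resident set size of the current process, converted from KB to MB
    peak_mb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0
    print('%s: peak RSS %.1f MB' % (label, peak_mb))
```

Calling it at the start and end of a suspect task across several runs shows whether each run leaves the process permanently bigger.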
I'm also facing the same issue using Celery on Heroku. I end up restarting that dyno when it happens, but that's not ideal.
Ditto. It's a nightmare. We have to restart twice a day.
Yeah, same here. Restarting every other day.
Since my last post, we tried @ask's approach as detailed above. So far so good. Memory usage barely creeps up at all any more.
Hi @GDay and @filwaitman, did you ever get to the bottom of this? I've got some tasks that need access to a very large object (almost 5 GB in memory) and am experiencing the same issue. The usage starts off fine, but over approximately 20 tasks it exhausts all my RAM on a very high-powered machine (32 cores, 64 GB RAM) and is killed by the kernel. If I set the max tasks to 1 it does seem to help, but I'm unsure if that's simply mitigating the issue... I'd be very grateful to know if you did deduce more information on this.
@DomHudson nope. =(
@DomHudson The small trick I explained still works fine for me. It never exceeds the limits and does the job perfectly. I haven't had any issues since.
Facing the same issue - with no solution yet. Bad things happen when I start getting R14, because the system seriously slows down - and it is not restarted until R15 is raised (swap exceeded). I'm considering monitoring the logs for R14 and restarting the dyno on first sight.
Okay, thanks both very much. I'll stick with the max-tasks-per-child limit for now.
@DomHudson haven't tried it actually. At peak I'm running quite a few tasks per second, and I'm concerned about the performance implications of having to spawn a new child thread or process (not sure which is used) for each task. I'm running my workers with …
Had similar bugs in a self-hosted deployment. Using RabbitMQ instead of Redis as a broker solved my problem.
@vesterbaek agreed - I was also worried about this, and indeed it does seem to be having an adverse effect, slowing down my task processing noticeably. Interesting @fjsj, did you find that the kernel was listing the RabbitMQ service as consuming the memory, or the celery processes?
@DomHudson I was using Redis as a broker, but the memory leaks were in Celery processes. When I changed to RabbitMQ, the leak was gone. I guess there's something broken in the Celery-Redis integration. I'm now using RabbitMQ as a broker and Redis as a result backend. It's fine now.
Sorry yes, Rabbit was a typo - okay great, thanks for the insight! I will investigate whether the same occurs on my end.
I'm on Celery 4.1.0, using RabbitMQ as broker with no result backend. I had the same experience with leaks on Celery 3.x.
Anybody found a solution to this issue? We are facing the same issues with celery 4.1.0 on Heroku.
In case anybody is coming into this at 4.0+, the settings have been renamed to lowercase names, so the above suggestion documented by @GDay would be:
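Presumably something along these lines (assuming the suggestion was the per-child task limit; from Celery 4.0 the lowercase name is `worker_max_tasks_per_child`):

```python
# celeryconfig.py (Celery 4.x naming) -- recycle each worker child after 100 tasks
worker_max_tasks_per_child = 100
```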
Which I'm going to be trying on my next push. |
Facing the same issue with Heroku. My task is actually a long-running one, but I don't know whether that should affect the celery worker instance.
Hey @ask! How are you doing?
I'm getting some weird behavior with celery running on Heroku.
Sounds like I'm having a memory leak. Actually it behaves like a memory leak, but I don't think that's the case - I just don't know what it is, though.
For some reason it seems like my scheduled tasks are not releasing memory after they're finished.
It doesn't seem to be related to the scheduled tasks' code, since I removed all the code running in them and this "leak" is still present (see details below).
I know this "memory releasing" part is Python's responsibility, and I don't expect the memory to be released right after a task is executed. But my celery machines are hitting a memory usage rate of 170% (using swap and raising a bunch of R14 errors). Check it out:
(I restarted celery at 14:00 UTC. That's why memory was released.)
My pip requirements:
How I'm using celery on this machine (Procfile):
My celery configs:
(let me know if you need any additional info)
I used to have a bunch of things on my celery scheduler. For debugging purposes I erased them all (there are no scheduled tasks running now). Without a single scheduled task running, the "leak" vanished.
After this I created a (stupid?) scheduled task just for testing purposes:
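Something along these lines (a sketch; it just builds a large throwaway list of integers and prints one element):

```python
from celery import shared_task


@shared_task
def nothing_at_all():
    # range() returns a real list on Python 2, so this allocates ~3 million ints
    a = range(3000000)
    print(a[0])
```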
Random notes:
- I set CELERYD_MAX_TASKS_PER_CHILD on the heroku machines. Not even this stopped the "leak".

As I mentioned before, I'm pretty convinced this is not an issue on your side. But maybe you know what's happening here. I confess I'm a bit lost now. 😆
Let me know if you have any clue what's going on.
Thanks!