
Delay memory leak #3279

Closed · ztsv opened this issue Jun 28, 2016 · 18 comments

ztsv commented Jun 28, 2016

I have a Celery task:

import resource

from celery import shared_task


@shared_task(bind=True, default_retry_delay=10 * 60, max_retries=20, acks_late=True)
def some_task(self, tracker_id):
    # some django logic here (builds to_update)
    print ('before to_update: ', resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1000)
    while to_update:  # to_update is a list of 30000 ids (long integers)
        update.delay(to_update.pop())
    print ('after sending update tasks: ', resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1000)
    # ~update_user_info.map(to_update).apply_async()
    # some django logic here
    return True

And I got a leak of about 650 MB:

('before to_update: ', 165.376)
('after sending update tasks: ', 808.94)
ask (Contributor) commented Jun 28, 2016

What version?

ztsv (Author) commented Jun 28, 2016

kombu==3.0.34
celery==3.1.23
billiard==3.3.0.23

ask (Contributor) commented Jun 29, 2016

Probably have to wait for gc.collect() first?
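
(For reference, a minimal sketch of that kind of check, reusing the resource-based measurement from the snippet above; rss_peak_mb is just an illustrative helper, and since ru_maxrss only records the peak, this mainly shows whether the peak keeps growing:)

import gc
import resource


def rss_peak_mb():
    # ru_maxrss is the process's *peak* resident set size (kilobytes on Linux),
    # so it never decreases; it can only show whether the peak has grown
    # further since the last reading.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1000


# ... after publishing the update tasks ...
print('before collect:', rss_peak_mb())
gc.collect()  # force a full collection before taking the second reading
print('after collect:', rss_peak_mb())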

ztsv (Author) commented Jun 29, 2016

But why 650 MB after 30,000 calls of delay()?

yeago commented Jul 5, 2016

I don't know if it's related, but I have been having some pretty nasty memory leakage in 3.1.23 with the Redis backend, if that helps.

roks0n commented Jul 14, 2016

We have been experiencing something similar. We are using SQS and sending tasks to the queue with apply_async(). We profiled for a few days but weren't able to reproduce the behaviour in a test environment. We did, however, work around the problem by setting a max-tasks-per-child limit. Not ideal, but it did the trick.
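
(For reference, on Celery 3.1 that limit is the CELERYD_MAX_TASKS_PER_CHILD setting, or the worker's --maxtasksperchild flag; a sketch, with the value of 100 picked arbitrarily:)

# celeryconfig.py / Django settings: recycle each pool process after it has
# executed this many tasks, so a per-process leak stays bounded.
CELERYD_MAX_TASKS_PER_CHILD = 100  # arbitrary value, tune for your workload

# equivalent command-line form:
#   celery worker -A proj --maxtasksperchild=100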

What we were seeing was basically memory gradually increasing, then a huge drop after some random amount of time or after the task got killed because no more memory was available:

[memory usage graph]

We use:
celery==3.1.23
kombu==3.0.34
billiard==3.3.0.23

roks0n commented Aug 2, 2016

Just wanted to follow up on this. We've managed to solve our problem ... we hadn't increased the CPU on our instance (:facepalm:), and we've also made some changes to the worker so that it processes some things in several chunks.
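
(A sketch of the chunking idea using Celery's built-in chunks primitive; the update task and to_update list are borrowed from the original snippet, the chunk size of 100 is arbitrary, and this is not necessarily what roks0n did:)

# Publish ~300 chunk tasks instead of 30 000 individual ones; each chunk
# task runs update() for 100 ids in a single worker invocation.
update.chunks(((pk,) for pk in to_update), 100).apply_async()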

mkuchen commented Aug 15, 2016

@yeago @ask @ztsv I've been experiencing similar issues with a Django app (deployed via Heroku) with the Redis backend, on these versions:
Django==1.9.6
celery==3.1.23
kombu==3.0.35
billiard==3.3.0.23

I have a task on a process that executes at a rate of approximately 1000 tasks per minute. I've resigned myself to restarting this particular worker queue every 3 hours to prevent exceeding memory quotas.

Do any of you have suggestions for debugging this problem? It has persisted for a while, but I'm getting to the point where restarting the worker isn't going to be feasible. Any help is greatly appreciated!

mkuchen commented Aug 15, 2016

Note that I've tried using gc.collect() at the beginning and end of the task in question but to no avail.
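
(Presumably something along these lines; a sketch, not mkuchen's actual code:)

import gc

from celery import shared_task


@shared_task
def some_leaky_task(obj_id):
    gc.collect()      # force a collection before the work starts
    try:
        ...           # actual task body
    finally:
        gc.collect()  # and again once the task finishes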

vesterbaek commented:

Ref #3339

vesterbaek commented Sep 12, 2017

@mkuchen and @roks0n: did you resolve this?

mkuchen commented Sep 12, 2017

@vesterbaek still haven't resolved this on my end. I'm planning on upgrading to Celery 4 sometime in the next few months to see if that fixes it. For now I've resigned myself to having a cron job that restarts my Celery workers every 3 hours to reset the memory consumption.

vesterbaek commented:

Ok, thanks for getting back to me. I did upgrade to Celery 4, unfortunately with no improvement in this area.

danqing commented Oct 8, 2017

Just noticed this today. On 4.1.0 with RabbitMQ. Running a bunch of jobs, and we ran out of (2 TB of) memory in a few hours...

thedrow (Member) commented Oct 8, 2017

@danqing @vesterbaek Do you have a test case that can reproduce this issue?

vesterbaek commented:

@thedrow Unfortunately not. I've been trying to provoke this in development, but have not found a good way to do it. When it happens in production it's hard for me to investigate, because this is running on Heroku. For now, I'm mitigating the problem by monitoring the Heroku logs for R14 (OOM) errors and restarting the workers when they occur.

auvipy (Member) commented Dec 19, 2017

You could probably try the latest master and check issue #3339.

auvipy closed this as completed Dec 19, 2017
paulochf commented:

My colleagues at work were getting this problem and solved it by switching to RQ. The CELERYD_MAX_TASKS_PER_CHILD setting wasn't enough.
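
(For context, a minimal sketch of what the equivalent enqueueing looks like in RQ; the update_user_info function and to_update list are borrowed from the original snippet, and the default Redis connection is illustrative:)

from redis import Redis
from rq import Queue

queue = Queue(connection=Redis())

for pk in to_update:
    # RQ serialises the function reference and its arguments into Redis;
    # a separate `rq worker` process picks the jobs up and runs them.
    queue.enqueue(update_user_info, pk)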
