
Delay memory leak #3279

Closed · ztsv opened this issue Jun 28, 2016 · 18 comments

ztsv commented Jun 28, 2016

I have a Celery task:

import resource

from celery import shared_task


@shared_task(bind=True, default_retry_delay=10 * 60, max_retries=20, acks_late=True)
def some_task(self, tracker_id):
    # some django logic here (builds to_update)
    print ('before to_update: ', resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1000)
    while to_update:  # to_update is a list of 30000 ids (long integers)
        update.delay(to_update.pop())
    print ('after sending update tasks: ', resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1000)
    # ~update_user_info.map(to_update).apply_async()
    # some django logic here
    return True

And I got a leak of about 650 MB:

('before to_update: ', 165.376)
('after sending update tasks: ', 808.94)
ask (Contributor) commented Jun 28, 2016

What version?

ztsv (Author) commented Jun 28, 2016

kombu==3.0.34
celery==3.1.23
billiard==3.3.0.23

ask (Contributor) commented Jun 29, 2016

Probably have to wait for gc.collect() first?
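
(For reference, a minimal sketch of that kind of check, reusing the resource-based measurement from the snippet above; rss_peak_mb is just an illustrative helper, and since ru_maxrss only records the peak, this mainly shows whether the peak keeps growing:)

import gc
import resource


def rss_peak_mb():
    # ru_maxrss is the process's *peak* resident set size (kilobytes on Linux),
    # so it never decreases; it can only show whether the peak has grown
    # further since the last reading.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1000


# ... after publishing the update tasks ...
print('before collect:', rss_peak_mb())
gc.collect()  # force a full collection before taking the second reading
print('after collect:', rss_peak_mb())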

ztsv (Author) commented Jun 29, 2016

But why 650 MB after 30,000 calls of delay()?

yeago commented Jul 5, 2016

I don't know if it's related, but I have been having some pretty nasty memory leakage in 3.1.23 with the Redis backend, if that helps.

roks0n commented Jul 14, 2016

We have been experiencing something similar. We are using SQS and sending tasks to the queue with apply_async(). We profiled for a few days but weren't able to reproduce the behaviour in a test environment. We did, however, work around the problem by setting a max-tasks-per-child limit. Not ideal, but it did the trick.
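
(For reference, on Celery 3.1 that limit is the CELERYD_MAX_TASKS_PER_CHILD setting, or the worker's --maxtasksperchild flag; a sketch, with the value of 100 picked arbitrarily:)

# celeryconfig.py / Django settings: recycle each pool process after it has
# executed this many tasks, so a per-process leak stays bounded.
CELERYD_MAX_TASKS_PER_CHILD = 100  # arbitrary value, tune for your workload

# equivalent command-line form:
#   celery worker -A proj --maxtasksperchild=100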

What we were seeing was basically memory gradually increasing, then a huge drop after some random amount of time or after the task got killed because no more memory was available:

[memory usage graph]

We use:
celery==3.1.23
kombu==3.0.34
billiard==3.3.0.23

roks0n commented Aug 2, 2016

Just wanted to follow up on this. We've managed to solve our problem ... we hadn't increased the CPU on our instance (:facepalm:), and we've also made some changes to the worker so that it processes some things in several chunks.
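
(A sketch of the chunking idea using Celery's built-in chunks primitive; the update task and to_update list are borrowed from the original snippet, the chunk size of 100 is arbitrary, and this is not necessarily what roks0n did:)

# Publish ~300 chunk tasks instead of 30 000 individual ones; each chunk
# task runs update() for 100 ids in a single worker invocation.
update.chunks(((pk,) for pk in to_update), 100).apply_async()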

mkuchen commented Aug 15, 2016

@yeago @ask @ztsv I've been experiencing similar issues with a Django app (deployed via Heroku) with the Redis backend, on these versions:
Django==1.9.6
celery==3.1.23
kombu==3.0.35
billiard==3.3.0.23

I have a task on a process that executes at a rate of approximately 1000 tasks per minute. I've resigned myself to restarting this particular worker queue every 3 hours to prevent exceeding memory quotas.

Do any of you have suggestions for debugging this problem? It has persisted for a while, but I'm getting to the point where restarting the worker isn't going to be feasible. Any help is greatly appreciated!

mkuchen commented Aug 15, 2016

Note that I've tried using gc.collect() at the beginning and end of the task in question but to no avail.
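
(Presumably something along these lines; a sketch, not mkuchen's actual code:)

import gc

from celery import shared_task


@shared_task
def some_leaky_task(obj_id):
    gc.collect()      # force a collection before the work starts
    try:
        ...           # actual task body
    finally:
        gc.collect()  # and again once the task finishes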

vesterbaek commented:

Ref #3339

vesterbaek commented Sep 12, 2017

@mkuchen and @roks0n: did you resolve this?

mkuchen commented Sep 12, 2017

@vesterbaek still haven't resolved this on my end. I'm planning on upgrading to Celery 4 sometime in the next few months to see if that fixes it. For now I've resigned myself to having a cron job that restarts my Celery workers every 3 hours to reset the memory consumption.

vesterbaek commented:

Ok, thanks for getting back to me. I did upgrade to Celery 4, unfortunately with no improvement in this area.

danqing commented Oct 8, 2017

Just noticed this today. On 4.1.0 with RabbitMQ. Running a bunch of jobs, and we ran out of (2 TB of) memory in a few hours...

thedrow (Member) commented Oct 8, 2017

@danqing @vesterbaek Do you have a test case that can reproduce this issue?

vesterbaek commented:

@thedrow Unfortunately not. I've been trying to provoke this in development, but have not found a good way to do it. When it happens in production it's hard for me to investigate, because this is running on Heroku. For now, I'm mitigating the problem by monitoring the Heroku logs for R14 (OOM) errors and restarting the workers when they occur.

auvipy (Member) commented Dec 19, 2017

You could probably try the latest master and check issue #3339.

auvipy closed this as completed Dec 19, 2017
paulochf commented:

My colleagues at work were getting this problem and solved it by switching to RQ. The CELERYD_MAX_TASKS_PER_CHILD setting wasn't enough.
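
(For context, a minimal sketch of what the equivalent enqueueing looks like in RQ; the update_user_info function and to_update list are borrowed from the original snippet, and the default Redis connection is illustrative:)

from redis import Redis
from rq import Queue

queue = Queue(connection=Redis())

for pk in to_update:
    # RQ serialises the function reference and its arguments into Redis;
    # a separate `rq worker` process picks the jobs up and runs them.
    queue.enqueue(update_user_info, pk)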
