DatabaseBackend.cleanup causes OOM kill for large expired result #232

Closed
daadu opened this issue Sep 4, 2021 · 7 comments · Fixed by #235

Comments

@daadu
Contributor

daadu commented Sep 4, 2021

Problem

The DatabaseBackend.cleanup() method uses Django's QuerySet.delete() method, which can be very slow and cause a LOT of memory bloat, since the operation loads all objects into memory before deleting them. The following quote from the Django docs explains it:

The delete() method does a bulk delete and does not call any delete() methods on your models. It does, however, emit the pre_delete and post_delete signals for all deleted objects (including cascaded deletions).

Django needs to fetch objects into memory to send signals and handle cascades. However, if there are no cascades and no signals, then Django may take a fast-path and delete objects without fetching into memory. For large deletes this can result in significantly reduced memory usage. The amount of executed queries can be reduced, too.

ForeignKeys which are set to on_delete DO_NOTHING do not prevent taking the fast-path in deletion.

Note that the queries generated in object deletion is an implementation detail subject to change.
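For context, the cleanup path boils down to filtering expired rows and calling QuerySet.delete() on them. A simplified sketch (not the exact library source; the field name follows django_celery_results.models.TaskResult):

```python
from datetime import timedelta

from django.utils import timezone
from django_celery_results.models import TaskResult


def cleanup(expires=timedelta(days=1)):
    expired = TaskResult.objects.filter(date_done__lt=timezone.now() - expires)
    # QuerySet.delete() collects the matching objects so it can emit
    # pre_delete/post_delete signals and resolve cascades, unless Django
    # can take its fast path - so a huge expired set gets loaded into
    # memory right here.
    expired.delete()
```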

Possible Solution

  1. Bypass Django's ORM and signals layer with one of the following:
    a. using raw SQL for deletion - with RawQuerySet to delete the selected rows (note: like QuerySet, these are lazy too, so we would need to make sure the query is actually fired)
    b. using raw SQL for deletion - with the connection.cursor().execute() API
    c. using the QuerySet._raw_delete(using) abstraction - the only issue being that the method is private (see the sketch below)
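A minimal sketch of option 1c, assuming the expired rows are selected the same way as today:

```python
from datetime import timedelta

from django.utils import timezone
from django_celery_results.models import TaskResult


def raw_cleanup(expires=timedelta(days=1)):
    queryset = TaskResult.objects.filter(date_done__lt=timezone.now() - expires)
    # _raw_delete issues a single DELETE statement, skipping signals and
    # per-object loading. It is a private API and could change between
    # Django versions, hence the suggestion to cover it with a test.
    return queryset._raw_delete(queryset.db)
```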

Other notes/reads

  • Ticket #9519 - discussing the addition of a bulk_delete method that is more efficient at deleting large volumes of records

To conclude

Since celery results can grow very fast (for systems that execute lots of tiny Celery tasks), this behaviour can cause OOM kills on such systems, and the actual "delete/cleanup" operation is never performed.

PS: I would like to contribute this fix, if given the green light here.

@daadu
Contributor Author

daadu commented Sep 4, 2021

I would say we should go with option 1c as mentioned above. To catch implementation changes in Django, we can add a proper test if possible.

Additionally, the current behaviour could be kept the same, with a config flag added to opt into the new "bypass ORM/signals" delete mechanism.
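Something like the following - the setting name is purely illustrative, not an existing django-celery-results option:

```python
from django.conf import settings


def delete_expired(queryset):
    if getattr(settings, "CELERY_RESULT_BACKEND_RAW_DELETE", False):
        # new behaviour: single DELETE, bypasses the ORM/signals layer
        return queryset._raw_delete(queryset.db)
    # current behaviour: ORM delete, emits signals, may load rows into memory
    return queryset.delete()
```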

@auvipy
Member

auvipy commented Sep 4, 2021

I will look deeply into the issue. Thanks for the report, btw.

@daadu
Contributor Author

daadu commented Sep 5, 2021

As the docs mention, there needs to be a pre_delete or post_delete handler registered for this to happen.

In my case, we have a general "auditing middleware" which listens to this signal without any sender filter (so the signal handler is registered for the TaskResult model as well) - that is what causes the delete to load instances into memory.
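For illustration, a receiver registered without a sender filter applies to every model, including TaskResult, so Django has to fetch each row before deleting it in order to fire the signal (the handler name and body here are hypothetical):

```python
from django.db.models.signals import post_delete
from django.dispatch import receiver


@receiver(post_delete)  # no sender= filter: fires for every model
def audit_delete(sender, instance, **kwargs):
    # hypothetical auditing hook
    print(f"deleted {sender.__name__} pk={instance.pk}")
```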

@daadu
Contributor Author

daadu commented Sep 5, 2021

Workaround

The default cleanup task is scheduled at 4 AM, so I have a custom task scheduled at 3:50 AM which does the "raw delete". When celery.backend_cleanup runs, it has almost nothing left to clean (and therefore uses little memory).

@daadu
Contributor Author

daadu commented Sep 24, 2021

@auvipy any update? Should I go ahead with the PR?

@auvipy
Member

auvipy commented Sep 27, 2021

yes please

@auvipy
Member

auvipy commented Oct 11, 2021

queryset._raw_delete(queryset.db) private API ;)
