Fixed #28462 -- Decreased memory usage with ModelAdmin.list_editable. #9920

AdamDonna · 2018-05-03T14:04:38Z

Based on the bug https://code.djangoproject.com/ticket/28462

charettes · 2018-05-03T14:09:18Z

django/contrib/admin/options.py

@@ -1510,6 +1511,16 @@ def add_view(self, request, form_url='', extra_context=None):
    def change_view(self, request, object_id, form_url='', extra_context=None):
        return self.changeform_view(request, object_id, form_url, extra_context)

+    def get_edited_object_ids(self, request, prefix):


get_edited_object_pks

charettes · 2018-05-03T14:10:07Z

django/contrib/admin/options.py

+    def get_edited_object_ids(self, request, prefix):
+        # Get the objects ids to filter the queryset to get only the objects that will be updated.
+        # Matching on anything incase we come up against non-numeric fields
+        regexp = re.compile('{prefix}-(\w+)-id$'.format(prefix=prefix))


This must take into account that the primary key can be named something other than id. self.model._meta.pk.name should do. Also, \w is too loose: you're not matching against the pk value but against the form index, so \d+ should match. The capture group is also unnecessary.

It might also be worth considering bounding the index based on the values provided in the management form?

Could you elaborate on what advantage bounding the indexes based on the management form would provide?
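A minimal sketch of the pattern charettes describes, assuming the primary key name is passed in as a plain string (in the admin it would come from self.model._meta.pk.name). The form index is matched with \d+, the pk name is escaped, and there is no capture group. build_pk_pattern is a hypothetical helper name, not the code in the PR.

```python
import re


def build_pk_pattern(prefix, pk_name):
    # Match keys like "form-0-uuid": the formset prefix, a numeric form
    # index, then the primary key field name. No capture group is needed
    # because only the value stored under the key is used.
    return re.compile(r'{}-\d+-{}$'.format(re.escape(prefix), re.escape(pk_name)))


pk_pattern = build_pk_pattern('form', 'uuid')
print(bool(pk_pattern.match('form-0-uuid')))    # True
print(bool(pk_pattern.match('form-abc-uuid')))  # False: index must be numeric
print(bool(pk_pattern.match('form-0-id')))      # False: wrong pk name
```

The raw string also avoids the invalid "\w" escape warning that a plain string literal triggers on newer Python versions.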

charettes · 2018-05-03T14:11:07Z

django/contrib/admin/options.py

+        for key, value in request.POST.items():
+            if regexp.match(key):
+                object_ids.append(value)
+        return object_ids


Could be reduced to a list comprehension

return [value for key, value in request.POST.items() if regexp.match(key)]
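The suggested comprehension, folded into a standalone helper for illustration. It assumes a plain dict in place of request.POST and takes the compiled pk_pattern as an argument; edited_object_pks is a hypothetical name, not the method added in the PR.

```python
import re


def edited_object_pks(post_data, pk_pattern):
    # Keep the submitted pk value for every key that looks like
    # "<prefix>-<index>-<pk name>"; all other POST fields are ignored.
    return [value for key, value in post_data.items() if pk_pattern.match(key)]


pk_pattern = re.compile(r'form-\d+-uuid$')
post_data = {
    'form-TOTAL_FORMS': '2',
    'form-0-uuid': 'a1',
    'form-0-load': '10',
    'form-1-uuid': 'b2',
    '_save': 'Save',
}
print(edited_object_pks(post_data, pk_pattern))  # ['a1', 'b2']
```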

charettes · 2018-05-03T14:11:55Z

django/contrib/admin/options.py

+    def get_edited_object_ids(self, request, prefix):
+        # Get the objects ids to filter the queryset to get only the objects that will be updated.
+        # Matching on anything incase we come up against non-numeric fields
+        regexp = re.compile('{prefix}-(\w+)-id$'.format(prefix=prefix))


Might want to give this a less generic name, pk_pattern maybe?

charettes · 2018-05-03T14:12:06Z

django/contrib/admin/options.py

@@ -1510,6 +1511,16 @@ def add_view(self, request, form_url='', extra_context=None):
    def change_view(self, request, object_id, form_url='', extra_context=None):
        return self.changeform_view(request, object_id, form_url, extra_context)

+    def get_edited_object_ids(self, request, prefix):
+        # Get the objects ids to filter the queryset to get only the objects that will be updated.
+        # Matching on anything incase we come up against non-numeric fields


Wrap comments at 79 chars, and drop the comment about matching on anything: you're matching against the form index, which will be numeric, not against the actual field value.

charettes · 2018-05-03T14:15:39Z

django/contrib/admin/options.py

@@ -1601,7 +1612,9 @@ def changelist_view(self, request, extra_context=None):
        # Handle POSTed bulk-edit data.
        if request.method == 'POST' and cl.list_editable and '_save' in request.POST:
            FormSet = self.get_changelist_formset(request)
-            formset = cl.formset = FormSet(request.POST, request.FILES, queryset=self.get_queryset(request))
+            object_ids = self.get_edited_object_ids(request, FormSet.get_default_prefix())


object_pks.

charettes · 2018-05-03T14:16:47Z

tests/admin_changelist/tests.py

+            'form-1-speed': '5.0',
+            'form-2-load': '5.0',
+            'form-2-speed': '4.0',
+            '_save': 'Save',


Don't we also have to pass along the management form data?

This test doesn't need that data. However to simulate real usage it should be added.
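A sketch of a test helper that builds list_editable POST data together with the management form fields. changelist_post_data is hypothetical, and the MIN/MAX values simply mirror Django's formset defaults (min 0, max 1000).

```python
def changelist_post_data(rows, pk_name='uuid', prefix='form'):
    # Management form fields first. Every changelist row is an existing
    # object, so INITIAL_FORMS equals TOTAL_FORMS.
    data = {
        prefix + '-TOTAL_FORMS': str(len(rows)),
        prefix + '-INITIAL_FORMS': str(len(rows)),
        prefix + '-MIN_NUM_FORMS': '0',
        prefix + '-MAX_NUM_FORMS': '1000',
    }
    # Then "<prefix>-<index>-<field>" entries for each edited row.
    for index, (pk, fields) in enumerate(rows):
        data['{}-{}-{}'.format(prefix, index, pk_name)] = str(pk)
        for name, value in fields.items():
            data['{}-{}-{}'.format(prefix, index, name)] = value
    data['_save'] = 'Save'
    return data


data = changelist_post_data([
    ('a1', {'load': '5.0'}),
    ('b2', {'speed': '4.0'}),
])
```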

charettes · 2018-05-03T14:20:28Z

django/contrib/admin/options.py

-            formset = cl.formset = FormSet(request.POST, request.FILES, queryset=self.get_queryset(request))
+            object_ids = self.get_edited_object_ids(request, FormSet.get_default_prefix())
+            formset = cl.formset = FormSet(request.POST, request.FILES,
+                                           queryset=self.get_queryset(request).filter(pk__in=object_ids))


Are we sure the pk__in filter won't choke if passed non-numeric pks in a string form? What about uuids pk or even datetime that might require to be cleaned by the form?

I'll update the tests to use UUIDs as their primary keys, which should provide some certainty.

charettes · 2018-05-03T14:22:04Z

tests/admin_changelist/tests.py

+        }
+        request = self.factory.post(changelist_url, data=data)
+        ids = m.get_edited_object_ids(request, 'form')
+        self.assertItemsEqual(ids, [str(a.pk), str(b.pk), str(c.pk)])


assertEqual?

carltongibson · 2018-05-08T10:29:47Z

This looks pretty good to me. (Need to change the base branch, rebase, squash and adjust the commit message, but we can do all that.)

The general strategy seems fine. (There's no other way of getting the PKs from the request.)

I wondered about being stricter about using the submitted data (with a form to validate it, say), but I think that may be overkill. The formset.is_valid() call already raises if you mess with the request data in the obvious ways (skipping a form-0-id, for example).

Are we sure the pk__in filter won't choke if passed non-numeric pks in a string form? What about uuids pk or even datetime that might require to be cleaned by the form?

String PKs work with the filter call no problem. I wasn't sure of an example for (e.g.) a datetime?

It might also be worth considering bounding the index based on the values provided in the management form?

I added the management form data to the test case, but wasn't sure if we needed to validate against it?
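One reading of the "bounding the index" suggestion, sketched under the assumption that TOTAL_FORMS is a well-formed integer: ignore any pk field whose index falls outside the management form's declared count. bounded_pks is a hypothetical helper; carltongibson's comment above suggests formset.is_valid() may already make this unnecessary.

```python
import re


def bounded_pks(post_data, prefix, pk_name):
    # Read the form count from the management form (assumed to be a valid
    # integer here) and only accept pk fields whose index is below it.
    total = int(post_data.get(prefix + '-TOTAL_FORMS', 0))
    # Unlike the filtering pattern, this one captures the index so it can
    # be compared against the total.
    pattern = re.compile(r'{}-(\d+)-{}$'.format(re.escape(prefix), re.escape(pk_name)))
    pks = []
    for key, value in post_data.items():
        match = pattern.match(key)
        if match and int(match.group(1)) < total:
            pks.append(value)
    return pks


post_data = {
    'form-TOTAL_FORMS': '2',
    'form-0-uuid': 'a1',
    'form-1-uuid': 'b2',
    'form-999-uuid': 'injected',  # outside the declared form count
}
print(bounded_pks(post_data, 'form', 'uuid'))  # ['a1', 'b2']
```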

timgraham · 2018-05-09T01:20:41Z

A test that uses CaptureQueriesContext might be able to verify the presence of the WHERE clause for filtering by pk.

AdamDonna · 2018-05-09T01:59:17Z

@carltongibson I agree that formset.is_valid() should be enough to validate the forms, at least for the use cases I can think of.

I don't think we need to validate against the management form data. Would validating against it add any real safety to someone's workflow?

I'm also not sure about datetime examples choking the filter. I can't think of an appropriate time to use a datetime as your PK. I suspect it's an edge case, but I don't want to make assumptions and break someone's build.

charettes · 2018-05-09T03:12:48Z

I'm also not sure about datetime examples choking the filter. I can't think of an appropriate time to use a datetime as your PK. I suspect it's an edge case, but I don't want to make assumptions and break someone's build.

I agree that it's an edge case and could probably be revisited later if it really breaks someone's setup.

The reason I initially mentioned it is that I wanted to make sure that:

- Complex data types would be handled correctly; it looks like UUID is working fine, so we should be good.
- editable=True primary keys are working fine. I could be missing something, but this patch will cause a crash (e.g. TypeError) if an invalid primary key is submitted, instead of a validation error like it used to. I guess we could try cleaning the submitted pks and disable the optimization if an exception is raised instead of crashing?
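A framework-free sketch of that fallback, assuming uuid.UUID stands in for the primary key field's to_python() (in Django the field would raise ValidationError rather than ValueError). pks_to_filter is a hypothetical name; returning None means "skip the pk__in filter and use the unfiltered queryset".

```python
import uuid


def pks_to_filter(submitted_pks, to_python):
    # Clean every submitted pk with the field's converter. If any value is
    # invalid, return None to signal "disable the optimization", so the
    # formset can report the error as before instead of the view crashing.
    try:
        return [to_python(pk) for pk in submitted_pks]
    except ValueError:
        return None


valid = pks_to_filter(['12345678-1234-5678-1234-567812345678'], uuid.UUID)
invalid = pks_to_filter(['not-a-uuid'], uuid.UUID)
print(invalid)  # None
```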

carltongibson · 2018-05-09T13:36:56Z

...could probably be revisited later if it really breaks someone's setup.

This is where I'll ask Tim 🙂. Elsewhere I'd be inclined to say it's probably OK: ship it, then quickly handle any actual bugs that are revealed in a point release. I prefer that to trying to guess the issues in advance when they're not really obvious. Can we do that here? (It's a bug, so it would go into 2.x, but if we targeted 2.1 there's the beta, so we could expect feedback...)

@AdamDonna: Can you please change the base branch to master on the PR, rebase, squash the commits and adjust the commit message to the required format? (If you need help let me know.)

charettes · 2018-05-09T14:08:31Z

Well I'll defer to you and Tim but I think the editable primary key is a legitimate issue that should be fixed before shipping.

carltongibson · 2018-05-09T14:12:12Z

Ei! Sorry @charettes — I missed that was still pending. (Going blind by this stage of the afternoon 😵)

(My question was more for after that...)

AdamDonna · 2018-05-10T03:49:11Z

@charettes I'm still a bit of a noob, could you elaborate on how to cause the TypeError case so I can solve this issue?

@carltongibson I'll get that sorted once the invalid primary key issue is resolved. Will this patch be back ported to 1.11.x users as well?

carltongibson · 2018-05-10T06:31:41Z

Will this patch be back ported to 1.11.x users as well?

It would be nice. This is a pretty devastating bug for list_editable if you have a dataset of any size. (That the concurrency issue isn't really addressed makes it all the worse.) Thus, if we're confident enough, I'd be in favour of making the case for backporting as a special exception. (It's not security, so normally no.)

First, it comes down to being confident enough. Hence my query above: it would be nice to ask people to actually test it.

charettes · 2018-05-11T22:50:21Z

@AdamDonna,

I'm still a bit of a noob, could you elaborate on how to cause the TypeError case so I can solve this issue?

Sure! Given the following scenario:

class Book(models.Model):
    uuid = models.UUIDField(editable=True, primary_key=True)
    ...

class BookAdmin(ModelAdmin):
    list_editable = ['uuid', ...]

In this case it's possible an invalid uuid is submitted by a user. Right now against master this results in a ValidationError that is displayed to the user. With your patch it will result in a crash when trying to filter by an invalid uuid. Unfortunately, given the lazy nature of querysets I can't see how this can be fixed at the view level. I assume this will have to happen at the formset level somehow.

You can test it out by passing an invalid value for form-0-uuid and noticing how it crashes on a database with a real UUID field (e.g. PostgreSQL, but not SQLite, which uses a CharField). It's not necessary to define a new model with an editable primary key; you can reuse the existing model, pass along an invalid pk, and assert that a validation error is correctly displayed in this case.

Also, your branch is based off stable/1.11.x. You might want to create a new one against master and the committer will take care of the backport if required.

jarshwah · 2018-05-15T04:27:46Z

I think an argument can be made for a backport considering this is a crashing bug for large datasets (OutOfMemory).

(disclosure: I work with Adam who is pushing this patch through)

AdamDonna · 2018-05-16T04:54:20Z

@charettes I've added a test to reproduce the error that gets thrown.
I'm keen to hear possible solutions to this issue. I think I need some guidance before I go ahead and make changes to forms/formsets.

carltongibson · 2018-05-16T14:19:40Z

I pushed c750d8a with adjustments for testing the ValidationError with a bad UUID string.

The SQLite CI failure is unrelated, which is annoying because...
- A ValidationError is already raised by the .filter(pk__in=object_pks) with the bad UUID, as long as the QuerySet is evaluated.
  - This happens at the field level during evaluation (in to_python), so it already works on all DBs.
- I pulled that into a helper method, get_list_editable_queryset, because it isn't very pretty. But maybe it's enough?
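To illustrate why evaluation matters, a toy sketch (not Django code) of a lazily evaluated filter: constructing it with a malformed pk succeeds, and the conversion error only appears when it is iterated, much like the to_python() call happening when the real QuerySet is evaluated. LazyPkFilter is a hypothetical class, and uuid.UUID stands in for the field's conversion.

```python
import uuid


class LazyPkFilter:
    # Toy stand-in for a lazy QuerySet: constructing it does no work, and the
    # submitted pk strings are only converted when the results are iterated.
    def __init__(self, rows, pks):
        self.rows = rows
        self.pks = pks

    def __iter__(self):
        # Conversion happens here, so a malformed pk raises only on evaluation.
        wanted = {uuid.UUID(pk) for pk in self.pks}
        return iter([row for row in self.rows if row in wanted])


rows = [uuid.UUID('12345678-1234-5678-1234-567812345678')]
lazy = LazyPkFilter(rows, ['not-a-uuid'])  # no error yet
try:
    list(lazy)
except ValueError:
    print('error raised on evaluation')
```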

To get the admin to actually let me use this I had to make the PK editable as well as provide list_display, list_editable and list_display_links (because it was the first item in the list_display). A small part of me thought the admin would have been well within its rights to tell me this was not supported. 🙂

carltongibson · 2018-05-16T14:24:14Z

django/contrib/admin/options.py

+        modified_objects = self.get_queryset(request).filter(pk__in=object_pks)
+        try:
+            # Force evaluate queryset to verify that pks are valid
+            list(modified_objects)


The FormSet later needs modified_objects as a QuerySet, rather than a list. So is there a nicer way of forcing evaluation here?


charettes · 2018-05-16T16:21:07Z

django/contrib/admin/options.py

+            # Force evaluate queryset to verify that pks are valid
+            list(modified_objects)
+        except ValidationError:
+            raise ValidationError("Invalid Primary Key Provided")


Won't this also result in a 500?

Ah yes. It no doubt will. (That's too much DRF that is. 🙂) We need to handle this. 👍

@charettes' earlier point:

I guess we could try cleaning the submitted pks and disable the optimization if an exception is raised instead of crashing?

If we return self.get_queryset(request) here, we're no worse off than we are currently.

@AdamDonna Do you think you could put that in and adjust the test(s) for that?

I'll add that in and adjust the tests. I think this is a good compromise since it's no worse than it is currently, but most cases will be handled in a better way.

charettes · 2018-05-16T16:25:07Z

django/contrib/admin/options.py

+        modified_objects = self.get_queryset(request).filter(pk__in=object_pks)
+        try:
+            # Force evaluate queryset to verify that pks are valid
+            list(modified_objects)


I guess we could try calling the primary key's to_python instead of hitting the database here.

def get_list_editable_queryset(self, request, prefix):
    object_pks = self.get_edited_object_pks(request, prefix)
    queryset = self.get_queryset(request)
    validate = queryset.model._meta.pk.to_python
    try:
        for pk in object_pks:
            validate(pk)
    except ValidationError:
        # Disable optimization
        return queryset
    return queryset.filter(pk__in=object_pks)

AdamDonna · 2018-05-17T13:55:59Z

@charettes I think not hitting the DB was a great idea, so I went with to_python for validation. I've added tests for get_list_editable_queryset covering both the validation error path and the valid pk path.

charettes

Thanks for the hard work @AdamDonna and @carltongibson. All my concerns have been addressed 🎉 !

carltongibson · 2018-05-17T15:07:31Z

Great. Thanks @AdamDonna for your effort, and thank you @charettes for your review!

I've marked this RFC. I'll wait for Tim's input on whether we can back port to 1.11 or not.

AdamDonna · 2018-05-27T14:32:18Z

Hey @carltongibson and @charettes thanks for your help with this change. How'd the RFC go?

timgraham · 2018-05-27T15:32:05Z

I haven't reviewed in detail but did you give any thought to my suggestion, "A test that uses CaptureQueriesContext might be able to verify the presence of the WHERE clause for filtering by pk."
While the new methods are tested, I don't see a test for the fact that changelist_view() uses them.

AdamDonna · 2018-05-27T23:17:07Z

I think that's a good test to validate the usage in changelist_view(). I'll add that in soon.

timgraham

I'm thinking we could put this into tomorrow's 2.0.6 release and then 1.11.x in July.

timgraham · 2018-06-01T01:37:23Z

django/contrib/admin/options.py

@@ -1583,6 +1584,24 @@ def add_view(self, request, form_url='', extra_context=None):
    def change_view(self, request, object_id, form_url='', extra_context=None):
        return self.changeform_view(request, object_id, form_url, extra_context)

+    def get_edited_object_pks(self, request, prefix):


I would organize these methods elsewhere (instead of in the middle of the views) and prefix them with an underscore to indicate that they're helpers, not public APIs.

So they should be defined inside changelist_view?

timgraham · 2018-06-01T01:37:44Z

django/contrib/admin/options.py

+            for pk in object_pks:
+                validate(pk)
+        except ValidationError:
+            # Disable optimization


I don't see the reasoning for disabling the optimization on invalid data. Wouldn't that allow malicious users (unlikely, I guess, if they already have access to the admin) to create memory issues?

The reason was that we’d end up with a 500 server error in this case, whereas now we get a validation error.

An alternative we could use here is the old approach, ‘cl.result_list’, which we know is sensibly limited to just one page.

Either that, or since it's invalid POST data, bail out here and report the error to the user.
(That's a little bit more work though; I haven't yet thought what that looks like.)
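A rough sketch of the cl.result_list alternative, with objects modeled as bare UUIDs, uuid.UUID standing in for to_python(), and objects_to_edit a hypothetical name: invalid input falls back to the current page rather than to every object, so a malformed POST can't pull the whole table into memory.

```python
import uuid


def objects_to_edit(submitted_pks, to_python, all_objects, current_page):
    # Narrow to the submitted pks when every one of them converts cleanly.
    try:
        wanted = {to_python(pk) for pk in submitted_pks}
    except ValueError:
        # Invalid input: fall back to the current changelist page (the role
        # cl.result_list played) instead of every object, keeping memory
        # use bounded by the page size.
        return list(current_page)
    return [obj for obj in all_objects if obj in wanted]


all_objects = [uuid.UUID(int=i) for i in range(1, 101)]
current_page = all_objects[:10]
print(len(objects_to_edit(['bad'], uuid.UUID, all_objects, current_page)))  # 10
```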

AdamDonna · 2018-06-01T13:34:26Z

@timgraham I've got one more test to add, using CaptureQueriesContext as per your suggestion, to verify that the list_editable changelist view uses the WHERE clause.

timgraham · 2018-06-01T13:37:38Z

Will that be ready within an hour? If not, I don't mind adding it later.

AdamDonna · 2018-06-01T13:40:37Z

I've got it done now, but the rebase is giving me grief because the branch is well ahead of my local copy (I only noticed when attempting to push).

timgraham · 2018-06-01T13:43:27Z

If you want to paste the test here, I'll add it to the commit that I pushed.

AdamDonna · 2018-06-01T13:45:20Z

from django.db import connection
from django.test.utils import CaptureQueriesContext

...

    def test_changelist_view_list_editable_changed_objects_uses_filter(self):
        a = Swallow.objects.create(origin='Swallow A', load=4, speed=1)
        Swallow.objects.create(origin='Swallow B', load=2, speed=2)
        data = {
            'form-TOTAL_FORMS': '2',
            'form-INITIAL_FORMS': '2',
            'form-MIN_NUM_FORMS': '0',
            'form-MAX_NUM_FORMS': '1000',
            'form-0-uuid': str(a.pk),
            'form-0-load': 10,
            '_save': 'Save',
        }
        superuser = self._create_superuser('superuser')
        self.client.force_login(superuser)
        changelist_url = reverse('admin:admin_changelist_swallow_changelist')
        with CaptureQueriesContext(connection) as context:
            response = self.client.post(changelist_url, data=data)
            self.assertEqual(response.status_code, 200)
            self.assertIn('WHERE', context.captured_queries[4]['sql'])
            self.assertIn('IN', context.captured_queries[4]['sql'])
            self.assertIn(str(a.pk).replace('-', ''), context.captured_queries[4]['sql'])

timgraham · 2018-06-01T13:59:54Z

.replace('-', '') isn't going to work across all databases. For example, PostgreSQL includes the dashes. I'm not sure if there's a way to do the comparison that won't be brittle. Maybe checking for "WHERE" in the query is enough.
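A sketch of a less backend-specific UUID check that accepts either spelling. It assumes the captured SQL has its parameters interpolated, which may not hold on every backend, so checking for "WHERE" alone remains the simpler option. sql_mentions_uuid is a hypothetical helper.

```python
import uuid


def sql_mentions_uuid(sql, value):
    # A native uuid column (e.g. PostgreSQL) shows the dashed form, while
    # char(32) storage (e.g. SQLite) shows the 32-character hex form.
    return str(value) in sql or value.hex in sql


pk = uuid.UUID('12345678-1234-5678-1234-567812345678')
print(sql_mentions_uuid("... WHERE uuid IN ('12345678123456781234567812345678')", pk))  # True
```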

AdamDonna · 2018-06-01T14:12:22Z

Checking for 'WHERE' and 'IN' is the minimum that's sufficient, IMO. Matching the UUID was a stretch goal: on the off chance that another query was executed first and this one was no longer at index 4, it would verify that the WHERE was found on the right query.
Your solution of checking the start is a good compromise.

Regression in 917cc28.

AdamDonna · 2018-06-01T15:04:21Z

Thanks for the help everyone 👍

JakeBar · 2018-07-17T02:14:41Z

AUTHORS

@@ -11,6 +11,7 @@ answer newbie questions, and generally made Django that much better:
    Abeer Upadhyay <ab.esquarer@gmail.com>
    Abhishek Gautam <abhishekg1128@yahoo.com>
    Adam Bogdał <adam@bogdal.pl>
+    Adam Donaghy


charettes reviewed May 3, 2018


AdamDonna force-pushed the ticket_28462 branch 2 times, most recently from 4185697 to 9c1c6b2, May 4, 2018 15:51

carltongibson mentioned this pull request May 8, 2018

Fixed #28462 -- ModelAdmin.list_editable unusably slow and memory intensive with large datasets #9820

Closed

AdamDonna force-pushed the ticket_28462 branch from ed3a971 to 7f851c4, May 16, 2018 03:01

AdamDonna changed the base branch from stable/1.11.x to master May 16, 2018 03:02

AdamDonna force-pushed the ticket_28462 branch from 7f851c4 to 83cba6b, May 16, 2018 03:22

carltongibson force-pushed the ticket_28462 branch from 83cba6b to c750d8a, May 16, 2018 14:07

carltongibson reviewed May 16, 2018


charettes reviewed May 16, 2018


AdamDonna force-pushed the ticket_28462 branch from d58d0ac to 8a9fd0d, May 17, 2018 12:34

charettes approved these changes May 17, 2018


timgraham reviewed Jun 1, 2018


timgraham changed the title ~~Fixed #28462 -- ModelAdmin.list_editable memory intensive with large datasets~~ Fixed #28462 -- Decreased memory usage with ModelAdmin.list_editable. Jun 1, 2018

timgraham force-pushed the ticket_28462 branch from c160e14 to c0e803e, June 1, 2018 13:22

timgraham force-pushed the ticket_28462 branch from c0e803e to 9d60bb2, June 1, 2018 14:10

Fixed #28462 -- Decreased memory usage with ModelAdmin.list_editable.

b18650a

Regression in 917cc28.

timgraham force-pushed the ticket_28462 branch from 9d60bb2 to b18650a, June 1, 2018 14:41

timgraham merged commit b18650a into django:master Jun 1, 2018

JakeBar reviewed Jul 17, 2018



AdamDonna commented May 3, 2018


carltongibson commented May 8, 2018

timgraham commented May 9, 2018

AdamDonna commented May 9, 2018

charettes commented May 9, 2018

carltongibson commented May 9, 2018 • edited

charettes commented May 9, 2018

carltongibson commented May 9, 2018 • edited

AdamDonna commented May 10, 2018

carltongibson commented May 10, 2018

charettes commented May 11, 2018 • edited

jarshwah commented May 15, 2018

AdamDonna commented May 16, 2018

carltongibson commented May 16, 2018 • edited


AdamDonna commented May 17, 2018

charettes left a comment


carltongibson commented May 17, 2018

AdamDonna commented May 27, 2018

timgraham commented May 27, 2018

AdamDonna commented May 27, 2018

timgraham left a comment


carltongibson Jun 1, 2018 • edited


AdamDonna commented Jun 1, 2018

timgraham commented Jun 1, 2018

AdamDonna commented Jun 1, 2018

timgraham commented Jun 1, 2018

AdamDonna commented Jun 1, 2018 • edited

timgraham commented Jun 1, 2018

AdamDonna commented Jun 1, 2018

AdamDonna commented Jun 1, 2018
