Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed #28462 -- Decreased memory usage with ModelAdmin.list_editable. #9920

Merged
merged 1 commit into from Jun 1, 2018

Conversation

AdamDonna
Copy link
Contributor

@@ -1510,6 +1511,16 @@ def add_view(self, request, form_url='', extra_context=None):
def change_view(self, request, object_id, form_url='', extra_context=None):
return self.changeform_view(request, object_id, form_url, extra_context)

def get_edited_object_ids(self, request, prefix):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_edited_object_pks

def get_edited_object_ids(self, request, prefix):
# Get the objects ids to filter the queryset to get only the objects that will be updated.
# Matching on anything incase we come up against non-numeric fields
regexp = re.compile('{prefix}-(\w+)-id$'.format(prefix=prefix))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must take into account that the primary key can be named something these than id? self.model._meta.pk.name should do. Also \w is too loose, you're not matching against the pk value but against the form index so \d+ should match. The capture group is also unnecessary.

It might also be worth considering bounding the index based on the values provided in the management form?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate on what advantage bounding the indexes based on the management form would provide?

for key, value in request.POST.items():
if regexp.match(key):
object_ids.append(value)
return object_ids
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be reduced to a list comprehension

return [
    value for key, value in request.POST.items() if regexp.match(key)
]

def get_edited_object_ids(self, request, prefix):
# Get the objects ids to filter the queryset to get only the objects that will be updated.
# Matching on anything incase we come up against non-numeric fields
regexp = re.compile('{prefix}-(\w+)-id$'.format(prefix=prefix))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to give this a less generic name, pk_pattern maybe?

@@ -1510,6 +1511,16 @@ def add_view(self, request, form_url='', extra_context=None):
def change_view(self, request, object_id, form_url='', extra_context=None):
return self.changeform_view(request, object_id, form_url, extra_context)

def get_edited_object_ids(self, request, prefix):
# Get the objects ids to filter the queryset to get only the objects that will be updated.
# Matching on anything incase we come up against non-numeric fields
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrap comments at 79 chars, drop the comment about matching on anything, you're matching against the form index which will be numeric not the actual field value.

@@ -1601,7 +1612,9 @@ def changelist_view(self, request, extra_context=None):
# Handle POSTed bulk-edit data.
if request.method == 'POST' and cl.list_editable and '_save' in request.POST:
FormSet = self.get_changelist_formset(request)
formset = cl.formset = FormSet(request.POST, request.FILES, queryset=self.get_queryset(request))
object_ids = self.get_edited_object_ids(request, FormSet.get_default_prefix())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

object_pks.

'form-1-speed': '5.0',
'form-2-load': '5.0',
'form-2-speed': '4.0',
'_save': 'Save',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we also have to pass along the management form data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test doesn't need that data. However to simulate real usage it should be added.

formset = cl.formset = FormSet(request.POST, request.FILES, queryset=self.get_queryset(request))
object_ids = self.get_edited_object_ids(request, FormSet.get_default_prefix())
formset = cl.formset = FormSet(request.POST, request.FILES,
queryset=self.get_queryset(request).filter(pk__in=object_ids))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure the pk__in filter won't choke if passed non-numeric pks in a string form? What about uuids pk or even datetime that might require to be cleaned by the form?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update the tests to have uuids as their primary keys which should provide some certainty

}
request = self.factory.post(changelist_url, data=data)
ids = m.get_edited_object_ids(request, 'form')
self.assertItemsEqual(ids, [str(a.pk), str(b.pk), str(c.pk)])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertEqual?

@carltongibson
Copy link
Member

This looks pretty good to me. (Need to change the base branch, rebase, squash and adjust the commit message, but we can do all that.)

The general strategy seems fine. (There's no other way of getting the PKs from the request.)

I wondered about being stricter on using the submitted data (with a form to validate say) but I think that may be overkill. The formset.is_valid() call already raises is you mess with the request data in the obvious ways (skipping a form-0-id for example).

Are we sure the pk__in filter won't choke if passed non-numeric pks in a string form? What about uuids pk or even datetime that might require to be cleaned by the form?

String PKs work with the filter call no problem. I wasn't sure of an example for (e.g.) a datetime?

It might also be worth considering bounding the index based on the values provided in the management form?

I added the management form data to the test case, but wasn't sure if we needed to validate against it?

@timgraham
Copy link
Member

A test that uses CaptureQueriesContext might be able to verify the presence of the WHERE clause for filtering by pk.

@AdamDonna
Copy link
Contributor Author

@carltongibson I agree that formset.is_valid() should be enough to validate the forms, at least for the use cases i can think of.

I don't think we need to validate against the management form data. Would we add sanity to someone's workflows by validating against it?

I'm also not sure about date time examples choking the filter. I can't think of an appropriate time to use datetime as your PK. I suspect it's an edge case but I don't want to make assumption and break someone's build.

@charettes
Copy link
Member

I'm also not sure about date time examples choking the filter. I can't think of an appropriate time to use datetime as your PK. I suspect it's an edge case but I don't want to make assumption and break someone's build.

I agree that it's an edge case and could probably be revisited later if it really breaks someone's setup.

The reason I initially mentioned it is that I wanted to make sure that:

  1. Complex data types would be handled correctly; it looks like UUID is working fine so we should be good.
  2. Make sure that editable=True primary keys are working fine. I could be missing something but this patch will cause a crash if an invalid primary key is submitted (e.g. TypeError) instead of a validation error like it used to. I guess we could try cleaning the submitted pks and and disable the optimization if an exception is raised instead of crashing?

@carltongibson
Copy link
Member

carltongibson commented May 9, 2018

...could probably be revisited later if it really breaks someone's setup.

This is where I'll ask Tim 🙂. Elsewhere I'd be inclined say it's probably OK, ship it and then quickly handle any actual bugs that are revealed in a point release. I prefer that to trying to guess what the issues are in advance, where it's not really obvious. Can we do that here? (It's a Bug so it would go into 2.x, but if we targeted it for 2.1 there's the beta, so we could expect feedback...)

@AdamDonna: Can you please change the base branch to master on the PR, rebase, squash the commits and adjust the commit message to the required format? (If you need help let me know.)

@charettes
Copy link
Member

Well I'll defer to you and Tim but I think the editable primary key is a legitimate issue that should be fixed before shipping.

@carltongibson
Copy link
Member

carltongibson commented May 9, 2018

Ei! Sorry @charettes — I missed that was still pending. (Going blind by this stage of the afternoon 😵)

(My question was more for after that...)

@AdamDonna
Copy link
Contributor Author

@charettes I'm still a bit of a noob, could you elaborate on how to cause the TypeError case so I can solve this issue?

@carltongibson I'll get that sorted once the invalid primary key issue is resolved. Will this patch be back ported to 1.11.x users as well?

@carltongibson
Copy link
Member

Will this patch be back ported to 1.11.x users as well?

It would be nice. This is a pretty devastating bug for list_editable if you have any size of dataset. (That the concurrency issue isn't really addressed is all the worse.) Thus if we're confident enough then I'd be in favour of making the case the back porting as a special exception. (It's not security, so normally no.)

First, it comes down to that confident enough. Hence my query above: it would be nice to ask people to actually test it.

@charettes
Copy link
Member

charettes commented May 11, 2018

@AdamDonna,

I'm still a bit of a noob, could you elaborate on how to cause the TypeError case so I can solve this issue?

Sure! Given the following scenario:

class Book(models.Model):
    uuid = models.UUIDField(editable=True, primary_key=True)
    ...

class BookAdmin(ModelAdmin):
    list_editable = ['uuid', ...]

In this case it's possible an invalid uuid is submitted by a user. Right now against master this results in a ValidationError that is displayed to the user. With your patch it will result in a crash when trying to filter by an invalid uuid. Unfortunately, given the lazy nature of querysets I can't see how this can be fixed at the view level. I assume this will have to happen at the formset level somehow.

You can test it out by trying to pass an invalid value for form-0-uuid and notice how it will crash on database with a real UUID field (e.g. PostgreSQL but not SQLlite which uses a CharField). It's not necessary to define a new mode with an editable primary key, you can reuse the existing model and pass along an invalid pk and assert a validation error is correctly displayed in this case.

Also, your branch is based off stable/1.11.x. You might want to create a new one against master and the committer will take care of the backport if required.

@jarshwah
Copy link
Member

I think an argument can be made for a backport considering this is a crashing bug for large datasets (OutOfMemory).

(disclosure: I work with Adam who is pushing this patch through)

@AdamDonna AdamDonna changed the base branch from stable/1.11.x to master May 16, 2018 03:02
@AdamDonna
Copy link
Contributor Author

@charettes I've added the test to see the the type error that gets thrown.
I'm keen to hear possible solutions to this issue. I think I need some guidance before I go ahead and make changes to forms/formsets

@carltongibson
Copy link
Member

carltongibson commented May 16, 2018

I pushed c750d8a with adjustments for testing the ValidationError with a bad UUID string.

  • The SQLite CI failure is unrelated, which is annoying because...
  • A ValidationError is already raised by the .filter(pk__in=object_pks) with the bad UUID — as long as the QuerySet is evaluated.
    • This happens at the field level during evaluation (in to_python) so it already works on all DBs.
  • I pulled that into a helper method get_list_editable_queryset, because it isn't very pretty. But maybe it's enough?

To get the admin to actually let me use this I had to make the PK editable as well as provide list_display, list_editable and list_display_links (because it was the first item in the list_display). A small part of me thought the admin would have been well within its rights to tell me this was not supported. 🙂

modified_objects = self.get_queryset(request).filter(pk__in=object_pks)
try:
# Force evaluate queryset to verify that pks are valid
list(modified_objects)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The FormSet later needs modified_objects as a QuerySet, rather than a list. So is there a nicer way of forcing evaluation here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could try calling the primary key's to_python instead of hitting the database here.

def get_list_editable_queryset(self, request, prefix):
    object_pks = self.get_edited_object_pks(request, prefix)
    queryset  = self.get_queryset(request)
    validate = queryset.model._meta.pk.to_python
    try:
        for pk in object_pks:
            validate(pk)
    except ValidationError:
        # Disable optimization
        return queryset
    return queryset.filter(pk__in=object_pks)

# Force evaluate queryset to verify that pks are valid
list(modified_objects)
except ValidationError:
raise ValidationError("Invalid Primary Key Provided")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this also result in a 500?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes. It no doubt will. (That's too much DRF that is. 🙂) We need to handle this. 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charettes' earlier point:

I guess we could try cleaning the submitted pks and and disable the optimization if an exception is raised instead of crashing?

If we here return self.get_queryset(request) we're no worse off than currently.

@AdamDonna Do you think you could put that in and adjust the test(s) for that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add that in and adjust the tests. I think this is a good compromise since it's no worse than it is currently, but most cases will be handled in a better way.

modified_objects = self.get_queryset(request).filter(pk__in=object_pks)
try:
# Force evaluate queryset to verify that pks are valid
list(modified_objects)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could try calling the primary key's to_python instead of hitting the database here.

def get_list_editable_queryset(self, request, prefix):
    object_pks = self.get_edited_object_pks(request, prefix)
    queryset  = self.get_queryset(request)
    validate = queryset.model._meta.pk.to_python
    try:
        for pk in object_pks:
            validate(pk)
    except ValidationError:
        # Disable optimization
        return queryset
    return queryset.filter(pk__in=object_pks)

@AdamDonna
Copy link
Contributor Author

@charettes I think not hitting the DB was a great idea with to_python so I went with that for validation. I've added tests for get_list_editable_queryset, validation error path and valid pk path

Copy link
Member

@charettes charettes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the hard work @AdamDonna and @carltongibson. All my concerns have been addressed 🎉 !

@carltongibson
Copy link
Member

Great. Thanks @AdamDonna for your effort, and thank you @charettes for your review!

I've marked this RFC. I'll wait for Tim's input on whether we can back port to 1.11 or not.

@AdamDonna
Copy link
Contributor Author

Hey @carltongibson and @charettes thanks for your help with this change. How'd the RFC go?

@timgraham
Copy link
Member

I haven't reviewed in detail but did you give any thought to my suggestion, "A test that uses CaptureQueriesContext might be able to verify the presence of the WHERE clause for filtering by pk."
While the new methods are tested, I don't see a test for the fact that changelist_view() uses them.

@AdamDonna
Copy link
Contributor Author

I think that's a good test to validate the usage in changelist_view(). I'll add that in soon.

Copy link
Member

@timgraham timgraham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking we could put this into tomorrow's 2.0.6 release and then 1.11.x in July.

@@ -1583,6 +1584,24 @@ def add_view(self, request, form_url='', extra_context=None):
def change_view(self, request, object_id, form_url='', extra_context=None):
return self.changeform_view(request, object_id, form_url, extra_context)

def get_edited_object_pks(self, request, prefix):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would organize these methods else (inside of in the middle of the views) and prefix them with an underscore to indicate that they're helpers, not public APIs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So they should be defined inside changelist_view?

for pk in object_pks:
validate(pk)
except ValidationError:
# Disable optimization
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the reasoning for disabling the optimization on invalid data. Wouldn't that allow malicious users (unlikely, I guess, if they already have access to the admin) to create memory issues?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason was that we’d end up with a 500 server error in this case, whereas now we get a validation error.

An alternative that we could use here is the old approach ‘cl.result_list’, which we know is sensibily limited to just one page.

Either that, or since it's invalid POST data, bail out here and report the error to the user.
(That's a little bit more work though; I haven't yet thought what that looks like.)

@timgraham timgraham changed the title Fixed #28462 -- ModelAdmin.list_editable memory intensive with large datasets Fixed #28462 -- Decreased memory usage with ModelAdmin.list_editable. Jun 1, 2018
@AdamDonna
Copy link
Contributor Author

@timgraham I've got one more test to add to verify the view list editable uses the where clause with CaptureQueriesContext as per your suggestion

@timgraham
Copy link
Member

Will that be ready within an hour? If not, I don't mind adding it later.

@AdamDonna
Copy link
Contributor Author

I've got it done now but the rebase is giving me grief because the it's well ahead of my local (I only noticed when attempting to push).

@timgraham
Copy link
Member

If you want to paste the test here, I'll add it the commit that I pushed.

@AdamDonna
Copy link
Contributor Author

AdamDonna commented Jun 1, 2018

from django.db import connection
from django.test.utils import CaptureQueriesContext

...

    def test_changelist_view_list_editable_changed_objects_uses_filter(self):
        a = Swallow.objects.create(origin='Swallow A', load=4, speed=1)
        Swallow.objects.create(origin='Swallow B', load=2, speed=2)
        data = {
            'form-TOTAL_FORMS': '2',
            'form-INITIAL_FORMS': '2',
            'form-MIN_NUM_FORMS': '0',
            'form-MAX_NUM_FORMS': '1000',
            'form-0-uuid': str(a.pk),
            'form-0-load': 10,
            '_save': 'Save',
        }
        superuser = self._create_superuser('superuser')
        self.client.force_login(superuser)
        changelist_url = reverse('admin:admin_changelist_swallow_changelist')
        with CaptureQueriesContext(connection) as context:
            response = self.client.post(changelist_url, data=data)
            self.assertEquals(response.status_code, 200)
            self.assertIn('WHERE', context.captured_queries[4]['sql'])
            self.assertIn('IN', context.captured_queries[4]['sql'])
            self.assertIn(str(a.pk).replace('-', ''), context.captured_queries[4]['sql'])

@timgraham
Copy link
Member

.replace('-', '') isn't going to work across all databases. For example, PostgreSQL includes the dashes. I'm not sure if there's a way to do the comparison that won't be brittle. Maybe checking for "WHERE" in the query is enough.

@AdamDonna
Copy link
Contributor Author

The 'where' and 'in' are the minimum sufficient IMO. UUID was a stretch goal, on the off chance that another query was executed before it and it's no longer the 4th it verifies the where is identified on the right query.
You solution of checking the start is a good compromise.

@timgraham timgraham merged commit b18650a into django:master Jun 1, 2018
@AdamDonna
Copy link
Contributor Author

Thanks for the help everyone 👍

@@ -11,6 +11,7 @@ answer newbie questions, and generally made Django that much better:
Abeer Upadhyay <ab.esquarer@gmail.com>
Abhishek Gautam <abhishekg1128@yahoo.com>
Adam Bogdał <adam@bogdal.pl>
Adam Donaghy
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants