Fixed #28462 -- Decreased memory usage with ModelAdmin.list_editable. #9920
django/contrib/admin/options.py
@@ -1510,6 +1511,16 @@ def add_view(self, request, form_url='', extra_context=None):
    def change_view(self, request, object_id, form_url='', extra_context=None):
        return self.changeform_view(request, object_id, form_url, extra_context)

    def get_edited_object_ids(self, request, prefix):
get_edited_object_pks
django/contrib/admin/options.py
    def get_edited_object_ids(self, request, prefix):
        # Get the objects ids to filter the queryset to get only the objects that will be updated.
        # Matching on anything incase we come up against non-numeric fields
        regexp = re.compile('{prefix}-(\w+)-id$'.format(prefix=prefix))
This must take into account that the primary key can be named something other than id? self.model._meta.pk.name should do. Also, \w is too loose: you're not matching against the pk value but against the form index, so \d+ should match. The capture group is also unnecessary.
It might also be worth considering bounding the index based on the values provided in the management form?
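A minimal sketch of the pattern suggested above, assuming the pk field name is passed in (inside the admin it would come from self.model._meta.pk.name) and adding re.escape() as a precaution in case the prefix contains regex metacharacters. build_pk_pattern is a hypothetical helper name, not part of Django.

```python
import re


def build_pk_pattern(prefix, pk_name):
    """Compile a pattern for POST keys shaped like '<prefix>-<index>-<pk_name>'.

    The middle part is the numeric form index, not the pk value, so a run of
    digits is enough. No capture group is needed because only the key is
    tested; the pk value is read from the matching entry separately.
    """
    return re.compile(r'{}-\d+-{}$'.format(re.escape(prefix), re.escape(pk_name)))


pk_pattern = build_pk_pattern('form', 'uuid')
keys = ['form-0-uuid', 'form-12-uuid', 'form-0-name', 'form-x-uuid', 'form-0-uuid2']
print([key for key in keys if pk_pattern.match(key)])  # ['form-0-uuid', 'form-12-uuid']
```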
Could you elaborate on what advantage bounding the indexes based on the management form would provide?
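One way the bounding idea could work, sketched under assumptions: only accept pk keys whose form index is below the submitted <prefix>-TOTAL_FORMS value, so keys beyond the formset's declared size are ignored. get_bounded_pks is a hypothetical helper and a plain dict stands in for request.POST; this illustrates the suggestion and is not code from the PR.

```python
import re


def get_bounded_pks(post, prefix, pk_name):
    """Return pk values whose form index is below <prefix>-TOTAL_FORMS.

    Hypothetical helper: keys outside range(TOTAL_FORMS) are ignored, and a
    missing or non-numeric TOTAL_FORMS value yields no pks at all.
    """
    try:
        total = int(post.get('{}-TOTAL_FORMS'.format(prefix), ''))
    except ValueError:
        return []
    # Unlike the plain key check, a capture group is needed here because the
    # index itself is compared against TOTAL_FORMS.
    pattern = re.compile(r'{}-(\d+)-{}$'.format(re.escape(prefix), re.escape(pk_name)))
    pks = []
    for key, value in post.items():
        match = pattern.match(key)
        if match and int(match.group(1)) < total:
            pks.append(value)
    return pks


post = {
    'form-TOTAL_FORMS': '2',
    'form-0-id': '10',
    'form-1-id': '11',
    'form-5-id': '99',  # index beyond TOTAL_FORMS, ignored
}
print(get_bounded_pks(post, 'form', 'id'))  # ['10', '11']
```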
django/contrib/admin/options.py
        for key, value in request.POST.items():
            if regexp.match(key):
                object_ids.append(value)
        return object_ids
Could be reduced to a list comprehension:
        return [
            value for key, value in request.POST.items() if regexp.match(key)
        ]
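Combining the suggestions so far (the model's pk field name instead of a hard-coded id, \d+ for the form index, and the list comprehension), the helper could look roughly like this. It is a standalone function over a plain dict rather than the ModelAdmin method, so post and pk_name are assumptions standing in for request.POST and self.model._meta.pk.name.

```python
import re


def get_edited_object_pks(post, prefix, pk_name):
    """Return the submitted pk values of the forms in a list_editable formset.

    `post` is any mapping with an items() method (a plain dict here, a
    QueryDict in Django); `pk_name` is the model's primary key field name.
    """
    pk_pattern = re.compile(r'{}-\d+-{}$'.format(re.escape(prefix), re.escape(pk_name)))
    return [value for key, value in post.items() if pk_pattern.match(key)]


post = {
    'form-0-id': '1',
    'form-0-speed': '5.0',
    'form-1-id': '2',
    'form-1-speed': '4.0',
    '_save': 'Save',
}
print(get_edited_object_pks(post, 'form', 'id'))  # ['1', '2']
```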
django/contrib/admin/options.py
    def get_edited_object_ids(self, request, prefix):
        # Get the objects ids to filter the queryset to get only the objects that will be updated.
        # Matching on anything incase we come up against non-numeric fields
        regexp = re.compile('{prefix}-(\w+)-id$'.format(prefix=prefix))
Might want to give this a less generic name, pk_pattern maybe?
django/contrib/admin/options.py
@@ -1510,6 +1511,16 @@ def add_view(self, request, form_url='', extra_context=None):
    def change_view(self, request, object_id, form_url='', extra_context=None):
        return self.changeform_view(request, object_id, form_url, extra_context)

    def get_edited_object_ids(self, request, prefix):
        # Get the objects ids to filter the queryset to get only the objects that will be updated.
        # Matching on anything incase we come up against non-numeric fields
Wrap comments at 79 chars, and drop the comment about matching on anything: you're matching against the form index, which will be numeric, not against the actual field value.
django/contrib/admin/options.py
@@ -1601,7 +1612,9 @@ def changelist_view(self, request, extra_context=None):
        # Handle POSTed bulk-edit data.
        if request.method == 'POST' and cl.list_editable and '_save' in request.POST:
            FormSet = self.get_changelist_formset(request)
            formset = cl.formset = FormSet(request.POST, request.FILES, queryset=self.get_queryset(request))
            object_ids = self.get_edited_object_ids(request, FormSet.get_default_prefix())
object_pks.
'form-1-speed': '5.0',
'form-2-load': '5.0',
'form-2-speed': '4.0',
'_save': 'Save',
Don't we also have to pass along the management form data?
This test doesn't need that data. However, to simulate real usage, it should be added.
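To simulate real usage, the test data would carry the formset's management form fields next to the per-form fields. A sketch assuming the default form prefix; make_changelist_post is a hypothetical helper and the field values are illustrative. The four management keys are the ones Django's ManagementForm uses, and MAX_NUM_FORMS is set to 1000, Django's default maximum.

```python
def make_changelist_post(rows, prefix='form'):
    """Build list_editable POST data for `rows`, a list of dicts of field values.

    Includes the management form keys Django's formsets expect
    (TOTAL_FORMS, INITIAL_FORMS, MIN_NUM_FORMS, MAX_NUM_FORMS).
    """
    data = {
        '{}-TOTAL_FORMS'.format(prefix): str(len(rows)),
        # Every row edits an existing object, so all forms count as initial.
        '{}-INITIAL_FORMS'.format(prefix): str(len(rows)),
        '{}-MIN_NUM_FORMS'.format(prefix): '0',
        '{}-MAX_NUM_FORMS'.format(prefix): '1000',
        '_save': 'Save',
    }
    for index, row in enumerate(rows):
        for field, value in row.items():
            data['{}-{}-{}'.format(prefix, index, field)] = value
    return data


data = make_changelist_post([
    {'id': '1', 'load': '10.0', 'speed': '5.0'},
    {'id': '2', 'load': '5.0', 'speed': '4.0'},
])
```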
django/contrib/admin/options.py
            formset = cl.formset = FormSet(request.POST, request.FILES, queryset=self.get_queryset(request))
            object_ids = self.get_edited_object_ids(request, FormSet.get_default_prefix())
            formset = cl.formset = FormSet(request.POST, request.FILES,
                                           queryset=self.get_queryset(request).filter(pk__in=object_ids))
Are we sure the pk__in filter won't choke if passed non-numeric pks in string form? What about UUID pks, or even datetime pks, that might need to be cleaned by the form?
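To make the concern concrete: POSTed pk values are always strings, and each primary key type needs its own conversion before it is a valid lookup value. A rough illustration that uses plain Python converters as stand-ins for a model field's to_python(); PK_CONVERTERS and convert_pks are hypothetical names, not Django APIs. A malformed string raises ValueError, which is the case the rest of the thread has to handle.

```python
import datetime
import uuid

# Hypothetical stand-ins for how different pk field types parse POSTed strings.
PK_CONVERTERS = {
    'integer': int,
    'uuid': uuid.UUID,
    'datetime': datetime.datetime.fromisoformat,  # requires Python 3.7+
}


def convert_pks(kind, raw_values):
    """Convert raw string pks with the converter for `kind`.

    Raises ValueError as soon as one value is malformed.
    """
    convert = PK_CONVERTERS[kind]
    return [convert(value) for value in raw_values]


print(convert_pks('integer', ['1', '2']))  # [1, 2]
try:
    convert_pks('uuid', ['not-a-uuid'])
except ValueError as exc:
    print('rejected:', exc)
```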
I'll update the tests to use UUIDs as their primary keys, which should provide some certainty.
tests/admin_changelist/tests.py
}
request = self.factory.post(changelist_url, data=data)
ids = m.get_edited_object_ids(request, 'form')
self.assertItemsEqual(ids, [str(a.pk), str(b.pk), str(c.pk)])
assertEqual?
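Background on the suggestion: assertItemsEqual is the Python 2.7 name; in Python 3 the order-insensitive assertion is assertCountEqual. assertEqual on two lists is order-sensitive, so it is only appropriate when the extracted pks come back in a predictable order (for example, the insertion order of the POST data). A small self-contained illustration; the test class and its data are hypothetical.

```python
import unittest


class PkOrderExample(unittest.TestCase):
    pks = ['3', '1', '2']

    def test_count_equal_ignores_order(self):
        # Passes: same elements, different order.
        self.assertCountEqual(self.pks, ['1', '2', '3'])

    def test_equal_is_order_sensitive(self):
        # assertEqual on lists only passes when the order matches as well.
        self.assertNotEqual(self.pks, ['1', '2', '3'])
        self.assertEqual(self.pks, ['3', '1', '2'])

# Run with: python -m unittest <module name>
```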
Force-pushed from 4185697 to 9c1c6b2.
This looks pretty good to me. (Need to change the base branch, rebase, squash and adjust the commit message, but we can do all that.) The general strategy seems fine. (There's no other way of getting the PKs from the request.) I wondered about being stricter in using the submitted data (with a form to validate, say), but I think that may be overkill. The …
String PKs work with the …
I added the management form data to the test case, but I wasn't sure whether we needed to validate against it.
A test that uses CaptureQueriesContext might be able to verify the presence of the WHERE clause for filtering by pk.
@carltongibson I agree; I don't think we need to validate against the management form data. Would validating against it add sanity to someone's workflow? I'm also not sure about datetime examples choking the filter. I can't think of an appropriate time to use a datetime as your PK. I suspect it's an edge case, but I don't want to make assumptions and break someone's build.
I agree that it's an edge case and could probably be revisited later if it really breaks someone's setup. The reason I initially mentioned it is that I wanted to make sure that: …
This is where I'll ask Tim 🙂. Elsewhere I'd be inclined to say it's probably OK: ship it and then quickly handle any actual bugs that are revealed in a point release. I prefer that to trying to guess what the issues are in advance, where it's not really obvious. Can we do that here? (It's a bug, so it would go into 2.x, but if we targeted it for 2.1 there's the beta, so we could expect feedback...) @AdamDonna: Can you please change the base branch to …
Well, I'll defer to you and Tim, but I think the …
Ei! Sorry @charettes, I missed that this was still pending. (Going blind by this stage of the afternoon 😵) (My question was more for after that...)
@charettes I'm still a bit of a noob; could you elaborate on how to cause the … @carltongibson I'll get that sorted once the invalid primary key issue is resolved. Will this patch be backported for 1.11.x users as well?
It would be nice. This is a pretty devastating bug for … First, it comes down to … that confident enough. Hence my query above: it would be nice to ask people to actually test it.
Sure! Given the following scenario:

class Book(models.Model):
    uuid = models.UUIDField(editable=True, primary_key=True)
    ...

class BookAdmin(ModelAdmin):
    list_editable = ['uuid', ...]

In this case it's possible an invalid … You can test it out by trying to pass an invalid value for … Also, your branch is based off …
I think an argument can be made for a backport, considering this is a crashing bug for large datasets (OutOfMemory). (Disclosure: I work with Adam, who is pushing this patch through.)
@charettes I've added the test to check the type error that gets thrown.
Force-pushed from 83cba6b to c750d8a.
I pushed c750d8a with adjustments for testing the ValidationError with a bad UUID string.
To get the admin to actually let me use this, I had to make the PK editable as well as provide …
django/contrib/admin/options.py
            modified_objects = self.get_queryset(request).filter(pk__in=object_pks)
            try:
                # Force evaluate queryset to verify that pks are valid
                list(modified_objects)
The FormSet later needs modified_objects as a QuerySet rather than a list. So is there a nicer way of forcing evaluation here?
I guess we could try calling the primary key's to_python instead of hitting the database here.

def get_list_editable_queryset(self, request, prefix):
    object_pks = self.get_edited_object_pks(request, prefix)
    queryset = self.get_queryset(request)
    validate = queryset.model._meta.pk.to_python
    try:
        for pk in object_pks:
            validate(pk)
    except ValidationError:
        # Disable optimization
        return queryset
    return queryset.filter(pk__in=object_pks)
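The same validate-then-filter idea as a standalone sketch that runs without Django: every submitted pk is validated with a converter (uuid.UUID here, standing in for UUIDField.to_python, which raises ValidationError in Django rather than ValueError), and if any value is malformed the optimization is dropped and the unfiltered collection is returned. A list of dicts stands in for the queryset, and list_editable_rows is a hypothetical name.

```python
import uuid


def list_editable_rows(rows, object_pks, to_python=uuid.UUID):
    """Return only the rows whose pk was submitted, if every pk is valid.

    If any submitted pk fails validation, the optimization is disabled and
    all rows are returned, mirroring `return queryset` in the suggestion.
    """
    try:
        wanted = {to_python(pk) for pk in object_pks}
    except ValueError:
        # Disable the optimization rather than crash.
        return rows
    return [row for row in rows if row['pk'] in wanted]
```

Falling back to everything keeps the view no worse off than before the patch, at the cost of losing the memory saving for tampered POST data.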
django/contrib/admin/options.py
                # Force evaluate queryset to verify that pks are valid
                list(modified_objects)
            except ValidationError:
                raise ValidationError("Invalid Primary Key Provided")
Won't this also result in a 500?
Ah yes. It no doubt will. (That's too much DRF that is. 🙂) We need to handle this. 👍
@charettes' earlier point:
I guess we could try cleaning the submitted pks and disable the optimization if an exception is raised instead of crashing?
If we return self.get_queryset(request) here, we're no worse off than we are currently.
@AdamDonna Do you think you could put that in and adjust the test(s) for that?
I'll add that in and adjust the tests. I think this is a good compromise since it's no worse than it is currently, but most cases will be handled in a better way.
@charettes I think not hitting the DB was a great idea with …
Thanks for the hard work @AdamDonna and @carltongibson. All my concerns have been addressed 🎉 !
Great. Thanks @AdamDonna for your effort, and thank you @charettes for your review! I've marked this RFC. I'll wait for Tim's input on whether we can backport to 1.11 or not.
Hey @carltongibson and @charettes, thanks for your help with this change. How'd the RFC go?
I haven't reviewed in detail, but did you give any thought to my suggestion, "A test that uses CaptureQueriesContext might be able to verify the presence of the WHERE clause for filtering by pk."?
I think that's a good test to validate the usage in …
I'm thinking we could put this into tomorrow's 2.0.6 release and then 1.11.x in July.
django/contrib/admin/options.py
@@ -1583,6 +1584,24 @@ def add_view(self, request, form_url='', extra_context=None):
    def change_view(self, request, object_id, form_url='', extra_context=None):
        return self.changeform_view(request, object_id, form_url, extra_context)

    def get_edited_object_pks(self, request, prefix):
I would organize these methods elsewhere (instead of in the middle of the views) and prefix them with an underscore to indicate that they're helpers, not public APIs.
So they should be defined inside changelist_view?
django/contrib/admin/options.py
            for pk in object_pks:
                validate(pk)
        except ValidationError:
            # Disable optimization
I don't see the reasoning for disabling the optimization on invalid data. Wouldn't that allow malicious users (unlikely, I guess, if they already have access to the admin) to create memory issues?
The reason was that we’d end up with a 500 server error in this case, whereas now we get a validation error.
An alternative that we could use here is the old approach, ‘cl.result_list’, which we know is sensibly limited to just one page.
Either that, or since it's invalid POST data, bail out here and report the error to the user.
(That's a little bit more work though; I haven't yet thought what that looks like.)
@timgraham I've got one more test to add, which verifies that the list_editable view uses the WHERE clause, using CaptureQueriesContext as per your suggestion.
Will that be ready within an hour? If not, I don't mind adding it later.
I've got it done now, but the rebase is giving me grief because it's well ahead of my local copy (I only noticed when attempting to push).
If you want to paste the test here, I'll add it to the commit that I pushed.
The 'where' and 'in' checks are the minimum sufficient, IMO. The UUID check was a stretch goal: on the off chance that another query is executed before it and it's no longer the 4th query, it verifies that the WHERE clause is identified on the right query.
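A sketch of the kind of check described, written against the structure CaptureQueriesContext.captured_queries exposes (a list of dicts with an 'sql' key) so it can run without a database. The helper name and the sample SQL are made up for illustration, and scanning every captured query, rather than relying on a fixed position such as the 4th query, is a design alternative rather than what the test in this PR does.

```python
def find_pk_filtered_queries(captured_queries, table, pk_column):
    """Return the SQL of captured queries that filter `table` by `pk_column` IN (...).

    `captured_queries` mimics CaptureQueriesContext.captured_queries: a list
    of dicts, each with an 'sql' key. The check is a plain substring test,
    so it is deliberately loose.
    """
    matches = []
    for query in captured_queries:
        # Normalize case and drop identifier quoting so the check does not
        # depend on the backend's quoting style (double quotes or backticks).
        sql = query['sql'].upper().replace('"', '').replace('`', '')
        if ('FROM ' + table.upper() in sql
                and ' WHERE ' in sql
                and pk_column.upper() + ' IN (' in sql):
            matches.append(query['sql'])
    return matches


captured = [
    {'sql': 'SELECT COUNT(*) AS "__count" FROM "app_book"', 'time': '0.001'},
    {'sql': 'SELECT "app_book"."uuid", "app_book"."title" FROM "app_book" '
            'WHERE "app_book"."uuid" IN (\'a1\', \'b2\')',
     'time': '0.001'},
]
print(len(find_pk_filtered_queries(captured, 'app_book', 'uuid')))  # 1
```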
Thanks for the help, everyone 👍
@@ -11,6 +11,7 @@ answer newbie questions, and generally made Django that much better:
Abeer Upadhyay <ab.esquarer@gmail.com>
Abhishek Gautam <abhishekg1128@yahoo.com>
Adam Bogdał <adam@bogdal.pl>
Adam Donaghy
👌
New PR related to #9820 (comment)
Based on the bug https://code.djangoproject.com/ticket/28462