
Admin: optimize search filter construction #881

Closed
wants to merge 2 commits into from

3 participants

@acdha

Searching the database this way is inherently slow, but the previous implementation
called QuerySet.filter() once per whitespace-separated term in the query text,
adding an extra set of JOINs for each term. With the MySQL backend, this actually
causes database errors once a query exceeds MySQL's limit of 61 tables in a join.
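A rough model of the failure mode, offered as a sketch only. It assumes that each search term applied in its own filter() call adds `joins_per_term` fresh JOINs (separate filter() calls against a multi-valued relation, such as a reverse foreign key, are joined independently), while a single combined filter() joins each related table once. The function names and the `joins_per_term` parameter are hypothetical; real counts depend on the search_fields in use.

```python
# Hypothetical model of table counts; not Django code.
MYSQL_MAX_JOIN_TABLES = 61  # MySQL rejects joins over more than 61 tables


def tables_per_filter_call(n_terms: int, joins_per_term: int) -> int:
    """Old behaviour (assumed): one filter() per term, so JOINs grow with terms."""
    return 1 + n_terms * joins_per_term


def tables_single_filter(n_terms: int, joins_per_term: int) -> int:
    """Patched behaviour (assumed): one filter() for all terms, JOINs are shared."""
    return 1 + joins_per_term


def first_failing_term_count(joins_per_term: int) -> int:
    """Smallest term count at which the old approach exceeds MySQL's limit."""
    n = 1
    while tables_per_filter_call(n, joins_per_term) <= MYSQL_MAX_JOIN_TABLES:
        n += 1
    return n


print(tables_per_filter_call(3, 1), tables_single_filter(3, 1))
print(first_failing_term_count(1), first_failing_term_count(2))
```

With one multi-valued search field, the patched count of 2 tables lines up with the `count_active_tables()` expectation in the efficiency test added to this PR.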

This commit makes two changes: the first is simply to filter the queryset only
once; the second is to only process unique terms in the search field.

@charettes
Collaborator

I agree that the multiple JOINs are not desirable but I don't think that's the right approach.

Here you changed the query constraints from len(bits) ANDed groups of len(orm_lookups) ORed lookups into a single group of len(bits) * len(orm_lookups) ORed lookups.
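To make the difference concrete, here is a Django-free sketch that evaluates both constraint shapes in plain Python. It assumes case-insensitive substring matching (the admin's default `icontains` lookup) and flattens the data from `test_search_query_logic` into hypothetical dicts with `name` and `child_name` keys; it only illustrates the logic and is not how the ORM evaluates the query.

```python
def icontains(value: str, term: str) -> bool:
    # Stand-in for the icontains lookup: case-insensitive substring match.
    return term.lower() in value.lower()


def and_of_ors(row: dict, terms: list, fields: list) -> bool:
    """Documented behaviour: every term must match at least one field."""
    return all(any(icontains(row[f], t) for f in fields) for t in terms)


def flat_or(row: dict, terms: list, fields: list) -> bool:
    """First revision of this PR: any term matching any field is enough."""
    return any(icontains(row[f], t) for t in terms for f in fields)


# Hypothetical rows mirroring the Parent/Child data in test_search_query_logic.
rows = [
    {"name": "parent 1", "child_name": "foo bar baz"},
    {"name": "parent 2", "child_name": "bar baz qux"},
]
fields = ["name", "child_name"]
terms = "foo bar baz".split()

print([r["name"] for r in rows if and_of_ors(r, terms, fields)])  # ['parent 1']
print([r["name"] for r in rows if flat_or(r, terms, fields)])     # ['parent 1', 'parent 2']
```

The flat OR wrongly returns "parent 2" because "bar" alone matches its child, which is exactly the regression a test of the AND behaviour would catch.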

I think the correct fix would be to collect the OR expressions created in the loop and reduce them using operator.and_ to issue a single filter call.

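import operator
from functools import reduce

from django.db import models

or_queries = []
for bit in set(self.query.split()):
    # One ORed group per unique term (append, not extend: reduce returns a single Q)
    or_queries.append(reduce(operator.or_,
        (models.Q(**{lookup: bit}) for lookup in orm_lookups)
    ))
qs = qs.filter(reduce(operator.and_, or_queries))  # AND the groups, one filter() call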
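The shape of the combined expression can be inspected without a database. This sketch swaps `models.Q` for a hypothetical `FakeQ` class that only records how `|` and `&` nest, and uses `dict.fromkeys()` instead of `set()` to drop duplicate terms while keeping their order, so the output is deterministic. The lookup names are illustrative.

```python
import operator
from functools import reduce


class FakeQ:
    """Hypothetical stand-in for django.db.models.Q that only records structure."""

    def __init__(self, text: str):
        self.text = text

    def __or__(self, other: "FakeQ") -> "FakeQ":
        return FakeQ(f"({self.text} OR {other.text})")

    def __and__(self, other: "FakeQ") -> "FakeQ":
        return FakeQ(f"{self.text} AND {other.text}")


def build_search_filter(search_term: str, orm_lookups: list) -> FakeQ:
    or_queries = []
    # dict.fromkeys dedupes terms like set() but preserves their order.
    for bit in dict.fromkeys(search_term.split()):
        or_queries.append(reduce(operator.or_,
                                 (FakeQ(f"{lookup}={bit}") for lookup in orm_lookups)))
    # AND the per-term groups so a single filter() call would suffice.
    return reduce(operator.and_, or_queries)


lookups = ["name__icontains", "child__name__icontains"]
print(build_search_filter("foo bar foo", lookups).text)
```

The repeated "foo" contributes only one group, and each remaining term yields one parenthesised OR group joined by AND: len(bits) ANDed groups of len(orm_lookups) ORs.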

Since this will require testing and thus isn't a trivial patch, could you open a Trac ticket and link it to this PR?

Thanks for your report.

@acdha

I had forgotten that the search implementation is explicitly documented as an AND search. I'll update my diff. The existing tests also passed with my first (OR-based) change, so it would make sense to add a test confirming that search works as documented.

@charettes
Collaborator

This looks like the correct place to add tests.

@timgraham
Owner

Is there a trac ticket for this?

@timgraham
Owner

@acdha - are you interested in adding a test so we can commit this?

@acdha

@timgraham Sure, I'll update my branch and this pull request.

@acdha

Just on the off chance that this is useful for posterity, the change against 1.5 is here:

https://github.com/acdha/django/tree/admin-search-optimization-1.5

I'm adding tests after updating this branch against 1.6.

@charettes
Collaborator

@acdha you should rebase this branch against master instead. This won't be merged in 1.6.

@acdha

@charettes My mistake - I was actually working off of master but wrote 1.6 above.

acdha added some commits
@acdha acdha Admin: optimize search filter construction
Using the database this way is inherently slow but the previous implementation would call QuerySet.filter() once per whitespace-separated term in the query text, causing an extra set of JOINs for each term. With the MySQL backend, this actually causes database errors once you hit the server limit of 61.

This commit logically ANDs each of the query components together so
`queryset.filter()` can be called a single time.
d4b88f0
@acdha acdha Tests: confirm that admin changelist search terms are ANDed
Performance work on #881 inadvertently demonstrated that there wasn't a
test for the documented behaviour of admin changelist terms being
evaluated as logical ANDs
76ea565
@acdha

@charettes With the updated code, I've confirmed that the behaviour without the patch was still to generate a full JOIN for each term. With the patch, it generates one JOIN.

@charettes
Collaborator

@acdha tests are looking good! Could you please create a ticket in Trac and reference this PR from there?

@charettes charettes commented on the diff
django/contrib/admin/options.py
((6 lines not shown))
 for bit in search_term.split():
-    or_queries = [models.Q(**{orm_lookup: bit})
-                  for orm_lookup in orm_lookups]
-    queryset = queryset.filter(reduce(operator.or_, or_queries))
+    or_queries = (models.Q(**{orm_lookup: bit})
+                  for orm_lookup in orm_lookups)
+    query_parts.append(reduce(operator.or_, or_queries))
@charettes Collaborator

Import reduce from functools for py3 support.

@charettes Collaborator

Sorry, you're completely right!

@charettes
Collaborator

Merged in 698dd82, thanks!

@charettes charettes closed this
Commits on Oct 7, 2013
  1. @acdha: Admin: optimize search filter construction
  2. @acdha: Tests: confirm that admin changelist search terms are ANDed
Showing with 57 additions and 3 deletions.
  1. +7 −3 django/contrib/admin/options.py
  2. +50 −0 tests/admin_changelist/tests.py
10 django/contrib/admin/options.py
@@ -849,10 +849,14 @@ def construct_search(field_name):
 if search_fields and search_term:
     orm_lookups = [construct_search(str(search_field))
                    for search_field in search_fields]
+
+    query_parts = []
     for bit in search_term.split():
-        or_queries = [models.Q(**{orm_lookup: bit})
-                      for orm_lookup in orm_lookups]
-        queryset = queryset.filter(reduce(operator.or_, or_queries))
+        or_queries = (models.Q(**{orm_lookup: bit})
+                      for orm_lookup in orm_lookups)
+        query_parts.append(reduce(operator.or_, or_queries))
+    queryset = queryset.filter(reduce(operator.and_, query_parts))
+
     if not use_distinct:
         for search_spec in orm_lookups:
             if lookup_needs_distinct(self.opts, search_spec):
50 tests/admin_changelist/tests.py
@@ -643,6 +643,56 @@ def test_pagination_page_range(self):
             list(real_page_range),
         )
+    def test_search_query_efficiency(self):
+        """Ensure that search queries only add one ORM filter rather than one per term"""
+        new_parent = Parent.objects.create(name='parent')
+        for i in range(200):
+            Child.objects.create(name='foo bar baz qux quux corge %s' % i,
+                                 parent=new_parent)
+
+        m = ParentAdmin(Parent, admin.site)
+
+        request = self.factory.get('/parent/', data={'q': 'foo bar baz'})
+
+        cl = ChangeList(request, Parent, m.list_display, m.list_display_links,
+                        m.list_filter, m.date_hierarchy, m.search_fields,
+                        m.list_select_related, m.list_per_page,
+                        m.list_max_show_all, m.list_editable, m)
+
+        self.assertEqual(2, cl.queryset.query.count_active_tables(),
+                         "ChangeList search filters should not cause duplicate JOINs")
+
+    def test_search_query_logic(self):
+        """Changelist search terms should be ANDed"""
+
+        parent1 = Parent.objects.create(name='parent 1')
+        parent2 = Parent.objects.create(name='parent 2')
+
+        Child.objects.create(name='foo bar baz', parent=parent1)
+        Child.objects.create(name='bar baz qux', parent=parent2)
+
+        m = ParentAdmin(Parent, admin.site)
+
+        request = self.factory.get('/parent/', data={'q': 'foo bar baz'})
+
+        cl = ChangeList(request, Parent, m.list_display, m.list_display_links,
+                        m.list_filter, m.date_hierarchy, m.search_fields,
+                        m.list_select_related, m.list_per_page,
+                        m.list_max_show_all, m.list_editable, m)
+
+        cl.get_results(request)
+        self.assertListEqual(["parent 1"], list(cl.queryset.values_list("name", flat=True)))
+
+
+        request2 = self.factory.get('/parent/', data={'q': 'bar baz'})
+        cl2 = ChangeList(request2, Parent, m.list_display, m.list_display_links,
+                         m.list_filter, m.date_hierarchy, m.search_fields,
+                         m.list_select_related, m.list_per_page,
+                         m.list_max_show_all, m.list_editable, m)
+        cl2.get_results(request2)
+        self.assertListEqual(['parent 1', 'parent 2'],
+                             list(cl2.queryset.order_by("name").values_list("name", flat=True)))
+
class AdminLogNodeTestCase(TestCase):