Admin: optimize search filter construction #881

Closed
acdha wants to merge 2 commits

3 participants

@acdha

Using the database this way is inherently slow, but the previous implementation
would call QuerySet.filter() once per whitespace-separated term in the query
text, causing an extra set of JOINs for each term. With the MySQL backend,
this actually causes database errors once a query exceeds the server limit of
61 tables in a single join.

This commit makes two changes: the first is simply to filter the queryset only
once; the second is to only process unique terms in the search field.

@charettes
Django member

I agree that the multiple JOINs are not desirable but I don't think that's the right approach.

Here you changed the query constraints from a set of len(bits) ANDed groups of len(orm_lookups) ORs to a single set of (len(bits) * len(orm_lookups)) ORs.

I think the correct fix would be to collect the OR expressions created in the loop and reduce them using operator.and_ to issue a single filter call.

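import operator
from functools import reduce
from django.db import models

# Build one OR group per unique term, then AND the groups in a single filter() call.
or_queries = []
for bit in set(self.query.split()):
    # reduce() returns a single Q object, so append it rather than extend.
    or_queries.append(reduce(operator.or_,
        (models.Q(**{lookup: bit}) for lookup in orm_lookups)
    ))
qs = qs.filter(reduce(operator.and_, or_queries))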

Since this will require testing and thus is not a trivial patch, could you open a Trac ticket and link to this PR?

Thanks for your report.
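
The distinction drawn in this comment can be checked without a database. The sketch below is a minimal illustration, not Django code: `Pred` is a hypothetical stand-in for `models.Q` that supports `|` and `&`, and `contains`, `and_of_ors`, and `flat_ors` are names made up for this example. It compares "one OR group per term, groups ANDed" with "every term/lookup pair ORed together" on two rows.

```python
import operator
from functools import reduce


class Pred:
    """Hypothetical stand-in for models.Q: a predicate composable with | and &."""

    def __init__(self, func):
        self.func = func

    def __call__(self, row):
        return self.func(row)

    def __or__(self, other):
        return Pred(lambda row: self(row) or other(row))

    def __and__(self, other):
        return Pred(lambda row: self(row) and other(row))


def contains(field, term):
    """Case-insensitive substring match on one field of a dict row."""
    return Pred(lambda row: term.lower() in row[field].lower())


def and_of_ors(terms, fields):
    """One OR group per term (across all fields), with the groups ANDed."""
    groups = [
        reduce(operator.or_, (contains(f, t) for f in fields))
        for t in terms
    ]
    return reduce(operator.and_, groups)


def flat_ors(terms, fields):
    """Every (term, field) pair ORed together into one flat expression."""
    return reduce(operator.or_, (contains(f, t) for t in terms for f in fields))


rows = [
    {"name": "foo bar baz", "note": ""},
    {"name": "bar baz qux", "note": ""},
]
terms = "foo bar baz".split()
fields = ["name", "note"]

# Rows for which every term matches at least one field.
strict = [r["name"] for r in rows if and_of_ors(terms, fields)(r)]
# Rows for which any term matches any field.
loose = [r["name"] for r in rows if flat_ors(terms, fields)(r)]
```

Under these assumptions, `strict` keeps only the row containing all three terms, while `loose` also keeps the row that lacks "foo", which is why the flat OR form does not match the documented AND search.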

@acdha

I had forgotten that the search implementation is explicitly documented as an AND search. I'll update my diff. The tests actually passed with the old implementation so it would make sense to have a test to confirm that it works as documented.

@charettes
Django member

This looks like the correct place to add tests.

@timgraham
Django member

Is there a trac ticket for this?

@timgraham
Django member

@acdha - are you interested in adding a test so we can commit this?

@acdha

@timgraham Sure: I'll update my branch and update this ticket.

@acdha

Just on the off chance that this is useful for posterity, the change against 1.5 is here:

https://github.com/acdha/django/tree/admin-search-optimization-1.5

I'm adding tests after updating this branch against 1.6.

@charettes
Django member

@acdha you should rebase this branch against master instead. This won't be merged in 1.6.

@acdha

@charettes My mistake - I was actually working off of master but wrote 1.6 above

acdha added some commits
@acdha acdha Admin: optimize search filter construction
Using the database this way is inherently slow but the previous implementation would call QuerySet.filter() once per whitespace-separated term in the query text, causing an extra set of JOINs for each term. With the MySQL backend, this actually causes database errors once you hit the server limit of 61.

This commit logically ANDs each of the query components together so
`queryset.filter()` can be called a single time.
d4b88f0
@acdha acdha Tests: confirm that admin changelist search terms are ANDed
Performance work on #881 inadvertently demonstrated that there wasn't a
test for the documented behaviour of admin changelist terms being
evaluated as logical ANDs
76ea565
@acdha

@charettes With the updated code, I've confirmed that the behaviour without the patch was still to generate a full JOIN for each term. With the patch, it generates one JOIN.
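
To make the JOIN counts concrete, here is a rough sketch using the standard-library `sqlite3` module. The SQL is hand-written to approximate the two query shapes described above (one JOIN of the child table per term versus a single JOIN); it is not the SQL Django actually emits. The table layout and the helper names (`per_term_sql`, `single_join_sql`, `run`) are assumptions made for illustration, and the rows mirror the fixtures used in this PR's logic test.

```python
import sqlite3


def per_term_sql(terms):
    """Approximate shape of chained filter() calls: one child JOIN per term."""
    joins = " ".join(
        f"JOIN child c{i} ON c{i}.parent_id = p.id" for i in range(len(terms))
    )
    where = " AND ".join(f"c{i}.name LIKE ?" for i in range(len(terms)))
    return (
        f"SELECT DISTINCT p.name FROM parent p {joins} "
        f"WHERE {where} ORDER BY p.name"
    )


def single_join_sql(terms):
    """Approximate shape of a single filter() call: one child JOIN in total."""
    where = " AND ".join("c.name LIKE ?" for _ in terms)
    return (
        "SELECT DISTINCT p.name FROM parent p "
        "JOIN child c ON c.parent_id = p.id "
        f"WHERE {where} ORDER BY p.name"
    )


def run(conn, sql, terms):
    """Execute the query with one %term% parameter per search term."""
    params = [f"%{t}%" for t in terms]
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE child (
        id INTEGER PRIMARY KEY,
        name TEXT,
        parent_id INTEGER REFERENCES parent(id)
    );
    INSERT INTO parent VALUES (1, 'parent 1'), (2, 'parent 2');
    INSERT INTO child VALUES (1, 'foo bar baz', 1), (2, 'bar baz qux', 2);
""")

terms = ["foo", "bar", "baz"]
chained = per_term_sql(terms)
single = single_join_sql(terms)
chained_names = run(conn, chained, terms)
single_names = run(conn, single, terms)
```

In this sketch the per-term query contains as many JOINs as there are search terms, so a long enough search string would eventually reach a backend's join limit, while the single-JOIN query stays at one JOIN regardless of the number of terms.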

@charettes
Django member

@acdha tests are looking good! Could you please create a ticket in Trac and reference this PR from there?

@charettes charettes commented on the diff
django/contrib/admin/options.py
((6 lines not shown))
             for bit in search_term.split():
-                or_queries = [models.Q(**{orm_lookup: bit})
-                              for orm_lookup in orm_lookups]
-                queryset = queryset.filter(reduce(operator.or_, or_queries))
+                or_queries = (models.Q(**{orm_lookup: bit})
+                              for orm_lookup in orm_lookups)
+                query_parts.append(reduce(operator.or_, or_queries))
@charettes Django member

Import reduce from functools for py3 support.

@charettes Django member

Sorry, you're completely right!

@charettes
Django member

Merged in 698dd82, thanks!

@charettes charettes closed this
Commits on Oct 7, 2013
  1. @acdha

    Admin: optimize search filter construction

    acdha committed
    Using the database this way is inherently slow but the previous implementation would call QuerySet.filter() once per whitespace-separated term in the query text, causing an extra set of JOINs for each term. With the MySQL backend, this actually causes database errors once you hit the server limit of 61.
    
    This commit logically ANDs each of the query components together so
    `queryset.filter()` can be called a single time.
  2. @acdha

    Tests: confirm that admin changelist search terms are ANDed

    acdha committed
    Performance work on #881 inadvertently demonstrated that there wasn't a
    test for the documented behaviour of admin changelist terms being
    evaluated as logical ANDs
Showing with 57 additions and 3 deletions.
  1. +7 −3 django/contrib/admin/options.py
  2. +50 −0 tests/admin_changelist/tests.py
10 django/contrib/admin/options.py
@@ -849,10 +849,14 @@ def construct_search(field_name):
         if search_fields and search_term:
             orm_lookups = [construct_search(str(search_field))
                            for search_field in search_fields]
+
+            query_parts = []
             for bit in search_term.split():
-                or_queries = [models.Q(**{orm_lookup: bit})
-                              for orm_lookup in orm_lookups]
-                queryset = queryset.filter(reduce(operator.or_, or_queries))
+                or_queries = (models.Q(**{orm_lookup: bit})
+                              for orm_lookup in orm_lookups)
+                query_parts.append(reduce(operator.or_, or_queries))
+            queryset = queryset.filter(reduce(operator.and_, query_parts))
+
             if not use_distinct:
                 for search_spec in orm_lookups:
                     if lookup_needs_distinct(self.opts, search_spec):
50 tests/admin_changelist/tests.py
@@ -643,6 +643,56 @@ def test_pagination_page_range(self):
             list(real_page_range),
         )
+    def test_search_query_efficiency(self):
+        """Ensure that search queries only add one ORM filter rather than one per term"""
+        new_parent = Parent.objects.create(name='parent')
+        for i in range(200):
+            Child.objects.create(name='foo bar baz qux quux corge %s' % i,
+                                 parent=new_parent)
+
+        m = ParentAdmin(Parent, admin.site)
+
+        request = self.factory.get('/parent/', data={'q': 'foo bar baz'})
+
+        cl = ChangeList(request, Parent, m.list_display, m.list_display_links,
+                        m.list_filter, m.date_hierarchy, m.search_fields,
+                        m.list_select_related, m.list_per_page,
+                        m.list_max_show_all, m.list_editable, m)
+
+        self.assertEqual(2, cl.queryset.query.count_active_tables(),
+                         "ChangeList search filters should not cause duplicate JOINs")
+
+    def test_search_query_logic(self):
+        """Changelist search terms should be ANDed"""
+
+        parent1 = Parent.objects.create(name='parent 1')
+        parent2 = Parent.objects.create(name='parent 2')
+
+        Child.objects.create(name='foo bar baz', parent=parent1)
+        Child.objects.create(name='bar baz qux', parent=parent2)
+
+        m = ParentAdmin(Parent, admin.site)
+
+        request = self.factory.get('/parent/', data={'q': 'foo bar baz'})
+
+        cl = ChangeList(request, Parent, m.list_display, m.list_display_links,
+                        m.list_filter, m.date_hierarchy, m.search_fields,
+                        m.list_select_related, m.list_per_page,
+                        m.list_max_show_all, m.list_editable, m)
+
+        cl.get_results(request)
+        self.assertListEqual(["parent 1"], list(cl.queryset.values_list("name", flat=True)))
+
+
+        request2 = self.factory.get('/parent/', data={'q': 'bar baz'})
+        cl2 = ChangeList(request2, Parent, m.list_display, m.list_display_links,
+                         m.list_filter, m.date_hierarchy, m.search_fields,
+                         m.list_select_related, m.list_per_page,
+                         m.list_max_show_all, m.list_editable, m)
+        cl2.get_results(request2)
+        self.assertListEqual(['parent 1', 'parent 2'],
+                             list(cl2.queryset.order_by("name").values_list("name", flat=True)))
+
class AdminLogNodeTestCase(TestCase):