New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed #16731 -- Made pattern lookups work properly with F() expressions #3284

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
6 participants
@tchaumeny
Contributor

tchaumeny commented Sep 27, 2014

@akaariai

View changes

Show outdated Hide outdated django/db/backends/mysql/base.py Outdated
@kevinetienne

View changes

Show outdated Hide outdated tests/expressions/tests.py Outdated
@mjtamlyn

This comment has been minimized.

Show comment
Hide comment
@mjtamlyn

mjtamlyn Oct 5, 2014

Member

Is there a significant performance difference with these new queries? I guess as they are applied to the lookup rather than the value of each row it should be ok, otherwise can we perhaps use a "simple" query when there is no _, %, \, * etc, and the more complex one when there is?

Member

mjtamlyn commented Oct 5, 2014

Is there a significant performance difference with these new queries? I guess as they are applied to the lookup rather than the value of each row it should be ok, otherwise can we perhaps use a "simple" query when there is no _, %, \, * etc, and the more complex one when there is?

@tchaumeny

This comment has been minimized.

Show comment
Hide comment
@tchaumeny

tchaumeny Oct 5, 2014

Contributor

@mjtamlyn Actually those pattern_ops are not used when a raw Python value is supplied. They are only used when the right-hand side of the lookup has a as_sql method, in which case it must be a F() expression (or a transformation of the right-hand side following my other PR) — see https://github.com/django/django/pull/3284/files#diff-b6b218ec29b7fb6a7d89868a94bfc73eR247

In that case, we have no other option than escaping in the query itself as the rhs belongs to the database.

As explained in https://code.djangoproject.com/ticket/16731, those kind of lookups with expressions are currently either totally broken or leading to unexpected results depending on the lookup and the backend.

You can try that on master's version :
If you run

user = User.objects.create(first_name="John", last_name="%")
User.objects.filter(first_name__contains=F('last_name'))

you will see that user appears in the results (on any backend).

If you run User.objects.filter(first_name__startswith=F('last_name') with MySQL backend, you will get an AttributeError: 'DatabaseWrapper' object has no attribute 'pattern_ops'.

Contributor

tchaumeny commented Oct 5, 2014

@mjtamlyn Actually those pattern_ops are not used when a raw Python value is supplied. They are only used when the right-hand side of the lookup has a as_sql method, in which case it must be a F() expression (or a transformation of the right-hand side following my other PR) — see https://github.com/django/django/pull/3284/files#diff-b6b218ec29b7fb6a7d89868a94bfc73eR247

In that case, we have no other option than escaping in the query itself as the rhs belongs to the database.

As explained in https://code.djangoproject.com/ticket/16731, those kind of lookups with expressions are currently either totally broken or leading to unexpected results depending on the lookup and the backend.

You can try that on master's version :
If you run

user = User.objects.create(first_name="John", last_name="%")
User.objects.filter(first_name__contains=F('last_name'))

you will see that user appears in the results (on any backend).

If you run User.objects.filter(first_name__startswith=F('last_name') with MySQL backend, you will get an AttributeError: 'DatabaseWrapper' object has no attribute 'pattern_ops'.

@tchaumeny

This comment has been minimized.

Show comment
Hide comment
@tchaumeny

tchaumeny Oct 6, 2014

Contributor

Regarding the performance, there is an impact. I run some test on my laptop with a user database of ~140K entries :

dbase=# \timing
Timing is on.

dbase=# SELECT COUNT(*) FROM auth_user;
 count
--------
 139936
(1 row)

Time: 83,630 ms

dbase=# SELECT COUNT(*) FROM auth_user WHERE first_name LIKE '%' || REGEXP_REPLACE(last_name, '(\\|%|_)', '\\\1', 'g') || '%';
 count
-------
 19199
(1 row)

Time: 466,214 ms
dbase=# SELECT COUNT(*) FROM auth_user WHERE first_name LIKE '%' || REPLACE(REPLACE(REPLACE(last_name, '\\', '\\\\'), '%', '\\%'), '_', '\_') || '%';
 count
-------
 19199
(1 row)

Time: 402,391 ms

dbase=# SELECT COUNT(*) FROM auth_user WHERE first_name LIKE '%' || last_name || '%';
 count
-------
 19199
(1 row)

Time: 255,128 ms

In my opinion, the choices here are :

  • Do not support pattern lookups on a F() expressions
  • Support them without escaping the wildcards, wich might lead to inconsistent results
  • Support them and escape the wildcards (with an approximate +50% execution time for pattern lookups using F() expressions)

See also http://stackoverflow.com/questions/10153440/how-to-escape-string-while-matching-pattern-in-postgresql#comment13036581_10155313 which discusses this precise problem.

Contributor

tchaumeny commented Oct 6, 2014

Regarding the performance, there is an impact. I run some test on my laptop with a user database of ~140K entries :

dbase=# \timing
Timing is on.

dbase=# SELECT COUNT(*) FROM auth_user;
 count
--------
 139936
(1 row)

Time: 83,630 ms

dbase=# SELECT COUNT(*) FROM auth_user WHERE first_name LIKE '%' || REGEXP_REPLACE(last_name, '(\\|%|_)', '\\\1', 'g') || '%';
 count
-------
 19199
(1 row)

Time: 466,214 ms
dbase=# SELECT COUNT(*) FROM auth_user WHERE first_name LIKE '%' || REPLACE(REPLACE(REPLACE(last_name, '\\', '\\\\'), '%', '\\%'), '_', '\_') || '%';
 count
-------
 19199
(1 row)

Time: 402,391 ms

dbase=# SELECT COUNT(*) FROM auth_user WHERE first_name LIKE '%' || last_name || '%';
 count
-------
 19199
(1 row)

Time: 255,128 ms

In my opinion, the choices here are :

  • Do not support pattern lookups on a F() expressions
  • Support them without escaping the wildcards, wich might lead to inconsistent results
  • Support them and escape the wildcards (with an approximate +50% execution time for pattern lookups using F() expressions)

See also http://stackoverflow.com/questions/10153440/how-to-escape-string-while-matching-pattern-in-postgresql#comment13036581_10155313 which discusses this precise problem.

@mjtamlyn

This comment has been minimized.

Show comment
Hide comment
@mjtamlyn

mjtamlyn Oct 6, 2014

Member

Ok, I had obviously slightly misunderstood the code path here. Assuming that there is no nicer way to make sure we get the correct results, then I'm ok with this approach.

It may be worth a note somewhere in the documentation that the queries generated by this pattern are very complex (and why) and as a result are not particularly fast.

Member

mjtamlyn commented Oct 6, 2014

Ok, I had obviously slightly misunderstood the code path here. Assuming that there is no nicer way to make sure we get the correct results, then I'm ok with this approach.

It may be worth a note somewhere in the documentation that the queries generated by this pattern are very complex (and why) and as a result are not particularly fast.

@tchaumeny

This comment has been minimized.

Show comment
Hide comment
@tchaumeny

tchaumeny Oct 7, 2014

Contributor

I mentioned that change in the 1.8 release note.

Contributor

tchaumeny commented Oct 7, 2014

I mentioned that change in the 1.8 release note.

@timgraham

View changes

Show outdated Hide outdated tests/expressions/tests.py Outdated
@timgraham

View changes

Show outdated Hide outdated django/db/backends/mysql/base.py Outdated
@timgraham

View changes

Show outdated Hide outdated docs/releases/1.8.txt Outdated
@timgraham

View changes

Show outdated Hide outdated docs/releases/1.8.txt Outdated
@akaariai

This comment has been minimized.

Show comment
Hide comment
@akaariai

akaariai Nov 12, 2014

Member

It might be better to go with a wrapper expression instead of the repeated REPLACE(REPLACE(REPLACE())) calls in the pattern_ops. Basically the idea is to create a wrapper expression to the rhs value, then use the improved pattern_ops against that. This should clean up the pattern_ops dictionaries significantly.

Member

akaariai commented Nov 12, 2014

It might be better to go with a wrapper expression instead of the repeated REPLACE(REPLACE(REPLACE())) calls in the pattern_ops. Basically the idea is to create a wrapper expression to the rhs value, then use the improved pattern_ops against that. This should clean up the pattern_ops dictionaries significantly.

@tchaumeny

This comment has been minimized.

Show comment
Hide comment
@tchaumeny

tchaumeny Nov 12, 2014

Contributor

@akaariai Right. I updated the PR to do something like what you describe, it's much simpler indeed. Let me know what you think.

Contributor

tchaumeny commented Nov 12, 2014

@akaariai Right. I updated the PR to do something like what you describe, it's much simpler indeed. Let me know what you think.

@collinanderson

This comment has been minimized.

Show comment
Hide comment
@collinanderson

collinanderson Nov 13, 2014

Contributor

buildbot, retest this please

Contributor

collinanderson commented Nov 13, 2014

buildbot, retest this please

# escaped on database side.
#
# Note: we use str.format() here for readability as '%' is used as a wildcard for
# the LIKE operator.

This comment has been minimized.

@akaariai

akaariai Nov 13, 2014

Member

Add a small comment here explaining why we have two different versions of the ops and where the real pattern_ops is initialized.

@akaariai

akaariai Nov 13, 2014

Member

Add a small comment here explaining why we have two different versions of the ops and where the real pattern_ops is initialized.

This comment has been minimized.

@tchaumeny

tchaumeny Nov 13, 2014

Contributor

It's already explained here

# Ticket #14149: Check whether our LIKE implementation will
, I just reused the existing mechanism to initialize pattern_ops too.

@tchaumeny

tchaumeny Nov 13, 2014

Contributor

It's already explained here

# Ticket #14149: Check whether our LIKE implementation will
, I just reused the existing mechanism to initialize pattern_ops too.

@akaariai

View changes

Show outdated Hide outdated django/db/backends/oracle/base.py Outdated
@collinanderson

This comment has been minimized.

Show comment
Hide comment
@collinanderson

collinanderson Nov 13, 2014

Contributor

buildbot, retest this please

Contributor

collinanderson commented Nov 13, 2014

buildbot, retest this please

@akaariai

This comment has been minimized.

Show comment
Hide comment
@akaariai

akaariai Nov 28, 2014

Member

I'll take care of committing this one.

Member

akaariai commented Nov 28, 2014

I'll take care of committing this one.

@akaariai

This comment has been minimized.

Show comment
Hide comment
@akaariai

akaariai Nov 28, 2014

Member

Committed manually in 6b5d827

Member

akaariai commented Nov 28, 2014

Committed manually in 6b5d827

@akaariai akaariai closed this Nov 28, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment