Fixed #16731 -- Made pattern lookups work properly with F() expressions #3284
referenced this pull request
Sep 30, 2014
Is there a significant performance difference with these new queries? I guess as they are applied to the lookup rather than the value of each row it should be ok, otherwise can we perhaps use a "simple" query when there is no
@mjtamlyn Actually those
In that case, we have no other option than escaping in the query itself as the rhs belongs to the database.
As explained in https://code.djangoproject.com/ticket/16731, these kinds of lookups with expressions are currently either totally broken or lead to unexpected results, depending on the lookup and the backend.
You can try this on master:
user = User.objects.create(first_name="John", last_name="%")
User.objects.filter(first_name__contains=F('last_name'))
you will see that
If you run
Regarding performance, there is an impact. I ran some tests on my laptop with a user database of ~140K entries:
dbase=# \timing
Timing is on.
dbase=# SELECT COUNT(*) FROM auth_user;
 count
--------
 139936
(1 row)

Time: 83,630 ms
dbase=# SELECT COUNT(*) FROM auth_user WHERE first_name LIKE '%' || REGEXP_REPLACE(last_name, '(\\|%|_)', '\\\1', 'g') || '%';
 count
-------
 19199
(1 row)

Time: 466,214 ms
dbase=# SELECT COUNT(*) FROM auth_user WHERE first_name LIKE '%' || REPLACE(REPLACE(REPLACE(last_name, '\\', '\\\\'), '%', '\\%'), '_', '\_') || '%';
 count
-------
 19199
(1 row)

Time: 402,391 ms
dbase=# SELECT COUNT(*) FROM auth_user WHERE first_name LIKE '%' || last_name || '%';
 count
-------
 19199
(1 row)

Time: 255,128 ms
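To make the difference between the unescaped and escaped forms concrete, here is a minimal sketch using Python's built-in `sqlite3` module rather than PostgreSQL. The table contents are made up for illustration, and the explicit `ESCAPE '\'` clause is an assumption needed because SQLite has no default LIKE escape character; it is not taken from the patch.

```python
import sqlite3

# In-memory database with a few illustrative rows (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE auth_user (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO auth_user VALUES (?, ?)",
    [("John", "%"), ("Joanna", "ann"), ("Bob", "_"), ("100%", "%")],
)

# The column value is used directly as a pattern, so '%' and '_' in
# last_name act as wildcards.
UNESCAPED = (
    "SELECT COUNT(*) FROM auth_user "
    "WHERE first_name LIKE '%' || last_name || '%'"
)

# The column value is escaped in SQL first, so '%' and '_' only match
# themselves. SQLite string literals do not process backslashes, so '\'
# below is a single backslash character.
ESCAPED = r"""
SELECT COUNT(*) FROM auth_user
WHERE first_name LIKE
    ('%' || REPLACE(REPLACE(REPLACE(last_name, '\', '\\'), '%', '\%'), '_', '\_') || '%')
    ESCAPE '\'
"""

unescaped_count = conn.execute(UNESCAPED).fetchone()[0]
escaped_count = conn.execute(ESCAPED).fetchone()[0]

print(unescaped_count)  # 4: every row matches because of the wildcards
print(escaped_count)    # 2: only "Joanna"/"ann" and "100%"/"%" really contain the value
```

The unescaped query silently over-matches, which is the "unexpected results" case; the escaped one returns only rows whose first_name really contains last_name, at the cost of the extra REPLACE calls per row.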
In my opinion, the choices here are:
See also http://stackoverflow.com/questions/10153440/how-to-escape-string-while-matching-pattern-in-postgresql#comment13036581_10155313 which discusses this precise problem.
Ok, I had obviously slightly misunderstood the code path here. Assuming that there is no nicer way to make sure we get the correct results, then I'm ok with this approach.
It may be worth a note somewhere in the documentation that the queries generated by this pattern are very complex (and why) and as a result are not particularly fast.
It might be better to go with a wrapper expression instead of repeating the REPLACE(REPLACE(REPLACE())) calls in every pattern_ops entry. The idea is to wrap the rhs value in an escaping expression once, then apply simplified pattern_ops templates to the wrapped value. This should clean up the pattern_ops dictionaries significantly.
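As a rough sketch of that suggestion, in plain Python without Django: the names `PATTERN_ESC`, `PATTERN_OPS` and `pattern_lookup_sql` are hypothetical and not taken from the patch. It assumes a backend where string literals do not process backslash escapes and where backslash is the default LIKE escape character.

```python
# Hypothetical wrapper: escapes LIKE metacharacters in an arbitrary SQL
# expression. Written once instead of being repeated in every template.
PATTERN_ESC = r"REPLACE(REPLACE(REPLACE({}, '\', '\\'), '%', '\%'), '_', '\_')"

# Hypothetical per-lookup templates; each one only describes where the
# wildcards go around the (already escaped) right-hand side.
PATTERN_OPS = {
    "contains": "LIKE '%' || {} || '%'",
    "startswith": "LIKE {} || '%'",
    "endswith": "LIKE '%' || {}",
}


def pattern_lookup_sql(lhs, lookup, rhs):
    """Build the SQL for `lhs <lookup> rhs`, where rhs is a SQL expression."""
    escaped_rhs = PATTERN_ESC.format(rhs)
    return "{} {}".format(lhs, PATTERN_OPS[lookup].format(escaped_rhs))


print(pattern_lookup_sql("first_name", "contains", "last_name"))
```

With this split, adding a new pattern lookup only means adding one short template, and a backend that needs a different escaping function only has to override the wrapper.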