Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed #29251 -- Added CharField converter to the MySQL backend. #9816

Closed
wants to merge 1 commit into from

Conversation

felixxm
Copy link
Member

@felixxm felixxm commented Mar 22, 2018

https://code.djangoproject.com/ticket/29251

I'm not convinced that this is the best solution, but I haven't found a better way to solve this issue.

def convert_charfield_value(self, value, expression, connection):
if isinstance(value, bytes):
return force_text(value)
return value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this does follow other existing methods, but:

return force_text(value) if isinstance(value, bytes) else value

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I updated this line.

Copy link
Member

@timgraham timgraham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is force_text() needed as opposed to value.decode()? If so, can you add a test to demonstrate?

def test_funct_string_combination(self):
Author.objects.create(name='Rhonda', alias='john_smith')
authors = Author.objects.annotate(filled=LPad('name', Length('alias'), output_field=CharField()))
self.assertQuerysetEqual(authors, [' Rhonda'], lambda a: a.filled, ordered=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why ordered=False with a result of length 1.

@@ -671,3 +671,8 @@ def test_function_as_filter(self):
Author.objects.exclude(alias=Upper(V('smithj'))),
['Rhonda'], lambda x: x.name
)

def test_funct_string_combination(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

funct -> func -- I think? or "function"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo 🤦‍♂️ Thanks.

@felixxm
Copy link
Member Author

felixxm commented Mar 23, 2018

@timgraham Thanks for the review. You're right decode() is enough.

@ngnpope
Copy link
Member

ngnpope commented Mar 23, 2018

I note that convert_textfield_value() below is using force_text().
Perhaps that can be resolved in a follow up commit if .decode() is enough there too?

@timgraham
Copy link
Member

Yes, there's a ticket to audit force_text() usage.

@timgraham
Copy link
Member

Then non-ASCII character case doesn't work on Oracle (leading spaces are missing).

@felixxm
Copy link
Member Author

felixxm commented Mar 23, 2018

LPad/Rpad doesn't work properly with multibyte characters sets on Oracle. I found the third issue in a row 😄

@felixxm
Copy link
Member Author

felixxm commented Mar 23, 2018

@timgraham I checked and IMO there doesn't exist a good workaround for LPad/RPad with multibyte characters on Oracle. None of them works in all cases. According Oracle documentation:

... In most character sets, this is also the number of characters in the return value. However, in some multibyte character sets, the display length of a character string can differ from the number of characters in the string.

I think it's fine to add a note in documentation about this restriction.

@timgraham
Copy link
Member

Sure, if you think is deserves a mention. We can't document everything about each database. :-)

@felixxm
Copy link
Member Author

felixxm commented Mar 23, 2018

You're right, it's impossible to describe all Oracle's caveats 😉 I've left a note in the ticket 28643, this should be enough. I updated tests, it works on Oracle.

@claudep
Copy link
Member

claudep commented Mar 23, 2018

Would it be worth benchmarking the speed overhead of this addition?

mysqlclient returns bytes instead of string in some edge cases.

Thanks Tim Graham and Nick Pope for reviews.
@felixxm
Copy link
Member Author

felixxm commented Mar 27, 2018

@claudep I wrote simple benchmark with djangobench:

Running benchmarks: mysql_char_converter
Control: Django 2.1.dev20180326173423 (in git branch master)
Experiment: Django 2.1.dev20180327171917 (in git branch issue-29251)

Running 'mysql_char_converter' benchmark ...
Min: 0.038486 -> 0.046070: 1.1970x slower
Avg: 0.043529 -> 0.051424: 1.1814x slower
Significant (t=-10.273530)
Stddev: 0.00555 -> 0.00532: 1.0428x smaller (N = 100)

Control: Django 2.1.dev20180326173423 (in git branch master)
Experiment: Django 2.1.dev20180327171917 (in git branch issue-29251)

Running 'mysql_char_converter' benchmark ...
Min: 0.034216 -> 0.042022: 1.2281x slower
Avg: 0.038185 -> 0.045879: 1.2015x slower
Significant (t=-14.931398)
Stddev: 0.00528 -> 0.00502: 1.0528x smaller (N = 200)

@claudep
Copy link
Member

claudep commented Mar 27, 2018

Oh thanks! So does that mean we are ~20% slower with the patch? (sorry, I'm not so great at stats)

@felixxm
Copy link
Member Author

felixxm commented Mar 28, 2018

I think so, but even if convert_charfield_value() doesn't check or change anything I still get about ~11% , i.e. with

def convert_charfield_value(self, value, expression, connection):
    return value

the result is

Running benchmarks: mysql_char_converter
Control: Django 2.1.dev20180326173423 (in git branch master)
Experiment: Django 2.1.dev20180328202753 (in git branch issue-29251)

Running 'mysql_char_converter' benchmark ...
Min: 0.075977 -> 0.085019: 1.1190x slower
Avg: 0.086817 -> 0.096318: 1.1094x slower
Significant (t=-8.154523)
Stddev: 0.00763 -> 0.00881: 1.1550x larger (N = 100)

Control: Django 2.1.dev20180326173423 (in git branch master)
Experiment: Django 2.1.dev20180328202753 (in git branch issue-29251)

Running 'mysql_char_converter' benchmark ...
Min: 0.080404 -> 0.091563: 1.1388x slower
Avg: 0.091934 -> 0.102306: 1.1128x slower
Significant (t=-12.184438)
Stddev: 0.00919 -> 0.00777: 1.1821x smaller (N = 200)

@claudep
Copy link
Member

claudep commented Mar 29, 2018

That's probably the overhead cost of adding converters (function calls et al). Then I think we should find another way of solving this issue. Slowing the whole stack just to fix this edge case is not very sound IMHO.

@felixxm
Copy link
Member Author

felixxm commented Mar 29, 2018

I've prepared alternative PR #9835 with conversion only in affected database functions.

@felixxm
Copy link
Member Author

felixxm commented Mar 31, 2018

Closing due to the alternative approach in #9835.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants