-
Notifications
You must be signed in to change notification settings - Fork 207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UTF-8 Conversion Ignores Connection Encoding Setting #24
Comments
Transfered from Stackoverflow. TheSquad asked:I Have wrote a question which got a right answer here[1] about emysql encoding. The answer pinpoint another question... I'm trying to store iPhone emojis into a database...
Everything works fine...
I'm retrieving Mojibake.
Unfortunately, I cannot use utf8 because of the previous software that used the database and stored emoji's that way, If I do use utf8, it will work with the new system, but not with rows inserted by the old one. I really would really like to use prepared statement, that would prevent SQL injection effectively. barsju answered:Just convert you table to UTF-8: TheSquad replied:Users table is already in utf8 character set,but that's not really the issue here... My question is about why the prepared statement is using utf8 since I explicitily asked emysql to connect to the database with latin1 characters set. Barsu again:Well if I understand erlang correctly, strings are ISO8859-1 (latin) encoded while binaries are UTF-8. So: "Hi" is latin but <<"Hi">> is utf-8. What happens if you instruct emysql to connect using utf-8? TheSquad:If I use UTF8 for connecting to database, I can Add/retrieve correctly with the new software, the only thing is I can't retrieve correctly Data already on the database (I get Mojibake Data) which were inserted with the previous software. And unfortunately I need to retrieve old Data correctly. Actually, I'm wondering why it works correctly with non-prepared statements, although I use emysql_util:encode() with it in order to avoid SQL Injections, BUT with prepared statement, It does not insert correctly in latin1 [1] http://stackoverflow.com/questions/9668647/erlang-emysql-iphone-emoji-encoding-issue |
I believe it's an error in Emysql and I think I fixed it. Still working out the unit tests so it all makes sense. I'll let you know when it's posted to github. I opened an issue for this: #24 Essentially, you are tricking the driver and the database because you open the connection with latin-1 but the database is utf-8. Then you trip over the automatic conversion. Still, I think you are right that the driver should respect that you set the connection to latin-1 and not do the magic of automatic conversion to utf-8. If you read issue #14 at Eonblast/Emysql at github you'll find I always suspected automatic conversion was a bad idea. However, just from the fact that the unit tests for the conversions are now blowing up by the factor of four (and pose some rather uninteresting but mind boggling fringe issues I can't get my head around) I think tricking the database the way you do is likewise a bad idea. If you can, you should clean this up rather than rely on the mechanics in-between to hold. There are multiple levels in MySQL where conversions occur. As you know you can set the connection, the database, also the table to a character set. It's a great way to produce bugs. Can you describe why you could not? Because you have no control and must act blind to encoding? I'd like to know if there is a real case where you can't live without this hack. Regardless, your complaint about the setting of the connection to latin-1 probably showed the way to eliminate all or most of the guessing in the character conversions in Emysql. That's very much appreciated and I hope I'll have a solution for you later today. Henning |
No protests, closing. |
Automatic conversion of parameters sent to prepared statements to utf8 should not happen when the connection is set to latin1.
The text was updated successfully, but these errors were encountered: