Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UTF-8 Conversion Ignores Connection Encoding Setting #24

Closed
hdiedrich opened this issue Mar 22, 2012 · 4 comments
Closed

UTF-8 Conversion Ignores Connection Encoding Setting #24

hdiedrich opened this issue Mar 22, 2012 · 4 comments
Assignees

Comments

@hdiedrich
Copy link
Collaborator

Automatic conversion of parameters sent to prepared statements to utf8 should not happen when the connection is set to latin1.

@hdiedrich
Copy link
Collaborator Author

Transfered from Stackoverflow. TheSquad asked:

http://stackoverflow.com/questions/9686979/erlang-emysql-encoding-difference-between-prepared-and-regular-query

I Have wrote a question which got a right answer here[1] about emysql encoding. The answer pinpoint another question...

I'm trying to store iPhone emojis into a database...
When I do :

Query = io_lib:format("UPDATE Users SET c=\"~s\" WHERE id=~B", [C, Id]),
emysql:execute(mydb, Query).

Everything works fine...
But with:

emysql:prepare(update_c, <<"UPDATE Users SET c=? WHERE id=?">>),
emysql:execute(mydb, update_c, [C, Id]).

I'm retrieving Mojibake.
I'm connecting with :

emysql:add_pool(my_db, 3, "login", "password", "db.mydomain.com", 3306, "MyTable", latin1)

Unfortunately, I cannot use utf8 because of the previous software that used the database and stored emoji's that way, If I do use utf8, it will work with the new system, but not with rows inserted by the old one.

I really would really like to use prepared statement, that would prevent SQL injection effectively.

barsju answered:

Just convert you table to UTF-8:
ALTER TABLE Users CONVERT TO CHARACTER SET utf8;
Then you can use utf-8 with new data and the old will have been converted to UTF-8 aswell.

TheSquad replied:

Users table is already in utf8 character set,but that's not really the issue here... My question is about why the prepared statement is using utf8 since I explicitily asked emysql to connect to the database with latin1 characters set.

Barsu again:

Well if I understand erlang correctly, strings are ISO8859-1 (latin) encoded while binaries are UTF-8. So: "Hi" is latin but <<"Hi">> is utf-8. What happens if you instruct emysql to connect using utf-8?

TheSquad:

If I use UTF8 for connecting to database, I can Add/retrieve correctly with the new software, the only thing is I can't retrieve correctly Data already on the database (I get Mojibake Data) which were inserted with the previous software. And unfortunately I need to retrieve old Data correctly.

Actually, I'm wondering why it works correctly with non-prepared statements, although I use emysql_util:encode() with it in order to avoid SQL Injections, BUT with prepared statement, It does not insert correctly in latin1

[1] http://stackoverflow.com/questions/9668647/erlang-emysql-iphone-emoji-encoding-issue

@hdiedrich
Copy link
Collaborator Author

I believe it's an error in Emysql and I think I fixed it. Still working out the unit tests so it all makes sense. I'll let you know when it's posted to github.

I opened an issue for this: #24

Essentially, you are tricking the driver and the database because you open the connection with latin-1 but the database is utf-8. Then you trip over the automatic conversion.

Still, I think you are right that the driver should respect that you set the connection to latin-1 and not do the magic of automatic conversion to utf-8. If you read issue #14 at Eonblast/Emysql at github you'll find I always suspected automatic conversion was a bad idea.

However, just from the fact that the unit tests for the conversions are now blowing up by the factor of four (and pose some rather uninteresting but mind boggling fringe issues I can't get my head around) I think tricking the database the way you do is likewise a bad idea. If you can, you should clean this up rather than rely on the mechanics in-between to hold. There are multiple levels in MySQL where conversions occur. As you know you can set the connection, the database, also the table to a character set. It's a great way to produce bugs. Can you describe why you could not? Because you have no control and must act blind to encoding? I'd like to know if there is a real case where you can't live without this hack.

Regardless, your complaint about the setting of the connection to latin-1 probably showed the way to eliminate all or most of the guessing in the character conversions in Emysql. That's very much appreciated and I hope I'll have a solution for you later today.

Henning

@ghost ghost assigned hdiedrich Mar 26, 2012
@hdiedrich
Copy link
Collaborator Author

Should be fixed in 0.2.13 (253b7f9). Please let me know your feedback.

253b7f9

@hdiedrich
Copy link
Collaborator Author

No protests, closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant