Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DB Upgrade script 2009100701.sql corrupts UTF8/multibyte characters in database in some cases (Trac #1928) #1928

Closed
elgg-gitbot opened this issue Feb 16, 2013 · 20 comments

Comments

@elgg-gitbot
Copy link

@elgg-gitbot elgg-gitbot commented Feb 16, 2013

Original ticket http://trac.elgg.org/ticket/1928 on 40171649-06-10 by cash, assigned to unknown.

Elgg version: 1.7

I'm creating a ticket in case other users have this problem.

To reproduce:

  • clean 1.6.1 install (happened to be on Ubuntu 9.10 Server with all defaults settings)
  • create a blog post with the body: 조선말
  • It should display fine in 1.6.1
  • Run upgrade script to 1.7
  • The blog post should now be: ???
  • Checking the database in phpmyadmin gives the same result in objects table: ???
@elgg-gitbot
Copy link
Author

@elgg-gitbot elgg-gitbot commented Feb 16, 2013

brettp wrote on 40171732-08-16

I have tested this with Japanese, Korean, and Russian on 4 different servers (including Ubuntu 9.10 with defaults) and cannot duplicate.

Cash - You said 9.10--Was this a fully updated copy or a fresh install from their ISO? Also, did you just use

apt-get install libapache2-mod-php5 php5-mysql mysql-server 

to install or specify specific versions (mysql-server-5.0)?

@elgg-gitbot
Copy link
Author

@elgg-gitbot elgg-gitbot commented Feb 16, 2013

cash wrote on 40171853-08-23

Two people reported it in the community site so far.

I did an install from the ISO and then did a dist-upgrade. I did not specify a specific mysql version for the install. I'll grab all these details tonight and add them to the ticket.

@elgg-gitbot
Copy link
Author

@elgg-gitbot elgg-gitbot commented Feb 16, 2013

trac user kako wrote on 40172443-05-16

This occurs on UTF-8 format, on Spanish versión loose all special character (ñ, í, ó) and cannot login.

I've just install on debian etch server with mysql-server-5.0.

Thx.

@elgg-gitbot
Copy link
Author

@elgg-gitbot elgg-gitbot commented Feb 16, 2013

brettp wrote on 40172690-09-02

Could someone please post the result of running

SELECT hex(string) from elggmetastrings WHERE id=id_of_a_corrupt_string

both before and after a faulty migration?

@elgg-gitbot
Copy link
Author

@elgg-gitbot elgg-gitbot commented Feb 16, 2013

brettp wrote on 40172691-08-15

Also, what the string should be in UTF8 would be useful too :)

@elgg-gitbot
Copy link
Author

@elgg-gitbot elgg-gitbot commented Feb 16, 2013

cash wrote on 40172813-07-23

Okay - I figured out how I did this when trying to reproduce the reports. I entered the data as UTF8 and the conversion assumes latin1 and the ending result was the corruption. I was in too much of a hurry. You can throw out my data point.

This gives us three reports. Two gave absolutely no details. The third: kako posted a screengrab of the website on the community site that has latin1 characters coming through. This probably means the new code was copied over but the upgrade script was not run.

That leaves two reports with no details...

I think we can close this and anyone who runs into the problem can reopen.

@elgg-gitbot
Copy link
Author

@elgg-gitbot elgg-gitbot commented Feb 16, 2013

brettp wrote on 40172860-02-21

Agreed. Closing this one.

One thing though--If users trying to upgrade modified core (as suggested and warned against in this thread [svn:1]) the upgrade will result in corrupted databases. If so, the users will need to restore from a backup, delete the 1.7 engine/schema/upgrades/2009100701.sql migration, and run the upgrade again.

[svn:1] http://community.elgg.org/mod/groups/topicposts.php?topic=236843&group_guid=16#annotation-432444

@elgg-gitbot
Copy link
Author

@elgg-gitbot elgg-gitbot commented Feb 16, 2013

trac user kako wrote on 40173793-05-28

Replying to brettp:

Agreed. Closing this one.

One thing though--If users trying to upgrade modified core (as suggested and warned against in this thread [svn:1]) the upgrade will result in corrupted databases. If so, the users will need to restore from a backup, delete the 1.7 engine/schema/upgrades/2009100701.sql migration, and run the upgrade again.
Thx!, i will test it tomorrow evening and will tell you what happen.

[svn:1] http://community.elgg.org/mod/groups/topicposts.php?topic=236843&group_guid=16#annotation-432444

@elgg-gitbot
Copy link
Author

@elgg-gitbot elgg-gitbot commented Feb 16, 2013

trac user kiang wrote on 40189130-03-10

I met the same problem and the solution to remove '2009100701.sql' worked for me.

I haven't had tried to modify the core of elgg. But our website is using charset utf-8 ( http://oss.tw/elgg/ ) and the tables were already in utf-8. Maybe I did create the database with utf-8 and the tables inherited when installing elgg 1.6.1.

I think the upgrade script could detect if the table is using utf-8 charset by sending the query 'SHOW TABLE STATUS' to the database.

Ref: http://dev.mysql.com/doc/refman/5.0/en/show-table-status.html

@elgg-gitbot
Copy link
Author

@elgg-gitbot elgg-gitbot commented Feb 16, 2013

brettp wrote on 40189316-04-29

It doesn't matter what charset the tables are in this case; it matters what charset PHP uses to communicate with MySQL. 1.7 forward uses "SET NAMES utf8" to tell PHP to community with the MySQL server in utf8. Previous versions used the default of latin1. I don't know of a way to test what charset previous clients used. Simply testing what charset the tables are in isn't enough.

After some research, I believe there may be a way to test for this. kiang, can you please do the following if you are able?

Use a standard mysql command line client (PHP My Admin won't work) to connect to your server and select your Elgg database, then run these queries:

show variables like 'character_set%';
set names utf8;
show variables like 'character_set%';

I would be very interested to see the output of this on your server. Thanks.

@elgg-gitbot
Copy link
Author

@elgg-gitbot elgg-gitbot commented Feb 16, 2013

trac user kiang wrote on 40189328-02-11

Sure, here you are. ;)

mysql -uroot -p
Enter password: 

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2228
Server version: 5.1.37-1ubuntu5.1 (Ubuntu)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use elgg;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show variables like 'character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     | 
| character_set_connection | latin1                     | 
| character_set_database   | utf8                       | 
| character_set_filesystem | binary                     | 
| character_set_results    | latin1                     | 
| character_set_server     | latin1                     | 
| character_set_system     | utf8                       | 
| character_sets_dir       | /usr/share/mysql/charsets/ | 
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> show variables like 'character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       | 
| character_set_connection | utf8                       | 
| character_set_database   | utf8                       | 
| character_set_filesystem | binary                     | 
| character_set_results    | utf8                       | 
| character_set_server     | latin1                     | 
| character_set_system     | utf8                       | 
| character_sets_dir       | /usr/share/mysql/charsets/ | 
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

@elgg-gitbot
Copy link
Author

@elgg-gitbot elgg-gitbot commented Feb 16, 2013

brettp wrote on 40189338-04-15

kiang Thanks for the quick reply! Is this an upgraded 1.7 database or a previous one?

@elgg-gitbot
Copy link
Author

@elgg-gitbot elgg-gitbot commented Feb 16, 2013

trac user kiang wrote on 40189354-03-23

brettp, this is the clone of the database before upgrading(1.6.1).

@elgg-gitbot
Copy link
Author

@elgg-gitbot elgg-gitbot commented Feb 16, 2013

Title changed from Upgrade script corrupts Korean characters in database to DB Upgrade script 2009100701.sql corrupts UTF8/multibyte characters in database in some cases by brettp on 40199064-01-02

@elgg-gitbot
Copy link
Author

@elgg-gitbot elgg-gitbot commented Feb 16, 2013

Milestone changed to Elgg 1.7.1 by cash on 40201959-02-25

@elgg-gitbot
Copy link
Author

@elgg-gitbot elgg-gitbot commented Feb 16, 2013

Attachment added by brettp on 40209766-08-26: utf8_test.php

@elgg-gitbot
Copy link
Author

@elgg-gitbot elgg-gitbot commented Feb 16, 2013

brettp wrote on 40209777-12-07

For everyone having this problem, could you please download the attached 'utf8_test.php' file, place it in your Elgg root, run it from a web browser, and copy the output here? This is a simple file that outputs information about your default PHP settings that I hope can be used to conditionally run this upgrade.

@elgg-gitbot
Copy link
Author

@elgg-gitbot elgg-gitbot commented Feb 16, 2013

trac user gatchi wrote on 40241771-01-27

I have a same problem with korean language. i think it is exactly same problem with Sam. Result with the 'uft8_test.php' test on my server is this

array(2) { ["Variable_name"]=> string(20) "character_set_client" ["Value"]=> string(4) "utf8" } array(2) { ["Variable_name"]=> string(24) "character_set_connection" ["Value"]=> string(4) " utf8" } array(2) { ["Variable_name"]=> string(22) "character_set_database" ["Value"]=> string(4) "utf8" } array(2) { ["Variable_name"]=> string(24) "character_set_filesystem" ["Value"]=> string(6) "binary" } array(2) { ["Variable_name"]=> string(21) "character_set_results" ["Value"]=> string(4) "utf8" } array(2) { ["Variable_name"]=> string(20) "character_set_server" ["Value"]=> string(4) "utf8" } array(2) { ["Variable_name"]=> string(20) "character_set_system" ["Value"]=> string(4) "utf8" } array(2) { ["Variable_name"]=> string(18) "character_sets_dir" ["Value"]=> string(38) "/usr/local/mysql/share/mysql/charsets/" }

@elgg-gitbot
Copy link
Author

@elgg-gitbot elgg-gitbot commented Feb 16, 2013

brettp wrote on 40248891-09-21

(In [svn:5565]) Fixes #1928: UTF8 migration moved from db to code to detect if it is required. Bumped version to 1.7.1 (2010033101).

@elgg-gitbot
Copy link
Author

@elgg-gitbot elgg-gitbot commented Feb 16, 2013

brettp wrote on 40248905-02-05

Thanks to the feedback from the utf8 test I was able to write a new upgrade that I believe resolves this problem. Please grab the latest 1.7 branch and give it a try and reopen if there are any problems. I'd like to have this tested on a variety of servers before releasing 1.7.1. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

1 participant