Convert Alpha to UTF-8 #13

Closed
alphadevx opened this issue Dec 4, 2014 · 10 comments

@alphadevx
Owner

Required steps:

Alter tables:

ALTER TABLE name DEFAULT CHARACTER SET utf8 COLLATE utf8_bin;

Install mbstring extension:

yum install php-mbstring

php.ini:

mbstring.language = Neutral ; Set default language to Neutral(UTF-8) (default)
mbstring.internal_encoding = UTF-8 ; Set default internal encoding to UTF-8
mbstring.encoding_translation = On ; HTTP input encoding translation is enabled
mbstring.http_input = auto ; Set HTTP input character set detection to auto
mbstring.http_output = UTF-8 ; Set HTTP output encoding to UTF-8
mbstring.detect_order = auto ; Set default character encoding detection order to auto
mbstring.substitute_character = none ; Do not print invalid characters
default_charset = UTF-8 ; Default character set for auto content type header

Code-replace:

mail() -> mb_send_mail()
strlen() -> mb_strlen()
strpos() -> mb_strpos()
strrpos() -> mb_strrpos()
substr() -> mb_substr()
strtolower() -> mb_strtolower()
strtoupper() -> mb_strtoupper()
substr_count() -> mb_substr_count()
ereg() -> mb_ereg()
eregi() -> mb_eregi()
ereg_replace() -> mb_ereg_replace()
eregi_replace() -> mb_eregi_replace()
split() -> mb_split()
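
To illustrate why these replacements matter, here is a minimal sketch (not taken from the Alpha code base) comparing the byte-oriented functions with their mbstring counterparts. It assumes the mbstring extension is loaded; the sample string is written with byte escapes so that the result does not depend on how the file is saved.

```php
<?php
// "Zoë" encoded as UTF-8: the "ë" (U+00EB) takes two bytes, 0xC3 0xAB.
$word = "Zo\xC3\xAB";

// strlen() counts bytes, mb_strlen() counts characters.
$byteLength = strlen($word);              // 4
$charLength = mb_strlen($word, 'UTF-8');  // 3

// substr() cuts at a byte offset and can split a character in half,
// leaving an invalid UTF-8 sequence behind.
$byteCut = substr($word, 0, 3);              // "Zo" plus a dangling 0xC3 byte
$charCut = mb_substr($word, 0, 3, 'UTF-8');  // the full three characters

// mb_strtoupper() knows that "ë" upper-cases to "Ë" (0xC3 0x8B).
$upper = mb_strtoupper($word, 'UTF-8');
```

Passing the encoding explicitly, as above, keeps the result independent of the mbstring.internal_encoding setting.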

Pages:

And:

header('Content-Type: text/html; charset=utf-8');

htmlentities($text,ENT_COMPAT,'utf-8');
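
A small sketch of why the third argument to htmlentities() matters. The sample string is made up for illustration and written with byte escapes so it is UTF-8 regardless of the file encoding.

```php
<?php
// "café <b>" encoded as UTF-8: the "é" (U+00E9) is the two bytes 0xC3 0xA9.
$text = "caf\xC3\xA9 <b>";

// With the correct charset the two bytes are treated as one character.
$escaped = htmlentities($text, ENT_COMPAT, 'UTF-8');
// $escaped is 'caf&eacute; &lt;b&gt;'

// With a single-byte charset each byte is escaped on its own,
// which is how mojibake ends up in rendered pages.
$mismatched = htmlentities($text, ENT_COMPAT, 'ISO-8859-1');
```

The charset passed here should agree with the one sent in the Content-Type header.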

Forms:

Check posted data with: mb_check_encoding()
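
One way the check could look, as a rough sketch only: allValidUtf8() is a hypothetical helper name, not an existing Alpha function, and the arrays below stand in for $_POST.

```php
<?php
/**
 * Hypothetical helper: returns true when every string value in the
 * (possibly nested) array is valid UTF-8. Non-string values are ignored.
 */
function allValidUtf8(array $input)
{
    foreach ($input as $value) {
        if (is_array($value)) {
            if (!allValidUtf8($value)) {
                return false;
            }
        } elseif (is_string($value) && !mb_check_encoding($value, 'UTF-8')) {
            return false;
        }
    }

    return true;
}

// Stand-ins for posted data.
$good = array('name' => "Zo\xC3\xAB", 'tags' => array('a', 'b'));
$bad  = array('name' => "Zo\xEB"); // lone Latin-1 byte, not valid UTF-8

$goodIsValid = allValidUtf8($good);
$badIsValid  = allValidUtf8($bad);
```

In a real request handler the call would be allValidUtf8($_POST), rejecting the request when it returns false. This sketch does not check array keys.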

Database:

my.cnf:

[mysqld]
character-set-server = utf8
default-character-set = utf8
skip-character-set-client-handshake

Backup and copy before converting to UTF8:

$ mysqldump -u root -p dbname >~/db_name.sql
mysql> create database dbname_copy;
mysql> use dbname_copy;
mysql> source ~/db_name.sql

References:

http://developer.loftdigital.com/blog/php-utf-8-cheatsheet
http://www.startupcto.com/backend-tech/going-utf-8-utf8-with-php-and-mysql

@alphadevx
Owner Author

I have added calls to the FrontController to set the responses to be UTF-8, and replaced all occurrences of strlen with mb_strlen.

@alphadevx
Owner Author

I have made all of the suggested changes listed above and am moving this to end-user testing to see if there are any issues (all existing unit tests are passing).

@alphadevx
Owner Author

I missed the call to mb_check_encoding(). I need to include this where requests are processed, or in the data validator.

@alphadevx
Owner Author

In the MySQLi provider, I also need to add the following:

$mysqli->set_charset('utf8');

Ref: http://php.net/manual/en/mysqli.set-charset.php

@alphadevx
Owner Author

I should also consider using the utf8mb4 character set in MySQL (this requires 5.5.3+): http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html
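
For context, MySQL's utf8 character set stores at most three bytes per character, so characters outside the Basic Multilingual Plane (emoji, for example) need utf8mb4. A rough sketch for spotting such input on the PHP side; needsUtf8mb4() is a hypothetical helper name, not part of Alpha.

```php
<?php
/**
 * Hypothetical helper: returns true when the string contains a character
 * above U+FFFF, which takes four bytes in UTF-8 and so will not fit in a
 * MySQL utf8 column. Invalid UTF-8 input makes preg_match() return false,
 * which is reported as "no" here.
 */
function needsUtf8mb4($text)
{
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 1;
}

$emoji = "\xF0\x9F\x98\x80"; // U+1F600, four bytes in UTF-8
$plain = "Zo\xC3\xAB";       // "Zoë", two-byte character at most

$emojiNeedsMb4 = needsUtf8mb4($emoji);
$plainNeedsMb4 = needsUtf8mb4($plain);
```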

@alphadevx
Owner Author

Okay, I have this working locally now with the mb_check_encoding() and $mysqli->set_charset('utf8') calls enabled; however, this needs a lot more testing. Furthermore, I had to dump my database, convert it to UTF-8, then reimport it.

I'll continue to do more testing before I can consider this safe (it's not in trunk yet, but on the branches/13-convert-to-utf8 feature branch).

@alphadevx
Owner Author

Worth trying the following as a conversion script:

mysqldump --add-drop-table database_to_correct | replace CHARSET=latin1 CHARSET=utf8 | iconv -f latin1 -t utf8 | mysql database_to_correct

Source: http://blogs.law.harvard.edu/djcp/2010/01/convert-mysql-database-from-latin1-to-utf8-the-right-way/

@alphadevx
Owner Author

I just tested the following one-line command to convert an existing Alpha MySQL database from latin1 to UTF-8, and it worked fine:

mysqldump --add-drop-table -u username --password='password' databasename | sed -e 's/CHARSET\=latin1/CHARSET\=utf8\ COLLATE\=utf8_general_ci/g' | iconv -f latin1 -t utf8 | mysql -u username --password='password' databasename

@alphadevx
Owner Author

The above mysqldump command will convert each table in a database. However, to convert the default character set of the database itself, you should also run the following to be safe:

ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_general_ci;

@alphadevx
Owner Author

Testing passed, merged back into trunk.
