New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

utf8mb4 #367

Closed
kadkins opened this Issue Mar 27, 2013 · 11 comments

Comments

Projects
None yet
5 participants
@kadkins

kadkins commented Mar 27, 2013

I am using MariaDB 5.5 but had the same issue with MySQL 5.5

  • I can add/edit/view 4-byte utf-8 characters (iOS emoji's for example) from a terminal and sql gui client
  • my.cnf is set to use utf8mb4 as default charset
    • I set the databases default encoding to utf8mb4

Using Rails 3.2.13 database.yml

staging:
  adapter: mysql2
  host: dbstage01
  database: sp_stage
  username: sp_stage
  password: 12345678
  encoding: utf8mb4
  collation: utf8mb4_unicode_ci
  port: 3306

I have tried with and without collation:, encoding: hoping it would use the databases default encoding as well as using just using utf8.

Using native Rails 3 gem 'gem mysql2':

  • rake db:create fails with 'Unsupported charset: '"utf8mb4"' message

Using gem 'mysql2', :git => "https://github.com/brianmario/mysql2.git"

  • The database and migrations complete, but it is not using utf8mb4 as specified in the config file.
  • 4 byte emoji displays as ?'s

I was able to fix this on an earlier project using mysql5.5 and writing a bunch of scripts to go through each table and column, changing charsets and collations to utf8mb4, and also setting indexes on utf8mb4 to a max of 191 characters.

@kadkins

This comment has been minimized.

Show comment
Hide comment
@kadkins

kadkins Mar 27, 2013

It also appears that github cannot handle 4byte UTF-8 either

kadkins commented Mar 27, 2013

It also appears that github cannot handle 4byte UTF-8 either

@andreychernih

This comment has been minimized.

Show comment
Hide comment
@andreychernih

andreychernih Apr 18, 2013

+1 on this issue, I was not able to specify utf8mb4 encoding in the database.yml. It fails with Unsupported charset: '"utf8mb4"'

andreychernih commented Apr 18, 2013

+1 on this issue, I was not able to specify utf8mb4 encoding in the database.yml. It fails with Unsupported charset: '"utf8mb4"'

@andreychernih

This comment has been minimized.

Show comment
Hide comment
@andreychernih

andreychernih Apr 18, 2013

This error hit me when I was upgrading from Ruby 1.8.7 to Ruby 1.9.3.

andreychernih commented Apr 18, 2013

This error hit me when I was upgrading from Ruby 1.8.7 to Ruby 1.9.3.

@sodabrew

This comment has been minimized.

Show comment
Hide comment
@sodabrew

sodabrew Apr 18, 2013

Collaborator

Are you using mysql2 0.3.11 or 0.3.12bX? utf8mb4 is not present in 0.3.11.

Collaborator

sodabrew commented Apr 18, 2013

Are you using mysql2 0.3.11 or 0.3.12bX? utf8mb4 is not present in 0.3.11.

@kadkins

This comment has been minimized.

Show comment
Hide comment
@kadkins

kadkins Apr 18, 2013

Also, as a workaround I had to monkey patch /usr/share/mysql/charsets/Index.xml (Ubuntu 10.04LTS Server) to get the mysql gem to work correctly, even though the databases are remote.

I simply renamed the default utf8 to utf8mb4 in order to get the mysql client to work:

<charset name="utf8mb4">
  <family>Unicode</family>
  <description>UTF-8 Unicode</description>
  <alias>utf-8</alias>
  <collation name="utf8_general_ci"     id="33">
   <flag>primary</flag>
   <flag>compiled</flag>
  </collation>
  <collation name="utf8_bin"            id="83">
    <flag>binary</flag>
    <flag>compiled</flag>
  </collation>
</charset>

I also wrote this script to help me convert my databases. It only outputs DDL statements and does nto automatically do anything. I wrote it so I could paste the results into my query window and execute them after review... use with caution and at your own risk.

Copy and paste into utf8mb4helper.rb and alter the settings in top of the script and then run ruby utf8mb4helper.rb > utf8mb4helper.sql

# ruby script to generate ddl statements to convert utf8 to utf8mb4
# you will need ruby, rubygems and the mysql2 gem to run this script
# => gem install mysql2 --no-rdoc --no-ri
require 'rubygems'
require 'mysql2'

### settings
dbhost='localhost'
dbuser='root'
dbuserpwd='password'
database='testdbname'
charset='utf8mb4'
collation='utf8mb4_unicode_ci'

### these will store the statements
no_index_and_data_less_than_191=""
alter_tables=""
alter_column_sizes=""
needs_index_resized=""
no_index_but_data_greater_than_191=""
has_index_but_less_than_191 = ""

### connect to database
con = Mysql2::Client.new(:host=>dbhost,:username=>dbuser,:password=>dbuserpwd)

# For each table in database
tables = con.query("SELECT table_name from information_schema.tables where table_schema = '" + database + "';")
tables.each do |h|
  # check and update columns
  columns = con.query("SHOW FULL COLUMNS FROM `" + database + "`.`" + h["table_name"] + "`;")
  columns.each do |c|
    # if the column has an index, set the max field size to 191
    checkindex=con.query("SELECT * FROM information_schema.statistics WHERE table_schema = '#{database}' AND table_name = '#{h["table_name"]}' AND column_name = '#{c["Field"]}';")
    if c["Type"] =~ /^varchar/
      # varchar.. check the size
      maxsize = con.query("SELECT MAX(LENGTH(`#{c["Field"]}`)) FROM `#{database}`.`#{h["table_name"]}`;")
      size = maxsize.first.each_value.collect.first || 0
      if checkindex.size > 0
        # varchar with an index
        if size > 191
          needs_index_resized << "/* ALTER TABLE `#{h["table_name"]}` CHANGE `#{c["Field"]}` `#{c["Field"]}` varchar(191); */\n"
        else
          has_index_but_less_than_191 << "ALTER TABLE `#{h["table_name"]}` CHANGE `#{c["Field"]}` `#{c["Field"]}` varchar(191);\n"
        end
      else # no index
        utf8 = con.query("SELECT character_set_name FROM information_schema.`COLUMNS` C WHERE table_schema = '#{database}' AND table_name = '#{h["table_name"]}' AND column_name = '#{c["Field"]}' AND character_set_name='utf8';")
        if utf8.size > 0 # utf8 varchar
          if size > 191
            no_index_but_data_greater_than_191 << "ALTER TABLE `#{h["table_name"]}` CHANGE `#{c["Field"]}` `#{c["Field"]}` #{c["Type"]} CHARACTER SET '#{charset}' COLLATE '#{collation}';\n"
          else
            no_index_and_data_less_than_191 << "ALTER TABLE `#{h["table_name"]}` CHANGE `#{c["Field"]}` `#{c["Field"]}` #{c["Type"]} CHARACTER SET '#{charset}' COLLATE '#{collation}';\n"
          end
        end
      end
    end
  end
  # check and update table
  alter_tables << "ALTER TABLE `#{h["table_name"]}` CONVERT TO CHARACTER SET '#{charset}' COLLATE '#{collation}';\n"
end

puts "use `#{database}`;"
puts "/* ############ THESE COLUMNS HAVE INDEXES BUT NO DATA > THAN 191-- MAY NEED TO REBUILD INDEX ############ */"
puts has_index_but_less_than_191
puts "\n"
puts "/* ############ THESE COLUMNS HAVE INDEXES AND DATA > 191.. ALTERING THEM WOULD TRUNCATE DATA!!! RESIZE INDEX */"
puts needs_index_resized
puts "\n"
puts "/* ############ THESE COLUMNS DO NOT HAVE AN INDEX AND BUT HAVE DATA > 191 -- BE AWARE OF FUTURE INDICES ############ */"
puts no_index_but_data_greater_than_191
puts "\n"
puts "/* ############ THESE COLUMNS DO NOT HAVE AN INDEX AND CURRENTLY DO NOT HAVE DATA > 191 ############ */"
puts no_index_and_data_less_than_191
puts "\n"
puts "/* ############ CONVERT TABLES TO UTF8MB4 ############ */"
puts alter_tables
puts "\n"
puts "/* ############ CONVERT DATABASE ########### */"
puts "ALTER DATABASE `#{database}` CHARACTER SET = utf8mb4 COLLATE = `#{collation}`;"
puts "\n"
puts "/* ############ SOME HELPFULL COMMANDS TO RUN BEFORE AND/OR AFTER ########### */"
puts "# use `#{database}`; show session variables like 'character%';"
puts "# set names utf8mb4; use `#{database}`; show session variables like 'character%';"
puts "# select `SOMECOLUMN` from `#{database}` where NOT HEX(`SOMECOLUMN`) REGEXP '^([0-7][0-9A-F])*$'"

con.close

kadkins commented Apr 18, 2013

Also, as a workaround I had to monkey patch /usr/share/mysql/charsets/Index.xml (Ubuntu 10.04LTS Server) to get the mysql gem to work correctly, even though the databases are remote.

I simply renamed the default utf8 to utf8mb4 in order to get the mysql client to work:

<charset name="utf8mb4">
  <family>Unicode</family>
  <description>UTF-8 Unicode</description>
  <alias>utf-8</alias>
  <collation name="utf8_general_ci"     id="33">
   <flag>primary</flag>
   <flag>compiled</flag>
  </collation>
  <collation name="utf8_bin"            id="83">
    <flag>binary</flag>
    <flag>compiled</flag>
  </collation>
</charset>

I also wrote this script to help me convert my databases. It only outputs DDL statements and does nto automatically do anything. I wrote it so I could paste the results into my query window and execute them after review... use with caution and at your own risk.

Copy and paste into utf8mb4helper.rb and alter the settings in top of the script and then run ruby utf8mb4helper.rb > utf8mb4helper.sql

# ruby script to generate ddl statements to convert utf8 to utf8mb4
# you will need ruby, rubygems and the mysql2 gem to run this script
# => gem install mysql2 --no-rdoc --no-ri
require 'rubygems'
require 'mysql2'

### settings
dbhost='localhost'
dbuser='root'
dbuserpwd='password'
database='testdbname'
charset='utf8mb4'
collation='utf8mb4_unicode_ci'

### these will store the statements
no_index_and_data_less_than_191=""
alter_tables=""
alter_column_sizes=""
needs_index_resized=""
no_index_but_data_greater_than_191=""
has_index_but_less_than_191 = ""

### connect to database
con = Mysql2::Client.new(:host=>dbhost,:username=>dbuser,:password=>dbuserpwd)

# For each table in database
tables = con.query("SELECT table_name from information_schema.tables where table_schema = '" + database + "';")
tables.each do |h|
  # check and update columns
  columns = con.query("SHOW FULL COLUMNS FROM `" + database + "`.`" + h["table_name"] + "`;")
  columns.each do |c|
    # if the column has an index, set the max field size to 191
    checkindex=con.query("SELECT * FROM information_schema.statistics WHERE table_schema = '#{database}' AND table_name = '#{h["table_name"]}' AND column_name = '#{c["Field"]}';")
    if c["Type"] =~ /^varchar/
      # varchar.. check the size
      maxsize = con.query("SELECT MAX(LENGTH(`#{c["Field"]}`)) FROM `#{database}`.`#{h["table_name"]}`;")
      size = maxsize.first.each_value.collect.first || 0
      if checkindex.size > 0
        # varchar with an index
        if size > 191
          needs_index_resized << "/* ALTER TABLE `#{h["table_name"]}` CHANGE `#{c["Field"]}` `#{c["Field"]}` varchar(191); */\n"
        else
          has_index_but_less_than_191 << "ALTER TABLE `#{h["table_name"]}` CHANGE `#{c["Field"]}` `#{c["Field"]}` varchar(191);\n"
        end
      else # no index
        utf8 = con.query("SELECT character_set_name FROM information_schema.`COLUMNS` C WHERE table_schema = '#{database}' AND table_name = '#{h["table_name"]}' AND column_name = '#{c["Field"]}' AND character_set_name='utf8';")
        if utf8.size > 0 # utf8 varchar
          if size > 191
            no_index_but_data_greater_than_191 << "ALTER TABLE `#{h["table_name"]}` CHANGE `#{c["Field"]}` `#{c["Field"]}` #{c["Type"]} CHARACTER SET '#{charset}' COLLATE '#{collation}';\n"
          else
            no_index_and_data_less_than_191 << "ALTER TABLE `#{h["table_name"]}` CHANGE `#{c["Field"]}` `#{c["Field"]}` #{c["Type"]} CHARACTER SET '#{charset}' COLLATE '#{collation}';\n"
          end
        end
      end
    end
  end
  # check and update table
  alter_tables << "ALTER TABLE `#{h["table_name"]}` CONVERT TO CHARACTER SET '#{charset}' COLLATE '#{collation}';\n"
end

puts "use `#{database}`;"
puts "/* ############ THESE COLUMNS HAVE INDEXES BUT NO DATA > THAN 191-- MAY NEED TO REBUILD INDEX ############ */"
puts has_index_but_less_than_191
puts "\n"
puts "/* ############ THESE COLUMNS HAVE INDEXES AND DATA > 191.. ALTERING THEM WOULD TRUNCATE DATA!!! RESIZE INDEX */"
puts needs_index_resized
puts "\n"
puts "/* ############ THESE COLUMNS DO NOT HAVE AN INDEX AND BUT HAVE DATA > 191 -- BE AWARE OF FUTURE INDICES ############ */"
puts no_index_but_data_greater_than_191
puts "\n"
puts "/* ############ THESE COLUMNS DO NOT HAVE AN INDEX AND CURRENTLY DO NOT HAVE DATA > 191 ############ */"
puts no_index_and_data_less_than_191
puts "\n"
puts "/* ############ CONVERT TABLES TO UTF8MB4 ############ */"
puts alter_tables
puts "\n"
puts "/* ############ CONVERT DATABASE ########### */"
puts "ALTER DATABASE `#{database}` CHARACTER SET = utf8mb4 COLLATE = `#{collation}`;"
puts "\n"
puts "/* ############ SOME HELPFULL COMMANDS TO RUN BEFORE AND/OR AFTER ########### */"
puts "# use `#{database}`; show session variables like 'character%';"
puts "# set names utf8mb4; use `#{database}`; show session variables like 'character%';"
puts "# select `SOMECOLUMN` from `#{database}` where NOT HEX(`SOMECOLUMN`) REGEXP '^([0-7][0-9A-F])*$'"

con.close
@sodabrew

This comment has been minimized.

Show comment
Hide comment
@sodabrew

sodabrew Apr 18, 2013

Collaborator

Does this resolve the issue, may I close it?

Collaborator

sodabrew commented Apr 18, 2013

Does this resolve the issue, may I close it?

@kadkins

This comment has been minimized.

Show comment
Hide comment
@kadkins

kadkins Apr 18, 2013

Upgrading mysql clients to natively handle utf8mb4 would resolve the issue.. all I have done is monkey patched files to get mine to work.. and that only works if you only have utf8mb4 clients only (no mix and match), at least on Ubuntu systems.

I am not familiar enough with the Index.xml file, nor do I fully understand how it works within my os to get it to work with both utf8 and utf8mb4 charsets. Honestly, this has caused so much frustration and time converting everything to work with iOS emoji, that I am looking to migrate everything to postgres :(

utf8 should be utf8.

kadkins commented Apr 18, 2013

Upgrading mysql clients to natively handle utf8mb4 would resolve the issue.. all I have done is monkey patched files to get mine to work.. and that only works if you only have utf8mb4 clients only (no mix and match), at least on Ubuntu systems.

I am not familiar enough with the Index.xml file, nor do I fully understand how it works within my os to get it to work with both utf8 and utf8mb4 charsets. Honestly, this has caused so much frustration and time converting everything to work with iOS emoji, that I am looking to migrate everything to postgres :(

utf8 should be utf8.

@Vaduz

This comment has been minimized.

Show comment
Hide comment
@Vaduz

Vaduz Apr 30, 2013

I got same issue but install MySQL-devel version 5.5.20 BEFORE installing mysql2 0.3.12b fixed this issue.
The mysql client that this package includes is not support utf8mb4. So that you need to link shared library externaly.
The extconf.rb find whether libmysqlclient.so.18 is installed on your machine, check your generated lib/mysql2/mysql2.so dynamicaly linked libraries using ldd.

Vaduz commented Apr 30, 2013

I got same issue but install MySQL-devel version 5.5.20 BEFORE installing mysql2 0.3.12b fixed this issue.
The mysql client that this package includes is not support utf8mb4. So that you need to link shared library externaly.
The extconf.rb find whether libmysqlclient.so.18 is installed on your machine, check your generated lib/mysql2/mysql2.so dynamicaly linked libraries using ldd.

@sodabrew

This comment has been minimized.

Show comment
Hide comment
@sodabrew

sodabrew Jul 9, 2013

Collaborator

Definitively closing the issue with these instructions:

  • First, make sure you have upgraded your mysql client library to one that supports utf8mb4 (review the mysql release notes as needed).
  • Then, upgrade to mysql 0.3.12x (where x may be a beta release or the final 0.3.12 when it is available).
Collaborator

sodabrew commented Jul 9, 2013

Definitively closing the issue with these instructions:

  • First, make sure you have upgraded your mysql client library to one that supports utf8mb4 (review the mysql release notes as needed).
  • Then, upgrade to mysql 0.3.12x (where x may be a beta release or the final 0.3.12 when it is available).

@sodabrew sodabrew closed this Jul 9, 2013

@williamhqs

This comment has been minimized.

Show comment
Hide comment
@williamhqs

williamhqs Mar 6, 2015

I has the error on my CentOS 6.5 when i deploy my Sinatra app. I am trying to solve it by :

gem install mysql2 -- --with-mysql-dir=/var/lib/mysql --with-mysql-config=/usr/bin/mysql_config

But seems the path is not correct. How should i choose the path. Below is my system mysql path.
I get the ERROR: Any help? thanks!

ERROR:  Error installing mysql2:
    ERROR: Failed to build gem native extension.

    /home/william/.rbenv/versions/2.1.1/bin/ruby extconf.rb --with-mysql-dir=/usr/bin/mysql --with-mysql-config=/usr/bin/mysql_config
checking for ruby/thread.h... yes
checking for rb_thread_call_without_gvl() in ruby/thread.h... yes
checking for rb_thread_blocking_region()... yes
checking for rb_wait_for_single_fd()... yes
checking for rb_hash_dup()... yes
checking for rb_intern3()... yes
-----
Cannot find include dir(s) /usr/bin/mysql/include
-----

PS:

[william@ykj mysql]$ whereis mysql_config
mysql_config: /usr/bin/mysql_config /usr/share/man/man1/mysql_config.1.gz
[william@ykj mysql]$ 
[william@ykj mysql]$ 
[william@ykj mysql]$ gem install mysql2 -- --with-mysql-dir=/usr/bin/mysql

williamhqs commented Mar 6, 2015

I has the error on my CentOS 6.5 when i deploy my Sinatra app. I am trying to solve it by :

gem install mysql2 -- --with-mysql-dir=/var/lib/mysql --with-mysql-config=/usr/bin/mysql_config

But seems the path is not correct. How should i choose the path. Below is my system mysql path.
I get the ERROR: Any help? thanks!

ERROR:  Error installing mysql2:
    ERROR: Failed to build gem native extension.

    /home/william/.rbenv/versions/2.1.1/bin/ruby extconf.rb --with-mysql-dir=/usr/bin/mysql --with-mysql-config=/usr/bin/mysql_config
checking for ruby/thread.h... yes
checking for rb_thread_call_without_gvl() in ruby/thread.h... yes
checking for rb_thread_blocking_region()... yes
checking for rb_wait_for_single_fd()... yes
checking for rb_hash_dup()... yes
checking for rb_intern3()... yes
-----
Cannot find include dir(s) /usr/bin/mysql/include
-----

PS:

[william@ykj mysql]$ whereis mysql_config
mysql_config: /usr/bin/mysql_config /usr/share/man/man1/mysql_config.1.gz
[william@ykj mysql]$ 
[william@ykj mysql]$ 
[william@ykj mysql]$ gem install mysql2 -- --with-mysql-dir=/usr/bin/mysql
@sodabrew

This comment has been minimized.

Show comment
Hide comment
@sodabrew

sodabrew Mar 6, 2015

Collaborator

@williamhqs The default search for mysql-config includes /usr/bin, so while --with-mysql-config=/usr/bin/mysql_config is correct, it is not needed.

Your use of --with-mysql-dir is incorrect in all of the examples above. Please remove it and that should resolve the problem. --with-mysql-dir and --with-mysql-config are mutually exclusive per the README explanation of these options.

Collaborator

sodabrew commented Mar 6, 2015

@williamhqs The default search for mysql-config includes /usr/bin, so while --with-mysql-config=/usr/bin/mysql_config is correct, it is not needed.

Your use of --with-mysql-dir is incorrect in all of the examples above. Please remove it and that should resolve the problem. --with-mysql-dir and --with-mysql-config are mutually exclusive per the README explanation of these options.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment