Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert MySQL fields to 4 byte unicode #5530

Merged
merged 4 commits into from Jan 21, 2015

Conversation

@dimaursu
Copy link
Contributor

commented Jan 6, 2015

This will make Diaspora* running on postgres and on mysql behave consistently: in the postgres version, you can use 4-byte UTF8, or full UTF8. In MySQL utf8 == the 3 byte subset of UTF8, while utf8mb4 is the true, full UTF8 encoding.

t.boolean "reviewed", default: false
t.text "text"
t.datetime "created_at"
t.datetime "updated_at"
t.string "item_type", null: false
t.integer "user_id", null: false
end

This comment has been minimized.

Copy link
@jhass

jhass Jan 6, 2015

Member

No actual schema changes in the migration, please don't commit schema.rb changes.

}

def self.up
if ENV['DB'] == 'mysql'

This comment has been minimized.

Copy link
@jhass

jhass Jan 6, 2015

Member

I plan to eventually get rid of the DB environment variable, I think you should be able to query the adapter through AR::Base.config or something like that.

This comment has been minimized.

Copy link
@jhass

jhass Jan 6, 2015

Member

Actually I just remembered we already have https://github.com/diaspora/diaspora/blob/develop/lib/configuration_methods.rb#L121 which can be called through AppConfig.postgres?. Don't need that often luckily!

@jhass

This comment has been minimized.

Copy link
Member

commented Jan 6, 2015

I guess you want to add to the big fat changelog entries here, so podmins notice to update their database.yml.

Gemfile Outdated
@@ -57,7 +57,7 @@ gem 'sass-rails', '4.0.4'

# Database

ENV['DB'] ||= 'mysql'
ENV['DB'] ||= 'postgres'

This comment has been minimized.

Copy link
@jhass

jhass Jan 6, 2015

Member

As much as I'd wish for, but nope...

Gemfile Outdated
@@ -88,6 +88,7 @@ gem 'rails-assets-jquery', '1.11.1' # Should be kept in sync with jquery-rails
gem 'js_image_paths', '0.0.1'
gem 'js-routes', '0.9.9'
gem 'rails-assets-punycode', '1.3.2'
gem 'rails-assets-twemoji'

This comment has been minimized.

Copy link
@jhass

jhass Jan 6, 2015

Member

Let's keep that change separate, since I disagree with it while agreeing about changing the database encoding.

@dimaursu dimaursu force-pushed the dimaursu:emoji branch from 92e7148 to a1be751 Jan 6, 2015

@goobertron

This comment has been minimized.

Copy link

commented Jan 6, 2015

If this is merged, please can we also have a means of blocking these emoji from view?

@jhass

This comment has been minimized.

Copy link
Member

commented Jan 6, 2015

@goobertron it's actually not graphic emojis for this one and actually just badly argued through the unicode emojis, so it's rather unreleated to them, they're just a side effect. There's strong technical reasons to do this.

@goobertron

This comment has been minimized.

Copy link

commented Jan 6, 2015

Oh right, I was basing my comment on the opening post, which includes graphics. I guess that's Github's fault. I'd still like an opt-out from seeing any emoji, however they are displayed.

@jhass

This comment has been minimized.

Copy link
Member

commented Jan 6, 2015

For this one, just remove any fonts that would display them... :P

@dimaursu dimaursu force-pushed the dimaursu:emoji branch from a1be751 to dfd91a4 Jan 7, 2015

@jhass

This comment has been minimized.

Copy link
Member

commented Jan 7, 2015

Okay, getting somewhere, let's make a list of open tasks:

  • Check in the now changed db/schema.rb, since that's what new databases are actually build from.
  • Add a migration to apply the schema changes to existing databases, order it before the encoding change.
  • Change the encoding of everything, so it's consistent and consistent with newly created databases.
@@ -194,15 +194,15 @@ class CreateSchema < ActiveRecord::Migration
add_index "notifications", ["target_id"], :name => "index_notifications_on_target_id"
add_index "notifications", ["target_type", "target_id"], :name => "index_notifications_on_target_type_and_target_id"

create_table "o_embed_caches", :force => true do |t|
create_table "o_embed_caches", options: 'ENGINE=InnoDB DEFAULT CHARSET=utf8', :force => true do |t|

This comment has been minimized.

Copy link
@jhass

jhass Jan 7, 2015

Member

That doesn't break on Postgres? Also I think I'd rather like to keep the same encoding everywhere and shorten the index key below.

This comment has been minimized.

Copy link
@dimaursu

dimaursu Jan 7, 2015

Author Contributor

According to Travis, no. Sorry, I forgot to erase that one.

end

def self.up
change_encoding('utf8mb4') if AppConfig.mysql?

This comment has been minimized.

Copy link
@dimaursu

dimaursu Jan 8, 2015

Author Contributor
ActiveRecord::Base.connection.columns(table).each do |column|
# build a hash with all the columns that contain text
if (column.type == :string) || (column.type == :text)
UTF8_PAIRS[table] = { 'name' => column.name, 'type' => column.sql_type }

This comment has been minimized.

Copy link
@jhass

jhass Jan 8, 2015

Member

I'd use symbol keys here :)

@dimaursu

This comment has been minimized.

Copy link
Contributor Author

commented Jan 8, 2015

@jhass ROW_FORMAT=DYNAMIC, in combination with InnoDB engine, and those 3 parameters above (now set in the migration, no need to manually touch configs), increases the maximum size of the index columns from 767 bytes to 3072 (something like that); that gives us 768 chars for the indexes, in the worst case scenario (a whole index row is made out of emojis would be a good example). MySQL only considers this scenario, that's why it spits out errors.

@jhass

This comment has been minimized.

Copy link
Member

commented Jan 8, 2015

Okay, I see now how that transforms existing databases. How do we get to the same state when doing rake db:create db:schema:load?

@dimaursu

This comment has been minimized.

Copy link
Contributor Author

commented Jan 8, 2015

No clue about that. But if you give me an idea or some pointers, I will gladly add that too.

ActiveRecord::Base.connection.columns(table).each do |column|
# build a hash with all the columns that contain text
if (column.type == :string) || (column.type == :text)
UTF8_PAIRS[table] = { :name => column.name, :type => column.sql_type }

This comment has been minimized.

Copy link
@dimaursu

dimaursu Jan 11, 2015

Author Contributor

This is not good. We should have as key the column. This is a serious error from my side, sorry. Good thing I let the issue "bake" a little in my head, I noticed that something here is fishy :-)

@dimaursu dimaursu force-pushed the dimaursu:emoji branch from 20b78f6 to 87a8c8e Jan 13, 2015

t.integer "aspect_id", null: false
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
t.string "shareable_type"

This comment has been minimized.

Copy link
@jhass

jhass Jan 13, 2015

Member

Where did the default went?

This comment has been minimized.

Copy link
@jhass

jhass Jan 19, 2015

Member

This is still an issue.


create_table "aspects", force: true do |t|
t.string "name", null: false
t.string "name"

This comment has been minimized.

Copy link
@jhass

jhass Jan 13, 2015

Member

Where did the not null constraint went?

This comment has been minimized.

Copy link
@jhass

jhass Jan 19, 2015

Member

This too is still an issue.

t.integer "user_id", null: false
t.string "jid", null: false
t.integer "user_id", null: false
t.string "jid", limit: 3071

This comment has been minimized.

Copy link
@jhass

jhass Jan 13, 2015

Member

Left over/typo? Also not null.

end

add_index "chat_contacts", ["user_id", "jid"], name: "index_chat_contacts_on_user_id_and_jid", unique: true, using: :btree
add_index "chat_contacts", ["user_id", "jid"], name: "index_chat_contacts_on_user_id_and_jid", length: {"user_id"=>nil, "jid"=>190}, using: :btree

This comment has been minimized.

Copy link
@jhass

jhass Jan 13, 2015

Member

uniqueness constraint went missing.

t.datetime "created_at", null: false
t.datetime "updated_at", null: false
t.integer "likes_count", default: 0, null: false
t.string "commentable_type", limit: 60

This comment has been minimized.

Copy link
@jhass

jhass Jan 14, 2015

Member

default, not null

@@ -366,7 +366,7 @@
t.string "provider_display_name"
t.string "actor_url"
t.string "objectId"
t.string "root_guid"
t.string "root_guid", limit: 64

This comment has been minimized.

Copy link
@jhass

jhass Jan 14, 2015

Member

New limit not present in migration.

This comment has been minimized.

Copy link
@jhass

jhass Jan 19, 2015

Member

Still an issue.

@@ -376,18 +376,18 @@
t.string "frame_name"
t.boolean "favorite", default: false
t.string "facebook_id"
t.string "tweet_id"
t.string "tweet_id", limit: 64

This comment has been minimized.

Copy link
@jhass

jhass Jan 14, 2015

Member

New limit

t.datetime "updated_at", null: false
t.boolean "hidden", default: false, null: false
t.integer "contact_id", null: false
t.string "shareable_type", limit: 60

This comment has been minimized.

Copy link
@jhass

jhass Jan 14, 2015

Member

default, not null

t.string "email", default: "", null: false
t.string "encrypted_password", default: "", null: false
t.string "email"
t.string "encrypted_password"

This comment has been minimized.

Copy link
@jhass

jhass Jan 14, 2015

Member

default, not null

add_index "users", ["email"], name: "index_users_on_email", using: :btree
add_index "users", ["invitation_service", "invitation_identifier"], name: "index_users_on_invitation_service_and_invitation_identifier", unique: true, using: :btree
add_index "users", ["email"], name: "index_users_on_email", length: {"email"=>191}, using: :btree
add_index "users", ["invitation_service", "invitation_identifier"], name: "index_users_on_invitation_service_and_invitation_identifier", unique: true, length: {"invitation_service"=>64, "invitation_identifier"=>nil}, using: :btree

This comment has been minimized.

Copy link
@jhass

jhass Jan 14, 2015

Member

The nil is intentional?

@dimaursu dimaursu force-pushed the dimaursu:emoji branch 5 times, most recently from 191833d to 31cf232 Jan 14, 2015

@@ -32,6 +32,10 @@ series and run our comprehensive test suite against it.
## Change in defaults.yml
The default for including jQuery from a CDN has changed. If you want to continue to include it from a CDN, please explicitly set the `jquery_cdn` setting to `true` in diaspora.yml.

## Change in database.yml
For MySQL databases, replace `charset: utf8` with `encoding: utf8mb4`. This is enables full UTF8 support (4bytes characters), including standard emoji characters. See `database.yml.example` for reference.
Also, do not forget to remove `collation: utf8_bin`. It will choose a compatible one automatically.

This comment has been minimized.

Copy link
@Flaburgan

Flaburgan Jan 14, 2015

Member

This is not precise enough imo. Where is that setting? Which file, which line? If you really need a podmin action, you should explain precisely what he should do.

@@ -32,6 +32,10 @@ series and run our comprehensive test suite against it.
## Change in defaults.yml
The default for including jQuery from a CDN has changed. If you want to continue to include it from a CDN, please explicitly set the `jquery_cdn` setting to `true` in diaspora.yml.

## Change in database.yml
For MySQL databases, replace `charset: utf8` with `encoding: utf8mb4` in the file `config/database.yml`. This is enables full UTF8 support (4bytes characters), including standard emoji characters. See `database.yml.example` for reference.
Also, do not forget to remove `collation: utf8_bin`. It will choose a compatible one automatically.

This comment has been minimized.

Copy link
@dimaursu

dimaursu Jan 14, 2015

Author Contributor

With database.yml.example at hand, I guess any podmin will figure this out.

@jhass

This comment has been minimized.

@denschub

This comment has been minimized.

Copy link
Member

commented Jan 15, 2015

As promised, ran the migration on a clone of my production database.

Size on disk before: 43062 MB
Size on disk after: 43167 MB

Full migration: 304.7380s, full log below. No performance differences as far as I can tell.

== 20150106050733 SetMysqlToUnicodeMb4: migrating =============================
-- remove_index("aspect_visibilities", {:name=>"shareable_and_aspect_id"})
   -> 0.0574s
-- add_index("aspect_visibilities", ["shareable_id", "shareable_type", "aspect_id"], {:name=>"shareable_and_aspect_id", :length=>{"shareable_type"=>189}, :using=>:btree})
   -> 16.0878s
-- remove_index("aspect_visibilities", {:name=>"index_aspect_visibilities_on_shareable_id_and_shareable_type"})
   -> 0.0500s
-- add_index("aspect_visibilities", ["shareable_id", "shareable_type"], {:name=>"index_aspect_visibilities_on_shareable_id_and_shareable_type", :length=>{"shareable_type"=>190}, :using=>:btree})
   -> 9.1848s
-- remove_index("chat_contacts", {:name=>"index_chat_contacts_on_user_id_and_jid"})
   -> 0.1278s
-- add_index("chat_contacts", ["user_id", "jid"], {:name=>"index_chat_contacts_on_user_id_and_jid", :length=>{"jid"=>190}, :using=>:btree, :unique=>true})
   -> 0.1719s
-- remove_index("comments", {:name=>"index_comments_on_guid"})
   -> 0.2913s
-- add_index("comments", ["guid"], {:name=>"index_comments_on_guid", :length=>{"guid"=>191}, :using=>:btree, :unique=>true})
   -> 33.2449s
-- remove_index("likes", {:name=>"index_likes_on_guid"})
   -> 0.3831s
-- add_index("likes", ["guid"], {:name=>"index_likes_on_guid", :length=>{"guid"=>191}, :using=>:btree, :unique=>true})
   -> 70.5352s
-- remove_index("o_embed_caches", {:name=>"index_o_embed_caches_on_url"})
   -> 0.0831s
-- add_index("o_embed_caches", ["url"], {:name=>"index_o_embed_caches_on_url", :length=>{"url"=>191}, :using=>:btree})
   -> 2.0182s
-- remove_index("participations", {:name=>"index_participations_on_guid"})
   -> 0.1053s
-- add_index("participations", ["guid"], {:name=>"index_participations_on_guid", :length=>{"guid"=>191}, :using=>:btree})
   -> 96.0637s
-- remove_index("people", {:name=>"index_people_on_diaspora_handle"})
   -> 0.0838s
-- add_index("people", ["diaspora_handle"], {:name=>"index_people_on_diaspora_handle", :unique=>true, :length=>{"diaspora_handle"=>191}})
   -> 0.9094s
-- remove_index("people", {:name=>"index_people_on_guid"})
   -> 0.0497s
-- add_index("people", ["guid"], {:name=>"index_people_on_guid", :length=>{"guid"=>191}, :using=>:btree, :unique=>true})
   -> 0.3075s
-- remove_index("photos", {:name=>"index_photos_on_status_message_guid"})
   -> 0.0415s
-- add_index("photos", ["status_message_guid"], {:name=>"index_photos_on_status_message_guid", :length=>{"status_message_guid"=>191}, :using=>:btree})
   -> 2.8110s
-- remove_index("posts", {:name=>"index_posts_on_guid"})
   -> 0.0748s
-- add_index("posts", ["guid"], {:name=>"index_posts_on_guid", :length=>{"guid"=>191}, :using=>:btree, :unique=>true})
   -> 26.3498s
-- remove_index("posts", {:name=>"index_posts_on_status_message_guid_and_pending"})
   -> 0.1532s
-- add_index("posts", ["status_message_guid", "pending"], {:name=>"index_posts_on_status_message_guid_and_pending", :length=>{"status_message_guid"=>190}, :using=>:btree})
   -> 6.0173s
-- remove_index("posts", {:name=>"index_posts_on_status_message_guid"})
   -> 0.0775s
-- add_index("posts", ["status_message_guid"], {:name=>"index_posts_on_status_message_guid", :length=>{"status_message_guid"=>191}, :using=>:btree})
   -> 6.4922s
-- remove_index("rails_admin_histories", {:name=>"index_rails_admin_histories"})
   -> 0.0716s
-- add_index("rails_admin_histories", ["item", "table", "month", "year"], {:name=>"index_rails_admin_histories", :length=>{"table"=>188}, :using=>:btree})
   -> 0.1415s
-- remove_index("schema_migrations", {:name=>"unique_schema_migrations"})
   -> 0.7815s
-- add_index("schema_migrations", ["version"], {:name=>"unique_schema_migrations", :length=>{"version"=>191}, :using=>:btree})
   -> 0.2091s
-- remove_index("services", {:name=>"index_services_on_type_and_uid"})
   -> 0.0873s
-- add_index("services", ["type", "uid"], {:name=>"index_services_on_type_and_uid", :length=>{"type"=>64, "uid"=>127}, :using=>:btree})
   -> 0.2988s
-- remove_index("taggings", {:name=>"index_taggings_on_taggable_id_and_taggable_type_and_context"})
   -> 0.0946s
-- add_index("taggings", ["taggable_id", "taggable_type", "context"], {:name=>"index_taggings_on_taggable_id_and_taggable_type_and_context", :length=>{"taggable_type"=>95, "context"=>95}, :using=>:btree})
   -> 23.9894s
-- remove_index("tags", {:name=>"index_tags_on_name"})
   -> 0.0702s
-- add_index("tags", ["name"], {:name=>"index_tags_on_name", :length=>{"name"=>191}, :using=>:btree, :unique=>true})
   -> 1.5240s
-- remove_index("users", {:name=>"index_users_on_invitation_service_and_invitation_identifier"})
   -> 0.1581s
-- add_index("users", ["invitation_service", "invitation_identifier"], {:name=>"index_users_on_invitation_service_and_invitation_identifier", :length=>{"invitation_service"=>64, "invitation_identifier"=>127}, :using=>:btree, :unique=>true})
   -> 0.5600s
-- remove_index("users", {:name=>"index_users_on_username"})
   -> 0.1131s
-- add_index("users", ["username"], {:name=>"index_users_on_username", :length=>{"username"=>191}, :using=>:btree, :unique=>true})
   -> 0.2943s
-- remove_index("users", {:name=>"index_users_on_email"})
   -> 0.1425s
-- add_index("users", ["email"], {:name=>"index_users_on_email", :length=>{"email"=>191}, :using=>:btree})
   -> 0.1858s
-- remove_index("notifications", {:name=>"index_notifications_on_target_type_and_target_id"})
   -> 0.1081s
-- add_index("notifications", ["target_type", "target_id"], {:name=>"index_notifications_on_target_type_and_target_id", :length=>{"target_type"=>190}, :using=>:btree})
   -> 4.1315s
== 20150106050733 SetMysqlToUnicodeMb4: migrated (304.7380s) ==================

👍

@dimaursu dimaursu force-pushed the dimaursu:emoji branch from f93b745 to fa873d0 Jan 18, 2015

dimaursu added 3 commits Jan 19, 2015
Add warning about encoding change
Add warning about MySQL collation

Fix database index length

This allows new databases to be created with utf8mb4, on MySQL. The maximum
column size is 767 bytes. Each character is 4 bytes long -> 767 / 4 = 191
characters for the column.
Refactor & DRY encoding migration
Dynamic row for MySQL

Set larger column index sizes

@dimaursu dimaursu force-pushed the dimaursu:emoji branch from fa873d0 to c2eb709 Jan 19, 2015

@@ -155,7 +155,7 @@
t.string "service"
t.string "identifier"
t.boolean "admin", default: false
t.string "language", default: "en"
t.string "language"

This comment has been minimized.

Copy link
@jhass

jhass Jan 19, 2015

Member

This one is making the tests fail, which doesn't mean the other ones shouldn't be fixed too.

@dimaursu dimaursu force-pushed the dimaursu:emoji branch from c2eb709 to a85f296 Jan 20, 2015

Shorten indexes
Fix merge conflict

@dimaursu dimaursu force-pushed the dimaursu:emoji branch from a85f296 to 28fdba5 Jan 20, 2015

@jhass

This comment has been minimized.

Copy link
Member

commented Jan 21, 2015

Alright, let's risk this :P

Thank you!

jhass added a commit that referenced this pull request Jan 21, 2015
Merge pull request #5530 from dimaursu/emoji
Convert MySQL fields to 4 byte unicode

@jhass jhass merged commit a7d652c into diaspora:develop Jan 21, 2015

1 check passed

continuous-integration/travis-ci The Travis CI build passed
Details

@jhass jhass added this to the next-major milestone Jan 21, 2015

@Flaburgan

This comment has been minimized.

Copy link
Member

commented Jan 21, 2015

I run that on diaspora-fr and have http://paste.mrzyx.de/pxfy0zy2m

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.