Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MySQL connection not set to UTF-8? #5945

Closed
Manc opened this issue Oct 14, 2015 · 13 comments
Closed

MySQL connection not set to UTF-8? #5945

Manc opened this issue Oct 14, 2015 · 13 comments
Assignees
Labels
bug [triage] something behaving unexpectedly

Comments

@Manc
Copy link

Manc commented Oct 14, 2015

I believe, Ghost does not set the connection to utf8 when connecting to the MySQL server, which results in UTF-8 characters being displayed as "?" or similar characters, if the MySQL server itself is not configured to use UTF-8 as default. I am not 100% sure if this is an issue with Ghost or something else, but I've got the feeling that it is a Ghost problem.

I have converted my Ghost database to MySQL locally. Everything was fine there. Then I moved the database to a different server, where the characters where then interpreted wrongly.

I tripple-checked that the database and the tables used for Ghost were set to UTF-8, that the data in the tables were correctly encoded, that the output HTML page is served with UTF-8 encoding etc. Everything was as it should be.

The only difference between my local MySQL server and the production server was that in the server configuration itself the local server had all its defaults set to utf8 and the production server was set to Latin-1. Only when I changed the production MySQL server config and restarted it, the content was displayed correctly. I did not have to change anything else or re-import the tables.

Because the only way to fix this was to change the MySQL server config and any other MySQL client could read the UTF-8 encoded data correctly without the defaults being changed, I believe that Ghost doesn't set the connection to UTF-8, something like SET NAMES utf8.

My database object in the config.js file:

database: {
    client: 'mysql',
    connection: {
        host: '...',
        port: '...',
        user: '...',
        password: '...',
        database: '...',
        charset: 'utf8'
    }
}

I could solve this issue, so that's no personal problem for me, but I think it should always work for anybody, no matter what the MySQL default settings are. Some users may not have access to the config or must use a shared MySQL server where they can't change the defaults.

@ErisDS
Copy link
Member

ErisDS commented Oct 15, 2015

Since the first version of Ghost we have included charset: 'utf8' in the configuration. This is used by bookshelf/knex to handle encoding data sent to the database. However, unfortunately the version of the mysql binding library that was available at that time simply ignored this setting because it was lowercase and used whatever the database was set to.

This has put us in a position where, unless your database was set to utf8, all of your data will have been double encoded, as data is sent in utf8 but stored however your MySQL configuration says to store it.

This is only an issue for anyone using MySQL who does not have their DB set to utf8. We are aware of the problem, and at some point need to write a migration path so that we can upgrade the version of the mysql bindings to a version which does use the utf8 setting provided by Ghost. However, that's not a trivial task to fix something which doesn't cause problems with running Ghost itself.

In this case, you're hitting the problem because you're trying to migrate data between two databases with different settings. Short term you have two options: 1) make sure the databases have the same configuration or 2) migrate using Ghost's import/export tools instead of moving the DB

This issue is related to #5856 - the main difficulty this causes is that non-ascii chars in the database will appear unreadable if navigating the DB directly, because of the double encoding.

@szelpe
Copy link
Contributor

szelpe commented Feb 15, 2016

Wow, this was really hard to solve! I think this should be included in the installation manual somewhere.

Anyone else encountering this problem, this is how I was able to solve it, following the steps described here: http://www.whitesmith.co/blog/latin1-to-utf8/

Basically the steps are:

  • stop your ghost blog
  • use mysqldump with --default-character-set=latin1 to create a backup.sql
  • open backup.sql with a text editor and replace every occurrence of latin1 with utf8 (there will be two: after SET NAMES and after creating the database)
  • edit you MySQL configuration (e.g. my.cnf) and add set it to use utf8 by default:
[mysqld]
character-set-server    = utf8
collation-server        = utf8_unicode_ci
  • drop your ghost database
  • import the fixed backup.sql
  • set up user privileges
  • start your ghost blog

Everything should be fine now.

@ErisDS
Copy link
Member

ErisDS commented Jun 3, 2016

I was just re-reading this and figured I should point out, the steps above only work if your database was set to latin1 - there are other possibilities so you should check what your DB was set to before running anything like this as you could end up triple-encoding 🤐

@Kikobeats
Copy link

I have the issue after migrate to Ghost 0.9. Now my latin characters looks like ?.

It's so strange because with the same database data and configuration, in local works fine.

Notes also that the database data is correct, just is a problem retrieving from the client.

@kirrg001
Copy link
Contributor

kirrg001 commented Aug 3, 2016

Hey @Kikobeats Sorry to hear you are having trouble.
Did you try #5945 (comment)?

@Kikobeats
Copy link

@kirrg001 yes, but in my case I did:

  • update mysql dep to lastest version.
  • npm shrinkwrap.

And now works fine!

@riveraja
Copy link

riveraja commented Dec 1, 2016

Still got bit by this bug, the correct way (afaik) to declare utf8 is:

charset: 'UTF8_GENERAL_CI'

In part this bug is related to knex/knex#168 (comment)

It would be good if someone could edit the documentation in http://support.ghost.org/config/ if it's official.

@ErisDS
Copy link
Member

ErisDS commented Jan 5, 2017

@riveraja this bug is still open because the problem still exists. All Ghost blogs running on MySQL currently have double encoding. This is annoying, yes, but doesn't actually break anything.

The issue will be resolved in Ghost 1.0 and as everyone will have to go through a migration process, we'll have the perfect opportunity to resolve the issue for every blog running on MySQL, which we've never had previously.

@kirrg001 kirrg001 self-assigned this Jan 18, 2017
@kirrg001
Copy link
Contributor

kirrg001 commented Jan 19, 2017

Encoding can be a pain, same as timezones. As @ErisDS pointed out, this is only interesting for people who didn't configure their database to utf8.

So MySQL has fixed their charset issue in 2.3.0. In any previous version the utf8 charset option was not accepted and that has caused the connection to be set what your database is configured to. See https://github.com/mysqljs/mysql/blob/master/Changes.md#v230-2014-05-16

Actually if your database is configured to e.g latin1, it's not possible to store russian characters, as this is not supported. But with Ghost LTS you can insert any utf8 character and it's displayed correctly. I assume this is why the tables are created correctly, because the utf8 charset option was not ignored in Knex (see https://github.com/tgriesser/knex/blob/master/src/dialects/mysql/schema/tablecompiler.js#L38). You can see that by looking at the collation and charset option in the table schema.

If your database is configured to e.g latin1 and you change your database configuration to utf8, you won't solve the problem, because you would see that everything is double encoded. Updating then to the latest MySQL version, won't help either.

So here is my suggestion
I only found one true solution to this problem. Dumping your database with --skip-set-charset option, reimport with utf8 encoding and then you are able to use the new MySQL version.
http://dev.mysql.com/doc/refman/5.7/en/mysqldump.html#option_mysqldump_set-charset

Ghost 1.0.0 already uses the MySQL version >= 2.3.0 and as described, LTS users, which are affected, cannot simply update to the newer MySQL version. So the solution i am suggesting is, to not update the MySQL version for LTS users, but care about this problem and make it part of the migration path to 1.0.0.

mysqldump -u root -p --skip-set-charset DATABASE_NAME > ghost-dump.sql
mysql -u root -p --default-character-set=utf8 DATABASE_NAME < ghost-dump.sql

No matter if your database is configured to utf8 or latin1 or something else, you will get a dump of your database in the configured encoding and then import as utf8.

Ghost 1.0.0 uses utf8mb4 encoding, but that shouldn't make any difference.

@ErisDS
Copy link
Member

ErisDS commented Jan 20, 2017

This is great thanks 👍

When considering the migration path from LTS to 1.0, if we use an import -> export mechanism, this won't be a problem.

However, if we create a direct migration script we will need to include a check to see if a user was using MySQL and their DB was not utf8 and in that case ensure we provide a tool or step to handle this, recreating the MySQL database with the correct setting.

Anyone who starts on 1.0 won't encounter this.

The only TODOs here for Ghost 1.0 / Ghost-CLI is to:

  • LTS dump and reimport with utf8mb4
  • Triple check the default configuration for Ghost is w/ MySQL and sets the db to utf8mb4 correctly (I think there are 2 settings, one for encoding, one for the connection?!)
  • Determine, is there some sort of check we should build into Ghost-CLI to ensure the DB is configured correctly?
  • Upgrade the MySQL dependency (+ delete ignore GK PR's)

@kirrg001
Copy link
Contributor

However, if we create a direct migration script we will need to include a check to see if a user was using MySQL and their DB was not utf8 and in that case ensure we provide a tool or step to handle this, recreating the MySQL database with the correct setting.

The commands (mysqldump and reimport) should work for every user with MySQL. So there is the option to make this part of a migration script, if we decide to go with a script.

I will think and care about the TODO's you have listed 👍

@kirrg001
Copy link
Contributor

kirrg001 commented Jan 22, 2017

Determine, is there some sort of check we should build into Ghost-CLI to ensure the DB is configured correctly?

That is not needed in my opinion. It doesn't matter what your MySQL database is configured to. As long as we

  1. set the encoding for the client connection correctly (done by using >= 2.3.0 MySQL)
  2. create the tables with the correct encoding (was always the case)

@kirrg001
Copy link
Contributor

Closing. We have opened a migration path issue, which keeps track of the notes here, see #8141.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug [triage] something behaving unexpectedly
Projects
None yet
Development

No branches or pull requests

6 participants