UTF8 Encoded SQL Scripts in initdb #131
Comments
I'm having a similar issue. I can set the variables by hand:
set character_set_client='utf8';
set character_set_connection='utf8';
set character_set_database='utf8';
set character_set_results='utf8';
set character_set_server='utf8';
show variables like 'char%';
Based on some reading, I think the solution is right here: the container's operating-system locale is not set, or is not set to UTF-8. This line sets it. If that works and you're willing to merge it, I can try it out and submit a pull request. |
Any updates on this? I would prefer not to have to build my MariaDB container locally with a patched Dockerfile. |
I'd recommend adding See also #32 (comment) and docker-library/docs#502. |
This does not solve the problem. Using the two flags, I get:
-- init.sql
CREATE TABLE `pages` (
`title` varchar(75) NOT NULL,
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `pages` (`title`) VALUES
('Så fungerar det');
> select title from pages limit 1;
+--------------------+
| title |
+--------------------+
| SÃ¥ fungerar det |
+--------------------+
1 row in set (0.00 sec)
I believe the problem is not the default database character-set setting but, as mentioned above, the initial import: the SQL statements are fed from the shell into the database with the wrong character encoding. As you can see, I already set the utf8 character set explicitly on the table. |
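The garbled `SÃ¥` is the classic signature of UTF-8 bytes being read as Latin-1: `å` (U+00E5) is encoded in UTF-8 as the two bytes `0xC3 0xA5`, and in Latin-1 those two bytes are `Ã` and `¥`. A minimal Python sketch of that round trip; `mojibake` is a hypothetical helper that illustrates the symptom, not what the entrypoint script actually runs:

```python
def mojibake(text: str, wrong_charset: str = "latin-1") -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong charset.

    This mimics UTF-8 input being interpreted as latin1 on the way in.
    """
    return text.encode("utf-8").decode(wrong_charset)


original = "Så fungerar det"
garbled = mojibake(original)
print(garbled)                              # SÃ¥ fungerar det
print(original.encode("utf-8")[1:3].hex())  # c3a5, the UTF-8 bytes of 'å'

# Reversing the wrong step recovers the text: the bytes themselves were intact.
print(garbled.encode("latin-1").decode("utf-8") == original)  # True
```

ASCII characters pass through unchanged, which matches the output above: only `å` came out wrong.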
@djui - nice simple test, though you have an extra comma after On Ubuntu 16.04, running his test, I get:
select title from pages;
+----------------+
| title |
+----------------+
| S fungerar det |
+----------------+
The issue may be with the terminal in the Docker container and not in MySQL directly, because I don't seem to have problems if I |
Correct: the issue is not MySQL itself, but the moment the terminal (shell) imports the SQL statements. One simple proof (here for MariaDB, but the same applies to MySQL):
# docker-compose.yml
:
  db:
    #image: mariadb:latest
    build:
      context: .
      dockerfile: Dockerfile.mariadb
    ports:
      - '3306:3306'
    environment:
      - MYSQL_ROOT_PASSWORD=secret
      - MYSQL_DATABASE=foo
      - MYSQL_USER=bar
      - MYSQL_PASSWORD=baz
    volumes:
      - data:/var/lib/mysql
      - ./init.sql.gz:/docker-entrypoint-initdb.d/init.sql.gz
:
# Dockerfile.mariadb
FROM mariadb:latest
# make the "en_US.UTF-8" locale so ~postgres~ mysql/mariadb will be utf-8 enabled by default
RUN apt-get update && apt-get install -y locales && rm -rf /var/lib/apt/lists/* \
    && localedef -i en_US -c -f UTF-8 -A /usr/share/locale/locale.alias en_US.UTF-8
ENV LANG=en_US.utf8
-- init.sql
CREATE TABLE `pages` (
`title` varchar(75) NOT NULL,
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `pages` (`title`) VALUES
('Så fungerar det');
MariaDB [foo]> select title from pages limit 1;
+------------------+
| title |
+------------------+
| Så fungerar det |
+------------------+
1 row in set (0.00 sec) |
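The Dockerfile above sets `LANG` to `en_US.utf8`, while later compose files in this thread use `C.UTF-8`. The codeset is spelled differently, but as far as I know glibc normalizes codeset names (lowercased, punctuation dropped), so both mean UTF-8. A small Python sketch with hypothetical helper names for checking a `LANG` value; it only inspects the name and does not verify that the locale is actually installed in the image, which is what the `localedef` line is for:

```python
from typing import Optional


def lang_codeset(lang: str) -> Optional[str]:
    """Extract the codeset from a POSIX locale name.

    Locale names look like language[_territory][.codeset][@modifier],
    e.g. "en_US.utf8" or "C.UTF-8". Returns None when no codeset is given.
    """
    base = lang.split("@", 1)[0]  # drop any @modifier
    if "." not in base:
        return None
    return base.split(".", 1)[1]


def is_utf8_lang(lang: str) -> bool:
    """True if the locale name's codeset is some spelling of UTF-8."""
    codeset = lang_codeset(lang)
    if codeset is None:
        return False
    # Normalize roughly the way glibc does: lowercase, keep only letters and digits.
    normalized = "".join(ch for ch in codeset.lower() if ch.isalnum())
    return normalized == "utf8"


for value in ["en_US.utf8", "C.UTF-8", "C", "en_US.ISO-8859-1"]:
    print(value, is_utf8_lang(value))
```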
Can you try again adding |
I tried |
@tianon |
Here comes the no-brainer one-liner fix for dummies like me. Place it in your Dockerfile. |
Closing, since this is solved! 👍 |
Can't we have this mostly harmless setting built into the official image? I don't believe it will impact existing scripts since UTF-8 is ASCII compatible. |
I will open a pull request during the New Year's holiday if nobody beats me to it. I still have to build and test it. |
I guess it depends on what the impact of setting If |
Alternatively, make the change I noted at the top, which has no potential side effects beyond the mysql command. Hmm. |
As I understand it, setting LANG to C.UTF-8 makes the mysql client use UTF-8 by default. But I think a better option would be to set this directly in the MySQL config file, alongside the server options, like this:
; /etc/mysql/conf.d/utf8.cnf
[mysqld]
init_connect='SET collation_connection = utf8_unicode_ci'
character-set-server = utf8
collation-server = utf8_unicode_ci
[client]
default-character-set = utf8
|
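If you go the option-file route, it can help to check that the server and client sections agree. A hedged Python sketch (the helper name is hypothetical) that reads such a file with the standard library's `configparser`, which accepts `;` comment lines and `key = value` pairs like those in the file above:

```python
import configparser

# The option file from the comment above, with straight quotes in init_connect.
UTF8_CNF = """\
; /etc/mysql/conf.d/utf8.cnf
[mysqld]
init_connect='SET collation_connection = utf8_unicode_ci'
character-set-server = utf8
collation-server = utf8_unicode_ci
[client]
default-character-set = utf8
"""


def configured_charsets(cnf_text: str) -> dict:
    """Read the server and client charsets from a simple MySQL option file.

    Assumes a plain INI-style file; !include / !includedir directives are
    not understood by configparser and would make this fail.
    """
    parser = configparser.ConfigParser(interpolation=None)  # values may contain '%'
    parser.read_string(cnf_text)
    return {
        "server": parser.get("mysqld", "character-set-server", fallback=None),
        "client": parser.get("client", "default-character-set", fallback=None),
    }


print(configured_charsets(UTF8_CNF))
```

A missing section or key comes back as `None`, so a mismatch between server and client is easy to spot.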
Just in case anyone is interested in writing a proper docker-compose.yml that supports Unicode in a MySQL-compatible database, they can use something like:
services:
  db:
    image: mariadb:10.4.8
    command: --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci
    environment:
      LANG: C.UTF-8
      MYSQL_ROOT_PASSWORD: root
    volumes:
      - ./db:/docker-entrypoint-initdb.d:ro |
Setting The easiest way I have found to support non-BMP characters in the SQL files in /docker-entrypoint-initdb.d/ is to prefix the files with |
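On the non-BMP point: MySQL's `utf8` character set is an alias for the 3-byte `utf8mb3`, so characters above U+FFFF (most emoji, for example), which need four bytes in UTF-8, can only be stored with `utf8mb4`. That is presumably why several compose files in this thread pass `--character-set-server=utf8mb4`. A quick Python check; `needs_utf8mb4` is a hypothetical helper name:

```python
def needs_utf8mb4(text: str) -> bool:
    """True if any character lies outside the Basic Multilingual Plane.

    Code points above U+FFFF take four bytes in UTF-8, which MySQL's
    3-byte utf8 (utf8mb3) cannot store; utf8mb4 can.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


for sample in ["Så fungerar det", "Привет", "😀"]:
    max_bytes = max(len(ch.encode("utf-8")) for ch in sample)
    print(sample, max_bytes, needs_utf8mb4(sample))
```

Swedish and Cyrillic text stays within two bytes per character, so plain `utf8` is enough for it; the encoding problems earlier in the thread are about the import connection, not about `utf8` versus `utf8mb4`.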
For anyone lost in this thread whose Cyrillic or other special characters are turned into nonsense characters at init when using docker-compose: the only thing you need to fix it is: This fixed my problem with init.sql in /docker-entrypoint-initdb.d containing Cyrillic text. |
I followed your discussion, but mysql prints this:
mysql> show variables like 'char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.01 sec)
It doesn't work for the database and server character sets, so I added the following setting to docker-compose.yml:
command: --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci
The whole YAML file is:
version: "3"
services:
  mysql:
    container_name: mysql
    image: mysql:5.7
    environment:
      MYSQL_ROOT_PASSWORD: root
      LANG: C.UTF-8
    volumes:
      - ./mysql:/var/lib/mysql
    ports:
      - "3306:3306"
    command: --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci
Here is the final output:
mysql> show variables like 'char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.01 sec) |
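To spot leftover `latin1` entries in output like the one above without eyeballing it, here is a hedged Python sketch (helper names hypothetical) that parses the mysql client's ASCII table and lists the `character_set_*` variables that are not a UTF-8 charset. It treats `binary` (the usual value of `character_set_filesystem`) as neutral and assumes values contain no `|` characters:

```python
UTF8_CHARSETS = {"utf8", "utf8mb3", "utf8mb4"}


def parse_variables_table(output: str) -> dict:
    """Parse `SHOW VARIABLES` output as printed by the mysql client into a dict."""
    variables = {}
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +---+ borders, the prompt line and the "rows in set" footer
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 2 or cells[0] == "Variable_name":
            continue  # skip the header row and anything unexpected
        variables[cells[0]] = cells[1]
    return variables


def non_utf8_charsets(variables: dict) -> dict:
    """Return the character_set_* variables that are not set to a UTF-8 charset."""
    return {
        name: value
        for name, value in variables.items()
        if name.startswith("character_set_")
        and value not in UTF8_CHARSETS
        and value != "binary"  # treat binary as neutral rather than wrong
    }


# The "before" output from the comment above, trimmed to the relevant rows.
BEFORE = """
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| character_set_client     | utf8   |
| character_set_database   | latin1 |
| character_set_filesystem | binary |
| character_set_server     | latin1 |
+--------------------------+--------+
"""

print(non_utf8_charsets(parse_variables_table(BEFORE)))
```

Run against the "final output" table, the same check should return an empty dict.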
In case you also want to auto-import a .sql file (for example From the docs:
If the file is large, with very long values, the import may fail with a 'MySQL server has gone away' error. To avoid this, just set a bigger
version: "3"
services:
  mysql:
    container_name: mysql
    image: mysql:5.7
    environment:
      MYSQL_ROOT_PASSWORD: root
      LANG: C.UTF-8
    volumes:
      - ./mysql:/var/lib/mysql
      - ./db-import:/docker-entrypoint-initdb.d
    ports:
      - "3306:3306"
    command: [ "--character-set-server=utf8mb4", "--collation-server=utf8mb4_unicode_ci", "--max-allowed-packet=67108864" ] |
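The value 67108864 is 64 MiB (64 × 1024 × 1024 bytes). If you would rather derive it than hard-code it, here is a tiny Python sketch (the helper name is hypothetical) that rebuilds the `command:` list above from a size in MiB. mysqld should also accept a suffixed value such as `64M` for this option.

```python
def mysqld_command(max_packet_mib: int = 64) -> list:
    """Rebuild the compose `command:` list with max-allowed-packet given in MiB."""
    max_packet_bytes = max_packet_mib * 1024 * 1024  # 64 MiB -> 67108864 bytes
    return [
        "--character-set-server=utf8mb4",
        "--collation-server=utf8mb4_unicode_ci",
        f"--max-allowed-packet={max_packet_bytes}",
    ]


print(mysqld_command())
```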
Solved my problem, thank you. |
This |
Great! On my side, same result (of course) with |
I recently completed a project that populated the database with UTF-8 encoded text. My script was able to create the database with the correct character set, but the text data could not be imported properly. Apparently the initdb scripts are assumed to be ASCII-encoded, so I had to make a minor modification to docker-entrypoint.sh:
mysql=( mysql --protocol=socket --default_character_set=utf8 -uroot )
This change was not a big deal for me, but I am raising the question of whether this would be a good default. I can't think of any way it would break compatibility with existing scripts, and the new default would provide a real benefit to anyone pre-populating the database with UTF-8 encoded data.
Of course, an even better solution may be to use a variable.
Please note that this is different from the discussion of the initial database's character set (#14). This is a discussion about the encoding of the docker-entrypoint-initdb.d script files.
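Only scripts that actually contain non-ASCII bytes are affected by the client character set the entrypoint uses, since ASCII bytes mean the same thing in latin1 and UTF-8. To find which files in an initdb directory fall into that category, here is a hedged Python sketch (helper names hypothetical; it only looks at `.sql` and `.sql.gz`, the two kinds used in this thread):

```python
import gzip
from pathlib import Path


def classify_bytes(data: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'not-utf-8' for the raw bytes of a script."""
    try:
        data.decode("ascii")
        return "ascii"  # unaffected by the client character set
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"  # non-ASCII text that needs a UTF-8 client connection
    except UnicodeDecodeError:
        return "not-utf-8"  # some other encoding, e.g. a latin1 dump


def scan_initdb(directory: str = "/docker-entrypoint-initdb.d") -> dict:
    """Classify every .sql and .sql.gz file in an initdb directory."""
    results = {}
    for path in sorted(Path(directory).iterdir()):
        if path.name.endswith(".sql"):
            data = path.read_bytes()
        elif path.name.endswith(".sql.gz"):
            with gzip.open(path, "rb") as fh:
                data = fh.read()
        else:
            continue
        results[path.name] = classify_bytes(data)
    return results
```

For example, `scan_initdb("./db")` on the host directory you mount into the container would flag the files that rely on the client connection being UTF-8.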