Conditionaly upgrade utf8 to utf8mb4 for MySQL 5.5.3 #317

Closed
wants to merge 6 commits into
from

Conversation

Projects
None yet
5 participants
@nicolas-grekas
Contributor

nicolas-grekas commented May 15, 2013

See http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html

As utf8mb4 is a superset of utf8, this should be transparent and backward compatible.
For those really requiring the "utf8" meant by MySQL, they can use explicitely the utf8mb3 charset.
But IMHO by default, Doctrine should really use utf8mb4, which is what everybody expect from a charset named "utf8".

Conditionaly upgrade utf8 to utf8mb4 for MySQL 5.5.3
See http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html

As utf8mb4 is a superset of utf8, this should be transparent and backward compatible.
For those really requiring the "utf8" meant by MySQL, they can use explicitely the utf8mb3 charset.
But IMHO by default, Doctrine should really use utf8mb4, which is what everybody expect from a charset named "utf8".
@nicolas-grekas

This comment has been minimized.

Show comment
Hide comment
Contributor

nicolas-grekas commented May 15, 2013

@Ocramius

View changes

lib/Doctrine/DBAL/Event/Listeners/MysqlSessionInit.php
@@ -53,8 +53,8 @@ class MysqlSessionInit implements EventSubscriber
*/
public function __construct($charset = 'utf8', $collation = false)
{
- $this->_charset = $charset;
- $this->_collation = $collation;
+ $this->_charset = strtolower($charset);

This comment has been minimized.

@Ocramius

Ocramius May 15, 2013

Member

Align = signs

@Ocramius

Ocramius May 15, 2013

Member

Align = signs

@Ocramius

View changes

lib/Doctrine/DBAL/Event/Listeners/MysqlSessionInit.php
@@ -63,8 +63,18 @@ public function __construct($charset = 'utf8', $collation = false)
*/
public function postConnect(ConnectionEventArgs $args)
{
- $collation = ($this->_collation) ? " COLLATE ".$this->_collation : "";
- $args->getConnection()->executeUpdate("SET NAMES ".$this->_charset . $collation);
+ $collation = ($this->_collation) ? " COLLATE ".$this->_collation : " ";

This comment has been minimized.

@Ocramius

Ocramius May 15, 2013

Member

No need for the first parenthetical

@Ocramius

Ocramius May 15, 2013

Member

No need for the first parenthetical

This comment has been minimized.

@Ocramius

Ocramius May 15, 2013

Member

CS: ' COLLATE ' . $this->_collation

@Ocramius

Ocramius May 15, 2013

Member

CS: ' COLLATE ' . $this->_collation

@Ocramius

View changes

lib/Doctrine/DBAL/Event/Listeners/MysqlSessionInit.php
- $collation = ($this->_collation) ? " COLLATE ".$this->_collation : "";
- $args->getConnection()->executeUpdate("SET NAMES ".$this->_charset . $collation);
+ $collation = ($this->_collation) ? " COLLATE ".$this->_collation : " ";
+ $sql = "SET NAMES ".$this->_charset . $collation;

This comment has been minimized.

@Ocramius

Ocramius May 15, 2013

Member

CS: $sql = 'SET NAMES ' . $this->_charset . $collation;

@Ocramius

Ocramius May 15, 2013

Member

CS: $sql = 'SET NAMES ' . $this->_charset . $collation;

@Ocramius

View changes

lib/Doctrine/DBAL/Event/Listeners/MysqlSessionInit.php
+ $mb4 = str_replace('utf8 ', 'utf8mb4 ', $sql);
+ $mb4 = str_replace('utf8_', 'utf8mb4_', $mb4);
+
+ if ($mb4 !== $sql)

This comment has been minimized.

@Ocramius

Ocramius May 15, 2013

Member

CS: if ($mb4 !== $sql) {

@Ocramius

Ocramius May 15, 2013

Member

CS: if ($mb4 !== $sql) {

@Ocramius

View changes

lib/Doctrine/DBAL/Event/Listeners/MysqlSessionInit.php
+ $sql = 'SET NAMES ' . $this->_charset . $collation;
+
+ $mb4 = str_replace('utf8 ', 'utf8mb4 ', $sql);
+ $mb4 = str_replace('utf8_', 'utf8mb4_', $mb4);

This comment has been minimized.

@Ocramius

Ocramius May 15, 2013

Member

Why are you replacing the same stuff in the same string with two method calls instead of using str_replace's ability to use replacements arrays?

@Ocramius

Ocramius May 15, 2013

Member

Why are you replacing the same stuff in the same string with two method calls instead of using str_replace's ability to use replacements arrays?

This comment has been minimized.

@nicolas-grekas

nicolas-grekas May 15, 2013

Contributor

No reason, just a matter of preference. I can change that if required.

@nicolas-grekas

nicolas-grekas May 15, 2013

Contributor

No reason, just a matter of preference. I can change that if required.

This comment has been minimized.

@Ocramius

Ocramius May 15, 2013

Member

Do it please, it wasn't obvious when first reading it.

@Ocramius

Ocramius May 15, 2013

Member

Do it please, it wasn't obvious when first reading it.

@beberlei

This comment has been minimized.

Show comment
Hide comment
@beberlei

beberlei May 15, 2013

Member

I don't think this should happen magically. Developers should do this explicitly themselves

Member

beberlei commented May 15, 2013

I don't think this should happen magically. Developers should do this explicitly themselves

@nicolas-grekas

This comment has been minimized.

Show comment
Hide comment
@nicolas-grekas

nicolas-grekas May 15, 2013

Contributor

Well, what I personally think is that Doctrine should do something about utf8mb4. What exactly is the purpose of this pull request.
My HO is that when people write (or stick to the default) "utf8", they really mean UTF-8 from Unicode.
Also, nobody expect to loose data, even high plane Unicode characters. Think http://www.fileformat.info/info/unicode/char/1f4a9/index.htm for example ;-)
Then, when people get educated that utf8===utf8mb3 and Unicode-UTF-8===utf8mb4, they can choose.
My patch is coded with this "learned path" in mind.

At least the default for Doctrine should be safe for any Unicode-UTF-8 string.
The only pb is that utf8mb4 exists since MySQL 5.5.3, and Doctrine has a lower requirement for MySQL server version.
So to deal with that, we could either use PDO::getAttribute(PDO::ATTR_SERVER_VERSION) (would be required to upgrade MySqlPlatform.php), or these conditional comment tricks that the MySQL parser allows.

Contributor

nicolas-grekas commented May 15, 2013

Well, what I personally think is that Doctrine should do something about utf8mb4. What exactly is the purpose of this pull request.
My HO is that when people write (or stick to the default) "utf8", they really mean UTF-8 from Unicode.
Also, nobody expect to loose data, even high plane Unicode characters. Think http://www.fileformat.info/info/unicode/char/1f4a9/index.htm for example ;-)
Then, when people get educated that utf8===utf8mb3 and Unicode-UTF-8===utf8mb4, they can choose.
My patch is coded with this "learned path" in mind.

At least the default for Doctrine should be safe for any Unicode-UTF-8 string.
The only pb is that utf8mb4 exists since MySQL 5.5.3, and Doctrine has a lower requirement for MySQL server version.
So to deal with that, we could either use PDO::getAttribute(PDO::ATTR_SERVER_VERSION) (would be required to upgrade MySqlPlatform.php), or these conditional comment tricks that the MySQL parser allows.

@nicolas-grekas

This comment has been minimized.

Show comment
Hide comment
@nicolas-grekas

nicolas-grekas May 22, 2013

Contributor

So, just to be consistent, I updated my patch so that both mysqli and pdomysql drivers also upgrade to utf8mb4 when possible.

Contributor

nicolas-grekas commented May 22, 2013

So, just to be consistent, I updated my patch so that both mysqli and pdomysql drivers also upgrade to utf8mb4 when possible.

@beberlei

This comment has been minimized.

Show comment
Hide comment
@beberlei

beberlei May 26, 2013

Member

Sorry, but I think this is too dangerous. This is something we need to keep developers deciding on.

Member

beberlei commented May 26, 2013

Sorry, but I think this is too dangerous. This is something we need to keep developers deciding on.

@beberlei beberlei closed this May 26, 2013

@gagarine

This comment has been minimized.

Show comment
Hide comment
@gagarine

gagarine Jul 18, 2013

But how can you force utf8mb4 when you create a schema?

But how can you force utf8mb4 when you create a schema?

@beberlei

This comment has been minimized.

Show comment
Hide comment
@beberlei

beberlei Jul 18, 2013

Member

@gagarine Doctrine does not create a schema, you have to do this yourself anyways. At that point you can do it.

Member

beberlei commented Jul 18, 2013

@gagarine Doctrine does not create a schema, you have to do this yourself anyways. At that point you can do it.

@gagarine

This comment has been minimized.

Show comment
Hide comment
@gagarine

gagarine Jul 19, 2013

@beberlei I used \Doctrine\DBAL\Schema\Schema();
$schema->toSql()
$app['db']->exec();

The sql provided by "toSql' was using utf8 encoding and I don't see anyway to force utf8mb4 even if I create the Database by hand using utf8mb4.

I even created the table by hand. But after inserting was not working neither. Look like the connexion encoding is not right. How can I test it?

I'm very new to doctrine and perhaps is not the place to get support... any-pointer would be welcome.

@beberlei I used \Doctrine\DBAL\Schema\Schema();
$schema->toSql()
$app['db']->exec();

The sql provided by "toSql' was using utf8 encoding and I don't see anyway to force utf8mb4 even if I create the Database by hand using utf8mb4.

I even created the table by hand. But after inserting was not working neither. Look like the connexion encoding is not right. How can I test it?

I'm very new to doctrine and perhaps is not the place to get support... any-pointer would be welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment