Conditionally upgrade utf8 to utf8mb4 for MySQL 5.5.3 #317

Closed
wants to merge 6 commits into from

5 participants

@nicolas-grekas

See http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html

As utf8mb4 is a superset of utf8, this should be transparent and backward compatible.
Those who really require "utf8" as MySQL means it can explicitly use the utf8mb3 charset.
But IMHO, by default Doctrine should really use utf8mb4, which is what everybody expects from a charset named "utf8".
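
To make the difference concrete: MySQL's "utf8" (utf8mb3) stores at most 3 bytes per character, so it covers only the Basic Multilingual Plane, while code points above U+FFFF need 4 bytes in UTF-8. The sketch below is illustrative only; `needsUtf8mb4` is a hypothetical helper, not part of Doctrine or of this patch, and it assumes PHP 7 or later.

```php
<?php
// Hypothetical helper (not part of Doctrine): reports whether a UTF-8 string
// contains code points above U+FFFF. Those take 4 bytes in UTF-8 and so do
// not fit in MySQL's 3-byte "utf8" (utf8mb3) charset.
function needsUtf8mb4(string $text): bool
{
    // The /u modifier makes PCRE treat both pattern and subject as UTF-8.
    // preg_match() returns false on invalid UTF-8, which yields false here.
    return 1 === preg_match('/[\x{10000}-\x{10FFFF}]/u', $text);
}

$bmpOnly    = "caf\xC3\xA9";           // "é" is U+00E9, encoded in 2 bytes
$withAstral = "ok \xF0\x9F\x92\xA9";   // U+1F4A9, encoded in 4 bytes

var_dump(needsUtf8mb4($bmpOnly));      // BMP-only text
var_dump(needsUtf8mb4($withAstral));   // text with a supplementary-plane character
```

A string for which this returns true cannot be stored intact in a utf8 (utf8mb3) column or sent intact over a utf8 connection.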

@nicolas-grekas nicolas-grekas Conditionally upgrade utf8 to utf8mb4 for MySQL 5.5.3
abbcbdc
@Ocramius Ocramius commented on an outdated diff
lib/Doctrine/DBAL/Event/Listeners/MysqlSessionInit.php
@@ -53,8 +53,8 @@ class MysqlSessionInit implements EventSubscriber
*/
public function __construct($charset = 'utf8', $collation = false)
{
- $this->_charset = $charset;
- $this->_collation = $collation;
+ $this->_charset = strtolower($charset);
@Ocramius Doctrine member

Align = signs

@Ocramius Ocramius commented on an outdated diff
lib/Doctrine/DBAL/Event/Listeners/MysqlSessionInit.php
@@ -63,8 +63,18 @@ public function __construct($charset = 'utf8', $collation = false)
*/
public function postConnect(ConnectionEventArgs $args)
{
- $collation = ($this->_collation) ? " COLLATE ".$this->_collation : "";
- $args->getConnection()->executeUpdate("SET NAMES ".$this->_charset . $collation);
+ $collation = ($this->_collation) ? " COLLATE ".$this->_collation : " ";
@Ocramius Doctrine member

No need for the first parenthetical

@Ocramius Doctrine member

CS: ' COLLATE ' . $this->_collation

@Ocramius Ocramius commented on an outdated diff
lib/Doctrine/DBAL/Event/Listeners/MysqlSessionInit.php
@@ -63,8 +63,18 @@ public function __construct($charset = 'utf8', $collation = false)
*/
public function postConnect(ConnectionEventArgs $args)
{
- $collation = ($this->_collation) ? " COLLATE ".$this->_collation : "";
- $args->getConnection()->executeUpdate("SET NAMES ".$this->_charset . $collation);
+ $collation = ($this->_collation) ? " COLLATE ".$this->_collation : " ";
+ $sql = "SET NAMES ".$this->_charset . $collation;
@Ocramius Doctrine member

CS: $sql = 'SET NAMES ' . $this->_charset . $collation;

@Ocramius Ocramius commented on an outdated diff
lib/Doctrine/DBAL/Event/Listeners/MysqlSessionInit.php
@@ -63,8 +63,18 @@ public function __construct($charset = 'utf8', $collation = false)
*/
public function postConnect(ConnectionEventArgs $args)
{
- $collation = ($this->_collation) ? " COLLATE ".$this->_collation : "";
- $args->getConnection()->executeUpdate("SET NAMES ".$this->_charset . $collation);
+ $collation = $this->_collation ? ' COLLATE ' . $this->_collation : ' ';
+ $sql = 'SET NAMES ' . $this->_charset . $collation;
+
+ $mb4 = str_replace('utf8 ', 'utf8mb4 ', $sql);
+ $mb4 = str_replace('utf8_', 'utf8mb4_', $mb4);
+
+ if ($mb4 !== $sql)
@Ocramius Doctrine member

CS: if ($mb4 !== $sql) {

@Ocramius Ocramius commented on an outdated diff
lib/Doctrine/DBAL/Event/Listeners/MysqlSessionInit.php
@@ -63,8 +63,18 @@ public function __construct($charset = 'utf8', $collation = false)
*/
public function postConnect(ConnectionEventArgs $args)
{
- $collation = ($this->_collation) ? " COLLATE ".$this->_collation : "";
- $args->getConnection()->executeUpdate("SET NAMES ".$this->_charset . $collation);
+ $collation = $this->_collation ? ' COLLATE ' . $this->_collation : ' ';
+ $sql = 'SET NAMES ' . $this->_charset . $collation;
+
+ $mb4 = str_replace('utf8 ', 'utf8mb4 ', $sql);
+ $mb4 = str_replace('utf8_', 'utf8mb4_', $mb4);
@Ocramius Doctrine member

Why are you running two str_replace calls on the same string instead of passing arrays of search and replacement values to a single call?

No reason, just a matter of preference. I can change that if required.

@Ocramius Doctrine member

Do it please, it wasn't obvious when first reading it.

@beberlei
Doctrine member

I don't think this should happen magically. Developers should do this explicitly themselves.

@nicolas-grekas

Well, what I personally think is that Doctrine should do something about utf8mb4. Deciding what exactly is the purpose of this pull request.
My humble opinion is that when people write (or stick to the default) "utf8", they really mean UTF-8 as defined by Unicode.
Also, nobody expects to lose data, even for high-plane Unicode characters. Think of http://www.fileformat.info/info/unicode/char/1f4a9/index.htm for example ;-)
Then, once people learn that utf8===utf8mb3 and Unicode-UTF-8===utf8mb4, they can choose.
My patch is coded with this "learning path" in mind.

At least the default for Doctrine should be safe for any Unicode-UTF-8 string.
The only problem is that utf8mb4 has existed only since MySQL 5.5.3, and Doctrine's minimum required MySQL server version is lower.
So to deal with that, we could either use PDO::getAttribute(PDO::ATTR_SERVER_VERSION) (which would also be required to upgrade MySqlPlatform.php), or the conditional comment tricks that the MySQL parser allows.
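
The server-version option mentioned above could be sketched roughly as follows. This is a minimal illustration, not the code in this pull request: `chooseMysqlCharset` is a hypothetical function name, and the sketch assumes PHP 7 or later and a version string that starts with `major.minor.patch`, as MySQL servers typically report.

```php
<?php
// Hypothetical helper (not part of Doctrine or of this patch): upgrade a
// requested "utf8" charset to "utf8mb4" only when the server is at least
// MySQL 5.5.3, the first version that supports utf8mb4.
function chooseMysqlCharset(string $requested, string $serverVersion): string
{
    // Any charset other than utf8 (case-insensitive) is left untouched.
    if (0 !== strcasecmp($requested, 'utf8')) {
        return $requested;
    }

    // Keep only the numeric prefix, e.g. "5.5.31" from "5.5.31-0ubuntu0.12.04.1".
    if (!preg_match('/^\d+\.\d+\.\d+/', $serverVersion, $match)) {
        return $requested; // Unknown version format: stay on the safe side.
    }

    return version_compare($match[0], '5.5.3', '>=') ? 'utf8mb4' : $requested;
}

var_dump(chooseMysqlCharset('utf8', '5.5.31-0ubuntu0.12.04.1'));
var_dump(chooseMysqlCharset('utf8', '5.1.73-log'));
```

With PDO the version string would come from `$pdo->getAttribute(PDO::ATTR_SERVER_VERSION)`, which is only available once a connection exists. A charset passed in the DSN has to be chosen before that, which is one reason a version check alone may not be enough for the pdo_mysql driver.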

@nicolas-grekas

So, just to be consistent, I updated my patch so that both mysqli and pdomysql drivers also upgrade to utf8mb4 when possible.

@beberlei
Doctrine member

Sorry, but I think this is too dangerous. This is something we need to leave for developers to decide.

@beberlei beberlei closed this
@gagarine

But how can you force utf8mb4 when you create a schema?

@beberlei
Doctrine member

@gagarine Doctrine does not create a schema, you have to do this yourself anyways. At that point you can do it.

@gagarine

@beberlei I used \Doctrine\DBAL\Schema\Schema();
$schema->toSql()
$app['db']->exec();

The SQL returned by toSql() was using the utf8 encoding, and I don't see any way to force utf8mb4, even if I create the database by hand using utf8mb4.

I even created the table by hand, but inserting still did not work. It looks like the connection encoding is not right. How can I test it?

I'm very new to Doctrine and perhaps this is not the place to get support... any pointer would be welcome.

Commits on May 15, 2013
  1. @nicolas-grekas

    Conditionally upgrade utf8 to utf8mb4 for MySQL 5.5.3

    nicolas-grekas committed
  2. @nicolas-grekas

    CS update

    nicolas-grekas committed
  3. @nicolas-grekas
Commits on May 22, 2013
  1. @nicolas-grekas
  2. @nicolas-grekas
  3. @nicolas-grekas

    Fix

    nicolas-grekas committed
7 lib/Doctrine/DBAL/Driver/Mysqli/MysqliConnection.php
@@ -42,7 +42,12 @@ public function __construct(array $params, $username, $password, array $driverOp
}
if (isset($params['charset'])) {
- $this->_conn->set_charset($params['charset']);
+ if (50503 <= $this->_conn->server_version && 0 === strcasecmp($params['charset'], 'utf8')) {
+ $this->_conn->set_charset('utf8mb4');
+ }
+ else {
+ $this->_conn->set_charset($params['charset']);
+ }
}
}
17 lib/Doctrine/DBAL/Driver/PDOMySql/Driver.php
@@ -39,6 +39,23 @@ class Driver implements \Doctrine\DBAL\Driver
*/
public function connect(array $params, $username = null, $password = null, array $driverOptions = array())
{
+ if (isset($params['charset']) && 0 === strcasecmp($params['charset'], 'utf8')) {
+ try {
+ $conn = new \Doctrine\DBAL\Driver\PDOConnection(
+ $this->_constructPdoDsn(array('charset' => 'utf8mb4') + $params),
+ $username,
+ $password,
+ $driverOptions
+ );
+ return $conn;
+
+ } catch(\PDOException $e) {
+ if (2019 !== $e->getCode()) {
+ throw $e;
+ }
+ }
+ }
+
$conn = new \Doctrine\DBAL\Driver\PDOConnection(
$this->_constructPdoDsn($params),
$username,
22 lib/Doctrine/DBAL/Event/Listeners/MysqlSessionInit.php
@@ -51,10 +51,10 @@ class MysqlSessionInit implements EventSubscriber
* @param string $charset
* @param string $collation
*/
- public function __construct($charset = 'utf8', $collation = false)
+ public function __construct($charset = 'utf8', $collation = '')
{
- $this->_charset = $charset;
- $this->_collation = $collation;
+ $this->_charset = strtolower($charset);
+ $this->_collation = strtolower($collation);
}
/**
@@ -63,8 +63,20 @@ public function __construct($charset = 'utf8', $collation = false)
*/
public function postConnect(ConnectionEventArgs $args)
{
- $collation = ($this->_collation) ? " COLLATE ".$this->_collation : "";
- $args->getConnection()->executeUpdate("SET NAMES ".$this->_charset . $collation);
+ $collation = $this->_collation ? ' COLLATE ' . $this->_collation : ' ';
+ $sql = 'SET NAMES ' . $this->_charset . $collation;
+
+ $mb4 = str_replace(
+ array('utf8 ', 'utf8_'),
+ array('utf8mb4 ', 'utf8mb4_'),
+ $sql
+ );
+
+ if ($mb4 !== $sql) {
+ $sql .= '/*!50503,' . $mb4 . '*/';
+ }
+
+ $args->getConnection()->executeUpdate($sql);
}
public function getSubscribedEvents()