@greg0ire
Member

@greg0ire greg0ire commented Feb 8, 2022

collate was never supposed to be supported and used in the first place,
and DBAL 3.3.2 fixed that.
Here, we avoid the situation where a user willing to use the correct
option ends up with both collate and collation defined, and we remap
collate to collation when DBAL 3.3 is detected. DBAL 3.3.0 and 3.3.1 are
avoided thanks to a composer version constraint.
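
The remapping described above can be sketched roughly as follows. This is a minimal illustration and not the bundle's actual code: the function name, the `dbal_has_collation_fix` flag, and the choice to let an explicit `collation` win over a legacy `collate` are all assumptions made for the example.

```python
def normalize_table_options(options: dict, dbal_has_collation_fix: bool) -> dict:
    """Return a copy of the table options with 'collate' remapped to 'collation'.

    Hypothetical helper: the flag stands in for "DBAL 3.3 is detected".
    """
    normalized = dict(options)
    if not dbal_has_collation_fix or "collate" not in normalized:
        # Older DBAL, or nothing to remap: leave the options untouched.
        return normalized

    legacy_value = normalized.pop("collate")
    # Assumption: an explicit 'collation' takes precedence, so both keys
    # are never defined at the same time.
    normalized.setdefault("collation", legacy_value)
    return normalized


print(normalize_table_options({"collate": "utf8mb4_unicode_ci"}, True))
```

With the flag set to `False` the options are returned unchanged, which mirrors the idea of only adapting when the newer DBAL is in use.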

Might fix #1468, might also fix doctrine/migrations#1240, and might also fix doctrine/DoctrineMigrationsBundle#470

How can I test this?

composer config repositories.greg0ire vcs https://github.com/greg0ire/DoctrineBundle
composer require doctrine/doctrine-bundle "dev-collation as 2.5.5"

@greg0ire
Member Author

greg0ire commented Feb 9, 2022

@aprat84 @dmaicher @natewiebe13 @floviolleau please test

@bobvandevijver
Contributor

I tested this because I ran into the same behaviour described in doctrine/dbal#5243, and it doesn't change anything. However, changing the collate option in the DBAL configuration to collation does solve the issue for me, and that doesn't need this change.

@natewiebe13

Similar experience here. On this branch, using collate produces a down() with unexpected queries, while using collation doesn't produce a migration.

@natewiebe13

natewiebe13 commented Feb 9, 2022

@greg0ire
Member Author

greg0ire commented Feb 9, 2022

@natewiebe13 that should be done in a follow-up PR targeting 2.6.x: this one is not about the public interface of DoctrineBundle, but about adapting to the public interface of doctrine/dbal in the least disruptive way.

I'm noting that this improves your experience only when switching to collation, but I don't understand why it does not change anything when you use collate. I don't experience that issue myself so if you can debug this and find out what's happening, that would be great.

@natewiebe13

natewiebe13 commented Feb 10, 2022

@greg0ire I've updated my test project here: https://github.com/natewiebe13/doctrine-dbal-bug

This is likely the way most Symfony projects are configured, as it was at one time either suggested through a recipe/recommended through docs. Specifically this: natewiebe13/doctrine-dbal-bug@c834489

What's happening is that, because of the condition on the parent if statement, the code path completely skips the logic you added and passes the configuration directly into here: https://github.com/doctrine/DoctrineBundle/blob/2.5.x/ConnectionFactory.php#L98

@greg0ire
Member Author

Ok, so this means this bundle is not involved at all in the bug you are experiencing with this project, right?

@natewiebe13

natewiebe13 commented Feb 10, 2022

Based on the example I provided, that's my thinking as well in terms of the bundle's involvement in configuring the dbal connection. That being said, it is unfortunate that the change in doctrine/dbal is causing this to happen.

Do you think it's worth adding a conflict with dbal > 3.3.0 and then resolving that in 2.6 or something? I'm just wondering if there's a way to make it more obvious for people that run into this without it triggering new issues being created.

@greg0ire
Member Author

I don't think it's worth it, IMO it might just prevent people from upgrading the DoctrineBundle.

@pps1

pps1 commented Feb 12, 2022

+1 on this fix @greg0ire

After your patch, diff no longer generates an empty up() and false down() statements that do not introduce any changes to the existing schema.

doctrine:

  dbal:
    # https://symfony.com/doc/current/reference/configuration/doctrine.html#doctrine-dbal-configuration
    dbname: '%env(resolve:DATABASE_NAME)%'
    host: '%env(resolve:DATABASE_HOST)%'
    port: '%env(resolve:DATABASE_PORT)%'
    user: '%env(resolve:DATABASE_USER)%'
    password: '%env(resolve:DATABASE_ROOT_PASSWORD)%'
    server_version: '%env(resolve:DATABASE_SERVER_VERSION)%'

  orm:
    auto_generate_proxy_classes: true
    # {...}

--
-- Table structure for table `address`
--

DROP TABLE IF EXISTS `address`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!50503 SET character_set_client = utf8mb4 */;
CREATE TABLE `address` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `guid` char(36) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '(DC2Type:guid)',
  `address1` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `address1_blind_idx` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `address2` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `address2_blind_idx` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `city` varchar(64) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `state_or_province` varchar(64) COLLATE utf8mb4_unicode_ci NOT NULL,
  `postal_code` varchar(24) COLLATE utf8mb4_unicode_ci NOT NULL,
  `country_code` varchar(2) COLLATE utf8mb4_unicode_ci NOT NULL,
  `telephone_number` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `telephone_number_blind_idx` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `fax_number` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `fax_number_blind_idx` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `purpose` smallint NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `UNIQ_D4E6F812B6FCFB2` (`guid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
/*!40101 SET character_set_client = @saved_cs_client */;
final class Version20220212202808 extends AbstractMigration
{
    public function getDescription(): string
    {
        return '';
    }

    public function up(Schema $schema): void
    {
        // this up() migration is auto-generated, please modify it to your needs

    }

    public function down(Schema $schema): void
    {
        // this down() migration is auto-generated, please modify it to your needs
        $this->addSql('ALTER TABLE address CHANGE address1 address1 VARCHAR(255) DEFAULT NULL COLLATE `utf8mb4_unicode_ci`, CHANGE address1_blind_idx address1_blind_idx VARCHAR(255) DEFAULT NULL COLLATE `utf8mb4_unicode_ci`, CHANGE address2 address2 VARCHAR(255) DEFAULT NULL COLLATE `utf8mb4_unicode_ci`, CHANGE address2_blind_idx address2_blind_idx VARCHAR(255) DEFAULT NULL COLLATE `utf8mb4_unicode_ci`, CHANGE city city VARCHAR(64) DEFAULT NULL COLLATE `utf8mb4_unicode_ci`, CHANGE state_or_province state_or_province VARCHAR(64) NOT NULL COLLATE `utf8mb4_unicode_ci`, CHANGE postal_code postal_code VARCHAR(24) NOT NULL COLLATE `utf8mb4_unicode_ci`, CHANGE country_code country_code VARCHAR(2) NOT NULL COLLATE `utf8mb4_unicode_ci`, CHANGE telephone_number telephone_number VARCHAR(255) DEFAULT NULL COLLATE `utf8mb4_unicode_ci`, CHANGE telephone_number_blind_idx telephone_number_blind_idx VARCHAR(255) DEFAULT NULL COLLATE `utf8mb4_unicode_ci`, CHANGE fax_number fax_number VARCHAR(255) DEFAULT NULL COLLATE `utf8mb4_unicode_ci`, CHANGE fax_number_blind_idx fax_number_blind_idx VARCHAR(255) DEFAULT NULL COLLATE `utf8mb4_unicode_ci`, CHANGE guid guid CHAR(36) DEFAULT NULL COLLATE `utf8mb4_unicode_ci` COMMENT \'(DC2Type:guid)\'');
//{...}
    }
}

 "doctrine/annotations": "^1",
 "doctrine/cache": "^1.11 || ^2.0",
-"doctrine/dbal": "^2.13.1|^3.1",
+"doctrine/dbal": "^2.13.1|^3.3.2",
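
To see which DBAL releases the new constraint admits, here is a small sketch of caret-range matching. It is only an illustration for plain x.y.z versions with a major version of 1 or more, not Composer's actual resolver, and the function names are made up for the example.

```python
def parse(version: str) -> tuple:
    """Turn 'x.y.z' into a tuple of integers for comparison."""
    return tuple(int(part) for part in version.split("."))


def satisfies_caret(version: str, base: str) -> bool:
    """Caret range for major >= 1: at least `base`, below the next major."""
    candidate, minimum = parse(version), parse(base)
    return candidate[0] == minimum[0] and candidate >= minimum


def allowed(version: str) -> bool:
    # Mirrors "^2.13.1|^3.3.2": matching either range is enough.
    return satisfies_caret(version, "2.13.1") or satisfies_caret(version, "3.3.2")


for candidate in ("2.13.1", "3.1.0", "3.3.0", "3.3.1", "3.3.2"):
    print(candidate, allowed(candidate))
```

Under these rules 3.3.0 and 3.3.1 are rejected, as are the other 3.x releases below 3.3.2.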
Member

Since you're targeting the 2.5 branch: Are we sure we want to do this kind of bump in a bugfix release?

Member Author

I think it's OK since only 3.3 is supported.

Member Author

But I'm not sure what the policy on this is. Maybe @ostrolucky does?

Member

I don't mind

@greg0ire
Member Author

greg0ire commented Feb 15, 2022

Thanks for this, I think I'm going to open a PR to fix this 😅

But before that, I need to figure out several things:

  • what are the recommended settings when using MySQL for charset and collation?
  • what does DBAL do by default, when you don't specify those? Apparently it picks utf8, and then sticks _unicode_ci to that to obtain the collation.
  • what does DoctrineBundle do when you don't specify those in the configuration?
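
The default described in the second bullet ("apparently" utf8 plus a _unicode_ci suffix) would look something like this. It is a sketch of that description only, with a made-up function name, and has not been checked against the DBAL source.

```python
def default_collation(charset: str = "utf8") -> str:
    """Derive a collation by appending '_unicode_ci' to the charset name."""
    return f"{charset}_unicode_ci"


print(default_collation())
print(default_collation("utf8mb4"))
```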

UPD: Created

@arderyp

arderyp commented Feb 15, 2022

Thanks for creating those issues, and if UTF8 does indeed seem like a wrong default, then opening an issue with Symfony Docs would be helpful.

In the meantime, how exactly would you change the following to make it more sensible and explicit?

doctrine:
  dbal:
    driver: pdo_mysql
    charset: ???
    default_table_options:
      collation: ???
    host: '%env(DATABASE_HOST)%'
    port: '%env(DATABASE_PORT)%'
    dbname: '%env(DATABASE_NAME)%'
    user: '%env(DATABASE_USER)%'
    password: '%env(DATABASE_PASSWORD)%'

Do any other keyed elements need to be added? server_version?

Once we are clear on the recommended default config for utf8-style configuration, do you think I should focus more on the up or down schema generation for further debugging? Should up() be generating migrations for these fields:

table | CREATE TABLE `table` (
  `id` int NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  `short` varchar(50) NOT NULL,
  `description` longtext NOT NULL,
  `created` datetime NOT NULL,
  `modified` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=20 DEFAULT CHARSET=utf8mb3 COMMENT='table' 

My guess is "yes", given the CHARSET=utf8mb3 part and your suggestion that it's not the advisable charset.

Based on the comments here and my debugging, it sounds like the up() function is mapping UTF8 charset configuration to utf8mb3 (_platformOptions: [charset => utf8, collation => utf8_general_ci]) causing the up() method to see the schema as in sync since my tables are utf8mb3, while the down() function maps the UTF8 configuration to some other value or perhaps nothing at all (_platformOptions: [version => false]). _platformOptions examples come from my earlier comment.

@greg0ire
Member Author

For the moment this is my recommendation

doctrine:
  dbal:
    driver: pdo_mysql
    charset: utf8mb4
    default_table_options:
      charset: utf8mb4
      collation: utf8mb4_unicode_ci # or even utf8mb4_0900_ai_ci if using MySQL 8
    host: '%env(DATABASE_HOST)%'
    port: '%env(DATABASE_PORT)%'
    dbname: '%env(DATABASE_NAME)%'
    user: '%env(DATABASE_USER)%'
    password: '%env(DATABASE_PASSWORD)%'

Then you can try inserting emojis in your table 🙂
From what I understand, up() is not generated because platform-aware comparison only works at the column level for now, and not at the table level: doctrine/dbal#4945

@greg0ire
Member Author

greg0ire commented Feb 15, 2022

@arderyp until the DBAL is able to do the charset/collation migration, maybe use utf8mb3 everywhere? Might make the down() diff go away? Also, I'm wondering if that issue is specific to platform-aware comparison. If yes, you might be able to downgrade the DBAL to 3.3.0, use utf8mb4 everywhere, migrate your tables, then upgrade back to 3.3.2

@arderyp

arderyp commented Feb 15, 2022

I am comparing $fromTable->getOptions() and $toTable->getOptions() here for the table I've been using in my examples, and it's clear they are just not parsing the same options:

# FROM
array(6) {
  ["create_options"]=>
  array(0) {
  }
  ["engine"]=>
  string(6) "InnoDB"
  ["collation"]=>
  string(15) "utf8_general_ci"
  ["charset"]=>
  string(4) "utf8"
  ["autoincrement"]=>
  string(2) "20"
}

# TO
array(3) {
  ["create_options"]=>
  array(0) {
  }
  ["collation"]=>
  string(15) "utf8_general_ci"
  ["charset"]=>
  string(4) "UTF8"
}

setting charset to utf8mb3 in my config throws an error about unrecognized charset. Are you suggesting I migrate the tables manually to utf8mb3 and that this may resolve errors with the charset: UTF8 config?

if it makes any difference, my database is set to charset=utf8 and collation=utf8_general_ci. I have yet to run a comprehensive check on app tables.
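
A quick way to see how the two option arrays above differ is to compare them key by key. The sketch below is only an illustration, not how DBAL compares tables; the values are copied from the FROM and TO dumps, with the empty create_options entry left out, and `compare_options` is a made-up helper.

```python
def compare_options(left: dict, right: dict, fold_case: bool = False) -> dict:
    """Report keys present on one side only and keys whose values differ."""
    def norm(value: str) -> str:
        return value.lower() if fold_case else value

    shared = left.keys() & right.keys()
    return {
        "only_left": sorted(left.keys() - right.keys()),
        "only_right": sorted(right.keys() - left.keys()),
        "changed": sorted(k for k in shared if norm(left[k]) != norm(right[k])),
    }


# Values taken from the FROM (introspected) and TO (configured) dumps.
from_options = {
    "engine": "InnoDB",
    "collation": "utf8_general_ci",
    "charset": "utf8",
    "autoincrement": "20",
}
to_options = {"collation": "utf8_general_ci", "charset": "UTF8"}

print(compare_options(from_options, to_options))
print(compare_options(from_options, to_options, fold_case=True))
```

With a case-sensitive comparison the charset shows up as changed; folding case makes that difference disappear, leaving only the keys that exist on one side.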

@greg0ire
Member Author

> Are you suggesting I migrate the tables manually to utf8mb3 and that this may resolve errors with the charset: UTF8 config?

No, I thought your tables were already using that, as shown by SHOW CREATE TABLE.

> setting charset to utf8mb3 in my config throws an error about unrecognized charset.

What throws the error? The server? PDO? doctrine/dbal? Where does that uppercase UTF8 come from? Your DSN maybe?

@arderyp

arderyp commented Feb 15, 2022

upper case charset: UTF8 comes from the symfony documentation I linked earlier.

charset: utf8mb3 in config throws:

$ ./bin/console doctrine:migrations:diff

In ExceptionConverter.php line 119:
                                                                                     
  An exception occurred in the driver: SQLSTATE[HY000] [2019] Unknown character set  
                                                                                     

In Exception.php line 30:
                                                
  SQLSTATE[HY000] [2019] Unknown character set  
                                                

In Driver.php line 28:
                                                
  SQLSTATE[HY000] [2019] Unknown character set  

@greg0ire
Member Author

Well that's confusing to say the least 🤔

@arderyp

arderyp commented Feb 15, 2022

Ok, I am making some progress. If I change my config to charset: utf8 (not UTF8), it no longer generates a down migration for my test table.

I am still getting other down migrations, but those are for the handful of tables that are defined as COLLATE utf8_unicode_ci instead of utf8_general_ci. So, I am guessing I could resolve my specific situation by using:

charset: utf8
driver: pdo_mysql
default_table_options:
  collation: utf8_general_ci

then manually creating up() migrations for my utf8_unicode_ci tables to utf8_general_ci.

However, there is still something wrong with the code. I'm now even more convinced of my proposal above that the issue may in fact not be with down() generating junk, but rather with up() not actually generating what it should (in my case, migrating my utf8_unicode_ci tables to utf8_general_ci).

There's also the remaining question: should UTF8 and utf8 be treated the same? MySQL seems to use the lower case, whereas Symfony docs (and users like me who followed them) use upper case.

@arderyp

arderyp commented Feb 15, 2022

in other words, down() is responding to changes I make to my config, but up() does nothing in all cases.

@greg0ire
Member Author

> ok, I am making some progress. If I change my config to charset: utf8 (not UTF8), it no longer generates a down migration for my test table.

Congrats!

> then manually creating up() migrations for my utf8_unicode_ci tables to utf8_general_ci.

Again, not a specialist, but I think I would recommend migrating to utf8_unicode_ci

> However, there is still something wrong with the code. I'm now even more convinced of my proposal above that the issue may in fact not be with down() generating junk, but rather with up() not actually generating what it should (in my case, migrating my utf8_unicode_ci tables to utf8_general_ci)

That was my thinking as well, and it's explained by doctrine/dbal#4945 I think.

> There's also the remaining question, should UTF8 and utf8 be treated the same. MySql seems to use the lower case, whereas Symfony docs (and users like me who followed it) use upper case.

Good question. Using your MySQL client, can you try creating a table for both cases, then use SHOW CREATE TABLE on those? I'm suspecting that it will result in something normalized.

@arderyp

arderyp commented Feb 15, 2022

@greg0ire sure I'm happy to do that, but can you be more explicit? You want me to run the create at the mysql terminal? Example command?

I can't really remember if I created my newer tables manually then generated Doctrine entities from that, or vice versa.

@greg0ire
Member Author

> You want me to run the create at the mysql terminal? Example command?

Yes

CREATE TABLE Foo (Bar INT NOT NULL) DEFAULT CHARACTER SET utf8;
CREATE TABLE Bar (Foo INT NOT NULL) DEFAULT CHARACTER SET UTF8;
SHOW CREATE TABLE Foo;
SHOW CREATE TABLE Bar;

@arderyp

arderyp commented Feb 15, 2022

mysql> SHOW CREATE TABLE Foo;
+-------+-----------------------------------------------------------------------------------+
| Table | Create Table                                                                      |
+-------+-----------------------------------------------------------------------------------+
| Foo   | CREATE TABLE `Foo` (
  `Bar` int NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 |
+-------+-----------------------------------------------------------------------------------+
1 row in set (0.01 sec)

mysql> SHOW CREATE TABLE Bar;
+-------+-----------------------------------------------------------------------------------+
| Table | Create Table                                                                      |
+-------+-----------------------------------------------------------------------------------+
| Bar   | CREATE TABLE `Bar` (
  `Foo` int NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 |
+-------+-----------------------------------------------------------------------------------+
1 row in set (0.00 sec)

@greg0ire
Member Author

Interesting. Can you also try utf8mb3? That way we will know if the error message you got really came from the server or from PDO.

Anyway, I think you'll agree with me that UTF8 should not be mentioned in any docs.
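
The SHOW CREATE TABLE output above suggests the server treats charset names case-insensitively and reports utf8 as utf8mb3. A comparison that wants to treat UTF8, utf8 and utf8mb3 as the same thing could normalize names along these lines. This is a sketch with a hypothetical helper and alias table, not something DBAL or DoctrineBundle is known to do.

```python
# Assumption for this sketch: only the utf8 -> utf8mb3 alias is handled.
CHARSET_ALIASES = {"utf8": "utf8mb3"}


def normalize_charset(name: str) -> str:
    """Lower-case a charset name and resolve known aliases."""
    lowered = name.lower()
    return CHARSET_ALIASES.get(lowered, lowered)


for raw in ("UTF8", "utf8", "utf8mb3", "utf8mb4"):
    print(raw, "->", normalize_charset(raw))
```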

@arderyp

arderyp commented Feb 15, 2022

for documentation, this works, and this does not.

Sequence:

I have not yet determined if the issue lies with MySQL/Comparator::diffTable or Schema/Comparator::diffTable

@arderyp

arderyp commented Feb 15, 2022

Yes, it seems UTF8 shouldn't be in the docs, but I wonder if the proper solution is for Doctrine to convert it to utf8. But even that seems to map to utf8mb3, which isn't advisable according to your earlier posts. Maybe there should either be a more sensible default (along the lines of what you outlined here) or perhaps there should be no defaults and the implementer should be forced to explicitly define charset, default_table_options[charset], and default_table_options[collation], which should be reflected in the Symfony docs.

Anywho, you said "Can you also try utf8mb3"... what's that in reference to? Setting that as my config charset or re-running the table creations using utf8mb3 instead of utf8?

@greg0ire
Member Author

> re-running the table creations using utf8mb3 instead of utf8?

Yes, for science. I'm wondering if you will get an error message this time.

@arderyp

arderyp commented Feb 15, 2022

for science 🥂

mysql> CREATE TABLE Foo (Bar INT NOT NULL) DEFAULT CHARACTER SET utf8mb3;
Query OK, 0 rows affected, 1 warning (0.08 sec)

mysql> show create table Foo;
+-------+-----------------------------------------------------------------------------------+
| Table | Create Table                                                                      |
+-------+-----------------------------------------------------------------------------------+
| Foo   | CREATE TABLE `Foo` (
  `Bar` int NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 |
+-------+-----------------------------------------------------------------------------------+
1 row in set (0.00 sec)

@arderyp

arderyp commented Feb 16, 2022

@greg0ire unfortunately, I feel like I'm reaching the limit of my debugging capabilities here.

The furthest I've gotten is tracking the issue to here: https://github.com/doctrine/dbal/blob/3.3.x/src/Platforms/MySQL/Comparator.php#L56

My live table schema is having its table definition mapped properly to its columns:

# TABLE OPTIONS: var dump of $table->getOptions() in MySQL/Comparator::normalizeColumns
# for table that's getting down() migration
array(6) {
  ["create_options"]=>
  array(0) {
  }
  ["engine"]=>
  string(6) "InnoDB"
  ["collation"]=>
  string(15) "utf8_general_ci"
  ["charset"]=>
  string(4) "utf8"
  ["autoincrement"]=>
  string(2) "20"
  ["comment"]=>
  string(37) "List of applications in CEE's system."
}

# DEFAULTS: var dump of $defaults in MySQL/Comparator::normalizeColumns
# for table that's getting down() migration
array(2) {
  ["collation"]=>
  string(15) "utf8_general_ci"
  ["charset"]=>
  string(4) "utf8"
}

# COLUMN PLATFORM OPTIONS: var dump of $column->getPlatformOptions() in MySQL/Comparator::normalizeColumns
# for field "name" on table that's getting down() migration
array(2) {
  ["charset"]=>
  string(4) "utf8"
  ["collation"]=>
  string(15) "utf8_general_ci"
}

# DIFF: var dump of $diff in MySQL/Comparator::normalizeColumns
# for field "name" on table that's getting down() migration
array(0) {
}

However, the charset and collation from my configuration Table object are not being properly mapped to its corresponding Column children:

# TABLE OPTIONS: var dump of $table->getOptions() in MySQL/Comparator::normalizeColumns
# for table that's getting down() migration
array(3) {
  ["create_options"]=>
  array(0) {
  }
  ["charset"]=>
  string(7) "utf8mb4"
  ["collation"]=>
  string(18) "utf8mb4_unicode_ci"
}

# DEFAULTS: var dump of $defaults in MySQL/Comparator::normalizeColumns
# for table that's getting down() migration
array(2) {
  ["collation"]=>
  string(15) "utf8_general_ci"
  ["charset"]=>
  string(4) "utf8"
}

# COLUMN PLATFORM OPTIONS: var dump of $column->getPlatformOptions() in MySQL/Comparator::normalizeColumns
# for field "name" on table that's getting down() migration
array(1) {
  ["version"]=>
  bool(false)
}

# DIFF: var dump of $diff in MySQL/Comparator::normalizeColumns
# for field "name" on table that's getting down() migration
array(1) {
  ["version"]=>
  bool(false)
}

I'm kind of at a dead end, but maybe this will give you an idea of where we should look next.
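
For readers following along, the DEFAULTS, COLUMN PLATFORM OPTIONS and DIFF dumps above are consistent with a step that drops every column option that merely repeats a table default. The sketch below reproduces both dumps under that assumption. `diff_assoc` is a hypothetical helper written for this example, and nothing here is taken from the DBAL source.

```python
def diff_assoc(options: dict, defaults: dict) -> dict:
    """Keep the entries of `options` whose key/value pair is not in `defaults`."""
    return {
        key: value
        for key, value in options.items()
        if key not in defaults or defaults[key] != value
    }


# DEFAULTS as shown in both dumps.
defaults = {"collation": "utf8_general_ci", "charset": "utf8"}

# Column "name" on the introspected (live) table.
live_column = {"charset": "utf8", "collation": "utf8_general_ci"}
# Column "name" on the table built from the configuration.
configured_column = {"version": False}

print(diff_assoc(live_column, defaults))
print(diff_assoc(configured_column, defaults))
```

The first call yields an empty mapping and the second yields only the version entry, matching the two DIFF dumps.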

@dmaicher
Contributor

thinking from what I read here, it is recommended to use _unicode_ over _general_.

So somehow it seems to be a mismatch between utf8_general_ci and utf8_unicode_ci?

> I would expect up() to mention what is defined in your config, but for down(), I would expect it to mention whatever is introspected. If you use SHOW CREATE TABLE address, what do you get?

@greg0ire this is what I get

| address | CREATE TABLE `address` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `street` longtext COLLATE utf8_unicode_ci,
  `additional_info_1` longtext COLLATE utf8_unicode_ci,
  `additional_info_2` longtext COLLATE utf8_unicode_ci,
  `city` longtext COLLATE utf8_unicode_ci,
  `zip_code` longtext COLLATE utf8_unicode_ci,
  `country` longtext COLLATE utf8_unicode_ci,
  `latitude` longtext COLLATE utf8_unicode_ci,
  `longitude` longtext COLLATE utf8_unicode_ci,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |

@greg0ire
Member Author

Ok, so down() seems to behave well in your case: since you instructed it to use utf8_general_ci, down() is supposed to do the opposite of that and restore what you currently have, which is utf8_unicode_ci.

@dmaicher
Contributor

> Ok, so down() seems to behave well in your case: since you instructed it to use utf8_general_ci, down() is supposed to do the opposite of that and restore what you currently have, which is utf8_unicode_ci.

Yes indeed. Just the empty up() is then probably a bug?

@greg0ire
Member Author

Yes, and I think it is that bug: doctrine/dbal#4945

I don't understand why it affects only up() though.

@arderyp

arderyp commented Feb 16, 2022

@greg0ire the description of doctrine/dbal#4945 seems to reflect the broken up() issues, so hopefully that fixes it. If you need me to dig further, just let me know.

@arderyp

arderyp commented Mar 3, 2022

@greg0ire , given this comment...

My MySQL server is currently 5.7 but will be updated soon to 8. I'm considering converting all tables/fields to utf8mb4/utf8mb4_unicode_ci now with a manually built migration to resolve this issue. However, since MySQL 8 defaults to utf8mb4_0900_ai_ci (which both you and this post mention), I'm wondering how I should proceed.

Should I migrate to utf8mb4/utf8mb4_unicode_ci now while on 5.7, wait until MySQL is upgraded to 8 and then migrate to utf8mb4/utf8mb4_0900_ai_ci, or perhaps migrate now to utf8mb4_unicode_ci and then update to utf8mb4_0900_ai_ci when upgrading to 8? It isn't entirely clear to me if using utf8mb4/utf8mb4_unicode_ci on MySQL 8 would present a problem (the aforementioned post has an explicit warning about the collation setting, but I'm struggling to make sense of it).

Given that you know a great deal about this, I consider your insight valuable, if you have the time. Thanks in advance.

@greg0ire
Member Author

greg0ire commented Mar 3, 2022

> Given that you know a great deal about this, I consider your insight valuable, if you have the time.

Actually I don't, I just skimmed through the posts I found. I never paid attention to collation until starting to work on this.
My current idea is

  • utf8 = utf8mb3: a bad proprietary character set, avoid it
  • utf8mb4 is what should have been called (or aliased to) utf8, and should have been the default, and won't cause any issues on MySQL 8
  • utf8mb4_0900_ai_ci is a cool new collation that provides accent insensitivity. Only you know if that feature would be interesting to you.
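
To make the first two bullets concrete: utf8mb3 stores at most three bytes per character, while characters outside the Basic Multilingual Plane, such as emoji, take four bytes in UTF-8. The helper below is a made-up illustration of that byte-length check, not part of any library.

```python
def fits_utf8mb3(text: str) -> bool:
    """True if every character encodes to at most 3 bytes in UTF-8."""
    return all(len(char.encode("utf-8")) <= 3 for char in text)


print(len("🙂".encode("utf-8")))
print(fits_utf8mb3("café"))
print(fits_utf8mb3("hello 🙂"))
```

Text that fails this check needs a utf8mb4 column to be stored intact.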

@arderyp

arderyp commented Mar 3, 2022

Thanks for the feedback @greg0ire!


9 participants