Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query ALTER TABLE always transmitted if collate specified on an attribute #6146

Open
DamienLinux opened this issue Sep 8, 2023 · 4 comments
Labels

Comments

@DamienLinux
Copy link

DamienLinux commented Sep 8, 2023

Bug Report

Q A
Version 3.6.6

Summary

ALTER TABLE always sent with precision options={"collation":"XXX"} at ORM level (MySQL 5.7 / MariaDB 10.1 database)

Current behaviour

recision of options={"collation":"XXX"} of an attribute of an entity

Tested with utf8_general_ci, utf8mb4_general_ci, utf8_unicode_ci, utf8mb4_unicode_ci, latin1_swedish_ci

A query like this is returned:
ALTER TABLE table CHANGE attribute attribute VARCHAR(15) CHARACTER SET utf8 NOT NULL COLLATE utf8_general_ci;

attribute was in this example initially in "utf8" and "utf8_general_ci". Even after executing this database query, this happens again.

Tested with:

  • A table and an attribute not having the same charset/collation
  • A table and an attribute with the same charset/collation

How to reproduce

To reproduce the bug according to the example provided:

By ORM specify the attribute of an entity: options={"collation":"utf8_general_ci"} and a non-null string of length 15

Run a doctrine:schema:update --dump-sql

Expected behaviour

No ALTER TABLE should be transmitted

@derrabus derrabus added the Bug label Sep 11, 2023
@derrabus
Copy link
Member

Can you reproduce the issue by only using DBAL APIs? If not, we will also need the ORM version that you are using.

@MAXakaWIZARD
Copy link

MAXakaWIZARD commented Dec 19, 2023

It is the same case as described here: #5338
Seems like it should be fixed in 3.3.x and 4.x, but I'm using version 3.7.2 and have exactly the same problem.
/** * @ORM\Column(type="string", length=100, options={"collation":"utf8_bin"}) */ protected $alias = '';
Schema validate command always proposes this change:
ALTER TABLE aliases CHANGE alias alias VARCHAR(100) CHARACTER SET utf8 NOT NULL COLLATEutf8_bin;

@fisharebest
Copy link

fisharebest commented Mar 3, 2024

Here's a failing test script using just the DBAL API calls.
I run the schema-diff/migration five times.
It should only produce output once.

<?php
// Using MySQL 8.3.0 and doctrine/dbal 4.0.0

use Doctrine\DBAL\DriverManager;
use Doctrine\DBAL\Schema\Column;
use Doctrine\DBAL\Schema\Schema;
use Doctrine\DBAL\Schema\Table;
use Doctrine\DBAL\Types\StringType;

require __DIR__ . '/vendor/autoload.php';

$conn = DriverManager::getConnection(params: [
    'dbname'   => 'test',
    'user'     => 'test',
    'password' => 'test',
    'host'     => 'localhost',
    'driver'   => 'pdo_mysql',
]);

$conn->executeStatement(sql: 'DROP TABLE IF EXISTS foo');

echo 'STEP 1', PHP_EOL;
for ($i = 0; $i < 5; $i++) {
    $column         = new Column(
        name: 'bar',
        type: new StringType(),
        options: ['length' => 32, 'platformOptions' => ['collation' => 'utf8mb4_unicode_ci']]
    );
    $table          = new Table(name: 'foo', columns: [$column]);
    $new_schema     = new Schema(tables: [$table]);
    $schema_manager = $conn->createSchemaManager();
    $old_schema     = $schema_manager->introspectSchema();
    $comparator     = $schema_manager->createComparator();
    $schema_diff    = $comparator->compareSchemas(oldSchema: $old_schema, newSchema: $new_schema);
    $queries        = $conn->getDatabasePlatform()->getAlterSchemaSQL(diff: $schema_diff);

    foreach ($queries as $query) {
        echo $query, ';', PHP_EOL;
        $conn->executeStatement(sql: $query);
    }
}

echo 'STEP 2', PHP_EOL;
for ($i = 0; $i < 5; $i++) {
    $column         = new Column(
        name: 'bar',
        type: new StringType(),
        options: ['length' => 32, 'platformOptions' => ['collation' => 'utf8_unicode_ci']]
    );
    $table          = new Table(name: 'foo', columns: [$column]);
    $new_schema     = new Schema(tables: [$table]);
    $schema_manager = $conn->createSchemaManager();
    $old_schema     = $schema_manager->introspectSchema();
    $comparator     = $schema_manager->createComparator();
    $schema_diff    = $comparator->compareSchemas(oldSchema: $old_schema, newSchema: $new_schema);
    $queries        = $conn->getDatabasePlatform()->getAlterSchemaSQL(diff: $schema_diff);

    foreach ($queries as $query) {
        echo $query, ';', PHP_EOL;
        $conn->executeStatement(sql: $query);
    }
}

echo 'STEP 3', PHP_EOL;
for ($i = 0; $i < 5; $i++) {
    $column         = new Column(
        name: 'bar',
        type: new StringType(),
        options: ['length' => 32, 'platformOptions' => ['collation' => 'utf8mb4_unicode_ci']]
    );
    $table          = new Table(name: 'foo', columns: [$column]);
    $new_schema     = new Schema(tables: [$table]);
    $schema_manager = $conn->createSchemaManager();
    $old_schema     = $schema_manager->introspectSchema();
    $comparator     = $schema_manager->createComparator();
    $schema_diff    = $comparator->compareSchemas(oldSchema: $old_schema, newSchema: $new_schema);
    $queries        = $conn->getDatabasePlatform()->getAlterSchemaSQL(diff: $schema_diff);

    foreach ($queries as $query) {
        echo $query, ';', PHP_EOL;
        $conn->executeStatement(sql: $query);
    }
}

Output:

$ php test.php 
STEP 1
CREATE TABLE foo (bar VARCHAR(32) NOT NULL COLLATE `utf8mb4_unicode_ci`);
STEP 2
ALTER TABLE foo CHANGE bar bar VARCHAR(32) NOT NULL COLLATE `utf8_unicode_ci`;
ALTER TABLE foo CHANGE bar bar VARCHAR(32) NOT NULL COLLATE `utf8_unicode_ci`;
ALTER TABLE foo CHANGE bar bar VARCHAR(32) NOT NULL COLLATE `utf8_unicode_ci`;
ALTER TABLE foo CHANGE bar bar VARCHAR(32) NOT NULL COLLATE `utf8_unicode_ci`;
ALTER TABLE foo CHANGE bar bar VARCHAR(32) NOT NULL COLLATE `utf8_unicode_ci`;
STEP 3
ALTER TABLE foo CHANGE bar bar VARCHAR(32) NOT NULL COLLATE `utf8mb4_unicode_ci`;

Interestingly, if step-2 uses utf8mb3_unicode_ci instead of utf8_unicode_ci (these are synonymous), then it works OK:

$ php test.php 
STEP 1
CREATE TABLE foo (bar VARCHAR(32) NOT NULL COLLATE `utf8mb4_unicode_ci`);
STEP 2
ALTER TABLE foo CHANGE bar bar VARCHAR(32) CHARACTER SET utf8mb3 NOT NULL COLLATE `utf8mb3_unicode_ci`;
STEP 3
ALTER TABLE foo CHANGE bar bar VARCHAR(32) NOT NULL COLLATE `utf8mb4_unicode_ci`;

I note that this uses the CHARACTER SET attribute. Perhaps the equivalence of utf8 and utf8mb3 is causing this?

@fisharebest
Copy link

fisharebest commented Apr 20, 2024

I have analysed this issue, and found several parts, all related to utf8 / utf8mb3.

Overview: the comparator generates two SQL snippets:

  • One from the existing/introspected schema. This will always contain both the charset and collation options.
  • One from the user-specified/target schema. If this contains only one of charset/collation, the code looks up the other.

Then, if these SQL snippets differ, an ALTER TABLE statement is emitted.

Issue 1: The introspected schema may use utf8mb3 whereas the target schema uses utf8 (or vice versa).
This triggers an unnecessary change.

We can fix this in \Platforms\MySQL\Comparator::normalizeOptions(). Just replace utf8mb3 with utf8.

Issue 2: The ConnectionCollationMetadataProvider examines the table information_schema.COLLATIONS.

In versions of MySQL up to 8.0.29, this contains entries for utf8.
In versions of MySQL after 8.0.30, this contains entries for utf8mb3.
(Reference: https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8mb3.html)

The result is that the charset may or may not be found, depending on the version. If not found, getCollationCharset() will return null. This triggers a difference.

We can fix this by unsetting charset whenever collation is set - it is redundant. (class ConnectionCollationMetadataProvider also becomes redundant and can be deleted. It is marked as @internal.)

These two simple changes will fix the reported issue. Here's an updated version of normalizeOptions() that does this:

   private function normalizeOptions(array $options): array
    {
        if (isset($options['charset']) && ! isset($options['collation'])) {
            $options['collation'] = $this->charsetMetadataProvider->getDefaultCharsetCollation($options['charset']);
        } elseif (isset($options['collation'])) {
            // If collation is specified, charset is redundant.
            unset($options['charset']);
        }

        // utf8mb3 is a synonym for utf8.  Be consistent.
        if (isset($options['collation']) && str_starts_with($options['collation'], 'utf8mb3_')) {
            $options['collation'] = 'utf8_' . substr($options['collation'], 8);
        }

        if (isset($options['charset']) && $options['charset'] === 'utf8mb3') {
            $options['charset'] = 'utf8';
        }

        return $options;
    }

However, there is one more case - where the user only specifies the charset and leaves dbal to look up the default collation. This will be different for MySQL 8.0.29 and 8.0.30.

I'm not sure this is a common use-case. I've never seen anyone just set charset. However, fixing this properly will require knowing the MySQL version. Alternatively, we could bodge it using code such as SELECT REPLACE(DEFAULT_COLLATE_NAME, 'utf8mb3', 'utf8') AS DEFAULT_COLLATE_NAME.

fisharebest added a commit to fisharebest/dbal that referenced this issue Apr 29, 2024
fisharebest added a commit to fisharebest/dbal that referenced this issue Apr 30, 2024
fisharebest added a commit to fisharebest/dbal that referenced this issue Apr 30, 2024
fisharebest added a commit to fisharebest/dbal that referenced this issue May 29, 2024
fisharebest added a commit to fisharebest/dbal that referenced this issue Jun 4, 2024
fisharebest added a commit to fisharebest/dbal that referenced this issue Jun 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants