Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the comparator API platform aware #4659

Closed
wants to merge 16 commits into from

Conversation

ausi
Copy link
Contributor

@ausi ausi commented May 29, 2021

Q A
Type feature
BC Break no
Fixed issues #2663 #4551

This pull request adds platform awareness to the comparator API as discussed in #4551 (comment)

I think it works pretty well as shown by the tests. Some tests that previously used code like this:

$diff = (new Comparator())->diffTable($table1, $table2);
self::assertNotFalse($diff);
self::assertEmpty($this->platform->getAlterTableSQL($diff));

Could be changed to the following code because the comparator now correctly detects these cases would end up in an empty ALTER TABLE:

$diff = (new Comparator())->diffTable($table1, $table2);
self::assertFalse($diff);

One failing test remains: ComparatorTest::testDiffColumnPlatformOptions as I was not able to understand what exactly this test expects to happen. With the new comparator design it should be OK to report all platform option changes that have an effect in the resulting SQL code I think.


For backwards compatibility I would make the $platform argument defaulting to null execute the old code for diffColumn in this case and triggering a deprecation notice. This should make the upgrade path for users of this library pretty straight forward.

Copy link
Member

@morozov morozov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch, @ausi. API wise it looks good but logic-wise we need to work through some details. Please see the comments inline.

src/Schema/Comparator.php Show resolved Hide resolved
src/Schema/Comparator.php Outdated Show resolved Hide resolved
src/Schema/Comparator.php Outdated Show resolved Hide resolved
src/Schema/Comparator.php Outdated Show resolved Hide resolved
src/Schema/Comparator.php Outdated Show resolved Hide resolved
@morozov
Copy link
Member

morozov commented May 29, 2021

[...] what exactly this test expects to happen.

The test expects the comparator to identify a change because the current comparator isn't platform-aware. The new one is, and it understands that the "bar" option doesn't apply to any platform.

With the new comparator design it should be OK to report all platform option changes that have an effect in the resulting SQL code I think.

Right. But the test needs to be reworked to test a real platform-specific option with a specific platform that supports it.

@ausi
Copy link
Contributor Author

ausi commented May 30, 2021

But the test needs to be reworked to test a real platform-specific option with a specific platform that supports it.

Done in 5dbb715.

API wise it looks good but logic-wise we need to work through some details.

I think all comments should be addressed now.

Should I proceed with the deprecation layer to make the changed comparator API backwards compatible?

@ausi
Copy link
Contributor Author

ausi commented May 30, 2021

Should I proceed with the deprecation layer to make the changed comparator API backwards compatible?

Done in d2a53b5.

src/Schema/Comparator.php Outdated Show resolved Hide resolved
src/Schema/Comparator.php Outdated Show resolved Hide resolved
src/Schema/Comparator.php Outdated Show resolved Hide resolved
src/Schema/Comparator.php Outdated Show resolved Hide resolved
tests/Functional/Schema/ComparatorTest.php Outdated Show resolved Hide resolved
tests/Platforms/AbstractMySQLPlatformTestCase.php Outdated Show resolved Hide resolved
tests/Platforms/AbstractMySQLPlatformTestCase.php Outdated Show resolved Hide resolved
tests/Platforms/AbstractMySQLPlatformTestCase.php Outdated Show resolved Hide resolved
tests/Schema/ComparatorMySQLTest.php Show resolved Hide resolved
@@ -199,8 +212,18 @@ public function diffSequence(Sequence $sequence1, Sequence $sequence2)
*
* @throws SchemaException
*/
public function diffTable(Table $fromTable, Table $toTable)
public function diffTable(Table $fromTable, Table $toTable, ?AbstractPlatform $platform = null)
Copy link
Member

@morozov morozov May 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we could use this as an opportunity to introduce better method names. diffTable() could be renamed to compareTables(). This way, we won't have to introduce an optional parameter but could introduce the whole old method. The same applies to the rest of the methods except for compareSchemas() which already exists.

Copy link
Contributor

@trompette trompette Jul 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yesterday I openned #4707 to introduce a more consistent Comparator API.

But as @ausi commented, it will make merging this PR harder, mainly because of the optional $platform parameter on all public methods.

After sleeping on it, I think injecting the platform as a constructor parameter when creating a Comparator object would help with avoiding conflcts and would make sense design wise: it seems comparing schemas depends on the platform, given all the linked pull requests.

Actually, it is precisely for enabling this type of API evolution that #4707 deprecates static calls of compareSchemas(), as stated in #4704 (comment).

What do you think ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But as @ausi commented, it will make merging this PR harder, mainly because of the optional $platform parameter on all public methods.

Let's not worry about that. I can help resolve all the conflicts once they occur.

I think injecting the platform as a constructor parameter when creating a Comparator object would help with avoiding conflcts and would make sense design wise: it seems comparing schemas depends on the platform, given all the linked pull requests.

Whether it's injected in the constructor or passed as an argument, it's the same design-wise: the comparator does depend on the platform. The question is, whether or not a given comparator instance needs be usable with different platforms, and the answer is "probably not". I think we can move the platform dependency a constructor argument as originally proposed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compareColumns() method now returns bool in this pull request. This would conflict with #4707 (but only if there would actually be a version release between the two pull requests).

@morozov
Copy link
Member

morozov commented Jul 9, 2021

@ausi, I linked a few pull requests that will be superseded by this one once it's done. I believe there are at least as many open issues (to be identified), so it appears to be a quite important improvement. No pressure, just FYI 😅

@ausi
Copy link
Contributor Author

ausi commented Jul 11, 2021

@morozov I think I addressed all comments now. This should be ready for another review 🎉

@ausi
Copy link
Contributor Author

ausi commented Jul 11, 2021

Some tests seem to have problems with the platformOptions now if they come from the live database. E.g. https://github.com/doctrine/dbal/runs/3040652078?check_suite_focus=true#step:7:115

AFAIK this is due to charset and collation being reported even if they use the default.

For example, offline schema:

'platformOptions' => [],

online schema:

'platformOptions' => [
    'charset' => 'utf8',
    'collation' => 'utf8_unicode_ci',
]

I’m not sure how to best fix this with the concept of doctrine, as the compareColumns() method doesn’t know about the defaults of a specific table.

One possibility would be that the schema manager stores the information about which platformOptions don’t differ in the Column object. Like an array of platform options that match the default value. This would make it possible for the Comparator to compare the meaning of these options.

@morozov
Copy link
Member

morozov commented Jul 12, 2021

Why did this issue surface only now? As far as I see from debugging the test, when the unexpected diff is produced, the columns are still compared on a per-property basis. This is how it already works.

One possibility would be that the schema manager stores the information about which platformOptions don’t differ in the Column object.

I believe it's a bit risky. Omitted values on the defining side mean that the defaults on the database should kick in. If an online column has different properties, the comparison should result in a diff but it won't which is a bug.

What we could do instead is explicitly add the defaults to the DBAL version of the column before comparison. They are currently implemented in:

// Charset
if (! isset($options['charset'])) {
$options['charset'] = 'utf8';
}
$tableOptions[] = sprintf('DEFAULT CHARACTER SET %s', $options['charset']);

We'll also need to double-check how it will work when the defaults are omitted (see #4644).

@ausi
Copy link
Contributor Author

ausi commented Jul 12, 2021

Why did this issue surface only now?

Currently doctrine only compares platform options that exist in both Column instances, see

$platformOptions1 = $column1->getPlatformOptions();
$platformOptions2 = $column2->getPlatformOptions();
foreach (array_keys(array_intersect_key($platformOptions1, $platformOptions2)) as $key) {
if ($properties1[$key] === $properties2[$key]) {
continue;
}
$changedProperties[] = $key;
}

But this is wrong IMO as it would not detect changes from a different charset back to the default for example.

One possibility would be that the schema manager stores the information about which platformOptions don’t differ in the Column object.

I believe it's a bit risky. Omitted values on the defining side mean that the defaults on the database should kick in.

No, I meant explicitly storing information about which platform options are set to the default of the table. This way the comparator can compare the platform options and if a defining side option is missing it can check if the database side option is default or not.

What we could do instead is explicitly add the defaults to the DBAL version of the column before comparison. They are currently implemented in:

// Charset
if (! isset($options['charset'])) {
$options['charset'] = 'utf8';
}
$tableOptions[] = sprintf('DEFAULT CHARACTER SET %s', $options['charset']);

These are the table options, not the column options AFAIK. The charset/collation of the columns are set to the TABLE_COLLATION of the accociated table by default I think.

@morozov
Copy link
Member

morozov commented Jul 14, 2021

But this is wrong IMO as it would not detect changes from a different charset back to the default for example.

I agree. It would be nice to reproduce this issue separately by writing a test for it.

No, I meant explicitly storing information about which platform options are set to the default of the table.

Where can we store this information? Keep in mind that the DBAL should be able to introspect and compare the objects created by other clients.

These are the table options, not the column options AFAIK.

Yes, but eventually there will be the same problem: if we compare the DDL of the table generated from its definition in the DBAL and the DDL generated from the introspected data, there may be a mismatch due to some implied defaults. At least when the defaults are removed from the DBAL in #4644.

The charset/collation of the columns are set to the TABLE_COLLATION of the accociated table by default I think.

Yes, I think in order to compare the columns properly, we need to introspect the deployed table first. I.e.:

  1. Introspect the needed properties of the table.
  2. Augment the definition on the DBAL side with the defaults.
  3. Compare. At this point, all properties should be explicitly set on both sides of the comparison.

@ausi
Copy link
Contributor Author

ausi commented Jul 17, 2021

I tried out the concept now as you can see in the last two commits. It works pretty well but there is one major issue: the default for the collation depends on the charset option. So only changing the charset without specifying the collation would still result in wrongly detected diffs between the online and offline table.

But this is wrong IMO as it would not detect changes from a different charset back to the default for example.

I agree. It would be nice to reproduce this issue separately by writing a test for it.

Done in 4c061bb

No, I meant explicitly storing information about which platform options are set to the default of the table.

Where can we store this information?

I stored it in the Column objects now in 8fad743


I think it could make sense to let the abstraction gain knowledge about charset and collation to handle these cases better, instead of just putting them in the platformOptions array. But I don’t have experience in the other database systems, so I don’t know if there is a system that all databases agree on there.

Copy link
Member

@morozov morozov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could make sense to let the abstraction gain knowledge about charset and collation to handle these cases better, instead of just putting them in the platformOptions array.

I don't think we're there yet. I see quite some bits in the PR that I don't understand and I don't understand the overall design either. For instance, the property-based comparison in the presence of a specific platform goes again the entire idea of comparing SQL to SQL.

I don't think any platform-specific column parameters should be promoted to the DBAL: handling charsets is done differently in different platforms, and there's no way it could be properly described in an abstract API. It should be the opposite: we expose the platform to the comparator, so if any platform-specific comparison needs to be done, it should delegate it to the platform.

Please let me know if you need help rebasing the PR. It might help if you squash all the commits. This way, the least possible number of merge conflicts will have to be resolved.

@@ -2254,6 +2266,13 @@ public function getColumnDeclarationSQL($name, array $column)
return $name . ' ' . $declaration;
}

public function getColumnSQL(string $name, Column $column): string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the purpose perspective, how is it different from the existing getColumnDeclarationSQL()? I'd rather add the support for Column $column to the existing method.

'WHERE COLLATION_NAME = TABLES.TABLE_COLLATION ' .
') FROM information_schema.TABLES ' .
'WHERE TABLE_SCHEMA = ' . $database . ' AND TABLE_NAME = ' . $table .
'), @@character_set_database) AS DefaultCharacterSet, ' .
Copy link
Member

@morozov morozov Jul 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I read it right, it means that if the charset is not specified on the column, the query will return the default database charset. Does the database work this way?

I.e. if I create a column and don't specify the charset, will it be introspected as NULL? And furthermore, if I change the default charset, will it affect all columns with the default charset? That would be weird. I'd expected it to be copied to the column metadata during creation or maybe modification.

@@ -44,6 +44,9 @@ class Column extends AbstractAsset
/** @var mixed[] */
protected $_platformOptions = [];

/** @var mixed[] */
protected $_platformDefaults = [];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like it belongs here. A column is a value object representing the specification defined by the user or introspected from the database. I'd say it's the responsibility of the platform to represent its defaults if needed for comparison.

* of the column being declared as array indexes.
* See getColumnDeclarationSQL() for more information.
*/
protected function _getColumnArrayFromColumn(Column $column): array
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to declare it private. Also, please do not use the leading underscore for new methods.

$changedProperties = [];

foreach (array_keys(array_merge($properties1, $properties2)) as $key) {
if (($properties1[$key] ?? null) === ($properties2[$key] ?? null)) {
Copy link
Member

@morozov morozov Jul 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we still need to do a property-based comparison while the platform is available? Note that the strict comparison alone will trigger false-positive diffs.

@@ -22,7 +22,7 @@ protected function setUp(): void
parent::setUp();

$this->schemaManager = $this->connection->getSchemaManager();
$this->comparator = new Comparator();
$this->comparator = new Comparator($this->schemaManager->getDatabasePlatform());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not modify existing tests. All existing tests should pass without changes.

*
* @return string[]
*/
public function diffColumn(Column $column1, Column $column2)
public function diffColumn(Column $column1, Column $column2): array
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enforcing the return type on a method is a breaking change, please revert.

*
* @return Column[]
*/
private function stripDefaultValuesFromColumns(Column $column1, Column $column2): array
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stripping properties based on their value may be unnecessarily complex and error-prone. I'd rather use the opposite approach and augment the definition provided by the application with the database defaults.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

augment the definition provided by the application with the database defaults

At which point would you do that? The column doesn’t store a reference to the table nor to the database connection.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. Regardless of whether we strip the defaults or augment the definition, we'll need the DB connection to understand what the omitted options default to. But at the same time, we want to be able to compare pure definitions, w/o any runtime component.

Even if we keep the approach of stripping the defaults, false positives will be possible. E.g. the offline column has its charset explicitly set to the default value, but the online column has it stripped from the definition. Comparing the two will yield a diff, however, it shouldn't.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But at the same time, we want to be able to compare pure definitions, w/o any runtime component.

It seems to be impossible by definition: whether or not the two definitions are different depends on the runtime. Instead of making the comparator dependent on the platform, it may need to depend on the schema manager which can use both the platform and the connection to normalize the offline column definition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we keep the approach of stripping the defaults, false positives will be possible. E.g. the offline column has its charset explicitly set to the default value, but the online column has it stripped from the definition. Comparing the two will yield a diff, however, it shouldn't.

That case is already handled by my current implementation. If one or both columns have platform defaults, these defaults are stripped from both columns (online and offline). This way all comparisons should be possible (offline to offline, online to offline and online to online). If two online columns are compared and they have different platform defaults my current implementation does not strip the options.

@morozov
Copy link
Member

morozov commented Aug 20, 2021

FWIW, I'm close to finishing #4746 which takes the approach from here and reimplements this w/o introducing the undesired API changes. Just need to implement a comparator for SQL Server and decide on the way to rework unit tests which don't have access to a database connection (needed to determine the default database collation for SQL Server).

@morozov
Copy link
Member

morozov commented Aug 23, 2021

Closing in favor of #4746.

@morozov morozov closed this Aug 23, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants