Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Native JSON type instead of LONGTEXT for MariaDB #5916

Merged
merged 4 commits into from Mar 6, 2023

Conversation

cgknx
Copy link
Contributor

@cgknx cgknx commented Feb 9, 2023

Q A
Type improvement
Fixed issues #3202

Summary

Adds a new MariaDb1043Platform with JSON type instead of LONGTEXT to be used for MariaDb versions at or above 10.4.3 when additional functionality specific to JSON columns was added.

Deals with introspection issue flagged in pull request 5100 (the previous attempt to add this functionality) by inversing the JSON to LONGTEXT aliasing on table introspection (if applicable based on platform and MariaDb version).

Tested against MariaDb 10.6.11. See commit message for further details.

Copy link
Member

@derrabus derrabus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this feature.

src/Platforms/MariaDBPlatform.php Outdated Show resolved Hide resolved
src/Schema/MySQLSchemaManager.php Outdated Show resolved Hide resolved
tests/Functional/Types/JsonTest.php Outdated Show resolved Hide resolved
tests/Functional/Types/JsonTest.php Outdated Show resolved Hide resolved
@cgknx cgknx force-pushed the native-mariadb-json branch 2 times, most recently from 246bf28 to 355c62a Compare February 15, 2023 19:10
@cgknx cgknx requested a review from derrabus February 16, 2023 12:09
@cgknx cgknx force-pushed the native-mariadb-json branch 2 times, most recently from 0c46972 to fc88321 Compare February 20, 2023 09:49
*
* @return array{string, string}
*/
public function getColumnTypeSQLSnippets(string $tableAlias = 'c'): array
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this override happen in MariaDb1043Platform instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put it in the base MariaDbPlatform to allow flexibility to choose between JSON and LONGTEXT columns for json types. It calls the parent where LONGTEXT is chosen and the executable comments prevent the additional SQL running on versions of MariaDb that don't support it. In other words, I saw adding the method to the base platform as a flexible way of supporting json columns on all versions of MariaDb. If it's confusing, I can move it to the specific platform (i.e. 1043).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, let's move it. This also allows us to get rid of those conditional comments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will you have time to make this adjustment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. I may have time tomorrow but more likely at the weekend.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the pull request as follows:

  1. I've moved the getColumnTypeSQLSnippets() method to MariaDb1043Platform. Since it made little sense to override certain methods in MariaDBPlatform and others in MariaDb1043Platform, I also moved getListTableColumnsSQL() and getColumnTypeSQLSnippets() meaning there are no changes to the base MariaDBPlatform class.

  2. I added a couple of tests to ensure MariaDb's treatment of collation and character set did not cause problems (MariaDb forces collation to binary and reports it to doctrine so without these patches, the comparator sees a change from the general collation it was expecting and was generating incorrect migrations).

  3. I spent time tracking through where this enforced collation triggered the unnecessary migrations and, as a result, moved the fix from the originally private columnToArray() method to getColumnDeclarationSQL(), the column parameter for which was generated by columnToArray(). This is the method called by columnsEqual() to detect table changes. The amended patch is functionally equivalent to the earlier version. I still so no way of moving this into the schema manager as columnsEqual() is called on the metadata based table and the introspected table and the schema manager doesn't see the metadata based table.

  4. Once the collation issues are dealt with, contrary to what I'd said earlier, no migration is generated when moving from the old to new platforms, i.e. existing json tables are not upgraded to native JSON. This is because the columns are marked with DC2Type comments which override detection of underlying column types i.e. old and new columns appear equal to the comparator. I have added a final patch (it currently has effect only for MariaDb1043 platforms) which also compares the reported database data type where DC2Types are specified. This enables the comparator to detect the LONGTEXT to JSON change. If similar issues occur in future, the functionality could be activated for other platforms by overriding the new AbstractPlatform->columnDeclarationsMatch() method but the relevant platform schema manager would also need patching to detect the schema change and I have only done this for MySQLSchemaManager in the proposed patch.

I have kept the above separate to the original patches to make your review easier (at least, that's what I intended). If you're okay with the proposals, I'd be happy to squash items 1, 2 and 3 into the original patches but I think 4 makes sense kept separate or, perhaps, dropped and users would then either have to manually migrate MariaDb JSON columns or live with the old LONGTEXT declarations for old tables.

Sorry this has taken me a while to amend - it turned out to be rather more involved than I at first thought and I wanted to make sure I had tested it locally before burdening you.

@derrabus
Copy link
Member

Sorry that this PR is taking so long, but my open source time is limited at the moment and I need to get my head around how MariaDB treats JSON columns and how your proposed change reacts to it.

So basically, if I tell MariaDB to create a column of type JSON, it will instead create a LONGTEXT column and add additional constraints to it. And what your patch does is it detects this special kind of LONGTEXT column on schema introspection and reports a JSON column instead. Did I get this right?

@cgknx
Copy link
Contributor Author

cgknx commented Feb 26, 2023

Exactly right. And in addition, MariaDb also forces column collation to utf8mb4_bin irrespective of table or column specification so collation and charset need to be ignored when introspecting a MariaDb JSON column as they will likely not match what the dbal thinks they should be.

@derrabus
Copy link
Member

Okay, got it. The MariaDB folks really chose an odd way of implementing JSON columns. 😓

Assuming an application has created a table with JSON columns using a previous version of DBAL and generates a migration after your patch has been applied, what would happen? The table would have LONGTEXT columns as it had before and the only difference would be that the constraint as been added and the charset would've changed to utf8mb4?

@cgknx
Copy link
Contributor Author

cgknx commented Feb 26, 2023

After the patch, the dbal would detect that schema no longer matched the definition and the generated migration would alter the table from LONGTEXT to JSON (and MariaDb would then automatically add the CHECK constraint and change the column collation).

@derrabus derrabus added this to the 3.7.0 milestone Feb 26, 2023
@derrabus derrabus changed the base branch from 3.6.x to 3.7.x March 2, 2023 19:29
@cgknx cgknx force-pushed the native-mariadb-json branch 2 times, most recently from f2ff1ad to ae4d249 Compare March 4, 2023 17:39
Copy link
Member

@derrabus derrabus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The part where we invalidate the definition of a column that has been inferred from a type comment looks really odd to me.

@greg0ire, I think you've worked on phasing out the type comments lately. Can you have a look at this PR please?

Comment on lines 38 to 40
if ($mariadb && version_compare($this->getMariaDbMysqlVersionNumber($version), '10.4.3', '>=')) {
return new MariaDb1043Platform();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just merged #5946 which introduces a platform class for MariaDB 10.5.2. We need to adjust the inheritance chain accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. About to rebase and push.

Comment on lines 116 to 127
/**
* Whether the database data type matches that expected for the doctrine type
*/
public function columnDeclarationsMatch(Column $column1, ?Column $column2 = null): bool
{
$column2 ??= $column1;

return ! (
$column1->hasPlatformOption('declarationMismatch') ||
$column2->hasPlatformOption('declarationMismatch')
);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code does not appear to be specific to MariaDB at all. We're merely checking for a flag that the schema manager might have set during comparison, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right that the code is generic. Moved to AbstractPlatform and the matching test case to AbstractPlatfformTestCase.

*
* To be implemented in specific platforms as necessary.
*/
public function columnDeclarationsMatch(Column $column1, ?Column $column2 = null): bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from tests: When is this method ever called with one parameter or with the second parameter set to null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Method signature amended to require two parameters.

@greg0ire
Copy link
Member

greg0ire commented Mar 5, 2023

I think you've worked on phasing out the type comments lately. Can you have a look at this PR please?

Yes, it was in #5107

My understanding is that this PR has code to deal with such comment, but does not rely on them, and will work on 4.0.x as well. After merging it up, there will be useless code that should be removed on 4.0.x, correct?

@cgknx
Copy link
Contributor Author

cgknx commented Mar 5, 2023

That sounds right. This bit of the code is only necessary because the DC2Type overrides the underlying LONGTEXT type with JSON which, in this case, is the same as the desired type so the comparator sees JSON to JSON i.e. no change. Without the DC2Type, the comparator would see LONGTEXT to JSON and the migration would be generated, i.e. removing the DC2Type solves the problem rendering part of the patch irrelevant for 4.0.x.

greg0ire
greg0ire previously approved these changes Mar 5, 2023
MariaDb aliases columns specified as JSON to LONGTEXT. Since version 10.4.3,
MariaDb adds a CHECK constraint to JSON columns to ensure they contain valid
JSON and sets the column collation to utf8mb4_bin.

Simply setting JSON as the column type in the platform results in introspection
failures as the database reports LONGTEXT (not JSON) with changed collation.

Therefore, inverse the aliasing where the relevant check constraint exists and
ignore collation changes for JSON columns in the relevant database versions.

The main functional changes are in:
 - MariaDB1043Platform->getListTableColumnsSQL() and
 - MySQLSchemaManager->selectTableColumns()

MariaDb1043Platform extends MariaDb1027Platform rather than MariaDBPlatform
to ensure the platform checks in MySQLSchemaManager (based on
MariaDb1027Platform) continue to work as expected.

The alternative approach of adding the CHECK constraint to the SQL for the
column is not robust as MariaDb syntax seems to require the CHECK constraint to
be the final part of the column declaration and it was not obvious how to
guarantee this (specifically, appending the CHECK constraint to the output of
getJsonTypeDeclarationSQL() did not work).

New public methods:
MariaDBPlatform->getColumnTypeSQLSnippets() generates SQL snippets to
reverse the JSON to LONGTEXT aliasing for JSON-specified columns. It is also
used in MySQLSchemaManager. It still checks that json columns are JSON data
type so that switching the getJsonTypeDeclarationSQL to LONGTEXT is all that
is necessary to revert to old behaviour which is helpful for testing.

Overridden methods in MariaDb1043Platform:
MariaDb1043Platform->getJsonTypeDeclarationSQL(). To return JSON.

MariaDBPlatform->getListTableColumnsSQL(). To reverse the JSON to LONGTEXT
                                           aliasing carried out by MariaDb.

MariaDBPlatform->getColumnDeclarationSQL(). To unset collation and charset
                                            (used by the comparator).

New test cases:
1. Types/JsonTest. Test storage and retrieval of json data.
2. Platforms/MariaDb1043PlatformTest. A clone of MariaDb1027Platform test.
3. Schema/MySQLSchemaManagerTest->testColumnIntrospection(). Ensures
   introspected table matches original table. Tests all doctrine types not
   just json. Based on failure highlighted in pull request doctrine#5100.

For previous discussion on this topic, see:
doctrine#5100
doctrine#3202

Further background at:
https://mariadb.com/kb/en/json-data-type/
https://mariadb.com/kb/en/information-schema-check_constraints-table/
https://jira.mariadb.org/browse/MDEV-13916

Notes regarding JsonTest.php:
1. Some platforms return json as a stream so convert streams to strings.

2. 'json_table' is a reserved word in MySQL (and a function name in
   Oracle SQL). Use json_test_table instead.

3. Some platforms reorder json arrays so the insert order will not
   necessarily match the select order. Resort the arrays before testing
   for equality.
Select MariaDb1043Platform where MariaDb database is at version 10.4.3
or later.
MariaDb sets character set to utf8mb4 and collation to utf8mb4_bin
for columns declared as JSON but still reports them as LONGTEXT
with CHARACTER SET and COLLATE so comparator needs to ignore
character set and collation for columns declared as JSON.

Attempting to set CHARACTER SET or COLLATE for values other than
utf8mb4 and utf8mb4_bin causes a database error on MariaDb.

Test that character set and collation specifications are not passed
to the database and are ignored by the comparator.
MariaDb JSON columns are marked as json with a DC2Type comment hint.

The schema manager maps such columns to a json doctrine type which
matches the metadata whether the column was declared as LONGTEXT (by the
previous MariaDb1027 platform) or JSON. This means the comparator will
not detect that a column should be upgraded from LONGTEXT to native JSON
and no migration would be generated.

Therefore, during column introspection check that the data type of the
introspected columns match that which should be set for any doctrine
types inferred from DC2Type comments. Set a flag (a platformOption
for the relevant column) where it does not.

Check whether the flag is set when the comparator diffs the expected
and actual tables and mark a column type difference where the flag is
set so a migration is generated.
@derrabus derrabus merged commit 42877bf into doctrine:3.7.x Mar 6, 2023
@cgknx cgknx deleted the native-mariadb-json branch March 6, 2023 20:50
@derrabus derrabus mentioned this pull request Mar 6, 2023
derrabus pushed a commit that referenced this pull request Sep 29, 2023
|      Q       |   A
|------------- | -----------
| Type         | bug
| Fixed issues | contao/contao#6409

#### Summary

Since #5916 some column
configurations always show up as changed in the Comparator. For example
a `Types::SIMPLE_ARRAY, ['length' => 255]` column.

This pull request moves the `->expectedDbType()` check to the correct
position to fix this issue
and adds a test that verifies the fix.
@PowerKiKi
Copy link
Contributor

This is a breaking change because MariaDB will throw failed CONSTRAINT if inserting/updating with invalid json value such as an empty string. So it requires updating our application code to always submit a valid JSON.

I realize it sound a bit strange to complain that I cannot store invalid JSON anymore, however the default value of empty string was supported before and was a nice convenience.

@derrabus
Copy link
Member

derrabus commented Nov 2, 2023

I realize it sound a bit strange to complain that I cannot store invalid JSON anymore

This is why I would not consider this as a breaking change. If you declare a column as JSON, storing JSON is covered by the contract of this library. Storing something else is not.

the default value of empty string was supported before and was a nice convenience

Use NULL instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants