[DMIG-30] Make :migrations:diff smart and generate Schema object changes, not SQL directly #79

Open
wants to merge 9 commits into
from

Projects

None yet
@tyler-sommer

See: http://www.doctrine-project.org/jira/browse/DMIG-30

I refactored the way DiffCommand generates migrations, moving the SQL generation into Migrations\Generator\SqlGenerator.

I then created PhpGenerator that iterates over the given SchemaDiff and generates PHP code for the migration, using DBAL\Schema and related.

Usage (with symfony and MigrationsBundle): app/console doctrine:migrations:diff --generator=php

There are a couple of hitches, though: dropping indexes and dropping foreign keys.

Table#dropIndex was added in 2.2, but seeing as how migrations currently supports doctrine 2.0 and greater, I didn't want to rely on that.

So, the generator uses Reflection to get the Table$_indexes and Table$_fkConstraints property and then modify them. Terrible, I know, but I don't see a way around it.

I am thinking the generator should simply not support the removing of foreign keys. I've used this code exclusively for a production application for the past few months, and dropping foreign keys in MySQL requires careful ordering of the SQL commands. Modifying the Schema object doesn't really allow you to enforce an order, and much of the time it simply does not work-- complaining about "General error: 1025 Error on rename of ..."

As for dropping of indexes, I was thinking I could check the Doctrine version to see if Table#dropIndex is available, otherwise fallback to the reflection.

Otherwise, I realize it is pretty messy. It needs to be refactored and cleaned up quite a bit. I just wanted some input on how to move forward before I spend more time on it.

Thoughts or suggestions?

@stof
Member
stof commented Jun 28, 2012

you can rely on 2.2 IMO for a refactoring. 2.0 is EOL since several months, and 2.1 will be EOL soon (if it is not already).

@stof stof commented on an outdated diff Jun 28, 2012
lib/Doctrine/DBAL/Migrations/Generator/PhpGenerator.php
+use Doctrine\DBAL\Migrations\Configuration\Configuration;
+use Doctrine\DBAL\Schema\Comparator;
+use Doctrine\DBAL\Schema;
+use Doctrine\DBAL\Types\Type;
+
+/**
+ * Generates migrations using PHP-native DBAL methods
+ *
+ * @author Tyler Sommer <sommertm@gmail.com>
+ */
+class PhpGenerator implements GeneratorInterface
+{
+ /**
+ * @var Doctrine\DBAL\Migrations\Configuration\Configuration
+ */
+ protected $_configuration;
@stof
stof Jun 28, 2012 Member

please don't use the _ for protected and private properties. This is a coding standard inherited from PHP 4 and Doctrine stopped using long ago (even if code written before is still using it because of BC)

@stof stof commented on an outdated diff Jun 28, 2012
lib/Doctrine/DBAL/Migrations/Generator/PhpGenerator.php
+ $options['autoincrement'] = $column->getAutoincrement();
+ }
+
+ if ($column->getComment() !== null) {
+ $options['comment'] = $column->getComment();
+ }
+
+ return $options;
+ }
+
+ /**
+ * @param \Doctrine\DBAL\Schema\AbstractAsset $asset
+ *
+ * @return string
+ */
+ protected function _getQuotedIdentifier(Schema\AbstractAsset $asset)
@stof
stof Jun 28, 2012 Member

please don't use the underscore for protected methods

@stof stof and 1 other commented on an outdated diff Jun 28, 2012
lib/Doctrine/DBAL/Migrations/Generator/PhpGenerator.php
+ /**
+ * Constructor
+ *
+ * @param Doctrine\DBAL\Migrations\Configuration\Configuration A Migration configuration
+ */
+ public function __construct(Configuration $configuration)
+ {
+ $this->_configuration = $configuration;
+
+ $reflected = new \ReflectionClass('Doctrine\DBAL\Schema\AbstractAsset');
+ $this->_assetQuotedProperty = $reflected->getProperty('_quoted');
+ $this->_assetQuotedProperty->setAccessible(true);
+
+ $reflected = new \ReflectionClass('Doctrine\DBAL\Schema\ForeignKeyConstraint');
+ $this->_foreignKeyOptionsProperty = $reflected->getProperty('_options');
+ $this->_foreignKeyOptionsProperty->setAccessible(true);
@stof
stof Jun 28, 2012 Member

do you really need to use reflection here ? It is clearly a hack in this place.

@tyler-sommer
tyler-sommer Jun 28, 2012

@stof It seems they are necessary. Both properties are needed to recreate the changes in the Schema, but neither are exposed by any method.

The AbstractAsset$_quoted property is necessary to know if a column or table name should be quoted.

ForeignKeyConstraint$_options is necessary to create a new foreign key. The only way to retrieve an option is to know its name and use ForeignKeyConstraint#getOption -- and thats just for a single option. Is there a comprehensive list of available options?

Both could be easily solved in DBAL, but that would make this non-BC for everything except the next version of DBAL and later.

Thanks for your feedback. It's very much appreciated!

@stof
stof Jun 28, 2012 Member

then please send a PR on DBAL to make them accessible in 2.3

@jwage
Member
jwage commented Jun 28, 2012

This is awesome. I've always wanted to do this!

@beberlei
Member

coolcoolcool :-)

@stof stof commented on an outdated diff Jun 28, 2012
lib/Doctrine/DBAL/Migrations/Generator/PhpGenerator.php
+ $column->getType()->getName(),
+ $this->exportVar($this->getColumnOptions($column))
+ );
+ }
+
+ /**
+ * @param \Doctrine\DBAL\Schema\Index $index
+ *
+ * @return string
+ */
+ protected function getCreateIndexCode(Schema\Index $index)
+ {
+ if ($index->isPrimary()) {
+ $str = '$table->setPrimaryKey(%s, \'%s\');';
+ }
+ else if ($index->isUnique()) {
@stof
stof Jun 28, 2012 Member

else if should be on the same line than the closing curly brace

@stof stof commented on the diff Jun 28, 2012
lib/Doctrine/DBAL/Migrations/Generator/PhpGenerator.php
+ /**
+ * @param \Doctrine\DBAL\Schema\ForeignKeyConstraint $foreignKey
+ *
+ * @return array
+ */
+ protected function getForeignKeyOptions(Schema\ForeignKeyConstraint $foreignKey)
+ {
+ return $this->foreignKeyOptionsProperty->getValue($foreignKey);
+ }
+
+ /**
+ * @param \Doctrine\DBAL\Schema\Column $column
+ *
+ * @return array
+ */
+ protected function getColumnOptions(Schema\Column $column) {
@stof
stof Jun 28, 2012 Member

the curly brace should be on its own line

@stof
Member
stof commented Jun 28, 2012

I see a weird stuff here: the generated code is not a migration class, meaning this migration cannot be applied like others, and cannot be rollbacked.

@tyler-sommer

@stof Code is generated for both the up and down methods, and then applied to the template defined in GenerateCommand.

Thanks for all of your notes. I'll take care of the things you've pointed out as well as clean everything up a bit as I have time.

@stof
Member
stof commented Jun 28, 2012

ah yeah, indeed. I missed the way it was integrated in the command. This is great (and I think we could make it the default generation as it makes the generated migrations cross-platform)

@tyler-sommer

@stof After digging in a bit more, it seems that as of DBAL 2.2.0, removing of foreign keys is supported. However, dropping of indexes isn't going to be supported until 2.3. What would you advise as I move forward?

Should I leave the "reflection hack" as it currently is, and deal with these things once 2.3 comes out?
This would maintain support for 2.0 and up, but leave lots of nasty hacked code around.

Should I throw an exception if the PhpGenerator is used with anything less than DBAL 2.2 and leave the "reflection hack" in for only dropping of indexes?
This would still leave a little bit of ugly hacked code in generated code.

Should we simply wait to add this functionality until DBAL 2.3 is released?
This would allow proper implementation using only the exposed DBAL apis, without any hacking.

I'm leaning toward the last one, since that is the cleanest and really only complete solution.

@stof
Member
stof commented Jul 4, 2012

@beberlei what do you think about it ?

@fran6co
fran6co commented Oct 22, 2012

2.3 is out, are the version concerns still valid?

@tyler-sommer

@Fran6co Good to see some interest. I've been caught up with other work, but I'll try to find some time to dig into this this weekend and see where it stands with 2.3.

@fran6co
fran6co commented Nov 7, 2012

@tyler-sommer Awesome!

@tyler-sommer

A bit late, but here we are. The PhpGenerator requires 2.3.0 or later. I still need to write tests for it.

Side note, on contributing, I did not rebase or merge my local repository with the upstream master. Is this correct?

@stof
Member
stof commented Nov 15, 2012

@tyler-sommer Github tells me the branch conflicts with the current master branch, so rebasing would be great.

@tyler-sommer

@stof I think that did it. Would you mind taking a look to see if I broke everything?

@jrjohnson

This is some great work! I applied as a path and it saved me a couple of tedious hours. Thanks!
Any thing I can do to help get this merged?

@Baachi
Baachi commented Apr 16, 2013

@tyler-sommer What is the status of this PR?

@hason
hason commented May 21, 2013

👍

@DavidMikeSimon

👍 This feature is essential for using migrations in test environments.

@evillemez

@tyler-sommer Yeah, this is awesome and badly needed... any news?

@evillemez evillemez referenced this pull request Mar 26, 2014
Closed

[WIP] 2.0 Prototype #162

@mikeSimonson mikeSimonson added the 3.0 label Feb 14, 2015
@mikeSimonson mikeSimonson modified the milestone: 1.1 Feb 15, 2015
@stof
Member
stof commented Jun 30, 2015

I'm actually -1 on this change. Changing a schema object and then executing generated SQL update queries is totally unsafe, which is exactly why the schema:update command of the ORM is considered unsafe for prod.
Unless your database is empty, it is very rare to be able to execute the generated schema changes directly, because you don't only need to migrate the schema, but also the data. And data migrations will almost always need to be performed in the middle of the schema migration rather than at the end.

If you want to have DB-agnostic migrations, a better alternative would be to generate code building the SQL with the Platform API (instead of calling the platform at generation time and writing the output SQL in the migration), which would still allow to insert extra queries in between (though it might still be hard in some cases, and you would have to take care of making the data migration DB-agnostic too).

So 👎 for that

@mikeSimonson
Collaborator

@stof I have not reviewed that PR yet, but doesn't your remark also apply to the way doctrine migrations behave now ?

@stof
Member
stof commented Jun 30, 2015

@mikeSimonson no, because the current system generates a bunch of SQL queries and then give you the power to edit the generated migrations the way you want, inserting SQL queries anywhere between the generated ones

@felicitus

I'm really looking forward to see this feature. As an application developer where users download and install my application, I want to be able to support as many databases as possible. DoctrineORM already gives me the power to write code once and mostly ignoring database differences.

With Doctrine Migrations, I would need to write and test migrations for each database platform, even for platforms which I can't test due to licensing etc.

@stof
Member
stof commented Aug 2, 2015

@felicitus see my comment in #79 (comment) The way the generation works in this PR suffers from many drawbacks.

And you will still have to test your migrations on supported platforms, because your migrations need to migrate data, not only the schema (if you can afford dropping all prod data before migrating, just use the schema:update command of the ORM and it will work on all supported platforms. But real systems cannot afford migrating only empty databases). And the migration of data will always be queries written by you.

@tyler-sommer

For what it's worth, after having used Doctrine and Migrations in production for several years now, I have to agree with @stof.

I've found that the abstraction offered by the Schema API is not sufficient for a lot of migrations. In many cases, migrations are not simply "add a column" or "remove a column". Migrations often require the creating, copying, and modifying of data during the migration. These kind of operations cannot be abstracted by DBAL alone, and so must be handwritten in the necessary SQL dialect.

I'm not necessarily against trying to get this merged, as I think it is still useful in some scenarios (like generating initial migrations for brand new deployments). That being said, I can't personally commit to getting this PR polished for merging in any sort of near timeframe. I'd be more than happy to give someone else access to my fork if they want to take up the endeavour-- or you could fork it yourself and open a new PR.

Cheers!

@lrlopez
lrlopez commented Dec 17, 2015

Hi guys,

I've been thinking about this PR for some time. Here are my thoughts: What @tyler-sommer proposes is an alternative method of generating migrations, not a replacement of the classic one (platform specific SQL).

I agree with @stof, this is no magic bullet that allows automatic and platform agnostic migrations, but neither is the platform specific SQL currently generated. Developers should know that eventually some data must be changed in order to migrate, and that code must be written by hand. Many of these data migration operations can be written in a ANSI-SQL standard way so they can work in most (if not all) database engines. Some won't, but in this cases using one or many platform specific SQL queries prepended with a $this->connection->getDatabasePlatform()->getName() switch would be needed anyway.

As @tyler-sommer said on his last reply, this PR would be quite useful in some scenarios. I.e.: Translating ALTER TABLE sentences into addColumn() or addIndex() by hand is tedious and error-prone. The new generator could be a really good starting point, which could be consequently tuned by the developer.

Could you reconsider to merge this PR so we can have the choice? Actual way of migrating would be still the default one.

If @tyler-sommer don't have time to invest into it I could lend a hand into polishing.

Thanks!

@stof
Member
stof commented Dec 30, 2015

@lrlopez The issue is that using Schema changes forbids the user to provide custom SQL migrating the data in the middle of the Schema changes. And after years of using DB migrations, I can tell that at least 99% of my data migrations had to happen in the middle of the schema change, not after the schema change.

What you could do though is to generate migrations which would use the Platform object at runtime to generate some SQL and then call addSql (instead of using the Platform object at generation time). This would still allow to insert queries to migrate data in the middle of schema changes (but would still not allow to alter schema changes themselves, which is sometimes necessary to make part of it before the data change and part of it after, especially on stricter platforms like PostgreSQL).

But I will still continue to vote -1 on merging the current PR, as it can only hurt people (it works only for people migrating empty tables)

@lrlopez
lrlopez commented Dec 30, 2015

Christophe, I see your point. I agree most schema changes need data migrations as well, but there are still some valid use cases that makes this option useful (i.e. new tables with no data, new indices, new columns with default values, etc.)

Developers should be aware of the risks of using this kind of migrations in a project. I see some resemblance with the schema:update command you mentioned: it's not safe for production environments, but if the dev want to go ahead and apply an update in their prod server despite the warning, then it is their problem.

Would you agree to merge this if we add a warning prompt before generating platform independent migrations? Something along these lines:

Warning: This operation does not take into account potential data migrations needed for the schema changes. The generated code should not be executed in a production environment unless it is reviewed by the developer. Use at your own risk!

We could also add some remarks to the generated code if you wish.

For simple migrations, auto-generated DB-agnostic code would be a time saver. For more complex ones, you can always use traditional approach.

Please reconsider merging this with a warning prompt, 🚨, alarm klaxons or whatever you like 😄

Happy New Year!

@stof
Member
stof commented Jan 5, 2016

Would you agree to merge this if we add a warning prompt before generating platform independent migrations? Something along these lines:

Warning: This operation does not take into account potential data migrations needed for the schema changes. The generated code should not be executed in a production environment unless it is reviewed by the developer. Use at your own risk!

This will not help: the developer cannot review the changes, as they cannot see the SQL. Your message is something suited for the existing generation (which is not safe for data changes either if you don't modify it)

For simple migrations, auto-generated DB-agnostic code would be a time saver. For more complex ones, you can always use traditional approach.

As I said previously, generating DB-agnostic code should be done by generating code using the Platform object at runtime rather than doing the whole schema diffing at runtime. This would still not be 100% flexible (as it does not give the full SQL to the developer), but would allow to hook between any such call (and to change these calls).

Using the Schema in the migration means you are actually running schema:update (as using the Schema is exactly what it does).

On a side note, using the Schema is also slow as hell (we are currently working on avoid doing Schema diffs when not needed, leading to a 95% perf increase when running migrations), so I would not encourage it. A Platform-based migration would not suffer from this penalty (once the perf improvments are merged)

@mikeSimonson mikeSimonson removed the 1.1 label Jan 8, 2016
@lrlopez
lrlopez commented Jan 9, 2016

Ok, last try in the search of a compromise! What if we let the user "enhance" the current platform-based generator? Currently its output consists on blocks of SQL sentences. We could precede each block with a PHP comment that includes the equivalent DBAL\Schema operation using the @tyler-sommer code. The behaviour could be opt-in using a command line switch if you want. It would not have any runtime penalties when running migrations as the code is commented out.

This way, developers would know what action does every particular bundle of addSql operations and have the chance insert data migrations queries in between if needed.

In those cases where no data migration is needed, developers could just strip the SQL sentences and uncomment the operation to get platform independent code, at their own risk.

This is my last bullet. If you still don't like the idea I'll let it go, I promise.

Disclaimer: I'm working on an (soon to be) open-source project which should be able to run in any database platform supported by Doctrine. Up to now, migrations have been "hand-crafted" every time a new column, table or index has been added to the schema or renamed, which is error prone and time consuming (in fact I've already made a couple of mistakes in the past). I cannot afford generating platform specific code for all the platforms and it would be a shame to restrict the deployment to a few of them. That is the reason I'm pushing this PR so hard, I really don't want to annoy anyone 😁

@driesvints driesvints referenced this pull request in laravel-doctrine/migrations Feb 10, 2016
Closed

Schema builder commands from diff tool #21

@stof
Member
stof commented Feb 17, 2016

Well, if you need platform-independent migrations, you can work on a PR to generate migrations using ->addSql($platform->*). but the current PR generates unusable migrations, so it has no chance to be merged IMO. You need a different approach (and I can tell you that I won't be the one spending time on such a generator as I think it is also flawed, even though it is less flawed than this PR).

and note that even the approach using the platform won't work that well as soon as you have data to migrate. For instance, adding a new non-nullable column in Postgresql requires to modify the SQL generated by the Platform, to split it in 2 with a data migration query between them (adding a nullable column, then populating it for existing rows, and then making it non-nullable). I think at most 10% of the migrations of my work project (using Postgresql) are using SQL generated by Doctrine directly (and maybe even less than that). All others required some tweaks to the generated SQL.

And writing platform-independent queries for the data migration part will still be a pain for you, as Doctrine won't be able to help you much there.

your idea of putting comments about the Schema operation is even worse than the current PR IMO: any change done in the Schema is applied after any query added through addSql (the Schema is compared at the end of the method and then calls addSql too, so adding its queries at the end). This means that the developer would have to uncomment all or nothing, but cannot uncomment only a single operation in the middle (and then, they suffer from all flaws of this PR when they uncomment it)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment