Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Fix #DBAL-615 - ALTER with reserved keywords #379

Closed
wants to merge 1 commit into
from

Conversation

Projects
None yet
8 participants
Contributor

Thinkscape commented Sep 26, 2013

PR #302 introduced platform quoting for manipulation of PK, FK and indexes. Alter queries are left unquoted, which means that ALTER with table names that happen to be reserved platform keywords, will fail.
For example:

ALTER TABLE order DROP FOREIGN KEY FK_F5299398D5289B7F;
ALTER TABLE set DROP FOREIGN KEY FK_E61425DC1BF22D93;
ALTER TABLE storage ADD CONSTRAINT FK_547A1B3487068541 FOREIGN KEY (`addressId`) REFERENCES `address` (`id`);

http://www.doctrine-project.org/jira/browse/DBAL-615

After merging changes from this PR, the above together with MySQL platform becomes :

ALTER TABLE `order` DROP FOREIGN KEY FK_F5299398D5289B7F;
ALTER TABLE `set` DROP FOREIGN KEY FK_E61425DC1BF22D93;
ALTER TABLE `storage` ADD CONSTRAINT FK_547A1B3487068541 FOREIGN KEY (`addressId`) REFERENCES `address` (`id`);

Hello,

thank you for creating this pull request. I have automatically opened an issue
on our Jira Bug Tracker for you. See the issue link:

http://www.doctrine-project.org/jira/browse/DBAL-616

We use Jira to track the state of pull requests and the versions they got
included in.

Owner

beberlei commented Sep 26, 2013

We don't auto quote in Platforms. You need to put the quoted data in there yourself. Quoting introduces case-sensitivity issues, which is why we don't do this by default.

@beberlei beberlei closed this Sep 26, 2013

Contributor

Thinkscape commented Sep 27, 2013

@beberlei The problem is with quoting identifiers not data.

And for identifiers, your statement is not true - we do a lot of quoting and quote-trimming

ALTER table produced by schema tool currently ignores QuoteStrategy, while other statements (like adding/dropping indexes) do quote identifiers.

Unless you meant something else... please elaborate.

Contributor

Thinkscape commented Sep 27, 2013

btw: I've given a thought about case-sensitivity - that's why we have (bool)Schema\Table::$_quoted (defined in AbstractAsset) and that's why 2.3 introduced QuoteStrategyInterface to handle this (Platforms already provide identifier quoting facilities).

SchemaTool already uses QuoteStrategies extensively ... however ALTER queries end up having unquoted names regardless of the Strategy (because it's just string concat).

Contributor

Thinkscape commented Sep 28, 2013

I've got a good example to back it up.

Here's what schema-tool:update generated for me:

ALTER TABLE order ADD `datePickup` DATETIME DEFAULT NULL;
ALTER TABLE set DROP FOREIGN KEY FK_E61425DC2E74F1A4;
ALTER TABLE set ADD CONSTRAINT FK_E61425DCF8BF42CD FOREIGN KEY (`orderId`) 
    REFERENCES `order` (`id`) ON DELETE CASCADE;
ALTER TABLE set ADD CONSTRAINT FK_E61425DC2E74F1A4 FOREIGN KEY (`storageId`) 
    REFERENCES `storage` (`id`) ON DELETE SET NULL;

Notice the inconsistencies here - see how referenced columns and tables are quoted automatically (i.e. storage), while the main tables' names are not.

I can also report the exact same issue, Doctrine is ignoring my quotes around the order table when doing alter.

Owner

Ocramius commented Oct 17, 2013

@Thinkscape that's an inconsistency we had for a long time... quoting should be opt-in in DBAL as well

Contributor

Thinkscape commented Oct 17, 2013

@Ocramius hmm.. I get it...

... but why doesn't quoting strategies work as they're supposed to? The moment I've enabled a "quote-all" strategy, all hell broke loose! Schema tool was basically broken, couldn't even compare annotations with db schemas, the whole ORM went berserk and I had dozens of other bugs throughout the code. I've also noticed, that there's tons of quote-unquote-quote with hard-coded stuff like:

 $identifier = trim($identifier, "'`");

I've noticed that ORM keeps quoting and then un-quoting identifiers, depending on what it needs to do at any particular moment. Because of that, if you have a "quote all" strategy, schema comparisons stop working. Apparently it's because, quoting strategies are not used consistently (sometimes they are, sometimes it's a trim like above and sometimes it's manually sniffing for (bool)$field->quoted and such :\ )

Contributor

Thinkscape commented Oct 17, 2013

Simplest reproduce scenario:

  • Download EagerQuoteStrategy.php
  • Use it as quoting strategy
  • Create a schema with tables set (as in set of elements) and order (because we're simulating an e-commerce system)
  • Run schema tool - try to do update, or comparison or something
Member

stof commented Oct 17, 2013

@Thinkscape I would advice you to avoid using reserved keyword as table names (remember that table names don't need to be the same than the entity short class name).

Owner

Ocramius commented Oct 17, 2013

Yeah, reserved keywords = PITA.

As far as it concerns quoting strategies, if we can ALL the quoting opt-in in DBAL then the code in ORM should also be simplified by a lot, and we'd get rid of all these problems as well...

Contributor

Thinkscape commented Oct 17, 2013

@stof I know that. There are also other usage patterns, such as matching container/table names with your model names, which is what's being used in this example. Quoting is a mechanism for properly forming SQL queries regardless of particular platform namespace.

Other example could be something like ITEMS - it might not be a reserved keyword, but in MySQL 6 or newest Oracledb it might become a reserved keyword.

The point is - ORM and/or DBAL should produce platform-neutral queries and not be picky about anything.

As far as it concerns quoting strategies, if we can ALL the quoting opt-in in DBAL then the code in ORM should also be simplified by a lot, and we'd get rid of all these problems as well...

Exactly - as in the example above. Quoting is the best way to ensure that current and future SQL queries will work, regardless of platform and its version. The same applies to quoting data in php, html, json, xml and other formats and meta-formats. If you create any form of abstraction, better to quote it all in the most secure and safe way possible.

Member

stof commented Oct 17, 2013

@Thinkscape the issue is that quoting all identifiers is a way to reduce the platform interoperability of queries, because the behavior of quoted identifiers is a huge mess accross platforms: http://www.alberton.info/dbms_identifiers_and_case_sensitivity.html#.UI_knoYY3YI
This is why the default behavior of the ORM is to avoid the quoting

Contributor

Thinkscape commented Oct 17, 2013

Some examples of edge cases with keywords: The word user is reserved in PostgreSQL among others, authorization, preorder, diagnostics are reserved in TS-SQL, client, names, comparisons are reserved in db2 etc.

(The list changes with each version)

Contributor

Thinkscape commented Oct 17, 2013

This is why the default behavior of the ORM is to avoid the quoting

Hmm.. fair argument. However I could point out the "using mixed-case in table names is PITA" argument. If all your tables were lowercased, there's no problem.

Contributor

Thinkscape commented Oct 17, 2013

Except for case (in)sensitivity, is there any other argument for not-quoting identifiers ?

Owner

Ocramius commented Oct 17, 2013

@Thinkscape usually not, but we've been bitten back by it every.single.time we tried to quote stuff by default.

Member

stof commented Oct 17, 2013

@Thinkscape the issue is that we cannot expect all our users to use lowercased identifiers (if it were the case, we would not have faced so much issues when trying to quote identifiers by default).
The issue with the using mixed-case in table names is PITA argument, is that these people can give the using reserved keywords in table names is PITA argument in response, and we cannot make everyone happy for this.
As mixed cased table names are working in Doctrine currently, changing to the other camp needs to wait until Doctrine 3.0 for BC reasons

Contributor

Thinkscape commented Oct 17, 2013

we cannot make everyone happy for this.

I do agree, but current behavior with implicit quoting "some" identifiers, silent quoting in some queries, while throwing SQL errors in other gave me, and possibly others, a lot of headache... not mentioning hours spent in debugger going line after line in ORM and DBAL.

It's not true that "not quoting" is saving us work. The "not quoting" behavior of doctrine 2.* actually means tens of lines of code more, to make it work!

As mixed cased table names are working in Doctrine currently,

Meh... that's not entirely true.

I.e. for MySQL, it only works if you've got case-insensitive filesystem etc. (as per the article you've mentioned). It's basically relying on a feature-bug, or inconsistency in some DBMS that give silent success with miss-spelled identifier names...

In case you've developed your app on a Mac (with case-insensitive system by default) and then deployed on Linux server (with Ext* which is case-sensitive) you are going to hit a wall. Doctrine is not going to help with that nor should it.

changing to the other camp needs to wait until Doctrine 3.0 for BC reasons

yeah :-(

Both practices are PITA... I've noticed doctrine 2 team has done bigger changes in the past (i.e. I've just notice onUpdate removal among other stuff).

I believe that it's not ORM or DBAL's job to work around bad practices and use feature-bugs. ORM should blindly obey definitions, safely quote values and identifiers, leave "magic" to DBMS. If a user happens to have case-insensitive MySQL 5 running on M$ Windows and uses mixedCapsTableNames without any consistency in userland code --- that is his/her problem.

Marco also mentioned the fact, that removing a lot of this magic quoting and keyword dodging could slim-down the library and enhance performance, which is always good 👍

btw: what's current status on 3.0 ?

Contributor

Thinkscape commented Oct 17, 2013

btw: we can still make the behavior opt-in (configurable) by using something like quoting strategies.... so if someone really really wants to use case insensitivity (or the other way around) he/she should be able to use different strategy. That would mean cleaning up ORM and DBAL to use quoting strategies properly and consistently.

@Thinkscape I realize this is an old discussion, but I wanted to thank you for plugging for a more robust quote strategy for doctrine. In my view its a huge limitation of ORM, and almost reason enough to ditch it for something else. In an age where semantics are (or should be) king, and performance is the new commodity, having to structure your database tables to avoid a litany of reserved words reeks of old-world smell.

Have you succeeded in getting your EagerQuoteStrategy to work with the schema tools?

Contributor

Thinkscape commented Jun 12, 2014

@justin-schroeder Yes, it worked for most operations although it wasn't perfect. I've spend about 10h alone fixing schema-tool (and related classes) which were main offenders and were duplicating quoting mechanics. It goes a bit deeper and there's bits and pieces throughout the whole thing that are inconsistent with each other.

Basically what happened is: classes that were actually with the quoting bug (and someone decided that particular functionality deserves to be fixed) were monkey patched by manually injecting a hard-coded quote char or something like that. Then there were other classes that would then receive the quoted stuff, so they were "unquoting" - yes, they were actually trim()-ing previously concatenated identifier.

rjmunro commented Jul 8, 2014

  1. Always quoting in all major databases is absolutely consistent.
    • It is always case sensitive.
    • MySQL has a bug when run on case insensitive filesystems, where it uses table names as filenames on disc and therefore treats them case insensitively
  2. Not quoting behaves differently in different engines and causes issues with reserved words.
    • Some engines upper case the table names
    • Some lower case the table names

I can see how sometimes when you are quickly writing an sql query in a mysql terminal, you might leave out the quotes to save typing, particularly if you have a keyboard layout where the backticks are hard to reach, but if you are writing any kind of code to talk to a database, particularly a library, missing out the quotes is surely a terrible idea.

The only problem is going to be handling the transition. Maybe something that checks the case of tables in the database after an SQL error and retries.

Member

stof commented Jul 9, 2014

@rjmunro please read the blog post I linked in #379 (comment)

Contributor

Thinkscape commented Jul 9, 2014

@stof the article mentioned there actually proves my points above - not quoting only masks the problem when someone develops an application with mysql on a case-insensitive filesystem and then tries to install it on case-sensitive filesystem.

If you're trying to sell the idea that "not quoting is good because it's more portable", that is basically promoting ambiguity and a "bug-feature" of mysql. Each and every modern RDBS (including sqlite) has identifier quoting implemented and ORM should not be opinionated on any naming style, convention, language or character set (some RDBS allow unicode identifiers such as khoai tây).

ATM unfortunately doctrine 2.* is opinionated, forces this or misbehaves otherwise, which was the primary focus of this PR.

Owner

Ocramius commented Jul 9, 2014

Note that mysql is really not the biggest issue here. IIRC, pgsql reports identifiers with different (inconsistent) casing in different parts of its API.

That's why we didn't get around quoting by default in first place.

Contributor

Thinkscape commented Jul 9, 2014

@Ocramius assuming this is true, can't this be worked around by fetching identifiers from a single (consistent) part of the API ?

There's a lot counter-examples too - as a typical exercise I always recommend configuring D2 with DB2 and trying to use Entity\Name with table names which, surprise surprise, is reserved on this platform ...

Owner

Ocramius commented Jul 12, 2014

@Ocramius assuming this is true, can't this be worked around by fetching identifiers from a single (consistent) part of the API ?

there is none - describe table does what it wants :-)

As for corner-cases, we actually thought about simply creating a no-op quoting strategy that just throws exceptions in case of reserved keywords.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment