Fixed SQL placeholder parsing #113

Merged
merged 1 commit into from Jan 17, 2013

Conversation

Projects
None yet
6 participants
@avit
Contributor

avit commented Feb 14, 2012

I reworked DoctrineSQLParser to fix these problems:

Named placeholder patterns

The parser used a pattern that would match multiple colons, so junk like :fir:stNa::m:e was a "valid" placeholder name.

The parser neglected the underscore so :first_name was invalid and only captured :first.

This is cleaned up so placeholder names are consistent with PHP variable names: "starts with a letter or underscore, followed by any number of letters, numbers, or underscores".

Quote characters in strings

The parser had a simplistic way of matching quote pairs, ignoring that a quote character could be escaped by a backslash inside a string.

It now recognizes 'it\'s a trap?' as a whole literal string instead of breaking out after it\ and parsing the ? as a SQL placeholder.

Speed

The original algorithm looped a regular expression over each character position in the whole statement. The new algorithm handles matching in two steps:

  • One regular expression to select unquoted SQL.
  • One regular expression to match placeholder tokens in the result.

I measured the result to be over 100% faster given the simple strings from the test case (even with 2 new runs added to the test data). The improvement would be greater on longer statements:

Doctrine\DBAL\SQLParserUtils::getPlaceholderPositions
| version:     | invocation count: | total inclusive time: |
| original     | 37                | 5719                  |
| this patch   | 39                | 2380                  |
@stof

This comment has been minimized.

Show comment Hide comment
@stof

stof Feb 14, 2012

Member

@avit will it work if a \ is escaped just before a quote, e.g. "SELECT * FROM foo WHERE bar = 'something\\'" ?

Member

stof commented Feb 14, 2012

@avit will it work if a \ is escaped just before a quote, e.g. "SELECT * FROM foo WHERE bar = 'something\\'" ?

@avit

This comment has been minimized.

Show comment Hide comment
@avit

avit Feb 14, 2012

Contributor

@stof Yes it did work, but it didn't check if the backslashes were paired, so it could be fooled. This addition I just posted is more rigorous about that. (Also: refactored a bit.)

Contributor

avit commented Feb 14, 2012

@stof Yes it did work, but it didn't check if the backslashes were paired, so it could be fooled. This addition I just posted is more rigorous about that. (Also: refactored a bit.)

@stof

View changes

lib/Doctrine/DBAL/SQLParserUtils.php
+ * @param string $statement
+ * @return array
+ */
+ static public function getUnquotedStatementFragments($statement) {

This comment has been minimized.

Show comment Hide comment
@stof

stof Feb 14, 2012

Member

the curly brace should be on its own line

@stof

stof Feb 14, 2012

Member

the curly brace should be on its own line

This comment has been minimized.

Show comment Hide comment
@stof

stof Feb 14, 2012

Member

and does it make sense to use this method in another context ? If no, it should be private

@stof

stof Feb 14, 2012

Member

and does it make sense to use this method in another context ? If no, it should be private

This comment has been minimized.

Show comment Hide comment
@avit

avit Feb 14, 2012

Contributor

Is there a such thing as a static private function? If this is not useful for anything I can just fold it back into getPlaceholderPositions.

@avit

avit Feb 14, 2012

Contributor

Is there a such thing as a static private function? If this is not useful for anything I can just fold it back into getPlaceholderPositions.

This comment has been minimized.

Show comment Hide comment
@stof

stof Feb 14, 2012

Member

static has nothing to do with the visibility. You can use it for public, protected or private methods

@stof

stof Feb 14, 2012

Member

static has nothing to do with the visibility. You can use it for public, protected or private methods

@stof

View changes

tests/Doctrine/Tests/DBAL/SQLParserUtilsTest.php
@@ -28,12 +28,20 @@ static public function dataGetPlaceholderPositions()
array("SELECT '?' FROM foo", true, array()),
array('SELECT "?" FROM foo WHERE bar = ?', true, array(32)),
array("SELECT '?' FROM foo WHERE bar = ?", true, array(32)),
+ array(
+<<<SQLDATA
+SELECT * FROM foo WHERE bar = 'it\\'s a trap? \\\\' OR bar = ?

This comment has been minimized.

Show comment Hide comment
@stof

stof Feb 14, 2012

Member

is it valid ? It seems to me that their is an extra escape here. Should be it\'s, isn't it ?

@stof

stof Feb 14, 2012

Member

is it valid ? It seems to me that their is an extra escape here. Should be it\'s, isn't it ?

This comment has been minimized.

Show comment Hide comment
@avit

avit Feb 14, 2012

Contributor

Yes it's valid. Backslashes are all doubled in PHP strings, in any kind of quotes. The resulting string shows 1 backslash in "it's" and 2 backslashes after "trap?"

@avit

avit Feb 14, 2012

Contributor

Yes it's valid. Backslashes are all doubled in PHP strings, in any kind of quotes. The resulting string shows 1 backslash in "it's" and 2 backslashes after "trap?"

This comment has been minimized.

Show comment Hide comment
@stof

stof Feb 14, 2012

Member

@avit switch it to Nowdoc instead of Heredoc so that you don't need to escape the backslashes. It will make things more readable

@stof

stof Feb 14, 2012

Member

@avit switch it to Nowdoc instead of Heredoc so that you don't need to escape the backslashes. It will make things more readable

This comment has been minimized.

Show comment Hide comment
@avit

avit Feb 14, 2012

Contributor

Cheers. Today I learned...

@avit

avit Feb 14, 2012

Contributor

Cheers. Today I learned...

+ // Quote characters within string literals can be preceded by a backslash.
+ const ESCAPED_SINGLE_QUOTED_TEXT = "'(?:[^'\\\\]|\\\\'|\\\\\\\\)*'";
+ const ESCAPED_DOUBLE_QUOTED_TEXT = '"(?:[^"\\\\]|\\\\"|\\\\\\\\)*"';
+

This comment has been minimized.

Show comment Hide comment
@avit

avit Feb 14, 2012

Contributor

These backslashes are also doubled due to PHP quoting. I could make them "nowdoc" as well, if it reads better. (The indentation is ugly though.)

@avit

avit Feb 14, 2012

Contributor

These backslashes are also doubled due to PHP quoting. I could make them "nowdoc" as well, if it reads better. (The indentation is ugly though.)

@beberlei

This comment has been minimized.

Show comment Hide comment
@beberlei

beberlei Mar 24, 2012

Member

Can you rebase this to master? Then i will gladly merge it.

Member

beberlei commented Mar 24, 2012

Can you rebase this to master? Then i will gladly merge it.

@avit avit referenced this pull request Dec 9, 2012

Closed

Fixed SQL placeholder parsing #216

@avit

This comment has been minimized.

Show comment Hide comment
@avit

avit Jan 6, 2013

Contributor

@beberlei I did rebase this for you at the time last year... If you still want to use this PR, let me know if you need it rebased once more. (I'm not using Symfony anymore, so I haven't been keeping up to date with recent updates.)

Contributor

avit commented Jan 6, 2013

@beberlei I did rebase this for you at the time last year... If you still want to use this PR, let me know if you need it rebased once more. (I'm not using Symfony anymore, so I haven't been keeping up to date with recent updates.)

@mvrhov

This comment has been minimized.

Show comment Hide comment
@mvrhov

mvrhov Jan 7, 2013

IMO this would be a nice addition to the 2.4 ....

mvrhov commented Jan 7, 2013

IMO this would be a nice addition to the 2.4 ....

@beberlei

This comment has been minimized.

Show comment Hide comment
@beberlei

beberlei Jan 7, 2013

Member

@avit i want this PR, i tried rebasing but its not so easy anymore :-( If you could try as well it would be great.

Member

beberlei commented Jan 7, 2013

@avit i want this PR, i tried rebasing but its not so easy anymore :-( If you could try as well it would be great.

@avit

This comment has been minimized.

Show comment Hide comment
@avit

avit Jan 7, 2013

Contributor

@beberlei Please try the freshly rebased code: tests are passing on my end. The only significant change was updating from:

array(':placeholder' => array(22, 44))
# to
array(22 => 'placeholder', 44 => 'placeholder')

I hope it's right.

Contributor

avit commented Jan 7, 2013

@beberlei Please try the freshly rebased code: tests are passing on my end. The only significant change was updating from:

array(':placeholder' => array(22, 44))
# to
array(22 => 'placeholder', 44 => 'placeholder')

I hope it's right.

@avit

This comment has been minimized.

Show comment Hide comment
@avit

avit Jan 8, 2013

Contributor

Another conflict was with the recent 1105261 (Ticket DBAL-389), but it looks like my original regex covered that too. Tests are green, but @roverwolf can verify to be sure?

Contributor

avit commented Jan 8, 2013

Another conflict was with the recent 1105261 (Ticket DBAL-389), but it looks like my original regex covered that too. Tests are green, but @roverwolf can verify to be sure?

@avit

This comment has been minimized.

Show comment Hide comment
@avit

avit Jan 17, 2013

Contributor

@beberlei will you be able to review and merge this soon? I can't keep updating this... I'm forgetting more and more PHP every day! 😉

Contributor

avit commented Jan 17, 2013

@beberlei will you be able to review and merge this soon? I can't keep updating this... I'm forgetting more and more PHP every day! 😉

@beberlei

This comment has been minimized.

Show comment Hide comment
@beberlei

beberlei Jan 17, 2013

Member

@avit yes, will review today then merge. Sorry to keep you waiting os long

Member

beberlei commented Jan 17, 2013

@avit yes, will review today then merge. Sorry to keep you waiting os long

@roverwolf

This comment has been minimized.

Show comment Hide comment
@roverwolf

roverwolf Jan 17, 2013

Contributor

Verifying that this does seem to fix the issue from DBAL-389 as well. Thanks (this seems much cleaner as well).

Contributor

roverwolf commented Jan 17, 2013

Verifying that this does seem to fix the issue from DBAL-389 as well. Thanks (this seems much cleaner as well).

@guilhermeblanco

This comment has been minimized.

Show comment Hide comment
@guilhermeblanco

guilhermeblanco Jan 17, 2013

Member

Patch is good. We can still improve it a bit, removing the else condition, but it can be done later.

Member

guilhermeblanco commented Jan 17, 2013

Patch is good. We can still improve it a bit, removing the else condition, but it can be done later.

guilhermeblanco added a commit that referenced this pull request Jan 17, 2013

Merge pull request #113 from avit/master
Fixed SQL placeholder parsing

@guilhermeblanco guilhermeblanco merged commit f8604e1 into doctrine:master Jan 17, 2013

1 check passed

default The Travis build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment