Fixed SQL placeholder parsing #113

Merged
merged 1 commit into from about 1 year ago

6 participants

Andrew Vit Christophe Coevoet Benjamin Eberlei Miha Vrhovnik David Ward Guilherme Blanco
Andrew Vit

I reworked DoctrineSQLParser to fix these problems:

Named placeholder patterns

The parser used a pattern that would match multiple colons, so junk like :fir:stNa::m:e was a "valid" placeholder name.

The parser neglected the underscore so :first_name was invalid and only captured :first.

This is cleaned up so placeholder names are consistent with PHP variable names: "starts with a letter or underscore, followed by any number of letters, numbers, or underscores".
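
As a rough illustration of the naming rule, the sketch below runs the pattern from the patch's NAMED_TOKEN constant over two inputs. The helper `findNamedTokens()` and the sample SQL are hypothetical and are not part of Doctrine.

```php
<?php
// Pattern copied from the NAMED_TOKEN constant in this patch.
$namedToken = ':[a-zA-Z_][a-zA-Z0-9_]*';

// Hypothetical helper, for illustration only: list every named token in $sql.
function findNamedTokens($pattern, $sql)
{
    preg_match_all("/$pattern/", $sql, $matches);

    return $matches[0];
}

// The underscore is part of the name, so the whole token is captured.
$underscore = findNamedTokens($namedToken, 'SELECT * FROM users WHERE name = :first_name');

// A colon can only start a name, never continue one, and a colon that is
// not followed by a letter or underscore matches nothing.
$junk = findNamedTokens($namedToken, ':fir:stNa::m:e');

print_r($underscore);
print_r($junk);
```

With this pattern the junk input no longer counts as one name: it splits into separate short tokens, and the doubled colon contributes nothing.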

Quote characters in strings

The parser had a simplistic way of matching quote pairs, ignoring that a quote character could be escaped by a backslash inside a string.

It now recognizes 'it\'s a trap?' as a whole literal string instead of breaking out after it\ and parsing the ? as a SQL placeholder.
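
A minimal sketch of that behavior, using the same regular expression as the patch's ESCAPED_SINGLE_QUOTED_TEXT constant. The sample statement is an assumption made up for this example.

```php
<?php
// Same regex as ESCAPED_SINGLE_QUOTED_TEXT in this patch
// (the backslashes are doubled because of PHP double-quote escaping).
$singleQuoted = "'(?:[^'\\\\]|\\\\'|\\\\\\\\)*'";

// A nowdoc keeps the backslash exactly as written.
$sql = <<<'SQL'
SELECT * FROM foo WHERE bar = 'it\'s a trap?' OR bar = ?
SQL;

// The first match is the complete literal, escaped quote included.
preg_match("/$singleQuoted/", $sql, $match);

echo $match[0], "\n";
```

Because the whole literal is consumed as one unit, the `?` inside it never reaches the placeholder matcher.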

Speed

The original algorithm looped a regular expression over each character position in the whole statement. The new algorithm handles matching in two steps:

  • One regular expression to select unquoted SQL.
  • One regular expression to match placeholder tokens in the result.
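
The two steps can be sketched as a standalone function. This is a simplified version for positional placeholders only; the function name `positionalPlaceholderPositions()` is hypothetical, while the regular expressions are the ones from the patch.

```php
<?php
// Hypothetical standalone sketch of the two-step approach (positional "?" only).
function positionalPlaceholderPositions($statement)
{
    $single = "'(?:[^'\\\\]|\\\\'|\\\\\\\\)*'";
    $double = '"(?:[^"\\\\]|\\\\"|\\\\\\\\)*"';

    // Step 1: capture each run of unquoted SQL, optionally followed by one
    // quoted literal that is consumed but not captured.
    preg_match_all("/([^'\"]+)(?:$single|$double)?/s", $statement, $fragments, PREG_OFFSET_CAPTURE);

    $positions = array();
    foreach ($fragments[1] as $fragment) {
        // Step 2: find placeholder tokens inside the unquoted fragment.
        preg_match_all('/\?/', $fragment[0], $matches, PREG_OFFSET_CAPTURE);
        foreach ($matches[0] as $placeholder) {
            // Offset within the fragment plus offset of the fragment itself.
            $positions[] = $placeholder[1] + $fragment[1];
        }
    }

    return $positions;
}

print_r(positionalPlaceholderPositions("SELECT '?' FROM foo WHERE bar = ?"));
```

The sample statement is one of the existing rows in the test data, which expects the single position 32.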

I measured the new code to be more than twice as fast on the simple strings from the test case, even with 2 new runs added to the test data (total inclusive time fell from 5719 to 2380). The improvement should be greater on longer statements:

Doctrine\DBAL\SQLParserUtils::getPlaceholderPositions
| version    | invocation count | total inclusive time |
| original   | 37               | 5719                 |
| this patch | 39               | 2380                 |

Christophe Coevoet

@avit will it work if a \ is escaped just before a quote, e.g. "SELECT * FROM foo WHERE bar = 'something\\'" ?

Andrew Vit

@stof Yes it did work, but it didn't check if the backslashes were paired, so it could be fooled. This addition I just posted is more rigorous about that. (Also: refactored a bit.)
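
To illustrate the pairing rule, the sketch below applies the single-quote pattern from the patch to two made-up statements. The helper `firstLiteral()` and both statements are assumptions for this example only.

```php
<?php
// Same regex as ESCAPED_SINGLE_QUOTED_TEXT in this patch.
$singleQuoted = "'(?:[^'\\\\]|\\\\'|\\\\\\\\)*'";

// Hypothetical helper: return the first single-quoted literal found in $sql.
function firstLiteral($pattern, $sql)
{
    return preg_match("/$pattern/", $sql, $m) ? $m[0] : null;
}

// Two backslashes form a pair, so the quote after them closes the literal
// and the "?" is left outside it.
$pairedSql = <<<'SQL'
SELECT * FROM foo WHERE bar = 'something\\' AND baz = ?
SQL;

// Three backslashes are one pair plus an escaped quote, so the literal
// keeps going until the final quote and swallows the "?".
$oddSql = <<<'SQL'
SELECT * FROM foo WHERE bar = 'something\\\' AND baz = ?'
SQL;

$paired = firstLiteral($singleQuoted, $pairedSql);
$odd    = firstLiteral($singleQuoted, $oddSql);

echo $paired, "\n", $odd, "\n";
```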

lib/Doctrine/DBAL/SQLParserUtils.php
@@ -176,4 +173,22 @@ static public function expandListParameters($query, $params, $types)
 
         return array($query, $params, $types);
     }
+
+    /**
+     * Slice the SQL statement around pairs of quotes and
+     * return string fragments of SQL outside of quoted literals.
+     * Each fragment is captured as a 2-element array:
+     *
+     * 0 => matched fragment string,
+     * 1 => offset of fragment in $statement
+     *
+     * @param string $statement
+     * @return array
+     */
+    static public function getUnquotedStatementFragments($statement) {

Christophe Coevoet
stof added a note

the curly brace should be on its own line

Christophe Coevoet
stof added a note

And does it make sense to use this method in another context? If not, it should be private.

Andrew Vit
avit added a note

Is there such a thing as a static private function? If this is not useful for anything I can just fold it back into getPlaceholderPositions.

Christophe Coevoet
stof added a note

static has nothing to do with the visibility. You can use it for public, protected or private methods
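
A small sketch of that point: `static` and the visibility keyword are independent modifiers, and PHP accepts them in either order. The class `VisibilityDemo` and its methods are hypothetical.

```php
<?php
// Hypothetical class, for illustration only.
class VisibilityDemo
{
    // Public entry point that delegates to a private static helper.
    public static function shout($text)
    {
        return self::upper($text) . '!';
    }

    // "static private" (as written in the patch) and "private static"
    // declare the same thing: a static method callable only inside the class.
    static private function upper($text)
    {
        return strtoupper($text);
    }
}

echo VisibilityDemo::shout('merged'), "\n";
```

Calling `VisibilityDemo::upper()` from outside the class would fail with a visibility error, which is the behavior wanted for a helper that has no use in another context.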

tests/Doctrine/Tests/DBAL/SQLParserUtilsTest.php
@@ -28,12 +28,20 @@ static public function dataGetPlaceholderPositions()
             array("SELECT '?' FROM foo", true, array()),
             array('SELECT "?" FROM foo WHERE bar = ?', true, array(32)),
             array("SELECT '?' FROM foo WHERE bar = ?", true, array(32)),
+            array(
+<<<SQLDATA
+SELECT * FROM foo WHERE bar = 'it\\'s a trap? \\\\' OR bar = ?

Christophe Coevoet
stof added a note

Is it valid? It seems to me that there is an extra escape here. It should be it\'s, shouldn't it?

Andrew Vit
avit added a note

Yes it's valid. Backslashes are all doubled in PHP strings, in any kind of quotes. The resulting string shows 1 backslash in "it's" and 2 backslashes after "trap?"

Christophe Coevoet
stof added a note

@avit switch it to Nowdoc instead of Heredoc so that you don't need to escape the backslashes. It will make things more readable

Andrew Vit
avit added a note

Cheers. Today I learned...
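
For readers who have not used Nowdoc, here is a short sketch of the difference being discussed. The fragment is a shortened version of the test string; the variable names are made up for the example.

```php
<?php
// Heredoc: backslash escapes are processed, so each literal backslash
// has to be written twice.
$heredoc = <<<SQLDATA
bar = 'it\\'s a trap? \\\\'
SQLDATA;

// Nowdoc (quoted identifier): nothing is processed, so the text is
// written exactly as it should appear.
$nowdoc = <<<'SQLDATA'
bar = 'it\'s a trap? \\'
SQLDATA;

// Both variables hold the same string.
var_dump($heredoc === $nowdoc);
```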

Andrew Vit avit commented on the diff
lib/Doctrine/DBAL/SQLParserUtils.php
@@ -32,6 +32,13 @@
  */
 class SQLParserUtils
 {
+    const POSITIONAL_TOKEN = '\?';
+    const NAMED_TOKEN      = ':[a-zA-Z_][a-zA-Z0-9_]*';
+
+    // Quote characters within string literals can be preceded by a backslash.
+    const ESCAPED_SINGLE_QUOTED_TEXT = "'(?:[^'\\\\]|\\\\'|\\\\\\\\)*'";
+    const ESCAPED_DOUBLE_QUOTED_TEXT = '"(?:[^"\\\\]|\\\\"|\\\\\\\\)*"';
+

Andrew Vit
avit added a note

These backslashes are also doubled due to PHP quoting. I could make them "nowdoc" as well, if it reads better. (The indentation is ugly though.)
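
To make the doubling concrete, this sketch compares the double-quoted literal from the patch with the same expression written as a nowdoc. The variable names are hypothetical; the pattern is copied from ESCAPED_SINGLE_QUOTED_TEXT.

```php
<?php
// Double-quoted PHP literal, as written in the patch.
$fromLiteral = "'(?:[^'\\\\]|\\\\'|\\\\\\\\)*'";

// The same regular expression as a nowdoc, with no PHP-level escaping:
// this is what the regex engine actually receives.
$fromNowdoc = <<<'REGEX'
'(?:[^'\\]|\\'|\\\\)*'
REGEX;

var_dump($fromLiteral === $fromNowdoc);
echo $fromLiteral, "\n";
```

At the regex level, `\\` stands for one literal backslash, so the three alternatives read: any character other than a quote or backslash, a backslash followed by a quote, or a pair of backslashes.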

Benjamin Eberlei
Owner

Can you rebase this onto master? Then I will gladly merge it.

Andrew Vit

@beberlei I did rebase this for you at the time last year... If you still want to use this PR, let me know if you need it rebased once more. (I'm not using Symfony anymore, so I haven't been keeping up to date with recent updates.)

Miha Vrhovnik

IMO this would be a nice addition to the 2.4 ....

Benjamin Eberlei
Owner

@avit I want this PR. I tried rebasing, but it's not so easy anymore :-( If you could try as well, it would be great.

Andrew Vit

@beberlei Please try the freshly rebased code: tests are passing on my end. The only significant change was updating from:

array(':placeholder' => array(22, 44))
# to
array(22 => 'placeholder', 44 => 'placeholder')

I hope it's right.
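
The relationship between the two shapes can be sketched with a small conversion. The function `toPositionKeyedMap()` is hypothetical and only illustrates the format change; it is not code from the PR.

```php
<?php
// Hypothetical helper: convert the old name-keyed shape
// (':name' => list of positions) into the new position-keyed shape
// (position => 'name', without the leading colon).
function toPositionKeyedMap(array $byName)
{
    $byPosition = array();
    foreach ($byName as $name => $positions) {
        foreach ($positions as $position) {
            $byPosition[$position] = ltrim($name, ':');
        }
    }
    // Keep entries in statement order.
    ksort($byPosition);

    return $byPosition;
}

print_r(toPositionKeyedMap(array(':placeholder' => array(22, 44))));
```

In the position-keyed shape a repeated name simply appears under each position where it occurs.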

Andrew Vit

Another conflict was with the recent 1105261 (Ticket DBAL-389), but it looks like my original regex covered that too. Tests are green, but @roverwolf can verify to be sure?

Andrew Vit

@beberlei will you be able to review and merge this soon? I can't keep updating this... I'm forgetting more and more PHP every day! :wink:

Benjamin Eberlei
Owner

@avit yes, will review today then merge. Sorry to keep you waiting so long.

David Ward

Verifying that this does seem to fix the issue from DBAL-389 as well. Thanks (this seems much cleaner as well).

Guilherme Blanco

Patch is good. We can still improve it a bit, removing the else condition, but it can be done later.

Guilherme Blanco guilhermeblanco merged commit f8604e1 into from
Guilherme Blanco guilhermeblanco closed this

Showing 1 unique commit by 1 author.

Jan 07, 2013
Andrew Vit Fix SQL placeholder parsing for consistent names and escaped literals 153b752
lib/Doctrine/DBAL/SQLParserUtils.php (53 changes)
@@ -32,6 +32,13 @@
  */
 class SQLParserUtils
 {
+    const POSITIONAL_TOKEN = '\?';
+    const NAMED_TOKEN      = ':[a-zA-Z_][a-zA-Z0-9_]*';
+
+    // Quote characters within string literals can be preceded by a backslash.
+    const ESCAPED_SINGLE_QUOTED_TEXT = "'(?:[^'\\\\]|\\\\'|\\\\\\\\)*'";
+    const ESCAPED_DOUBLE_QUOTED_TEXT = '"(?:[^"\\\\]|\\\\"|\\\\\\\\)*"';
+
     /**
      * Get an array of the placeholders in an sql statements as keys and their positions in the query string.
      *
@@ -49,27 +56,18 @@ static public function getPlaceholderPositions($statement, $isPositional = true)
             return array();
         }
 
-        $count = 0;
-        $inLiteral = false; // a valid query never starts with quotes
-        $stmtLen = strlen($statement);
+        $token = ($isPositional) ? self::POSITIONAL_TOKEN : self::NAMED_TOKEN;
         $paramMap = array();
-        for ($i = 0; $i < $stmtLen; $i++) {
-            if ($statement[$i] == $match && !$inLiteral && ($isPositional || $statement[$i+1] != '=')) {
-                // real positional parameter detected
+
+        foreach (self::getUnquotedStatementFragments($statement) as $fragment) {
+            preg_match_all("/$token/", $fragment[0], $matches, PREG_OFFSET_CAPTURE);
+            foreach ($matches[0] as $placeholder) {
                 if ($isPositional) {
-                    $paramMap[$count] = $i;
+                    $paramMap[] = $placeholder[1] + $fragment[1];
                 } else {
-                    $name = "";
-                    // TODO: Something faster/better to match this than regex?
-                    for ($j = $i + 1; ($j < $stmtLen && preg_match('(([a-zA-Z0-9_]{1}))', $statement[$j])); $j++) {
-                        $name .= $statement[$j];
-                    }
-                    $paramMap[$i] = $name; // named parameters can be duplicated!
-                    $i = $j;
+                    $pos = $placeholder[1] + $fragment[1];
+                    $paramMap[$pos] = substr($placeholder[0], 1, strlen($placeholder[0]));
                 }
-                ++$count;
-            } else if ($statement[$i] == "'" || $statement[$i] == '"') {
-                $inLiteral = ! $inLiteral; // switch state!
             }
         }
 
@@ -180,4 +178,23 @@ static public function expandListParameters($query, $params, $types)
 
         return array($query, $paramsOrd, $typesOrd);
     }
-}
+
+    /**
+     * Slice the SQL statement around pairs of quotes and
+     * return string fragments of SQL outside of quoted literals.
+     * Each fragment is captured as a 2-element array:
+     *
+     * 0 => matched fragment string,
+     * 1 => offset of fragment in $statement
+     *
+     * @param string $statement
+     * @return array
+     */
+    static private function getUnquotedStatementFragments($statement)
+    {
+        $literal = self::ESCAPED_SINGLE_QUOTED_TEXT . '|' . self::ESCAPED_DOUBLE_QUOTED_TEXT;
+        preg_match_all("/([^'\"]+)(?:$literal)?/s", $statement, $fragments, PREG_OFFSET_CAPTURE);
+
+        return $fragments[1];
+    }
+}
tests/Doctrine/Tests/DBAL/SQLParserUtilsTest.php (8 changes)
@@ -28,6 +28,13 @@ static public function dataGetPlaceholderPositions()
             array("SELECT '?' FROM foo", true, array()),
             array('SELECT "?" FROM foo WHERE bar = ?', true, array(32)),
             array("SELECT '?' FROM foo WHERE bar = ?", true, array(32)),
+            array(
+<<<'SQLDATA'
+SELECT * FROM foo WHERE bar = 'it\'s a trap? \\' OR bar = ?
+AND baz = "\"quote\" me on it? \\" OR baz = ?
+SQLDATA
+                , true, array(58, 104)
+            ),
 
             // named
             array('SELECT :foo FROM :bar', false, array(7 => 'foo', 17 => 'bar')),
@@ -37,6 +44,7 @@ static public function dataGetPlaceholderPositions()
             array('SELECT :foo_id', false, array(7 => 'foo_id')), // Ticket DBAL-231
             array('SELECT @rank := 1', false, array()), // Ticket DBAL-398
             array('SELECT @rank := 1 AS rank, :foo AS foo FROM :bar', false, array(27 => 'foo', 44 => 'bar')), // Ticket DBAL-398
+            array('SELECT * FROM Foo WHERE bar > :start_date AND baz > :start_date', false, array(30 => 'start_date', 52 =>  'start_date')) // Ticket GH-113
         );
     }
 
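
The named rows in the test data can be reproduced outside the class with a rough standalone sketch. The function `namedPlaceholderPositions()` is hypothetical; it reuses the regular expressions from the patch but omits the rest of getPlaceholderPositions.

```php
<?php
// Hypothetical standalone sketch of named-placeholder matching.
function namedPlaceholderPositions($statement)
{
    $single = "'(?:[^'\\\\]|\\\\'|\\\\\\\\)*'";
    $double = '"(?:[^"\\\\]|\\\\"|\\\\\\\\)*"';

    // Capture unquoted SQL fragments together with their offsets.
    preg_match_all("/([^'\"]+)(?:$single|$double)?/s", $statement, $fragments, PREG_OFFSET_CAPTURE);

    $map = array();
    foreach ($fragments[1] as $fragment) {
        preg_match_all('/:[a-zA-Z_][a-zA-Z0-9_]*/', $fragment[0], $matches, PREG_OFFSET_CAPTURE);
        foreach ($matches[0] as $placeholder) {
            // Key by absolute position; the value is the name without the colon.
            $map[$placeholder[1] + $fragment[1]] = substr($placeholder[0], 1);
        }
    }

    return $map;
}

// A repeated name yields one entry per position.
print_r(namedPlaceholderPositions('SELECT * FROM Foo WHERE bar > :start_date AND baz > :start_date'));

// ":=" is not a placeholder, because "=" cannot start a name.
print_r(namedPlaceholderPositions('SELECT @rank := 1'));
```

Keying the map by position is what lets a duplicated name such as `:start_date` appear twice without one occurrence overwriting the other.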