Add support for quoted string backslash escaping #1177

Merged
alamb merged 1 commit into apache:main from validio-io:escaped-string-literals
Apr 21, 2024
Contributor

@iffyio iffyio commented Mar 14, 2024

This adds support for parsing string literals in
dialects that treat the backslash character as an escape
character. For example, the following statement previously failed
to parse in dialects like BigQuery, where the syntax is valid.

SELECT 'a\'b';

Also moves the SQL `like` and `similar_to` tests from the individual
dialect test files to the common tests, since the tests were identical.
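
To illustrate why `'a\'b'` needs dialect-specific handling, here is a minimal standalone sketch of scanning a single-quoted literal. It is not the code from this PR: `scan_single_quoted` and its `backslash_escape` flag are hypothetical names, with the flag standing in for the dialect setting described above.

```rust
/// Scans a single-quoted literal at the start of `input`.
/// Returns the raw text between the quotes and the unconsumed remainder.
/// `backslash_escape` is a hypothetical flag for this sketch.
fn scan_single_quoted(input: &str, backslash_escape: bool) -> Result<(String, &str), String> {
    let mut chars = input.char_indices().peekable();
    match chars.next() {
        Some((_, '\'')) => {}
        _ => return Err("expected opening quote".to_string()),
    }
    let mut raw = String::new();
    while let Some((idx, ch)) = chars.next() {
        match ch {
            '\'' => {
                if matches!(chars.peek(), Some(&(_, '\''))) {
                    // A doubled quote ('') embeds a quote; keep it in raw form.
                    chars.next();
                    raw.push_str("''");
                } else {
                    // Closing quote: the literal ends here.
                    return Ok((raw, &input[idx + 1..]));
                }
            }
            '\\' if backslash_escape => {
                // Keep the backslash and the character it escapes, so an
                // escaped quote does not terminate the literal.
                raw.push('\\');
                match chars.next() {
                    Some((_, escaped)) => raw.push(escaped),
                    None => break,
                }
            }
            other => raw.push(other),
        }
    }
    Err("unterminated string literal".to_string())
}
```

With the flag enabled, the input `'a\'b';` yields the literal body `a\'b` and leaves `;` as the remainder. With it disabled, the literal ends at the second quote, leaving `b';` behind; the stray quote in that remainder opens a literal that is never closed, which is the kind of failure described above.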


coveralls commented Mar 14, 2024

Pull Request Test Coverage Report for Build 8678508768

Details

  • 157 of 176 (89.2%) changed or added relevant lines in 9 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-0.03%) to 88.061%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| src/dialect/bigquery.rs | 1 | 2 | 50.0% |
| src/dialect/clickhouse.rs | 1 | 2 | 50.0% |
| src/dialect/mod.rs | 3 | 4 | 75.0% |
| src/dialect/mysql.rs | 1 | 2 | 50.0% |
| src/dialect/snowflake.rs | 1 | 2 | 50.0% |
| tests/sqlparser_snowflake.rs | 9 | 11 | 81.82% |
| src/tokenizer.rs | 51 | 55 | 92.73% |
| tests/sqlparser_common.rs | 88 | 96 | 91.67% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| src/dialect/mod.rs | 1 | 81.89% |

| Totals | |
|---|---|
| Change from base Build 8660968190 | -0.03% |
| Covered Lines | 20948 |
| Relevant Lines | 23788 |

💛 - Coveralls

@iffyio iffyio force-pushed the escaped-string-literals branch from 1b9ff2a to 7273ded Compare March 23, 2024 08:12
@iffyio iffyio changed the title Add support for quoted string escaping Add support for quoted string backslash escaping Mar 23, 2024
Contributor

@alamb alamb left a comment

Thank you for this PR @iffyio -- I am sorry it took me so long to review it properly. I am a little concerned about the difference between how this PR works and how it works for MySqlDialect

Is there any way we can unify the behavior?

Comment thread src/dialect/mod.rs
/// ```sql
/// SELECT '\';
/// ```
fn supports_string_literal_backslash_escape(&self) -> bool {
Contributor

❤️ and 😍 for the doc comments
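
For readers unfamiliar with the pattern, a dialect capability flag like this can be sketched as a trait method with a default implementation. The trait below is a simplified stand-in, not the crate's actual `Dialect` trait; the two dialect structs are hypothetical, and the `false` default is an assumption of this sketch.

```rust
/// Simplified stand-in for a dialect trait; only the method under
/// review is modeled here.
trait Dialect {
    /// Whether '\' escapes the next character inside a string literal.
    /// Defaulting to `false` is an assumption of this sketch.
    fn supports_string_literal_backslash_escape(&self) -> bool {
        false
    }
}

/// Hypothetical dialect that keeps the default (no backslash escaping).
struct PlainDialect;

/// Hypothetical dialect that opts in, as BigQuery-style dialects would.
struct BackslashDialect;

impl Dialect for PlainDialect {}

impl Dialect for BackslashDialect {
    fn supports_string_literal_backslash_escape(&self) -> bool {
        true
    }
}

/// A tokenizer would query the flag through a trait object like this.
fn backslash_escapes(dialect: &dyn Dialect) -> bool {
    dialect.supports_string_literal_backslash_escape()
}
```

A default method keeps existing dialects unchanged: only those that opt in need to override it.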

Comment thread src/tokenizer.rs Outdated
// consume
chars.next();

if allow_escape {
Contributor

This behavior seems different than how MySqlDialect behaves

Specifically, with MySQL the escape characters are transformed into their literal values (e.g. 'a"b' would be parsed to a"b while this PR would parse it to a"b)

What do you think about making this consistent?

Contributor Author

Ah yes, I'll take a closer look at this to keep the same behavior as for MySQL

Contributor Author

@alamb Looking into updating this, it turns out I didn't entirely follow the comment and was unable to infer the inconsistency for MySQL. Could you clarify the problem once more? It seems GitHub/Markdown unfortunately reformatted the expected and desired output in your example so that they became identical 😅

Contributor

As in, I think the tokenizer for MySQL (assuming `self.unescape` is true) actually does the unescaping -- so a string with an escape sequence ("\x20") would actually be tokenized as a space (" ")

Contributor Author

That makes sense! I've updated this so that the new logic respects the `unescape` setting, similar to MySQL. Let me know if that's what you had in mind!
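
The difference discussed in this thread can be sketched as a function that either keeps escape sequences verbatim or replaces them with the characters they denote. This is an illustration only, not the tokenizer code from the PR: `read_escaped` is a hypothetical helper, the `unescape` parameter mirrors the flag mentioned above, and the small set of recognized sequences is an assumption (hex forms such as `\x20` are not handled here).

```rust
/// Processes the body of a string literal (the text between the quotes).
/// When `unescape` is true, each backslash sequence is replaced by the
/// character it denotes; otherwise the sequence is kept as written.
/// The mapping below is an assumed, deliberately small set.
fn read_escaped(body: &str, unescape: bool) -> String {
    let mut out = String::new();
    let mut chars = body.chars();
    while let Some(ch) = chars.next() {
        if ch != '\\' {
            out.push(ch);
            continue;
        }
        match chars.next() {
            // Unescaping: translate known sequences, pass others through
            // as the escaped character itself (e.g. \' becomes ').
            Some(next) if unescape => out.push(match next {
                'n' => '\n',
                't' => '\t',
                'r' => '\r',
                '0' => '\0',
                other => other,
            }),
            // Not unescaping: keep the backslash and the following character.
            Some(next) => {
                out.push('\\');
                out.push(next);
            }
            // A trailing backslash with nothing after it is kept as-is.
            None => out.push('\\'),
        }
    }
    out
}
```

For the body `a\'b`, calling the helper with `unescape` set to true gives `a'b`, while false returns the input unchanged, so callers that need the original SQL text can still reproduce it.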

@iffyio iffyio force-pushed the escaped-string-literals branch from a9fdd33 to 0458e4b Compare April 14, 2024 05:29
Contributor

@alamb alamb left a comment

Looks great to me -- thank you @iffyio 🙏

@alamb alamb merged commit d2c2b15 into apache:main Apr 21, 2024
JichaoS pushed a commit to luabase/sqlparser-rs that referenced this pull request May 7, 2024
@iffyio iffyio deleted the escaped-string-literals branch July 16, 2024 11:37
3 participants