Skip to content

Add SQLite dialect#248

Merged
nickolay merged 3 commits intoapache:mainfrom
miuy56dc:sqlite_dialog
Jul 31, 2020
Merged

Add SQLite dialect#248
nickolay merged 3 commits intoapache:mainfrom
miuy56dc:sqlite_dialog

Conversation

@miuy56dc
Copy link
Copy Markdown
Contributor

@coveralls
Copy link
Copy Markdown

coveralls commented Jul 29, 2020

Pull Request Test Coverage Report for Build 189367545

  • 30 of 30 (100.0%) changed or added relevant lines in 2 files are covered.
  • 53 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.1%) to 91.849%

Files with Coverage Reduction New Missed Lines %
src/parser.rs 53 88.93%
Totals Coverage Status
Change from base Build 186377189: 0.1%
Covered Lines: 4440
Relevant Lines: 4834

💛 - Coveralls

@nickolay nickolay changed the title support SQLite dialog Add SQLite dialect Jul 29, 2020
@nickolay
Copy link
Copy Markdown
Contributor

https://www.sqlite.org/lang_keywords.html has this:

For resilience when confronted with historical SQL statements, SQLite will sometimes bend the quoting rules above:

  • If a keyword in single quotes (ex: 'key' or 'glob') is used in a context where an identifier is allowed but where a string literal is not allowed, then the token is understood to be an identifier instead of a string literal.
  • If a keyword in double quotes (ex: "key" or "glob") is used in a context where it cannot be resolved to an identifier but where a string literal is allowed, then the token is understood to be a string literal instead of an identifier.

...but this PR appears to unconditionally parse single-quoted strings 'foo' and double-quoted identifiers "bar" as identifiers? This is probably fine as far as the parser is concerned, but I think it's worth mentioning explicitly in the dialect documentation comments.

Comment thread src/dialect/sqlite.rs
Comment on lines +22 to +25
// see https://www.sqlite.org/lang_keywords.html
fn is_delimited_identifier_start(&self, ch: char) -> bool {
ch == '`' || ch == '\'' || ch == '"' || ch == '['
}
Copy link
Copy Markdown
Contributor

@nickolay nickolay Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue I have pointed to is that this comment doesn't help understanding why is_delimited_identifier_start is implemented the way it is. The copy-pasta you've added above doesn't help with that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the document, we know that SQLite supports single quotes, double quotes, square brackets and grave accents. It is applicable not only to keywords but also to other identifiers. SQLite will sometimes bend the quoting rules for historical SQL statements, but after that it says

Programmers are cautioned not to use the two exceptions described in the previous bullets. We emphasize that they exist only so that old and ill-formed SQL statements will run correctly. Future versions of SQLite might raise errors instead of accepting the malformed statements covered by the exceptions above.

SQLite adds new keywords from time to time when it takes on new features. So to prevent your code from being broken by future enhancements, you should normally quote any identifier that is an English language word, even if you do not have to.

Copy link
Copy Markdown
Contributor

@nickolay nickolay Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The document says that SQLite uses "", [], and `` for identifiers, but sometimes, depending on the context, treats 'string literals' as identifiers too, and "double-quoted identifiers" as string literals. You make all four of "", [], ``, '' unconditionally be parsed as identifiers, which might be fine, but can't be documented by simply linking to the SQLite's page. Someone will come in later and ask why the string literals are not parsed as such, and I won't be able to answer that.

Copy link
Copy Markdown
Contributor Author

@miuy56dc miuy56dc Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out that, I think now I will just use `` and [] for identifiers. In most cases, single quotes and double quotes should be passed as strings, it's a little complex to implement parse that depending on the context, maybe I can implement it in the next PR. So I will comment that

Use `` and [] for identifiers
TODO: Depending on the context, treats 'string literals' as identifiers too, and "double-quoted identifiers" as string literals.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest starting with handling "..." as an identifier, as the tokenizer won't handle it otherwise.

@nickolay nickolay merged commit f8feff4 into apache:main Jul 31, 2020
@nickolay
Copy link
Copy Markdown
Contributor

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants