Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use CrateDB's ANTLR grammar in SQL parser #433

Open
amotl opened this issue Mar 18, 2024 · 5 comments
Open

Use CrateDB's ANTLR grammar in SQL parser #433

amotl opened this issue Mar 18, 2024 · 5 comments

Comments

@amotl
Copy link
Member

amotl commented Mar 18, 2024

I think we should look into replacing sqlparse with a parser generated from the CrateDB antlr grammar, to ensure it supports our SQL dialect.

See https://github.com/antlr/antlr4/blob/4.6/doc/python-target.md

Originally posted by @mfussenegger in #432 (review)

@surister
Copy link

I think we should move this away from crash and create a separate repo to have the development of antlr4 -> targets.

I see two uses cases for the antlr4 targets:

  1. Python: For crash and cloud team
  2. Javascript: Local admin UI without any backend.

I have already something half-working:

input = InputStream('SELECT * FROM sys.shards; SELECT 2;')
lexer = SqlBaseLexer(input)
stream = CommonTokenStream(lexer)
parser = SqlBaseParser(stream)
parse_tree = parser.statements()
print_stmt(parse_tree.statement(0), stream)
print_stmt(parse_tree.statement(1), stream)
SELECT * FROM sys.shards
SELECT 2

@mfussenegger
Copy link
Member

I see two uses cases for the antlr4 targets:
Python: For crash and cloud team
Javascript: Local admin UI without any backend.

I don't see a benefit in mixing several targets into a single repo. You just end up with a ton of different build systems. A local admin UI without any backend also seems a bit like a weird use-case. The whole point of the admin UI is to inspect a cluster and run queries.

As long as there is only a single consumer it can be inlined, otherwise you're just adding resource maintenance overhead.
What would cloud use it for?
Could start integrating it here, make sure it's in a dedicated module - then extracting it later on should be easy enough.

@amotl
Copy link
Member Author

amotl commented Mar 18, 2024

What would "cloud" use it for?

Not being involved in all the relevant details, nor their exact implementations, nor too much on the grammar plumbing side as well, I guess @surister might be referring to the SQL Query Console where such a component / outcome may also be used or relevant in one way or another.

Could start integrating it here, make sure it's in a dedicated module - then extracting it later on should be easy enough.

Starting is important, right, I absolutely agree.

Maybe I can't still grok this fragment of the discussion, so please support me: Is it just about parallelizing this endeavor to serve the needs of both Python and JavaScript, and finding a right place for this work?

@surister
Copy link

surister commented Mar 18, 2024

I see two uses cases for the antlr4 targets:
Python: For crash and cloud team
Javascript: Local admin UI without any backend.

I don't see a benefit in mixing several targets into a single repo. You just end up with a ton of different build systems.

I tried it and the way targets are created are extremely similar, different targets might potentially share scripts, hence the proposition.

A local admin UI without any backend also seems a bit like a weird use-case. The whole point of the admin UI is to inspect a cluster and run queries.

Sorry, when I meant without any backend I meant without Grand Central, our current admin ui embedded with Crate, which of course, CrateDB is the backend in this case :P, the use case I see would be to actually provide correct syntax highlighting and error highlighting which is currently is very flaky and hand-made. Furthermore there is an antlr4 target for ACE which is the editor that we use in cloud.

What would cloud use it for?

Besides the aforementioned use case, cloud already uses sqlparse when proxing queries, enabling running several queries.

@surister
Copy link

cc @proddata

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants