New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The parser sucks #4

Closed
cswinter opened this Issue Jul 9, 2018 · 7 comments

Comments

Projects
None yet
4 participants
@cswinter
Copy link
Owner

cswinter commented Jul 9, 2018

The parser solves a fairly well-defined problem and has a simple and clean interface to the rest of LocustDB so this is quite easy to work on without knowing much about the rest of the code.

I hacked together the parser within a few hours during the original hack week, and it has mostly remained unchanged since. It has serious issues:

  • no usable error messages
  • no operator precedence
  • not built in a principled way and not super easy to extend with new operators or functions
  • various quirks where queries that seem like they should parse don't
  • it uses an outdated version of nom, and I think there might be good reasons not to use no nom at all because parser performance is irrelevant

Even if we ultimately decide to build our own parser, it would probably not be a bad idea in the short term to just leverage some other existing SQL parser. All we would need to do is add an additional pass that rejects parts of SQL not supported by LocustDB (most of them) and maps the rest into the AST format expected by the query engine.

@virattara

This comment has been minimized.

Copy link
Contributor

virattara commented Jul 18, 2018

Interested!

@cswinter

This comment has been minimized.

Copy link
Owner

cswinter commented Jul 19, 2018

Awesome! This would be super high impact and bring LocustDB much closer to actually being usable.

Right now parsing is done by the parse_query function (generated with a nom macro) which is actually used in just a single place. There's a few unit tests, and all of the integration tests also exercise the parser. For development, you might find the :ast <query> command useful which is available in the repl and just runs the parser and then prints out the resulting ASTs.

If there's anything I can help you with let me know.

@ivanceras

This comment has been minimized.

Copy link

ivanceras commented Sep 4, 2018

The sqlparser used in datafusion has been modularized out of datafusion - https://github.com/andygrove/sqlparser-rs .
The parser code is small and clean.

@zoidbergwill

This comment has been minimized.

Copy link
Contributor

zoidbergwill commented Sep 4, 2018

That looks awesome, @ivanceras. I tried to look at Diesel's parser a few weeks ago, but struggled to get it working.

@cswinter

This comment has been minimized.

Copy link
Owner

cswinter commented Sep 16, 2018

Wow great find, this project seems really well aligned with what we want in LocustDB! I finally had some time to play around with the parser, all the functionality we need is there and it also looks like the AST it outputs is already very similar to LocustDB's AST.

Anyone wanna give it a shot?

@virattara

This comment has been minimized.

Copy link
Contributor

virattara commented Sep 17, 2018

Working on it.

@cswinter

This comment has been minimized.

Copy link
Owner

cswinter commented Oct 22, 2018

The parser is awesome now. Thank you, @virattara!

@cswinter cswinter closed this Oct 22, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment