Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The parser sucks #4

Closed
cswinter opened this issue Jul 9, 2018 · 7 comments
Closed

The parser sucks #4

cswinter opened this issue Jul 9, 2018 · 7 comments
Labels
good first issue Good for newcomers

Comments

@cswinter
Copy link
Owner

cswinter commented Jul 9, 2018

The parser solves a fairly well-defined problem and has a simple and clean interface to the rest of LocustDB so this is quite easy to work on without knowing much about the rest of the code.

I hacked together the parser within a few hours during the original hack week, and it has mostly remained unchanged since. It has serious issues:

  • no usable error messages
  • no operator precedence
  • not built in a principled way and not super easy to extend with new operators or functions
  • various quirks where queries that seem like they should parse don't
  • it uses an outdated version of nom, and I think there might be good reasons not to use no nom at all because parser performance is irrelevant

Even if we ultimately decide to build our own parser, it would probably not be a bad idea in the short term to just leverage some other existing SQL parser. All we would need to do is add an additional pass that rejects parts of SQL not supported by LocustDB (most of them) and maps the rest into the AST format expected by the query engine.

@virattara
Copy link
Contributor

Interested!

@cswinter
Copy link
Owner Author

Awesome! This would be super high impact and bring LocustDB much closer to actually being usable.

Right now parsing is done by the parse_query function (generated with a nom macro) which is actually used in just a single place. There's a few unit tests, and all of the integration tests also exercise the parser. For development, you might find the :ast <query> command useful which is available in the repl and just runs the parser and then prints out the resulting ASTs.

If there's anything I can help you with let me know.

@ivanceras
Copy link

The sqlparser used in datafusion has been modularized out of datafusion - https://github.com/andygrove/sqlparser-rs .
The parser code is small and clean.

@zoidyzoidzoid
Copy link
Contributor

That looks awesome, @ivanceras. I tried to look at Diesel's parser a few weeks ago, but struggled to get it working.

@cswinter
Copy link
Owner Author

Wow great find, this project seems really well aligned with what we want in LocustDB! I finally had some time to play around with the parser, all the functionality we need is there and it also looks like the AST it outputs is already very similar to LocustDB's AST.

Anyone wanna give it a shot?

@virattara
Copy link
Contributor

Working on it.

@cswinter
Copy link
Owner Author

The parser is awesome now. Thank you, @virattara!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

4 participants