Conversation
binaryop_no_andor = "+" / "-" / "*" / "/" / "=" / "<>" / "<=" / ">" / "<" / ">" | ||
unaryop = "+" / "-" / "not" / "NOT" | ||
|
||
SELECT = ws "SELECT" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the value in having this be a separate rule, instead of just putting "SELECT" directly in the rule above? The annoying thing with doing it this way is that it adds another action unnecessarily. If we can avoid this, that'd be good. That might be easier handled in post-processing, but I'm not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We definitely can, but also, it seems like we should be collapsing non-terminals which have deterministic paths to terminals in the grammar? Do we not do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have any code that does this, no.
# not all functions can take * as an argument. | ||
# Check whether LIKE can take non string arguments (example in scholar dataset) | ||
|
||
SQL_GRAMMAR2 = Grammar( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just FYI, I know @kl2806 was switching from using strings to define the grammar to constructing a dictionary directly. It might be worth doing that instead, as it'd be easier to maintain. Not necessarily needed in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep - that's mainly to make substitution easier for the different tables. I will also do that when I add the context for all the datasets.
Adding this now and tweaking a little of the pre-processing to break up the PR stream for these parsers - obviously this won't be the final grammar, because I need to modify it depending on the table, but it will be good to keep around a grammar which parses the raw dataset, because I do not want to have to debug that thing twice.