SQL Coverage script #1750

DeNeutoy · 2018-09-11T19:46:31Z

Adding this now, along with a few tweaks to the pre-processing, to break up the PR stream for these parsers. This won't be the final grammar, because I need to modify it depending on the table, but it will be good to keep around a grammar which parses the raw dataset, because I do not want to have to debug that thing twice.

matt-gardner · 2018-09-18T19:09:23Z

scripts/examine_sql_coverage.py

+        binaryop_no_andor = "+" / "-" / "*" / "/" / "=" / "<>" / "<=" / ">" / "<" / ">"
+        unaryop          = "+" / "-" / "not" / "NOT"
+
+        SELECT   = ws "SELECT"


What's the value in having this be a separate rule, instead of just putting "SELECT" directly in the rule above? The annoying thing about doing it this way is that it unnecessarily adds another action. It'd be good to avoid this if we can. It might be more easily handled in post-processing, but I'm not sure.

We definitely can. But it also seems like we should be collapsing non-terminals which have deterministic paths to terminals in the grammar. Do we not do that?

We don't have any code that does this, no.
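For reference, the collapsing step discussed in this thread could look roughly like the sketch below. It is not existing project code (the reply above confirms none exists); the `RuleTable` representation, the `collapse_deterministic` helper, and the toy rules are all assumptions made for illustration. It repeatedly inlines any non-terminal that has a single alternative made up only of terminals, so a rule like `SELECT = ws "SELECT"` no longer contributes an action of its own.

```python
from typing import Dict, List

# Hypothetical representation: each non-terminal maps to its alternatives, and
# each alternative is a list of symbols. A symbol is a non-terminal if it is a
# key in the table; anything else (quoted strings, regexes) is a terminal.
RuleTable = Dict[str, List[List[str]]]


def collapse_deterministic(rules: RuleTable, start: str) -> RuleTable:
    """Inline non-terminals that can only ever expand to one terminal sequence."""
    rules = {name: [list(alt) for alt in alts] for name, alts in rules.items()}
    while True:
        # A rule is inlinable if it has exactly one alternative and that
        # alternative contains no remaining non-terminals.
        inlinable = {
            name: alts[0]
            for name, alts in rules.items()
            if name != start
            and len(alts) == 1
            and all(symbol not in rules for symbol in alts[0])
        }
        if not inlinable:
            return rules
        # Drop the inlinable rules and splice their bodies into every use site.
        rules = {
            name: [
                [out for symbol in alt for out in inlinable.get(symbol, [symbol])]
                for alt in alts
            ]
            for name, alts in rules.items()
            if name not in inlinable
        }


# Toy grammar, loosely modelled on the rule under review.
toy_rules: RuleTable = {
    "query": [["SELECT", "select_results", "FROM", "table"]],
    "SELECT": [["ws", '"SELECT"']],
    "FROM": [["ws", '"FROM"']],
    "ws": [[r'~"\s*"i']],
    "select_results": [["col"], ["col", '","', "select_results"]],
    "col": [['"name"'], ['"city"']],
    "table": [['"people"']],
}

collapsed = collapse_deterministic(toy_rules, start="query")
for name, alts in collapsed.items():
    print(name, "=", " / ".join(" ".join(alt) for alt in alts))
```

On the toy grammar this leaves only `query`, `select_results` and `col`, since those are the only rules where the parser still has a choice to make (or, for `query`, the start symbol).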

matt-gardner · 2018-09-18T19:13:45Z

scripts/examine_sql_coverage.py

+# not all functions can take * as an argument.
+# Check whether LIKE can take non string arguments (example in scholar dataset)
+
+SQL_GRAMMAR2 = Grammar(


Just FYI, I know @kl2806 was switching from using strings to define the grammar to constructing a dictionary directly. It might be worth doing that instead, as it'd be easier to maintain. Not necessarily needed in this PR.

Yep - that's mainly to make substitution easier for the different tables. I will also do that when I add the context for all the datasets.
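A minimal sketch of the dictionary-based approach mentioned here, assuming a simple `{non_terminal: [alternatives]}` layout. The helper names (`format_grammar`, `grammar_for_table`) and the rule names (`table_name`, `col_ref`) are hypothetical and not taken from this PR; the point is only that per-table substitution becomes a dictionary update rather than string surgery.

```python
from typing import Dict, List

# Hypothetical table-independent rules; alternatives are kept as separate
# strings so individual rules can be swapped out per table.
BASE_RULES: Dict[str, List[str]] = {
    "query": ['ws "SELECT" ws col_ref ws "FROM" ws table_name ws'],
    "ws": [r'~"\s*"i'],
}


def grammar_for_table(base: Dict[str, List[str]],
                      table: str,
                      columns: List[str]) -> Dict[str, List[str]]:
    """Return a copy of the base rules with table-specific terminals filled in."""
    rules = {name: list(alternatives) for name, alternatives in base.items()}
    rules["table_name"] = [f'"{table}"']
    # In a PEG the first matching alternative wins, so longer names go first;
    # otherwise "name" would match the prefix of "name_id".
    ordered = sorted(columns, key=len, reverse=True)
    rules["col_ref"] = [f'"{table}.{column}"' for column in ordered]
    return rules


def format_grammar(rules: Dict[str, List[str]]) -> str:
    """Render the rules as one `name = alt / alt` line per non-terminal."""
    return "\n".join(f"{name} = {' / '.join(alternatives)}"
                     for name, alternatives in rules.items())


city_grammar = format_grammar(grammar_for_table(BASE_RULES, "city", ["name", "name_id"]))
print(city_grammar)
```

The rendered string can then be handed to whatever string-based grammar constructor is in use, such as the `Grammar(` call in the diff above, while the base dictionary stays untouched for the next table.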

Mark Neumann and others added 10 commits September 11, 2018 12:09

add script to check dataset coverage for a grammar

90a2bd3

add sqlparse to nicely format queries

2df0658

tidy up

4a3896f

fix bug with greedy parsing of OR-DER

9d6b3b5

don't remove quoted strings, they are important. Add LIKE operator

ba3c5a2

fix some tests, add temporary workaround to coverage script

2fd6d79

revert wip changes, final working grammar with high overall coverage

591de21

fix last test, touch up grammar

86519e5

fix comment

b245bc5

Merge branch 'master' into coverage-script

8f49057

DeNeutoy requested a review from matt-gardner September 18, 2018 04:43

matt-gardner approved these changes Sep 18, 2018


Merge branch 'master' into coverage-script

d97ed77

DeNeutoy merged commit 6039ac0 into allenai:master Sep 18, 2018
