[SPARK-14398] [SQL] Audit non-reserved keyword list in ANTLR4 parser by bomeng · Pull Request #12191 · apache/spark

bomeng · 2016-04-06T00:40:58Z

What changes were proposed in this pull request?

I compared the non-reserved keyword lists in the ANTLR3 and ANTLR4 grammars one by one, checked them against all the keywords currently defined in the ANTLR4 grammar, and added the missing keywords to the non-reserved list. If we need to support more syntax later, we can add more keywords then.

Any recommendation for the above is welcome.

How was this patch tested?

I manually checked the keywords one by one. Please let me know if there is a better way to test.

Another thought: I suggest keeping the keyword definitions and the non-reserved list in sorted order, which would make them much easier to check in the future.

SparkQA · 2016-04-06T01:50:11Z

Test build #55066 has finished for PR 12191 at commit 5c130ba.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-06T03:56:55Z

Test build #55080 has finished for PR 12191 at commit c6fbf0f.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- public class Java8DatasetAggregatorSuite extends JavaDatasetAggregatorSuiteBase
- public class JavaDatasetAggregatorSuite extends JavaDatasetAggregatorSuiteBase
- class JavaDatasetAggregatorSuiteBase implements Serializable

SparkQA · 2016-04-06T07:54:23Z

Test build #55091 has finished for PR 12191 at commit 18bff08.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

hvanhovell · 2016-04-06T08:42:47Z

@bomeng what is the point of adding non-reserved keywords if they are not used in parser rules?

The main point of this ticket is that we need to make sure that we do not have regressions compared to the old situation; a non-reserved keyword in the old situation should not be reserved in the new situation. Did you find any of these cases?

viirya · 2016-04-06T09:47:40Z

If we don't actually use these non-reserved keywords in any rules, I don't think we need to add them to the list. It might cause confusion too.

bomeng · 2016-04-06T13:46:16Z

Sorry for my misunderstanding. I thought we wanted to keep all the keywords that were defined in the ANTLR3 grammar, so that if we want to use them later we do not have to add them back case by case.

Among the items I added, some (e.g. ASC, DESC) need to be in the non-reserved list, since they are used in the parser and were non-reserved before. Should I focus only on those? What is the best way to do it? Please advise. Thanks.

hvanhovell · 2016-04-06T13:54:53Z

@bomeng No worries. Please focus on the keywords that are reserved in the ANTLR4 parser but were not reserved in the ANTLR3 parser. The join keywords are the exception.

We can add the other keywords when we need them.
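The check requested above amounts to a set difference. The following is a minimal sketch of it, not the procedure used in this PR (which was a manual comparison). The class name `KeywordAudit`, the method `regressions`, and all of the keyword sets in `main` are hypothetical sample data chosen for illustration; they are not the real contents of either grammar.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Spark: finds keywords that were
// non-reserved in the old grammar but are reserved in the new one.
final class KeywordAudit {

    static Set<String> regressions(Set<String> oldNonReserved,
                                   Set<String> newTokens,
                                   Set<String> newNonReserved,
                                   Set<String> exempt) {
        Set<String> result = new TreeSet<>(oldNonReserved);
        result.retainAll(newTokens);      // keep only keywords the new grammar defines
        result.removeAll(newNonReserved); // these are still usable as identifiers
        result.removeAll(exempt);         // deliberately reserved (e.g. join keywords)
        return result;
    }

    public static void main(String[] args) {
        // Illustrative sample data only (assumption), not the real lists.
        Set<String> oldNonReserved = new TreeSet<>(
            Arrays.asList("ASC", "DESC", "LIMIT", "ROLE", "LEFT", "LEGACY_ONLY"));
        Set<String> newTokens = new TreeSet<>(
            Arrays.asList("ASC", "DESC", "LIMIT", "ROLE", "LEFT", "SELECT"));
        Set<String> newNonReserved = new TreeSet<>(Arrays.asList("ROLE"));
        Set<String> exempt = new TreeSet<>(Arrays.asList("LEFT"));

        // Prints the keywords that would need to be added to nonReserved.
        System.out.println(
            regressions(oldNonReserved, newTokens, newNonReserved, exempt));
    }
}
```

Keywords that exist only in the old grammar (`LEGACY_ONLY` in the sample) drop out at the `retainAll` step, which matches the advice to add unused keywords only when they are needed.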

bomeng · 2016-04-07T21:54:47Z

Another try... This time, I scanned all the existing keywords one by one and added the missing non-reserved ones back, so this is a more conservative approach. Later on, if we need to support more syntax, we can add more keywords then. Thanks.

SparkQA · 2016-04-07T23:16:44Z

Test build #55247 has finished for PR 12191 at commit 2cff552.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

hvanhovell · 2016-04-08T14:49:35Z

sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

    | STATISTICS | ANALYZE | PARTITIONED | EXTERNAL | DEFINED | RECORDWRITER
    | REVOKE | GRANT | LOCK | UNLOCK | MSCK | EXPORT | IMPORT | LOAD | VALUES | COMMENT | ROLE
    | ROLES | COMPACTIONS | PRINCIPALS | TRANSACTIONS | INDEX | INDEXES | LOCKS | OPTION
+    | ASC | DESC | LIMIT | METADATA | MINUS | PLUS | RENAME | SETS


PLUS (+) and MINUS (-) are a bit funny, and really shouldn't be used as identifiers. Let's leave them out.

bomeng · 2016-04-08T19:44:08Z

@hvanhovell I've made the change and removed +/-. I would really like to sort the keywords in the file, if you agree. Right now I have to search for them one by one, which is tedious. Do you think it is worth opening another JIRA for that?

SparkQA · 2016-04-08T21:10:14Z

Test build #55384 has finished for PR 12191 at commit 8d02b4d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-08T21:20:23Z

Test build #55386 has finished for PR 12191 at commit 1b5f511.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

bomeng · 2016-04-11T17:31:58Z

@hvanhovell

viirya · 2016-04-13T07:31:00Z

@bomeng Can you update the description too? Thanks.

bomeng · 2016-04-13T16:46:05Z

description is updated. thanks.

SparkQA · 2016-04-14T23:36:56Z

Test build #55855 has finished for PR 12191 at commit 6b5924c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

hvanhovell · 2016-04-18T05:40:44Z

@bomeng sorry for not getting back to you sooner. Is sorting the list only for aesthetics and ease of searching? If so, it does not seem to be worth the effort. What do you think?

It might have a little merit in terms of performance to group all nonReserved keywords together. The parser has to check whether a token is on the nonReserved list, and it does this using switch statements. Having a contiguous range of nonReserved tokens might allow a JIT/compiler to optimize this.

bomeng · 2016-04-18T19:35:40Z

Yes, the reason for sorting the keywords is ease of searching.
I have checked the generated code and I see a switch/case for each non-reserved word. But to my understanding, case A: case B: ... should perform no differently from case A | B: ..., since the compiler should optimize this easily.

hvanhovell · 2016-04-19T05:59:53Z

The compiler should emit a tableswitch instead of a lookupswitch when the nonReserved keywords are grouped together, which is a bit faster. I don't think the improvement is large enough to warrant another change and another PR. So let's merge this one and be done.

LGTM
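The tableswitch/lookupswitch distinction mentioned above can be illustrated with a small sketch. The class `SwitchShapes` and the token type numbers are hypothetical and are not taken from the generated Spark parser. The two methods behave the same way from the caller's point of view; the only difference is how densely the case labels are packed, which is what `javac` typically looks at when choosing between the two bytecode instructions.

```java
// Hypothetical illustration, not generated ANTLR code.
final class SwitchShapes {

    // Case labels form a contiguous range (10..14), so javac can
    // typically compile this to a tableswitch (an indexed jump table).
    static boolean denseNonReserved(int tokenType) {
        switch (tokenType) {
            case 10: case 11: case 12: case 13: case 14:
                return true;
            default:
                return false;
        }
    }

    // Case labels are scattered, so javac typically falls back to a
    // lookupswitch (a sorted list of keys that must be searched).
    static boolean sparseNonReserved(int tokenType) {
        switch (tokenType) {
            case 3: case 40: case 97: case 150: case 212:
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(denseNonReserved(12));  // inside the dense range
        System.out.println(sparseNonReserved(98)); // not one of the sparse labels
    }
}
```

To see which instruction was actually chosen, compile the class and inspect it with `javap -c SwitchShapes`. The result depends on the compiler's heuristics, and a JIT may optimize either form further at run time.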

hvanhovell · 2016-04-19T07:10:36Z

Merging to master. Thanks!

Author: bomeng <bmeng@us.ibm.com>

Closes apache#12191 from bomeng/SPARK-14398.

bomeng added 2 commits April 5, 2016 17:19

update non-reserved list

0f26a6a

add value type

5c130ba

bomeng added 2 commits April 5, 2016 19:40

add a missing non-reserved keyword

bf84dee

Merge remote-tracking branch 'upstream/master' into SPARK-14398

c6fbf0f

bomeng added 2 commits April 5, 2016 23:16

Merge remote-tracking branch 'upstream/master' into SPARK-14398

f83e284

fix error and merge upstream

18bff08

bomeng added 2 commits April 7, 2016 10:20

Merge remote-tracking branch 'upstream/master' into SPARK-14398

c4a62f8

update non-reserved list

2cff552

hvanhovell reviewed Apr 8, 2016

bomeng added 2 commits April 8, 2016 12:40

Merge remote-tracking branch 'upstream/master' into SPARK-14398

64c34d0

remove +/- for now

8d02b4d

double-check: found metadata is a duplicate

1b5f511

Merge remote-tracking branch 'upstream/master' into SPARK-14398

6b5924c

asfgit closed this in 74fe235 Apr 19, 2016

bomeng deleted the SPARK-14398 branch April 27, 2016 22:12
