reconstruct antlrv3 grammar to improve performance #440

yuyuankang · 2019-09-29T14:40:05Z

No description provided.

jixuan1989 · 2019-09-30T00:45:25Z

Hi, what is the performance improvement?

LeiRui · 2019-09-30T00:47:07Z

Hi, could you summarize the core idea of this reconstruction in one or a few sentences?

jt2594838

I also wonder what makes the difference.

server/src/main/antlr3/org/apache/iotdb/db/sql/parse/TqlParser.g

yuyuankang · 2019-10-12T07:17:18Z

The core ideas of this reconstruction are:

Statements are now grouped into ddlStatement, dmlStatement, and administrationStatement, which is more consistent with conventional SQL design.
I reorganized the format of constants. In the previous version, we had negative integers, positive integers, unsigned integers, etc. To avoid longest-match problems, we needed the "=>" (syntactic predicate) operator, which was used widely in the previous version. I simplified the grammar definitions by defining only integer and real numbers, regardless of sign, so there are no "=>" operators anymore. However, some acceptable constraints are introduced; for example, "+" is no longer allowed to mark a positive value. I think performance improves because fewer checks are needed during parsing, especially when parsing full-digit paths.
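As a rough sketch of the sign-free constant design described above, the Java below assumes that the lexer produces only unsigned numeric tokens plus a separate "-" token, and that a parser rule of the form `constant : '-'? NUMBER` attaches the sign. `SignlessConstants`, `tokenize`, and `parseConstant` are hypothetical names, not taken from TqlLexer.g or TqlParser.g; the point is only that the lexer rules for numbers no longer overlap on the sign character, and that "+5" is rejected because no "+" alternative exists.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not IoTDB's TqlLexer.g/TqlParser.g.
// Lexer: emits unsigned numbers plus separate '-' and '+' tokens.
// Parser: constant : '-'? NUMBER ;  there is no '+' alternative.
class SignlessConstants {

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                  // skip blanks
            } else if (c == '-' || c == '+') {
                tokens.add(String.valueOf(c));        // the sign is its own token
                i++;
            } else if (Character.isDigit(c) || c == '.') {
                int start = i;                        // one unsigned INT or REAL
                while (i < input.length()
                        && (Character.isDigit(input.charAt(i)) || input.charAt(i) == '.')) {
                    i++;
                }
                tokens.add(input.substring(start, i));
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "'");
            }
        }
        return tokens;
    }

    static double parseConstant(String input) {
        List<String> tokens = tokenize(input);
        int pos = 0;
        boolean negative = false;
        if (!tokens.isEmpty() && tokens.get(0).equals("-")) {
            negative = true;                          // optional leading minus
            pos = 1;
        }
        // Exactly one unsigned number must follow; "+5" fails here.
        if (pos != tokens.size() - 1
                || tokens.get(pos).equals("-") || tokens.get(pos).equals("+")) {
            throw new IllegalArgumentException(
                    "expected '-'? followed by one unsigned number: " + input);
        }
        double value = Double.parseDouble(tokens.get(pos)); // INT or REAL text
        return negative ? -value : value;
    }
}
```

With this shape, `parseConstant("-12")` yields -12.0, while `parseConstant("+5")` throws, matching the constraint mentioned above.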

LeiRui · 2019-10-14T09:29:08Z

"+" is not allowed to identify a positive value

What about updating docs where necessary?

jixuan1989 · 2019-10-15T00:58:15Z

server/src/main/java/org/apache/iotdb/db/qp/constant/TqlParserConstant.java

@@ -1,4 +1,4 @@
-/*
+/**


use /* rather than /**

I replaced /** with /*. Thanks for your suggestion.

jixuan1989 · 2019-10-15T00:59:09Z

server/src/main/java/org/apache/iotdb/db/qp/strategy/LogicalGenerator.java

@@ -1,14 +1,14 @@
-/*
+/**


use /*
and it seems there should be no "<p>" marker

I replaced /** with /*. Thanks for your suggestion.

jixuan1989 · 2019-10-15T02:25:48Z

server/src/main/java/org/apache/iotdb/db/qp/strategy/LogicalGenerator.java

- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
+ * <p>


same problem

I removed the "<p>"s. Thanks for your suggestion.


jixuan1989 · 2019-10-15T02:31:21Z

server/src/test/java/org/apache/iotdb/db/qp/plan/LogicalPlanSmallTest.java

+    }
+    AstNode astNode = ParseUtils.findRootNonNullToken(astTree);
+    RootOperator operator = generator.getLogicalPlan(astNode);
+    // expected to throw LogicalOperatorException: LIMIT <N>: N must be a positive integer and can not be zero.


Hi, this test has no expectation?

I added some expectations. Thanks for your suggestion.
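One way to turn the comment in the test above into a checked expectation, sketched in plain Java: `checkLimit` is a hypothetical stand-in for the LIMIT validation, and it throws `IllegalArgumentException` instead of IoTDB's `LogicalOperatorException` so the snippet compiles on its own; only the message text comes from the test comment. `expectThrows` is a small hypothetical helper; JUnit offers `@Test(expected = ...)` in JUnit 4 and `assertThrows` in JUnit 4.13+ and JUnit 5 for the same purpose.

```java
// Hypothetical sketch: make the "expected to throw" comment an actual check.
class LimitExpectation {
    // Message copied from the test comment in LogicalPlanSmallTest.
    static final String LIMIT_ERROR =
            "LIMIT <N>: N must be a positive integer and can not be zero.";

    // Stand-in for the real validation; rejects zero and negative limits.
    static int checkLimit(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException(LIMIT_ERROR);
        }
        return n;
    }

    // Runs action and returns the exception if it has the expected type; fails otherwise.
    static <T extends Throwable> T expectThrows(Class<T> type, Runnable action) {
        try {
            action.run();
        } catch (Throwable t) {
            if (type.isInstance(t)) {
                return type.cast(t);
            }
            throw new AssertionError("expected " + type.getName() + " but got " + t, t);
        }
        throw new AssertionError("expected " + type.getName() + " to be thrown");
    }
}
```

A test can then assert on the returned exception's message, so a missing or different exception makes the test fail instead of passing silently.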

jixuan1989

Requesting some minor changes.

Genius-pig · 2019-10-15T08:39:17Z

No "=>" operators anymore.

The "=>" operator means the predicate is always executed. Is removing all of them really an improvement? Syntactic predicates were used to work around a prediction weakness in ANTLR 3.

Genius-pig · 2019-10-15T08:44:31Z

server/src/main/antlr3/org/apache/iotdb/db/sql/parse/TqlLexer.g

+
+
+// ***************************
+fragment A


Why don’t you put them together?

fragment Letter : 'a'..'z' | 'A'..'Z' ;

Defining them separately makes it easier to define the other keywords and to make SQL case-insensitive. For example,
K_SELECT
: S E L E C T
;

If I only defined Letter, it would be
K_SELECT
: 'SELECT' | 'select'
;

Any questions? @Genius-pig

You can uppercase the input string in code instead; what @Ring-k has done makes the ANTLR file too long.
Check out what Spark has done: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParseDriver.scala
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

It seems that they capitalize all characters in the string before parsing, so besides keywords, paths, values, and so on also become case-insensitive. To avoid that, I chose the current approach. Here is another example:
https://github.com/antlr/grammars-v3/blob/master/mysql/MySQL.g

I think both implementations are fine. If you think the ANTLR file is too long, separating lexer and parser rules into different files is a solution, as we have done.

I agree with you. According to the official ANTLR documentation, https://github.com/antlr/antlr4/blob/master/doc/case-insensitive-lexing.md, both ways of handling case-insensitivity are valid. Although your approach may be a little slower, it can handle both case-sensitive and case-insensitive keywords.
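For comparison, here is a minimal Java sketch of the other approach, assuming a wrapper that upper-cases only the lookahead used for keyword matching while token text is sliced from the untouched input; this is the idea behind the case-changing character stream described in the ANTLR case-insensitive-lexing document linked above. `CaseChangingInput`, `la`, `text`, and `matchesKeyword` are hypothetical names, not ANTLR's `CharStream` API. Under this assumption, keywords can be written once in upper case, while paths and values keep their original case, which addresses the concern about upper-casing the whole string.

```java
// Hypothetical sketch of "upper-case the lookahead, keep the original text".
// Names are illustrative; this is not ANTLR's CharStream interface.
class CaseChangingInput {
    private final String original;

    CaseChangingInput(String original) {
        this.original = original;
    }

    // What keyword rules see: the character at index, upper-cased.
    char la(int index) {
        return Character.toUpperCase(original.charAt(index));
    }

    // What ends up in a token: the untouched slice [start, stop).
    String text(int start, int stop) {
        return original.substring(start, stop);
    }

    // True if keyword (written once, in upper case) starts at index.
    boolean matchesKeyword(int index, String keyword) {
        if (index + keyword.length() > original.length()) {
            return false;
        }
        for (int k = 0; k < keyword.length(); k++) {
            if (la(index + k) != keyword.charAt(k)) {
                return false;
            }
        }
        return true;
    }
}
```

The trade-off discussed in the thread remains: the fragment-per-letter grammar needs no extra stream class, while this approach keeps the grammar shorter at the cost of wrapping the input.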

yuyuankang · 2019-10-15T11:07:48Z

No "=>" operators anymore.

The "=>" operator means the predicate is always executed. Is removing all of them really an improvement? Syntactic predicates were used to work around a prediction weakness in ANTLR 3.

For each syntactic predicate, ANTLR generates a special method that returns true or false depending on whether the predicate's grammar fragment matches the next input symbols. So if syntactic predicates are used frequently, these methods are called frequently. In my opinion, therefore, a simpler design leads to better performance.
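To make the cost argument concrete, the Java sketch below models a syntactic predicate such as `(INT '.')=>` as a speculative method that scans ahead and then rewinds, in the spirit of the method described above, and compares it with an equivalent rule that decides after a single pass. It is a hand-written illustration, not ANTLR-generated code: `PredicateCost`, its methods, and the `reads` counter are hypothetical, the counter only tallies character lookups, and the rule is deliberately trivial (ANTLR 3's LL(*) analysis would not need a predicate for it). It does not measure real parser performance.

```java
// Hypothetical illustration, not ANTLR-generated code.
// number : (INT '.')=> INT '.' INT | INT ;   (with a syntactic predicate)
// number : INT ('.' INT)? ;                  (same language, no predicate)
class PredicateCost {
    private final String input;
    private int pos;
    int reads; // number of character lookups during the last parse

    PredicateCost(String input) {
        this.input = input;
    }

    // Look at the character i positions ahead; '\0' marks end of input.
    private char la(int i) {
        reads++;
        int idx = pos + i;
        return idx < input.length() ? input.charAt(idx) : '\0';
    }

    private void digits() {
        while (Character.isDigit(la(0))) {
            pos++;
        }
    }

    // Speculative match of INT '.', then rewind: the scanned digits get read again.
    private boolean synpredReal() {
        int mark = pos;
        digits();
        boolean matched = la(0) == '.';
        pos = mark;
        return matched;
    }

    String numberWithPredicate() {
        pos = 0;
        reads = 0;
        if (synpredReal()) {
            digits();
            pos++; // consume '.'
            digits();
            return "REAL";
        }
        digits();
        return "INT";
    }

    String numberWithoutPredicate() {
        pos = 0;
        reads = 0;
        digits();
        if (la(0) == '.') {
            pos++; // consume '.'
            digits();
            return "REAL";
        }
        return "INT";
    }
}
```

For the input "12345", the predicate version performs 13 lookups and the predicate-free version 7, and both classify the input as INT. How much this matters in a real grammar depends on how often the predicates are reached.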

yuyuankang added 10 commits July 29, 2019 10:14

implement rpc compression

311d85e

remove demo

e5536fa

update rpc port

2991a1f

change default configuration

fad18a7

change default configuration

be9705d

change to original file

eaf1ce5

add brackets

eb6fd47

solve iotdb.engine and config file conflict

5dc0988

reconstruct tql parser and lexer

eb90af4

improve_antlrv3

1aaffb6

jt2594838 reviewed Sep 30, 2019

server/src/main/antlr3/org/apache/iotdb/db/sql/parse/TqlParser.g (resolved)

server/src/main/antlr3/org/apache/iotdb/db/sql/parse/TqlParser.g (outdated, resolved)

yuyuankang added 6 commits September 30, 2019 13:41

remove print

c44f2ab

implement delete storage group

498e7c3

fix conflict

e8c15bb

set compressor

38d263d

remove unused package

dee7ff2

remove unused tests

06cebab

jixuan1989 reviewed Oct 15, 2019

yuyuankang added 2 commits October 15, 2019 10:17

modify comment

a04a020

update comment

3b5a28b

jixuan1989 reviewed Oct 15, 2019

server/src/main/java/org/apache/iotdb/db/qp/strategy/LogicalGenerator.java (outdated, resolved)

jixuan1989 reviewed Oct 15, 2019

jixuan1989 requested changes Oct 15, 2019

remove useless comment

266e574

yuyuankang added 3 commits October 15, 2019 11:12

complete delete storage group test

21d37dd

update delete storage group test

543c160

update comment

d37eea5

Genius-pig reviewed Oct 15, 2019

yuyuankang added 3 commits October 15, 2019 22:03

update comment

9f2930b

remove quit statement

f27c308

remove quit test

9ce7fa0

jixuan1989 approved these changes Oct 18, 2019

jt2594838 approved these changes Oct 21, 2019

fix conflict

bce102d

jt2594838 merged commit 4ea7bcc into apache:master Oct 22, 2019


5 participants
