-
Notifications
You must be signed in to change notification settings - Fork 1.1k
reconstruct antlrv3 grammar to improve performance #440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi, the improvement of the performance? |
|
Hi, could you summarize the core idea of this reconstruction in one or a few sentences? |
jt2594838
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also wonder what makes the differences.
server/src/main/antlr3/org/apache/iotdb/db/sql/parse/TqlParser.g
Outdated
Show resolved
Hide resolved
|
The core idea of this reconstruction is
|
What about updating docs where necessary? |
| @@ -1,4 +1,4 @@ | |||
| /* | |||
| /** | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use /* rather than /**
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I replaced /** with /*. Thanks for your suggestion.
| @@ -1,14 +1,14 @@ | |||
| /* | |||
| /** | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use /*
and seems no "
" marker
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I replaced /** with /*. Thanks for your suggestion.
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * <p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same problem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the "
"s. Thanks for your suggestion.
server/src/main/java/org/apache/iotdb/db/qp/strategy/LogicalGenerator.java
Outdated
Show resolved
Hide resolved
| } | ||
| AstNode astNode = ParseUtils.findRootNonNullToken(astTree); | ||
| RootOperator operator = generator.getLogicalPlan(astNode); | ||
| // expected to throw LogicalOperatorException: LIMIT <N>: N must be a positive integer and can not be zero. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, this Test has not expectation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some expectations. Thanks for your suggestion.
jixuan1989
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some minor changes request
"=>" operator means always execute predicate. Removing all of them is a real improvement? Syntactic predicates were used to work around a prediction weakness in ANTLR 3. |
|
|
||
|
|
||
| // *************************** | ||
| fragment A |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don’t you put them together?
fragment
Letter
: 'a'..'z' | 'A'..'Z'
;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separating them makes it easier to define the other key words, easier to make the sql case insensitive. For example,
K_SELECT
: S E L E C T
;
If I only define Letter, it will be
K_SELECT
: 'SELECT' | 'select'
;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any questions? @Genius-pig
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use code to uppercase input String, what @Ring-k has done will lead to Antlr file is too long.
Check out what has spark done. https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParseDriver.scala
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that before parsing, they capitalize all characters in a string, and while it is not case-sensitive to keywords, it is also case-insensitive to paths, values, and so on. To aviod it, I chose the way as it is now. There are some other examples.
https://github.com/antlr/grammars-v3/blob/master/mysql/MySQL.g
I think both ways of implementation are fine. If you think the ANTLR file is too long, separating lexer and parser rules in different files is a solution, as what we have done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you. There are both ways to handle case-insensitive according to https://github.com/antlr/antlr4/blob/master/doc/case-insensitive-lexing.md which is the Antlr official site. Although your way may cause a little slower, it can handle both case-sensitive and case-insensitive keywords.
For each syntactic predicate, ANTLR defines a special method that returns true or false depending on whether the predicate’s grammar fragment matches the next input symbols. So if syntactic predicate is frequently used, the method will be frequently called. So, in my opinion, a simpler design leads to better performance. |
No description provided.