@yuyuankang
Contributor

No description provided.

@jixuan1989
Member

Hi, what is the performance improvement?

@LeiRui
Contributor

LeiRui commented Sep 30, 2019

Hi, could you summarize the core idea of this reconstruction in one or a few sentences?

Contributor

@jt2594838 jt2594838 left a comment

I also wonder what makes the difference.

@yuyuankang
Contributor Author

The core ideas of this reconstruction are:

  1. Statements are now grouped as ddlStatement, dmlStatement, and administrationStatement, which is more consistent with conventional SQL design.
  2. I reorganized the format of constants. The previous version had negative integer, positive integer, unsigned integer, etc. To avoid longest-match problems, it needed the "=>" operator, which was used widely. I simplified the grammar definitions by defining only the integer and real numbers, regardless of the sign. No "=>" operators anymore. Some acceptable constraints are introduced, though: for example, "+" is not allowed to identify a positive value. I think performance is improved because there are fewer checks during parsing, especially when parsing full-digit paths.
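
A minimal sketch of point 2, written in Python rather than ANTLR and not taken from this PR's grammar. It assumes numbers are lexed without any sign, that a leading "-" is a separate token applied afterwards, and that a leading "+" is rejected. The names `TOKEN_RE` and `parse_constant` are hypothetical.

```python
import re

# Hypothetical token set: numeric tokens carry no sign, so the lexer never
# has to choose between "negative integer" and "minus followed by integer".
TOKEN_RE = re.compile(
    r"\s*(?:(?P<REAL>\d+\.\d+)|(?P<INT>\d+)|(?P<MINUS>-)|(?P<PLUS>\+))"
)


def parse_constant(text):
    """Parse an optionally negated numeric constant; reject a leading '+'."""
    match = TOKEN_RE.match(text)
    if match is None:
        raise ValueError(f"unexpected input: {text!r}")
    if match.lastgroup == "PLUS":
        raise ValueError("'+' is not allowed to identify a positive value")

    negative = False
    if match.lastgroup == "MINUS":
        # The sign is handled after lexing, not inside the number token.
        negative = True
        match = TOKEN_RE.match(text, match.end())
        if match is None:
            raise ValueError("expected a number after '-'")

    if match.lastgroup not in ("REAL", "INT"):
        raise ValueError("expected a number")
    if text[match.end():].strip():
        raise ValueError("unexpected trailing input")

    if match.lastgroup == "REAL":
        value = float(match.group("REAL"))
    else:
        value = int(match.group("INT"))
    return -value if negative else value
```

With this shape, `parse_constant("12")` and `parse_constant("-3.5")` succeed, while `parse_constant("+7")` raises `ValueError`.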

@LeiRui
Contributor

LeiRui commented Oct 14, 2019

"+" is not allowed to identify a positive value

What about updating docs where necessary?

@@ -1,4 +1,4 @@
/*
/**
Member

use /* rather than /**

Contributor Author

I replaced /** with /*. Thanks for your suggestion.

@@ -1,14 +1,14 @@
/*
/**
Member

use /*
and it seems there should be no "<p>" marker

Contributor Author

I replaced /** with /*. Thanks for your suggestion.

*
* http://www.apache.org/licenses/LICENSE-2.0
*
* <p>
Member

same problem

Contributor Author

I removed the "<p>"s. Thanks for your suggestion.

}
AstNode astNode = ParseUtils.findRootNonNullToken(astTree);
RootOperator operator = generator.getLogicalPlan(astNode);
// expected to throw LogicalOperatorException: LIMIT <N>: N must be a positive integer and can not be zero.
Member

Hi, does this test have no expectation?

Contributor Author

I added some expectations. Thanks for your suggestion.

Member

@jixuan1989 jixuan1989 left a comment

Some minor changes requested.

@Genius-pig
Contributor

Genius-pig commented Oct 15, 2019

No "=>" operators anymore.

The "=>" operator means the predicate is always executed. Is removing all of them a real improvement? Syntactic predicates were used to work around a prediction weakness in ANTLR 3.



// ***************************
fragment A
Contributor

Why don’t you put them together?

fragment
Letter
    : 'a'..'z' | 'A'..'Z'
    ;

Contributor Author

Separating them makes it easier to define the other keywords and to make the SQL case-insensitive. For example,
K_SELECT
    : S E L E C T
    ;

If I only define Letter, it will be
K_SELECT
    : 'SELECT' | 'select'
    ;
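
The difference between the two rules can be illustrated with regular expressions. This is a sketch in Python, not ANTLR, and it assumes each single-letter fragment (such as S) matches both the lower-case and upper-case form of its letter. The helper name `keyword_pattern` is hypothetical.

```python
import re


def keyword_pattern(word):
    """Expand a keyword into one two-letter character class per letter."""
    return "".join(f"[{c.lower()}{c.upper()}]" for c in word)


# Analogue of K_SELECT : S E L E C T ;
K_SELECT = re.compile(keyword_pattern("select"))

# Analogue of K_SELECT : 'SELECT' | 'select' ;
LITERAL_ONLY = re.compile("SELECT|select")

# Mixed case is accepted by the per-letter form but not by the literal form.
per_letter_accepts_mixed = K_SELECT.fullmatch("SeLeCt") is not None
literal_accepts_mixed = LITERAL_ONLY.fullmatch("SeLeCt") is not None
```

Because only the keyword pattern is expanded this way, other input such as paths and values is matched with its case preserved.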

Member

any questions? @Genius-pig

Contributor

Contributor Author

It seems that before parsing, they convert all characters in the string to upper case, so the input becomes case-insensitive not only for keywords but also for paths, values, and so on. To avoid that, I chose the current approach. There are other examples of it:
https://github.com/antlr/grammars-v3/blob/master/mysql/MySQL.g

I think both implementations are fine. If you think the ANTLR file is too long, separating the lexer and parser rules into different files is a solution, as we have done.

Contributor

I agree with you. There are two ways to handle case-insensitivity according to https://github.com/antlr/antlr4/blob/master/doc/case-insensitive-lexing.md, which is the official ANTLR documentation. Although your way may be a little slower, it can handle both case-sensitive and case-insensitive keywords.

@yuyuankang
Contributor Author

No "=>" operators anymore.

The "=>" operator means the predicate is always executed. Is removing all of them a real improvement? Syntactic predicates were used to work around a prediction weakness in ANTLR 3.

For each syntactic predicate, ANTLR defines a special method that returns true or false depending on whether the predicate's grammar fragment matches the next input symbols. So if syntactic predicates are used frequently, these methods are called frequently. In my opinion, a simpler design therefore leads to better performance.
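
A toy model of that cost, in Python. It is not ANTLR's generated code and not a measurement of this PR. It only assumes that a predicate scans ahead to answer true or false and then rewinds, so the same characters are examined again by the real match. The function name `lex_numbers` is hypothetical.

```python
def lex_numbers(text, use_predicate):
    """Split dot-separated digit runs; return (tokens, characters_examined)."""
    pos, examined, tokens = 0, 0, []
    while pos < len(text):
        if use_predicate:
            # Speculative pass: scan the digit run only to decide whether it
            # matches, then rewind (pos itself is left unchanged).
            look = pos
            while look < len(text) and text[look].isdigit():
                look += 1
                examined += 1
        # Real pass: consume the digit run and emit it as a token.
        start = pos
        while pos < len(text) and text[pos].isdigit():
            pos += 1
            examined += 1
        tokens.append(text[start:pos])
        pos += 1  # skip the '.' separator
    return tokens, examined


plain_tokens, plain_cost = lex_numbers("1.22.333", use_predicate=False)
pred_tokens, pred_cost = lex_numbers("1.22.333", use_predicate=True)
```

Both calls produce the same tokens, but in this model the predicate version examines every digit twice, which is the kind of overhead a full-digit path would pay.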

@jt2594838 jt2594838 merged commit 4ea7bcc into apache:master Oct 22, 2019

5 participants