Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement the parser for expression #4039

Merged
merged 6 commits into from Feb 3, 2022
Merged

Conversation

andylokandy
Copy link
Collaborator

@andylokandy andylokandy commented Feb 1, 2022

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Implement the parser for expressions.

The core technique used here to help arrange the operator precedence and associativity is called Pratt parser, which is also known as the Top-Down Operator-Precedence (TDOP) parser. I recommend this tdop-tutorial if you're interested in how it works.

Changelog

  • Not for changelog (changelog entry is not required)

Related Issues

Ref #866

Test Plan

Unit Tests

@databend-bot
Copy link
Member

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

@databend-bot
Copy link
Member

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

@vercel
Copy link

vercel bot commented Feb 1, 2022

This pull request is being automatically deployed with Vercel (learn more).
To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/databend/databend/AL3GCXGsu9ZvMgq9F12hSoKw5YNh
✅ Preview: Canceled

[Deployment for dcb0eab canceled]

@andylokandy andylokandy changed the title Implement parser for expression Implement the parser for expression Feb 1, 2022
@leiysky leiysky self-assigned this Feb 1, 2022
@leiysky leiysky self-requested a review February 1, 2022 15:51
@codecov-commenter
Copy link

codecov-commenter commented Feb 1, 2022

Codecov Report

Merging #4039 (2eb8f04) into main (33de75f) will decrease coverage by 0%.
The diff coverage is 30%.

Impacted file tree graph

@@          Coverage Diff          @@
##            main   #4039   +/-   ##
=====================================
- Coverage     57%     57%   -1%     
=====================================
  Files        820     821    +1     
  Lines      43423   43454   +31     
=====================================
  Hits       24786   24786           
- Misses     18637   18668   +31     
Impacted Files Coverage Δ
common/ast/src/parser/rule/expr.rs 0% <0%> (ø)
common/ast/src/parser/token.rs 80% <ø> (ø)
.../ast/src/parser/transformer/transform_sqlparser.rs 46% <43%> (-3%) ⬇️
common/ast/src/parser/ast/expression.rs 47% <62%> (+3%) ⬆️
common/management/src/cluster/cluster_mgr.rs 78% <0%> (-2%) ⬇️
metasrv/src/network.rs 98% <0%> (+1%) ⬆️
common/dal/src/context.rs 88% <0%> (+2%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 33de75f...2eb8f04. Read the comment docs.

Comment on lines 70 to 76
FunctionCall {
// Set to true if the function is aggregate function with `DISTINCT`, like `COUNT(DISTINCT a)`
/// Set to true if the function is aggregate function with `DISTINCT`, like `COUNT(DISTINCT a)`
distinct: bool,
name: String,
args: Vec<Expr>,
params: Vec<Literal>,
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking about lifting the distinct field out and add a AggregationFunc variant for aggregation functions(SUM, AVG etc.).

Copy link
Member

@sundy-li sundy-li Feb 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the level of AST, can we distinguish functions and aggregate functions ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sundy-li I think it's viable because the name of aggregate functions are keywords.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. But if we have custom aggregate functions like: window_funnel, retention, should we add them into
as keywords?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. But if we have custom aggregate functions like: window_funnel, retention, should we add them into as keywords?

In general, it's case by case, but my suggestion is treating aggregate functions names as keywords.

It's more convenient to handle special grammar of specific functions by reserving their names as keyword(e.g. EXTRACT(... FROM ...), COUNT(*), SUM(DISTINCT ...)).

Since all aggregate functions should support the DISTINCT grammar, and the number of aggregate functions is much lower than scalar functions, I think it's fine to reserve their names as keywords.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But there are two main problems I came up with:

  1. UDF Aggregate/Scalar Functions (function name is defined by the user and stored in meta service)
  2. Combinators: *If like sumIf. If we add some aggregate function, we should update the keywords & combinator keywords.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But there are two main problems I came up with:

  1. UDF Aggregate/Scalar Functions (function name is defined by the user and stored in meta service)
  2. Combinators: *If like sumIf. If we add some aggregate function, we should update the keywords & combinator keywords.

The UDFs are context sensitive that cannot be handled by parser.

It seems we have to keep the distinct field here for planner to check the semantic of a unrecognizable function...

What about only reserving the special cases like COUNT(for COUNT(*)) as keyword?


#[derive(Debug, Clone, PartialEq)]
#[allow(dead_code)]
pub enum ExprElement {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you add some comment to explain the function of this struct?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed

Comment on lines +656 to +671
// TODO(andylokandy): complete the keyword-function list, or remove the functions' name from keywords
pub fn function_name<'a, Error>(i: Input<'a>) -> IResult<Input<'a>, String, Error>
where Error: ParseError<Input<'a>> {
map(
rule! {
Ident
| COUNT
| SUM
| AVG
| MIN
| MAX
| STDDEV_POP
| SQRT
},
|name| name.text.to_string(),
)(i)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leiysky FYI, this list is probably incomplete. Which solution mentioned in the todo do you prefer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we just keep COUNT as a token and use Ident for the rest, i.e. the second way?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we just keep COUNT as a token and use Ident for the rest, i.e. the second way?

Agree.

@databend-bot
Copy link
Member

Wait for another reviewer approval

@BohuTANG
Copy link
Member

BohuTANG commented Feb 3, 2022

/lgtm

Thank you @andylokandy !

@databend-bot
Copy link
Member

CI Passed
Reviewers Approved
Let's Merge
Thank you for the PR @andylokandy

@databend-bot databend-bot merged commit 76e831b into datafuselabs:main Feb 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants