Implement the parser for expression #4039

andylokandy · 2022-02-01T15:23:58Z

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Implement the parser for expressions.

The core technique used here to handle operator precedence and associativity is the Pratt parser, also known as the Top-Down Operator-Precedence (TDOP) parser. I recommend this tdop-tutorial if you're interested in how it works.
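For readers unfamiliar with the technique, here is a minimal sketch of a Pratt parser for integer arithmetic. It is not the code in this PR: `Token`, `Expr`, `Parser`, `infix_power`, `show`, and the binding-power values are all hypothetical names and numbers chosen for illustration. The idea it shows is that each infix operator gets a (left, right) pair of binding powers, and a single loop compares the left power with a minimum threshold to decide whether to keep extending the current expression.

```rust
// Illustrative sketch only; none of these names come from the PR.
#[derive(Debug, Clone, PartialEq)]
enum Token { Num(i64), Op(char), LParen, RParen }

#[derive(Debug, PartialEq)]
enum Expr { Num(i64), Neg(Box<Expr>), Binary(Box<Expr>, char, Box<Expr>) }

fn tokenize(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '0'..='9' => {
                let mut n = c.to_digit(10).unwrap() as i64;
                while let Some(d) = chars.peek().and_then(|ch| ch.to_digit(10)) {
                    n = n * 10 + d as i64;
                    chars.next();
                }
                tokens.push(Token::Num(n));
            }
            '(' => tokens.push(Token::LParen),
            ')' => tokens.push(Token::RParen),
            '+' | '-' | '*' | '/' | '^' => tokens.push(Token::Op(c)),
            _ => {} // whitespace and anything else is ignored in this sketch
        }
    }
    tokens
}

/// (left, right) binding powers: left < right is left-associative,
/// left > right is right-associative.
fn infix_power(op: char) -> Option<(u8, u8)> {
    match op {
        '+' | '-' => Some((1, 2)),
        '*' | '/' => Some((3, 4)),
        '^' => Some((8, 7)),
        _ => None,
    }
}

/// Unary minus binds tighter than `*` but looser than `^`.
const PREFIX_POWER: u8 = 5;

struct Parser { tokens: Vec<Token>, pos: usize }

impl Parser {
    fn next(&mut self) -> Option<Token> {
        let tok = self.tokens.get(self.pos).cloned();
        self.pos += 1;
        tok
    }

    /// Parse an expression whose operators all bind at least as tightly as `min_bp`.
    fn expr(&mut self, min_bp: u8) -> Option<Expr> {
        let mut lhs = match self.next()? {
            Token::Num(n) => Expr::Num(n),
            Token::Op('-') => Expr::Neg(Box::new(self.expr(PREFIX_POWER)?)),
            Token::LParen => {
                let inner = self.expr(0)?;
                if self.next()? != Token::RParen { return None; }
                inner
            }
            _ => return None,
        };
        loop {
            let op = match self.tokens.get(self.pos) {
                Some(Token::Op(op)) => *op,
                _ => break,
            };
            let (left_bp, right_bp) = infix_power(op)?;
            if left_bp < min_bp { break; } // the operator belongs to an outer call
            self.next();
            let rhs = self.expr(right_bp)?;
            lhs = Expr::Binary(Box::new(lhs), op, Box::new(rhs));
        }
        Some(lhs)
    }
}

/// Parse a whole string; returns `None` on a syntax error or trailing tokens.
fn parse(src: &str) -> Option<Expr> {
    let mut p = Parser { tokens: tokenize(src), pos: 0 };
    let e = p.expr(0)?;
    if p.pos == p.tokens.len() { Some(e) } else { None }
}

/// Render the tree as an S-expression so the grouping is visible.
fn show(e: &Expr) -> String {
    match e {
        Expr::Num(n) => n.to_string(),
        Expr::Neg(inner) => format!("(- {})", show(inner)),
        Expr::Binary(l, op, r) => format!("({} {} {})", op, show(l), show(r)),
    }
}
```

Calling `show(&parse("1 + 2 * 3").unwrap())` yields `(+ 1 (* 2 3))`: the `*` has a higher left binding power than the threshold passed down from `+`, so it is consumed by the inner call. Changing precedence or associativity only requires editing the table in `infix_power`.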

Changelog

Not for changelog (changelog entry is not required)

Related Issues

Ref #866

Test Plan

Unit Tests

databend-bot · 2022-02-01T15:24:02Z

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

databend-bot · 2022-02-01T15:24:04Z

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

vercel · 2022-02-01T15:24:04Z

This pull request is being automatically deployed with Vercel (learn more).
To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/databend/databend/AL3GCXGsu9ZvMgq9F12hSoKw5YNh
✅ Preview: Canceled

[Deployment for dcb0eab canceled]

codecov-commenter · 2022-02-01T20:07:26Z

Codecov Report

Merging #4039 (2eb8f04) into main (33de75f) will decrease coverage by 0%.
The diff coverage is 30%.

@@          Coverage Diff          @@
##            main   #4039   +/-   ##
=====================================
- Coverage     57%     57%   -1%     
=====================================
  Files        820     821    +1     
  Lines      43423   43454   +31     
=====================================
  Hits       24786   24786           
- Misses     18637   18668   +31

Impacted Files	Coverage Δ
common/ast/src/parser/rule/expr.rs	`0% <0%> (ø)`
common/ast/src/parser/token.rs	`80% <ø> (ø)`
.../ast/src/parser/transformer/transform_sqlparser.rs	`46% <43%> (-3%)`	⬇️
common/ast/src/parser/ast/expression.rs	`47% <62%> (+3%)`	⬆️
common/management/src/cluster/cluster_mgr.rs	`78% <0%> (-2%)`	⬇️
metasrv/src/network.rs	`98% <0%> (+1%)`	⬆️
common/dal/src/context.rs	`88% <0%> (+2%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 33de75f...2eb8f04. Read the comment docs.

leiysky · 2022-02-02T04:24:07Z

common/ast/src/parser/ast/expression.rs

    FunctionCall {
-        // Set to true if the function is aggregate function with `DISTINCT`, like `COUNT(DISTINCT a)`
+        /// Set to true if the function is aggregate function with `DISTINCT`, like `COUNT(DISTINCT a)`
        distinct: bool,
        name: String,
        args: Vec<Expr>,
        params: Vec<Literal>,
    },


I'm thinking about lifting the `distinct` field out and adding an `AggregationFunc` variant for aggregate functions (SUM, AVG, etc.).

At the AST level, can we distinguish between functions and aggregate functions?

@sundy-li I think it's viable because the names of aggregate functions are keywords.

Yes. But if we have custom aggregate functions like window_funnel and retention, should we add them as keywords too?

In general it's case by case, but my suggestion is to treat aggregate function names as keywords.

Reserving their names as keywords makes it more convenient to handle the special grammar of specific functions (e.g. `EXTRACT(... FROM ...)`, `COUNT(*)`, `SUM(DISTINCT ...)`).

Since all aggregate functions should support the DISTINCT grammar, and there are far fewer aggregate functions than scalar functions, I think it's fine to reserve their names as keywords.

But I can think of two main problems:

1. UDF aggregate/scalar functions (the function name is defined by the user and stored in the meta service).

2. Combinators: `*If`, like `sumIf`. Whenever we add an aggregate function, we would have to update both the keywords and the combinator keywords.

UDFs are context-sensitive, so the parser cannot handle them.

It seems we have to keep the `distinct` field here so that the planner can check the semantics of an unrecognized function...

What about reserving only the special cases, like COUNT (for `COUNT(*)`), as keywords?
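To make the point above concrete, here is a rough sketch of how a later stage could validate `DISTINCT` once function names are resolved, which is why the field stays on the AST node. Everything in it is an assumption for illustration: `FunctionCall` is a cut-down stand-in for the AST variant, and `check_distinct` and the `aggregates` set (built-ins plus UDFs known at planning time) are hypothetical, not part of this PR.

```rust
use std::collections::HashSet;

// Cut-down stand-in for the AST's FunctionCall variant (hypothetical).
struct FunctionCall {
    distinct: bool,
    name: String,
}

/// Reject `DISTINCT` on anything that is not a known aggregate function.
/// `aggregates` is assumed to hold lower-cased names of built-in and
/// user-defined aggregate functions, resolved outside the parser.
fn check_distinct(call: &FunctionCall, aggregates: &HashSet<String>) -> Result<(), String> {
    if call.distinct && !aggregates.contains(&call.name.to_lowercase()) {
        return Err(format!(
            "DISTINCT is not allowed on non-aggregate function `{}`",
            call.name
        ));
    }
    Ok(())
}
```

With this split the parser accepts `f(DISTINCT x)` for any `f`, and the semantic check runs where the function registry is available.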

leiysky · 2022-02-02T04:26:34Z

common/ast/src/parser/rule/expr.rs

+
+#[derive(Debug, Clone, PartialEq)]
+#[allow(dead_code)]
+pub enum ExprElement {


Could you add a comment explaining the purpose of this enum?

andylokandy · 2022-02-02T15:01:36Z

common/ast/src/parser/rule/expr.rs

+// TODO(andylokandy): complete the keyword-function list, or remove the functions' name from keywords
+pub fn function_name<'a, Error>(i: Input<'a>) -> IResult<Input<'a>, String, Error>
+where Error: ParseError<Input<'a>> {
+    map(
+        rule! {
+            Ident
+            | COUNT
+            | SUM
+            | AVG
+            | MIN
+            | MAX
+            | STDDEV_POP
+            | SQRT
+        },
+        |name| name.text.to_string(),
+    )(i)


@leiysky FYI, this list is probably incomplete. Which of the solutions mentioned in the TODO do you prefer?

I suppose we just keep COUNT as a token and use Ident for the rest, i.e. the second way?

Agree.
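A rough sketch of what the agreed direction could look like, written without the parser combinators used in the PR. The `Tok` type and the `classify` helper are hypothetical stand-ins for the real tokenizer; only the shape of `function_name` (accept the reserved `COUNT` token or any identifier) reflects the discussion.

```rust
// Hypothetical token type; the PR's real tokens are defined elsewhere.
#[derive(Debug, Clone, PartialEq)]
enum Tok<'a> {
    Count,          // reserved so that COUNT(*) can get its own grammar rule
    Ident(&'a str), // every other function name, built-in or user-defined
    LParen,
}

/// Stand-in for the tokenizer's keyword lookup: only COUNT is reserved.
fn classify(word: &str) -> Tok<'_> {
    if word.eq_ignore_ascii_case("count") {
        Tok::Count
    } else {
        Tok::Ident(word)
    }
}

/// Accept either the reserved COUNT token or a plain identifier.
fn function_name(tok: &Tok<'_>) -> Option<String> {
    match tok {
        Tok::Count => Some("COUNT".to_string()),
        Tok::Ident(name) => Some(name.to_string()),
        _ => None,
    }
}
```

Under this scheme, names such as `sum` or a user-defined aggregate pass through as identifiers, so adding a function does not require touching the keyword list.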

databend-bot · 2022-02-03T14:06:01Z

Wait for another reviewer approval

BohuTANG · 2022-02-03T15:41:24Z

/lgtm

Thank you @andylokandy!

databend-bot · 2022-02-03T15:41:32Z

CI Passed
Reviewers Approved
Let's Merge
Thank you for the PR @andylokandy

Implement parser for expression

0c9b605

andylokandy requested a review from BohuTANG as a code owner February 1, 2022 15:23

databend-bot added the pr-not-for-changelog label Feb 1, 2022

databend-bot added the need-review label Feb 1, 2022

vercel bot temporarily deployed to Preview February 1, 2022 15:24 Inactive

andylokandy changed the title ~~Implement parser for expression~~ Implement the parser for expression Feb 1, 2022

leiysky self-assigned this Feb 1, 2022

leiysky self-requested a review February 1, 2022 15:51

Add unit tests

c0e9eff

vercel bot temporarily deployed to Preview February 1, 2022 19:14 Inactive

Fix typo

0d3c5c1

vercel bot temporarily deployed to Preview February 1, 2022 19:24 Inactive

leiysky reviewed Feb 2, 2022

Specially treat between

2eb8f04

vercel bot temporarily deployed to Preview February 2, 2022 07:23 Inactive

Improve error reporting

484967f

vercel bot temporarily deployed to Preview February 2, 2022 09:06 Inactive

Remove dbg

dcb0eab

vercel bot temporarily deployed to Preview February 2, 2022 13:12 Inactive

andylokandy commented Feb 2, 2022

andylokandy mentioned this pull request Feb 2, 2022

Implment backtrack for parser and add a human-friendly pretty printer for errors #4045

Merged

leiysky approved these changes Feb 3, 2022

BohuTANG requested a review from sundy-li February 3, 2022 15:38

databend-bot approved these changes Feb 3, 2022

databend-bot removed the need-review label Feb 3, 2022

databend-bot merged commit 76e831b into datafuselabs:main Feb 3, 2022
