Implement the parser for expression #4039
Thanks for the contribution! Please review the labels and make any necessary changes.
This pull request is being automatically deployed with Vercel (learn more). 🔍 Inspect: https://vercel.com/databend/databend/AL3GCXGsu9ZvMgq9F12hSoKw5YNh [Deployment for dcb0eab canceled]
Codecov Report
@@           Coverage Diff           @@
##            main   #4039    +/-   ##
=======================================
- Coverage     57%     57%    -1%
=======================================
  Files        820     821     +1
  Lines      43423   43454    +31
=======================================
  Hits       24786   24786
- Misses     18637   18668    +31
  FunctionCall {
-     // Set to true if the function is aggregate function with `DISTINCT`, like `COUNT(DISTINCT a)`
+     /// Set to true if the function is aggregate function with `DISTINCT`, like `COUNT(DISTINCT a)`
      distinct: bool,
      name: String,
      args: Vec<Expr>,
      params: Vec<Literal>,
  },
I'm thinking about lifting the `distinct` field out and adding an `AggregationFunc` variant for aggregate functions (`SUM`, `AVG`, etc.).
At the level of the AST, can we distinguish functions from aggregate functions?
@sundy-li I think it's viable because the names of aggregate functions are keywords.
Yes. But if we have custom aggregate functions like `window_funnel`, `retention`, should we add them as keywords?
> Yes. But if we have custom aggregate functions like `window_funnel`, `retention`, should we add them as keywords?

In general, it's case by case, but my suggestion is to treat aggregate function names as keywords.
Reserving their names as keywords makes it more convenient to handle the special grammar of specific functions (e.g. `EXTRACT(... FROM ...)`, `COUNT(*)`, `SUM(DISTINCT ...)`).
Since all aggregate functions should support the `DISTINCT` grammar, and there are far fewer aggregate functions than scalar functions, I think it's fine to reserve their names as keywords.
But there are two main problems I came up with:
- UDF Aggregate/Scalar Functions (function name is defined by the user and stored in meta service)
- Combinators: `*If`, like `sumIf`. If we add some aggregate function, we should update the keywords & combinator keywords.
> But there are two main problems I came up with:
> - UDF Aggregate/Scalar Functions (function name is defined by the user and stored in meta service)
> - Combinators: `*If`, like `sumIf`. If we add some aggregate function, we should update the keywords & combinator keywords.

UDFs are context-sensitive, so they cannot be handled by the parser.
It seems we have to keep the `distinct` field here so that the planner can check the semantics of an unrecognized function...
What about only reserving the special cases, like `COUNT` (for `COUNT(*)`), as keywords?
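As a rough illustration of the point above, here is a minimal sketch of how a planner-side check might validate the `distinct` flag once function names can be resolved. The `FunctionRegistry` type, the `check_distinct` function, and the simplified `FunctionCall` struct are all hypothetical names made up for this sketch; they are not taken from the PR or from Databend's planner.

```rust
use std::collections::HashSet;

/// Simplified stand-in for the `FunctionCall` AST variant (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct FunctionCall {
    distinct: bool,
    name: String,
    args: Vec<String>,
}

/// Hypothetical registry of aggregate function names. In a real system this
/// could combine built-ins with UDFs loaded at planning time; that is an
/// assumption of this sketch.
struct FunctionRegistry {
    aggregates: HashSet<String>,
}

impl FunctionRegistry {
    fn new(names: &[&str]) -> Self {
        FunctionRegistry {
            aggregates: names.iter().map(|n| n.to_lowercase()).collect(),
        }
    }

    /// Case-insensitive lookup, since SQL function names are usually case-insensitive.
    fn is_aggregate(&self, name: &str) -> bool {
        self.aggregates.contains(&name.to_lowercase())
    }
}

/// The parser accepts `f(DISTINCT ...)` for any name; the semantic check
/// rejects it later when `f` is not a known aggregate function.
fn check_distinct(call: &FunctionCall, registry: &FunctionRegistry) -> Result<(), String> {
    if call.distinct && !registry.is_aggregate(&call.name) {
        return Err(format!(
            "DISTINCT is only allowed in aggregate functions, found `{}`",
            call.name
        ));
    }
    Ok(())
}
```

With this split, the parser stays context-free: it only records whether `DISTINCT` was written, and names such as `window_funnel` or user-defined aggregates never need to become keywords.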
#[derive(Debug, Clone, PartialEq)]
#[allow(dead_code)]
pub enum ExprElement {
Would you add a comment explaining the purpose of this struct?
addressed
// TODO(andylokandy): complete the keyword-function list, or remove the functions' name from keywords
pub fn function_name<'a, Error>(i: Input<'a>) -> IResult<Input<'a>, String, Error>
where Error: ParseError<Input<'a>> {
    map(
        rule! {
            Ident
            | COUNT
            | SUM
            | AVG
            | MIN
            | MAX
            | STDDEV_POP
            | SQRT
        },
        |name| name.text.to_string(),
    )(i)
@leiysky FYI, this list is probably incomplete. Which solution mentioned in the todo do you prefer?
I suppose we just keep `COUNT` as a token and use `Ident` for the rest, i.e. the second way?
> I suppose we just keep `COUNT` as a token and use `Ident` for the rest, i.e. the second way?

Agree.
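The option agreed on here (keep only `COUNT` as a keyword token and accept plain identifiers for every other function name) can be sketched as follows. This is a simplified illustration over a token slice, not the PR's nom-based `rule!` implementation; the `Token` enum and the `count_star` helper are assumptions made for the example.

```rust
/// Hypothetical token type: only `COUNT` is reserved; every other function
/// name arrives as a plain identifier.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Ident(String),
    Count,
    LParen,
    RParen,
    Star,
}

/// Parse a function name from the front of the token slice, returning the
/// name and the remaining tokens.
fn function_name(tokens: &[Token]) -> Option<(String, &[Token])> {
    match tokens.split_first()? {
        (Token::Ident(name), rest) => Some((name.clone(), rest)),
        // The reserved keyword is still usable as an ordinary function name.
        (Token::Count, rest) => Some(("COUNT".to_string(), rest)),
        _ => None,
    }
}

/// The special grammar that motivates reserving `COUNT`: `COUNT(*)`.
/// Returns the remaining tokens when the input starts with that form.
fn count_star(tokens: &[Token]) -> Option<&[Token]> {
    match tokens {
        [Token::Count, Token::LParen, Token::Star, Token::RParen, rest @ ..] => Some(rest),
        _ => None,
    }
}
```

Under this scheme, adding a new aggregate function or a combinator such as `sumIf` requires no tokenizer change, because its name is tokenized as an identifier.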
Wait for another reviewer approval
/lgtm Thank you @andylokandy !
CI Passed
I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/
Summary
Implement the parser for expressions.
The core technique used here to handle operator precedence and associativity is the Pratt parser, also known as the Top-Down Operator-Precedence (TDOP) parser. I recommend this tdop-tutorial if you're interested in how it works.
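For readers unfamiliar with the technique, here is a minimal Pratt parser sketch for a toy arithmetic language. It is not the parser from this PR: the `Token` and `Expr` types, the binding-power values, and the helper names (`infix_power`, `parse`, `show`) are all assumptions chosen for illustration. Each infix operator gets a left and a right binding power; a lower left power makes the operator left-associative, a lower right power makes it right-associative.

```rust
/// Tokens of a toy arithmetic language (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    Num(i64),
    Op(char),
    LParen,
    RParen,
}

#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Unary(char, Box<Expr>),
    Binary(Box<Expr>, char, Box<Expr>),
}

/// (left, right) binding powers of each infix operator.
fn infix_power(op: char) -> Option<(u8, u8)> {
    match op {
        '+' | '-' => Some((1, 2)), // left-associative
        '*' | '/' => Some((3, 4)), // left-associative, binds tighter
        '^' => Some((8, 7)),       // right-associative
        _ => None,
    }
}

struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    fn peek(&self) -> Option<Token> {
        self.tokens.get(self.pos).copied()
    }

    fn next(&mut self) -> Option<Token> {
        let token = self.peek();
        self.pos += 1;
        token
    }

    /// Parse an expression whose infix operators all have left power >= `min_power`.
    fn parse_expr(&mut self, min_power: u8) -> Result<Expr, String> {
        // Prefix position: number, unary minus, or parenthesized expression.
        let mut lhs = match self.next() {
            Some(Token::Num(n)) => Expr::Num(n),
            // Unary minus binds tighter than `*` but looser than `^`.
            Some(Token::Op('-')) => Expr::Unary('-', Box::new(self.parse_expr(5)?)),
            Some(Token::LParen) => {
                let inner = self.parse_expr(0)?;
                match self.next() {
                    Some(Token::RParen) => inner,
                    other => return Err(format!("expected ')', found {:?}", other)),
                }
            }
            other => return Err(format!("unexpected token {:?}", other)),
        };
        // Infix position: keep extending `lhs` while the operator binds tightly enough.
        loop {
            let op = match self.peek() {
                Some(Token::Op(op)) => op,
                _ => break,
            };
            let (left, right) = match infix_power(op) {
                Some(powers) => powers,
                None => return Err(format!("unknown operator {:?}", op)),
            };
            if left < min_power {
                break;
            }
            self.next();
            let rhs = self.parse_expr(right)?;
            lhs = Expr::Binary(Box::new(lhs), op, Box::new(rhs));
        }
        Ok(lhs)
    }
}

/// Parse a whole token stream, rejecting trailing tokens.
fn parse(tokens: Vec<Token>) -> Result<Expr, String> {
    let mut parser = Parser { tokens, pos: 0 };
    let expr = parser.parse_expr(0)?;
    match parser.peek() {
        None => Ok(expr),
        Some(token) => Err(format!("trailing token {:?}", token)),
    }
}

/// Render the tree as an S-expression so the grouping is visible.
fn show(expr: &Expr) -> String {
    match expr {
        Expr::Num(n) => n.to_string(),
        Expr::Unary(op, x) => format!("({} {})", op, show(x)),
        Expr::Binary(l, op, r) => format!("({} {} {})", op, show(l), show(r)),
    }
}
```

The whole precedence table lives in `infix_power`, so adding an operator means adding one match arm rather than a new grammar level, which is the main appeal of the approach for SQL expressions with many operators.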
Changelog
Related Issues
Ref #866
Test Plan
Unit Tests