New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CALCITE-3679] Allow lambda expressions in SQL queries #3502
Conversation
Is there a design document for this feature? Moreover, if EXISTS is independent on the lambdas perhaps it should be in a separate PR. |
Leaving aside the design document, I don't see any changes to the documentation files either. |
98e238c
to
340a603
Compare
The real tests of whether the design works properly is to allow a lambda expression wherever a function name is allowed. |
Hi @mihaibudiu, thanks for your review. I have made some commits to address your feedback:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very good.
I've suggested quite a few changes to names and scope (e.g. removing public
) to make this easier to maintain.
Can you add a quidem test? I would like to see this working end-to-end.
*/ | ||
public class RexLambdaRef extends RexInputRef { | ||
|
||
private final String paramName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could this be an ordinal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After making RexLambdaRef
extend RexSlot
, we can now use the name
, so there is no need for the paramName
anymore.
import org.apache.calcite.sql.SqlKind; | ||
|
||
/** | ||
* Variable which references a field of a lambda expression. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/which/that/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
} | ||
|
||
@Override public boolean equals(@Nullable Object o) { | ||
if (this == o) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use the compact form 'this == o || o instanceof RexLambdaExpression & ...'?
no need for Objects.equals. use expression.equals because it's not-null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
//~ Constructors ----------------------------------------------------------- | ||
|
||
RexLambdaExpression(List<RexLambdaRef> parameters, RexNode expression) { | ||
this.expression = expression; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ImmutbleList.copyOf(parameters)
requireNonNull(expression)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
/** | ||
* Represents a lambda expression. | ||
*/ | ||
public class RexLambdaExpression extends RexNode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename RexLambdaExpression to RexLambda. give example.
@@ -1060,6 +1064,9 @@ void AddArg(List<SqlNode> list, ExprContext exprContext) : | |||
) | |||
( | |||
e = Default() | |||
| | |||
LOOKAHEAD((SimpleIdentifierOrList() | <LPAREN> <RPAREN>) <LAMBDA_OPERATOR>) | |||
e = LambdaExpression() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How expensive is this LOOKAHEAD?
LOOKAHEAD(2) | ||
<LPAREN> <RPAREN> { parameters = SqlNodeList.EMPTY; } | ||
| | ||
parameters = SimpleIdentifierOrList() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should there be a rule that allows ()
, x
and (x [, y]*)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I extract this rule to SimpleIdentifierOrListOrEmpty
.
@@ -8787,6 +8817,7 @@ void NonReservedKeyWord2of3() : | |||
| < NE2: "!=" > | |||
| < PLUS: "+" > | |||
| < MINUS: "-" > | |||
| < LAMBDA_OPERATOR: "->" > |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just LAMBDA
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
} | ||
final SqlNode expression = toSql(program, lambda.getExpression()); | ||
return new SqlLambdaExpression(POS, parameters, expression); | ||
case LAMBDA_REF: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add blank line before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
.fails("Cannot apply '(?s).*HIGHER_ORDER_FUNCTION' to arguments of type " | ||
+ "'HIGHER_ORDER_FUNCTION\\(<INTEGER>, <FUNCTION\\(ANY, ANY, ANY\\) -> ANY>\\)'.*"); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to see more tests for EXISTS. including negative tests. wrong type of arguments, wrong number of arguments, arguments of wrong name, arguments of wrong case.
check type derivation (if you didn't do it in SqlOperatorTest)
@@ -134,6 +134,18 @@ RelDataType createMapType( | |||
RelDataType keyType, | |||
RelDataType valueType); | |||
|
|||
/** | |||
* Create a lambda expression type. Lambda expressions are functions that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Create/Creates/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
return Objects.hash(expression, parameters); | ||
} | ||
|
||
@Override public String toString() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why override toString
rather than computeDigest
?
I think it would be clearer and more concise if you use a for
loop rather than functional style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Best thing is to do what RexMRAggCall
does - make the class final and initialize digest
in the constructor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, I use for
loop instead.
RexLambdaExpression
extends RexNode
, the computeDigest
is defined in RexCall
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your review and encouragement, @julianhyde. I have made some changes based on your comments. And, I will be adding an end-to-end test later.
@@ -1033,6 +1034,9 @@ void AddArg0(List<SqlNode> list, ExprContext exprContext) : | |||
) | |||
( | |||
e = Default() | |||
| | |||
LOOKAHEAD((SimpleIdentifierOrList() | <LPAREN> <RPAREN>) <LAMBDA_OPERATOR>) | |||
e = LambdaExpression() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most important token is ->
, so the expense depends on how many tokens come before ->
.
For example:
() -> true // equals LOOKAHEAD(3)
(a, b) -> a + b // equals LOOKAHED(6)
LOOKAHEAD(2) | ||
<LPAREN> <RPAREN> { parameters = SqlNodeList.EMPTY; } | ||
| | ||
parameters = SimpleIdentifierOrList() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I extract this rule to SimpleIdentifierOrListOrEmpty
.
@@ -8787,6 +8817,7 @@ void NonReservedKeyWord2of3() : | |||
| < NE2: "!=" > | |||
| < PLUS: "+" > | |||
| < MINUS: "-" > | |||
| < LAMBDA_OPERATOR: "->" > |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
} | ||
final SqlNode expression = toSql(program, lambda.getExpression()); | ||
return new SqlLambdaExpression(POS, parameters, expression); | ||
case LAMBDA_REF: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
@@ -134,6 +134,18 @@ RelDataType createMapType( | |||
RelDataType keyType, | |||
RelDataType valueType); | |||
|
|||
/** | |||
* Create a lambda expression type. Lambda expressions are functions that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
@@ -77,6 +77,7 @@ public enum SqlTypeFamily implements RelDataTypeFamily { | |||
CURSOR, | |||
COLUMN_LIST, | |||
GEO, | |||
LAMBDA_EXPRESSION, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
super(parent); | ||
this.lambdaExpr = lambdaExpr; | ||
|
||
// default parameter type is ANY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, the lambda expression will be validated twice.
A simple lambda expression a -> a
, we can not infer the type of lambda if we just know the lambda expression, this is why I say "default parameter type is ANY".
But if we see the lambda expression in a function, we will validate the lambda expression second time in LambdaExpressionOperandTypeChecker
.
For example, give a function test_func1
, whose type checker is
OperandTypes.sequence("TEST_FUNC1(INTEGER, FUNCTION(STRING) -> NUMERIC)",
OperandTypes.family(SqlTypeFamily.INTEGER),
OperandTypes.function(SqlTypeFamily.NUMERIC, SqlTypeFamily.STRING))
Assuming there is an SQL statement,
select test_func1(1, x -> x / 2);
The first validation:
we assume the type of x
is ANY
. We can know the x -> x / 2
is legal.
The second validation:
Based on the type checker, the type of x
is string, so x / 2
is not a valid expression.
*/ | ||
public class SqlLambdaExpressionScope extends ListScope { | ||
private final SqlLambdaExpression lambdaExpr; | ||
private final Map<String, RelDataType> parameterTypes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to register the type of parameter to SqlLambdaExpressionScope
when validate lambda expression second time, so I use the map as cache.
|
||
/** | ||
* SQL lambda expression type. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have renamed this class to FunctionSqlType
. But I think all the RelDataType
classes are public, FunctionSqlType
should be consistent with them.
@@ -623,4 +625,24 @@ public CompositeFunction() { | |||
return typeFactory.createSqlType(SqlTypeName.BIGINT); | |||
} | |||
} | |||
|
|||
private static final SqlFunction HIGHER_ORDER_FUNCTION = | |||
new SqlFunction("HIGHER_ORDER_FUNCTION", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
Do the lambdas allow capturing values? |
Thank you for your review. I apologize for the delayed response; I have been quite busy over the past few weeks. Regarding this PR, I have made modifications to almost all of the reviews. And then, I will write more tests. Once that is done, I will split this PR into two parts: one for lambda and another for the exists function |
eaacd78
to
7a2d043
Compare
Co-authored-by: Ritesh Kapoor <riteshkapoor.opensource@gmail.com>
Quality Gate passedThe SonarCloud Quality Gate passed, but some issues were introduced. 12 New issues |
This PR is about three things:
Parse
Lambda expression is only accepted as a parameter of function call.
Validate
I add a new type
LAMBDA_EXPRESSION
to describe type of lambda expression.For example:
The type of lambda in
exists
function is a function type where the input parameter is of typeANY
and the return type is of typeBOOLEAN
.The lambda expression will be validated twice. For the first time, calcite will validate the lambda expression itself, which means all the types of parameters of lambda expression are considered as
ANY
. For example,(x, y) -> x + y
, bothx
andy
areANY
type, we can deduce thatx + y
is alsoANY
, so the type of(x, y) -> x + y
isFUNCTION(ANY, ANY) -> ANY
. And then we need to know if the type of lambda is legal in a function, this is whatLambdaExpressionOperandTypeChecker
do. It will reset the type of parameters, and infer the type of lambda expression, and check if the derived type satisfy this checker. After the checker, we can infer the real type of lamda expression.RelNode
In this part, I create a class
RexLambdaRef
to reference the parameter of lambda expression. From another perspective, we can view lambda expressions as "constants" rather than tables, so we cannot directly use 'RexInputRef' and 'RexLocalRef'`.I referred to #1733, so I listed Ritesh as a co-author.