[CALCITE-6242] Enhance lambda closure#4926
Conversation
| .type("RecordType(INTEGER NOT NULL EXPR$0) NOT NULL"); | ||
| s.withSql("select HIGHER_ORDER_FUNCTION2(1, () -> 0.1)") | ||
| .type("RecordType(INTEGER NOT NULL EXPR$0) NOT NULL"); | ||
| s.withSql("select HIGHER_ORDER_FUNCTION(1, (x, y) -> x + 1 + ^emp.deptno^) from emp") |
There was a problem hiding this comment.
Please add jira message in test.
|
|
||
| !ok | ||
|
|
||
| select * |
|
|
||
| @Override public R visitLambda(RexLambda lambda, P arg) { | ||
| return null; | ||
| if (!deep) { |
There was a problem hiding this comment.
I think you could add some comments here to explain the reason.
|
There are some new discussions in Jira that you should keep up with. |
mihaibudiu
left a comment
There was a problem hiding this comment.
What happens to nested lambdas? Are they legal? Will they validate?
x -> (y -> x + y)
For the inner closure "x" should not be treated like a table column name.
Thank you for your suggestion. Currently, nested lambdas like x -> (y -> x + y) can be parsed normally. I have added more cases to verify @mihaibudiu |
mihaibudiu
left a comment
There was a problem hiding this comment.
This is very nice, left a couple more requests for checking some tricky corner cases.
| checkArgument(parameterTypes.containsKey(columnName), | ||
| "column %s not found", columnName); | ||
| return parameterTypes.get(columnName); | ||
| if (parameterTypes.containsKey(columnName)) { |
There was a problem hiding this comment.
Do you know what happens to case sensitivity?
I suspect parameter names are subject to the same rules as other identifiers, including quoting.
Can you please write some tests using uppercase/lowercase parameter names and also using quoted names?
There was a problem hiding this comment.
Do you know what happens to case sensitivity? I suspect parameter names are subject to the same rules as other identifiers, including quoting. Can you please write some tests using uppercase/lowercase parameter names and also using quoted names?
Thank you for reminding me that the support for case sensitivity here is indeed not rigorous enough. I have made optimizations and added test cases. I am very grateful for your professional and meticulous review of this pull request (PR)
| // the inner lambda can resolve references to outer lambda parameters. | ||
| // e.g., in x -> EXISTS(arr, y -> x + y = 4), the inner lambda's blackboard | ||
| // needs access to "X" from the outer lambda's nameToNodeMap. | ||
| if (bb.nameToNodeMap != null) { |
There was a problem hiding this comment.
speaking of this, I suspect we should reject lambdas with the same parameter used twice x -> x -> x + 1. Please write a test about that too.
There was a problem hiding this comment.
speaking of this, I suspect we should reject lambdas with the same parameter used twice
x -> x -> x + 1. Please write a test about that too.
I completely agree. I have added new test cases for this
4ef11b6 to
2056190
Compare
| // nested lambdas can resolve outer lambda parameters via the | ||
| // scope chain, rather than treating them as table column names. | ||
| registerOperandSubQueries( | ||
| i == 1 ? lambdaScope : parentScope, call, i); |
There was a problem hiding this comment.
Instead of the magic i == 1 inside the for loop, can we just write: (We anyway expect shape of SqlLambda)
registerOperandSubQueries(lambdaScope, call.getExpression(), i)
Also, I don't quite get why we need registerOperandSubQueries for the parameters (operand = 0)? Do we need this loop at all ?
There was a problem hiding this comment.
Oh, sorry. We can not write like that, because registerOperandSubQueries accept ordinal as parameter. But question about loop is still valid
There was a problem hiding this comment.
Oh, sorry. We can not write like that, because
registerOperandSubQueriesaccept ordinal as parameter. But question about loop is still valid
I agree, but compared to lambda, in the current scenario, flattening it into two lines would be more intuitive. Operand 0 is the parameter list, and operand 1 is the lambda body
There was a problem hiding this comment.
Why do we need two lines? Do we actually need registerOperandSubQueries for the parameters, is it even possible for a parameter to be a SubQuery?
I think this could be simplified from:
for (int i = 0; i < operands.size(); i++) {
registerOperandSubQueries(
i == 1 ? lambdaScope : parentScope, call, i);
to just:
// To make it easier to understand code
int expressionOperand = 1
registerOperandSubQueries(lambdaScope, call, expressionOperand)
?
There was a problem hiding this comment.
Why do we need two lines? Do we actually need registerOperandSubQueries for the parameters, is it even possible for a parameter to be a
SubQuery?I think this could be simplified from:
for (int i = 0; i < operands.size(); i++) { registerOperandSubQueries( i == 1 ? lambdaScope : parentScope, call, i);to just:
// To make it easier to understand code int expressionOperand = 1 registerOperandSubQueries(lambdaScope, call, expressionOperand)?
like this:
// Register the parameter list under the parent scope
// (parameters are visible to the lambda body, not the other way around).
registerOperandSubQueries(parentScope, call, 0);
// Register the expression body under the lambda scope so that
// nested lambdas can resolve outer lambda parameters via the
// scope chain, rather than treating them as table column names.
registerOperandSubQueries(lambdaScope, call, 1);
There was a problem hiding this comment.
Why do you need to call registerOperandSubQueries for parameters.
registerOperandSubQueries(parentScope, call, 0);
registerOperandSubQueries does:
Registers any sub-queries inside a given call operand, and converts the
operand to a scalar sub-query if the operator requires it.
SqlLambda.parameters is list of SqlIdentitifier. Can it ever be/contain a SubQuery? I hope not.
So, registerOperandSubQueries(parentScope, call, 0) should do nothing. It should be safe to drop it
There was a problem hiding this comment.
Good catch, you're right.
SqlLambda.parameters is a SqlNodeList whose elements are always SqlIdentifier (the parameter names). Tracing through registerOperandSubQueries → registerSubQueries:
parameters.getKind() is not QUERY, so the SCALAR_QUERY wrapping branch is skipped.
In the SqlNodeList branch, each SqlIdentifier is neither a QUERY nor a SqlCall/SqlNodeList, so it falls into the atomic-node branch and is ignored.
So registerOperandSubQueries(parentScope, call, 0) is effectively a no-op on a lambda's parameter list, and dropping it is safe.I have updated the code.
| throw newValidationError(params.get(j), | ||
| RESOURCE.duplicateLambdaParameter(otherName)); | ||
| } | ||
| } |
There was a problem hiding this comment.
Can we write something like:
final Set<String> seen = catalogReader.nameMatcher().createSet();
for (SqlNode param : lambdaExpr.getParameters()) {
final String name = ((SqlIdentifier) param).getSimple();
if (!seen.add(name)) {
throw newValidationError(param, RESOURCE.duplicateLambdaParameter(name));
}
}
There was a problem hiding this comment.
Can we write something like:
final Set<String> seen = catalogReader.nameMatcher().createSet(); for (SqlNode param : lambdaExpr.getParameters()) { final String name = ((SqlIdentifier) param).getSimple(); if (!seen.add(name)) { throw newValidationError(param, RESOURCE.duplicateLambdaParameter(name)); } }
done,thanks
mihaibudiu
left a comment
There was a problem hiding this comment.
If @dssysolyatin does not have other questions, let's merge this.
|
I don't think calcite has a "call" expression, which would allow you to call a closure: |
|
Just last question - #4926 (comment) and feel free to merge after that |
4b33f31 to
5a993bf
Compare
|



jira: https://issues.apache.org/jira/browse/CALCITE-6242