feat: add lambda syntax to grammar #6868

stevenpyzhang · 2021-01-14T20:52:28Z

Description

Adds syntax for lambda functions to the KSQL grammar

LambdaFunctionExpression for representing a lambda function node
LambdaLiteral for representing a lambda variable

During the AstBuilder phase, lambda variables are parsed as UnqualifiedColumnReferences because I couldn't pass in a proper context there. The AstSanitizer then converts them to LambdaLiterals.

Currently, lambdas won't work properly because the corresponding SqlToJavaVisitor methods haven't been implemented yet.

Testing done

Unit test
Manual test

Reviewer checklist

Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
Ensure relevant issues are linked (description should include text like "Fixes #")

guozhangwang

Honestly, I'm not yet qualified to review this PR since I'm still learning the AST code, so I just left a few questions, mainly for my own learning.

guozhangwang · 2021-01-21T00:12:22Z

ksqldb-engine/src/test/java/io/confluent/ksql/engine/rewrite/AstSanitizerTest.java

+  public void shouldSanitizeLambdaArguments() {
+    // Given:
+    final Statement stmt = givenQuery(
+        "SELECT TRANSFORM_ARRAY(X => X + 5) FROM TEST2;");


Is this a valid statement in practice? I.e., without a column reference from the entity TEST2, how would we know which column to apply this lambda expression to?

This particular one wouldn't be valid.

We don't need a column reference to apply the invocation function, though. We could have SELECT TRANSFORM_ARRAY(ARRAY[1, 2], x => x + 5) FROM TEST2 EMIT CHANGES;

What does ARRAY[1, 2] mean here?

https://docs.ksqldb.io/en/latest/developer-guide/syntax-reference/#array

It's how we create an inline array in the language. In this case we're not using any of the rows from the source; we're just passing a constant array to the function.

I see. So in this case it is irrelevant to TEST2's records, and would always return a single record as {key: null, value: [6,7]} right?
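The element-wise semantics discussed in this thread can be sketched in plain Java. This is only an illustration of what applying `x => x + 5` to `ARRAY[1, 2]` computes; `transformArray` is a hypothetical helper, not the ksqlDB implementation, and the sketch says nothing about how many records the query emits.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

final class TransformArraySketch {

  // Hypothetical stand-in for TRANSFORM_ARRAY: applies the lambda to every element.
  static <T, R> List<R> transformArray(final List<T> input, final Function<T, R> lambda) {
    return input.stream().map(lambda).collect(Collectors.toList());
  }

  public static void main(final String[] args) {
    // Mirrors TRANSFORM_ARRAY(ARRAY[1, 2], x => x + 5) with a constant input array.
    final List<Integer> result = transformArray(Arrays.asList(1, 2), x -> x + 5);
    System.out.println(result); // prints [6, 7]
  }
}
```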

...tion/src/main/java/io/confluent/ksql/execution/expression/tree/LambdaFunctionExpression.java

guozhangwang · 2021-01-21T00:17:15Z

ksqldb-parser/src/test/java/io/confluent/ksql/parser/AstBuilderTest.java

+  @Test
+  public void shouldBuildLambdaFunction() {
+    // Given:
+    final SingleStatementContext stmt = givenQuery("SELECT TRANSFORM_ARRAY(Col4, X => X + 5) FROM TEST1;");


For my own education: it seems that if Col4 is not specified, the sanitization phase would add a default column COL_0 to it, is that right? If yes, when is the sanitization phase triggered? And how would COL_0 be leveraged in execution?

The sanitizer runs in the engine after parsing. The parser (which builds our internal AST) should be doing a direct translation from the ANTLR AST to our internal AST representation.

Thanks! But I'm wondering, if Col4 is not specified, i.e.

SELECT TRANSFORM_ARRAY(X => X + 5) FROM TEST1

Then the sanitizer would add a COL_0 to the result AST, does that mean we default to the first column (rowkey?) if none is specified? I thought if it is not specified then the statement should fail.

KSQL_COL_0 refers to the name of a column in the output of the query. Every output column needs a name, so if the user doesn't specify one, we generate a unique name for that column.

ksql> select 5,3 as TEST,6,time from KSQL_PROCESSING_LOG emit changes;
+----------------------------------+----------------------------------+----------------------------------+----------------------------------+
|KSQL_COL_0                        |TEST                              |KSQL_COL_1                        |TIME                              |
+----------------------------------+----------------------------------+----------------------------------+----------------------------------+
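The naming behaviour visible in that output can be sketched roughly as follows. This is an assumption-laden illustration: `SelectItem` and `outputNames` are hypothetical names, and the sketch ignores any handling of clashes between generated and user-supplied names that the real sanitizer may perform.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class ColumnAliasSketch {

  // Hypothetical select item: an optional user alias, or a column name if it is a bare column reference.
  static final class SelectItem {
    final Optional<String> alias;
    final Optional<String> columnName;

    private SelectItem(final Optional<String> alias, final Optional<String> columnName) {
      this.alias = alias;
      this.columnName = columnName;
    }

    static SelectItem unnamed() {
      return new SelectItem(Optional.empty(), Optional.empty());
    }

    static SelectItem aliased(final String alias) {
      return new SelectItem(Optional.of(alias), Optional.empty());
    }

    static SelectItem column(final String name) {
      return new SelectItem(Optional.empty(), Optional.of(name));
    }
  }

  // Keeps explicit aliases and column names; generates KSQL_COL_<n> for everything else.
  static List<String> outputNames(final List<SelectItem> items) {
    final List<String> names = new ArrayList<>();
    int generated = 0;
    for (final SelectItem item : items) {
      if (item.alias.isPresent()) {
        names.add(item.alias.get());
      } else if (item.columnName.isPresent()) {
        names.add(item.columnName.get());
      } else {
        names.add("KSQL_COL_" + generated++);
      }
    }
    return names;
  }

  public static void main(final String[] args) {
    // Mirrors: select 5, 3 as TEST, 6, time from KSQL_PROCESSING_LOG
    final List<SelectItem> items = Arrays.asList(
        SelectItem.unnamed(), SelectItem.aliased("TEST"),
        SelectItem.unnamed(), SelectItem.column("TIME"));
    System.out.println(outputNames(items)); // prints [KSQL_COL_0, TEST, KSQL_COL_1, TIME]
  }
}
```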

rodesai

Thanks, @stevenpyzhang! Left a few comments/questions inline.

rodesai · 2021-01-21T06:54:51Z

ksqldb-parser/src/main/java/io/confluent/ksql/parser/AstBuilder.java

@@ -1173,10 +1185,12 @@ public Node visitWhenClause(final SqlBaseParser.WhenClauseContext context) {

    @Override
    public Node visitFunctionCall(final SqlBaseParser.FunctionCallContext context) {
+      final List<Expression> expressionList = visit(context.expression(), Expression.class);
+      expressionList.addAll(visit(context.lambdaFunction(), Expression.class));
      return new FunctionCall(


Aren't we losing the ordering of the lambdas with respect to the other expressions? I guess maybe it doesn't matter?

If we don't care about ordering the lambdas with respect to the expressions we should probably put them in their own list.

I'm going to update the g4 so that all the lambda functions have to be after all the other function arguments

| identifier'(' (expression (',' expression)* (',' lambdaFunction)*)? ')' #functionCall

With that change, the ordering assumption is sound.

Sounds good to me.
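To make the ordering concern concrete, here is a small sketch, independent of ANTLR, of why appending all lambdas after all plain expressions only preserves argument order when the grammar forces lambdas to come last. `Arg`, `expressionsThenLambdas` and `lambdasAreTrailing` are hypothetical names used for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LambdaOrderingSketch {

  // Hypothetical function argument: its source text and whether it is a lambda.
  static final class Arg {
    final String text;
    final boolean isLambda;

    Arg(final String text, final boolean isLambda) {
      this.text = text;
      this.isLambda = isLambda;
    }
  }

  // Mimics the builder code under review: all plain expressions first, then all lambdas.
  static List<String> expressionsThenLambdas(final List<Arg> args) {
    final List<String> out = new ArrayList<>();
    for (final Arg arg : args) {
      if (!arg.isLambda) {
        out.add(arg.text);
      }
    }
    for (final Arg arg : args) {
      if (arg.isLambda) {
        out.add(arg.text);
      }
    }
    return out;
  }

  // True when no plain expression follows a lambda, which is what the revised rule enforces.
  static boolean lambdasAreTrailing(final List<Arg> args) {
    boolean seenLambda = false;
    for (final Arg arg : args) {
      if (arg.isLambda) {
        seenLambda = true;
      } else if (seenLambda) {
        return false;
      }
    }
    return true;
  }

  public static void main(final String[] args) {
    final List<Arg> trailing = Arrays.asList(new Arg("Col4", false), new Arg("X => X + 5", true));
    final List<Arg> leading = Arrays.asList(new Arg("X => X + 5", true), new Arg("Col4", false));
    System.out.println(lambdasAreTrailing(trailing) + " " + expressionsThenLambdas(trailing));
    // The second call shows the silent reordering that the grammar change rules out.
    System.out.println(lambdasAreTrailing(leading) + " " + expressionsThenLambdas(leading));
  }
}
```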

rodesai · 2021-01-21T06:57:08Z

ksqldb-parser/src/test/java/io/confluent/ksql/parser/AstBuilderTest.java

+  @Test
+  public void shouldBuildLambdaFunction() {
+    // Given:
+    final SingleStatementContext stmt = givenQuery("SELECT TRANSFORM_ARRAY(Col4, X => X + 5) FROM TEST1;");


The sanitizer runs in the engine after parsing. The parser (which builds our internal AST) should be doing a direct translation from the ANTLR AST to our internal AST representation.

rodesai · 2021-01-21T07:01:45Z

ksqldb-execution/src/main/java/io/confluent/ksql/execution/expression/tree/LambdaLiteral.java

+import java.util.Optional;
+
+@Immutable
+public class LambdaLiteral extends Literal {


Question rather than suggestion: why do we need to use another type for these? An alternative would be to just use UnqualifiedColumnReferenceExp (and maybe rename it to UnqualifiedReferenceExp). Do you think it makes it easier to implement lambdas if they have different types?

I think it makes the code cleaner when implementing the type reference. For type checking in ExpressionTypeManager, I'd imagine that instead of passing down in the context that we're inside a lambda function expression (similar to what's happening in AstSanitizer), we can just have a specific visitor for LambdaLiteral.

After working through some of the type stuff, I do think it's somewhat necessary to have the additional type. Right now, the array/map input is passed in as an UnqualifiedReferenceExp and it's nice to be able to distinguish that from the lambda inputs
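A rough sketch of the argument for a separate node type: with distinct classes, type resolution can dispatch on the node itself instead of threading an "inside a lambda" flag through a context. All names here (`ColumnRef`, `LambdaVariable`, `resolveType`, the type strings) are hypothetical and are not the ExpressionTypeManager API.

```java
import java.util.HashMap;
import java.util.Map;

final class LambdaTypeSketch {

  interface Expr {}

  // Hypothetical column reference node, resolved against the source schema.
  static final class ColumnRef implements Expr {
    final String name;

    ColumnRef(final String name) {
      this.name = name;
    }
  }

  // Hypothetical lambda variable node, resolved against the enclosing lambda's scope.
  static final class LambdaVariable implements Expr {
    final String name;

    LambdaVariable(final String name) {
      this.name = name;
    }
  }

  // Dispatches on the node type, so no extra "are we in a lambda" state is needed.
  static String resolveType(
      final Expr expr,
      final Map<String, String> schema,
      final Map<String, String> lambdaScope
  ) {
    if (expr instanceof LambdaVariable) {
      return lambdaScope.get(((LambdaVariable) expr).name);
    }
    if (expr instanceof ColumnRef) {
      return schema.get(((ColumnRef) expr).name);
    }
    throw new IllegalArgumentException("Unsupported expression: " + expr);
  }

  public static void main(final String[] args) {
    final Map<String, String> schema = new HashMap<>();
    schema.put("COL4", "ARRAY<INTEGER>");
    final Map<String, String> lambdaScope = new HashMap<>();
    lambdaScope.put("X", "INTEGER");

    System.out.println(resolveType(new ColumnRef("COL4"), schema, lambdaScope)); // prints ARRAY<INTEGER>
    System.out.println(resolveType(new LambdaVariable("X"), schema, lambdaScope)); // prints INTEGER
  }
}
```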

rodesai · 2021-01-21T07:33:19Z

ksqldb-engine/src/main/java/io/confluent/ksql/engine/rewrite/ExpressionTreeRewriter.java

+        return result.get();
+      }
+
+      final LambdaContext lambdaContext =


LambdaContext doesn't belong here - the rewriter shouldn't make any interpretation of the context. From this class's POV context is an opaque cookie to pass around to the rewriter so it can keep some internal state.

What's the intent of this LambdaContext stuff? Is it just to screen for conflicts between lambda variable names and enclosing variable names or column names? A better way to structure this would be to keep track of these in AstSanitizer. From there you could do:

    class AstSanitizer {
      ...
      private static final class ExpressionRewriterPlugin extends
          VisitParentExpressionVisitor<Optional<Expression>, Context<SanitizerContext>> {

        @Override
        public Optional<Expression> visitUnqualifiedColumnReference(
            final UnqualifiedColumnReferenceExp expression,
            final SanitizerContext ctx
        ) {
          final ColumnName columnName = expression.getColumnName();
          if (ctx.lambdaArgs().contains(columnName.text())) {
            return Optional.of(new LambdaLiteral(columnName.text()));
          }
          ...
        }

        @Override
        public Optional<Expression> visitLambdaExpression(
            final LambdaFunctionExpression expression,
            final SanitizerContext ctx
        ) {
          dataSourceExtractor.getAllSources().forEach(aliasedDataSource -> {
            for (String argument : expression.getArguments()) {
              if (aliasedDataSource.getDataSource().getSchema().columns().stream()
                  .map(column -> column.name().text()).collect(Collectors.toList())
                  .contains(argument)) {
                throw new KsqlException("Lambda argument can't be a column name.");
              }
              if (ctx.lambdaArgs().contains(argument)) {
                throw new KsqlException...
              }
              ctx.addLambdaArg(argument);
            }
          });
          // just let the rewriter reconstruct the lambda - this function just validates and tracks args
          return Optional.empty();
        }
      }

      private static class SanitizerContext {
        Set<String> lambdaArgs = new HashSet<>();

        private void addLambdaArg(final String name) {
          lambdaArgs.add(name);
        }

        private Set<String> lambdaArgs() {
          return ImmutableSet.copyOf(lambdaArgs);
        }
      }
    }

Then, this method should just become a dumb rewrite like everything else, and we can get rid of LambdaContext

Then in https://github.com/confluentinc/ksql/pull/6868/files#diff-e5f9abbf3a2878a1dce09039136805764ecc31404a7eb27f2d1b49b9e038030dR81 we would just pass in a new SanitizerContext() instead of a null.

I'm wondering if we can just reuse LambdaContext instead of creating a new SanitizerContext here.

I suppose we could, though tracking the lambda args seems specific to AstSanitizer at the moment.
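For readers who want to run the idea from this thread in isolation, here is a self-contained sketch of a sanitizer-owned context that validates and tracks lambda arguments. It assumes nothing about the real ksqlDB classes: `visitLambda`, `classify`, the exception type and the duplicate-argument message are all hypothetical stand-ins.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class LambdaArgTrackingSketch {

  // Hypothetical context owned by the sanitizer; the generic rewriter never inspects it.
  static final class SanitizerContext {
    private final Set<String> lambdaArgs = new HashSet<>();

    void addLambdaArg(final String name) {
      lambdaArgs.add(name);
    }

    boolean isLambdaArg(final String name) {
      return lambdaArgs.contains(name);
    }
  }

  // Validates the arguments of one lambda and records them in the context.
  static void visitLambda(
      final List<String> arguments,
      final Set<String> columnNames,
      final SanitizerContext ctx
  ) {
    for (final String argument : arguments) {
      if (columnNames.contains(argument)) {
        throw new IllegalArgumentException("Lambda argument can't be a column name: " + argument);
      }
      if (ctx.isLambdaArg(argument)) {
        // Message invented for this sketch; the original comment elides it.
        throw new IllegalArgumentException("Duplicate lambda argument: " + argument);
      }
      ctx.addLambdaArg(argument);
    }
  }

  // Decides how an unqualified identifier should be rewritten.
  static String classify(final String identifier, final SanitizerContext ctx) {
    return ctx.isLambdaArg(identifier)
        ? "LambdaLiteral(" + identifier + ")"
        : "ColumnReference(" + identifier + ")";
  }

  public static void main(final String[] args) {
    final Set<String> columns = Collections.singleton("COL4");
    final SanitizerContext ctx = new SanitizerContext();

    // Mirrors TRANSFORM_ARRAY(Col4, X => X + 5): X is registered, Col4 stays a column.
    visitLambda(Arrays.asList("X"), columns, ctx);
    System.out.println(classify("X", ctx));    // prints LambdaLiteral(X)
    System.out.println(classify("COL4", ctx)); // prints ColumnReference(COL4)
  }
}
```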

...tion/src/main/java/io/confluent/ksql/execution/expression/tree/LambdaFunctionExpression.java

guozhangwang

Thanks for the new commit. I just have a few minor comments left, otherwise LGTM.

ksqldb-parser/src/main/antlr4/io/confluent/ksql/parser/SqlBase.g4

guozhangwang · 2021-01-29T06:26:50Z

ksqldb-parser/src/main/java/io/confluent/ksql/parser/AstBuilder.java

@@ -1173,10 +1185,12 @@ public Node visitWhenClause(final SqlBaseParser.WhenClauseContext context) {

    @Override
    public Node visitFunctionCall(final SqlBaseParser.FunctionCallContext context) {
+      final List<Expression> expressionList = visit(context.expression(), Expression.class);
+      expressionList.addAll(visit(context.lambdaFunction(), Expression.class));
      return new FunctionCall(


Sounds good to me.

guozhangwang · 2021-01-29T06:30:23Z

ksqldb-parser/src/test/java/io/confluent/ksql/parser/AstBuilderTest.java

            ),
            Optional.empty())
    ))));
  }

+  @Test


Since in g4 we allow multiple expressions followed by multiple lambda functions, could we also add a test for the error case where number of expressions is not equal to the number of lambda functions?

I don't think we can make this restriction, not all invocation functions would fit this requirement. From KLIP-30

reduce_map(map, s, (k, v, s) => s) - Reduces the input Map down to a single value. s is the initial state and is passed into the scope of the lambda function. Each invocation returns a new value for s, which the next invocation will receive. reduce_map will return the final value of s.

^ this one wouldn't fit this restriction

Got it, thanks.
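The reduce_map description quoted from KLIP-30 shows why the counts need not match: two plain arguments, one lambda. A minimal Java sketch of those semantics follows; `reduceMap` and `TriFunction` are hypothetical helpers written for this illustration, not ksqlDB code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ReduceMapSketch {

  // Hypothetical three-argument lambda shape: (key, value, state) => new state.
  @FunctionalInterface
  interface TriFunction<K, V, S> {
    S apply(K key, V value, S state);
  }

  // Threads the state through one lambda invocation per map entry and returns the final state.
  static <K, V, S> S reduceMap(
      final Map<K, V> map,
      final S initialState,
      final TriFunction<K, V, S> lambda
  ) {
    S state = initialState;
    for (final Map.Entry<K, V> entry : map.entrySet()) {
      state = lambda.apply(entry.getKey(), entry.getValue(), state);
    }
    return state;
  }

  public static void main(final String[] args) {
    final Map<String, Integer> map = new LinkedHashMap<>();
    map.put("a", 1);
    map.put("b", 2);

    // Two plain arguments (map, initial state) and a single lambda.
    final Integer total = reduceMap(map, 10, (k, v, s) -> s + v);
    System.out.println(total); // prints 13
  }
}
```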

rodesai

LGTM!

stevenpyzhang force-pushed the lambda-function-syntax branch 4 times, most recently from f7b417c to cf79b2c Compare January 19, 2021 04:53

feat: add lambda syntax to grammar

f3c1ca5

stevenpyzhang force-pushed the lambda-function-syntax branch 3 times, most recently from 5f6c212 to f6ab6e0 Compare January 20, 2021 06:21

stevenpyzhang marked this pull request as ready for review January 20, 2021 18:01

stevenpyzhang requested a review from a team as a code owner January 20, 2021 18:01

guozhangwang reviewed Jan 21, 2021

rodesai reviewed Jan 21, 2021

guozhangwang reviewed Jan 23, 2021

stevenpyzhang force-pushed the lambda-function-syntax branch 4 times, most recently from 0a5ad24 to a8fc46a Compare January 27, 2021 07:03

change g4

1ee3a57

stevenpyzhang force-pushed the lambda-function-syntax branch from a8fc46a to 1ee3a57 Compare January 27, 2021 16:21

stevenpyzhang requested review from rodesai, guozhangwang, a team and lct45 January 27, 2021 18:09

guozhangwang approved these changes Jan 29, 2021

rodesai approved these changes Jan 29, 2021

spacing

59557f2

stevenpyzhang merged commit dd3f365 into confluentinc:master Jan 29, 2021
