[BEAM-10925] Implement Java UDF in ZetaSQL. #13200

ibzib · 2020-10-26T20:37:40Z

Adds jar loading to #12398.

UDFs are specified by providing a jar to ZetaSQL options; support for
SqlTransform::registerUd(a)?f will be added in a later change.

R: @amaliujia

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

Choose reviewer(s) and mention them in a comment (R: @username).
Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
Update CHANGES.md with noteworthy changes.
If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

Post-Commit Tests Status (on master branch)

Lang	Dataflow	Samza	Twister2
Go	---	---	---
Java
Python		---	---
XLang	---	---	---

Pre-Commit Tests Status (on master branch)

---	Java	Python	Go	Website	Whitespace	Typescript
Non-portable
Portable	---		---	---	---	---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

GitHub Actions Tests Status (on master branch)

See CI.md for more information about GitHub Actions CI.

amaliujia · 2020-10-27T04:36:49Z

Thank you! Will take a look soon!

amaliujia · 2020-10-27T04:37:52Z

Run SQL PreCommit

amaliujia · 2020-10-27T04:37:58Z

Run Java PreCommit

amaliujia · 2020-10-27T04:38:04Z

Run SQL_Java11 PreCommit

amaliujia · 2020-10-27T17:18:33Z

The failed SQL tests seems true. I will review the code and after the overall idea looks good, we can dig into to see why some tests have failed.

kennknowles · 2020-10-27T18:12:40Z

sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/UdfProvider.java

+ */
+public interface UdfProvider {
+  /** Maps function names to scalar function implementations. */
+  default Map<String, Method> userDefinedScalarFunctions() {


Commented on the design doc: it is really preferable not to use Method here, but a specialized interface that is more specific. Method has a bunch of things that are not relevant and also if we want to add UDF-specific methods we can't.

I can see why you think Method have too many things that might not relevant. Ideally we only need users to provide function signature (including function name) , which are enough to 1) offer required information to SQL parser 2) locate the right implementation in the user jar.

The reason we still use Method, is because that current UDF implementation asks for a Method: 1 2 3. So as we will need to construct a Method eventually, it doesn't seem a bad choice to expose it in the interface.

Using our own interface would give us more flexibility with the implementation, but if our interface was solely a means of looking up a Method, I don't see much of a difference. If we want to add UDF-specific methods, there's no guarantee we could do that in a backward-compatible way even if we used a specialized interface.

(Mostly unrelated fun fact: Intellij can autofill this.getClass.getMethod calls, so it's more convenient to use than I initially thought.)

Agreed with what @ibzib has said.

My another thought is, having a Method in our interface actually is a good thing for longer term evolvement. Once the interface is adopted, it will become harder to deprecate and upgrade this interface. Thus having Method provides more flexibility to allow us enhance UDF implementation without changing the interface. The information (e.g. annotations) that might be not relevant now could become relevant later.

I don't agree. If it is a simple function, a method can always be used as in SomeClass::methodName and this is also easier for following control flow (including for IDEs). Whenever you can use explicit method references instead of reflection, you should.

The converse is not true. If a function is provided, it cannot automatically be converted to Method. The class Method is specifically a rich representation of an aspect of a Class in the JVM. It is not a good class for use when you want some sort of function. It is not a good class for anything except when you specifically want to manipulate the JVM's representation of a class.

If we expand the interface, we can maintain backwards-compatibility with default method implementation. If we need to expand it in a way where this is not possible we will need to create a new interface. If we use Method it is not possible to expand the interface anyhow.

I really think if you look at the pro/con of using Method vs a smaller interface there is not a single thing in the "pro" column.

@kennknowles Maybe we should have an idea of what smaller interface you are thinking of (in case we have been thinking different things)? Can you give a concrete example, say for a scalar function, what is the implementation and what is returned from the interface?Then we can compare and see pros/cons and how to expand the interface?

We can go back to design doc to discuss this if you prefer to. Right now I cannot find the original comment you mentioned so starting a new comment is ok.

Per our offline chat, we can continue discussing what a good interface should be and this PR should not be blocked. We will mark this interface as experimental and refine when necessary.

Most important: offline I agreed that we do not need to block on the representation of a UDF as a Method. But I would still like to see a test case demonstrating what it is like for a user to register a UDF using a lambda / anonymous function.

I created a custom interface ScalarFn, which uses annotation @ApplyMethod to locate the method to be applied. It adds a little more boilerplate (+2 SLOC / function), but now it's extensible.

But I would still like to see a test case demonstrating what it is like for a user to register a UDF using a lambda / anonymous function.

One of the drawbacks of the annotation approach is that it is not compatible with lambdas. But we needed the flexibility; lambdas require a known/fixed number of arguments.

amaliujia

Thanks! Overall approach looks good. Have left some comments.

...tensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamRuleSets.java

...nsions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SqlAnalyzer.java

...sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLPlannerImpl.java

...ql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLQueryPlanner.java

...asql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/JavaUdfLoader.java

amaliujia · 2020-10-27T20:20:06Z

Those failed SQL precommits are about compilation error by Janino. For example:

java.lang.UnsupportedOperationException: Could not compile CalcFn: {
  c.output(org.apache.beam.sdk.values.Row.withSchema(outputSchema).attachValues(java.util.Arrays.asList(org.apache.beam.sdk.extensions.sql.zetasql.translation.impl.TimestampFunctions.timestamp(1230219000000L))));
}

It is not obvious to me why this PR has failed such tests. Will take another look to see whether I can find the root cause.

amaliujia · 2020-10-27T21:03:08Z

@ibzib as this PR is lagging behind the master for 200+ commits. Maybe we could try a rebase then check failing precommit tests?

Adds jar loading to apache#12398. UDFs are specified by providing a jar to ZetaSQL options; support for SqlTransform::registerUd(a)?f will be added in a later change.

ibzib · 2020-10-27T23:55:30Z

@ibzib as this PR is lagging behind the master for 200+ commits. Maybe we could try a rebase then check failing precommit tests?

@amaliujia I rebased, but those tests are still failing.

amaliujia · 2020-10-28T05:53:10Z

@ibzib now I can see the failed tests better.

So if you check the error messages, for those failed test, they have been hitting BeamCalcRel. However, for such ZetaSQL function tests, they should hit BeamZetaSqlRel. So can you run one of the failed tests locally and observe the behavior of visiting BeamJavaUdfRule and BeamZetaSqlRule, and see why the BeamJavaUdfRule is matched thus BeamCalcRel is used?

…unGroup field to ScalarFunctionImpl. Built-in functions should go through BeamZetaSqlCalcRel, not BeamCalcRel.

ibzib · 2020-10-28T22:02:35Z

@amaliujia Rules are fixed, PTAL

amaliujia · 2020-10-28T22:21:14Z

Overall LGTM. @kennknowles do you have other comments except for the interface discussion?

amaliujia · 2020-10-28T22:48:19Z

Run Java PreCommit

...nsions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SqlAnalyzer.java

amaliujia · 2020-10-29T18:58:11Z

...sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlJavaUdfTest.java

+    thrown.expect(IllegalArgumentException.class);
+    thrown.expectMessage("Option 'path' has type TYPE_INT64 (expected TYPE_STRING).");
+    zetaSQLQueryPlanner.convertToBeamRel(sql);
+  }


Can you add a case to verify NULL handling? E.g. pass a null value to increment? It is ok if it does not work, but it will be good to see expected behavior and then document it.

I added some tests. I discovered a gap in type checking, since the current implementation does not strictly enforce that the SQL function signature must match the Java signature. I will fix that.

That makes sense. Thanks!

I will also help add checks after this PR is in. For example, check the mixed usage of built-in functions and UDF.

amaliujia

LGTM

Thanks for adding the NULL test. I have also tested this PR by code import and didn't see a regression.

amaliujia · 2020-10-30T17:46:42Z

Run Java PreCommit

kennknowles · 2020-10-30T20:26:17Z

...extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/ScalarFunctionImpl.java

@@ -91,14 +93,15 @@ private ScalarFunctionImpl(Method method, CallImplementor implementor) {
   *
   * @param clazz class that is used to implement the function
   * @param methodName Method name (typically "eval")
+   * @param funGroup Optional function group identifier to differentiate implementations.


Not sure this clarifies what a "function group" is. What is it?

It's an arbitrary ZetaSQL field: https://github.com/google/zetasql/blob/1e03a558e2c26ad8f6a47fcb388ac4265d92f0dd/java/com/google/zetasql/Function.java#L61

This probably won't be needed if we use a custom interface instead of Method, since then we will need a new and separate implementation.

kennknowles · 2020-10-30T20:28:15Z

sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/UdfProvider.java

+ */
+public interface UdfProvider {
+  /** Maps function names to scalar function implementations. */
+  default Map<String, Method> userDefinedScalarFunctions() {


Most important: offline I agreed that we do not need to block on the representation of a UDF as a Method. But I would still like to see a test case demonstrating what it is like for a user to register a UDF using a lambda / anonymous function.

kennknowles · 2020-10-30T20:29:42Z

sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/UdfProvider.java

+  }
+
+  /** Maps function names to aggregate function implementations. */
+  default Map<String, Combine.CombineFn<?, ?, ?>> userDefinedAggregateFunctions() {


We did not discuss much, but also CombineFn is tightly coupled to Beam and we may find that it does not have all the metadata needed to be a UDAF. It would be good to have a variety of test cases around this. But it is @Experimental so we do not need to finish the whole thing now.

It is already used in registerUdaf, hasn't that worked so far?

beam/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/SqlTransform.java

Line 257 in 168d442

public SqlTransform registerUdaf(String functionName, Combine.CombineFn combineFn) {

More context:

using CombineFn is to address one of the users request that they don't control the source code, but they want to use that, and those existing libraries are coupled with Beam already by CombineFn interface.

kennknowles · 2020-10-30T20:30:22Z

...extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/ScalarFunctionImpl.java

+  }
+
+  /**
+   * Creates {@link org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.Function} from


Noting that Calcite has a separate Function so they agree that defining your own is useful.

kennknowles · 2020-10-30T20:31:12Z

...vider/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/provider/UdfTestProvider.java

+
+/**
+ * Defines Java UDFs that are tested in {@link
+ * org.apache.beam.sdk.extensions.sql.zetasql.ZetaSqlJavaUdfTest}.


We could have tests in this module no problem. It would simplify the builds.

I put the UdfProvider in a separate module to prevent it automatically being included on the context classloader for the tests. Otherwise we can get a bunch of false positives because nothing actually needs to be loaded.

kennknowles · 2020-10-30T20:33:52Z

sdks/java/extensions/sql/zetasql/build.gradle

+test {
+  dependsOn emptyJar
+  // Pass jars used by Java UDF tests via system properties.
+  evaluationDependsOnChildren() // Needed to resolve jarPath for udf-test-provider subproject.


I want to suggest: don't. Just put the tests in that module.

kennknowles · 2020-10-30T20:34:21Z

...ql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamJavaUdfCalcRule.java

+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalCalc;
+
+public class BeamJavaUdfCalcRule extends BeamZetaSqlCalcRuleBase {


Javadoc please

@amaliujia can you write a Javadoc for this? (after addressing Kenn's other comment on this file)

Sure I can handle this. Can you add a TODO so I can add javadoc in another PR? I will need to write PRs to polish code.

IIRC committers can commit directly to someone else's PR, so you can change it in this PR if you want.

Actually this was more straightforward than I thought. So I did it myself. PTAL

kennknowles · 2020-10-30T20:35:15Z

...etasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCalcRuleBase.java

+    super(clazz, in, out, description);
+  }
+
+  protected boolean hasUdfInProjects(RelOptRuleCall x) {


Why does this need to use inheritance? It seems like a straightforward function (aka should be a static function somewhere and you don't need a "base" class just to hold the function. FWIW it is a red flag to see a common ancestor class with no explanation other than "Base".

@amaliujia wrote this part so I'll let him comment.

We can change this to a static function now.

The idea was to use inheritance to build a bunch of rules to split Calc thus we can separate Java, Python and built-in function execution. As currently current UDF support in this PR is limited, it is not clear why inheritance.

When there is a need for inheritance in the future after we start to support mixed built-in function and Java UDF, we can add it back.

kennknowles · 2020-10-30T20:37:04Z

...nsions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SqlAnalyzer.java

+        getFunctionGroup(createFunctionStmt), String.join(".", createFunctionStmt.getNamePath()));
+  }
+
+  static String getFunctionGroup(ResolvedCreateFunctionStmt createFunctionStmt) {


Seems like "function group" is a concept that will be with us for a while. Where is the best place to explain it so everyone will discover it?

kennknowles · 2020-10-30T20:39:08Z

...tasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SqlOperators.java


  public static final SqlOperator DATE_OP =
-      createUdfOperator("DATE", BeamBuiltinMethods.DATE_METHOD);
+      createBuiltinFunctionOperator("DATE", BeamBuiltinMethods.DATE_METHOD);


Changes like this raise flags. If you haven't watched how to grow a language I highly recommend it. TL;DR built-in and user-defined functions should have little to no difference if it can be avoided. (for zetasql I think the distinction is "do we execute in Java or do we execute via gRPC to C++"

My understanding is createUdfOperator was to create functions execute via gRPC to C++. The naming was confusing and this is a change to show the truth of how "DATE" is executed.

I only added createBuiltinFunctionOperator to so we could have a default function group. I could remove it and always require a function group.

This is another thing that is liable to change if we steer away from Method.

I removed createBuiltinFunctionOperator and required all function definitions to specify a function group.

Use a custom ScalarFn class instead of using Method directly. Create ZetaSqlScalarFunctionImpl to hold function group.

ibzib

I changed the interface and moved around some implementation code for better encapsulation, PTAL. (more changes to come)

ibzib · 2020-10-31T23:48:58Z

sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/UdfProvider.java

+ */
+public interface UdfProvider {
+  /** Maps function names to scalar function implementations. */
+  default Map<String, Method> userDefinedScalarFunctions() {


I created a custom interface ScalarFn, which uses annotation @ApplyMethod to locate the method to be applied. It adds a little more boilerplate (+2 SLOC / function), but now it's extensible.

But I would still like to see a test case demonstrating what it is like for a user to register a UDF using a lambda / anonymous function.

One of the drawbacks of the annotation approach is that it is not compatible with lambdas. But we needed the flexibility; lambdas require a known/fixed number of arguments.

ibzib · 2020-10-31T23:54:15Z

...ql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamJavaUdfCalcRule.java

+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalCalc;
+
+public class BeamJavaUdfCalcRule extends BeamZetaSqlCalcRuleBase {


@amaliujia can you write a Javadoc for this? (after addressing Kenn's other comment on this file)

ibzib · 2020-10-31T23:54:51Z

...etasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCalcRuleBase.java

+    super(clazz, in, out, description);
+  }
+
+  protected boolean hasUdfInProjects(RelOptRuleCall x) {


@amaliujia wrote this part so I'll let him comment.

amaliujia · 2020-11-03T03:47:05Z

sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignatures.java

@@ -2197,66 +2197,11 @@ private static void validateTimerFamilyField(
    return DoFnSignature.GetSizeMethod.create(m, windowT, methodContext.getExtraParameters());
  }

-  private static Collection<Method> declaredMethodsWithAnnotation(


Overall idea looks good to me. I will leave @kennknowles to comment this part as @kennknowles knows more about Java SDK.

FileSystems.copy only works when the source and destination have the same filesystem.

ibzib · 2020-11-04T22:26:17Z

@kennknowles This is ready for another review now.

kennknowles

I have some ideas but I think I'll step back from the Java details. It is too big for me to really dig in and we shouldn't hold up in PR. It does seem that the user-interface class (ScalarFn, with annotation-driven method extraction) is what is stored in the registered UDFs, whereas I might prefer that it is converted to a model-level class that could be provided in many different ways (either via annotations, or lambdas, or by directly implementing APIs, etc).

kennknowles · 2020-11-11T18:56:52Z

sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/ScalarFn.java

+
+/**
+ * A scalar function that can be executed as part of a SQL query. Subclasses must contain exactly
+ * one method annotated with {@link ApplyMethod}, which will be applied to the SQL function


I guess you can't use lambdas after all?

Lambdas are not compatible with the annotation-based design.

kennknowles · 2020-11-11T19:04:05Z

...sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlJavaUdfTest.java

+    String sql =
+        String.format(
+            "CREATE FUNCTION matches(str STRING, regStr STRING) RETURNS BOOLEAN LANGUAGE java OPTIONS (path='%s'); "
+                + "SELECT matches(\"a\", \"a\"), 'apple'='beta'",


So it does work with a blend of UDFs and non-UDF expressions? The = is executed by Calcite SQL expression evaluator? (the PR is big enough that I am skimming things and looking at tests)

Yes your are right. This is an expected behavior as this is a big initial PR to achieve "execute Java UDF but might still have places to improve". As a follow up work I will check the mixed usage of UDF and non-UDF case and reject it.

apilloud

I only see one change that needs to be addressed prior to merging this: BeamJavaUdfCalcRule filtering non-udf operations.

This PR is to large to effectively review, you've already received approval from others but if you want a complete review I would suggest breaking it up into at least the following smaller separate changes:

ReflectHelpers refactor
2. AggregateFn
ScalerFn
4. UdfProvider
5. createFunction support
6. BeamJavaUdfCalcRule
7. JavaUdfLoader

apilloud · 2020-11-11T19:49:43Z

...ql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamJavaUdfCalcRule.java

+
+  @Override
+  public boolean matches(RelOptRuleCall x) {
+    return ZetaSQLQueryPlanner.hasUdfInProjects(x);


This needs to reject non-udf operations in the projects, otherwise you might end up running ZetaSQL operators through BeamCalcRel which currently has no coverage in compliance testing and will introduce data corruption bugs. (If you want a more expansive set of operators we need to setup the compliance tests to cover them first.)

apilloud · 2020-11-11T19:55:23Z

...nsions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SqlAnalyzer.java

@@ -116,6 +144,51 @@ static boolean isEndOfInput(ParseResumeLocation parseResumeLocation) {
    return tables.build();
  }

+  /** Returns the fully qualified name of the function defined in {@code statement}. */
+  static String getFunctionQualifiedName(ResolvedCreateFunctionStmt createFunctionStmt) {


nit: This is unused?

apilloud · 2020-11-11T20:09:44Z

...nsions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SqlAnalyzer.java

+      case "PY":
+      case "PYTHON":
+      case "JS":
+      case "JAVASCRIPT":


nit: How does this list compare with BigQuery? Do we need to reserve these before we implement them?

apilloud · 2020-11-11T20:10:14Z

...nsions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SqlAnalyzer.java

+        return createFunctionStmt.getIsAggregate()
+            ? USER_DEFINED_JAVA_AGGREGATE_FUNCTIONS
+            : USER_DEFINED_JAVA_SCALAR_FUNCTIONS;
+      case "SQL":


This field being unset is SQL in BigQuery. Should we match?

apilloud · 2020-11-11T20:15:46Z

...sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLPlannerImpl.java

-                SqlAnalyzer.USER_DEFINED_FUNCTIONS,
-                String.join(".", createFunctionStmt.getNamePath()));
-        udfBuilder.put(functionFullName, createFunctionStmt);
+        if (SqlAnalyzer.getFunctionGroup(createFunctionStmt)


nit: Could this be a switch statement instead?

apilloud · 2020-11-11T20:19:13Z

...main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/AggregateScanConverter.java

-          "Does not support ZetaSQL aggregate function: "
-              + aggregateFunctionCall.getFunction().getName());
+    SqlAggFunction sqlAggFunction;
+    if (aggregateFunctionCall


nit: USER_DEFINED_JAVA_AGGREGATE_FUNCTIONS.equals(...)

apilloud · 2020-11-11T20:20:26Z

...ql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLQueryPlanner.java

+              SqlUserDefinedFunction udf = (SqlUserDefinedFunction) call.op;
+              if (udf.function instanceof ZetaSqlScalarFunctionImpl) {
+                ZetaSqlScalarFunctionImpl scalarFunction = (ZetaSqlScalarFunctionImpl) udf.function;
+                return scalarFunction.functionGroup.equals(


Nit: put the constant first to reduce risk of NPE. USER_DEFINED_JAVA_SCALAR_FUNCTIONS.equals(...)

apilloud · 2020-11-11T20:21:23Z

...main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/AggregateScanConverter.java

@@ -232,4 +265,8 @@ private AggregateCall convertAggCall(
    return AggregateCall.create(
        sqlAggFunction, false, false, false, argList, -1, RelCollations.EMPTY, returnType, aggName);
  }
+
+  private static RelDataTypeFactory createTypeFactory() {


nit: Nothing uses this?

apilloud · 2020-11-11T20:22:23Z

...rc/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ExpressionConverter.java

@@ -644,7 +649,7 @@ private RexNode convertResolvedFunctionCall(
          throw new UnsupportedOperationException(
              "Unsupported function: " + funName + ". Only support TUMBLE, HOP, and SESSION now.");
      }
-    } else if (funGroup.equals("ZetaSQL")) {
+    } else if (funGroup.equals(ZETASQL_FUNCTION_GROUP_NAME)) {


nit: ZETASQL_FUNCTION_GROUP_NAME.equals(funGroup) (There are a bunch more of this pattern, I'm going to stop commenting on them.)

Thanks, forgot about that. I fixed this part in #13309.

ibzib · 2020-11-11T20:57:48Z

This PR is to large to effectively review, you've already received approval from others but if you want a complete review I would suggest breaking it up into at least the following smaller separate changes:

I will try to spin off parts of this PR, starting with #13304.

ibzib · 2020-11-11T22:53:33Z

I am closing this PR. It will be much easier to review and merge each logical piece separately. @apilloud I will address your comments as they come up in the new PRs.

ibzib · 2020-11-11T21:12:52Z

.../src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/impl/ScalarFnImpl.java

+import org.apache.beam.sdk.util.common.ReflectHelpers;
+
+/** Implementation logic for {@link ScalarFn}. */
+public class ScalarFnImpl {


Self comment: this is unrelated to ScalarFunctionImpl and so should be renamed to avoid confusion.

amaliujia self-requested a review October 27, 2020 04:38

kennknowles reviewed Oct 27, 2020

View reviewed changes

kennknowles self-requested a review October 27, 2020 18:13

amaliujia reviewed Oct 27, 2020

View reviewed changes

[BEAM-10925] Implement Java UDF in ZetaSQL.

21caa65

Adds jar loading to apache#12398. UDFs are specified by providing a jar to ZetaSQL options; support for SqlTransform::registerUd(a)?f will be added in a later change.

ibzib force-pushed the BEAM-10925 branch from 5d55641 to 72f8764 Compare October 27, 2020 23:46

Address review comments.

715d4d5

ibzib force-pushed the BEAM-10925 branch from 72f8764 to 715d4d5 Compare October 27, 2020 23:50

ibzib added 5 commits October 28, 2020 13:45

Remove unneeded import.

f9256d5

Mark UdfProvider interface as @experimental.

775968e

Differentiate built-in and user-defined ZetaSQL functions by adding f…

34638d0

…unGroup field to ScalarFunctionImpl. Built-in functions should go through BeamZetaSqlCalcRel, not BeamCalcRel.

fix spot bugs

bbc17d3

fix checkstyle error

5a32bd2

amaliujia reviewed Oct 29, 2020

View reviewed changes

...nsions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SqlAnalyzer.java Show resolved Hide resolved

amaliujia reviewed Oct 29, 2020

View reviewed changes

Add tests for null arguments and type checking.

70c122e

amaliujia approved these changes Oct 29, 2020

View reviewed changes

kennknowles requested changes Oct 30, 2020

View reviewed changes

ibzib added 2 commits October 31, 2020 16:09

Move general-usage helper methods from DoFnSignatures to ReflectHelpers.

44d2d67

Address review comments.

017ecb4

Use a custom ScalarFn class instead of using Method directly. Create ZetaSqlScalarFunctionImpl to hold function group.

ibzib commented Nov 2, 2020

View reviewed changes

ibzib added 4 commits November 2, 2020 13:15

Create AggregateFn to wrap CombineFn.

fe0fc95

Defer ScalarFn method lookup and add tests.

b86647f

adjust TODOs

2c9494d

Remove BeamZetaSqlCalcRuleBase.

5ebd18d

amaliujia reviewed Nov 3, 2020

View reviewed changes

Fix jar download from GCS.

0499c8f

FileSystems.copy only works when the source and destination have the same filesystem.

amaliujia requested a review from kennknowles November 7, 2020 05:20

kennknowles requested a review from apilloud November 11, 2020 18:51

kennknowles approved these changes Nov 11, 2020

View reviewed changes

apilloud reviewed Nov 11, 2020

View reviewed changes

ibzib closed this Nov 11, 2020

ibzib commented Nov 11, 2020

View reviewed changes

[BEAM-10925] Implement Java UDF in ZetaSQL. #13200

[BEAM-10925] Implement Java UDF in ZetaSQL. #13200

Conversation

ibzib commented Oct 26, 2020

Post-Commit Tests Status (on master branch)

Pre-Commit Tests Status (on master branch)

GitHub Actions Tests Status (on master branch)

amaliujia commented Oct 27, 2020

amaliujia commented Oct 27, 2020

amaliujia commented Oct 27, 2020

amaliujia commented Oct 27, 2020

amaliujia commented Oct 27, 2020

Choose a reason for hiding this comment

amaliujia Oct 27, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kennknowles Oct 27, 2020 • edited

Choose a reason for hiding this comment

amaliujia Oct 28, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

amaliujia left a comment

Choose a reason for hiding this comment

amaliujia commented Oct 27, 2020

amaliujia commented Oct 27, 2020

ibzib commented Oct 27, 2020

amaliujia commented Oct 28, 2020

ibzib commented Oct 28, 2020

amaliujia commented Oct 28, 2020

amaliujia commented Oct 28, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

amaliujia Oct 29, 2020 • edited

Choose a reason for hiding this comment

amaliujia left a comment • edited

Choose a reason for hiding this comment

amaliujia commented Oct 30, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

amaliujia Nov 2, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

amaliujia Nov 2, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

amaliujia Oct 30, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ibzib left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ibzib commented Nov 4, 2020

kennknowles left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

amaliujia Nov 11, 2020 • edited

Choose a reason for hiding this comment

amaliujia Oct 27, 2020 •

edited

kennknowles Oct 27, 2020 •

edited

amaliujia Oct 28, 2020 •

edited

amaliujia Oct 29, 2020 •

edited

amaliujia left a comment •

edited

amaliujia Nov 2, 2020 •

edited

amaliujia Nov 2, 2020 •

edited

amaliujia Oct 30, 2020 •

edited

amaliujia Nov 11, 2020 •

edited