Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-10925] Implement Java UDF in ZetaSQL. #13200

Closed
wants to merge 15 commits into from
Closed

Conversation

ibzib
Copy link
Contributor

@ibzib ibzib commented Oct 26, 2020

Adds jar loading to #12398.

UDFs are specified by providing a jar to ZetaSQL options; support for
SqlTransform::registerUd(a)?f will be added in a later change.

R: @amaliujia


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

Post-Commit Tests Status (on master branch)

Lang SDK Dataflow Flink Samza Spark Twister2
Go Build Status --- Build Status --- Build Status ---
Java Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status
Build Status
Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
--- Build Status ---
XLang Build Status --- Build Status --- Build Status ---

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website Whitespace Typescript
Non-portable Build Status Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status Build Status
Portable --- Build Status --- --- --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@amaliujia
Copy link
Contributor

Thank you! Will take a look soon!

@amaliujia
Copy link
Contributor

Run SQL PreCommit

@amaliujia
Copy link
Contributor

Run Java PreCommit

@amaliujia
Copy link
Contributor

Run SQL_Java11 PreCommit

@amaliujia amaliujia self-requested a review October 27, 2020 04:38
@amaliujia
Copy link
Contributor

The failed SQL tests seems true. I will review the code and after the overall idea looks good, we can dig into to see why some tests have failed.

*/
public interface UdfProvider {
/** Maps function names to scalar function implementations. */
default Map<String, Method> userDefinedScalarFunctions() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented on the design doc: it is really preferable not to use Method here, but a specialized interface that is more specific. Method has a bunch of things that are not relevant and also if we want to add UDF-specific methods we can't.

Copy link
Contributor

@amaliujia amaliujia Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see why you think Method have too many things that might not relevant. Ideally we only need users to provide function signature (including function name) , which are enough to 1) offer required information to SQL parser 2) locate the right implementation in the user jar.

The reason we still use Method, is because that current UDF implementation asks for a Method: 1 2 3. So as we will need to construct a Method eventually, it doesn't seem a bad choice to expose it in the interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using our own interface would give us more flexibility with the implementation, but if our interface was solely a means of looking up a Method, I don't see much of a difference. If we want to add UDF-specific methods, there's no guarantee we could do that in a backward-compatible way even if we used a specialized interface.

(Mostly unrelated fun fact: Intellij can autofill this.getClass.getMethod calls, so it's more convenient to use than I initially thought.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with what @ibzib has said.

My another thought is, having a Method in our interface actually is a good thing for longer term evolvement. Once the interface is adopted, it will become harder to deprecate and upgrade this interface. Thus having Method provides more flexibility to allow us enhance UDF implementation without changing the interface. The information (e.g. annotations) that might be not relevant now could become relevant later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree. If it is a simple function, a method can always be used as in SomeClass::methodName and this is also easier for following control flow (including for IDEs). Whenever you can use explicit method references instead of reflection, you should.

The converse is not true. If a function is provided, it cannot automatically be converted to Method. The class Method is specifically a rich representation of an aspect of a Class in the JVM. It is not a good class for use when you want some sort of function. It is not a good class for anything except when you specifically want to manipulate the JVM's representation of a class.

If we expand the interface, we can maintain backwards-compatibility with default method implementation. If we need to expand it in a way where this is not possible we will need to create a new interface. If we use Method it is not possible to expand the interface anyhow.

I really think if you look at the pro/con of using Method vs a smaller interface there is not a single thing in the "pro" column.

Copy link
Contributor

@amaliujia amaliujia Oct 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kennknowles Maybe we should have an idea of what smaller interface you are thinking of (in case we have been thinking different things)? Can you give a concrete example, say for a scalar function, what is the implementation and what is returned from the interface?Then we can compare and see pros/cons and how to expand the interface?

We can go back to design doc to discuss this if you prefer to. Right now I cannot find the original comment you mentioned so starting a new comment is ok.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per our offline chat, we can continue discussing what a good interface should be and this PR should not be blocked. We will mark this interface as experimental and refine when necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most important: offline I agreed that we do not need to block on the representation of a UDF as a Method. But I would still like to see a test case demonstrating what it is like for a user to register a UDF using a lambda / anonymous function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created a custom interface ScalarFn, which uses annotation @ApplyMethod to locate the method to be applied. It adds a little more boilerplate (+2 SLOC / function), but now it's extensible.

But I would still like to see a test case demonstrating what it is like for a user to register a UDF using a lambda / anonymous function.

One of the drawbacks of the annotation approach is that it is not compatible with lambdas. But we needed the flexibility; lambdas require a known/fixed number of arguments.

@kennknowles kennknowles self-requested a review October 27, 2020 18:13
Copy link
Contributor

@amaliujia amaliujia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Overall approach looks good. Have left some comments.

@amaliujia
Copy link
Contributor

Those failed SQL precommits are about compilation error by Janino. For example:

java.lang.UnsupportedOperationException: Could not compile CalcFn: {
  c.output(org.apache.beam.sdk.values.Row.withSchema(outputSchema).attachValues(java.util.Arrays.asList(org.apache.beam.sdk.extensions.sql.zetasql.translation.impl.TimestampFunctions.timestamp(1230219000000L))));
}

It is not obvious to me why this PR has failed such tests. Will take another look to see whether I can find the root cause.

@amaliujia
Copy link
Contributor

@ibzib as this PR is lagging behind the master for 200+ commits. Maybe we could try a rebase then check failing precommit tests?

Adds jar loading to apache#12398.

UDFs are specified by providing a jar to ZetaSQL options; support for
SqlTransform::registerUd(a)?f will be added in a later change.
@ibzib
Copy link
Contributor Author

ibzib commented Oct 27, 2020

@ibzib as this PR is lagging behind the master for 200+ commits. Maybe we could try a rebase then check failing precommit tests?

@amaliujia I rebased, but those tests are still failing.

@amaliujia
Copy link
Contributor

@ibzib now I can see the failed tests better.

So if you check the error messages, for those failed test, they have been hitting BeamCalcRel. However, for such ZetaSQL function tests, they should hit BeamZetaSqlRel. So can you run one of the failed tests locally and observe the behavior of visiting BeamJavaUdfRule and BeamZetaSqlRule, and see why the BeamJavaUdfRule is matched thus BeamCalcRel is used?

@ibzib
Copy link
Contributor Author

ibzib commented Oct 28, 2020

@amaliujia Rules are fixed, PTAL

@amaliujia
Copy link
Contributor

Overall LGTM. @kennknowles do you have other comments except for the interface discussion?

@amaliujia
Copy link
Contributor

Run Java PreCommit

thrown.expect(IllegalArgumentException.class);
thrown.expectMessage("Option 'path' has type TYPE_INT64 (expected TYPE_STRING).");
zetaSQLQueryPlanner.convertToBeamRel(sql);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a case to verify NULL handling? E.g. pass a null value to increment? It is ok if it does not work, but it will be good to see expected behavior and then document it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some tests. I discovered a gap in type checking, since the current implementation does not strictly enforce that the SQL function signature must match the Java signature. I will fix that.

Copy link
Contributor

@amaliujia amaliujia Oct 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. Thanks!

I will also help add checks after this PR is in. For example, check the mixed usage of built-in functions and UDF.

Copy link
Contributor

@amaliujia amaliujia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks for adding the NULL test. I have also tested this PR by code import and didn't see a regression.

@amaliujia
Copy link
Contributor

Run Java PreCommit

@@ -91,14 +93,15 @@ private ScalarFunctionImpl(Method method, CallImplementor implementor) {
*
* @param clazz class that is used to implement the function
* @param methodName Method name (typically "eval")
* @param funGroup Optional function group identifier to differentiate implementations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this clarifies what a "function group" is. What is it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an arbitrary ZetaSQL field: https://github.com/google/zetasql/blob/1e03a558e2c26ad8f6a47fcb388ac4265d92f0dd/java/com/google/zetasql/Function.java#L61

This probably won't be needed if we use a custom interface instead of Method, since then we will need a new and separate implementation.

*/
public interface UdfProvider {
/** Maps function names to scalar function implementations. */
default Map<String, Method> userDefinedScalarFunctions() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most important: offline I agreed that we do not need to block on the representation of a UDF as a Method. But I would still like to see a test case demonstrating what it is like for a user to register a UDF using a lambda / anonymous function.

}

/** Maps function names to aggregate function implementations. */
default Map<String, Combine.CombineFn<?, ?, ?>> userDefinedAggregateFunctions() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did not discuss much, but also CombineFn is tightly coupled to Beam and we may find that it does not have all the metadata needed to be a UDAF. It would be good to have a variety of test cases around this. But it is @Experimental so we do not need to finish the whole thing now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is already used in registerUdaf, hasn't that worked so far?

public SqlTransform registerUdaf(String functionName, Combine.CombineFn combineFn) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More context:

using CombineFn is to address one of the users request that they don't control the source code, but they want to use that, and those existing libraries are coupled with Beam already by CombineFn interface.

}

/**
* Creates {@link org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.Function} from
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noting that Calcite has a separate Function so they agree that defining your own is useful.


/**
* Defines Java UDFs that are tested in {@link
* org.apache.beam.sdk.extensions.sql.zetasql.ZetaSqlJavaUdfTest}.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have tests in this module no problem. It would simplify the builds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put the UdfProvider in a separate module to prevent it automatically being included on the context classloader for the tests. Otherwise we can get a bunch of false positives because nothing actually needs to be loaded.

test {
dependsOn emptyJar
// Pass jars used by Java UDF tests via system properties.
evaluationDependsOnChildren() // Needed to resolve jarPath for udf-test-provider subproject.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to suggest: don't. Just put the tests in that module.

import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc;
import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalCalc;

public class BeamJavaUdfCalcRule extends BeamZetaSqlCalcRuleBase {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Javadoc please

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amaliujia can you write a Javadoc for this? (after addressing Kenn's other comment on this file)

Copy link
Contributor

@amaliujia amaliujia Nov 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I can handle this. Can you add a TODO so I can add javadoc in another PR? I will need to write PRs to polish code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC committers can commit directly to someone else's PR, so you can change it in this PR if you want.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this was more straightforward than I thought. So I did it myself. PTAL

super(clazz, in, out, description);
}

protected boolean hasUdfInProjects(RelOptRuleCall x) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to use inheritance? It seems like a straightforward function (aka should be a static function somewhere and you don't need a "base" class just to hold the function. FWIW it is a red flag to see a common ancestor class with no explanation other than "Base".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amaliujia wrote this part so I'll let him comment.

Copy link
Contributor

@amaliujia amaliujia Nov 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can change this to a static function now.

The idea was to use inheritance to build a bunch of rules to split Calc thus we can separate Java, Python and built-in function execution. As currently current UDF support in this PR is limited, it is not clear why inheritance.

When there is a need for inheritance in the future after we start to support mixed built-in function and Java UDF, we can add it back.

getFunctionGroup(createFunctionStmt), String.join(".", createFunctionStmt.getNamePath()));
}

static String getFunctionGroup(ResolvedCreateFunctionStmt createFunctionStmt) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like "function group" is a concept that will be with us for a while. Where is the best place to explain it so everyone will discover it?


public static final SqlOperator DATE_OP =
createUdfOperator("DATE", BeamBuiltinMethods.DATE_METHOD);
createBuiltinFunctionOperator("DATE", BeamBuiltinMethods.DATE_METHOD);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes like this raise flags. If you haven't watched how to grow a language I highly recommend it. TL;DR built-in and user-defined functions should have little to no difference if it can be avoided. (for zetasql I think the distinction is "do we execute in Java or do we execute via gRPC to C++"

Copy link
Contributor

@amaliujia amaliujia Oct 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is createUdfOperator was to create functions execute via gRPC to C++. The naming was confusing and this is a change to show the truth of how "DATE" is executed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only added createBuiltinFunctionOperator to so we could have a default function group. I could remove it and always require a function group.

This is another thing that is liable to change if we steer away from Method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed createBuiltinFunctionOperator and required all function definitions to specify a function group.

Use a custom ScalarFn class instead of using Method directly.

Create ZetaSqlScalarFunctionImpl to hold function group.
Copy link
Contributor Author

@ibzib ibzib left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the interface and moved around some implementation code for better encapsulation, PTAL. (more changes to come)

*/
public interface UdfProvider {
/** Maps function names to scalar function implementations. */
default Map<String, Method> userDefinedScalarFunctions() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created a custom interface ScalarFn, which uses annotation @ApplyMethod to locate the method to be applied. It adds a little more boilerplate (+2 SLOC / function), but now it's extensible.

But I would still like to see a test case demonstrating what it is like for a user to register a UDF using a lambda / anonymous function.

One of the drawbacks of the annotation approach is that it is not compatible with lambdas. But we needed the flexibility; lambdas require a known/fixed number of arguments.

import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc;
import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalCalc;

public class BeamJavaUdfCalcRule extends BeamZetaSqlCalcRuleBase {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amaliujia can you write a Javadoc for this? (after addressing Kenn's other comment on this file)

super(clazz, in, out, description);
}

protected boolean hasUdfInProjects(RelOptRuleCall x) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amaliujia wrote this part so I'll let him comment.

@@ -2197,66 +2197,11 @@ private static void validateTimerFamilyField(
return DoFnSignature.GetSizeMethod.create(m, windowT, methodContext.getExtraParameters());
}

private static Collection<Method> declaredMethodsWithAnnotation(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall idea looks good to me. I will leave @kennknowles to comment this part as @kennknowles knows more about Java SDK.

FileSystems.copy only works when the source and destination have the same filesystem.
@ibzib
Copy link
Contributor Author

ibzib commented Nov 4, 2020

@kennknowles This is ready for another review now.

Copy link
Member

@kennknowles kennknowles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some ideas but I think I'll step back from the Java details. It is too big for me to really dig in and we shouldn't hold up in PR. It does seem that the user-interface class (ScalarFn, with annotation-driven method extraction) is what is stored in the registered UDFs, whereas I might prefer that it is converted to a model-level class that could be provided in many different ways (either via annotations, or lambdas, or by directly implementing APIs, etc).


/**
* A scalar function that can be executed as part of a SQL query. Subclasses must contain exactly
* one method annotated with {@link ApplyMethod}, which will be applied to the SQL function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you can't use lambdas after all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lambdas are not compatible with the annotation-based design.

String sql =
String.format(
"CREATE FUNCTION matches(str STRING, regStr STRING) RETURNS BOOLEAN LANGUAGE java OPTIONS (path='%s'); "
+ "SELECT matches(\"a\", \"a\"), 'apple'='beta'",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it does work with a blend of UDFs and non-UDF expressions? The = is executed by Calcite SQL expression evaluator? (the PR is big enough that I am skimming things and looking at tests)

Copy link
Contributor

@amaliujia amaliujia Nov 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes your are right. This is an expected behavior as this is a big initial PR to achieve "execute Java UDF but might still have places to improve". As a follow up work I will check the mixed usage of UDF and non-UDF case and reject it.

Copy link
Member

@apilloud apilloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only see one change that needs to be addressed prior to merging this: BeamJavaUdfCalcRule filtering non-udf operations.

This PR is to large to effectively review, you've already received approval from others but if you want a complete review I would suggest breaking it up into at least the following smaller separate changes:

  1. ReflectHelpers refactor
    2. AggregateFn
  2. ScalerFn
    4. UdfProvider
    5. createFunction support
    6. BeamJavaUdfCalcRule
    7. JavaUdfLoader


@Override
public boolean matches(RelOptRuleCall x) {
return ZetaSQLQueryPlanner.hasUdfInProjects(x);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to reject non-udf operations in the projects, otherwise you might end up running ZetaSQL operators through BeamCalcRel which currently has no coverage in compliance testing and will introduce data corruption bugs. (If you want a more expansive set of operators we need to setup the compliance tests to cover them first.)

@@ -116,6 +144,51 @@ static boolean isEndOfInput(ParseResumeLocation parseResumeLocation) {
return tables.build();
}

/** Returns the fully qualified name of the function defined in {@code statement}. */
static String getFunctionQualifiedName(ResolvedCreateFunctionStmt createFunctionStmt) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This is unused?

case "PY":
case "PYTHON":
case "JS":
case "JAVASCRIPT":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: How does this list compare with BigQuery? Do we need to reserve these before we implement them?

return createFunctionStmt.getIsAggregate()
? USER_DEFINED_JAVA_AGGREGATE_FUNCTIONS
: USER_DEFINED_JAVA_SCALAR_FUNCTIONS;
case "SQL":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field being unset is SQL in BigQuery. Should we match?

SqlAnalyzer.USER_DEFINED_FUNCTIONS,
String.join(".", createFunctionStmt.getNamePath()));
udfBuilder.put(functionFullName, createFunctionStmt);
if (SqlAnalyzer.getFunctionGroup(createFunctionStmt)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could this be a switch statement instead?

"Does not support ZetaSQL aggregate function: "
+ aggregateFunctionCall.getFunction().getName());
SqlAggFunction sqlAggFunction;
if (aggregateFunctionCall
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: USER_DEFINED_JAVA_AGGREGATE_FUNCTIONS.equals(...)

SqlUserDefinedFunction udf = (SqlUserDefinedFunction) call.op;
if (udf.function instanceof ZetaSqlScalarFunctionImpl) {
ZetaSqlScalarFunctionImpl scalarFunction = (ZetaSqlScalarFunctionImpl) udf.function;
return scalarFunction.functionGroup.equals(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: put the constant first to reduce risk of NPE. USER_DEFINED_JAVA_SCALAR_FUNCTIONS.equals(...)

@@ -232,4 +265,8 @@ private AggregateCall convertAggCall(
return AggregateCall.create(
sqlAggFunction, false, false, false, argList, -1, RelCollations.EMPTY, returnType, aggName);
}

private static RelDataTypeFactory createTypeFactory() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Nothing uses this?

@@ -644,7 +649,7 @@ private RexNode convertResolvedFunctionCall(
throw new UnsupportedOperationException(
"Unsupported function: " + funName + ". Only support TUMBLE, HOP, and SESSION now.");
}
} else if (funGroup.equals("ZetaSQL")) {
} else if (funGroup.equals(ZETASQL_FUNCTION_GROUP_NAME)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ZETASQL_FUNCTION_GROUP_NAME.equals(funGroup) (There are a bunch more of this pattern, I'm going to stop commenting on them.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, forgot about that. I fixed this part in #13309.

@ibzib
Copy link
Contributor Author

ibzib commented Nov 11, 2020

This PR is to large to effectively review, you've already received approval from others but if you want a complete review I would suggest breaking it up into at least the following smaller separate changes:

I will try to spin off parts of this PR, starting with #13304.

@ibzib
Copy link
Contributor Author

ibzib commented Nov 11, 2020

I am closing this PR. It will be much easier to review and merge each logical piece separately. @apilloud I will address your comments as they come up in the new PRs.

@ibzib ibzib closed this Nov 11, 2020
import org.apache.beam.sdk.util.common.ReflectHelpers;

/** Implementation logic for {@link ScalarFn}. */
public class ScalarFnImpl {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self comment: this is unrelated to ScalarFunctionImpl and so should be renamed to avoid confusion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants