[BEAM-9953] [ZetaSQL] Implement CREATE FUNCTION and scalar UDF. #12153

Merged: ibzib merged 3 commits into apache:master on Jul 1, 2020

Contributor

@ibzib ibzib commented Jul 1, 2020

Scalar UDFs are implemented by simply replacing each function invocation with its corresponding RexNode.

R: @amaliujia
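
To illustrate the inlining idea described above, here is a minimal sketch. It does not use Calcite or Beam; `Expr`, `Literal`, `ParamRef`, `Call`, `UdfDefinition`, `UdfInliner`, and `UdfInlinerDemo` are all hypothetical stand-ins for the real `RexNode` machinery. It assumes that a UDF body refers to its parameters by name and that UDFs are not recursive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical miniature expression tree standing in for Calcite's RexNode.
abstract class Expr {
  abstract String render();
}

class Literal extends Expr {
  final String value;
  Literal(String value) { this.value = value; }
  @Override String render() { return value; }
}

// Reference to a UDF parameter inside a function body.
class ParamRef extends Expr {
  final String name;
  ParamRef(String name) { this.name = name; }
  @Override String render() { return name; }
}

class Call extends Expr {
  final String function;
  final List<Expr> operands;
  Call(String function, List<Expr> operands) {
    this.function = function;
    this.operands = operands;
  }
  @Override String render() {
    List<String> parts = new ArrayList<>();
    for (Expr operand : operands) {
      parts.add(operand.render());
    }
    return function + "(" + String.join(", ", parts) + ")";
  }
}

class UdfDefinition {
  final List<String> parameterNames;
  final Expr body;
  UdfDefinition(List<String> parameterNames, Expr body) {
    this.parameterNames = parameterNames;
    this.body = body;
  }
}

class UdfInliner {
  private final Map<String, UdfDefinition> udfs;
  UdfInliner(Map<String, UdfDefinition> udfs) { this.udfs = udfs; }

  // Replaces every UDF invocation in expr with the UDF's body.
  Expr inline(Expr expr) {
    return expand(expr, Collections.<String, Expr>emptyMap());
  }

  private Expr expand(Expr expr, Map<String, Expr> arguments) {
    if (expr instanceof ParamRef) {
      String name = ((ParamRef) expr).name;
      Expr bound = arguments.get(name);
      if (bound == null) {
        throw new IllegalArgumentException("Unbound parameter: " + name);
      }
      return bound; // already expanded by the caller
    }
    if (expr instanceof Call) {
      Call call = (Call) expr;
      List<Expr> operands = new ArrayList<>();
      for (Expr operand : call.operands) {
        operands.add(expand(operand, arguments)); // handles nested calls
      }
      UdfDefinition udf = udfs.get(call.function);
      if (udf == null) {
        return new Call(call.function, operands); // builtin: keep the call
      }
      if (udf.parameterNames.size() != operands.size()) {
        throw new IllegalArgumentException("Wrong argument count for " + call.function);
      }
      Map<String, Expr> bindings = new HashMap<>();
      for (int i = 0; i < operands.size(); i++) {
        bindings.put(udf.parameterNames.get(i), operands.get(i));
      }
      return expand(udf.body, bindings); // substitute the body for the call
    }
    return expr; // literals are returned unchanged
  }
}

class UdfInlinerDemo {
  // Defines plus_one(x) as +(x, 1) and inlines plus_one(plus_one(5)).
  static String run() {
    Map<String, UdfDefinition> udfs = new HashMap<>();
    Expr body = new Call("+", Arrays.<Expr>asList(new ParamRef("x"), new Literal("1")));
    udfs.put("plus_one", new UdfDefinition(Arrays.asList("x"), body));
    Expr inner = new Call("plus_one", Arrays.<Expr>asList(new Literal("5")));
    Expr query = new Call("plus_one", Arrays.<Expr>asList(inner));
    return new UdfInliner(udfs).inline(query).render(); // "+(+(5, 1), 1)"
  }
}
```

In this sketch the arguments of a call are expanded before they are bound to the parameter names, so a nested invocation such as `plus_one(plus_one(5))` is fully inlined in one pass.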


Contributor

@amaliujia amaliujia left a comment

Thanks Kyle! I did a first pass.

I see you are passing a UDF map through the functions in ExpressionConverter, which seems like a big API change.

Can you explain a bit: what approach do you have in mind for implementing Java UDFs, and will this API change help there? I am worried that we are making changes only for pure SQL UDFs when there could be a shared approach for both SQL and Java UDFs.

            "Failed to define function %s", String.join(".", createFunctionStmt.getNamePath())),
        e);
    }
    return resolvedStatement;
Contributor

Lines 159 and 161 are duplicates; they can be replaced by a single statement at the end of this function.

Contributor Author

done

    }

    @Test
    @Ignore("Qualified paths can't be resolved due to a bug in ZetaSQL.")
Contributor

Is there a link to this bug? If not, could you file a JIRA describing what fails?

Contributor Author

I filed a public bug for this: google/zetasql#42

    statement = analyzer.analyzeNextStatement(parseResumeLocation, options, catalog);
    if (statement.nodeKind() == RESOLVED_CREATE_FUNCTION_STMT) {
      ResolvedCreateFunctionStmt createFunctionStmt = (ResolvedCreateFunctionStmt) statement;
      // ResolvedCreateFunctionStmt does not include the full function name, so build it here.
Contributor

Can you clarify what "full function name" means?

Contributor Author

ResolvedCreateFunctionStmt contains the path as a list of strings, while we need the whole path as a single string.
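
As a small illustration of that conversion, the sketch below joins a name path into one dotted string with `String.join`, the same call that appears in the diff. The `FunctionNames` class and its `fullName` method are hypothetical names used only for this example, and the path segments in the usage note are made up.

```java
import java.util.List;

class FunctionNames {
  // Joins the path segments with dots, mirroring
  // String.join(".", createFunctionStmt.getNamePath()) from the diff.
  static String fullName(List<String> namePath) {
    return String.join(".", namePath);
  }
}
```

For example, `FunctionNames.fullName(Arrays.asList("a", "b", "foo"))` yields `a.b.foo`, which can then serve as the key in a map of UDF definitions.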

              String.join(".", createFunctionStmt.getNamePath()));
      udfBuilder.put(functionFullName, createFunctionStmt);
    } else if (statement.nodeKind() == RESOLVED_QUERY_STMT) {
      if (!SqlAnalyzer.isEndOfInput(parseResumeLocation)) {
Contributor

@amaliujia amaliujia Jul 1, 2020

I can tell that line 185 and this line together verify that:

  1. there is only one SELECT in a statement list, and
  2. that SELECT statement is at the end of the list.

From a readability perspective, though, neither check explicitly tests whether the list contains more than one SELECT. I am afraid that readers without context will miss the single-SELECT constraint, since it is only implied.

My suggestion is to validate only "cannot contain more than one SELECT statement" here and leave "Statement list must end in a SELECT statement" to line 185.

Contributor Author

This is really checking "No additional statements are allowed after a SELECT statement."

Contributor

+1
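
The two checks discussed in this thread can be sketched independently of ZetaSQL. The code below is an illustration under assumptions, not the code in the PR: `StatementKind` and `StatementListValidator` are hypothetical, each statement is reduced to its kind, and the error messages reuse the wording quoted in this thread.

```java
import java.util.List;

// Hypothetical statement kinds; the real code inspects ZetaSQL resolved node kinds.
enum StatementKind { CREATE_FUNCTION, QUERY, OTHER }

class StatementListValidator {
  // Accepts zero or more CREATE FUNCTION statements followed by exactly one SELECT.
  static void validate(List<StatementKind> statements) {
    boolean sawQuery = false;
    for (StatementKind kind : statements) {
      if (sawQuery) {
        // Check 1: nothing may follow the SELECT, which also rules out a second SELECT.
        throw new UnsupportedOperationException(
            "No additional statements are allowed after a SELECT statement.");
      }
      if (kind == StatementKind.QUERY) {
        sawQuery = true;
      } else if (kind != StatementKind.CREATE_FUNCTION) {
        throw new UnsupportedOperationException("Unsupported statement kind: " + kind);
      }
    }
    if (!sawQuery) {
      // Check 2: the list must end in a SELECT.
      throw new UnsupportedOperationException(
          "Statement list must end in a SELECT statement.");
    }
  }
}
```

Keeping the two conditions as separate checks with separate messages makes the single-SELECT constraint explicit to a reader who lacks the surrounding context.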

        operands.add(
            convertRexNodeFromResolvedExpr(expr, columnList, fieldList, outerFunctionArguments));
      }
    } else if (funGroup.equals(USER_DEFINED_FUNCTIONS)) {
Contributor

Do you think this code path will help for Java UDFs?

Contributor Author

I am pretty sure we will need it, unless we can somehow avoid ExpressionConverter for Java UDFs?

Contributor

I see. A Java UDF without a nested call might not go through the process of converting each of its arguments. Nested calls, especially ones involving builtin functions, could go through it.

We can keep the current implementation in this PR for now.

Contributor

@amaliujia amaliujia left a comment

This PR LGTM overall.

As for whether Java UDFs can use the implementation in this PR, we will probably know once we start implementing them.

    ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config);
    thrown.expect(UnsupportedOperationException.class);
    thrown.expectMessage(
        "Statement list must end in a SELECT statement, and cannot contain more than one SELECT statement.");
Contributor

You will need to update this error message.

Contributor Author

Oops, fixed

@ibzib ibzib merged commit a5c32f3 into apache:master Jul 1, 2020
yirutang pushed a commit to yirutang/beam that referenced this pull request Jul 23, 2020
…he#12153)

* [BEAM-9953] [ZetaSQL] Implement CREATE FUNCTION and scalar UDF.