New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-9953] [ZetaSQL] Implement CREATE FUNCTION and scalar UDF. #12153
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Kyle! Did the first pass.
I am seeing you are passing a UDF map through functions in ExpressionConverter, which seems a big API change.
So can you explain a bit: in your mind what will be the approach to implement Java UDF, and will this API change help there? Basically I am worried if we are making some changes only for pure SQL UDF meanwhile there could be a shared approach for both SQL and Java UDF.
...nsions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SqlAnalyzer.java
Show resolved
Hide resolved
...nsions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SqlAnalyzer.java
Show resolved
Hide resolved
"Failed to define function %s", String.join(".", createFunctionStmt.getNamePath())), | ||
e); | ||
} | ||
return resolvedStatement; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line 159 and line 161 are duplicates and they can be put at the end of this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
} | ||
|
||
@Test | ||
@Ignore("Qualified paths can't be resolved due to a bug in ZetaSQL.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any link to what this bug is (or log a JIRA to describe what has failed?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I filed a public bug for this: google/zetasql#42
statement = analyzer.analyzeNextStatement(parseResumeLocation, options, catalog); | ||
if (statement.nodeKind() == RESOLVED_CREATE_FUNCTION_STMT) { | ||
ResolvedCreateFunctionStmt createFunctionStmt = (ResolvedCreateFunctionStmt) statement; | ||
// ResolvedCreateFunctionStmt does not include the full function name, so build it here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify what is "full function name"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ResolvedCreateFunctionStmt contains the path as a list of strings, while we need the whole path as a single string.
String.join(".", createFunctionStmt.getNamePath())); | ||
udfBuilder.put(functionFullName, createFunctionStmt); | ||
} else if (statement.nodeKind() == RESOLVED_QUERY_STMT) { | ||
if (!SqlAnalyzer.isEndOfInput(parseResumeLocation)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can tell line 185 and this line combined together to verify:
- only one SELECT in a statement list.
- that SELECT statement should be in the end of list.
But from readability perspective, neither one explicitly tests there are more than one SELECT in a list. I am afraid that for people who don't have context to read code here, they could not get the one single SELECT constraint (although it is implied implicitly).
My suggestion is you only validate cannot contain more than one SELECT statement
here and leave Statement list must end in a SELECT statement
to line 185.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really checking "No additional statements are allowed after a SELECT statement."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
operands.add( | ||
convertRexNodeFromResolvedExpr(expr, columnList, fieldList, outerFunctionArguments)); | ||
} | ||
} else if (funGroup.equals(USER_DEFINED_FUNCTIONS)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think for Java UDF, will this code path help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am pretty sure we will need it, unless we can somehow avoid ExpressionConverter for Java UDF?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. For Java UDF (without a nested call), it might not go through the process to convert every of its arguments. But for nested call cases, especially with builtin functions, it could go through this process.
We can keep current implementation in this PR now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR LGTM overall.
Regarding if Java UDF can use the implementation in this PR, probably we will know when we start to implement Java UDF.
ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); | ||
thrown.expect(UnsupportedOperationException.class); | ||
thrown.expectMessage( | ||
"Statement list must end in a SELECT statement, and cannot contain more than one SELECT statement."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will need to update this error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, fixed
…he#12153) * [BEAM-9953] [ZetaSQL] Implement CREATE FUNCTION and scalar UDF.
Scalar UDFs are implemented by simply replacing each function invocation with its corresponding RexNode.
R: @amaliujia
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.