New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-11747] Make BeamCalcRel safe for ZetaSQL #13898
Conversation
c88b742
to
45d6dcd
Compare
R: @ibzib |
To conclude a bit: This PR rejects unsupported types, other stuff, etc. in BeamZetaSQL UDF, while #13934 rejects mixed usage of built-in operator and UDF. So these two PRs do not conflict each other. |
I am still not sure why we should reject Int64 type for UDF. Assuming a UDF is defined as In this case, the value is from Row thus is compatible with ZetaSQL, the return value is still Java object which is also be compatible with ZetaSQL, so there isn't a corrupted case. |
That isn't entirely true. Literals are stored as a BigDecimal (which is what was disabled): Line 211 in 9bbd5bd
The generated code prints the literal to a string and looks like this: In earlier testing I was seeing overflow errors from gRPC, but that appears to actually be caused by DOUBLE literals (which are also stored as BigDecimal). I've added DECIMAL to the list of allowed types for literals and reverted the test changes. |
This is now rebased past #13912. |
...ql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLQueryPlanner.java
Outdated
Show resolved
Hide resolved
...ql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLQueryPlanner.java
Outdated
Show resolved
Hide resolved
...ql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLQueryPlanner.java
Show resolved
Hide resolved
...sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlJavaUdfTest.java
Outdated
Show resolved
Hide resolved
Based on this generated code, for a Java UDF, it's input won't lose any precision? Is there an example that the code generation generates code that casts input which causes precision lose? If not there is probably no data corruption for Java UDF. |
1cdebfe
to
4c224cd
Compare
@amaliujia One example is
This fixes at least 9 additional data corruption bugs (mostly around timestamps) and doesn't break any tests in Beam. I can provide you the link to the internal run of this, but if you want to debate further I would suggest we roll back first. |
My suggestion is:
|
This PR is currently addressing the second point. There is currently data corruption introduced by UDFs. If you think it will take a significant time to address this I would suggest we roll back #13841 until you figure this out. |
Can we merge just the first commit from this PR, but remove |
That works for me. I moved the first commit into it's own PR: #14009. Will one of you follow up and disable BeamJavaUdfCalcRule? |
SG, thanks. |
The Workflow run is cancelling this PR. It is an earlier duplicate of 2173354 run. |
The Workflow run is cancelling this PR. It is an earlier duplicate of 2173354 run. |
Calcite does more for built-in operators than UDF. Using CAST as an example: https://github.com/apache/calcite/blob/d815dc1209c3cc0728941e9fc81fce7d9e44c79b/core/src/main/java/org/apache/calcite/adapter/enumerable/RexToLixTranslator.java#L248 . For UDF, fortunately, Calcite does less. Basically Calcite will either generate the string representation of a value (i.e. Literal) or make a Row field access (i.e. column), and then make a reflection call on the UDF implementation. So for UDF, I think we only need to verify 1) whether Calcite does anything that make the value lose precision 2) whether Calcite generates a correct string representation for value or make correct field access to Row. The verification work will be much less compared to checking the built-in operators. I agree this PR could be an upper bound for the limitation of Java UDF, what I want to see is whether we can do better. |
Ok, with another thought, I think why not we merge this PR for now? We can start from the most strict constraints, then gradually loose the constraints. |
The constraints added here do not limit BeamCalc to only UDFs, we would need significantly more constraints for that. They do ensure the only BeamCalc specific compliance failure is |
This makes the filter for BeamCalcRel a passing set of operators.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.