[BEAM-8307] NPE in Calcite dialect when input PCollection has logical… #11581
retest this please
I'm wondering if this is a scalable solution. In general, SQL is supposed to be able to handle unknown logical types and simply treat them as the base type. If you're seeing an NPE, maybe we need to fix that. Where do you see the NPE?
@reuvenlax The NPE is thrown because Calcite's RelDataType cannot be found for the JdbcIO logical type.
I agree that this is not a scalable solution. Providing a Calcite RelDataType mapping for every logical type defined by every IO (which is the solution presented in this PR) is not scalable. Another approach to solving the problem is: I was also wondering whether we could use the IDENTIFIER of the logical type to determine the corresponding Calcite RelDataType. But since the IDENTIFIER is a String and not an enum, it cannot be used. For example, all the logical types defined by JdbcIO use the java.sql.JDBCType name as the IDENTIFIER. Please correct me if my understanding is incorrect.
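The two positions in this exchange (a per-type mapping that can miss entries, versus treating unknown logical types as their base type) can be sketched in plain Java. This is only an illustration under stated assumptions: the class, the map contents, and the method names are hypothetical and are not Beam or Calcite APIs; SQL types are represented as plain strings to keep the sketch self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch; none of these names are Beam or Calcite APIs. */
final class LogicalTypeLookupSketch {
  // Identifier -> SQL type name. Keys are plain strings (as with identifiers
  // based on JDBCType names), so nothing forces the table to be complete.
  private static final Map<String, String> SQL_TYPE_BY_IDENTIFIER = new HashMap<>();

  static {
    SQL_TYPE_BY_IDENTIFIER.put("VARCHAR", "VARCHAR");
    SQL_TYPE_BY_IDENTIFIER.put("NUMERIC", "DECIMAL");
  }

  /** Returns null for an unmapped identifier; dereferencing that result later is how an NPE can arise. */
  static String lookupUnsafe(String identifier) {
    return SQL_TYPE_BY_IDENTIFIER.get(identifier);
  }

  /** Falls back to the SQL name of the base type when the identifier is not mapped. */
  static String lookupWithFallback(String identifier, String baseSqlType) {
    return Optional.ofNullable(SQL_TYPE_BY_IDENTIFIER.get(identifier)).orElse(baseSqlType);
  }
}
```

With the fallback variant, an unmapped identifier such as "LONGVARCHAR" resolves to whatever base type the caller supplies instead of producing a null.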
Where do you see the NPE?
JdbcIO.Read -> SqlTransform.query(SELECT COUNT(*) FROM PCOLLECTION /* any query */) throws an NPE if the input PCollection to SqlTransform has JdbcIO-specific logical types (defined in org.apache.beam.sdk.io.jdbc.LogicalTypes) in its Schema. Please find the source table schema and the attached exception stack trace in the JIRA ticket: BEAM-8307.
I will make the necessary changes as suggested by @TheNeuralBit in https://lists.apache.org/thread.html/r281e2913379c9733f6ac5baa08f361cc4ebe880a9880b2d54d6095b0%40%3Cdev.beam.apache.org%3E
retest this please
I found a bug while implementing this feature and raised PR #11609 to fix it, since I thought the fix could be cherry-picked into the 2.21.0 release.
retest this please
The failing test
Run SQL PostCommit
Thanks @rahul8383! this is looking really good. I just have an ask around testing the logical types in SchemaCoder/RowCoder.
cc: @apilloud @robinyqiu for SQL type changes
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/logicaltypes/LogicalDecimal.java
    } else {
      return Arrays.copyOf(base, byteArraySize);
    }
  }
FYI @reuvenlax this PR is removing the ability to convert smaller byte arrays. As noted in #11609 (comment) it seems this logic is inaccessible anyway.
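For context, the conversion of smaller byte arrays mentioned here can be sketched as follows. Only the `Arrays.copyOf(base, byteArraySize)` branch appears in the hunk above; the surrounding checks, the class name, and the method name are assumptions made for illustration and are not the actual Beam code.

```java
import java.util.Arrays;

/** Hypothetical helper; not the actual Beam logical type implementation. */
final class FixedBytesSketch {
  /**
   * Returns a byte array of exactly byteArraySize bytes.
   * Shorter inputs are right-padded with zero bytes; longer inputs are rejected.
   */
  static byte[] toFixedLength(byte[] base, int byteArraySize) {
    if (base.length > byteArraySize) {
      throw new IllegalArgumentException(
          "Expected at most " + byteArraySize + " bytes, got " + base.length);
    }
    if (base.length == byteArraySize) {
      return base; // already the right size, returned unchanged
    } else {
      // Arrays.copyOf pads with zeros when the new length exceeds the source length.
      return Arrays.copyOf(base, byteArraySize);
    }
  }
}
```

Dropping the padding branch, as this PR does, would mean that only inputs of exactly `byteArraySize` bytes are accepted.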
Run SQL PostCommit
@apilloud @amaliujia @robinyqiu To close this bug, some functional tests need to be added that use the standard logical types defined in
Beam SQL should be rejecting unknown logical types. We expect users to put their data into supported logical types before passing it in. This conversion is up to the user to implement and will likely be lossy. It would be significant work to make it do something different.
The types that @rahul8383 is making into standard logical types here are well-known SQL types. I don't think Beam SQL should consider them unknown.
… type in schema, from JdbcIO Transform
Sorry for letting this languish, @rahul8383. I got busy with the 2.22.0 release and neglected it :(
I have a few more comments now.
/** A base class for LogicalTypes that use the same input type as the underlying base type. */
@Experimental(Experimental.Kind.SCHEMAS)
public abstract class IdenticalBaseTAndInputTLogicalType<T> implements Schema.LogicalType<T, T> {
Can you make this package-private?
.addField("variableBytes", Schema.FieldType.logicalType(VariableLengthBytes.of(100)))
.addField("fixedString", Schema.FieldType.logicalType(FixedLengthString.of(10)))
.addField("variableString", Schema.FieldType.logicalType(VariableLengthString.of(100)))
.addField("customDecimal", Schema.FieldType.logicalType(LogicalDecimal.of(10, 5)))
Could you add tests that use some or all of these new types in a SQL query? Right now I think this is the only test that exercises the new code in CalciteUtils, and it's not actually verifying the types will work in a SQL statement.
    checkArgument(base == null || base.length() == length);
    return base;
  }
}
Could you add constants for these new logical types in SqlTypes?
cc: @robinyqiu
/** A base class for LogicalTypes that use the same input type as the underlying base type. */
@Experimental(Experimental.Kind.SCHEMAS)
public abstract class IdenticalBaseTAndInputTLogicalType<T> implements Schema.LogicalType<T, T> {
Could you make this package-private? I think it's only used in schemas.logicaltypes right now.
/** A LogicalType representing a Decimal type with custom precision and scale. */
@Experimental(Experimental.Kind.SCHEMAS)
public class LogicalDecimal extends IdenticalBaseTAndInputTLogicalType<BigDecimal> {
Can we just call this Decimal?
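As a rough illustration of what "custom precision and scale" could mean for a value such as one stored under `LogicalDecimal.of(10, 5)`, here is a small plain-Java check. It assumes the usual SQL DECIMAL(precision, scale) interpretation, which this PR does not confirm, and the class and method names are hypothetical.

```java
import java.math.BigDecimal;

/** Hypothetical sketch of a DECIMAL(precision, scale) fit check; not Beam code. */
final class DecimalFitSketch {
  /**
   * Returns true if the value has at most `scale` fractional digits and at most
   * `precision - scale` integer digits. Trailing zeros are not stripped, so
   * "1.500000" is treated as having six fractional digits.
   */
  static boolean fits(BigDecimal value, int precision, int scale) {
    int fractionalDigits = value.scale();
    // For values below 1 this can be zero or negative, which always fits.
    int integerDigits = value.precision() - value.scale();
    return fractionalDigits <= scale && integerDigits <= precision - scale;
  }
}
```

Under these assumptions, `12345.67890` fits DECIMAL(10, 5), while `123456.7` does not because it needs six integer digits.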
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.
This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
When SqlTransform.query() PTransform is used with JdbcIO, where JdbcIO has logical types, an NPE is thrown in BeamSql when converting Beam Schema to Calcite RelDataType.
This PR adds standard logical types with URNs in the schemas.logicaltypes package and adds Calcite's RelDataType mapping for these logical types. JdbcIO is modified to use these standard logical types to convert JDBC Schema to Beam Schema.
Standard logical types for DATE, TIME, TIME_WITH_TIMEZONE, TIMESTAMP_WITH_TIMEZONE are yet to be handled.