-
Notifications
You must be signed in to change notification settings - Fork 4.5k
[BEAM-7024] Calcite BINARY to Beam Schema BYTES missing in CalciteUtils #8242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-7024] Calcite BINARY to Beam Schema BYTES missing in CalciteUtils #8242
Conversation
| .put(SqlTypeName.DECIMAL, DECIMAL) | ||
| .put(SqlTypeName.BOOLEAN, BOOLEAN) | ||
| .put(SqlTypeName.VARBINARY, VARBINARY) | ||
| .put(SqlTypeName.BINARY, VARBINARY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both BINARY and VARBINARY should be mapped to VARBINARY, which is FieldType.BYTES.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BINARY might generate from select b'test_string'. Calcite treats it as a fixed length byte array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. It is fine and expected that the mapping might not be invertible.
We want users to be able to feed schema PCollection in to SQL, but also compile SQL down to schema. But they don't have to be an exact match. In fact, I think Beam schema should probably reduce to only very simple types and the rest should be logical types that SQL defines. TODO later.
You will need to replace VARBINARY with BYTES on the right hand side, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree on the logical types and leave it for future work for now.
It's a legacy code that defines public static final FieldType VARBINARY = FieldType.BYTES;. I am reusing this style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see. I don't like it but I accept it :-)
|
retest this please |
|
Run Java PreCommit |
1 similar comment
|
Run Java PreCommit |
|
Ping |
| .put(SqlTypeName.DECIMAL, DECIMAL) | ||
| .put(SqlTypeName.BOOLEAN, BOOLEAN) | ||
| .put(SqlTypeName.VARBINARY, VARBINARY) | ||
| .put(SqlTypeName.BINARY, VARBINARY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see. I don't like it but I accept it :-)
Currently the mapping between FieldType to Calcite SqlTypeName is 1 to 1.
However, there is a special case where Calcite has both BINARY and VARBINARY, which should both be saved to bytes in Beam schema.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username).[BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replaceBEAM-XXXwith the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.