@amaliujia
Contributor

Currently, the mapping between Beam FieldType and Calcite SqlTypeName is one-to-one.

However, there is a special case: Calcite has both BINARY and VARBINARY, and both should be stored as bytes in a Beam schema.



.put(SqlTypeName.DECIMAL, DECIMAL)
.put(SqlTypeName.BOOLEAN, BOOLEAN)
.put(SqlTypeName.VARBINARY, VARBINARY)
.put(SqlTypeName.BINARY, VARBINARY)
Contributor Author

Both BINARY and VARBINARY should be mapped to VARBINARY, which is FieldType.BYTES.

Contributor Author

@amaliujia amaliujia Apr 5, 2019

A BINARY value might be generated by a query such as select b'test_string'; Calcite treats it as a fixed-length byte array.

Member

OK. It is fine and expected that the mapping might not be invertible.

We want users to be able to feed a schema PCollection into SQL, and also to compile SQL down to a schema. But the two don't have to be an exact match. In fact, I think Beam schemas should probably reduce to only very simple types, with the rest being logical types that SQL defines. TODO later.

You will need to replace VARBINARY with BYTES on the right hand side, no?

Contributor Author

Agreed on the logical types; let's leave that for future work.

There is legacy code that defines public static final FieldType VARBINARY = FieldType.BYTES;. I am reusing that style here.
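
To make the aliasing and the non-invertible mapping discussed above concrete, here is a minimal, self-contained sketch. It does not use the real Beam or Calcite classes: SqlType and BeamType are hypothetical stand-ins for SqlTypeName and FieldType, and the choice of VARBINARY as the reverse mapping for BYTES is an assumption made only for illustration.

```java
import java.util.EnumMap;
import java.util.Map;

public class BinaryMappingSketch {
  // Hypothetical stand-ins for Calcite's SqlTypeName and Beam's FieldType.
  public enum SqlType { BINARY, VARBINARY, BOOLEAN }
  public enum BeamType { BYTES, BOOLEAN }

  // Alias in the style of the legacy constant quoted above:
  // VARBINARY is just another name for BYTES.
  public static final BeamType VARBINARY = BeamType.BYTES;

  private static final Map<SqlType, BeamType> SQL_TO_BEAM = new EnumMap<>(SqlType.class);

  static {
    // Two SQL types map to the same Beam type, so this map is many-to-one.
    SQL_TO_BEAM.put(SqlType.VARBINARY, VARBINARY);
    SQL_TO_BEAM.put(SqlType.BINARY, VARBINARY);
    SQL_TO_BEAM.put(SqlType.BOOLEAN, BeamType.BOOLEAN);
  }

  public static BeamType toBeamType(SqlType type) {
    return SQL_TO_BEAM.get(type);
  }

  // The reverse direction has to pick one SQL type for BYTES.
  // Picking VARBINARY is an assumption of this sketch.
  public static SqlType toSqlType(BeamType type) {
    switch (type) {
      case BYTES:
        return SqlType.VARBINARY;
      case BOOLEAN:
        return SqlType.BOOLEAN;
      default:
        throw new IllegalArgumentException("Unsupported type: " + type);
    }
  }

  public static void main(String[] args) {
    // BINARY and VARBINARY land on the same Beam type.
    System.out.println(toBeamType(SqlType.BINARY) == toBeamType(SqlType.VARBINARY));
    // Round-tripping BINARY does not return BINARY: the mapping is not invertible.
    System.out.println(toSqlType(toBeamType(SqlType.BINARY)));
  }
}
```

In this sketch a round trip starting from BINARY comes back as VARBINARY, which is the kind of mismatch the comment above describes as fine and expected.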

Member

Ah I see. I don't like it but I accept it :-)

@amaliujia
Contributor Author

retest this please

@amaliujia
Contributor Author

Run Java PreCommit

@amaliujia
Contributor Author

Ping

@kennknowles kennknowles merged commit 3c74212 into apache:master Apr 10, 2019
@amaliujia amaliujia deleted the rw-calcite_binary_to_schema_bytes branch April 10, 2019 17:18