Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PIP-71][SQL]Pulsar SQL migrate SchemaHandle to presto decoder #8422

Merged
merged 26 commits into from Feb 1, 2021

Conversation

hnail
Copy link
Contributor

@hnail hnail commented Nov 2, 2020

Fixes #4747
Fixes #7652

Motivation

PIP-71: https://github.com/apache/pulsar/wiki/PIP-71:-Pulsar-SQL-migrate-SchemaHandle-to-presto-decoder

Pip-Doc : [PIP-71][SQL]Migrate SchemaHandle to Presto-decoder

In current version , pulsar-presto deserialize fields rely on SchemaHandler , but this causes the following restrictions :

  • Metadata: current nested field is dissociate with presto ParameterizedType , It treated nested field as a separated field , so presto compiler can’t understand the type hierarchy . nested field should be Row type in presto (e.g. Hive struct type support) . In the same way,array \ map type also shoud associate with presto ParameterizedTypes.
  • Decoder : SchemaHandler is hard to work with RecordCursor.getObject() to support ROW,MAP,ARRAY .etc

The motivations of this pull request :

  • PulsarMetadata take advantage of ParameterizedType to describe row/array/map Type instead of resolve nested columns in pulsar-presto connecter.
  • Customize RowDecoder | RowDecoderFactory | ColumnDecoder to work with pulsar interface, and with some our own extensions compare to presto original version , we can support more type for backward compatible (e.g.
    TIMESTAMP\DATE\TIME\Real\ARRAY\MAP\ROW support).
  • Decouple avro or schema type with pulsar-presto main module (RecordSet,ConnectorMetadata .etc ), aim to friendly with other schema type ( ProtobufNative 、thrift etc..).

Modifications

Describe in PIP-71: Pulsar SQL migrate SchemaHandle to presto decoder


Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): (yes )
  • The public API: (no)
  • The schema: ( no)
  • The default values of configurations: (no)
  • The wire protocol: (no)
  • The rest endpoints: (no)
  • The admin cli options: (no)
  • Anything that affects deployment: (no)

Documentation

@hnail hnail changed the title [SQL][WIP]migrate SchemaHandle to Presto-decoder [SQL]migrate SchemaHandle to Presto-decoder Nov 3, 2020
@hnail hnail changed the title [SQL]migrate SchemaHandle to Presto-decoder [PIP][SQL]Migrate SchemaHandle to Presto-decoder Nov 9, 2020
@hnail
Copy link
Contributor Author

hnail commented Nov 10, 2020

/pulsarbot run-failure-checks

@sijie sijie added this to the 2.7.0 milestone Nov 11, 2020
@sijie sijie added the area/sql Pulsar SQL related features label Nov 11, 2020
@hnail hnail changed the title [PIP][SQL]Migrate SchemaHandle to Presto-decoder [PIP-71][SQL]Pulsar SQL migrate SchemaHandle to presto decoder Nov 11, 2020
@codelipenghui
Copy link
Contributor

@gaoran10 Could you please help review this PR?

@codelipenghui
Copy link
Contributor

move to 2.8.0 first.

@codelipenghui codelipenghui modified the milestones: 2.7.0, 2.8.0 Nov 17, 2020
@jiazhai
Copy link
Member

jiazhai commented Nov 22, 2020

@gaoran10 to help review this PR

@sijie
Copy link
Member

sijie commented Nov 30, 2020

@hnail Can you rebase this pull request to the latest master?

@hnail
Copy link
Contributor Author

hnail commented Dec 23, 2020

@hnail Great work. Could you rebase the branch with apache/master first?

@gaoran10 Thanks a lot for your review and help with this PR, i fixed as your suggest. can you help re-review it ?

@Anonymitaet
Copy link
Member

Confirmed w/ @congbobo184, no need to update doc for this PR.

@hnail
Copy link
Contributor Author

hnail commented Jan 19, 2021

/pulsarbot run-failure-checks

3 similar comments
@gaoran10
Copy link
Contributor

/pulsarbot run-failure-checks

@jiazhai
Copy link
Member

jiazhai commented Jan 28, 2021

/pulsarbot run-failure-checks

@jiazhai
Copy link
Member

jiazhai commented Jan 29, 2021

/pulsarbot run-failure-checks

@hnail
Copy link
Contributor Author

hnail commented Jan 29, 2021

/pulsarbot run-failure-checks

@gaoran10
Copy link
Contributor

/pulsarbot run-failure-checks

@codelipenghui
Copy link
Contributor

@hnail @gaoran10 Could you please help take a look the failed test?

Error:  Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.8.1:testCompile (default-testCompile) on project pulsar-presto-connector-original: Compilation failure
Error:  /Users/runner/work/pulsar/pulsar/pulsar-sql/presto-pulsar/src/test/java/org/apache/pulsar/sql/presto/TestPulsarRecordCursor.java:[352,40] method asyncReadEntries in class org.apache.bookkeeper.mledger.impl.ManagedCursorImpl cannot be applied to given types;
Error:    required: int,org.apache.bookkeeper.mledger.AsyncCallbacks.ReadEntriesCallback,java.lang.Object,org.apache.bookkeeper.mledger.impl.PositionImpl
Error:    found: int,java.lang.Object,java.lang.Object
Error:    reason: actual and formal argument lists differ in length

The interesting thing this why other tests are passed. Might be related to the other tests only run the install for profile core-modules ?

@hnail
Copy link
Contributor Author

hnail commented Jan 31, 2021

@hnail @gaoran10 Could you please help take a look the failed test?

Error:  Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.8.1:testCompile (default-testCompile) on project pulsar-presto-connector-original: Compilation failure
Error:  /Users/runner/work/pulsar/pulsar/pulsar-sql/presto-pulsar/src/test/java/org/apache/pulsar/sql/presto/TestPulsarRecordCursor.java:[352,40] method asyncReadEntries in class org.apache.bookkeeper.mledger.impl.ManagedCursorImpl cannot be applied to given types;
Error:    required: int,org.apache.bookkeeper.mledger.AsyncCallbacks.ReadEntriesCallback,java.lang.Object,org.apache.bookkeeper.mledger.impl.PositionImpl
Error:    found: int,java.lang.Object,java.lang.Object
Error:    reason: actual and formal argument lists differ in length

The interesting thing this why other tests are passed. Might be related to the other tests only run the install for profile core-modules ?

@codelipenghui fixed. cc @gaoran10 @jiazhai

@jiazhai jiazhai merged commit a1107ad into apache:master Feb 1, 2021
@jiazhai
Copy link
Member

jiazhai commented Feb 1, 2021

@hnail Thanks a lot for this great feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/sql Pulsar SQL related features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Struct property is not selected in Presto [pulsar-sql] Support arrays and maps
6 participants