[FLINK-13192][hive] Add tests for different Hive table formats #9264
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
Automated Checks
Last check on commit e1dd937 (Tue Aug 06 16:01:34 UTC 2019)
Warnings:
Mention the bot in a comment to re-run the automated checks.
Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
Bot commands
The @flinkbot bot supports the following commands:
Thanks for the contribution. The PR looks good. I just have a couple of minor comments.
@@ -256,10 +258,12 @@ public void configure(Configuration parameters) {
 	public void open(int taskNumber, int numTasks) throws IOException {
 		try {
 			StorageDescriptor sd = hiveTablePartition.getStorageDescriptor();
-			serializer = (AbstractSerDe) Class.forName(sd.getSerdeInfo().getSerializationLib()).newInstance();
+			serializer = (Serializer) Class.forName(sd.getSerdeInfo().getSerializationLib()).newInstance();
+			Preconditions.checkArgument(serializer instanceof Deserializer,
Not sure if I understand this. The interfaces Serializer and Deserializer are independent. While a SerDe class may implement both, it seems weird to name a variable "serializer" and later cast it to the Deserializer type.
The problem is that SerDeUtils.initializeSerDe requires a Deserializer, so we have to do the cast if we want to reuse this util method. Since most, if not all, SerDe libs implement both Serializer and Deserializer, I suppose this cast is OK?
Casting is fine, but can we name the variable differently?
Any suggestions about the name? Like serDe?
yeah! :)
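The cast discussed in this thread can be illustrated with a minimal, self-contained sketch. It does not use the real Hive classes: the nested `Serializer` and `Deserializer` interfaces, `CsvLikeSerDe`, `initializeSerDe`, and the `field.delim` property below are hypothetical stand-ins, assumed only to mirror the shape of the situation (two independent interfaces, one class implementing both, and a utility method that accepts only a `Deserializer`).

```java
import java.util.Arrays;
import java.util.Properties;
import java.util.stream.Collectors;

public class SerDeCastSketch {
    // Hypothetical stand-ins for two independent interfaces (not the Hive API).
    public interface Serializer { String serialize(Object[] row); }
    public interface Deserializer { void initialize(Properties props); }

    // A SerDe class that implements both interfaces, as most SerDe libs are said to do.
    public static class CsvLikeSerDe implements Serializer, Deserializer {
        private String delimiter = ",";

        public CsvLikeSerDe() {}

        @Override
        public void initialize(Properties props) {
            delimiter = props.getProperty("field.delim", ",");
        }

        @Override
        public String serialize(Object[] row) {
            return Arrays.stream(row).map(String::valueOf).collect(Collectors.joining(delimiter));
        }
    }

    // Stand-in for a utility method that only accepts a Deserializer.
    public static void initializeSerDe(Deserializer deserializer, Properties props) {
        deserializer.initialize(props);
    }

    // Instantiate the class by name, verify it implements both interfaces,
    // initialize it through the Deserializer view, and keep it as a Serializer.
    public static Serializer open(String className, Properties props) {
        Object recordSerDe;
        try {
            recordSerDe = Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
        if (!(recordSerDe instanceof Serializer) || !(recordSerDe instanceof Deserializer)) {
            throw new IllegalArgumentException(
                className + " must implement both Serializer and Deserializer");
        }
        initializeSerDe((Deserializer) recordSerDe, props);
        return (Serializer) recordSerDe;
    }
}
```

Naming the variable `recordSerDe` rather than `serializer` makes it less surprising that the same object is viewed through both interfaces.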
Updated to add a test for a CSV table. I also found that the Hive table schema can be obtained from either the metastore or the SerDe. For CSV tables, we should get the schema from the SerDe, but HiveCatalog currently doesn't support that. Hence the changes to HiveCatalog.
eedd6e9 to bac68b3
ca8af79 to a20e191
Latest Travis build succeeded. @xuefuz do you have any further comments?
@@ -124,7 +124,7 @@
 	private transient int numNonPartitionColumns;
 
 	// SerDe in Hive-1.2.1 and Hive-2.3.4 can be of different classes, make sure to use a common base class
-	private transient Serializer serializer;
+	private transient Serializer recordSerDe;
Maybe the type here should be just "Object".
It has to be a serializer because we need it to serialize records. Besides, using Object means we have to use reflection to call the serialize method. And if we do this for each record, it might hurt performance.
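As a rough sketch of the trade-off described here, the following compares a field typed as an interface with a field typed as `Object`. The `RecordSerializer` interface and `UpperCaseSerializer` class are hypothetical and are not Hive or Flink classes; the sketch only shows that the `Object` variant needs a reflective lookup and invocation for every record, while the typed variant is an ordinary method call. It makes no measurement of the actual cost.

```java
import java.lang.reflect.Method;
import java.util.Locale;

public class TypedVsReflectiveCall {
    // Hypothetical serializer interface, used only for illustration.
    public interface RecordSerializer { String serialize(Object record); }

    public static class UpperCaseSerializer implements RecordSerializer {
        @Override
        public String serialize(Object record) {
            return String.valueOf(record).toUpperCase(Locale.ROOT);
        }
    }

    // Field typed as the interface: a plain, compile-time checked call per record.
    public static String viaInterface(RecordSerializer serializer, Object record) {
        return serializer.serialize(record);
    }

    // Field typed as Object: the method must be looked up and invoked reflectively.
    public static String viaReflection(Object serializer, Object record) {
        try {
            Method serialize = serializer.getClass().getMethod("serialize", Object.class);
            return (String) serialize.invoke(serializer, record);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("serialize could not be invoked", e);
        }
    }
}
```

Caching the `Method` object would avoid the repeated lookup, but each call would still go through `Method.invoke` and lose compile-time type checking.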
PR looks good. I just have a couple of minor comments.
a20e191 to e1dd937
@xuefuz thanks for the review. PR updated to address your comments.
LGTM
@KurtYoung could you help review and merge this PR? Thanks.
Sure, I will take a look soon.
LGTM, merging this.
What is the purpose of the change
To add tests for different table storage formats and fix an issue with HiveTableOutputFormat.
Brief change log
Verifying this change
Added new test cases.
Does this pull request potentially affect one of the following parts:
@Public(Evolving): no
Documentation