
[FLINK-13192][hive] Add tests for different Hive table formats #9264

Closed
lirui-apache wants to merge 5 commits

Conversation

lirui-apache
Contributor

What is the purpose of the change

Add tests for different table storage formats and fix an issue with HiveTableOutputFormat.

Brief change log

  • Make sure to use a common base class of the SerDe to support both Hive 2.3.4 and 1.2.1.
  • Include some Hadoop dependencies in tests, since the Hive runner needs to run MR jobs.
  • Add tests for different table formats.

Verifying this change

Added new test cases.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): yes
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? NA

@lirui-apache
Contributor Author

cc @KurtYoung @xuefuz @zjuwangg

@flinkbot
Collaborator

flinkbot commented Jul 29, 2019

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit e1dd937 (Tue Aug 06 16:01:34 UTC 2019)

Warnings:

  • 1 pom.xml file was touched: Check for build and licensing issues.
  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot
Collaborator

flinkbot commented Jul 29, 2019

CI report:

@xuefuz xuefuz left a comment


Thanks for the contribution. The PR looks good. I just have a couple of minor comments.

@@ -256,10 +258,12 @@ public void configure(Configuration parameters) {
 	public void open(int taskNumber, int numTasks) throws IOException {
 		try {
 			StorageDescriptor sd = hiveTablePartition.getStorageDescriptor();
-			serializer = (AbstractSerDe) Class.forName(sd.getSerdeInfo().getSerializationLib()).newInstance();
+			serializer = (Serializer) Class.forName(sd.getSerdeInfo().getSerializationLib()).newInstance();
 			Preconditions.checkArgument(serializer instanceof Deserializer,
Contributor

Not sure if I understand this. The Serializer and Deserializer interfaces are independent. While a SerDe class may implement both, it seems weird to name a variable "serializer" and later cast it to the Deserializer type.

Contributor Author

The problem is that SerDeUtils.initializeSerDe requires a Deserializer, so we have to do the cast if we want to reuse this util method. Since most, if not all, SerDe libs implement both Serializer and Deserializer, I suppose this cast is OK?
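As a self-contained sketch of why the cast is usually safe — using hypothetical stand-in interfaces, not Hive's actual org.apache.hadoop.hive.serde2 types — a SerDe class typically implements both roles, so an instance created by class name can be used on either path:

```java
// Stand-ins for Hive's Serializer/Deserializer interfaces (hypothetical).
interface Serializer { String serialize(Object record); }
interface Deserializer { Object deserialize(String data); }

// Hypothetical SerDe that, like most Hive SerDe libs, implements both.
class EchoSerDe implements Serializer, Deserializer {
    @Override public String serialize(Object record) { return String.valueOf(record); }
    @Override public Object deserialize(String data) { return data; }
}

public class SerDeCastDemo {
    public static void main(String[] args) throws Exception {
        // Instantiate by class name, mirroring how HiveTableOutputFormat
        // loads sd.getSerdeInfo().getSerializationLib().
        Object serDe = Class.forName("EchoSerDe").getDeclaredConstructor().newInstance();

        // Cast to Serializer for the write path...
        Serializer serializer = (Serializer) serDe;

        // ...and verify it also implements Deserializer, so it can be handed
        // to a utility that expects one (cf. SerDeUtils.initializeSerDe).
        if (!(serDe instanceof Deserializer)) {
            throw new IllegalArgumentException("SerDe must also implement Deserializer");
        }
        System.out.println(serializer.serialize(42)); // prints "42"
    }
}
```

The instanceof check fails fast with a clear message for the rare SerDe that only implements one of the two interfaces.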

Contributor

Casting is fine, but can we name the variable differently?

Contributor Author

Any suggestions about the name? Like serDe?

Contributor

yeah! :)

@lirui-apache
Contributor Author

Updated to add a test for CSV tables. Also found that the Hive table schema can be obtained from either the metastore or the SerDe. For CSV tables, we should get the schema from the SerDe, but currently HiveCatalog doesn't support that. Hence the changes to HiveCatalog.
@xuefuz please have another look. Thanks.
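The fallback described above can be sketched as follows. This is a hypothetical simulation, not HiveCatalog's actual code: the class and helper names are made up, and the real SerDe-side lookup in Hive goes through the deserializer's object inspector (cf. MetaStoreUtils.getFieldsFromDeserializer):

```java
import java.util.List;

// Hypothetical sketch: prefer the column list stored in the metastore, but
// fall back to the SerDe for tables (e.g. CSV) whose metastore columns are empty.
public class SchemaSourceDemo {
    // Stand-in for reading table columns from the metastore; CSV tables
    // may not carry usable column info there.
    static List<String> columnsFromMetastore(boolean csvTable) {
        return csvTable ? List.of() : List.of("id", "name");
    }

    // Stand-in for deriving columns from the SerDe's object inspector.
    static List<String> columnsFromSerDe() {
        return List.of("col_0", "col_1");
    }

    // Use metastore columns when present, otherwise ask the SerDe.
    static List<String> tableSchema(boolean csvTable) {
        List<String> cols = columnsFromMetastore(csvTable);
        return cols.isEmpty() ? columnsFromSerDe() : cols;
    }

    public static void main(String[] args) {
        System.out.println(tableSchema(false)); // prints "[id, name]"
        System.out.println(tableSchema(true));  // prints "[col_0, col_1]"
    }
}
```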

@lirui-apache lirui-apache force-pushed the FLINK-13192 branch 3 times, most recently from ca8af79 to a20e191 Compare August 2, 2019 02:55
@lirui-apache
Contributor Author

Latest travis build succeeded. @xuefuz do you have any further comments?

@@ -124,7 +124,7 @@
 	private transient int numNonPartitionColumns;

 	// SerDe in Hive-1.2.1 and Hive-2.3.4 can be of different classes, make sure to use a common base class
-	private transient Serializer serializer;
+	private transient Serializer recordSerDe;
Contributor

Maybe the type here should be just "Object".

Contributor Author

It has to be a Serializer because we need it to serialize records. Besides, using Object means we'd have to use reflection to call the serialize method, and doing that for each record might hurt performance.
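The two options can be contrasted with a minimal, self-contained example (using a hypothetical stand-in interface, not Hive's Serializer): a typed field costs one virtual call per record, while an Object-typed field forces a reflective Method.invoke on the per-record path.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for a serializer interface.
interface Serializer { String serialize(Object record); }

class SimpleSerializer implements Serializer {
    @Override public String serialize(Object record) { return String.valueOf(record); }
}

public class ReflectionVsDirect {
    public static void main(String[] args) throws Exception {
        Object serDe = new SimpleSerializer();

        // Typed field: one virtual call per record, easily inlined by the JIT.
        Serializer typed = (Serializer) serDe;
        String direct = typed.serialize(1);

        // Object-typed field: every record pays for Method.invoke --
        // argument boxing and access checks, with no inlining guarantees.
        Method m = serDe.getClass().getMethod("serialize", Object.class);
        String reflective = (String) m.invoke(serDe, 1);

        System.out.println(direct.equals(reflective)); // prints "true"
    }
}
```

Both calls produce the same result; the difference is purely per-record overhead, which is why a typed field is preferable on a hot write path.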


@xuefuz xuefuz left a comment


PR looks good. I just have a couple of minor comments.

@lirui-apache
Contributor Author

@xuefuz thanks for the review. PR updated to address your comments.


@xuefuz xuefuz left a comment


LGTM

@lirui-apache
Contributor Author

@KurtYoung could you help review and merge this PR? Thanks.

@KurtYoung
Contributor

Sure, I will take a look soon.

KurtYoung pushed a commit to KurtYoung/flink that referenced this pull request Aug 6, 2019

@KurtYoung KurtYoung left a comment


LGTM, merging this.

@KurtYoung KurtYoung closed this in 24078de Aug 6, 2019
@lirui-apache lirui-apache deleted the FLINK-13192 branch August 6, 2019 03:41
becketqin pushed a commit to becketqin/flink that referenced this pull request Aug 17, 2019
becketqin pushed a commit to becketqin/flink that referenced this pull request Aug 19, 2019