
[FLINK-13437][test] Add Hive SQL E2E test #10709

Closed
wants to merge 7 commits

Conversation

zjuwangg
Contributor

What is the purpose of the change

Set up a docker-based YARN cluster and Hive service using the new Java-based test runtime framework, and add HiveConnectorITCase to cover data read/write functionality, including:

  1. Hive data written by Hive, read by Flink.
  2. Hive data written by Flink, read by Hive.
  3. Reading from and writing to a non-partitioned table.
  4. Multi-format read and write, covering textfile, ORC, and parquet.

Based on this PR, we can add more tests, such as for functions and views, later on.
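The multi-format coverage described above amounts to running the same read/write round trip once per Hive storage format. A minimal sketch of that idea (all names here are illustrative, not code from the PR):

```java
import java.util.List;

// Hypothetical sketch: the same source table schema is created once per
// storage format, and a real ITCase would then write via Flink and verify
// the rows by reading the table back through Hive (and vice versa).
public class HiveFormatMatrix {

    // Build the CREATE TABLE DDL for one storage format.
    static String createTableDdl(String table, String format) {
        return "CREATE TABLE " + table
                + " (i INT, s STRING) STORED AS " + format.toUpperCase();
    }

    public static void main(String[] args) {
        for (String format : List.of("textfile", "orc", "parquet")) {
            // One table per format covers the textfile/ORC/parquet matrix.
            System.out.println(createTableDdl("src_" + format, format));
        }
    }
}
```

The loop over formats is the whole point: adding a new format to the matrix should be a one-line change, not a copied test case.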

Brief change log

  • 3488ec6 Add an e2e test for the Hive data connector using a docker-based environment
  • 2f8b127 Refactor the Hive e2e test to use the new Java-based test framework
  • 76c6f08 Add multi-format tests and an all-data-types test case
  • 31cc4b7 Remove the e2e bash test

Verifying this change

  • Added integration tests for end-to-end deployment.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

@zjuwangg
Contributor Author

cc @bowenli86 @xuefuz @JingsongLi @KurtYoung @lirui-apache for review

@zjuwangg
Contributor Author

This is base work; we can add more ITCases on top of this PR.

@flinkbot
Collaborator

flinkbot commented Dec 27, 2019

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit e580f03 (Fri Feb 28 21:48:31 UTC 2020)

Warnings:

  • 3 pom.xml files were touched: Check for build and licensing issues.
  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot
Collaborator

flinkbot commented Dec 27, 2019

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build
  • @flinkbot run azure re-run the last Azure build

@zjuwangg
Contributor Author

@bowenli86 @JingsongLi Do you guys have time to have a basic look?


@Override
public void before() throws Exception {
buildDockerImage();

IIUC, building the docker image will take a while for the 1st time, and will be pretty fast for later runs, correct?

* YarnClusterJobController can be used to fetch the execution log.
*/
public static class YarnClusterJobController implements JobController {
private List<String> lines;

make it final
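The suggestion above boils down to: a field assigned exactly once in the constructor should be declared `final`. A minimal sketch of the suggested shape (the class body here is illustrative, not the PR's actual code):

```java
import java.util.List;

// Sketch of the reviewer's suggestion: declaring the field final documents
// that it is assigned exactly once and lets the compiler enforce it.
public class YarnClusterJobController {

    private final List<String> lines; // final: set once in the constructor

    public YarnClusterJobController(List<String> lines) {
        // List.copyOf gives an immutable snapshot, so later mutation by the
        // caller cannot leak into this controller.
        this.lines = List.copyOf(lines);
    }

    public List<String> getLines() {
        return lines; // already immutable, safe to hand out directly
    }

    public static void main(String[] args) {
        YarnClusterJobController controller =
                new YarnClusterJobController(List.of("line1", "line2"));
        System.out.println(controller.getLines());
    }
}
```

Beyond documentation, a `final` field also gets safe-publication guarantees from the Java memory model, which matters if the controller is read from another thread.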

localFlinkDir = temporaryFolder.newFolder().toPath();

LOG.info("Copying distribution to {}.", localFlinkDir);
TestUtils.copyDirectory(originalFlinkDir, localFlinkDir);

Why do we need to copy the dist dir?

@Override
public ClusterController startCluster(int numTaskManagers) throws IOException {
if (!deployFlinkToRemote) {
yarnCluster.copyLocalFileToYarnMaster(localFlinkDir.toAbsolutePath().toString(), remoteFlinkDir);

Instead of copying the dist dir to the container, can we instead mount the dir to container when it's started, with the -v option?

nohup sudo -E -u mapred $HADOOP_PREFIX/bin/mapred historyserver 2>> /var/log/hadoop/historyserver.err >> /var/log/hadoop/historyserver.out &

hdfs dfsadmin -safemode wait
while [ $? -ne 0 ]; do hdfs dfsadmin -safemode wait; done

Why do we want to retry if the command fails? I think it's a potential infinite loop if something goes wrong.
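A bounded retry avoids the infinite-loop risk the reviewer points out: cap the attempts and fail loudly instead of hanging the whole E2E run. A sketch of that pattern (the command runner is a stand-in returning an exit code, so this compiles without HDFS):

```java
import java.util.function.IntSupplier;

// Illustrative bounded-retry loop: keep invoking the command until it
// returns exit code 0 or the attempt budget is exhausted.
public class BoundedRetry {

    static boolean retry(IntSupplier command, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (command.getAsInt() == 0) {
                return true; // exit code 0: e.g. safe mode left, we are done
            }
            System.out.println("Attempt " + attempt + " failed, retrying...");
        }
        return false; // give up so the test fails fast instead of hanging
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated command: fails twice, then succeeds on the third call.
        boolean ok = retry(() -> (calls[0]++ < 2) ? 1 : 0, 5);
        System.out.println("succeeded=" + ok);
    }
}
```

In the shell script itself the same idea would be a counter in the `while` condition, but the key design point is identical: an unbounded retry turns one broken daemon into a stuck CI job.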

hdfs dfsadmin -safemode wait
while [ $? -ne 0 ]; do hdfs dfsadmin -safemode wait; done

hdfs dfs -chown hdfs:hadoop /

I think we have disabled dfs.permissions, so why do we need to run these chown commands?

Comment on lines +154 to +162
JobSubmission.JobSubmissionBuilder jobSubmissionBuilder = new JobSubmission.JobSubmissionBuilder(testJarPath);
jobSubmissionBuilder.setParallelism(1)
.addOption("-ys", "1")
.addOption("-ytm", "1000")
.addOption("-yjm", "1000")
.addOption("-c", HiveReadWriteDataExample.class.getCanonicalName())
.addArgument("--hiveVersion", hiveVersion)
.addArgument("--sourceTable", "all_types_table")
.addArgument("--targetTable", "dest_all_types_table");

I think the job submission code is the same for the 2 test cases. Can we reuse it?
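The reuse the reviewer suggests would mean extracting the options shared by both test cases into one helper, with only the differing pieces passed in. A sketch of that refactoring (everything below is a stand-in; it builds a plain argument list rather than using Flink's real JobSubmissionBuilder):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper: the yarn-session sizing and parallelism are identical
// for both tests, so only the entry class and table names vary per call.
public class HiveJobSubmissions {

    static List<String> buildSubmission(String mainClass, String sourceTable, String targetTable) {
        List<String> args = new ArrayList<>();
        // Common options shared by both test cases.
        args.addAll(List.of("-p", "1", "-ys", "1", "-ytm", "1000", "-yjm", "1000"));
        args.addAll(List.of("-c", mainClass));
        // Only these arguments differ between the two tests.
        args.addAll(List.of("--sourceTable", sourceTable, "--targetTable", targetTable));
        return args;
    }

    public static void main(String[] args) {
        System.out.println(buildSubmission(
                "HiveReadWriteDataExample", "all_types_table", "dest_all_types_table"));
    }
}
```

With a helper like this, a new test case changes only its table names, and a change to the cluster sizing lands in exactly one place.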

@bowenli86
Member

@lirui-apache @JingsongLi can you guys help review? I can help merge once it passes

@JingsongLi
Contributor

Thanks @zjuwangg for your great work! I will continue working on this.

@sjwiesman
Contributor

sjwiesman commented Jun 8, 2020

What's the status of this? With the new sql filesystem connector I suspect more flink users will rely on Hive integration. It would be good to try and get this in for 1.12.

@JingsongLi
Contributor

What's the status of this? With the new sql filesystem connector I suspect more flink users will rely on Hive integration. It would be good to try and get this in for 1.12.

Yes, we should move on.
I plan to migrate the testing to the SQL client, but I don't have enough time in 1.11; we should finish this in 1.12.

@zjuwangg
Contributor Author

Closing this PR.

@zjuwangg zjuwangg closed this Jul 14, 2020