Skip to content

Conversation

@zentol
Copy link
Contributor

@zentol zentol commented Jan 10, 2019

Must be merged together with #7454.

What is the purpose of the change

With this PR hadoop is not included in flink-dist by default, i.e. the include-hadoop profile is no longer activated by default.

Brief change log

  • update create_binary_release.sh to reflect profile activation changes
  • remove hadoop-free WordCount E2E test, since this is now the default
  • update YARN IT cases to use --yarnship option for hadoop jar
    • These tests directly use flink-dist, similar to e2e tests. They currently rely on hadoop being bundled by before, so I had to change them to explicitly ship hadoop. This required an additional dependency on flink-shaded-hadoop2-uber for flink-yarn-tests.

Verifying this change

The release scripts have to be verified manually.

The Yarn tests check that copying the shaded-hadoop2-uber jar is sufficient for running hadoop-dependent jobs.

Copy link
Contributor

@tillrohrmann tillrohrmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR @zentol. My only comment concerns the copying of the whole flink-dist for each Yarn test class. I think instead we could also add the flink-shaded-hadoop2-uber.jar to the set of shippable files. That way we save a lot of copying operations.

make_binary_release "" "" "$SCALA_VERSION"
else
make_binary_release "hadoop2x" "-Dhadoop.version=$HADOOP_VERSION" "$SCALA_VERSION"
make_binary_release "hadoop2x" "-Pinclude-hadoop -Dhadoop.version=$HADOOP_VERSION" "$SCALA_VERSION"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make a difference whether it is -Dinclude-hadoop or -Pinclude-hadoop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no.

relHadoopPath = dependencyJars.filter(jar -> jar.getFileName().toString().startsWith("flink-shaded-hadoop2"))
.findAny().orElseThrow(() -> new AssertionError("Unable to locate flink-shaded-hadoop2 jar."));
}
Files.copy(relHadoopPath, flinkLibFolder.toPath().resolve("flink-shaded-hadoop2.jar"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we strictly need to copy everything together? Wouldn't it also work to refactor the YarnTestBase a bit so that we have a method getShipFiles which returns all files in /lib plus the flink-shaded-hadoop2.jar? That way we would save a lot of copy operations. flink-dist is currently 347 MB large.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could probably work, but you'd need to give me more hints how this could be implemented.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is mainly adding the flink-shaded-hadoop2.jar as a shippable file via -yt in the Yarn tests. With https://issues.apache.org/jira/browse/FLINK-11272 this should be possible by adding -yt path_to_shaded_hadoop to the test cases which use the Yarn cli. If you want, I can also take over to do it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the client require hadoop as well? -yt doesn't automatically put the passed jar on the classpath of the client, does it?

@zentol zentol force-pushed the 11270 branch 2 times, most recently from a782ba9 to 543d07b Compare January 24, 2019 14:43
@zentol
Copy link
Contributor Author

zentol commented Jan 24, 2019

noooo, a new IT case was added since my last rebase...

Copy link
Contributor

@tillrohrmann tillrohrmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing my comments @zentol. The changes look really good to me. +1 for merging.

@zentol zentol merged commit 81acd0a into apache:master Jan 29, 2019
@zentol zentol deleted the 11270 branch January 29, 2019 15:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants