[FLINK-11270][build] Do not include hadoop in flink-dist by default #7451
Conversation
tillrohrmann left a comment
Thanks for this PR @zentol. My only comment concerns the copying of the whole flink-dist for each Yarn test class. I think instead we could also add the flink-shaded-hadoop2-uber.jar to the set of shippable files. That way we save a lot of copying operations.
```diff
   make_binary_release "" "" "$SCALA_VERSION"
 else
-  make_binary_release "hadoop2x" "-Dhadoop.version=$HADOOP_VERSION" "$SCALA_VERSION"
+  make_binary_release "hadoop2x" "-Pinclude-hadoop -Dhadoop.version=$HADOOP_VERSION" "$SCALA_VERSION"
```
Does it make a difference whether it is `-Dinclude-hadoop` or `-Pinclude-hadoop`?
No.
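For readers outside the thread: `-P` activates a Maven profile by id, while `-D` defines a system property, so the two spellings only coincide when the profile declares property-based activation, which the answer above implies is the case for `include-hadoop`. A minimal sketch under that assumption:

```sh
# Equivalent invocations, assuming the include-hadoop profile declares
# <activation><property><name>include-hadoop</name></property></activation>:
mvn clean package -Pinclude-hadoop   # activate the profile explicitly by id
mvn clean package -Dinclude-hadoop   # define the property that activates it
```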
```java
relHadoopPath = dependencyJars.filter(jar -> jar.getFileName().toString().startsWith("flink-shaded-hadoop2"))
    .findAny().orElseThrow(() -> new AssertionError("Unable to locate flink-shaded-hadoop2 jar."));
}
Files.copy(relHadoopPath, flinkLibFolder.toPath().resolve("flink-shaded-hadoop2.jar"));
```
Do we strictly need to copy everything together? Wouldn't it also work to refactor the `YarnTestBase` a bit so that we have a method `getShipFiles` which returns all files in `/lib` plus the `flink-shaded-hadoop2.jar`? That way we would save a lot of copy operations; flink-dist is currently 347 MB.
That could probably work, but you'd need to give me more hints on how this could be implemented.
I think it is mainly a matter of adding the `flink-shaded-hadoop2.jar` as a shippable file via `-yt` in the Yarn tests. With https://issues.apache.org/jira/browse/FLINK-11272 this should be possible by adding `-yt path_to_shaded_hadoop` to the test cases which use the Yarn CLI. If you want, I can also take over and do it.
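A rough sketch of the suggested mechanism; the path is a placeholder, not something prescribed by this PR, and `-yt` (`--yarnship`) historically expects a directory of files to ship:

```sh
# Ship the directory containing the shaded Hadoop uber jar to the YARN
# cluster alongside the job; the path stands in for wherever the build
# places flink-shaded-hadoop2-uber.
flink run -m yarn-cluster \
    -yt /path/to/dir-containing-flink-shaded-hadoop2-uber/ \
    ./examples/streaming/WordCount.jar
```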
Doesn't the client require hadoop as well? `-yt` doesn't automatically put the passed jar on the classpath of the client, does it?
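If the client does indeed need Hadoop on its own classpath, a hedged sketch of the usual convention for providing it (assumes a local Hadoop installation is present):

```sh
# Expose the local Hadoop installation's jars to the Flink client;
# `hadoop classpath` prints that installation's classpath entries.
export HADOOP_CLASSPATH=$(hadoop classpath)
./bin/flink run -m yarn-cluster ./examples/streaming/WordCount.jar
```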
Force-pushed from a782ba9 to 543d07b.
noooo, a new IT case was added since my last rebase...
tillrohrmann left a comment
Thanks for addressing my comments @zentol. The changes look really good to me. +1 for merging.
Must be merged together with #7454.
What is the purpose of the change
With this PR, hadoop is not included in flink-dist by default, i.e. the `include-hadoop` profile is no longer activated by default.
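A hedged illustration of what the new default means for builds (the goals and flags below are the usual Maven ones, not prescribed by this PR, and the Hadoop version is an example):

```sh
# Default build: flink-dist no longer bundles the shaded Hadoop jar.
mvn clean install -DskipTests

# Opt back in to a Hadoop-bundled distribution by activating the profile.
mvn clean install -DskipTests -Pinclude-hadoop -Dhadoop.version=2.8.3
```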
Brief change log

- Updated `create_binary_release.sh` to reflect profile activation changes
- Use the `--yarnship` option for the hadoop jar
- Ship `flink-shaded-hadoop2-uber` for `flink-yarn-tests`

Verifying this change
The release scripts have to be verified manually.
The Yarn tests check that copying the `shaded-hadoop2-uber` jar is sufficient for running hadoop-dependent jobs.
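A hedged sketch of how those tests could be exercised locally (module name taken from this PR; the exact invocation may differ):

```sh
# Build the project first so flink-dist is available to the tests.
mvn clean install -DskipTests
# Then run the Yarn integration tests.
mvn verify -pl flink-yarn-tests
```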