
[FLINK-5998] Un-fat Hadoop from Flink fat jar #3604

Closed
haohui wants to merge 2 commits

Conversation

haohui commented Mar 23, 2017

This PR implements FLINK-5998.

It marks all Hadoop dependencies in the dist jar as provided so that users can plug in the jars from their own Hadoop distribution.
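
For illustration, marking a Hadoop dependency as provided in a Maven POM looks roughly like the sketch below; the coordinates and version handling are assumptions for illustration, not the literal diff of this PR.

<!-- Sketch only: groupId, artifactId and version are illustrative assumptions -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-shaded-hadoop2</artifactId>
  <version>${project.version}</version>
  <!-- "provided" keeps the Hadoop classes out of the packaged dist jar;
       the user's own Hadoop distribution supplies them at runtime -->
  <scope>provided</scope>
</dependency>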

rmetzger (Contributor)

I have tested this change locally, but it does not do exactly what I intended.

What I want is that the lib/ folder contains a flink-dist.jar with all the Flink runtime dependencies, and a flink-dist-hadoop.jar with the packaged Hadoop dependencies.
This way, users have a good out-of-the-box experience, but they can still use their own Hadoop jars easily.

greghogan (Contributor)

@rmetzger, how would users know which Hadoop dependencies to include in flink-dist-hadoop.jar? Would they be copying multiple component jars into lib/?

rmetzger (Contributor)

Yes, the idea is that users in some environments can even delete flink-dist-hadoop.jar entirely and just configure the classpath to point to the Hadoop lib folder of their Hadoop distribution.

greghogan (Contributor)

@rmetzger thanks for the clarification. Sounds good!

haohui (Author) commented Mar 28, 2017

@rmetzger -- thanks for the clarification. Do you think it is sufficient to mark the flink-shaded-hadoop2 dependency as provided and produce a dedicated jar for it?

rmetzger (Contributor)

Hmm, not sure.
What I would try is updating the assembly descriptor(s) in the flink-dist project so that it creates two assemblies.
AFAIK you need to define all dependencies as regular deps in flink-dist and then include subsets of them in the different assemblies.
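
As a rough sketch of that idea (the id and include pattern below are assumptions, not the descriptor that was actually committed), a second assembly in flink-dist could package only the shaded Hadoop dependency:

<!-- Hypothetical second assembly descriptor: builds a separate jar that
     contains only the shaded Hadoop dependency -->
<assembly xmlns="http://maven.apache.org/ASSEMBLY/2.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/ASSEMBLY/2.0.0
                              http://maven.apache.org/xsd/assembly-2.0.0.xsd">
  <id>hadoop</id>
  <formats>
    <format>jar</format>
  </formats>
  <includeBaseDirectory>false</includeBaseDirectory>
  <dependencySets>
    <dependencySet>
      <unpack>true</unpack>
      <includes>
        <!-- only the Hadoop subset of the regular flink-dist dependencies -->
        <include>org.apache.flink:flink-shaded-hadoop2</include>
      </includes>
    </dependencySet>
  </dependencySets>
</assembly>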

haohui (Author) commented Apr 12, 2017

@rmetzger ping...would you mind taking a look?

StephanEwen (Contributor)

@haohui I think this goes in the right direction, but the flink-shaded-hadoop2-1.3.jar should be in lib/, not in opt/. That way, we preserve the original behavior (Hadoop dependencies are available by default) but make it easy to swap or remove the Hadoop dependency.

rmetzger (Contributor) commented May 2, 2017

Sorry for the delay. I'm now checking out the change....

rmetzger (Contributor) commented May 4, 2017

Success!

I got it to work on my CDH 5.4.2 virtual machine with the following environment variable set:

export HADOOP_CLASSPATH=/usr/lib/hadoop-yarn/*:/usr/lib/hadoop/*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop/lib/*:/etc/hadoop/conf:/usr/lib/hadoop-hdfs/*

and it was using CDH's Hadoop version:

2017-05-04 10:49:58,243 INFO  org.apache.flink.yarn.YarnApplicationMasterRunner             -  Hadoop version: 2.6.0-cdh5.4.2

I'll merge this change to master.

asfgit closed this in 43fa507 on May 5, 2017
fanyon pushed a commit to fanyon/flink that referenced this pull request May 15, 2017