
[FLINK-2268] Allow Flink binary release without Hadoop #4636

Merged
merged 17 commits into apache:master from hadoop-free-flink
Sep 27, 2017

Conversation

aljoscha
Contributor

@aljoscha aljoscha commented Sep 4, 2017

This is a series of PRs that allows running Flink without any Hadoop dependencies in the lib folder. Each PR stands on its own but all of them are necessary for the last commit to work. The commits themselves clearly document what is changed.

R: @zentol

"org.apache.hadoop.util.VersionInfo",
false,
EnvironmentInformation.class.getClassLoader());
log.info(" Hadoop version: " + VersionInfo.getVersion());
Contributor

Did you intend to directly call VersionInfo, or should we maybe do this with reflection instead?

Contributor Author

Yes, that is intended because I didn't want to fiddle with the reflection API. Ideally, I would like to do this:

try {
	log.info(" Hadoop version: " + VersionInfo.getVersion());
} catch (ClassNotFoundException e) {
	// ignore
}

but Java won't let you do this. With the explicit Class.forName() it does let me add the catch block.
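
For readers following along, a minimal self-contained sketch of that guard pattern (the wrapper class and the fallback log message are illustrative, not the exact Flink code):

import org.apache.hadoop.util.VersionInfo;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class HadoopVersionLogging {
	private static final Logger LOG = LoggerFactory.getLogger(HadoopVersionLogging.class);

	public static void logHadoopVersionIfPresent() {
		try {
			// Class.forName declares ClassNotFoundException, which makes the catch
			// block legal; a plain VersionInfo.getVersion() call cannot throw it,
			// so a try/catch around that call alone would not compile.
			Class.forName(
				"org.apache.hadoop.util.VersionInfo",
				false,
				HadoopVersionLogging.class.getClassLoader());
			LOG.info(" Hadoop version: " + VersionInfo.getVersion());
		} catch (ClassNotFoundException e) {
			LOG.info(" Hadoop is not in the classpath/dependencies.");
		}
	}
}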

@@ -320,14 +315,6 @@ private[flink] trait TypeInformationGen[C <: Context] {
}
}

def mkWritableTypeInfo[T <: Writable : c.WeakTypeTag](
Contributor

What exactly does this removal mean for supporting Writable? Does hadoop-compat take care of that?

Contributor Author

Unfortunately, this means that users now have to manually specify a TypeInformation that they can get from TypeExtractor.createHadoopWritableTypeInfo(MyWritable.class).

I'm not sure how often people are using Hadoop Writables in their Scala code but this is definitely something that will break.
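
For illustration, the manual workaround mentioned here would look roughly like this (a sketch using Hadoop's IntWritable; the exact generic signature of createHadoopWritableTypeInfo is assumed):

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.hadoop.io.IntWritable;

public class WritableTypeInfoExample {
	public static void main(String[] args) {
		// Explicitly create the TypeInformation that the Scala macro used to derive.
		TypeInformation<IntWritable> typeInfo =
			TypeExtractor.createHadoopWritableTypeInfo(IntWritable.class);
		System.out.println(typeInfo);
	}
}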

Contributor Author

Scratch that, this actually still works and I added a test for that in the Hadoop compat package.

@@ -54,6 +54,12 @@ under the License.

<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-shaded-hadoop2</artifactId>
<version>${project.version}</version>
</dependency>
Contributor

add test scope?

Contributor Author

I'll add test scope and see if everything still runs.

@@ -52,6 +51,13 @@

private static final AppConfigurationEntry userKerberosAce;

/* Return the Kerberos login module name */
public static String getKrb5LoginModuleName() {
Contributor

I assume this was copied from hadoop?

Contributor Author

Yes, this had a dependency on KerberosUtil from Hadoop just for this method. Now we can have Kerberos independent of Hadoop.
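
For context, the copied-over logic amounts to choosing the JAAS login module class by JVM vendor, roughly like this (a paraphrase of Hadoop's KerberosUtil, written from memory):

public class KerberosLoginModule {

	/* Return the Kerberos login module name */
	public static String getKrb5LoginModuleName() {
		// IBM JDKs ship their own Krb5LoginModule under a different package name.
		return System.getProperty("java.vendor").contains("IBM")
			? "com.ibm.security.auth.module.Krb5LoginModule"
			: "com.sun.security.auth.module.Krb5LoginModule";
	}
}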

} else {
sc = new SecurityUtils.SecurityConfiguration(configuration);

Contributor

revert

Contributor Author

fixing

// Try to load HDFS configuration from Hadoop's own configuration files
// 1. approach: Flink configuration
final String hdfsDefaultPath = flinkConfiguration.getString(ConfigConstants
.HDFS_DEFAULT_CONFIG, null);
Contributor

this is a rather odd line-break

Contributor Author

indeed, I'm fixing

if (hdfsDefaultPath != null) {
retConf.addResource(new org.apache.hadoop.fs.Path(hdfsDefaultPath));
} else {
LOG.debug("Cannot find hdfs-default configuration file");
Contributor

this should say that they could not be loaded from the flink configuration

Contributor Author

You're right, I just copied these from another HadoopUtils. I'm fixing.

for (String possibleHadoopConfPath : possibleHadoopConfPaths) {
if (possibleHadoopConfPath != null) {
if (new File(possibleHadoopConfPath).exists()) {
if (new File(possibleHadoopConfPath + "/core-site.xml").exists()) {
Contributor

Let's track whether any of these succeeded and log something otherwise (mirroring the flink configuration approach).

Contributor Author

will do


final String hdfsSitePath = flinkConfiguration.getString(ConfigConstants.HDFS_SITE_CONFIG, null);
if (hdfsSitePath != null) {
retConf.addResource(new org.apache.hadoop.fs.Path(hdfsSitePath));
Contributor

add debug statement to mirror environment variables approach

Contributor Author

will do

}
}
}
return retConf;
Contributor

We should log something (WARN maybe) if we couldn't find anything.

Contributor Author

done
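
Taken together, the requested changes to the configuration lookup would roughly take this shape (an illustrative sketch, not the merged code; the environment-variable handling is an assumption, the ConfigConstants keys are the ones in the diff above):

import java.io.File;

import org.apache.flink.configuration.ConfigConstants;
import org.apache.flink.configuration.Configuration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class HadoopConfigLookup {
	private static final Logger LOG = LoggerFactory.getLogger(HadoopConfigLookup.class);

	public static org.apache.hadoop.conf.Configuration getHadoopConfiguration(Configuration flinkConfiguration) {
		org.apache.hadoop.conf.Configuration retConf = new org.apache.hadoop.conf.Configuration();
		boolean foundHadoopConfiguration = false;

		// 1. approach: Flink configuration
		final String hdfsDefaultPath =
			flinkConfiguration.getString(ConfigConstants.HDFS_DEFAULT_CONFIG, null);
		if (hdfsDefaultPath != null) {
			retConf.addResource(new org.apache.hadoop.fs.Path(hdfsDefaultPath));
			foundHadoopConfiguration = true;
		} else {
			LOG.debug("Cannot find hdfs-default configuration-file path in Flink configuration.");
		}

		final String hdfsSitePath = flinkConfiguration.getString(ConfigConstants.HDFS_SITE_CONFIG, null);
		if (hdfsSitePath != null) {
			retConf.addResource(new org.apache.hadoop.fs.Path(hdfsSitePath));
			foundHadoopConfiguration = true;
		} else {
			LOG.debug("Cannot find hdfs-site configuration-file path in Flink configuration.");
		}

		// 2. approach: environment variables
		String[] possibleHadoopConfPaths = new String[] {
			System.getenv("HADOOP_CONF_DIR"),
			System.getenv("HADOOP_HOME") != null ? System.getenv("HADOOP_HOME") + "/etc/hadoop" : null
		};
		for (String possibleHadoopConfPath : possibleHadoopConfPaths) {
			if (possibleHadoopConfPath != null && new File(possibleHadoopConfPath, "core-site.xml").exists()) {
				retConf.addResource(new org.apache.hadoop.fs.Path(possibleHadoopConfPath + "/core-site.xml"));
				foundHadoopConfiguration = true;
			}
		}

		if (!foundHadoopConfiguration) {
			LOG.warn("Could not find Hadoop configuration via any of the supported methods.");
		}
		return retConf;
	}
}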

@EronWright
Contributor

Overall this looks great. I looked for any signs of behavior changes, especially in the security code since there's some subtlety there, but didn't notice any.

Please consider moving SecurityUtils.SecurityConfiguration to be a top-level class.

Are you sure that you can check for a class (forName) and then use it normally? I kinda thought that the classloader is more eager.

@EronWright
Contributor

In certain environments today, the Hadoop dependencies are actually duplicated on the classpath, i.e. in the Flink dist jar and also via the local installed Hadoop (when something like HADOOP_HOME is set). After this change, will having Hadoop on the classpath (not in the dist jar) be sufficient? I think so, just confirming.

@aljoscha
Contributor Author

aljoscha commented Sep 6, 2017

Thanks for reviewing @EronWright and @zentol. I pushed some more commits that address your comments.

I did check the approach of first using Class.forName() and then using the class normally by building a Hadoop-free Flink and running a cluster and some examples. I think the class loader only loads classes if they appear in method signatures or fields, not when classes only appear in code.

@EronWright Yes, your hunch is correct and I did check this on GCE (Dataproc) and AWS (EMR). This is actually quite nice because you can now build a Hadoop-free Flink and only use the Hadoop dependencies provided by your distro.
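
To make the loading behaviour described above concrete, a toy sketch (class and method names invented): the direct VersionInfo reference lives only inside a method body, so merely loading the enclosing class does not pull in Hadoop.

public class LazyHadoopReference {

	public static void main(String[] args) {
		try {
			// Probe first; the reference below is only resolved if we get past this line.
			Class.forName("org.apache.hadoop.util.VersionInfo", false,
				LazyHadoopReference.class.getClassLoader());
			printHadoopVersion();
		} catch (ClassNotFoundException e) {
			System.out.println("Running without Hadoop on the classpath.");
		}
	}

	private static void printHadoopVersion() {
		// This constant-pool reference is resolved lazily, when the method first runs.
		System.out.println("Hadoop version: " + org.apache.hadoop.util.VersionInfo.getVersion());
	}
}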

}
}

LOG.debug("Could not find Hadoop configuration via any of the supported methods " +
Contributor

Missing the check whether we actually didn't find anything.

Contributor Author

Ok, that was stupid... Fixing.

@aljoscha
Contributor Author

I pushed a rebased version of this.

@aljoscha aljoscha force-pushed the hadoop-free-flink branch 12 times, most recently from 914890b to 223e045 on September 26, 2017 13:34
This removes all Hadoop-related methods from ExecutionEnvironment (there
are already equivalent methods in flink-hadoop-compatibility; see
HadoopUtils and HadoopInputs, etc.). This also removes Hadoop-specific
tests from flink-tests because these are duplicated by tests in
flink-hadoop-compatibility.

This also removes Hadoop-specific example code from flink-examples: the
DistCp example and related code.
There are methods for this in flink-hadoop-compatibility.
commons-io is only usable as a transitive dependency of the Hadoop
dependencies. We can just use the Java ByteArrayOutputStream and
get rid of that dependency.
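
As an aside, the kind of replacement this commit describes looks roughly like the following (a sketch; the original commons-io call sites are not shown in this diff):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public final class StreamUtil {

	/** Reads a stream fully into a byte array with plain JDK classes instead of commons-io. */
	public static byte[] readFully(InputStream in) throws IOException {
		ByteArrayOutputStream out = new ByteArrayOutputStream();
		byte[] buffer = new byte[4096];
		int read;
		while ((read = in.read(buffer)) != -1) {
			out.write(buffer, 0, read);
		}
		return out.toByteArray();
	}
}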
This was in there because of legacy reasons but is not required by the
test.
This was only used for the Enum for a specific HTTP response type. The
jets3t dependency is only available as a transitive dependency of the
Hadoop dependencies, which is why we remove it.
This removes the dependency on Hadoop and ensures that we only close if
Hadoop is available.
This also makes them optional in flink-runtime, which is enabled by the
previous changes to only use Hadoop dependencies if they are available.

This also requires adding a few explicit dependencies in other modules
because they were using transitive dependencies of the Hadoop deps. The
most common dependency there is, ha!, commons-io.
@asfgit asfgit merged commit 2eaf92b into apache:master Sep 27, 2017
@aljoscha aljoscha deleted the hadoop-free-flink branch September 28, 2017 12:35