From 5c8d86082a5208abb26c15b7ef02372f02882eaa Mon Sep 17 00:00:00 2001 From: "Tzu-Li (Gordon) Tai" Date: Tue, 21 Mar 2017 13:54:48 +0800 Subject: [PATCH 1/2] [FLINK-6139] [doc] Documentation for running Flink with MapR --- docs/setup/aws.md | 2 +- docs/setup/cluster_setup.md | 2 +- docs/setup/gce_setup.md | 2 +- docs/setup/mapr_setup.md | 207 ++++++++++++++++++++++++++++++++++++ docs/setup/yarn_setup.md | 2 +- 5 files changed, 211 insertions(+), 4 deletions(-) create mode 100644 docs/setup/mapr_setup.md diff --git a/docs/setup/aws.md b/docs/setup/aws.md index 9ebef61190aa9..cee568053b5da 100644 --- a/docs/setup/aws.md +++ b/docs/setup/aws.md @@ -2,7 +2,7 @@ title: "Amazon Web Services (AWS)" nav-title: AWS nav-parent_id: deployment -nav-pos: 10 +nav-pos: 5 --- + +This documentation provides instructions on how to prepare Flink for YARN +executions on a [MapR](https://mapr.com/) cluster. + +* This will be replaced by the TOC +{:toc} + +## Running Flink on YARN with MapR + +The instructions below assume MapR version 5.2.0. They will guide you +to be able to start submitting [Flink on YARN]({{ site.baseurl }}/setup/yarn_setup.html) +jobs or sessions to a MapR cluster. + +### Building Flink for MapR + +In order to run Flink on MapR, Flink needs to be built with MapR's own +Hadoop and Zookeeper distribution. Before this can be done, due to some of MapR's +Hadoop / Zookeeper dependency clashes with Flink's dependencies, the following +modifications to some of Flink's POM files is required: + +In `flink/pom.xml`, exclude the Netty dependency from Zookeeper: + +~~~xml + + ... + + + org.apache.zookeeper + zookeeper + ${zookeeper.version} + + + + org.jboss.netty + netty + + + + + ... + +~~~ + +In `flink/flink-shaded-hadoop/flink-shaded-hadoop2/pom.xml`, exclude +`com.mapr.hadoop.*` and `com.mapr.fs.*` dependencies from `hadoop-common` +so that they aren't bundled with Flink. 
This ensures that the native MapR +libraries on your MapR cluster nodes are correctly used: + +~~~xml + + ... + + + org.apache.hadoop + hadoop-common + ${hadoop.version} + + ... + + + + com.mapr.hadoop + maprfs-core + + + com.mapr.hadoop + hadoop2 + + + com.mapr.hadoop + maprfs + + + com.mapr.hadoop + maprfs-diagnostic-tools + + + com.mapr.hadoop + maprfs-jni + + + com.mapr.fs + libprotodefs + + + com.mapr.fs + mapr-hbase + + + ... + + + + ... + +~~~ + +Finally, build Flink for MapR by overriding the Hadoop and Zookeeper version: + +``` +mvn clean install -DskipTests -Pvendor-repos -Dhadoop.version=2.7.0-mapr-1607 -Dzookeeper.version=3.4.5-mapr-1604 +``` + +The `vendor-repos` profile is required to include MapR repositories when +searching for the MapR Hadoop / Zookeeper distributions. +For other MapR versions, simply change the `hadoop.version` and `zookeeper.version` values appropriately. + +### Job Submission Client Setup + +The client submitting Flink jobs to MapR also needs to be prepared with the below setups. + +Ensure that MapR's JAAS config file is picked up to avoid login failures: + +``` +export JVM_ARGS=-Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf +``` + +Make sure that the `yarn.nodemanager.resource.cpu-vcores` property is set in `yarn-site.xml`: + +~~~xml + + + +... + + + yarn.nodemanager.resource.cpu-vcores + ... + + +... 
+
+~~~
+
+Also remember to set the `YARN_CONF_DIR` or `HADOOP_CONF_DIR` environment
+variables to the path where `yarn-site.xml` is located:
+
+```
+export YARN_CONF_DIR=/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/
+export HADOOP_CONF_DIR=/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/
+```
+
+Make sure that the MapR native libraries are picked up in the classpath:
+
+```
+export FLINK_CLASSPATH=/opt/mapr/lib/*
+```
+
+If you'll be starting Flink on YARN sessions with `yarn-session.sh`, the
+following is also required:
+
+```
+export CC_CLASSPATH=/opt/mapr/lib/*
+```
+
+## Running Flink with a Secured MapR Cluster
+
+*Note: In Flink 1.2.0, Flink's Kerberos authentication for YARN execution has
+a bug that prevents it from working with MapR Security. Please upgrade to a later
+Flink version in order to use Flink with a secured MapR cluster. For more details,
+please see [FLINK-5949](https://issues.apache.org/jira/browse/FLINK-5949).*
+
+Flink's [Kerberos authentication]({{ site.baseurl }}/ops/security-kerberos.html) is independent of
+[MapR's Security authentication](http://maprdocs.mapr.com/home/SecurityGuide/Configuring-MapR-Security.html).
+With the above build procedure and environment variable setup, Flink
+does not require any additional configuration to work with MapR Security.
+
+Users simply need to log in using MapR's `maprlogin` authentication
+utility.
Users who have not acquired MapR login credentials will not be
+able to submit Flink jobs; submission fails with:
+
+```
+java.lang.Exception: unable to establish the security context
+Caused by: o.a.f.r.security.modules.SecurityModule$SecurityInstallException: Unable to set the Hadoop login user
+Caused by: java.io.IOException: failure to login: Unable to obtain MapR credentials
+```
diff --git a/docs/setup/yarn_setup.md b/docs/setup/yarn_setup.md
index 53423b8232ea4..3149ec25ef9b4 100644
--- a/docs/setup/yarn_setup.md
+++ b/docs/setup/yarn_setup.md
@@ -2,7 +2,7 @@
 title: "YARN Setup"
 nav-title: YARN
 nav-parent_id: deployment
-nav-pos: 3
+nav-pos: 2
 ---
-
-
- org.jboss.netty
- netty
-
-
-
-
- ...
-
-~~~
-
-In `flink/flink-shaded-hadoop/flink-shaded-hadoop2/pom.xml`, exclude
-`com.mapr.hadoop.*` and `com.mapr.fs.*` dependencies from `hadoop-common`
-so that they aren't bundled with Flink. This ensures that the native MapR
-libraries on your MapR cluster nodes are correctly used:
-
-~~~xml
-
- ...
-
-
- org.apache.hadoop
- hadoop-common
- ${hadoop.version}
-
- ...
-
-
-
- com.mapr.hadoop
- maprfs-core
-
-
- com.mapr.hadoop
- hadoop2
-
-
- com.mapr.hadoop
- maprfs
-
-
- com.mapr.hadoop
- maprfs-diagnostic-tools
-
-
- com.mapr.hadoop
- maprfs-jni
-
-
- com.mapr.fs
- libprotodefs
-
-
- com.mapr.fs
- mapr-hbase
-
-
- ...
-
-
-
- ...
-
-~~~
-
-Finally, build Flink for MapR by overriding the Hadoop and Zookeeper version:
+Hadoop and Zookeeper distribution. Simply build Flink using Maven with
+the following command from the project root directory:
 
 ```
-mvn clean install -DskipTests -Pvendor-repos -Dhadoop.version=2.7.0-mapr-1607 -Dzookeeper.version=3.4.5-mapr-1604
+mvn clean install -DskipTests -Pvendor-repos,mapr \
+    -Dhadoop.version=2.7.0-mapr-1607 \
+    -Dzookeeper.version=3.4.5-mapr-1604
 ```
 
-The `vendor-repos` profile is required to include MapR repositories when
-searching for the MapR Hadoop / Zookeeper distributions.
-For other MapR versions, simply change the `hadoop.version` and `zookeeper.version` values appropriately.
+The `vendor-repos` build profile adds MapR's repository to the build so that
+MapR's Hadoop / Zookeeper dependencies can be fetched. The `mapr` build
+profile additionally resolves some dependency clashes between MapR and
+Flink, and ensures that the native MapR libraries on the cluster
+nodes are used. Both profiles must be activated.
+
+By default the `mapr` profile builds with Hadoop / Zookeeper dependencies
+for MapR version 5.2.0, so for that version you don't need to explicitly
+override the `hadoop.version` and `zookeeper.version` properties.
+For other MapR versions, override these properties with the appropriate
+values. The Hadoop / Zookeeper distributions corresponding to each MapR version
+are listed in the MapR documentation, for example
+[here](http://maprdocs.mapr.com/home/DevelopmentGuide/MavenArtifacts.html).
 
 ### Job Submission Client Setup
 
diff --git a/flink-shaded-hadoop/flink-shaded-hadoop2/pom.xml b/flink-shaded-hadoop/flink-shaded-hadoop2/pom.xml
index 86f3f9192ed44..c750bbda80584 100644
--- a/flink-shaded-hadoop/flink-shaded-hadoop2/pom.xml
+++ b/flink-shaded-hadoop/flink-shaded-hadoop2/pom.xml
@@ -652,4 +652,160 @@ under the License.
+ + + + + mapr + + + org.apache.hadoop + hadoop-common + ${hadoop.version} + + + + com.mapr.hadoop + maprfs-core + + + com.mapr.hadoop + hadoop2 + + + com.mapr.hadoop + maprfs + + + com.mapr.hadoop + maprfs-diagnostic-tools + + + com.mapr.hadoop + maprfs-jni + + + com.mapr.fs + libprotodefs + + + com.mapr.fs + mapr-hbase + + + + asm + asm + + + org.ow2.asm + asm + + + tomcat + jasper-compiler + + + tomcat + jasper-runtime + + + org.mortbay.jetty + jetty + + + org.mortbay.jetty + jsp-api-2.1 + + + org.mortbay.jetty + jsp-2.1 + + + + org.eclipse.jdt + core + + + org.mortbay.jetty + jetty + + + com.sun.jersey + jersey-json + + + org.codehaus.jettison + jettison + + + com.sun.jersey + jersey-server + + + tomcat + jasper-compiler + + + tomcat + jasper-runtime + + + javax.servlet.jsp + jsp-api + + + com.sun.jersey.jersey-test-framework + jersey-test-framework-grizzly2 + + + com.sun.jersey.jersey-test-framework + jersey-test-framework-core + + + com.sun.jersey + jersey-grizzly2 + + + org.glassfish.grizzly + grizzly-http + + + org.glassfish.grizzly + grizzly-framework + + + org.glassfish.grizzly + grizzly-http-server + + + org.glassfish.grizzly + grizzly-rcm + + + org.glassfish.grizzly + grizzly-http-servlet + + + org.glassfish + javax.servlet + + + com.sun.jersey.contribs + jersey-guice + + + + commons-beanutils + commons-beanutils + + + + + + diff --git a/pom.xml b/pom.xml index 538aa4b6c522f..6c51770d0eb9f 100644 --- a/pom.xml +++ b/pom.xml @@ -521,6 +521,42 @@ under the License. + + + mapr + + + + 2.7.0-mapr-1607 + 3.4.5-mapr-1604 + + + + + org.apache.zookeeper + zookeeper + ${zookeeper.version} + + + + org.jboss.netty + netty + + + + + + aggregate-scaladoc