From ddce5fbf14600bacb27833c379260d478ff96972 Mon Sep 17 00:00:00 2001
From: mbalassi
Date: Sun, 14 Jun 2015 22:21:43 +0200
Subject: [PATCH 1/2] [FLINK-2209] [docs] Document linking with jars not in the
 binary dist

---
 docs/apis/cluster_execution.md | 72 ++++++++++++++++++++++++++++++++++
 docs/apis/streaming_guide.md   | 12 ++++--
 docs/libs/gelly_guide.md       |  2 +
 docs/libs/ml/index.md          |  2 +
 docs/libs/table.md             |  2 +
 5 files changed, 87 insertions(+), 3 deletions(-)

diff --git a/docs/apis/cluster_execution.md b/docs/apis/cluster_execution.md
index c2b3c279a4db6..7193cf618999d 100644
--- a/docs/apis/cluster_execution.md
+++ b/docs/apis/cluster_execution.md
@@ -144,3 +144,75 @@ public static void main(String[] args) throws Exception {
 Note that the program contains custom UDFs and hence requires a JAR file with
 the classes of the code attached. The constructor of the remote executor takes
 the path(s) to the JAR file(s).
+
+## Linking with modules not contained in the binary distribution
+
+The binary distribution contains jar packages in the `lib` folder that are automatically
+provided to the classpath of your distributed programs. Almost all Flink classes are
+located there, with a few exceptions, for example the streaming connectors and some freshly
+added modules. To run code depending on these modules you need to make them accessible
+at runtime, for which we suggest two options:
+
+1. Either copy the required jar files to the `lib` folder on all of your TaskManagers.
+Note that you have to restart your TaskManagers after this.
+2. Or package them with your user code.
+
+The latter option is recommended as it respects the classloader management of Flink.
+
+### Packaging dependencies with your user code with Maven
+
+To provide the dependencies not included by Flink we suggest two options with Maven.
+
+1. The maven assembly plugin builds a so-called fat jar containing all your dependencies.
+Assembly configuration is straightforward, but the resulting jar might become bulky. See
+[usage](http://maven.apache.org/plugins/maven-assembly-plugin/usage.html).
+2. The maven dependency plugin unpacks the relevant parts of the dependencies, which
+are then packaged with your code.
+
+Using the latter approach to bundle the Kafka connector, `flink-connector-kafka`,
+you would need to add the classes of both the connector and the Kafka API itself. Add
+the following to the plugins section of your `pom.xml`.
+
+~~~xml
+<plugin>
+    <groupId>org.apache.maven.plugins</groupId>
+    <artifactId>maven-dependency-plugin</artifactId>
+    <version>2.9</version>
+    <executions>
+        <execution>
+            <id>unpack</id>
+            <phase>prepare-package</phase>
+            <goals>
+                <goal>unpack</goal>
+            </goals>
+            <configuration>
+                <artifactItems>
+                    <!-- Flink connector classes -->
+                    <artifactItem>
+                        <groupId>org.apache.flink</groupId>
+                        <artifactId>flink-connector-kafka</artifactId>
+                        <version>{{ site.version }}</version>
+                        <type>jar</type>
+                        <overWrite>false</overWrite>
+                        <outputDirectory>${project.build.directory}/classes</outputDirectory>
+                        <includes>org/apache/flink/**</includes>
+                    </artifactItem>
+                    <!-- Kafka API classes -->
+                    <artifactItem>
+                        <groupId>org.apache.kafka</groupId>
+                        <artifactId>kafka_</artifactId>
+                        <version></version>
+                        <type>jar</type>
+                        <overWrite>false</overWrite>
+                        <outputDirectory>${project.build.directory}/classes</outputDirectory>
+                        <includes>kafka/**</includes>
+                    </artifactItem>
+                </artifactItems>
+            </configuration>
+        </execution>
+    </executions>
+</plugin>
+~~~
+
+Now when running `mvn clean package` the produced jar includes the required dependencies.
diff --git a/docs/apis/streaming_guide.md b/docs/apis/streaming_guide.md
index 0a7a486022f4d..a6e837f0a8476 100644
--- a/docs/apis/streaming_guide.md
+++ b/docs/apis/streaming_guide.md
@@ -1377,11 +1377,13 @@ This connector provides access to data streams from [Apache Kafka](https://kafka
 {% highlight xml %}
 <dependency>
   <groupId>org.apache.flink</groupId>
-  <artifactId>flink-kafka-connector</artifactId>
+  <artifactId>flink-connector-kafka</artifactId>
   <version>{{site.version }}</version>
 </dependency>
 {% endhighlight %}
 
+Note that the streaming connectors are currently not part of the binary distribution. See how to link with them for cluster execution [here](cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution).
+
 #### Installing Apache Kafka
 
 * Follow the instructions from [Kafka's quickstart](https://kafka.apache.org/documentation.html#quickstart) to download the code and launch a server (launching a Zookeeper and a Kafka server is required every time before starting the application).
* On 32 bit computers [this](http://stackoverflow.com/questions/22325364/unrecognized-vm-option-usecompressedoops-when-running-kafka-from-my-ubuntu-in) problem may occur.
@@ -1513,11 +1515,13 @@ This connector provides access to data streams from [RabbitMQ](http://www.rabbit
 {% highlight xml %}
 <dependency>
   <groupId>org.apache.flink</groupId>
-  <artifactId>flink-rabbitmq-connector</artifactId>
+  <artifactId>flink-connector-rabbitmq</artifactId>
   <version>{{site.version }}</version>
 </dependency>
 {% endhighlight %}
 
+Note that the streaming connectors are currently not part of the binary distribution. See how to link with them for cluster execution [here](cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution).
+
 #### Installing RabbitMQ
 Follow the instructions from the [RabbitMQ download page](http://www.rabbitmq.com/download.html). After the installation the server automatically starts, and the application connecting to RabbitMQ can be launched.
@@ -1585,11 +1589,13 @@ Twitter Streaming API provides opportunity to connect to the stream of tweets ma
 {% highlight xml %}
 <dependency>
   <groupId>org.apache.flink</groupId>
-  <artifactId>flink-twitter-connector</artifactId>
+  <artifactId>flink-connector-twitter</artifactId>
   <version>{{site.version }}</version>
 </dependency>
 {% endhighlight %}
 
+Note that the streaming connectors are currently not part of the binary distribution. See how to link with them for cluster execution [here](cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution).
+
 #### Authentication
 In order to connect to Twitter stream the user has to register their program and acquire the necessary information for the authentication. The process is described below.
diff --git a/docs/libs/gelly_guide.md b/docs/libs/gelly_guide.md
index 804efabf9f307..c7880125a708f 100644
--- a/docs/libs/gelly_guide.md
+++ b/docs/libs/gelly_guide.md
@@ -43,6 +43,8 @@ Add the following dependency to your `pom.xml` to use Gelly.
 ~~~
 
+Note that Gelly is currently not part of the binary distribution. See how to link with it for cluster execution [here](../apis/cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution).
+
 The remaining sections provide a description of available methods and present several examples of how to use Gelly and how to mix it with the Flink Java API. After reading this guide, you might also want to check the {% gh_link /flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/ "Gelly examples" %}.
 
 Graph Representation
diff --git a/docs/libs/ml/index.md b/docs/libs/ml/index.md
index 9ff7a4b3f9bed..e81b354bfb50f 100644
--- a/docs/libs/ml/index.md
+++ b/docs/libs/ml/index.md
@@ -69,6 +69,8 @@ Next, you have to add the FlinkML dependency to the `pom.xml` of your project.
 {% endhighlight %}
 
+Note that FlinkML is currently not part of the binary distribution. See how to link with it for cluster execution [here](../apis/cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution).
+
 Now you can start solving your analysis task.
 The following code snippet shows how easy it is to train a multiple linear regression model.
diff --git a/docs/libs/table.md b/docs/libs/table.md
index 829c9cfd91a04..4db5a871d8639 100644
--- a/docs/libs/table.md
+++ b/docs/libs/table.md
@@ -37,6 +37,8 @@ The following dependency must be added to your project when using the Table API:
 {% endhighlight %}
 
+Note that the Table API is currently not part of the binary distribution. See how to link with it for cluster execution [here](../apis/cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution).
+
 ## Scala Table API
 
 The Table API can be enabled by importing `org.apache.flink.api.scala.table._`.
This enables

From 018135c6a8b32e3cbe47e3697313bc894484a497 Mon Sep 17 00:00:00 2001
From: mbalassi
Date: Sun, 14 Jun 2015 22:24:52 +0200
Subject: [PATCH 2/2] [docs] Update obsolete cluster execution guide

---
 docs/apis/cluster_execution.md | 67 +---------------------------------
 1 file changed, 1 insertion(+), 66 deletions(-)

diff --git a/docs/apis/cluster_execution.md b/docs/apis/cluster_execution.md
index 7193cf618999d..f9844d728e3f4 100644
--- a/docs/apis/cluster_execution.md
+++ b/docs/apis/cluster_execution.md
@@ -60,7 +60,7 @@ The following illustrates the use of the `RemoteEnvironment`:
 ~~~java
 public static void main(String[] args) throws Exception {
     ExecutionEnvironment env = ExecutionEnvironment
-        .createRemoteEnvironment("strato-master", "7661", "/home/user/udfs.jar");
+        .createRemoteEnvironment("flink-master", 6123, "/home/user/udfs.jar");
 
     DataSet<String> data = env.readTextFile("hdfs://path/to/file");
@@ -80,71 +80,6 @@ Note that the program contains custom user code and hence requires a JAR file wi
 the classes of the code attached. The constructor of the remote environment takes
 the path(s) to the JAR file(s).
 
-## Remote Executor
-
-Similar to the RemoteEnvironment, the RemoteExecutor lets you execute
-Flink programs on a cluster directly. The remote executor accepts a
-*Plan* object, which describes the program as a single executable unit.
-
-### Maven Dependency
-
-If you are developing your program in a Maven project, you have to add the
-`flink-clients` module using this dependency:
-
-~~~xml
-<dependency>
-  <groupId>org.apache.flink</groupId>
-  <artifactId>flink-clients</artifactId>
-  <version>{{ site.version }}</version>
-</dependency>
-~~~
-
-### Example
-
-The following illustrates the use of the `RemoteExecutor` with the Scala API:
-
-~~~scala
-def main(args: Array[String]) {
-    val input = TextFile("hdfs://path/to/file")
-
-    val words = input flatMap { _.toLowerCase().split("""\W+""") filter { _ != "" } }
-    val counts = words groupBy { x => x } count()
-
-    val output = counts.write(wordsOutput, CsvOutputFormat())
-
-    val plan = new ScalaPlan(Seq(output), "Word Count")
-    val executor = new RemoteExecutor("strato-master", 7881, "/path/to/jarfile.jar")
-    executor.executePlan(p);
-}
-~~~
-
-The following illustrates the use of the `RemoteExecutor` with the Java API (as
-an alternative to the RemoteEnvironment):
-
-~~~java
-public static void main(String[] args) throws Exception {
-    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
-
-    DataSet<String> data = env.readTextFile("hdfs://path/to/file");
-
-    data
-        .filter(new FilterFunction<String>() {
-            public boolean filter(String value) {
-                return value.startsWith("http://");
-            }
-        })
-        .writeAsText("hdfs://path/to/result");
-
-    Plan p = env.createProgramPlan();
-    RemoteExecutor e = new RemoteExecutor("strato-master", 7881, "/path/to/jarfile.jar");
-    e.executePlan(p);
-}
-~~~
-
-Note that the program contains custom UDFs and hence requires a JAR file with
-the classes of the code attached. The constructor of the remote executor takes
-the path(s) to the JAR file(s).
-
 ## Linking with modules not contained in the binary distribution
 
 The binary distribution contains jar packages in the `lib` folder that are automatically
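
Note (not part of the patches above): the docs added by patch 1 recommend the maven assembly plugin for option 1 (the fat jar) but only link to its usage page. As a sketch of that option, a minimal configuration using the plugin's standard `jar-with-dependencies` descriptor could look like the following; the plugin version shown is illustrative, not prescribed by the patch:

~~~xml
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-assembly-plugin</artifactId>
    <version>2.4</version>
    <configuration>
        <!-- predefined descriptor that bundles all transitive dependencies -->
        <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
    </configuration>
    <executions>
        <execution>
            <id>make-assembly</id>
            <!-- run the single goal automatically during mvn package -->
            <phase>package</phase>
            <goals>
                <goal>single</goal>
            </goals>
        </execution>
    </executions>
</plugin>
~~~

With this in the `plugins` section, `mvn clean package` additionally produces a `*-jar-with-dependencies.jar` next to the regular artifact, at the cost of the larger jar size the patch text warns about.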