[BEAM-507] Fill in the documentation/runners/spark portion of the web… #103

amitsela · 2016-12-08T13:47:35Z

…site.

amitsela · 2016-12-08T13:47:44Z

asfbot · 2016-12-08T13:48:55Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Test/136/
--none--

asfbot · 2016-12-08T13:54:05Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/179/

Jenkins built the site at commit id 5e5caf7 with Jekyll and staged it here. Happy reviewing.

Note that any previous site has been deleted. This staged site will be automatically deleted after its TTL expires. Push any commit to the pull request branch or re-trigger the build to get it staged again.

jbonofre · 2016-12-08T13:53:46Z

src/documentation/runners/spark.md

+  <artifactId>beam-runners-spark</artifactId>
+  <version>{{ site.release_latest }}</version>
+</dependency>
+```


As the Spark runner doesn't provide a BoM (I created another Jira about that), I think end-users have to define the following additional dependencies:

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>${hadoop.version}</version>
  <exclusions>
    <exclusion>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>servlet-api</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>${hadoop.version}</version>
</dependency>
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-core</artifactId>
  <version>${jackson.version}</version>
</dependency>
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-annotations</artifactId>
  <version>${jackson.version}</version>
</dependency>
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-databind</artifactId>
  <version>${jackson.version}</version>
</dependency>
<dependency>
  <groupId>com.fasterxml.jackson.module</groupId>
  <artifactId>jackson-module-scala_2.10</artifactId>
  <version>${jackson.version}</version>
</dependency>

Great! I'll use this instead.
Does this one work for you?

It's what I'm using in the spark-runner Maven profile in beam-samples. I haven't tested it very recently (I will), but it worked fine.

I ran a very simple Create -> Distinct -> TextIO.Write pipeline as a self-contained application on a Spark Standalone cluster (master + 1 executor, on my laptop), and it didn't require any additional dependencies (except for Spark itself, since the application is self-contained and Spark is not pre-deployed the way it sometimes is on YARN installations). I'm adding an example based on that. YARN examples had better wait for HDFS support.

OK, fair enough. I will do a test but it sounds good.

jbonofre · 2016-12-08T13:54:23Z

src/documentation/runners/spark.md

+
+### Deploying Spark with your application
+
+In some cases, such as running in local mode, your (self-contained) application would be required to pack Spark by explicitly adding the following dependencies in your pom.xml:


I would add a "clear" sentence like: Spark runner standalone/embedded mode.

I used local mode because standalone is a bit confusing as it is the name of Spark's own resource manager.

Yes, agreed, that's cleaner. My point was just to let users understand that Spark is "part" of the execution (it's not an external cluster).

jbonofre · 2016-12-08T13:55:03Z

src/documentation/runners/spark.md

+
+Deploying your Beam pipeline on a cluster that already has a Spark deployment does not require any additional dependencies.
+For more details on the different deployment modes see: [Standalone](http://spark.apache.org/docs/latest/spark-standalone.html), [YARN](http://spark.apache.org/docs/latest/running-on-yarn.html), or [Mesos](http://spark.apache.org/docs/latest/running-on-mesos.html).
+


Maybe an example using spark-submit would help (even if it might look stupid).

amitsela · 2016-12-08T14:17:22Z

I'll add the example pom dependencies and a submit-to-YARN example.

amitsela · 2016-12-08T20:58:21Z

@jbonofre I added an example for packaging and submitting to a Standalone cluster.
~40% of Spark users use it, so it should be useful.
As mentioned in my comments, I'd prefer to wait with the YARN examples until they can include HDFS.

asfbot · 2016-12-08T20:59:21Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/180/

Jenkins built the site at commit id d6938c5 with Jekyll and staged it here. Happy reviewing.

Note that any previous site has been deleted. This staged site will be automatically deleted after its TTL expires. Push any commit to the pull request branch or re-trigger the build to get it staged again.

asfbot · 2016-12-08T21:00:17Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Test/137/
--none--

jbonofre

LGTM

davorbonaci · 2016-12-09T19:32:16Z

LGTM. (Happy to merge @amitsela, if you don't have the website setup ready.)

amitsela · 2016-12-09T20:33:40Z

@davorbonaci I'll happily accept your merge offer 😄 thanks!

davorbonaci · 2016-12-09T20:51:34Z

Merged. Thanks @amitsela, this is great.

(Separately, BEAM-900 would be an awesome improvement to the Quickstart, and probably very easy to do.)

[BEAM-507] Fill in the documentation/runners/spark portion of the web…

5e5caf7

…site.

jbonofre requested changes Dec 8, 2016


Added example.

d6938c5

jbonofre approved these changes Dec 9, 2016


asfgit closed this in eb5397b Dec 9, 2016

amitsela deleted the BEAM-507 branch December 9, 2016 20:53

robertwb pushed a commit to robertwb/incubator-beam that referenced this pull request Jun 5, 2018

This closes apache/beam-site#103

00fe4c6

melap pushed a commit to apache/beam that referenced this pull request Jun 20, 2018

This closes apache/beam-site#103

dd70c90
