[BEAM-11] Integrate Spark runner with Beam #42

amitsela · 2016-03-12T15:59:20Z

No description provided.

davorbonaci · 2016-03-13T04:01:04Z

I'll take a peek at this one shortly.

davorbonaci · 2016-03-13T06:50:20Z

runners/spark/pom.xml

+        </dependency>
+        <dependency>
+            <groupId>com.google.cloud.dataflow</groupId>
+            <artifactId>google-cloud-dataflow-java-examples-all</artifactId>


Is this really needed?

It sounds fair to me to prefer the runner's logger. The Flink runner does the same.

The dependency is needed because some of the examples are used to test the Spark runner.

davorbonaci · 2016-03-13T06:54:28Z

LGTM

Nice!

I think we'll have to go over all pom.xml files in the project and fix them up globally -- but, that's unrelated to this pull request.

davorbonaci · 2016-03-13T06:56:16Z

(We should get to the bottom of the Jenkins failure before merging.)

amitsela · 2016-03-13T07:31:22Z

R: @tomwhite as well

tomwhite · 2016-03-14T10:52:09Z

runners/spark/pom.xml

+                                <!--<transformers>-->
+                                    <!--<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" />-->
+                                <!--</transformers>-->
+                            <!--</configuration>-->


Guava will still need to be relocated to run properly on a cluster, won't it?

The SDK upgraded to Guava 19, but I guess shading is still necessary on a cluster. I'll reinstate the shade configuration.
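
For reference, a minimal sketch of what a Guava relocation in the maven-shade-plugin can look like. This is not the configuration from this pull request: the `shadedPattern` prefix below is a hypothetical placeholder, and the plugin version and any filters are left out. Only the `ServicesResourceTransformer` line comes from the diff above.

```xml
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <relocations>
                    <relocation>
                        <!-- Guava's root package -->
                        <pattern>com.google.common</pattern>
                        <!-- Hypothetical target prefix, chosen for illustration only -->
                        <shadedPattern>org.example.shaded.com.google.common</shadedPattern>
                    </relocation>
                </relocations>
                <transformers>
                    <!-- Merges META-INF/services entries and rewrites relocated class names in them -->
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" />
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>
```

With a relocation like this, the runner's bundled Guava classes are rewritten under the shaded prefix, so they should not clash with an older Guava that the cluster puts on the classpath.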

tomwhite · 2016-03-14T11:12:26Z

Looks good to me. Thanks for working on it @amitsela. A few comments inline and here:

The license headers should be changed to ASF ones.
Since you are reorganising packages, how about keeping only the ones that clients use (SparkPipelineRunner, SparkPipelineOptions, EvaluationResult) in the top-level org.apache.beam.runners.spark package, and moving all the others to subpackages?
Remove .gitignore and .travis.yml.

tomwhite · 2016-03-14T11:17:34Z

Also, the note about the Spark runner on https://github.com/apache/incubator-beam#runners should be updated to say that it's now a part of Beam.

amitsela · 2016-03-14T11:43:57Z

Thanks @tomwhite and @davorbonaci !
I'll do a second iteration and hope that by the time I'm done, the Jenkins issue will be solved as well :)

I plan to address the following:

Shade configuration
ASF licenses
Remove .gitignore and .travis.yml
Package organization
Updated README

@davorbonaci like you said, I think we need cross-project pom.xml work to get all components in line, but let's get this runner running first :)

amitsela · 2016-03-14T19:26:49Z

This pull request is still pending additional work, so please DON'T MERGE.
An early push was executed to trigger a Jenkins job to test a new configuration.

Thanks!

amitsela · 2016-03-15T06:23:58Z

@tomwhite please review second iteration. Thanks.

tomwhite · 2016-03-15T16:29:38Z

+1 from me

Sela added 5 commits March 12, 2016 00:37

[BEAM-11] Spark runner directory structure and pom setup.

3687929

[BEAM-11] set coder for pipeline input

65355dd

[BEAM-11] extractOutput() should not return null

b34886e

[BEAM-11] This is a placeholder to get the TfIdfTest working. Should be replaced by a SparkStateInternals implementation

6cfa32b

[BEAM-11] Add Spark Beam runner module

fa45813

davorbonaci reviewed Mar 13, 2016

tomwhite reviewed Mar 14, 2016

Sela added 3 commits March 14, 2016 18:53

[BEAM-11] relocate Guava used by Dataflow (v19) since it conflicts with version used by Hadoop (v11)

fca5a09

[BEAM-11] Replaced license headers to ASF license

543c82b

[BEAM-11] remove gitignore and travis.yml

43acb60

Sela added 2 commits March 14, 2016 23:48

[BEAM-11] second iteration of package reorganisation

ae865e9

[BEAM-11] add Spark runner to included runners

014fbde

asfgit closed this in a91e115 Mar 15, 2016

aljoscha pushed a commit to aljoscha/beam that referenced this pull request Mar 29, 2018

Merge pull request apache#42 from bsidhom/portable-pipeline-result

92a8d0d

Wire job service API into portable runner PipelineResults

tvalentyn pushed a commit to tvalentyn/beam that referenced this pull request May 15, 2018

[BEAM-602] make feature branches more discoverable

a7be66d

This closes apache#42

sjvanrossum pushed a commit to sjvanrossum/beam that referenced this pull request May 22, 2023

Merge pull request apache#42 from laysakura/feat/complete-coder_from_urn

808f331

chore: complete URN -> dyn Coder codes
