[BEAM-11] Integrate Spark runner with Beam #42

Closed
amitsela wants to merge 10 commits

amitsela
Member

No description provided.

@davorbonaci
Member

I'll take a peek at this one shortly.

R: @davorbonaci

</dependency>
<dependency>
<groupId>com.google.cloud.dataflow</groupId>
<artifactId>google-cloud-dataflow-java-examples-all</artifactId>
Member

Is this really needed?

Member Author

It sounds fair to me to prefer the runner's logger. The Flink runner does the same.

Member

The dependency is needed because some of the examples are used to test the Spark runner.

@davorbonaci
Member

LGTM

Nice!

I think we'll have to go over all pom.xml files in the project and fix them up globally -- but, that's unrelated to this pull request.

@davorbonaci
Member

(We should get to the bottom of the Jenkins failure before merging.)

@amitsela
Member Author

R: @tomwhite as well

<!--<transformers>-->
<!--<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" />-->
<!--</transformers>-->
<!--</configuration>-->
Member

Guava will still need to be relocated to run properly on a cluster, won't it?

Member Author

The SDK has been upgraded to Guava 19, but I guess shading is still necessary to run on a cluster. I'll reinstate the shade configuration.
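
For reference, a Guava relocation with the maven-shade-plugin generally looks something like the sketch below. This is only an illustration, not the configuration that went into this pull request: the `shadedPattern` package name is a hypothetical placeholder, and the execution is assumed to be bound to the `package` phase. The `ServicesResourceTransformer` is the one that appears commented out in the diff above.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <!-- Assumption: shading runs when the runner jar is packaged. -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- Move Guava classes so they cannot clash with the Guava version on the cluster classpath. -->
            <pattern>com.google.common</pattern>
            <!-- Hypothetical target package, chosen for illustration only. -->
            <shadedPattern>org.apache.beam.runners.spark.repackaged.com.google.common</shadedPattern>
          </relocation>
        </relocations>
        <transformers>
          <!-- Merges META-INF/services files and rewrites relocated class names inside them. -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" />
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Relocating rather than excluding Guava keeps the runner on the Guava version it was compiled against, whichever version the cluster ships.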

@tomwhite
Member

Looks good to me. Thanks for working on it @amitsela. A few comments inline and here:

  • The license headers should be changed to ASF ones.
  • Since you are reorganising packages, how about keeping only the ones that clients use (SparkPipelineRunner, SparkPipelineOptions, EvaluationResult) in the top-level org.apache.beam.runners.spark package, and moving all the others to subpackages?
  • Remove .gitignore and .travis.yml.

@tomwhite
Member

Also, the note about the Spark runner on https://github.com/apache/incubator-beam#runners should be updated to say that it's now a part of Beam.

@amitsela
Member Author

Thanks @tomwhite and @davorbonaci !
I'll do a second iteration and hope that by the time I'm done, the Jenkins issue will be solved as well :)

I plan to address the following:

  • Shade configuration
  • ASF licenses
  • Remove .gitignore and .travis.yml
  • Package organization
  • Updated README

@davorbonaci like you said, I think we need a cross-project pom.xml cleanup to get all components in line, but let's get this runner running first :)

@amitsela
Member Author

This pull request is still pending additional work, so please DON'T MERGE.
An early push was executed to trigger a Jenkins job to test a new configuration.

Thanks!

@amitsela
Member Author

@tomwhite please review second iteration. Thanks.

@tomwhite
Member

+1 from me

@asfgit asfgit closed this in a91e115 Mar 15, 2016
aljoscha pushed a commit to aljoscha/beam that referenced this pull request Mar 29, 2018
Wire job service API into portable runner PipelineResults
tvalentyn pushed a commit to tvalentyn/beam that referenced this pull request May 15, 2018
sjvanrossum pushed a commit to sjvanrossum/beam that referenced this pull request May 22, 2023
3 participants