Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-380] Remove Spark runner dependency on beam-examlpes-java #539

Closed
wants to merge 2 commits into from

Conversation

amitsela
Copy link
Member

@amitsela amitsela commented Jun 27, 2016

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

  • Make sure the PR title is formatted like:
    [BEAM-<Jira issue #>] Description of pull request
  • Make sure tests pass via mvn clean verify. (Even better, enable
    Travis-CI on your fork and ensure the whole test matrix passes).
  • Replace <Jira issue #> in the title with the actual Jira issue
    number, if there is one.
  • If this contribution is large, please file an Apache
    Individual Contributor License Agreement.

Duplicate WordCount into spark examlpes package.

Duplicate parts of TfIdf from beam examlpes.

Better reuse of WordCount and its parts.

Remove dependency on beam-examples-java

@amitsela
Copy link
Member Author

Following @peihe's comment in #516 this PR removes the dependency of the Spark runner on beam-examples-java.
R: @peihe @davorbonaci

@@ -21,10 +21,10 @@
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import org.apache.beam.examples.WordCount;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It tests TextIO numShards.
Does it have to use WordCount?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it doesn't, it's just been like that... Not a good answer 😄 but I'm definitely prioritising.

@peihe
Copy link
Contributor

peihe commented Jun 27, 2016

FYI
I am also working towards removing the examples dependency with a different approach in here:
#533 (need to fix the failed build)

This will avoid duplicating codes.

@amitsela
Copy link
Member Author

First of all, I'm sorry if I "jumped in", I was looking for an open JIRA because I knew you had this in mind, but when I couldn't find one I opened it and submitted this PR against it.
I don't mind your approach, but the Spark runner needs, for now, it's own WordCount example because it doesn't pass validation when trying to read from HDFS.. probably because the runner still lacks Read.from implementation and IOChannelFactory only has implementations for file/gs.

@peihe
Copy link
Contributor

peihe commented Jun 27, 2016

My fault. I should open a JIRA issue before I start on this. I thought it is a quick change.
For the WordCount part, I like to move it to a top level class, and document why Spark runners need its own copy. (+@dhalperi for Read.from doesn't work for HDFS)
For the Tfidf part, I think I am close to run it with Spark runner in beam-examples module. #533
I am having a spark dependency problem in the test (JavaSparkContext class not found):
https://builds.apache.org/job/beam_PreCommit_MavenVerify/1908/console
Could you help me to take a look? And, you can make a call on whether you want to include the Tfidf in this PR or left to me to handle in #533.

@amitsela
Copy link
Member Author

About the ClassNotFoundException, it's probably because the Spark runner has a provided dependency on Spark, which works in test, but is not transitive, so the examples module will have to have a provided dependency as well.. Which kinda sucks.
Let me think about it...
See: https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html provided dependencies are just intransitive.

@amitsela amitsela closed this Jun 28, 2016
@amitsela amitsela reopened this Jun 28, 2016
@amitsela
Copy link
Member Author

I honestly closed this by accident.. Though I'd be happy to have Jenkins try again since it failed on JavaDoc issues that, while might be valid, are unrelated to this PR.

@amitsela
Copy link
Member Author

Anyone got an idea why Jenkins suddenly fails on JavaDoc issues in classes that aren't related to this PR ?

@kennknowles
Copy link
Member

I was just browsing the output and didn't even see what the error was. FWIW the travis errors are different but also seem unrelated. I've seen the Travis issue before but have forgotten what it was about.

@dhalperi
Copy link
Contributor

dhalperi commented Jul 1, 2016

Hi Amit,

The logs are unfortunately a bit misleading, but there is one Javadoc error:

/home/jenkins/jenkins-slave/workspace/beam_PreCommit_MavenVerify/runners/spark/src/main/java/org/apache/
beam/runners/spark/examples/WordCount.java:38: error: reference not found
 * Duplicated from {@link org.apache.beam.examples.WordCount} to avoid dependency on

The link is not resolved because you removed the dependency on the examples module.

All the other Javadoc stuff is warnings and don't cause breaks.

@dhalperi
Copy link
Contributor

dhalperi commented Jul 1, 2016

Okay, I did a little more digging and here's my summary:

  • Jenkins - legit Javadoc error.
  • Travis - broken because the Travis part of the build was broken for a little bit.

A rebase should fix the latter.

Sela added 2 commits July 4, 2016 15:35
Duplicate WordCount into spark examlpes package.

Duplicate parts of TfIdf from beam examlpes.

Better reuse of WordCount and its parts.

Remove dependency on beam-examples-java
@iemejia
Copy link
Member

iemejia commented Jul 19, 2016

Hello, is there any blocking reason why this has not been merged ?
This will make the spark runner lighter and also from an architecture point of view it makes sense too.

@dhalperi
Copy link
Contributor

LGTM

@asfgit asfgit closed this in cf14644 Jul 23, 2016
johnjcasey pushed a commit to johnjcasey/beam that referenced this pull request Feb 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants