
[SPARK-8607] SparkR -- jars not being added to application classpath correctly #7001

Closed
wants to merge 9 commits into apache:branch-1.4 from cafreeman:branch-1.4

Conversation

cafreeman

Add getStaticClass method in SparkR's RBackendHandler

This is a fix for the problem referenced in SPARK-5185.

cc @shivaram

Add `getStaticClass` method in SparkR's `RBackendHandler`
@cafreeman
Author

JIRA issue open here: https://issues.apache.org/jira/browse/SPARK-8607

@@ -88,6 +88,19 @@ private[r] class RBackendHandler(server: RBackend)
ctx.close()
}

//get classPath for static object. This addresses SPARK-5185
Contributor


Style nit: space after `//`. Also, could you expand the comment to say: "Looks up a class given a class name. This function first checks the current class loader and, if the class is not found, looks up the class in the context class loader."
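The two-step lookup described in this review comment can be sketched as follows. This is a minimal Java illustration of the idea, not the actual SparkR code; `ClassLookup` and its `getStaticClass` signature are stand-ins for the Scala method in `RBackendHandler`:

```java
// Minimal sketch of a two-step class lookup: try the current class
// loader first, then fall back to the thread's context class loader.
public class ClassLookup {
    static Class<?> getStaticClass(String name) throws ClassNotFoundException {
        try {
            // First attempt: the class loader that loaded this class.
            return Class.forName(name);
        } catch (ClassNotFoundException e) {
            // Fallback: the context class loader, which spark-submit
            // populates with user-supplied jars.
            ClassLoader ctx = Thread.currentThread().getContextClassLoader();
            return Class.forName(name, true, ctx);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(getStaticClass("java.lang.String").getName());
        // prints "java.lang.String"
    }
}
```

The fallback matters because jars passed with `--jars` may only be visible to the context class loader, not to the loader that loaded the backend classes themselves.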

@shivaram
Contributor

Jenkins, add to whitelist

@shivaram
Contributor

Jenkins, ok to test

@SparkQA

SparkQA commented Jun 24, 2015

Test build #35732 has finished for PR 7001 at commit 7c6bd0c.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cafreeman
Author

@shivaram Fixed style nits.

@shivaram
Contributor

@cafreeman Code change looks good -- I'm wondering if there is a way to add a test for this in SparkR. We could add a dummy jar file and then see if we can call into it? @davies Do we have a dummy jar file we use for similar tests in Python / Java etc.?

@SparkQA

SparkQA commented Jun 25, 2015

Test build #35738 has finished for PR 7001 at commit 5a80844.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Contributor

Jenkins, retest this please

@SparkQA

SparkQA commented Jun 25, 2015

Test build #35742 has finished for PR 7001 at commit 5a80844.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cafreeman
Author

@shivaram where would we store test data? I have a JAR we could probably use.

@shivaram
Contributor

We could put it in R/pkg/inst/test_support or something like that. It would be similar to the python files at https://github.com/apache/spark/tree/master/python/test_support

@cafreeman
Author

@shivaram Just committed a first draft of a test, but there are two issues:

  1. The current build appears to be broken. I'm getting errors about sparkPackages being an unused argument and "object 'sparkSubmitBinName' not found". Is this branch currently in a state of flux?
  2. I'm using a relative path in the test to point to the JAR (which has been placed in test_support), and I'm not sure how that will behave. I can't actually run it right now, though, because I can't launch a Spark context due to the first issue.

@shivaram
Contributor

Sorry, the first one was a bug we introduced yesterday. Let me send a hotfix for it right now.

@shivaram
Contributor

@cafreeman The worrying thing is that the tests seem to pass right now on Jenkins and on my machine if I run ./run-tests.sh. If you get a chance, it'd be good to figure out why that is happening.

@shivaram
Contributor

@cafreeman Created #7022 -- could you take a look at it?

@shivaram
Contributor

@cafreeman Regarding the test itself, a couple of things:

  1. Could we have a smaller JAR file? This one looks like it's around 5MB, which is pretty big :) Instead of using `sbt assembly` to generate it, could you try `sbt package`? That should make it smaller.
  2. Regarding the paths, you can use SPARK_HOME and reference things relative to that. See https://github.com/apache/spark/blob/master/python/pyspark/tests.py#L400 for an example in Python.

@SparkQA

SparkQA commented Jun 25, 2015

Test build #35798 has finished for PR 7001 at commit 9a5c362.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cafreeman
Author

@shivaram Alright, I've uploaded a smaller JAR and switched to an absolute filepath using SPARK_HOME. So far this test runs successfully in both the sparkR shell and in regular R, but it fails when run as part of the entire test suite, and I'm not sure why. I added a sparkR.stop() so we could restart the Spark context and add the JAR, but it's still having trouble when run with run-tests.sh

@SparkQA

SparkQA commented Jun 25, 2015

Test build #35810 has finished for PR 7001 at commit 0bea809.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Contributor

Thanks @cafreeman for the update. I'll try this out today and see why run-tests.sh is not picking it up.

@shivaram
Contributor

So I took a look at this and managed to figure out why the tests don't work but the fix does work from the sparkR shell. It's due to the fact that we have two different ways in which things are initialized:

  1. For the sparkR shell (and RStudio), the R process comes up first and then the JVM is launched when the spark context is created. In this case if we pass in sparkJars they are added to the spark-submit command and things work fine.
  2. For the unit tests (and any script run with spark-submit), the JVM comes up first and forks the R process. In this case adding sparkJars to the context has no effect as the JVM is already running, so the user needs to run `sparkR --jars <jarFile>` to get things to work correctly.

For the unit tests we can't test the first case as it is a batch script. But we can test the second case by actually running the sparkR script from the unit test. This would be something similar to https://github.com/apache/spark/blob/master/python/pyspark/tests.py#L1641
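The classpath distinction behind these two cases can be demonstrated in isolation. The hedged Java sketch below is not SparkR code; `LateJarDemo` and the `hello.txt` resource are invented for illustration. A jar that appears after the JVM has started is invisible to the loader that launched the JVM, but becomes visible through a loader that was told about it, which is what the context-class-loader fallback in this patch exploits:

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

public class LateJarDemo {
    // Creates a tiny jar containing one resource, a stand-in for a
    // user-supplied jar that appears after JVM startup.
    static Path makeJar() throws IOException {
        Path jar = Files.createTempFile("demo", ".jar");
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
            out.putNextEntry(new JarEntry("hello.txt"));
            out.write("hi".getBytes("UTF-8"));
            out.closeEntry();
        }
        return jar;
    }

    public static void main(String[] args) throws Exception {
        Path jar = makeJar();
        // The loader that started the JVM knows nothing about the late jar.
        boolean seenByAppLoader =
            LateJarDemo.class.getClassLoader().getResource("hello.txt") != null;
        // A child loader pointed at the jar, installed as the context
        // class loader, can see it -- which is what a lookup that falls
        // back to the context class loader relies on.
        try (URLClassLoader child = new URLClassLoader(new URL[]{jar.toUri().toURL()})) {
            Thread.currentThread().setContextClassLoader(child);
            boolean seenByContextLoader = Thread.currentThread()
                .getContextClassLoader().getResource("hello.txt") != null;
            System.out.println(seenByAppLoader + " " + seenByContextLoader);
        }
    }
}
```

In mode 1 above, the jar is on the spark-submit command line, so the JVM's loaders know about it from the start; in mode 2, only a loader set up for the jar (such as the context class loader) can resolve classes from it.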

cafreeman added 2 commits June 26, 2015 12:58
`test_includeJAR.R` now executes an external sparkR script which, in turn, initializes a new spark context with the test JAR, runs the test functions, and returns the output to the original test.

As a result, we can now run this test inside the SparkR test suite and get around the fact that we can't add the "sparkJars" argument to `sparkR.init()` once the JVM has already started.
@cafreeman
Author

@shivaram Took your advice and changed things so that the test runs another script using R's `system2`. The tests work now :)

@SparkQA

SparkQA commented Jun 26, 2015

Test build #35876 has finished for PR 7001 at commit 8f81194.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Contributor

Hmm - weirdly, the Scala spark-submit test failed. Let's give this another go and see if it was just a flaky test.

Jenkins, retest this please

@shivaram
Contributor

Jenkins, retest this please

@SparkQA

SparkQA commented Jun 26, 2015

Test build #35884 has finished for PR 7001 at commit 8f81194.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Contributor

LGTM. That test case looks great. Thanks @cafreeman

@cafreeman
Author

No problem!

asfgit pushed a commit that referenced this pull request Jun 27, 2015
…correctly

Add `getStaticClass` method in SparkR's `RBackendHandler`

This is a fix for the problem referenced in [SPARK-5185](https://issues.apache.org/jira/browse/SPARK-5185).

cc shivaram

Author: cafreeman <cfreeman@alteryx.com>

Closes #7001 from cafreeman/branch-1.4 and squashes the following commits:

8f81194 [cafreeman] Add missing license
31aedcf [cafreeman] Refactor test to call an external R script
2c22073 [cafreeman] Merge branch 'branch-1.4' of github.com:apache/spark into branch-1.4
0bea809 [cafreeman] Fixed relative path issue and added smaller JAR
ee25e60 [cafreeman] Merge branch 'branch-1.4' of github.com:apache/spark into branch-1.4
9a5c362 [cafreeman] test for including JAR when launching sparkContext
9101223 [cafreeman] Merge branch 'branch-1.4' of github.com:apache/spark into branch-1.4
5a80844 [cafreeman] Fix style nits
7c6bd0c [cafreeman] [SPARK-8607] SparkR
@asfgit asfgit closed this in 9d11817 Jun 27, 2015
3 participants