
[SPARK-16581][SPARKR] Make JVM backend calling functions public #14775

Closed · 8 commits

Conversation

shivaram (Contributor)

What changes were proposed in this pull request?

This change exposes a public API in SparkR to create objects and call methods on the Spark driver JVM.
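For illustration, here is a minimal sketch of what the exposed API enables, using the `sparkR.*` wrapper names that this review settles on below (the `java.util.ArrayList` calls are just an example target, not part of the patch):

```r
library(SparkR)
sparkR.session()  # a running Spark driver JVM is required before any JVM calls

# Create a Java object on the driver and invoke methods on it
jarray <- sparkR.newJObject("java.util.ArrayList")
sparkR.callJMethod(jarray, "add", 42L)
sparkR.callJMethod(jarray, "get", 0L)   # returns 42

# Static methods are invoked by fully qualified class name
sparkR.callJStatic("java.lang.System", "currentTimeMillis")
```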

How was this patch tested?


Unit tests, CRAN checks

@shivaram (Contributor Author)

cc @felixcheung @olarayej

@SparkQA

SparkQA commented Aug 23, 2016

Test build #64307 has finished for PR 14775 at commit d267f2f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

#'
#' Create a new Java object in the JVM running the Spark driver.
#'
#' @param className name of the class to create
Member:

When certain JVM types are created, they don't always come back as jobj because of our SerDe - should we call this out?

Member:

fully qualified Java class name

shivaram (Contributor Author):

I tried to qualify this in the comment on what gets returned. The trouble is that we don't have externally visible documentation of what types will get converted vs. what will not. Also, I think this is bound to change with versions (like the SQL decimal change, for example).

However, this is a very low-level API that is only for advanced developers. So I wonder if we should just leave a pointer to the source file?

Member:

I think the behavior could use some explaining, and I agree that this API is really for advanced or package developers. How about we put a statement or a few words to that effect in the API doc? I think it's OK if we don't put a link or describe this in detail in the programming guide. Maybe an .md file later?

shivaram (Contributor Author):

I added some comments in a Details section. Let me know if it reads OK. I agree that having an .md file might be useful in the long run.
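To make the SerDe caveat above concrete, here is a hypothetical illustration (the class names are just examples): types the SerDe knows how to convert come back as native R values, while everything else stays a `jobj` reference.

```r
# java.lang.String has a native R representation, so the SerDe converts the
# result; the "object" comes back as an R character vector, not a jobj:
s <- sparkR.newJObject("java.lang.String", "hello")
class(s)   # "character"

# java.util.ArrayList has no R equivalent, so a jobj reference is returned:
l <- sparkR.newJObject("java.util.ArrayList")
class(l)   # "jobj"
```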

@felixcheung (Member)

felixcheung commented Aug 23, 2016

I think the downside of naming them as-is and keeping the signature (... at the end) is that it would be very hard to change or add to the signature later on (say, to add a context/context_id).

How about naming them sparkR.callJMethod or similar for the exported versions, wrapping around the internal ones? That way we don't have to break internal stuff and could just add a new method if/when we want. We could also add additional parameter checks or translation as needed that might not apply to internal stuff, which could also mitigate risk.
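A minimal sketch of the wrapper pattern being proposed, with a hypothetical parameter check of the kind mentioned above; the actual exported functions land in R/jvm.R later in this PR:

```r
# Public wrapper around the internal callJMethod. Keeping the exported
# signature separate means parameters or checks can be added later without
# touching internal call sites.
sparkR.callJMethod <- function(x, methodName, ...) {
  if (class(x) != "jobj") {
    stop("Invalid jobj value")   # hypothetical check, not applied internally
  }
  callJMethod(x, methodName, ...)
}
```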

#'
#' Call a Java method in the JVM running the Spark driver.
#'
#' @param objId object to invoke the method on. Should be a "jobj" created by newJObject.
Member:

If we have a wrapper, it would be better to name this x.

shivaram (Contributor Author):

Done

@shivaram (Contributor Author)

@felixcheung Good point about having a wrapper - that will make it easier to update the methods going forward. I added a new file jvm.R with the wrapper functions.

@SparkQA

SparkQA commented Aug 27, 2016

Test build #64539 has finished for PR 14775 at commit 0959208.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 27, 2016

Test build #64540 has finished for PR 14775 at commit 448de0c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -11,7 +11,7 @@ Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
             email = "felixcheung@apache.org"),
      person(family = "The Apache Software Foundation", role = c("aut", "cph")))
 URL: http://www.apache.org/ http://spark.apache.org/
-BugReports: https://issues.apache.org/jira/secure/CreateIssueDetails!init.jspa?pid=12315420&components=12325400&issuetype=4
+BugReports: http://issues.apache.org/jira/browse/SPARK
Member:

Perhaps that's too long, but it opens directly to a form with the component field preset to SparkR - otherwise it seems easy to get lost?
Also, BugReports is supposed to be for bug.report(package = "SparkR").

shivaram (Contributor Author):

Other than being long, that link also doesn't seem to work if you are not logged in (would be good if you can also check this). The other thing we could do is just link to the wiki page at https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingBugReports -- do you think that is better?

Member:

Ah, I thought I tested it logged out. You are right, let's scratch that then.

I like the idea with the wiki - though should that mention when to check with user@spark, when to email dev@spark, and when to open a JIRA?

shivaram (Contributor Author):

I updated the wiki - let me know if it looks better or if you have other suggestions.
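As background for the bug.report() remark earlier in this thread: R reads the BugReports field from DESCRIPTION, so whichever URL ends up there is what this call opens.

```r
# utils::bug.report() looks up the BugReports URL in the installed package's
# DESCRIPTION and opens it in a browser:
utils::bug.report(package = "SparkR")
```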

@felixcheung (Member)

GitHub seems to like to hide stuff.
FYI, this comment #14775 (comment)
and this #14775 (comment)

@shivaram (Contributor Author)

Thanks @felixcheung - addressed both comments. Let me know if this looks good.

@SparkQA

SparkQA commented Aug 29, 2016

Test build #64585 has finished for PR 14775 at commit d1ec80b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung (Member)

LGTM. Thanks.

asfgit pushed a commit that referenced this pull request Aug 29, 2016

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #14775 from shivaram/sparkr-java-api.

(cherry picked from commit 736a791)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
asfgit closed this in 736a791 on Aug 29, 2016.