
[SPARK-11244][SPARKR] sparkR.stop() should remove SQLContext #9205

Closed
wants to merge 1 commit into from

Conversation

saurfang
Contributor

SparkR should remove `.sparkRSQLsc` and `.sparkRHivesc` when `sparkR.stop()` is called. Otherwise, even after the SparkContext is reinitialized, `sparkRSQL.init` returns the stale copy of the object and complains:

```r
sc <- sparkR.init("local")
sqlContext <- sparkRSQL.init(sc)
sparkR.stop()
sc <- sparkR.init("local")
sqlContext <- sparkRSQL.init(sc)
sqlContext
```

producing

```r
Error in callJMethod(x, "getClass") :
  Invalid jobj 1. If SparkR was restarted, Spark operations need to be re-executed.
```
I have added the check and removal only when the SparkContext itself is initialized, and I have added a corresponding test for this fix. Let me know if you want me to move the test to the SQL test suite instead.

p.s. I tried lint-r but ended up with a lot of errors on existing code.
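
For reference, the shape of the fix is roughly the following. The cached-context variable names (`.sparkRSQLsc`, `.sparkRHivesc`) come from the description above, but the surrounding `sparkR.stop()` body is a simplified sketch, not the actual SparkR source:

```r
# Simplified sketch of sparkR.stop() with the cleanup this PR adds.
# Only the rm() calls for the cached SQL/Hive contexts are the point;
# the rest of the body is illustrative.
sparkR.stop <- function() {
  env <- .sparkREnv
  if (exists(".sparkRCon", envir = env)) {
    # ... stop the JVM backend and remove the cached SparkContext ...

    # New in this PR: also drop the cached SQLContext / HiveContext so a
    # later sparkRSQL.init() cannot return a stale jobj reference.
    if (exists(".sparkRSQLsc", envir = env)) {
      rm(".sparkRSQLsc", envir = env)
    }
    if (exists(".sparkRHivesc", envir = env)) {
      rm(".sparkRHivesc", envir = env)
    }
  }
}
```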

@shivaram
Contributor

Thanks @saurfang -- the test is fine in its current location. Just curious, what are the lint-r errors you saw?

@saurfang
Contributor Author

Most of them are "style: Commented code should be removed."

For example:

```
R/RDD.R:260:3: style: Commented code should be removed.
# unpersist(rdd) # rdd@@env$isCached == FALSE
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
R/RDD.R:283:3: style: Commented code should be removed.
# sc <- sparkR.init()
  ^~~~~~~~~~~~~~~~~~~
R/RDD.R:284:3: style: Commented code should be removed.
# setCheckpointDir(sc, "checkpoint")
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

This looks like a new feature introduced in r-lib/lintr#83.
Also, shouldn't `#` be `#'` for roxygen docs?

@shivaram
Contributor

Ah I see. Hmm, we use `#` instead of `#'` in a bunch of cases where we don't want roxygen to parse the comments, as the functions are not part of the public API. Not sure if there is a better way to do this.
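
To illustrate the pattern under discussion (a made-up example, not the actual SparkR source): internal functions carry their example code in plain `#` comments, which roxygen skips but the new lintr check flags, while exported functions use `#'`:

```r
# Internal function: documented with plain `#` so roxygen generates no man
# page, since it is not part of the public API. lintr now flags the example
# lines below as "Commented code should be removed."
# @examples
# sc <- sparkR.init()
# rdd <- parallelize(sc, 1:10)
# unpersist(rdd)
unpersistRDD <- function(rdd) {
  # hypothetical body, for illustration only
  rdd
}

#' Exported function: documented with roxygen `#'` comments, which lintr
#' does not treat as commented-out code.
#' @examples
#' \dontrun{
#' sc <- sparkR.init()
#' }
count <- function(x) {
  length(x)
}
```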

cc @yu-iskw

@SparkQA

SparkQA commented Oct 21, 2015

Test build #44086 has finished for PR 9205 at commit 7f51549.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@saurfang
Contributor Author

I see, thanks for the information. I thought the convention in R packages is to expose as much documentation as possible but not export functions that are unstable or developer-only; if users choose to use them anyway, they acknowledge the risk by using `:::`. The linter complains only about commented R code, not comments in general. I feel that since we already spent the time to write examples, both users and package developers should and would benefit from them.

@sun-rui
Contributor

sun-rui commented Oct 22, 2015

Is it possible for lint-r to skip `#` comments containing code outside a function body?

@saurfang
Contributor Author

I still advocate making developer API documentation public. However, I think one workaround is to turn them back into roxygen docs with `#'` but add `#' @rdname .ignore` to each function. Roxygen will then skip them with the message `Skipping invalid path: .ignore.Rd`.
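
A sketch of that workaround (`internalHelper` and its body are made up; only the `#' @rdname .ignore` trick is the actual suggestion above):

```r
#' Developer-only documentation: `.ignore` is a deliberately invalid Rd
#' path, so roxygen prints "Skipping invalid path: .ignore.Rd" and emits
#' no man page, while the comments remain roxygen markup rather than
#' commented-out code in lintr's eyes.
#' @rdname .ignore
#' @examples
#' \dontrun{
#' sc <- sparkR.init()
#' setCheckpointDir(sc, "checkpoint")
#' }
internalHelper <- function(sc) {
  # hypothetical internal function, for illustration only
  invisible(sc)
}
```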

@shivaram
Contributor

@saurfang Can you open a new JIRA for these style failures? We can discuss the points there.

LGTM. Merging this into master and branch-1.5.

@shivaram
Contributor

cc @olarayej

asfgit pushed a commit that referenced this pull request Oct 22, 2015

Author: Forest Fang <forest.fang@outlook.com>

Closes #9205 from saurfang/sparkR.stop.

(cherry picked from commit 94e2064)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
@asfgit asfgit closed this in 94e2064 Oct 22, 2015
@saurfang saurfang deleted the sparkR.stop branch October 22, 2015 19:25
@saurfang
Contributor Author

Thanks @shivaram! Opened https://issues.apache.org/jira/browse/SPARK-11263 for further discussion.
