
[SPARK-22344][SPARKR] clean up install dir if running test as source package #19657

Closed
wants to merge 9 commits into apache:master from felixcheung:rinstalldir

Conversation

felixcheung
Member

@felixcheung felixcheung commented Nov 4, 2017

What changes were proposed in this pull request?

remove spark if spark downloaded & installed

How was this patch tested?

manually by building package
Jenkins, AppVeyor

@felixcheung
Member Author

Odd build failure:

[EnvInject] - Variables injected successfully.
[SparkPullRequestBuilder] $ /bin/bash /tmp/hudson3188739775964134398.sh
fixing target dir permissions
chmod: cannot access `target/*': No such file or directory
running git clean -fdx
Python versions prior to 2.7 are not supported.

@felixcheung
Member Author

Jenkins, retest this please

@felixcheung
Member Author

Need to update the vignettes build... and I'm realizing that we are downloading the Spark jar twice, once for the tests and once for the vignettes.

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Nov 4, 2017

Test build #83442 has finished for PR 19657 at commit d4433e1.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Member Author

retest this please

@SparkQA

SparkQA commented Nov 5, 2017

Test build #83458 has finished for PR 19657 at commit 0ea7c9b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Member Author

@shivaram could you take a look? I think this would do it

Contributor

@shivaram shivaram left a comment


Thanks @felixcheung - I think the approach looks good. I had some minor questions

@@ -152,6 +152,9 @@ install.spark <- function(hadoopVersion = "2.7", mirrorUrl = NULL,
  })
  if (!tarExists || overwrite || !success) {
    unlink(packageLocalPath)
    if (success) {
Contributor


Why should this be inside this if block for overwrite || !success -- can't we just have it outside this if, as an else below?

Member Author


Basically, if success is TRUE then only the other cases matter: !tarExists || overwrite. That's exactly the condition under which the download would have occurred at L126, which is the else case of tarExists && !overwrite on L123.
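
For readers following the thread, here is a minimal sketch of the control flow being discussed. This is a reconstruction, not the actual install.R source; downloadTar() is a hypothetical stand-in for the real download helper, and the paths are example values.

```r
# Sketch only -- reconstructed install.spark() control flow, not real source.
packageLocalPath <- file.path(tempdir(), "spark.tgz")  # example path
localDir <- tempdir()
tarExists <- file.exists(packageLocalPath)             # cached tarball present?
overwrite <- FALSE                                     # caller asked to re-download?
downloadTar <- function() invisible(NULL)              # stand-in for the download

if (tarExists && !overwrite) {
  message("tar file found.")   # L123: the cached tarball is reused, no download
} else {
  downloadTar()                # L126: the only branch that downloads the tarball
}

# success records whether untar-ing the tarball worked
success <- tryCatch({
  untar(tarfile = packageLocalPath, exdir = localDir)
  TRUE
}, error = function(e) FALSE)

if (!tarExists || overwrite || !success) {
  unlink(packageLocalPath)
  if (success) {
    # success && (!tarExists || overwrite) holds here -- exactly the case
    # where the download above actually ran, so record the flag for cleanup
  }
}
```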

Contributor


Hmm, OK, that still looks weird in the code. Maybe add a comment before this of the form "If we downloaded a tarfile or overwrote it, set the sparkDownloaded flag"?

```{r cleanup, include=FALSE}
# clean up if Spark was downloaded
# get0 not supported before R 3.2.0
sparkDownloaded <- mget(".sparkDownloaded"[1L],
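
(The excerpt above is cut off mid-call by the review view. As a hedged reconstruction only -- the .sparkREnv environment and the cleanup target are assumptions, not verified against the merged code -- the complete chunk might look roughly like this:)

```r
# Hedged reconstruction; names and the cleanup target are assumptions.
# mget() with ifnotfound is used because get0() requires R >= 3.2.0.
sparkDownloaded <- mget(".sparkDownloaded",
                        envir = SparkR:::.sparkREnv,
                        ifnotfound = list(FALSE))[[1L]]
if (isTRUE(sparkDownloaded)) {
  # remove the Spark distribution that install.spark() downloaded
  unlink(SparkR:::sparkCachePath(), recursive = TRUE)
}
```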
Contributor


Can we make this an internal util function and call it from here?

Member Author


Yes! But both call sites are technically outside of the package, so we would need to call the private function with SparkR:::, which is kind of ugly...

I guess we could wrap the entire cleanup into a private/internal function (since it has to access a private flag anyway).

Contributor


Yeah - let's do that. I might even be fine with exposing an external function, 'uninstallSpark' or 'uninstallDownloadedSpark'?

Member Author


Since this needs to go into 2.2, let's not add a public method for now; we could revisit this for 2.3.

parentDir <- SparkR:::sparkCachePath()
dirs <- list(parentDir, dirname(parentDir), dirname(dirname(parentDir)))
lapply(dirs, function(d) {
  if (length(list.files(d, all.files = TRUE, include.dirs = TRUE, no.. = TRUE)) == 0) {
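
(The excerpt ends mid-block; a hedged completion follows, assuming the intent is simply to prune each cache directory that has become empty -- the unlink call and closing braces are guesses, not the verified source:)

```r
# Hedged completion of the excerpt above; the unlink step is assumed.
parentDir <- SparkR:::sparkCachePath()
dirs <- list(parentDir, dirname(parentDir), dirname(dirname(parentDir)))
lapply(dirs, function(d) {
  # all.files = TRUE with no.. = TRUE counts hidden entries but not "."/"..",
  # so a directory holding only dotfiles is correctly treated as non-empty
  if (length(list.files(d, all.files = TRUE, include.dirs = TRUE, no.. = TRUE)) == 0) {
    unlink(d, recursive = TRUE)  # assumed: prune the now-empty cache directory
  }
})
```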
Contributor


One consequence of this is that if we run R CMD check --as-cran we will do the download twice -- once for the unit tests and once for the vignettes

Member Author


Yes, it would, as commented above: #19657 (comment).
The problem is we have no idea whether the vignettes build is going to happen or not (it could easily be disabled via the command line).

@SparkQA

SparkQA commented Nov 7, 2017

Test build #83542 has finished for PR 19657 at commit 31f3bd0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 7, 2017

Test build #83544 has finished for PR 19657 at commit ca5349b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

"c:\\Users\\user\\AppData\\Local\\Apache\\Spark",
"c:\\Users\\user\\AppData\\Local\\Apache")
} else {
dirs <- traverseParentDirs("/Users/user/Library/Caches/spark/spark2.2", 1)
Contributor


Can we also test the Linux one (/home/user/.cache)? Just want to make sure we will not miss hidden files/directories.

Member Author


Sure, but hopefully the implementation is not platform-dependent; otherwise we would need to test Linux as well as OS X.
(Also, it doesn't check whether the path is valid/present.)
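
Something along these lines could cover the Linux layout (a sketch only; it assumes traverseParentDirs(path, levels) returns the path plus that many lexical parent directories, matching the Windows and OS X cases above, and without touching the filesystem):

```r
library(testthat)

test_that("traverseParentDirs handles the Linux cache layout", {
  # hypothetical test case; the path mirrors the default ~/.cache location
  dirs <- SparkR:::traverseParentDirs("/home/user/.cache/spark/spark2.2", 1)
  expect <- c("/home/user/.cache/spark/spark2.2",
              "/home/user/.cache/spark")
  expect_equal(dirs, expect)
})
```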

@shivaram
Contributor

shivaram commented Nov 7, 2017

Thanks @felixcheung -- the AppVeyor test seems to have failed with the following error:

1. Failure: traverseParentDirs (@test_utils.R#255) -----------------------------
`dirs` not equal to `expect`.
3/4 mismatches
x[2]: "c:/Users/user/AppData/Local/Apache/Spark/Cache"
y[2]: "c:\\Users\\user\\AppData\\Local\\Apache\\Spark\\Cache"
x[3]: "c:/Users/user/AppData/Local/Apache/Spark"
y[3]: "c:\\Users\\user\\AppData\\Local\\Apache\\Spark"
x[4]: "c:/Users/user/AppData/Local/Apache"
y[4]: "c:\\Users\\user\\AppData\\Local\\Apache"

@HyukjinKwon could you also take a quick look at this PR ?

@SparkQA

SparkQA commented Nov 7, 2017

Test build #83553 has finished for PR 19657 at commit f2aa5b7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 7, 2017

Test build #83554 has finished for PR 19657 at commit f21a90b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Contributor

shivaram commented Nov 7, 2017

AppVeyor still has an error:

1. Failure: traverseParentDirs (@test_utils.R#252) -----------------------------
`dirs` not equal to `expect`.
1/4 mismatches
x[1]: "c:\\Users\\user\\AppData\\Local\\Apache\\Spark\\Cache\\spark2.2"
y[1]: "c:/Users/user/AppData/Local/Apache/Spark/Cache/spark2.2"
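
For context, the mismatch above is purely about path separators: one side of the comparison was built with backslashes and the other with forward slashes. A hedged sketch of how a test could be made separator-agnostic (canonPath is a hypothetical helper, not necessarily what the PR did):

```r
library(testthat)

# Sketch: canonicalize "\\" to "/" so the same logical path compares equal
# regardless of which separator style was used to build it.
canonPath <- function(p) gsub("\\\\", "/", p)

dirs   <- "c:\\Users\\user\\AppData\\Local\\Apache\\Spark\\Cache\\spark2.2"
expect <- "c:/Users/user/AppData/Local/Apache/Spark/Cache/spark2.2"
expect_equal(canonPath(dirs), canonPath(expect))
```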

@HyukjinKwon
Member

HyukjinKwon commented Nov 7, 2017

Will take a look later today (KST).

@HyukjinKwon
Member

Yup, I just checked it too and was writing a comment... The current change should pass :).

@SparkQA

SparkQA commented Nov 8, 2017

Test build #83582 has finished for PR 19657 at commit 18e238a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Member Author

build failure this time

@felixcheung felixcheung closed this Nov 8, 2017
@felixcheung felixcheung reopened this Nov 8, 2017
@felixcheung
Member Author

@HyukjinKwon hey, I think the AppVeyor test run is just timing out after 1 hr 30 min - is there a way to increase the timeout?

@HyukjinKwon
Member

HyukjinKwon commented Nov 9, 2017

I actually took a look at decreasing the build time (as you know) but am currently away from it. If I remember correctly, what I observed was that a single(?) particular test takes 20-ish(?) minutes. It was related to ML in R.

Let me try to take another look first; I will leave some comments about what I investigated in SPARK-21693 if I can't deal with it myself (probably due to my limited ML knowledge).

If it turns out not to be that simple, then let me ask them to increase the timeout to 2 hours (like on my own account).
That said, in AppVeyor it sounds like they actually recommend separating the build, as I proposed in the JIRA, or reducing the time...

@felixcheung
Member Author

OK, thanks. In that case, would you mind cherry-picking these changes into your account to run under AppVeyor? Fixing the test run is a lower priority than getting this merged to kick off 2.2.1... :) thanks

@felixcheung felixcheung closed this Nov 9, 2017
@felixcheung felixcheung reopened this Nov 9, 2017
@HyukjinKwon
Member

HyukjinKwon commented Nov 9, 2017

Sure, I can! ... but it seems something has gone wrong with package installation ... Let me trigger it from my account anyway. I can retrigger it.

@HyukjinKwon
Member

Build started: [SparkR] ALL PR-19657
Diff: master...spark-test:33DABC16-224C-40D8-8A29-EF9CFA72625E

@felixcheung
Member Author

ouch

Error in packageVersion("knitr") : package 'knitr' not found
[00:03:37] Execution halted

@felixcheung felixcheung closed this Nov 9, 2017
@felixcheung felixcheung reopened this Nov 9, 2017
@HyukjinKwon
Member

Looks like mine was set back to 1.5 hours recently...

@HyukjinKwon
Member

Oh, but the test passed now with the Apache account here.

asfgit pushed a commit that referenced this pull request Nov 10, 2017
[SPARK-22344][SPARKR] clean up install dir if running test as source package

## What changes were proposed in this pull request?

remove spark if spark downloaded & installed

## How was this patch tested?

manually by building package
Jenkins, AppVeyor

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #19657 from felixcheung/rinstalldir.

(cherry picked from commit b70aa9e)
Signed-off-by: Felix Cheung <felixcheung@apache.org>
@felixcheung
Member Author

Thanks! Merged to master and 2.2.
The 2.1 attempt had a conflict, so leaving that out for now.

@asfgit asfgit closed this in b70aa9e Nov 10, 2017
MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018