[SPARK-33408][K8S][R][3.0] Use R 3.6.3 in K8s R image #30310

Closed
wants to merge 2 commits into from

@dongjoon-hyun dongjoon-hyun commented Nov 10, 2020

What changes were proposed in this pull request?

This PR upgrades the K8s R image to R 3.6.3, the same version installed on the Jenkins servers.

Why are the changes needed?

The Jenkins servers use R 3.6.3.

```
+ SPARK_HOME=/home/jenkins/workspace/SparkPullRequestBuilder-K8s
+ /usr/bin/R CMD check --as-cran --no-tests SparkR_3.1.0.tar.gz
* using log directory ‘/home/jenkins/workspace/SparkPullRequestBuilder-K8s/R/SparkR.Rcheck’
* using R version 3.6.3 (2020-02-29)
```

The OpenJDK Docker image ships R 3.5.2 (2018-12-20), which is old; as a result, spark-3.0.1 currently fails to run SparkR on it.

```
$ cd spark-3.0.1-bin-hadoop3.2

$ bin/docker-image-tool.sh -R kubernetes/dockerfiles/spark/bindings/R/Dockerfile -n build

$ bin/spark-submit --master k8s://https://192.168.64.49:8443 --deploy-mode cluster --conf spark.kubernetes.container.image=spark-r:latest local:///opt/spark/examples/src/main/r/dataframe.R

$ k logs dataframe-r-b1c14b75b0c09eeb-driver
...
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.17.0.4 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.RRunner local:///opt/spark/examples/src/main/r/dataframe.R
20/11/10 06:03:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
log4j:WARN No appenders could be found for logger (io.netty.util.internal.logging.InternalLoggerFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Error: package or namespace load failed for ‘SparkR’ in rbind(info, getNamespaceInfo(env, "S3methods")):
 number of columns of matrices must match (see arg 2)
In addition: Warning message:
package ‘SparkR’ was built under R version 4.0.2
Execution halted
```
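
The driver log points at a version gap: the warning says SparkR was built under R 4.0.2, while the image runs R 3.5.2. As a rough illustration, here is a minimal shell sketch of a pre-flight check for that gap. The helper names `major_minor` and `r_runtime_ok` are hypothetical and not part of Spark or this PR, and the rule applied (the runtime R should not be older than the build R in major.minor) is a conservative heuristic assumed here, not an official R compatibility policy.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of Spark.

# Print the "major.minor" part of the first x.y.z version found in the input.
# Note: grep -o is widely available (GNU, BSD, BusyBox) but not strictly POSIX.
major_minor() {
  printf '%s\n' "$1" | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1 | cut -d. -f1,2
}

# Exit 0 when the runtime R (arg 2) is at least as new as the R that built
# the package (arg 1), comparing major.minor only. Assumed heuristic.
r_runtime_ok() {
  built=$(major_minor "$1")
  runtime=$(major_minor "$2")
  b_major=${built%%.*}; b_minor=${built#*.}
  r_major=${runtime%%.*}; r_minor=${runtime#*.}
  [ "$r_major" -gt "$b_major" ] ||
    { [ "$r_major" -eq "$b_major" ] && [ "$r_minor" -ge "$b_minor" ]; }
}

# Example using the two version strings quoted in this PR description.
if r_runtime_ok "package ‘SparkR’ was built under R version 4.0.2" \
                "R version 3.5.2 (2018-12-20)"; then
  echo "runtime R is new enough"
else
  echo "runtime R is older than the R that built the package"
fi
```

With the strings from the log, the check takes the `else` branch, which matches the failure shown. In practice the two inputs would come from the build machine and from running `R --version` inside the image.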

Does this PR introduce any user-facing change?

How was this patch tested?

Passes the K8s integration tests (IT).

@dongjoon-hyun dongjoon-hyun marked this pull request as draft November 10, 2020 05:52
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-33408][K8S][R] Use R 4.0 in K8s R image [SPARK-33408][K8S][R][3.0] Use R 4.0 in K8s R image Nov 10, 2020
@HyukjinKwon
Member

HyukjinKwon commented Nov 10, 2020

Looks good once the tests pass. I guess the fix will also go to master, right?

Also, if this doesn't work, I think we can just drop R 3.5 and raise the minimum R version to 3.6. In that case, we should change the files touched in #28908. If I remember correctly, I manually tested only with R 3.6; R 3.5 was not tested, and I just assumed it works based on the documentation.

@dongjoon-hyun
Member Author

dongjoon-hyun commented Nov 10, 2020

Thank you, @HyukjinKwon . I made a patch for master before this. Here~

Yes, although I tested this manually, we need to pass K8s CI.

@dongjoon-hyun
Member Author

This PR uses R 4.0 in the K8s R image only. Technically, the R version depends on the base OS image used by the openjdk Docker image, so I believe it is better to install R 4.0 explicitly.

@dongjoon-hyun dongjoon-hyun marked this pull request as ready for review November 10, 2020 06:32
@dongjoon-hyun dongjoon-hyun marked this pull request as draft November 10, 2020 08:19
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-33408][K8S][R][3.0] Use R 4.0 in K8s R image [SPARK-33408][K8S][R][3.0] Use R 3.6.3 in K8s R image Nov 12, 2020
@SparkQA

SparkQA commented Nov 12, 2020

Test build #130963 has finished for PR 30310 at commit 3babd50.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 12, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35569/

@SparkQA

SparkQA commented Nov 12, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35569/

@dongjoon-hyun dongjoon-hyun marked this pull request as ready for review November 12, 2020 05:46
@dongjoon-hyun
Member Author

Hi, @HyukjinKwon .
Could you review this again?

@dongjoon-hyun
Member Author

Hi, @viirya
Could you review this?

@HyukjinKwon
Member

Merged to branch-3.0.

HyukjinKwon pushed a commit that referenced this pull request Nov 12, 2020
Closes #30310 from dongjoon-hyun/SPARK-33408.

Lead-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Co-authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
Member

@viirya viirya left a comment

lgtm

@dongjoon-hyun
Member Author

Thank you, @HyukjinKwon and @viirya !

@dongjoon-hyun dongjoon-hyun deleted the SPARK-33408 branch November 12, 2020 07:01