[SPARK-33162][INFRA][3.0] Use pre-built image at GitHub Action PySpark jobs #30253

Closed
wants to merge 2 commits into from


dongjoon-hyun commented Nov 4, 2020

What changes were proposed in this pull request?

This is a backport of #30059.

This PR aims to use a pre-built image for the GitHub Actions PySpark jobs. To isolate the changes, the PySpark jobs are split from the main job. The Docker image is built from the following.

| Item | URL |
| --- | --- |
| Dockerfile | https://github.com/dongjoon-hyun/ApacheSparkGitHubActionImage/blob/main/Dockerfile |
| Builder | https://github.com/dongjoon-hyun/ApacheSparkGitHubActionImage/blob/main/.github/workflows/build.yml |
| Image Location | https://hub.docker.com/r/dongjoon/apache-spark-github-action-image |

Please note the following.

  1. The community will still use build_and_test.yml to add new features, as we have done until now. The Dockerfile will be updated regularly.
  2. When Apache Spark gets an official Docker repository location, we will use it.
  3. It would also be best to keep this Dockerfile and builder script in a new Apache Spark dev branch instead of in an outside GitHub repository.
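
For illustration, a PySpark job that runs inside the pre-built image might look roughly like the sketch below. This is not taken from the diff of this PR: the job name, runner label, image tag, action version, and the module passed to the test script are all assumptions.

```yaml
# Hypothetical sketch of a job in build_and_test.yml; names and versions are assumed.
jobs:
  pyspark:
    name: "Build modules: pyspark"            # assumed job name
    runs-on: ubuntu-20.04                     # assumed runner label
    container:
      # Image location from the table above; the tag is an assumption.
      image: dongjoon/apache-spark-github-action-image:latest
    steps:
      - name: Checkout Spark repository
        uses: actions/checkout@v2             # assumed action version
      # No Python or package installation steps: they are baked into the image.
      - name: Run PySpark tests
        run: ./dev/run-tests --parallelism 2 --modules pyspark-sql   # assumed module
```

Because the steps run inside the container, the time savings come from skipping the per-run installation of Python and its packages.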

Why are the changes needed?

This will reduce the time spent installing Python and its packages.

BEFORE (branch-3.0)
Screen Shot 2020-11-04 at 2 28 49 PM

AFTER (branch-3.0)
Screen Shot 2020-11-04 at 2 29 43 PM

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the GitHub Actions run on this PR without the package installation steps.

SparkQA commented Nov 4, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35224/

SparkQA commented Nov 4, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35224/

SparkQA commented Nov 5, 2020

Test build #130623 has finished for PR 30253 at commit ce70ad6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

dongjoon-hyun commented Nov 5, 2020

The new PySpark jobs passed. The SparkR and Linter jobs failed at R installation. I will backport the R job migration in another PR.

Screen Shot 2020-11-04 at 8 18 49 PM

@dongjoon-hyun

Hi, @HyukjinKwon.
Could you review this PR?

@dongjoon-hyun

Thank you, @HyukjinKwon! Merged to branch-3.0.

dongjoon-hyun added a commit that referenced this pull request Nov 5, 2020
…k jobs

### What changes were proposed in this pull request?

This is a backport of #30059.

This PR aims to use a `pre-built image` for the GitHub Actions PySpark jobs. To isolate the changes, the `pyspark` jobs are split from the main job. The Docker image is built from the following.

| Item                   | URL                |
| --------------- | ------------- |
| Dockerfile         | https://github.com/dongjoon-hyun/ApacheSparkGitHubActionImage/blob/main/Dockerfile |
| Builder               | https://github.com/dongjoon-hyun/ApacheSparkGitHubActionImage/blob/main/.github/workflows/build.yml |
| Image Location | https://hub.docker.com/r/dongjoon/apache-spark-github-action-image |

Please note the following.
1. The community will still use `build_and_test.yml` to add new features, as we have done until now. The `Dockerfile` will be updated regularly.
2. When Apache Spark gets an official Docker repository location, we will use it.
3. It would also be best to keep this Dockerfile and builder script in a new Apache Spark dev branch instead of in an outside GitHub repository.

### Why are the changes needed?

This will reduce the time spent installing Python and its packages.

**BEFORE (branch-3.0)**
![Screen Shot 2020-11-04 at 2 28 49 PM](https://user-images.githubusercontent.com/9700541/98174664-17f2e500-1eaa-11eb-9222-018eead9c418.png)

**AFTER (branch-3.0)**
![Screen Shot 2020-11-04 at 2 29 43 PM](https://user-images.githubusercontent.com/9700541/98174758-378a0d80-1eaa-11eb-8e6a-929158c2fea3.png)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the GitHub Actions run on this PR without the `package installation steps`.

Closes #30253 from dongjoon-hyun/GHA-3.0.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

dongjoon-hyun deleted the GHA-3.0 branch November 5, 2020 04:38


SparkQA commented Nov 5, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35244/

SparkQA commented Nov 5, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35244/

SparkQA commented Nov 5, 2020

Test build #130635 has finished for PR 30253 at commit 80f5b4b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
