
[SPARK-40746][INFRA] Switch to pull_request_target and fix workflow #5

Closed
wants to merge 2 commits

Conversation


Yikun (Member) commented Oct 11, 2022

What changes were proposed in this pull request?

This patch is to make the workflow work in the apache repo:

  • Switch the event from pull_request to pull_request_target to make sure the PR has permission to push the test image to ghcr.io/apache/spark-docker/spark:TAG. This real image push will help us do K8s E2E tests in the future (a sketch of the resulting trigger configuration follows this list).
  • Add .github/workflows/build_3.3.0.yaml and 3.3.0/** to the trigger paths
  • Add packages write permissions
  • Change apache/spark-docker:TAG to ghcr.io/apache/spark-docker/spark:TAG
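
A minimal sketch of what the trigger configuration could look like after this change; the workflow name and the delegation to main.yml as a reusable workflow are illustrative assumptions, not a copy of the actual file:

```
name: "Build and Test (3.3.0)"

on:
  pull_request_target:   # runs with base-repo permissions, so it can push to ghcr.io
    paths:
      - '3.3.0/**'
      - '.github/workflows/build_3.3.0.yaml'
      - '.github/workflows/main.yml'

jobs:
  run-build:
    permissions:
      packages: write    # needed to push the test image to ghcr.io
    uses: ./.github/workflows/main.yml
```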

Why are the changes needed?

To make the workflow work well in the apache repo.

Does this PR introduce any user-facing change?

No

How was this patch tested?

The CI will only work after this patch is merged; it won't be triggered in this PR (due to pull_request_target).

We can't do a complete test before this merge, but:

@Yikun changed the title from [SPARK-40516][INFRA][FOLLOWUP] Switch to pull_request_target and fix workflow to [SPARK-40746][INFRA] Switch to pull_request_target and fix workflow Oct 11, 2022
@Yikun marked this pull request as ready for review October 11, 2022 09:23

Yikun commented Oct 11, 2022

cc @HyukjinKwon @martin-g

      - '.github/workflows/main.yml'

jobs:
  run-build:
    permissions:
      packages: write
Member

Is this needed? This workflow just delegates to main.yml, where this is also specified.
It does not do any harm either!

Member Author

It's needed, see also: apache/spark#38145

.github/workflows/main.yml (outdated review thread, resolved)
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
@@ -98,7 +102,7 @@ jobs:
         with:
           context: ${{ env.IMAGE_PATH }}
           push: true
-          tags: ${{ env.TEST_REPO }}:${{ env.UNIQUE_IMAGE_TAG }}
+          tags: ${{ env.TEST_REPO }}/${{ env.IMAGE_NAME }}:${{ env.UNIQUE_IMAGE_TAG }}
Member Author

Emm, this might have some issues when multiple concurrent jobs push the same tag. Of course, we could use Actions concurrency or add a random hash to the tag to protect against this.

But after some consideration, I guess what we need to do is just remove the push in this PR:

push: true

And change it to a local build in the minikube Docker daemon (no real push) when we add the K8s test; a hypothetical sketch of that flow is below.
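
A hypothetical local-build flow for the future K8s test; the image tag and build path here are illustrative assumptions:

```
# Point the docker CLI at minikube's Docker daemon, then build the test
# image locally so the K8s test can use it without any registry push.
eval $(minikube docker-env)
docker build -t spark-docker-test:local 3.3.0/scala2.12-java11-python3-r-ubuntu
```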

Member Author

This PR is replaced by #7

@Yikun closed this in c116698 Oct 12, 2022
Yikun added a commit that referenced this pull request Oct 17, 2022
### What changes were proposed in this pull request?
This patch:
- Add the spark uid/gid in the Dockerfile (`useradd` and `groupadd`), used in the entrypoint. This approach is also used by [other DOIs](https://github.com/search?p=2&q=org%3Adocker-library+useradd&type=Code) and Apache DOIs (such as [zookeeper](https://github.com/31z4/zookeeper-docker/blob/master/3.8.0/Dockerfile#L17-L21), [solr](https://github.com/apache/solr-docker/blob/a20477ed123cd1a72132aebcc0742cee46b5f976/9.0/Dockerfile#L108-L110), [flink](https://github.com/apache/flink-docker/blob/master/1.15/scala_2.12-java11-ubuntu/Dockerfile#L55-L56)).
- Use the `spark` user in `entrypoint.sh` rather than in the Dockerfile (makes sure the Spark process is executed as a non-root user).
- Remove the `USER` setting in the Dockerfile (makes sure images built on this base have permission to extend the Dockerfile, e.g. to execute `apt update`).
- Chown scripts to `spark:spark` instead of `root:root` (avoids permission issues, such as in standalone mode).
- Add the `gosu` dependency, a `sudo` replacement recommended by [docker](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#user) and the [docker official images](https://github.com/docker-library/official-images/blob/9a4d54f1a42ea82970baa4e6f3d0bc75e98fc961/README.md#consistency), and also used by other DOI images.

This change also follows the rules of Docker official images, see also [consistency](https://github.com/docker-library/official-images/blob/9a4d54f1a42ea82970baa4e6f3d0bc75e98fc961/README.md#consistency) and [Dockerfile best practices about user](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#user). A minimal sketch of this user setup follows.
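
A minimal sketch of the user setup described above; the UID/GID value, package commands, and entrypoint path are illustrative assumptions, not the exact Dockerfile:

```
ARG spark_uid=185

# Create a named spark user/group instead of relying on a bare numeric USER.
RUN groupadd --system --gid=${spark_uid} spark && \
    useradd --system --uid=${spark_uid} --gid=spark spark

# gosu is a lightweight sudo replacement used to drop privileges at runtime.
RUN set -ex && \
    apt-get update && \
    apt-get install -y gosu && \
    rm -rf /var/lib/apt/lists/*

# Note: no USER instruction, so child images can still run e.g. `apt update`;
# entrypoint.sh drops to the spark user instead.
ENTRYPOINT ["/opt/entrypoint.sh"]
```

The corresponding privilege drop in a hypothetical `entrypoint.sh` could be a single line:

```
# Execute the requested command as the spark user rather than root.
exec gosu spark "$@"
```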

### Why are the changes needed?

The issues below are what I have found so far:

1. **Irregular login username**
  The Docker image's username is not very standard; `docker run` with the `185` username is a little bit weird.

  ```
  $ docker run -ti apache/spark bash
  185d88a24357413:/opt/spark/work-dir$
  ```

2. **Permission issues with Spark sbin scripts**
  There are also permission issues when running some Spark scripts, such as standalone mode:

  ```
  $ docker run -ti apache/spark /opt/spark/sbin/start-master.sh

  mkdir: cannot create directory ‘/opt/spark/logs’: Permission denied
  chown: cannot access '/opt/spark/logs': No such file or directory
  starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-1c345a00e312.out
  /opt/spark/sbin/spark-daemon.sh: line 135: /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-1c345a00e312.out: No such file or directory
  failed to launch: nice -n 0 /opt/spark/bin/spark-class org.apache.spark.deploy.master.Master --host 1c345a00e312 --port 7077 --webui-port 8080
  tail: cannot open '/opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-1c345a00e312.out' for reading: No such file or directory
  full log in /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-1c345a00e312.out
  ```


3. **Using spark as a base image is not supported well**
  Due to the static `USER` set in the Dockerfile.
  ```
  $ cat Dockerfile
  FROM apache/spark
  RUN apt update

  $  docker build -t spark-test:1015 .
  // ...
  ------
   > [2/2] RUN apt update:
  #5 0.405 E: Could not open lock file /var/lib/apt/lists/lock - open (13: Permission denied)
  #5 0.405 E: Unable to lock directory /var/lib/apt/lists/
  ------
  executor failed running [/bin/sh -c apt update]: exit code: 100

  ```

### Does this PR introduce _any_ user-facing change?
Yes.

### How was this patch tested?
- CI passed: all K8s tests

- Regression test:
```
# Username is set to spark rather than 185
docker run -ti spark:scala2.12-java11-python3-r-ubuntu bash
spark27bbfca0a581:/opt/spark/work-dir$
```
```
# start-master.sh no permission issue
$ docker run -ti spark:scala2.12-java11-python3-r-ubuntu bash

spark8d1118e26766:~/work-dir$ /opt/spark/sbin/start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-8d1118e26766.out
```
```
# Image as parent case
$ cat Dockerfile
FROM spark:scala2.12-java11-python3-r-ubuntu
RUN apt update
$ docker build -t spark-test:1015 .
[+] Building 7.8s (6/6) FINISHED
 => [1/2] FROM docker.io/library/spark:scala2.12-java11-python3-r-ubuntu                                                                                                              0.0s
 => [2/2] RUN apt update                                                                                                                                                              7.7s
```

- Other test:
```
# Test on pyspark
$ cd spark-docker/3.3.0/scala2.12-java11-python3-r-ubuntu
$ docker build -t spark:scala2.12-java11-python3-r-ubuntu .
$ docker run -p 4040:4040 -ti spark:scala2.12-java11-python3-r-ubuntu /opt/spark/bin/pyspark
```

```
# A simple test for `start-master.sh` (standalone mode)
$ docker run -ti spark:scala2.12-java11-python3-r-ubuntu bash
spark8d1118e26766:~/work-dir$ /opt/spark/sbin/start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-8d1118e26766.out
```

Closes #11 from Yikun/spark-user.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Yikun added a commit that referenced this pull request May 25, 2023
### What changes were proposed in this pull request?
- This patch changes the `build-args` approach to "patch in test" in the build and publish workflows, because Docker official images do not support **parameterized FROM** values: docker-library/official-images#13089 (comment). A hedged sketch of the patch step follows this list.
- Also refactor the publish workflow:
![image](https://user-images.githubusercontent.com/1736354/236613626-96f8fbf6-7df7-4d10-b4fb-be4d57c56dce.png)
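
What the "patch in test" step could look like; the `sed` rewrite and image tag are illustrative assumptions, not the exact workflow step:

```
# Pin the FROM line to the already-built base image before testing,
# instead of passing --build-arg BASE_IMAGE at build time.
sed -i "s|^FROM .*|FROM ghcr.io/apache/spark-docker/spark:3.4.0-scala2.12-java11-ubuntu|" Dockerfile
docker build -t spark:3.4.0-scala2.12-java11-ubuntu .
```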
### Why are the changes needed?
This is the same change as the build workflow refactor, to avoid publish issues like:
```
#5 [linux/amd64 internal] load metadata for docker.io/library/spark:3.4.0-scala2.12-java11-ubuntu
#5 ERROR: pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
------
 > [linux/amd64 internal] load metadata for docker.io/library/spark:3.4.0-scala2.12-java11-ubuntu:
------
Dockerfile:18
--------------------
  16 |     #
  17 |     ARG BASE_IMAGE=spark:3.4.0-scala2.12-java11-ubuntu
  18 | >>> FROM $BASE_IMAGE
  19 |
  20 |     RUN set -ex && \
--------------------
ERROR: failed to solve: spark:3.4.0-scala2.12-java11-ubuntu: pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
Error: buildx failed with: ERROR: failed to solve: spark:3.4.0-scala2.12-java11-ubuntu: pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Publish test in my local fork:
- https://github.com/Yikun/spark-docker/actions/runs/5076986823/jobs/9120029759: Skips the local base build and uses the [published base](https://github.com/Yikun/spark-docker/actions/runs/5076986823/jobs/9120029759#step:11:135) image:

![image](https://user-images.githubusercontent.com/1736354/236612540-2b454c14-e194-4d73-b859-0df001570d27.png)

```
#3 [linux/amd64 internal] load metadata for ghcr.io/yikun/spark-docker/spark:3.4.0-scala2.12-java11-ubuntu
#3 DONE 0.9s

#4 [linux/arm64 internal] load metadata for ghcr.io/yikun/spark-docker/spark:3.4.0-scala2.12-java11-ubuntu
#4 DONE 0.9s
```

- CI passed: does the local base build first, then builds on top of the local build

Closes #39 from Yikun/publish-build.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>