
Update Spark Docker Images publish workflow #458

Closed

Conversation


@Yikun Yikun commented Apr 18, 2023

This PR updates the "Create and upload Spark Docker Images" job to use the new workflow. The new workflow first builds the Docker image, then tests it (K8s / standalone), and finally publishes it; see the example here.

The differences between the old workflow and the new workflow:

Workflow   Tags
Previous   apache/spark:v3.4.0, apache/spark-py:v3.4.0, apache/spark-r:v3.4.0
New        [image: new tag scheme]

After:

[image: screenshot of the new workflow run]

We already configured the DOCKER_USER and DOCKER_TOKEN secrets in https://issues.apache.org/jira/browse/INFRA-23882.

After this patch is merged, the 3.4.0 image can be published as the first Docker image to the apache/spark Docker Hub repository (new versioning). In future releases, only new-version Docker images will be published.
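To illustrate the tag-format change described above (dropping the v prefix), here is a small sketch; normalize_tag is a hypothetical helper for illustration only, not part of the actual workflow, which generates the new-style tags directly:

```python
def normalize_tag(tag: str) -> str:
    """Strip the leading 'v' from a previous-style tag (v3.4.0 -> 3.4.0).

    Hypothetical helper illustrating the old vs. new tag formats.
    """
    return tag[1:] if tag.startswith("v") and tag[1:2].isdigit() else tag

# Previous-style tags and their new-style equivalents
for old in ["v3.4.0", "3.4.0"]:
    print(old, "->", normalize_tag(old))
```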

@Yikun Yikun marked this pull request as ready for review April 18, 2023 09:50

HyukjinKwon commented Apr 18, 2023

Merged to asf-site branch.


Yikun commented Apr 19, 2023

@HyukjinKwon @zhengruifeng @yaooqinn Thanks all! If there are no more comments, I will publish the Docker image with the new tags for the 3.4.0 version later today!


Yikun commented Apr 19, 2023

I hit an issue when trying to push the apache/spark image using GitHub Actions; the Docker Hub token seems to be invalid. I already raised INFRA-24476 (it looks similar to INFRA-24459).


Yikun commented Apr 23, 2023

INFRA-24476 has been resolved, and the Docker image is already published at https://hub.docker.com/r/apache/spark/tags

@dongjoon-hyun left a comment

Thank you!

@dongjoon-hyun

Hi, @Yikun , @HyukjinKwon , @yaooqinn , @zhengruifeng .

It seems that this PR introduced two user-facing breaking changes:

  • Downgrades Java from 17 to 11:

$ docker run -it --rm apache/spark:v3.4.0 java -version | tail -n1
OpenJDK 64-Bit Server VM Temurin-17.0.6+10 (build 17.0.6+10, mixed mode, sharing)
$ docker run -it --rm apache/spark:3.4.0 java -version | tail -n1
OpenJDK 64-Bit Server VM Temurin-11.0.18+10 (build 11.0.18+10, mixed mode, sharing)

  • Drops the v prefix: v3.4.0 -> 3.4.0

If these breaking changes are intentional, I wonder whether we discussed the rationale on the dev mailing list. Otherwise, I'd recommend reverting them to avoid any confusion.

@dongjoon-hyun

cc @sunchao , too


Yikun commented Apr 25, 2023

Thanks for catching this and for the reminder! @dongjoon-hyun

For Java 17, I think we can switch the Docker image to Java 17.

For the tag, it's intentional. In 3.4.0, the v3.4.0 tag and the 3.4.0 series tags will co-exist.

It should be a new feature rather than a breaking change. (But where the latest tag points is a breaking change.)

So do you have any suggestions on how we could migrate to the new images smoothly?

Maybe:

  1. For 3.4.0, keep both the previous and new tags; latest points to the previous tag.
  2. For 3.5.0, keep both the previous and new tags, mark the previous tag as deprecated, and latest points to the new tag.
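The two-step migration above could be sketched as a tag map; this is a hypothetical illustration of the proposal, not output of any actual release tooling:

```python
# Hypothetical sketch of the proposed tag migration. "previous" means the
# manually published image; "new" means the workflow-published image.

def tags_for(release: str) -> dict:
    """Return which image each tag points to for a given Spark release."""
    if release == "3.4.0":
        # Step 1: previous and new tags co-exist; `latest` stays on the
        # previous image.
        return {
            f"v{release}": "previous",
            release: "new",
            "latest": "previous",
        }
    # Step 2 (3.5.0 onward): previous tag kept but deprecated; `latest`
    # moves to the new image.
    return {
        f"v{release}": "previous (deprecated)",
        release: "new",
        "latest": "new",
    }

print(tags_for("3.4.0")["latest"])  # previous
print(tags_for("3.5.0")["latest"])  # new
```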

@dongjoon-hyun

For Java 17, I think we can switch the Docker image to Java 17.

It's already Java 17 by default.


Yikun commented Apr 25, 2023

Yes, I meant we still need to update it in apache/spark-docker.

@dongjoon-hyun

  1. If we are going to maintain the manual process, why did you remove the old description in this PR?

For the tag, it's intentional. In 3.4.0, the v3.4.0 tag and the 3.4.0 series tags will co-exist.

  2. To be clear, I really appreciate your contribution because this is good enough to mitigate Apache Spark release managers' burden.

Given (1) and (2), I don't think this PR claims to maintain the existing manual publish process. In other words, this PR already burns the bridge.

I believe having both 3.4.0 and v3.4.0 only causes confusion for users. In the Apache Spark community, that's why I propose keeping only the v3.4.0 tag format (whichever process we use; manual or GitHub).


Yikun commented Apr 26, 2023

I believe having both 3.4.0 and v3.4.0 only causes confusion for users. In the Apache Spark community, that's why I propose keeping only the v3.4.0 tag format (whichever process we use; manual or GitHub).

The reason we removed the v is to follow the DOI (Docker Official Images) rules. If we still think v should be kept in the future, we can keep both (pointing to the new images).

Given (1) and (2), I don't think this PR claims to maintain the existing manual publish process. In other words, this PR already burns the bridge.

At this point, these might be the solutions to continue the work:

Solution 1:

  1. Switch the default to Java 17 in spark-docker: [SPARK-40513] Switch 3.4.0 default Java to Java17 spark-docker#35
  2. Keep v3.4.0 and 3.4.0 pointing to the GA-published image.
  3. Keep this PR in place of the manual process.

Solution 2:

  1. Restore latest to the manually published image for the 3.4.0 images.
  2. Revert this PR (restore the manual process, but keep the GA publish steps in this PR's doc).
  3. v3.4.0 points to the manual image; 3.4.0 points to the GA-published image.
  4. Keep both the manual image and the GA-published image in the 3.4.0 series, and unify all images into the GA-published image in 3.5.0.

@dongjoon-hyun @HyukjinKwon @zhengruifeng WDYT?
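The two solutions above could be compared side by side; the labels below are a hypothetical sketch of the proposals, not anything produced by the actual PR or tooling:

```python
# Sketch comparing the two proposed solutions for the 3.4.0 tags.
# "GA" here means the GitHub Actions publish workflow.
solutions = {
    "Solution 1": {
        "v3.4.0": "GA-published image",
        "3.4.0": "GA-published image",
        "process": "keep this PR, drop the manual process",
    },
    "Solution 2": {
        "v3.4.0": "manual image",
        "3.4.0": "GA-published image",
        "latest": "manual image (for 3.4.0)",
        "process": "revert this PR, unify on GA publishing in 3.5.0",
    },
}

for name, plan in solutions.items():
    print(name, "->", plan["3.4.0"])
```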


dongjoon-hyun commented Apr 26, 2023

@Yikun, we cannot simply remove things the way this PR does. The official way to adopt a new standard is to send an email to the dev mailing list and build a community consensus. In other words, you need to inform and persuade.

The reason we removed the v is to follow the DOI (Docker Official Images) rules. If we still think v should be kept in the future, we can keep both (pointing to the new images).

As you know, the Apache Spark dev mailing list is the only official place to build a community consensus; none of the other channels (GitHub or even the user mailing list) can. Please send the email. :)


Yikun commented Apr 26, 2023

As you know, the Apache Spark dev mailing list is the only official place to build a community consensus; none of the other channels (GitHub or even the user mailing list) can. Please send the email. :)

Sure! I will send the mail this week!

@yaooqinn

The reason we removed the v is to follow the DOI (Docker Official Images) rules.

These are probably guidelines rather than hard-and-fast rules. And I didn't see anywhere that the v prefix is not recommended.


Yikun commented Apr 26, 2023

@yaooqinn I also raised an issue on DOI as input for our discussion: docker-library/official-images#14506.

@HyukjinKwon

Yeah, let's discuss it on the dev mailing list and settle it there.

I personally think it's fine to make some breaking changes at this moment, but the more important thing is that we settle the policy after the discussion so we don't make breaking changes in the future.
