[SPARK-49243][INFRA][K8S] Add OpenContainers Annotations in docker images #47766
dongjoon-hyun wants to merge 2 commits into apache:master
Could you review this PR, @viirya?
viirya left a comment:
Looks good to me. Thanks @dongjoon-hyun
Thank you, @viirya!
# Image for building Spark releases. Based on Ubuntu 22.04.
FROM ubuntu:jammy-20240227
LABEL org.opencontainers.image.authors="Apache Spark project <dev@spark.apache.org>"
Apache Spark Community?
Isn't it too broad, since it includes read-only users?
I believe it's acceptable to include read-only users because non-code contributions, such as testing and sharing use cases, should also be acknowledged.
The User also plays a role in contributing according to 'The Apache Way'.
> I believe it's acceptable to include read-only users because non-code contributions, such as testing and sharing use cases, should also be acknowledged. The User also plays a role in contributing according to 'The Apache Way'.
I got your point. It seems that there was a little misunderstanding between you and me.
- When I say `read-only users`, I count read-only users (who didn't contribute in any way at all) as part of the `Apache Spark Community`.
The point is that I have a broader concept of the community than you do.
The next question is: are `Contributors` `Authors`?
One more thing. I used the term Project as the author in this way.
An open source project includes all aspects of creating, maintaining, and distributing open source software including community building and mentoring, communication, the release process, and everything in between.
Shall we capitalize 'project' to make it consistent with others?
Did you mean `Apache Spark` by `with others`? It's possible.
However, technically, Apache Spark™ is a trademark and the unique name of our project, while `project` is a normal noun.
Here is an example sentence from the ASF guidelines. Unlike `Apache Spark`, a normal noun like `software` is written in lower case. You can also see the phrase `the Apache Foo project` used as an example in the same document.
https://www.apache.org/foundation/marks/#guidelines
"Free copies of Apache ProjectName software under the Apache License and support services for Apache ProjectName are available at my own company website."
Oh, I mean 'Apache Spark Project', like 'S'cala, 'I'mage, etc., but it's just a nit.
LABEL org.opencontainers.image.authors="Apache Spark project <dev@spark.apache.org>"
LABEL org.opencontainers.image.licenses="Apache-2.0"
LABEL org.opencontainers.image.ref.name="Apache Spark Release Manager Image"
LABEL org.opencontainers.image.version=""
Can it be removed if it's blank?
Ya, I tried, but we should have a blank here to remove the previous OS layer version as described in the PR description.
$ docker inspect apache/spark:3.5.2 | jq '.[0].Config.Labels'
{
  "org.opencontainers.image.ref.name": "ubuntu",
  "org.opencontainers.image.version": "20.04"
}
Without this blank, the final information could be interpreted as Apache Spark 20.04.
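The effect described in this thread can be sketched as a Dockerfile fragment. It assumes, as the `docker inspect` output above suggests, that the Ubuntu base image ships its own `org.opencontainers.image.ref.name` and `org.opencontainers.image.version` labels, which a child image inherits unless it re-declares the same keys. The label values are taken from the diff under review; this is an illustration only, not the full `spark-rm/Dockerfile`.

```dockerfile
# Illustrative fragment; everything besides the FROM and LABEL lines is omitted.
FROM ubuntu:jammy-20240227

# Assumption: the base image already sets
#   org.opencontainers.image.ref.name (e.g. "ubuntu")
#   org.opencontainers.image.version  (the Ubuntu release number)
# Labels are inherited from the base image, and re-declaring a key
# replaces the inherited value with the most recently applied one.
LABEL org.opencontainers.image.ref.name="Apache Spark Release Manager Image"

# A Dockerfile cannot unset an inherited label, so an empty value is used
# to blank out the OS version instead of leaving it attached to the Spark name.
LABEL org.opencontainers.image.version=""
```

After building such an image, running the same `docker inspect ... | jq '.[0].Config.Labels'` command against it should show an empty `org.opencontainers.image.version` rather than the Ubuntu release number.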
Got it. Thanks for the explanation.
Maybe we can add some comment here.
Sure. I will add comments, @yaooqinn.
Ya, I agree that the AS-IS is obscure.
So, I'm currently trying to improve this area, including bin/docker-image-tool.sh, via version detection from RELEASE. For the K8s images, I guess I can deliver a better way to fill this field. But for this spark-rm/Dockerfile, I guess it's not worth doing.
> via version detection from RELEASE
+1
FROM ${base_img:-spark}
LABEL org.opencontainers.image.authors="Apache Spark project <dev@spark.apache.org>"
LABEL org.opencontainers.image.licenses="Apache-2.0"
LABEL org.opencontainers.image.ref.name="Apache Spark Python Image"
Ya, I also considered Apache Spark PySpark Image and Apache Spark SparkR Image.
But, I believe Python would be better because we have Apache Spark Scala/Java Image as a base image.
In addition, although I'm not sure, an `Apache Spark Go Image` may come later. So, for consistency, using the language name should extend well to future images.
I was considering whether it would conflict with https://pypi.org/project/pyspark-connect/ or not.
But it looks fine to me now.
Thank you for the thorough review, @yaooqinn. I addressed your comments by adding a comment line and replied to the discussion points here.
Please let me know your opinion.

Thank you! Merged to master.


What changes were proposed in this pull request?
This PR aims to add `OpenContainers` Annotations to docker images to add metadata like `Apache License`.
Why are the changes needed?
BEFORE
AFTER
Does this PR introduce any user-facing change?
No, it's only a metadata change.
How was this patch tested?
Manual review.
Was this patch authored or co-authored using generative AI tooling?
No.