Add Apache Spark Docker Official Image #13089
Currently, the Dockerfile is maintained by Apache Spark. This Dockerfile (https://github.com/Yikun/spark-docker/tree/master/3.3.0/scala2.12-java11-focal) is adapted from the upstream one. We have made some improvements as requested by the official-images guidelines, but we don't know how many gaps remain. As a next step, we will have further discussions in the Spark community to make it happen. So, could you help with a preliminary review of the Dockerfile and entrypoint? @tianon
A quick update (TL;DR: the Docker Official Image vote passed in the Apache Spark community):
@tianon @yosifkit So, it's ready for review now. I know approval of new images can be slow; would you mind giving a rough estimate of when the review could start? We (the Apache Spark community) will spare no effort to support the DOI process, many thanks! Also cc @HyukjinKwon @zhengruifeng
Quick update (TL;DR: adding more tests and scripts to ensure the quality of the official image Dockerfiles):
I believe we are ready. @tianon @yosifkit By the way, a review note: https://github.com/apache/spark-docker/tree/master/3.3.0/scala2.12-java11-python3-r-ubuntu
+1 for an official Docker image for Apache Spark!
Any ETA on when this could be reviewed and merged?
Hello! ✨ Thanks for your interest in contributing to the official images program. 💭 As you may have noticed, we've usually got a pretty decently sized queue of new images (not to mention image updates and maintenance of images under @docker-library which are maintained by the core official images team). As such, it may be some time before we get to reviewing this image (image updates get priority both because users expect them and because reviewing new images is a more involved process than reviewing updates), so we apologize in advance! Please be patient with us -- rest assured, we've seen your PR and it's in the queue. ❤️ We do try to proactively add and update the "new image checklist" on each PR, so if you haven't looked at it yet, that's a good use of time while you wait. ☔ Thanks! 💖 💙 💚 ❤️
@yosifkit Thank you very much for your reply! We (the Apache Spark community) will actively address comments once the review starts. Looking forward to your review!
@yosifkit Today we published the latest release, Spark 3.4.0: https://spark.apache.org/releases/spark-release-3-4-0.html. I am wondering when we might get our own official Docker image.
Ok, I finally have some feedback ready🎉😻. Sorry for the delay 🙇 Let me know if you have any questions. And thank you for your patience 🙇🥰💖
non-blocking:
@yosifkit Thanks for your reply! We will address all comments as soon as possible!
Good suggestion
Fixing in: apache/spark-docker#36
It was introduced with
For
I found it was introduced in SPARK-25275 as a security improvement. It is probably also needed for OpenShift, so if it's not a blocker, we prefer to keep it.
It was introduced for OpenShift: apache-spark-on-k8s/spark#404. ~~But I'm not sure it's still a problem after 6 years?~~ cc @erikerlandson. If it's not a blocker, we prefer to keep it. BTW, a possibly related case: would you mind sharing some ideas on how we could switch the username in entrypoint.sh (as in apache/spark#40831) to meet the Spark-on-K8s user-switching requirement? Something like:
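To make the question concrete, here is a minimal, hypothetical sketch of the kind of user switch being asked about (the helper name is made up, and `gosu` is assumed to be installed in the image):

```shell
# Hypothetical helper: when the container starts as root, emit a
# "gosu spark" prefix so the final exec drops privileges to the spark
# user; when OpenShift has already assigned an arbitrary non-root UID,
# emit nothing and keep running as that UID.
switch_user_if_root() {
  if [ "$(id -u)" -eq 0 ]; then
    echo gosu spark
  fi
}

run_prefix="$(switch_user_if_root)"
# Intended use (sketch): exec $run_prefix /usr/bin/tini -s -- "${CMD[@]}"
echo "prefix: '$run_prefix'"
```

The prefix is deliberately unquoted at the exec site so that an empty result expands to nothing rather than an empty argument.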
Actually, I tried other Apache official images like solr and flink: they use a specific username. Did I miss something?
We will address this soon!
Regarding that modification: does Spark still require an entry to exist in the passwd database? See: https://cloud.redhat.com/blog/a-guide-to-openshift-and-uids
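For reference, the situation under discussion is a container UID with no passwd entry; a quick way to check for it (a generic sketch, not code from the PR) is:

```shell
# Check whether the current UID (which OpenShift may assign at random)
# resolves to a passwd entry; some tools misbehave when it does not,
# which is what workarounds like nss_wrapper paper over.
myuid="$(id -u)"
if getent passwd "$myuid" > /dev/null 2>&1; then
  passwd_status="present"
else
  passwd_status="missing"
fi
echo "passwd entry for uid $myuid: $passwd_status"
```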
@erikerlandson Thanks very much for your reply; we have added that. And is there any more background on question 3?
@Yikun I don't think that will work. It adds an entry for |
Regarding SPARK-25275, I no longer recall what the underlying issue was. Unless you can run CI testing on OpenShift, I'd recommend you leave these things in. |
### What changes were proposed in this pull request?
This PR changes the Dockerfiles and the workflow to build on a shared base image, saving space by sharing layers (one image is built from another). After this PR:
- The Spark / PySpark / SparkR related files are extracted in the base image.
- The PySpark / SparkR dependencies are installed in the PySpark / SparkR images.
- A base image build step is added.
- The changes are applied to the template; run `./add-dockerfiles.sh 3.4.0` to regenerate.
- This PR does not change the 3.3.x Dockerfiles, to keep it focused; those changes will follow in a separate PR once all comments for 3.4.0 are addressed.

[1] docker-library/official-images#13089

### Why are the changes needed?
To address the DOI review comments, and to save space by sharing layers between images.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed.

Closes #36 from Yikun/official.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
### What changes were proposed in this pull request?
This patch changes `apt` to `apt-get` and removes the unnecessary `rm -rf /var/cache/apt/*; \`. The change is also applied to 3.4.0 and 3.4.1.

### Why are the changes needed?
To address comments from the DOI review:
- `apt install ...`: this should be `apt-get` (`apt` is not intended for unattended use, as the warning during build makes clear).
- `rm -rf /var/cache/apt/*; \`: this is harmless, but unnecessary (the base image configuration already keeps this directory empty).

See more in: [1] docker-library/official-images#13089 (comment)

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed.

Closes #47 from Yikun/apt-get.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
### What changes were proposed in this pull request?
Add `set -eo pipefail` to the entrypoint script and quote its variables.

### Why are the changes needed?
To address DOI review comments:
1. Have you considered a `set -eo pipefail` in the entrypoint script to help prevent any errors from being silently ignored?
2. You probably want to quote this (and many of the other variables in this execution), i.e. `--driver-url "$SPARK_DRIVER_URL"`.

[1] docker-library/official-images#13089 (comment)
[2] docker-library/official-images#13089 (comment)

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed.

Closes #49 from Yikun/quote.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
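Both fixes are easy to demonstrate in isolation (a generic bash sketch, not code from the PR; the `driver_url` value is contrived to make the word splitting visible):

```shell
# 1) pipefail: without it, a pipeline's exit status is that of the
#    *last* command, so an early failure is silently ignored.
set +o pipefail
if false | true; then masked=yes; else masked=no; fi          # masked=yes
set -o pipefail
if false | true; then propagated=no; else propagated=yes; fi  # propagated=yes

# 2) Quoting: an unquoted expansion is word-split, so a value
#    containing spaces becomes several arguments; quoting keeps it
#    as a single argument.
driver_url="spark://host with spaces:7077"
set -- $driver_url
unquoted_args=$#    # 3 arguments after splitting
set -- "$driver_url"
quoted_args=$#      # 1 argument
echo "masked=$masked propagated=$propagated unquoted=$unquoted_args quoted=$quoted_args"
```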
### What changes were proposed in this pull request?
Remove the unnecessary `/lib64` path.

### Why are the changes needed?
To address the comment at docker-library/official-images#13089 (comment). The path was introduced by apache/spark@f13ea15 to address a Snappy issue on Alpine, but we have since switched the base OS to Ubuntu, so the `/lib64` hack can be cleaned up.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed.

Closes #48 from Yikun/rm-lib64-hack.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Added the env description and configuration link to the Docker Hub readme doc PR.
Addressed: apache/spark-docker#49
Addressed: apache/spark-docker#48
Addressed: apache/spark-docker#47
Addressed: apache/spark-docker#49
Sorry, did one more pass and noticed one last thing 🙇 🙇 ❤️

> + GPG_KEY=34F0FC5C

Oops, this should be the full key fingerprint: `F28C9C925C188C35E345614DEDA00CE834F0FC5C` (generating a collision for such a short key ID is trivial).
@tianon Oh, thanks for the catch; I hadn't realized that before. It seems I got the key from the release manager's entry in https://dist.apache.org/repos/dist/dev/spark/KEYS, and I only have the short key for v3.4.1 (https://github.com/apache/spark-docker/blob/master/tools/template.py#L34). Maybe a stupid question: how do I map the short key to the full key fingerprint?
@dongjoon-hyun Would you mind taking a look at why we can't validate the long key? It seems the short key (which can be validated) and the long key (which fails with a "no public key" error) are the same one. 😂 Does it need to be imported, or is some other operation required?
Oh fun -- the way I mapped it was by downloading and importing that full KEYS file. I think what's happening in your PR is that previously, keys.openpgp.org was probably rejecting the short ID completely, so your code was falling back to keyserver.ubuntu.com; now it's accepting the full fingerprint, but because the email address isn't verified, it's not returning a UID (https://keys.openpgp.org/about), and GnuPG is quietly ignoring the key while still returning a successful exit code, which is less than ideal. I've followed the steps in https://keys.openpgp.org/about/usage#gnupg-upload, which should have emailed @dongjoon-hyun a verification link; once verified, you should be able to restart that build and see success (as the path of least resistance here). 🙇
(all your |
@tianon Thanks, I have contacted Dongjoon to help address this issue. After the two PRs above pass and are merged, I will update.
### What changes were proposed in this pull request?
Change the GPG key from `34F0FC5C` to `F28C9C925C188C35E345614DEDA00CE834F0FC5C` to avoid a potential collision. The full fingerprint can be obtained with the commands below:
```
$ wget https://dist.apache.org/repos/dist/dev/spark/KEYS
$ gpg --import KEYS
$ gpg --fingerprint 34F0FC5C
pub   rsa4096 2015-05-05 [SC]
      F28C 9C92 5C18 8C35 E345  614D EDA0 0CE8 34F0 FC5C
uid           [ unknown] Dongjoon Hyun (CODE SIGNING KEY) <dongjoon@apache.org>
sub   rsa4096 2015-05-05 [E]
```

### Why are the changes needed?
- A short GPG key was added as the v3.4.0 GPG key in #46.
- The short key `34F0FC5C` comes from https://dist.apache.org/repos/dist/dev/spark/KEYS.
- Per the DOI review comment (docker-library/official-images#13089 (comment)): "this should be the full key fingerprint: F28C9C925C188C35E345614DEDA00CE834F0FC5C (generating a collision for such a short key ID is trivial)."
- We should switch from the short key to the full fingerprint.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed.

Closes #50 from Yikun/gpg_key.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
### What changes were proposed in this pull request?
Add `--batch` to the gpg commands, which essentially puts GnuPG into "API mode" instead of "UI mode". Also apply the change to the 3.4.x Dockerfiles.

### Why are the changes needed?
To address the DOI comment: docker-library/official-images#13089 (comment)

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed.

Closes #51 from Yikun/batch.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Diff for 1fb130f:

diff --git a/_bashbrew-arches b/_bashbrew-arches
index 8b13789..e85a97f 100644
--- a/_bashbrew-arches
+++ b/_bashbrew-arches
@@ -1 +1,2 @@
-
+amd64
+arm64v8
diff --git a/_bashbrew-cat b/_bashbrew-cat
index bdfae4a..6bd4235 100644
--- a/_bashbrew-cat
+++ b/_bashbrew-cat
@@ -1 +1,22 @@
-Maintainers: New Image! :D (@docker-library-bot)
+Maintainers: Apache Spark Developers <dev@spark.apache.org> (@ApacheSpark)
+GitRepo: https://github.com/apache/spark-docker.git
+
+Tags: 3.4.1-scala2.12-java11-python3-r-ubuntu
+Architectures: amd64, arm64v8
+GitCommit: 58d288546e8419d229f14b62b6a653999e0390f1
+Directory: 3.4.1/scala2.12-java11-python3-r-ubuntu
+
+Tags: 3.4.1-scala2.12-java11-python3-ubuntu, 3.4.1-python3, python3, 3.4.1, latest
+Architectures: amd64, arm64v8
+GitCommit: 58d288546e8419d229f14b62b6a653999e0390f1
+Directory: 3.4.1/scala2.12-java11-python3-ubuntu
+
+Tags: 3.4.1-scala2.12-java11-r-ubuntu, 3.4.1-r, r
+Architectures: amd64, arm64v8
+GitCommit: 58d288546e8419d229f14b62b6a653999e0390f1
+Directory: 3.4.1/scala2.12-java11-r-ubuntu
+
+Tags: 3.4.1-scala2.12-java11-ubuntu, 3.4.1-scala, scala
+Architectures: amd64, arm64v8
+GitCommit: 58d288546e8419d229f14b62b6a653999e0390f1
+Directory: 3.4.1/scala2.12-java11-ubuntu
diff --git a/_bashbrew-list b/_bashbrew-list
index e69de29..d4a584b 100644
--- a/_bashbrew-list
+++ b/_bashbrew-list
@@ -0,0 +1,12 @@
+spark:3.4.1
+spark:3.4.1-python3
+spark:3.4.1-r
+spark:3.4.1-scala
+spark:3.4.1-scala2.12-java11-python3-r-ubuntu
+spark:3.4.1-scala2.12-java11-python3-ubuntu
+spark:3.4.1-scala2.12-java11-r-ubuntu
+spark:3.4.1-scala2.12-java11-ubuntu
+spark:latest
+spark:python3
+spark:r
+spark:scala
diff --git a/_bashbrew-list-build-order b/_bashbrew-list-build-order
index e69de29..66dee52 100644
--- a/_bashbrew-list-build-order
+++ b/_bashbrew-list-build-order
@@ -0,0 +1,4 @@
+spark:scala
+spark:3.4.1-scala2.12-java11-python3-r-ubuntu
+spark:latest
+spark:r
diff --git a/spark_3.4.1-scala2.12-java11-python3-r-ubuntu/Dockerfile b/spark_3.4.1-scala2.12-java11-python3-r-ubuntu/Dockerfile
new file mode 100644
index 0000000..30e6b86
--- /dev/null
+++ b/spark_3.4.1-scala2.12-java11-python3-r-ubuntu/Dockerfile
@@ -0,0 +1,29 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+FROM spark:3.4.1-scala2.12-java11-ubuntu
+
+USER root
+
+RUN set -ex; \
+ apt-get update; \
+ apt-get install -y python3 python3-pip; \
+ apt-get install -y r-base r-base-dev; \
+ rm -rf /var/lib/apt/lists/*
+
+ENV R_HOME /usr/lib/R
+
+USER spark
diff --git a/spark_latest/Dockerfile b/spark_latest/Dockerfile
new file mode 100644
index 0000000..124ef71
--- /dev/null
+++ b/spark_latest/Dockerfile
@@ -0,0 +1,26 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+FROM spark:3.4.1-scala2.12-java11-ubuntu
+
+USER root
+
+RUN set -ex; \
+ apt-get update; \
+ apt-get install -y python3 python3-pip; \
+ rm -rf /var/lib/apt/lists/*
+
+USER spark
diff --git a/spark_r/Dockerfile b/spark_r/Dockerfile
new file mode 100644
index 0000000..1c9fc38
--- /dev/null
+++ b/spark_r/Dockerfile
@@ -0,0 +1,28 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+FROM spark:3.4.1-scala2.12-java11-ubuntu
+
+USER root
+
+RUN set -ex; \
+ apt-get update; \
+ apt-get install -y r-base r-base-dev; \
+ rm -rf /var/lib/apt/lists/*
+
+ENV R_HOME /usr/lib/R
+
+USER spark
diff --git a/spark_scala/Dockerfile b/spark_scala/Dockerfile
new file mode 100644
index 0000000..d8bba7e
--- /dev/null
+++ b/spark_scala/Dockerfile
@@ -0,0 +1,79 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+FROM eclipse-temurin:11-jre-focal
+
+ARG spark_uid=185
+
+RUN groupadd --system --gid=${spark_uid} spark && \
+ useradd --system --uid=${spark_uid} --gid=spark spark
+
+RUN set -ex; \
+ apt-get update; \
+ apt-get install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user libnss3 procps net-tools gosu libnss-wrapper; \
+ mkdir -p /opt/spark; \
+ mkdir /opt/spark/python; \
+ mkdir -p /opt/spark/examples; \
+ mkdir -p /opt/spark/work-dir; \
+ chmod g+w /opt/spark/work-dir; \
+ touch /opt/spark/RELEASE; \
+ chown -R spark:spark /opt/spark; \
+ echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su; \
+ rm -rf /var/lib/apt/lists/*
+
+# Install Apache Spark
+# https://downloads.apache.org/spark/KEYS
+ENV SPARK_TGZ_URL=https://archive.apache.org/dist/spark/spark-3.4.1/spark-3.4.1-bin-hadoop3.tgz \
+ SPARK_TGZ_ASC_URL=https://archive.apache.org/dist/spark/spark-3.4.1/spark-3.4.1-bin-hadoop3.tgz.asc \
+ GPG_KEY=F28C9C925C188C35E345614DEDA00CE834F0FC5C
+
+RUN set -ex; \
+ export SPARK_TMP="$(mktemp -d)"; \
+ cd $SPARK_TMP; \
+ wget -nv -O spark.tgz "$SPARK_TGZ_URL"; \
+ wget -nv -O spark.tgz.asc "$SPARK_TGZ_ASC_URL"; \
+ export GNUPGHOME="$(mktemp -d)"; \
+ gpg --batch --keyserver hkps://keys.openpgp.org --recv-key "$GPG_KEY" || \
+ gpg --batch --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY"; \
+ gpg --batch --verify spark.tgz.asc spark.tgz; \
+ gpgconf --kill all; \
+ rm -rf "$GNUPGHOME" spark.tgz.asc; \
+ \
+ tar -xf spark.tgz --strip-components=1; \
+ chown -R spark:spark .; \
+ mv jars /opt/spark/; \
+ mv bin /opt/spark/; \
+ mv sbin /opt/spark/; \
+ mv kubernetes/dockerfiles/spark/decom.sh /opt/; \
+ mv examples /opt/spark/; \
+ mv kubernetes/tests /opt/spark/; \
+ mv data /opt/spark/; \
+ mv python/pyspark /opt/spark/python/pyspark/; \
+ mv python/lib /opt/spark/python/lib/; \
+ mv R /opt/spark/; \
+ chmod a+x /opt/decom.sh; \
+ cd ..; \
+ rm -rf "$SPARK_TMP";
+
+COPY entrypoint.sh /opt/
+
+ENV SPARK_HOME /opt/spark
+
+WORKDIR /opt/spark/work-dir
+
+USER spark
+
+ENTRYPOINT [ "/opt/entrypoint.sh" ]
diff --git a/spark_scala/entrypoint.sh b/spark_scala/entrypoint.sh
new file mode 100755
index 0000000..2e3d2a8
--- /dev/null
+++ b/spark_scala/entrypoint.sh
@@ -0,0 +1,126 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# Prevent any errors from being silently ignored
+set -eo pipefail
+
+attempt_setup_fake_passwd_entry() {
+ # Check whether there is a passwd entry for the container UID
+ local myuid; myuid="$(id -u)"
+ # If there is no passwd entry for the container UID, attempt to fake one
+ # You can also refer to the https://github.com/docker-library/official-images/pull/13089#issuecomment-1534706523
+ # It's to resolve OpenShift random UID case.
+ # See also: https://github.com/docker-library/postgres/pull/448
+ if ! getent passwd "$myuid" &> /dev/null; then
+ local wrapper
+ for wrapper in {/usr,}/lib{/*,}/libnss_wrapper.so; do
+ if [ -s "$wrapper" ]; then
+ NSS_WRAPPER_PASSWD="$(mktemp)"
+ NSS_WRAPPER_GROUP="$(mktemp)"
+ export LD_PRELOAD="$wrapper" NSS_WRAPPER_PASSWD NSS_WRAPPER_GROUP
+ local mygid; mygid="$(id -g)"
+ printf 'spark:x:%s:%s:${SPARK_USER_NAME:-anonymous uid}:%s:/bin/false\n' "$myuid" "$mygid" "$SPARK_HOME" > "$NSS_WRAPPER_PASSWD"
+ printf 'spark:x:%s:\n' "$mygid" > "$NSS_WRAPPER_GROUP"
+ break
+ fi
+ done
+ fi
+}
+
+if [ -z "$JAVA_HOME" ]; then
+ JAVA_HOME=$(java -XshowSettings:properties -version 2>&1 > /dev/null | grep 'java.home' | awk '{print $3}')
+fi
+
+SPARK_CLASSPATH="$SPARK_CLASSPATH:${SPARK_HOME}/jars/*"
+for v in "${!SPARK_JAVA_OPT_@}"; do
+ SPARK_EXECUTOR_JAVA_OPTS+=( "${!v}" )
+done
+
+if [ -n "$SPARK_EXTRA_CLASSPATH" ]; then
+ SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_EXTRA_CLASSPATH"
+fi
+
+if ! [ -z "${PYSPARK_PYTHON+x}" ]; then
+ export PYSPARK_PYTHON
+fi
+if ! [ -z "${PYSPARK_DRIVER_PYTHON+x}" ]; then
+ export PYSPARK_DRIVER_PYTHON
+fi
+
+# If HADOOP_HOME is set and SPARK_DIST_CLASSPATH is not set, set it here so Hadoop jars are available to the executor.
+# It does not set SPARK_DIST_CLASSPATH if already set, to avoid overriding customizations of this value from elsewhere e.g. Docker/K8s.
+if [ -n "${HADOOP_HOME}" ] && [ -z "${SPARK_DIST_CLASSPATH}" ]; then
+ export SPARK_DIST_CLASSPATH="$($HADOOP_HOME/bin/hadoop classpath)"
+fi
+
+if ! [ -z "${HADOOP_CONF_DIR+x}" ]; then
+ SPARK_CLASSPATH="$HADOOP_CONF_DIR:$SPARK_CLASSPATH";
+fi
+
+if ! [ -z "${SPARK_CONF_DIR+x}" ]; then
+ SPARK_CLASSPATH="$SPARK_CONF_DIR:$SPARK_CLASSPATH";
+elif ! [ -z "${SPARK_HOME+x}" ]; then
+ SPARK_CLASSPATH="$SPARK_HOME/conf:$SPARK_CLASSPATH";
+fi
+
+# Switch to spark if no USER specified (root by default) otherwise use USER directly
+switch_spark_if_root() {
+ if [ $(id -u) -eq 0 ]; then
+ echo gosu spark
+ fi
+}
+
+case "$1" in
+ driver)
+ shift 1
+ CMD=(
+ "$SPARK_HOME/bin/spark-submit"
+ --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS"
+ --deploy-mode client
+ "$@"
+ )
+ attempt_setup_fake_passwd_entry
+ # Execute the container CMD under tini for better hygiene
+ exec $(switch_spark_if_root) /usr/bin/tini -s -- "${CMD[@]}"
+ ;;
+ executor)
+ shift 1
+ CMD=(
+ ${JAVA_HOME}/bin/java
+ "${SPARK_EXECUTOR_JAVA_OPTS[@]}"
+ -Xms"$SPARK_EXECUTOR_MEMORY"
+ -Xmx"$SPARK_EXECUTOR_MEMORY"
+ -cp "$SPARK_CLASSPATH:$SPARK_DIST_CLASSPATH"
+ org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBackend
+ --driver-url "$SPARK_DRIVER_URL"
+ --executor-id "$SPARK_EXECUTOR_ID"
+ --cores "$SPARK_EXECUTOR_CORES"
+ --app-id "$SPARK_APPLICATION_ID"
+ --hostname "$SPARK_EXECUTOR_POD_IP"
+ --resourceProfileId "$SPARK_RESOURCE_PROFILE_ID"
+ --podName "$SPARK_EXECUTOR_POD_NAME"
+ )
+ attempt_setup_fake_passwd_entry
+ # Execute the container CMD under tini for better hygiene
+ exec $(switch_spark_if_root) /usr/bin/tini -s -- "${CMD[@]}"
+ ;;
+
+ *)
+ # Non-spark-on-k8s command provided, proceeding in pass-through mode...
+ exec "$@"
+ ;;
esac
All comments addressed. Ready for review again.
@yosifkit Would you mind taking a look when you get a chance? Thanks!
awesome! |
congrats! @Yikun |
Thank you!
@yosifkit @tianon Many thanks for your help! I also ran a post-publication test, and it passed all checks! Thanks all. cc @gatorsmile @erikerlandson @HyukjinKwon @zhengruifeng @pan3793 @emiliofernandes @julien-faye
This patch adds the Apache Spark Docker Official Image.
Checklist for Review
NOTE: This checklist is intended for the use of the Official Images maintainers both to track the status of your PR and to help inform you and others of where we're at. As such, please leave the "checking" of items to the repository maintainers. If there is a point below for which you would like to provide additional information or note completion, please do so by commenting on the PR. Thanks! (and thanks for staying patient with us ❤️)
[ ] if `foobar` needs Node.js, has `FROM node:...` instead of grabbing `node` via other means been considered? (the base image here is `eclipse-temurin`)
[ ] if `FROM scratch`, tarballs only exist in a single commit within the associated history?