[SPARK-33759][K8S] docker entrypoint should use spark-class for spark executor #30738
Conversation
@dungdm93, can you file a JIRA and keep the format in the PR title? See also http://spark.apache.org/contributing.html
Signed-off-by: Đặng Minh Dũng <dungdm93@live.com>
ok to test
cc @vanzin FYI
Test build #132712 has finished for PR 30738 at commit
Kubernetes integration test starting
Kubernetes integration test status success
WDYT @dongjoon-hyun, @tgravescs, @holdenk ?
I thought most of our launchers use straight java, i.e. YARN, standalone, etc. They set up the env as necessary for the executor side. So without more detail I'm against the change, but I'm not as familiar with the k8s side here, so if there is something broken in the executor env can you please describe in more detail what is happening so we can decide on an appropriate fix.
@tgravescs I understand that env vars can be passed to the driver & executors in k8s. But the main focus of this change is supporting pre-start hooks. For example, in our in-house DWH system we have an S3-compatible service with a self-signed certificate, so a custom CA needs to be imported into both driver & executor pods.
FYI, we already use
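For reference, a minimal sketch of the kind of pre-start hook described above, assuming the custom CA is shipped into the pod at $SPARK_CONF_DIR/ca.crt (the alias and paths are illustrative):

# spark-env.sh -- only picked up if the process is launched through spark-class/spark-submit.
# Import the self-signed CA into the JVM truststore before the JVM starts,
# so connections to the in-house S3-compatible endpoint are trusted.
keytool -import -noprompt \
  -alias "acme.corp" \
  -file "$SPARK_CONF_DIR/ca.crt" \
  -keystore "$JAVA_HOME/lib/security/cacerts" \
  -storepass changeit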
So that seems like a very specific use case that could very well break other people or put more requirements on the container anyway.
Let's drop this anyway, since it looks like it is going to break other users' applications very easily given the discussion above.
It would be good to hear back and see if there is an alternate solution to the problem.
@tgravescs I don't know about ExecutorPlugin; which plugin should I use?
@HyukjinKwon I don't understand how this change could break other users' applications.
@tgravescs Another way would be to create a startup-hooks script in
It's a matter of the confs and environment, what is in the docker images, and keeping things consistent. If I create a docker image that contains Spark and Spark confs with a bunch of things set in spark-env.sh, but then when I launch my job I change the configs or specifically launch it with different config files, it's still going to pick up the spark-env.sh from the docker image. It shouldn't do that, to keep things consistent and get reliable results. There is a driver and executor plugin that runs at launch; it's a developer API, so probably not great docs on it. It does run after start, though, so I'm not sure exactly when you need this to run.
@dungdm93 could you use Spark's pod templates to specify a Kubernetes pre-start hook (e.g. https://www.decodingdevops.com/kubernetes-pod-lifecycle-tutorial/)?
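For reference, a minimal sketch of that suggestion, assuming an executor pod template wired in via spark.kubernetes.executor.podTemplateFile; the container name and the hook command are illustrative:

# executor-pod-template.yaml, passed via
#   --conf spark.kubernetes.executor.podTemplateFile=/path/to/executor-pod-template.yaml
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: spark-kubernetes-executor
      lifecycle:
        postStart:
          exec:
            command:
              - /bin/sh
              - -c
              - keytool -import -noprompt -alias acme.corp -file /opt/spark/conf/ca.crt -keystore $JAVA_HOME/lib/security/cacerts -storepass changeit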
@tgravescs here is an example of my deployment.

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: pyspark-pi
  namespace: default
spec:
  type: Python
  sparkVersion: "3.0.0"
  pythonVersion: "3"
  mode: cluster
  image: hub.acme.corp/library/pyspark:v3.0.0
  imagePullPolicy: Always
  mainApplicationFile: local://path/to/python/main.py
  hadoopConf:
    fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
    fs.s3a.endpoint: https://minio.acme.corp:443/
    fs.s3a.path.style.access: "true"
    fs.s3a.access.key: sample-access-key
    fs.s3a.secret.key: sample-secret-key
    # other config here
  sparkConfigMap: spark-conf
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    serviceAccount: spark
  executor:
    instances: 1
    cores: 1
    memory: "512m"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: spark-conf
  namespace: default
data:
  spark-env.sh: |
    keytool -import -alias "acme.corp" -noprompt -file $SPARK_CONF_DIR/ca.crt -keystore $JAVA_HOME/lib/security/cacerts -storepass changeit
  ca.crt: |
    -----BEGIN CERTIFICATE-----
    ....
    -----END CERTIFICATE-----

A few points worth noting:
@holdenk thanks for your suggestion. I tried that feature, but it fails in some situations because Kubernetes doesn't guarantee that the hook will execute before the container entrypoint. You can check out the k8s documentation here.
Hi, @dungdm93. Apache Spark distributions provide docker files and build scripts instead of a docker image. It seems that you can do the following to achieve your use case. What do you think about that? It's just one line before you build your docker image.

$ sed -i.bak 's/${JAVA_HOME}\/bin\/java/\"\$SPARK_HOME\/bin\/spark-class\"/' kubernetes/dockerfiles/spark/entrypoint.sh
$ bin/docker-image-tool.sh -p kubernetes/dockerfiles/spark/bindings/python/Dockerfile -n build
@dongjoon-hyun Yes, it's OK.
Or, you may want to switch this PR to support
Can one of the admins verify this patch?
I'm closing this PR for now. Please feel free to reopen it if there is any change. Happy New Year!
Signed-off-by: Đặng Minh Dũng <dungdm93@live.com>
What changes were proposed in this pull request?
In the docker entrypoint.sh, the Spark executor should use spark-class instead of the pure java command.

spark/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh
Lines 70 to 102 in 8b97b19
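As a rough, abridged sketch of the proposed change (the full executor case spans lines 70 to 102 of entrypoint.sh, and the forwarding of executor JVM options and classpath is elided here), the executor launch goes from invoking the JVM directly to going through spark-class, which sources bin/load-spark-env.sh and therefore $SPARK_HOME/conf/spark-env.sh, the same way spark-submit already does for the driver:

# Before (abridged): the executor branch builds the JVM command directly,
# so $SPARK_HOME/conf/spark-env.sh is never sourced.
executor)
  CMD=(
    ${JAVA_HOME}/bin/java
    -cp "$SPARK_CLASSPATH"
    org.apache.spark.executor.CoarseGrainedExecutorBackend
    --driver-url "$SPARK_DRIVER_URL"
    --executor-id "$SPARK_EXECUTOR_ID"
    --hostname "$SPARK_EXECUTOR_POD_IP"
  )
  ;;

# After (abridged): launch through spark-class, which sources
# bin/load-spark-env.sh and therefore $SPARK_HOME/conf/spark-env.sh.
executor)
  CMD=(
    "$SPARK_HOME/bin/spark-class"
    org.apache.spark.executor.CoarseGrainedExecutorBackend
    --driver-url "$SPARK_DRIVER_URL"
    --executor-id "$SPARK_EXECUTOR_ID"
    --hostname "$SPARK_EXECUTOR_POD_IP"
  )
  ;;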
Why are the changes needed?
In the docker entrypoint.sh, the Spark driver uses the spark-submit command but the Spark executor uses a pure java command, which doesn't load spark-env.sh from the $SPARK_HOME/conf directory. This can lead to a configuration mismatch between the driver and executors when spark-env.sh contains something like custom env vars or pre-start hooks.

Does this PR introduce any user-facing change?
N/A
How was this patch tested?