-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-32067][K8S] Use unique ConfigMap name for executor pod template #29934
[SPARK-32067][K8S] Use unique ConfigMap name for executor pod template #29934
Conversation
Can you file a jira and link it to the PR title? See also https://spark.apache.org/contributing.html |
@onursatici @vanzin I see you guys have changed parts of this code would you kindly review? |
@HyukjinKwon Sorry forgot to add the existing JIRA ticket in the title. I also added the K8S component in the title now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please consider using the already-defined podspec-configmap
constant.
...etes/core/src/main/scala/org/apache/spark/deploy/k8s/features/PodTemplateConfigMapStep.scala
Outdated
Show resolved
Hide resolved
...core/src/test/scala/org/apache/spark/deploy/k8s/features/PodTemplateConfigMapStepSuite.scala
Outdated
Show resolved
Hide resolved
...core/src/test/scala/org/apache/spark/deploy/k8s/features/PodTemplateConfigMapStepSuite.scala
Outdated
Show resolved
Hide resolved
ok to test |
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #129385 has finished for PR 29934 at commit
|
Test build #129389 has finished for PR 29934 at commit
|
Test build #129390 has finished for PR 29934 at commit
|
Kubernetes integration test starting |
Kubernetes integration test starting |
Kubernetes integration test status success |
Kubernetes integration test status success |
Thank you for updates, @stijndehaes . |
...etes/core/src/main/scala/org/apache/spark/deploy/k8s/features/PodTemplateConfigMapStep.scala
Outdated
Show resolved
Hide resolved
...etes/core/src/main/scala/org/apache/spark/deploy/k8s/features/PodTemplateConfigMapStep.scala
Outdated
Show resolved
Hide resolved
...etes/core/src/main/scala/org/apache/spark/deploy/k8s/features/PodTemplateConfigMapStep.scala
Outdated
Show resolved
Hide resolved
...etes/core/src/main/scala/org/apache/spark/deploy/k8s/features/PodTemplateConfigMapStep.scala
Outdated
Show resolved
Hide resolved
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala
Outdated
Show resolved
Hide resolved
Test build #129399 has finished for PR 29934 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #129419 has finished for PR 29934 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status success |
Wow, it's much simpler, @stijndehaes . :) |
@dongjoon-hyun Could you please consider include this bug fix as part of any coming release. This is a much needed fix for probably many people. |
Of course, @yuj . I'm considering this for Apache Spark 3.1/3.0.2/2.4.8. I've been reviewing this PR since two days ago. |
@@ -31,14 +31,16 @@ private[spark] class PodTemplateConfigMapStep(conf: KubernetesConf) | |||
|
|||
private val hasTemplate = conf.contains(KUBERNETES_EXECUTOR_PODTEMPLATE_FILE) | |||
|
|||
private val configmapName = s"${conf.resourceNamePrefix}-$POD_TEMPLATE_CONFIGMAP" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes the following status.
tpcds-py-eebc4474fa60b6f7-driver-conf-map 1 5s
tpcds-py-eebc4474fa60b6f7-podspec-configmap 1 5s
This is inconsistent because Driver is using driver-conf-map
postfix and executor is using podspec-configmap
.
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala: val configMapName = s"${conf.resourceNamePrefix}-driver-conf-map"
Let's use -exec-conf-map
as a postfix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wel actually this configmap is not used by the executors but also by the driver. The driver needs this configmap to spawn executors with the right template. So I'll change the name to:driver-podspec-conf-map
. Let me know if that is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you share your result from k get cm
? Does it looks reasonably consistent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, please update your PR description. I added AFTER paragraph at this PR description. You can fill in the final output there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During running the integration tests when watching the configmap it looks something like this:
aaece65ef82e4a30b7b7800aad600d4f spark-test-app-aac9f37502b2ca55-driver-conf-map 1 0s
aaece65ef82e4a30b7b7800aad600d4f spark-test-app-aac9f37502b2ca55-driver-podspec-conf-map 1 0
For me this looks nice because it's clear that the confimap is used by the driver. And when sorted alphabetically they are neatly together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thank you, @stijndehaes .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stijndehaes . This PR looks good except the consistency between driver and executor. Please update it.
The pod template configmap always had the same name for each job submitted. This means if you schedule 2 spark jobs in the same namespace there will be conflicts.
Test build #129433 has finished for PR 29934 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status success |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Merged to master/3.0 for Apache Spark 3.1.0/3.0.2.
### What changes were proposed in this pull request? The pod template configmap always had the same name. This PR makes it unique. ### Why are the changes needed? If you scheduled 2 spark jobs they will both use the same configmap name this will result in conflicts. This PR fixes that **BEFORE** ``` $ kubectl get cm --all-namespaces -w | grep podspec podspec-configmap 1 65s ``` **AFTER** ``` $ kubectl get cm --all-namespaces -w | grep podspec aaece65ef82e4a30b7b7800aad600d4f spark-test-app-aac9f37502b2ca55-driver-podspec-conf-map 1 0s ``` This can be seen when running the integration tests ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit tests and the integration tests test if this works Closes #29934 from stijndehaes/bugfix/SPARK-32067-unique-name-for-template-configmap. Authored-by: Stijn De Haes <stijndehaes@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com> (cherry picked from commit 3099fd9) Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Thank you again, @stijndehaes . I added you to the Apache Spark contributor group and assigned SPARK-32067 to you. Welcome to the Apache Spark community again! |
Awesome. Thanks guys @stijndehaes @dongjoon-hyun |
@dongjoon-hyun @yuj Thanks for all the help guys :) |
### What changes were proposed in this pull request? The pod template configmap always had the same name. This PR makes it unique. ### Why are the changes needed? If you scheduled 2 spark jobs they will both use the same configmap name this will result in conflicts. This PR fixes that **BEFORE** ``` $ kubectl get cm --all-namespaces -w | grep podspec podspec-configmap 1 65s ``` **AFTER** ``` $ kubectl get cm --all-namespaces -w | grep podspec aaece65ef82e4a30b7b7800aad600d4f spark-test-app-aac9f37502b2ca55-driver-podspec-conf-map 1 0s ``` This can be seen when running the integration tests ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit tests and the integration tests test if this works Closes apache#29934 from stijndehaes/bugfix/SPARK-32067-unique-name-for-template-configmap. Authored-by: Stijn De Haes <stijndehaes@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com> (cherry picked from commit 3099fd9) Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
What changes were proposed in this pull request?
The pod template configmap always had the same name. This PR makes it unique.
Why are the changes needed?
If you scheduled 2 spark jobs they will both use the same configmap name this will result in conflicts. This PR fixes that
BEFORE
AFTER
This can be seen when running the integration tests
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit tests and the integration tests test if this works