[SPARK-35956][K8S] Support auto assigning labels to decommissioning pods #33270
Conversation
Test build #140814 has finished for PR 33270 at commit
Kubernetes integration test starting
Kubernetes integration test status: success
```scala
@@ -284,6 +284,24 @@ private[spark] object Config extends Logging {
    .toSequence
    .createWithDefault(Nil)

  val KUBERNETES_EXECUTOR_POD_DECOMMISSION_LABEL =
```
Doc update needed for the new config: docs/running-on-kubernetes.md.
Good call :)
```scala
  val KUBERNETES_EXECUTOR_POD_DECOMMISSION_LABEL_VALUE =
    ConfigBuilder("spark.kubernetes.executor.pod.decommmissionLabelValue")
      .doc("Label value to apply to a pod which is being decommissioned." +
        " Designed for use with pod disruption budgets and similar mechanism " +
```
ditto, redundant space
```scala
  val KUBERNETES_EXECUTOR_POD_DECOMMISSION_LABEL =
    ConfigBuilder("spark.kubernetes.executor.pod.decommmissionLabel")
      .doc("Label to apply to a pod which is being decommissioned." +
        " Designed for use with pod disruption budgets and similar mechanism " +
```
nit: redundant space at line end.
Thanks :)
```scala
      .doc("Label value to apply to a pod which is being decommissioned." +
        " Designed for use with pod disruption budgets and similar mechanism " +
        " such as pod-deletion-cost.")
      .version("3.2.0")
```
And maybe 3.3.0 for now?
Yeah good point
Sorry for being late, @holdenk. I was on vacation last week.
```scala
@@ -181,7 +184,51 @@ private[spark] class KubernetesClusterSchedulerBackend(
    super.getExecutorIds()
  }

  private def labelLowPriorityExecs(execIds: Seq[String]) = {
```
The naming `labelLowPriorityExecs` looks misleading to me because the pods already have a `PriorityClass`.
Good point. We could say `labelExecsForDecommissioning`?
Sounds good to me.
```scala
        " such as pod-deletion-cost.")
      .version("3.2.0")
      .stringConf
      .createWithDefault("")
```
Shall we use `createOptional`?
So my thought was that for a label's value we'd use an empty string if the user doesn't specify one, but I'm down for something else.
Yes, we can still attach the label without a value by using a `createOptional` conf.
So I don't have strong feelings about this one, but the label does need a value (albeit the empty string is a valid value), so I've switched to use optional and then we do a `getOrElse` on it, but I can switch it back.
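The `createOptional` vs. `createWithDefault("")` trade-off discussed here can be sketched in plain Scala (a simplified illustration, not Spark's actual `ConfigBuilder` API; the object and member names below are hypothetical):

```scala
// Simplified stand-in for the two config styles under discussion.
// With createWithDefault("") the resolved type is String; with
// createOptional it is Option[String] and callers resolve it via getOrElse.
object DecommissionLabelConf {
  // Hypothetical resolved values, as if read from a SparkConf.
  val valueWithDefault: String = ""          // createWithDefault("") style
  val valueOptional: Option[String] = None   // createOptional style

  // A Kubernetes label must still carry some value; the empty string is
  // a valid label value, so both styles resolve to the same thing here.
  def resolvedLabelValue: String = valueOptional.getOrElse("")
}
```

The upside of `createOptional` is that "unset" is distinguishable from "explicitly set to empty", at the cost of one `getOrElse` at the use site.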
Could you address the review comments from @Yikun? I also added some comments.
cc @attilapiros
Test build #141090 has finished for PR 33270 at commit
Kubernetes integration test starting
Kubernetes integration test status: success
Thank you for updating, @holdenk.
I think we could extend the integration tests of `DecommissionSuite` as well with the `decommmissionLabel` and `decommmissionLabelValue` configs, and at the end of the tests we could check whether the label is set. WDYT?
Yeah, I think we can extend that test suite to pick it up as well so we aren't just depending on mocks/unit tests. I'll give that a shot.
Force-pushed 937732e to 83be9be.
Ah now I'm remembering why I didn't change the …
Test build #141353 has finished for PR 33270 at commit
Kubernetes integration test starting
Kubernetes integration test status: success
Test build #141368 has finished for PR 33270 at commit
Kubernetes integration test starting
Kubernetes integration test status: failure
Could you take a look at the K8s IT failure? You may want to simply re-trigger it.
Test build #141434 has finished for PR 33270 at commit
Increase timeouts for running in CI.
Bump up timeout
…L_VALUE as suggested by dongjoon.
Force-pushed e21d2c7 to bb52a4c.
Test build #141511 has finished for PR 33270 at commit
Like the existing Spark label configuration `spark.kubernetes.executor.label.`, shall we use `spark.kubernetes.executor.decommmissionLabel` instead of adding a new `.pod.` namespace?
Kubernetes integration test starting
docs/running-on-kubernetes.md

```html
</tr>
<tr>
  <td><code>spark.kubernetes.executor.pod.decommmissionLabelValue</code></td>
  <td>""</td>
```
`""` -> `(none)`.
Thanks, good catch.
Kubernetes integration test status: success
+1, LGTM (with two minor comments). K8s IT passed, too.
- Avoiding the `.pod.` namespace in the conf naming.
- Fixing the default value for `decommmissionLabelValue`.

Thank you, @holdenk. After revising them, feel free to merge.
Test build #141525 has finished for PR 33270 at commit
Kubernetes integration test starting
Kubernetes integration test status: failure
Test build #141533 has finished for PR 33270 at commit
Kubernetes integration test starting
Kubernetes integration test status: success
I went through it again and I haven't found any issue.
LGTM
Merged to the current master.
### What changes were proposed in this pull request?
This is a follow-up of #33270 to fix typos in config names:

```scala
- ConfigBuilder("spark.kubernetes.executor.decommmissionLabel")
+ ConfigBuilder("spark.kubernetes.executor.decommissionLabel")
- ConfigBuilder("spark.kubernetes.executor.decommmissionLabelValue")
+ ConfigBuilder("spark.kubernetes.executor.decommissionLabelValue")
```

### Why are the changes needed?
To fix them before the Apache Spark 3.3 branch cut.

### Does this PR introduce _any_ user-facing change?
No, this is not released yet.

### How was this patch tested?
Pass the CIs.

Closes #35767 from dongjoon-hyun/SPARK-35956.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?
Add a new configuration flag to allow Spark to provide hints to the scheduler, when we are decommissioning or exiting a pod, that this pod will have the least impact for a pre-emption event.
Why are the changes needed?
Kubernetes added the concept of pod disruption budgets (which can have selectors based on labels), as well as pod-deletion-cost, for providing hints to the scheduler as to what we would prefer to have pre-empted.
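As a rough sketch of how the new configs feed into this mechanism (the label key and value below are hypothetical user-chosen settings, not defaults shipped by Spark):

```scala
// Hypothetical values a user might set for
// spark.kubernetes.executor.decommissionLabel / ...decommissionLabelValue.
// A PodDisruptionBudget selector (or a pod-deletion-cost policy) can then
// match this key/value pair to prefer evicting already-decommissioning pods.
val decommissionLabel = "spark-exec-state"
val decommissionLabelValue = "decommissioning"

// Labels merged onto an executor pod once it starts decommissioning.
val baseLabels = Map("spark-role" -> "executor")
val decommissionedPodLabels: Map[String, String] =
  baseLabels + (decommissionLabel -> decommissionLabelValue)
```

The existing pod labels are preserved; only the extra decommission marker is added, so label-based selectors elsewhere keep working.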
Does this PR introduce any user-facing change?
New configuration flag
How was this patch tested?
The deletion unit test was extended.