[SPARK-24353][K8S][2.4] Support for specifying multiple custom tolerations to driver and executor pods #26505
What changes were proposed in this pull request?
This PR allows a user to specify custom Kubernetes tolerations for both driver and executor pods when running Spark on Kubernetes.
The user can now set the key, operator, value and effect properties of each toleration when running spark-submit, using configuration keys prefixed with:
spark.kubernetes.driver.tolerations.n
or spark.kubernetes.executor.tolerations.n
where n is an index starting from 0, incremented by 1 for each additional toleration.
Why not use the key field, as per the original design doc attached to the JIRA issue?
As per the Kubernetes toleration v1 spec, the key field can be unset, which disqualifies it as a unique identifier.
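For instance, a keyless toleration with operator Exists is valid and tolerates every taint, so two configured tolerations can share the same (unset) key; a minimal sketch in this PR's configuration scheme (property values are illustrative):

```sh
# A toleration may omit its key entirely: with operator=Exists and no
# key it matches all taints, so two tolerations can share an unset key
# and only the index n can tell them apart.
--conf spark.kubernetes.executor.tolerations.0.operator=Exists
--conf spark.kubernetes.executor.tolerations.1.operator=Exists
--conf spark.kubernetes.executor.tolerations.1.effect=NoSchedule
```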
Some examples:
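A sketch of the intended usage, giving both driver and executor pods a toleration for a dedicated=spark:NoSchedule node taint (the taint key/value, API server address, image and application are illustrative; only the spark.kubernetes.*.tolerations.n.* property scheme comes from this PR):

```sh
# Illustrative spark-submit: both driver and executor pods tolerate a
# hypothetical dedicated=spark:NoSchedule taint on the worker nodes.
spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=<spark-image> \
  --conf spark.kubernetes.driver.tolerations.0.key=dedicated \
  --conf spark.kubernetes.driver.tolerations.0.operator=Equal \
  --conf spark.kubernetes.driver.tolerations.0.value=spark \
  --conf spark.kubernetes.driver.tolerations.0.effect=NoSchedule \
  --conf spark.kubernetes.executor.tolerations.0.key=dedicated \
  --conf spark.kubernetes.executor.tolerations.0.operator=Equal \
  --conf spark.kubernetes.executor.tolerations.0.value=spark \
  --conf spark.kubernetes.executor.tolerations.0.effect=NoSchedule \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar
```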
Why are the changes needed?
Spark already supports node selectors, which help pin driver/executor pods to a specific worker group. However, without tainting the workers and setting at least one matching toleration on the pods, the specialised worker group cannot repel non-Spark pods, so unrelated and potentially unsuitable pods could be scheduled onto it.
This matters for many use cases, such as running Spark jobs on AWS spot instances, which can be reclaimed by Amazon at any time, making them unsuitable for every kind of workload.
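For illustration, the combined setup might look like this, with the node selector pinning the pods and the new tolerations letting them onto the tainted workers (node name, label and taint values are hypothetical; spark.kubernetes.node.selector.* is the pre-existing node-selector configuration):

```sh
# Label and taint a dedicated worker so it repels all other pods
# (node name and key/value pairs are hypothetical):
kubectl label nodes node-1 workergroup=spark
kubectl taint nodes node-1 dedicated=spark:NoSchedule

# Then pin Spark pods to that worker group and let them tolerate
# the taint via the tolerations introduced by this PR:
#   --conf spark.kubernetes.node.selector.workergroup=spark
#   --conf spark.kubernetes.driver.tolerations.0.key=dedicated
#   --conf spark.kubernetes.executor.tolerations.0.key=dedicated
```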
Does this PR introduce any user-facing change?
No. It adds a new, optional feature; existing behavior is unchanged.
How was this patch tested?
Added both unit tests and K8s integration tests.