[SPARK-23529][K8s] Support mounting volumes #21260
Conversation
This is indeed concerning, given that we don't yet support a lot of pod customization options, e.g., affinity and anti-affinity, security context, etc. Ideally, pod specs would be specified declaratively, as in Deployment and StatefulSet, but Spark is configuration-property based. The Spark Operator attempted to address this using initializers, but initializers are alpha and dangerous. Admission webhooks are an option, but again, they pose risks.
@liyinan926 Sounds fair. It's also pending tests; I'll add those to the todo list for this PR and ping you once it's done.
How does one configure a PV/PVC with this change?
@er0sin PVCs can be mounted similarly to the example below:
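To illustrate, a minimal sketch of the property scheme this PR adds, written as SparkConf settings; the volume name data, claim name my-pvc, and mount path are made up, and executor-side properties use the executor prefix analogously:

import org.apache.spark.SparkConf

// Mount the existing claim "my-pvc" into the driver pod at /data.
// Key pattern: spark.kubernetes.driver.volumes.[type].[name].{mount,options}.*
val conf = new SparkConf()
  .set("spark.kubernetes.driver.volumes.persistentVolumeClaim.data.mount.path", "/data")
  .set("spark.kubernetes.driver.volumes.persistentVolumeClaim.data.mount.readOnly", "false")
  .set("spark.kubernetes.driver.volumes.persistentVolumeClaim.data.options.claimName", "my-pvc")

The same keys can also be passed as --conf arguments to spark-submit.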
@liyinan926 Would appreciate feedback on the documentation, as we have to make clear which features are supported. A few follow-up refactorings might also be done.
@AmplabJenkins failed due to GitHub flakiness. Please retest.
@andrusha Is this ready for review?
@liyinan926 Yep, ready for review. If you think it makes sense to do the refactorings I mentioned, I can add those too.
* @param prefix the given property name prefix
* @return a Map storing volume name as key and spec as value
*/
def parseVolumesWithPrefix(
Tests are missing
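For readers following along, here is a hypothetical sketch of the prefix-based parsing idea behind this method; it is not the PR's actual implementation, and the return shape is illustrative:

// Group flat properties such as
//   <prefix>persistentVolumeClaim.data.mount.path
// into one options map per (volumeType, volumeName) pair.
// Malformed keys (fewer than three segments after the prefix) are not handled here.
def parseVolumesWithPrefix(
    properties: Map[String, String],
    prefix: String): Map[(String, String), Map[String, String]] = {
  properties
    .filter { case (key, _) => key.startsWith(prefix) }
    .toSeq
    .map { case (key, value) =>
      val Array(volumeType, volumeName, option) = key.stripPrefix(prefix).split("\\.", 3)
      (volumeType, volumeName) -> (option -> value)
    }
    .groupBy(_._1)
    .map { case (typeAndName, entries) => typeAndName -> entries.map(_._2).toMap }
}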
import org.apache.spark.deploy.k8s._

private[spark] class MountVolumesFeatureStep(
  kubernetesConf: KubernetesConf[_ <: KubernetesRoleSpecificConf])
The class parameter line should be indented 4 spaces.
val sizeLimitKey = s"$volumeType.$volumeName.$KUBERNETES_VOLUMES_OPTIONS_SIZE_LIMIT_KEY"
for {
  medium <- options.getTry(mediumKey)
  sizeLimit <- options.getTry(sizeLimitKey)
Both Medium and SizeLimit are optional for EmptyDirVolumeSource, so users are not required to set them. What happens with options.getTry if users do not set the key?
Fixed. See https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#emptydirvolumesource-v1-core
I don't quite like that you have to pass "" and null to k8s, but it works.
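For reference, a minimal Scala sketch of that resolution using the fabric8 model classes the K8s backend builds on; the options map and key names below are illustrative, not the PR's exact code:

import io.fabric8.kubernetes.api.model.{EmptyDirVolumeSourceBuilder, Quantity}

// Illustrative inputs; in the real code these come from the parsed Spark properties.
val options: Map[String, String] = Map("medium" -> "Memory")   // sizeLimit left unset
val (mediumKey, sizeLimitKey) = ("medium", "sizeLimit")

// Both settings are optional: an empty medium selects the node's default
// storage medium, and a null size limit means "no limit".
val emptyDir = new EmptyDirVolumeSourceBuilder()
  .withMedium(options.get(mediumKey).getOrElse(""))
  .withSizeLimit(options.get(sizeLimitKey).map(new Quantity(_)).orNull)
  .build()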
for {
  path <- properties.getTry(pathKey)
  readOnly <- properties.getTry(readOnlyKey)
ReadOnly is optional and defaults to false, so users are not required to set it explicitly. Are these semantics captured by properties.getTry?
Made it optional with a default value of false. We basically ended up with two ways to parse Spark options now: one custom for volumes, since there are just so many combinations, and one that the rest of Spark uses. It must be possible to refactor this somehow, maybe by improving the Spark options parser itself.
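As an illustration of that default handling (the properties map and key names here are hypothetical, not the PR's):

// The mount path is required; readOnly falls back to false when the key is absent.
val properties = Map("mount.path" -> "/data")        // readOnly left unset
val (pathKey, readOnlyKey) = ("mount.path", "mount.readOnly")

val path: String = properties.getOrElse(pathKey,
  throw new IllegalArgumentException(s"Missing required property $pathKey"))
val readOnly: Boolean = properties.get(readOnlyKey).exists(_.toBoolean)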
LGTM.
@felixcheung @mccheah Can you take a look and merge this?
@felixcheung gentle ping, this is pretty useful.
Will review today.
Nice work. The only thing I would consider is including a StorageClass name option for the PersistentVolumeClaim volume type, defaulting to an empty string. Without that, the PVC will always use the default StorageClass, which may not exist in all scenarios, and the pod will remain in a Pending state indefinitely.
@skonto Is it better to generalize the approach to match the one in https://issues.apache.org/jira/browse/SPARK-24434? Not sure if @mccheah @foxish @erikerlandson have any last thoughts.
@felixcheung This feature was discussed and this PR was started before https://issues.apache.org/jira/browse/SPARK-24434 was even brought up. Being able to mount commonly used types of volumes seems super useful for some users, so it might make sense to accept it while https://issues.apache.org/jira/browse/SPARK-24434 is still going through design review.
OK, that makes sense. Could we get some clear guidance in SPARK-24434 on how to decide what should be a conf and what should be in an external template file?
LGTM, would be great to add a complete example in running-on-kubernetes.md as a follow-up
(will wait for 24hr, then merge)
We talked about no longer accepting any new config options for customizing the driver/executor pods. Moving forward, all new customization needs will be fulfilled by the solution to SPARK-24434.
Merged to master.
@felixcheung Volumes will be supported by the pod template, so you will be able to do this without the conf options defined here. If both Spark conf and template properties exist, we have defined a precedence order in the design doc.
This PR continues #21095 and intersects with #21238. I've added volume mounts as a separate step and added PersistentVolumeClaim support.
There is a fundamental problem with how we pass the options through Spark conf to fabric8: for each volume type and all possible volume options, we would have to implement custom code that maps config values onto fabric8 calls. This results in a large body of code to maintain and means that Spark will always be somewhat out of sync with Kubernetes.
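To make that concrete, here is a hedged sketch of the kind of per-type mapping code this implies; the VolumeSpec shape is made up, though the fabric8 builder calls are the ones such code would typically use:

import io.fabric8.kubernetes.api.model.{Volume, VolumeBuilder}

// Illustrative spec shape, not the PR's exact case class.
case class VolumeSpec(volumeType: String, name: String, options: Map[String, String])

// Every supported volume type needs its own branch translating Spark conf
// values into the corresponding fabric8 builder calls.
def toVolume(spec: VolumeSpec): Volume = spec.volumeType match {
  case "hostPath" =>
    new VolumeBuilder()
      .withName(spec.name)
      .withNewHostPath()
        .withPath(spec.options("path"))
        .endHostPath()
      .build()
  case "persistentVolumeClaim" =>
    new VolumeBuilder()
      .withName(spec.name)
      .withNewPersistentVolumeClaim()
        .withClaimName(spec.options("claimName"))
        .endPersistentVolumeClaim()
      .build()
  case other =>
    throw new IllegalArgumentException(s"Unsupported volume type: $other")
}

Adding a new volume type means adding another branch like these, which is the maintenance burden described above.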
I think there needs to be a discussion on how to proceed correctly (e.g., use PodPreset instead).
Due to the complications of provisioning and managing actual resources, this PR addresses only the mounting of already-present resources.