[SPARK-32971][K8S] Support dynamic PVC creation/deletion for K8s executors #29846
Conversation
Test build #129011 has finished for PR 29846 at commit
Test build #129013 has finished for PR 29846 at commit
Kubernetes integration test starting
Kubernetes integration test starting
Kubernetes integration test status failure
Kubernetes integration test status failure
cc @holdenk, could you take a look at the test failure, please? K8s PRs are blocked by it.
The decommission failure is unrelated to this one.
I agree this PR isn't touching anything in the decommissioning logic. That being said, I'll spend some time today on the decommissioning integration tests.
Thanks, @holdenk.
Thanks @holdenk and @dongjoon-hyun.
This looks reasonable at first glance; one minor question about whether it makes sense to make more parts of the PVC creation configurable while we're here.
// Add an OwnerReference to the given resources, making the pod an owner of them so when
// the pod is deleted, the resources are garbage collected.
def addOwnerReference(pod: Pod, resources: Seq[HasMetadata]): Unit = {
I like this :)
Initially I was going to ask about re-using the PVCs, because I know that in some systems PVC creation is slow (for dynamic scaling), but it's probably better to return the resources than to build some weird storage pool like I was thinking of.
val PVC_ON_DEMAND = "OnDemand"
val PVC = "PersistentVolumeClaim"
val PVC_POSTFIX = "-pvc"
val PVC_ACCESS_MODE = "ReadWriteOnce"
Would it make sense to have the OnDemand, postfix, and access mode configurable?
Ya, it's possible, but I didn't do that in this PR for the following reasons (a small sketch of the naming follows this list):

- `PVC_ON_DEMAND`: It's a dummy placeholder because the existing code expects some pre-defined names. We had better recommend a fixed one instead of making it configurable in this case.
- `PVC_POSTFIX`: It could be configurable, but that doesn't give much benefit to users because it is part of a transient ID, and the prefix already guarantees no conflicts.
- `PVC_ACCESS_MODE`: Ya, I thought the same at the beginning, but I changed to this form to reduce the problem surface. Although a `PVC_ACCESS_MODE` config makes a lot of sense, I'm keeping this PR focused on a fixed one because this PR aims to generate a new PVC for each executor. In other words, this PR is not suggesting creating a `ReadWriteMany` PVC and sharing it across multiple executors.
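To make the postfix point concrete, here is a small self-contained sketch (the helper name is hypothetical, not the PR's code) of how a transient PVC name could be derived from these constants, matching the `-pvc-0` names in the kubectl output later in this thread:

```scala
object PvcNamingSketch {
  val PVC_ON_DEMAND = "OnDemand"
  val PVC_POSTFIX = "-pvc"

  // Some(generated name) when on-demand creation was requested; None means
  // fall back to mounting a pre-created PVC named `claimName`.
  def pvcNameFor(claimName: String, podName: String, volumeIndex: Int): Option[String] =
    if (claimName == PVC_ON_DEMAND) Some(s"$podName$PVC_POSTFIX-$volumeIndex") else None
}

// PvcNamingSketch.pvcNameFor("OnDemand", "spark-pi-f4d80574b9bb0941-exec-1", 0)
//   == Some("spark-pi-f4d80574b9bb0941-exec-1-pvc-0")
```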
For a `ReadWriteMany` PVC, we don't need this PR. The existing Spark PVC feature can mount a single `ReadWriteMany` PVC into all executors without any problem, and there is no burden to maintain it because it's always a single one (as sketched below). In addition, we also support NFS (AWS EFS) mounting.
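For example, a sketch of that existing path: one pre-created `ReadWriteMany` PVC (the claim name here is hypothetical) mounted into every executor, with no on-demand creation involved:

```
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=shared-rwx-pvc
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path=/data
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly=false
```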
// Add an OwnerReference to the given resources, making the driver pod an owner of them so when
// the driver pod is deleted, the resources are garbage collected.
private def addDriverOwnerReference(driverPod: Pod, resources: Seq[HasMetadata]): Unit = {
  val driverPodOwnerReference = new OwnerReferenceBuilder()
    .withName(driverPod.getMetadata.getName)
    .withApiVersion(driverPod.getApiVersion)
    .withUid(driverPod.getMetadata.getUid)
    .withKind(driverPod.getKind)
    .withController(true)
    .build()
  resources.foreach { resource =>
    val originalMetadata = resource.getMetadata
    originalMetadata.setOwnerReferences(Collections.singletonList(driverPodOwnerReference))
  }
}
Thank you for unifying this code :)
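For readers following along, a hedged sketch of how a caller might use the unified `addOwnerReference` helper shown earlier for an executor's on-demand PVC. The `buildOwnedPvc` function is hypothetical, not the PR's code; the builder chain is fabric8's standard nested-builder API, and the values come from the example run later in this thread:

```scala
import io.fabric8.kubernetes.api.model.{PersistentVolumeClaim, PersistentVolumeClaimBuilder, Pod, Quantity}

// Hypothetical helper: build an on-demand PVC (gp2, 500Gi, ReadWriteOnce,
// per the example run) and make the executor pod its owner, so deleting the
// pod garbage-collects the PVC. `addOwnerReference` is the unified helper
// from the diff above.
def buildOwnedPvc(executorPod: Pod, pvcName: String): PersistentVolumeClaim = {
  val pvc = new PersistentVolumeClaimBuilder()
    .withNewMetadata().withName(pvcName).endMetadata()
    .withNewSpec()
      .withStorageClassName("gp2")
      .withAccessModes("ReadWriteOnce")
      .withNewResources().addToRequests("storage", new Quantity("500Gi")).endResources()
    .endSpec()
    .build()
  addOwnerReference(executorPod, Seq(pvc))
  pvc
}
```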
Thank you for the review, @holdenk.
Retest this please.
Thanks for the clarity, @dongjoon-hyun. LGTM pending Jenkins.
Thanks!
Test build #129050 has finished for PR 29846 at commit
storageClass: Option[String] = None,
size: Option[String] = None)
Must `storageClass` and `size` both be defined or both empty? Or can we define only `storageClass` without `size`?
Yes, we need both. In this PR, if either of them is missing, the fallback is the existing PVC mounting behavior: Spark doesn't try to create a PVC and instead assumes a PVC with the given claim name already exists.
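Concretely, a sketch using the config keys from the commit message later in this thread (the values, and the pre-created claim name, are illustrative):

```
# storageClass and sizeLimit both set => a PVC is created per executor:
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=OnDemand
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass=gp2
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit=500Gi

# If either option is omitted, Spark falls back to mounting an existing PVC
# with the given claim name instead of creating one:
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=my-precreated-pvc
```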
I don't know too many details about K8s, but the change makes sense to me.
Kubernetes integration test starting
Kubernetes integration test status success
All tests passed. Merged to master for Apache Spark 3.1.0 in December 2020.
@dongjoon-hyun Looks like this broke the Scala 2.13 build. Would you like to fix it?

Error: ] /home/runner/work/spark/spark/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStep.scala:117: type mismatch;
 found   : scala.collection.mutable.ArrayBuffer[io.fabric8.kubernetes.api.model.HasMetadata]
 required: Seq[io.fabric8.kubernetes.api.model.HasMetadata]
Oh, thank you for reporting! Sure!
The follow-up PR is ready: #29859.
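For context, the mismatch comes from Scala 2.13 re-pointing the default `Seq`. A minimal self-contained illustration (not Spark's actual code), assuming the usual `.toSeq` fix that such follow-ups apply:

```scala
import scala.collection.mutable.ArrayBuffer

object SeqMismatchDemo {
  // In Scala 2.13, `Seq[T]` means scala.collection.immutable.Seq[T], so a
  // mutable ArrayBuffer no longer conforms (it did under 2.12).
  def use(resources: Seq[String]): Int = resources.size

  def main(args: Array[String]): Unit = {
    val buf = ArrayBuffer("pvc-0", "pvc-1")
    // use(buf)            // compiles on 2.12, "type mismatch" on 2.13
    println(use(buf.toSeq)) // `.toSeq` compiles on both
  }
}
```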
…utors

This PR aims to support dynamic PVC creation and deletion for K8s executors. The PVCs are created with executor pods and deleted when the executor pods are deleted.

**Configuration**

Mostly, this PR reuses the existing PVC volume configs and `storageClass` is added.

```
spark.executor.instances=2
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=OnDemand
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass=gp2
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit=500Gi
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path=/data
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly=false
```

**Executors**

```
$ kubectl get pod -l spark-role=executor
NAME                               READY   STATUS    RESTARTS   AGE
spark-pi-f4d80574b9bb0941-exec-1   1/1     Running   0          2m6s
spark-pi-f4d80574b9bb0941-exec-2   1/1     Running   0          2m6s
```

**PVCs**

```
$ kubectl get pvc
NAME                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
spark-pi-f4d80574b9bb0941-exec-1-pvc-0   Bound    pvc-7d20173f-278b-4c7b-b7e5-7f0ed414ee64   500Gi      RWO            gp2            48s
spark-pi-f4d80574b9bb0941-exec-2-pvc-0   Bound    pvc-1138f00d-87f1-47f4-9b58-ce5d13ea0c3a   500Gi      RWO            gp2            48s
```

**Executor Disk**

```
$ k exec -it spark-pi-f4d80574b9bb0941-exec-1 -- df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme3n1    493G   74M  492G   1% /data
```

```
$ k exec -it spark-pi-f4d80574b9bb0941-exec-1 -- ls /data
blockmgr-81dcebaf-11a7-4d7b-91d6-3c580187d914  lost+found  spark-6be42db8-2c58-4389-b52c-8aeeafe76bd5
```

While SPARK-32655 supports mounting a pre-created PVC, this PR can create PVCs dynamically itself and reduce a lot of manual effort.

Yes. This is a new feature.

Pass the newly added test cases.

Closes apache#29846 from dongjoon-hyun/SPARK-32971.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 46ff4164511b9515e8fd9f8eaeeea13f27d32bac)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
…dynamic allocation failure

### What changes were proposed in this pull request?

This PR aims to show a directional error message for executor PVC dynamic allocation failure.

### Why are the changes needed?

#29846 supports dynamic PVC creation/deletion for K8s executors. #29557 supports the execId placeholder in executor PVC conf. If `spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName` is not set with `onDemand` or `SPARK_EXECUTOR_ID`, Spark will continue trying to create the executor pod. After this PR, Spark can show a directional error message for this situation.

```plain
ERROR ExecutorPodsSnapshotsStoreImpl: Going to stop due to IllegalArgumentException
java.lang.IllegalArgumentException: PVC ClaimName should contain OnDemand or SPARK_EXECUTOR_ID when multiple executors are required
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Add a unit test.

Closes #36374 from dcoliversun/SPARK-39006.

Authored-by: Qian.Sun <qian.sun2020@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
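To illustrate the rule in that error message, a hedged sketch of a claim name using the execId placeholder from #29557 (the `data-` prefix is hypothetical): Spark substitutes the literal `SPARK_EXECUTOR_ID` with each executor's ID, so every executor resolves a distinct PVC.

```
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=data-SPARK_EXECUTOR_ID
```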
What changes were proposed in this pull request?
This PR aims to support dynamic PVC creation and deletion for K8s executors. The PVCs are created with executor pods and deleted when the executor pods are deleted.
Configuration
Mostly, this PR reuses the existing PVC volume configs, and `storageClass` is added. The full configuration, together with the resulting executor pods, PVCs, and executor disk mounts, is reproduced in the commit message above; the core settings follow below.
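The core settings, copied from the example run in the commit message above:

```
spark.executor.instances=2
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=OnDemand
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass=gp2
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit=500Gi
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path=/data
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly=false
```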
Why are the changes needed?
While SPARK-32655 supports mounting a pre-created PVC, this PR can create PVCs dynamically itself and reduce a lot of manual effort.
Does this PR introduce any user-facing change?
Yes. This is a new feature.
How was this patch tested?
Pass the newly added test cases.