
[SPARK-32971][K8S] Support dynamic PVC creation/deletion for K8s executors #29846

Closed
wants to merge 1 commit into from

Conversation

@dongjoon-hyun (Member) commented Sep 23, 2020

What changes were proposed in this pull request?

This PR aims to support dynamic PVC creation and deletion for K8s executors. The PVCs are created with executor pods and deleted when the executor pods are deleted.

Configuration
Mostly, this PR reuses the existing PVC volume configs; storageClass is newly added.

spark.executor.instances=2
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=OnDemand
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass=gp2
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit=500Gi
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path=/data
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly=false

Executors

$ kubectl get pod -l spark-role=executor
NAME                               READY   STATUS    RESTARTS   AGE
spark-pi-f4d80574b9bb0941-exec-1   1/1     Running   0          2m6s
spark-pi-f4d80574b9bb0941-exec-2   1/1     Running   0          2m6s

PVCs

$ kubectl get pvc
NAME                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
spark-pi-f4d80574b9bb0941-exec-1-pvc-0   Bound    pvc-7d20173f-278b-4c7b-b7e5-7f0ed414ee64   500Gi      RWO            gp2            48s
spark-pi-f4d80574b9bb0941-exec-2-pvc-0   Bound    pvc-1138f00d-87f1-47f4-9b58-ce5d13ea0c3a   500Gi      RWO            gp2            48s

Executor Disk

$ k exec -it spark-pi-f4d80574b9bb0941-exec-1 -- df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme3n1    493G   74M  492G   1% /data
$ k exec -it spark-pi-f4d80574b9bb0941-exec-1 -- ls /data
blockmgr-81dcebaf-11a7-4d7b-91d6-3c580187d914
lost+found
spark-6be42db8-2c58-4389-b52c-8aeeafe76bd5

Why are the changes needed?

While SPARK-32655 supports mounting a pre-created PVC, this PR can create PVCs dynamically by itself, which removes a lot of manual effort.

Does this PR introduce any user-facing change?

Yes. This is a new feature.

How was this patch tested?

Pass the newly added test cases.

@SparkQA commented Sep 23, 2020

Test build #129011 has finished for PR 29846 at commit 2560335.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Sep 23, 2020

Test build #129013 has finished for PR 29846 at commit 97d80fc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Sep 23, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33634/

@SparkQA commented Sep 23, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33636/

@SparkQA commented Sep 23, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33634/

@SparkQA commented Sep 23, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33636/

@HyukjinKwon (Member)

cc @holdenk, could you take a look at the test failure, please? K8s PRs are blocked by it.

@dongjoon-hyun (Member, Author)

The decommissioning failure is unrelated to this PR.

KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark with Python3 to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- PVs with local storage
- Launcher client dependencies
- Test basic decommissioning *** FAILED ***

@holdenk (Contributor) commented Sep 23, 2020

I agree this PR isn't touching anything in the decommissioning logic. That being said, I'll spend some time today on the decommissioning integration tests.

@dongjoon-hyun (Member, Author)

cc @holdenk , @dbtsai , @viirya , @sunchao

@dongjoon-hyun (Member, Author)

Thanks, @holdenk .

@HyukjinKwon (Member)

Thanks @holdenk and @dongjoon-hyun.

@holdenk (Contributor) left a comment

This looks reasonable at first glance; one minor question about whether it makes sense to make more parts of the PVC creation configurable while we're here.


// Add a OwnerReference to the given resources making the pod an owner of them so when
// the pod is deleted, the resources are garbage collected.
def addOwnerReference(pod: Pod, resources: Seq[HasMetadata]): Unit = {
Contributor

I like this :)
Initially I was going to ask about re-using the PVCs, since I know PVC creation can be slow in some systems (which matters for dynamic scaling), but it's probably better to return the resources than to build some odd storage pool like I was imagining.
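
For context, the garbage collection here works by setting each resource's ownerReferences to the executor pod; a minimal sketch of such a helper, generalized from the driver-specific version removed below (fabric8 client types, as in the diff):

```scala
import java.util.Collections

import io.fabric8.kubernetes.api.model.{HasMetadata, OwnerReferenceBuilder, Pod}

// Sketch: make `pod` the controlling owner of each resource so that
// Kubernetes garbage-collects the resources (e.g. PVCs) with the pod.
def addOwnerReference(pod: Pod, resources: Seq[HasMetadata]): Unit = {
  val reference = new OwnerReferenceBuilder()
    .withName(pod.getMetadata.getName)
    .withApiVersion(pod.getApiVersion)
    .withUid(pod.getMetadata.getUid)
    .withKind(pod.getKind)
    .withController(true)
    .build()
  resources.foreach { resource =>
    resource.getMetadata.setOwnerReferences(Collections.singletonList(reference))
  }
}
```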

Comment on lines +122 to +125
val PVC_ON_DEMAND = "OnDemand"
val PVC = "PersistentVolumeClaim"
val PVC_POSTFIX = "-pvc"
val PVC_ACCESS_MODE = "ReadWriteOnce"
Contributor

Would it make sense to have the OnDemand, postfix, and access mode configurable?

Member Author

Ya, it's possible, but I didn't do that in this PR for the following reasons.

  • PVC_ON_DEMAND: It's a dummy placeholder, because the existing code expects certain pre-defined names. In this case it's better to recommend a fixed value than a configurable one.
  • PVC_POSTFIX: It could be configurable, but that wouldn't give users much benefit, because it's part of a transient ID and the prefix already guarantees no conflicts.
  • PVC_ACCESS_MODE: Ya, I thought the same as you at the beginning, but I changed to this form to reduce the problem surface. Although a PVC_ACCESS_MODE config makes a lot of sense, I kept this PR focused on a fixed value, because this PR aims to generate a new PVC for each executor. In other words, this PR is not suggesting creating a ReadWriteMany PVC and sharing it across multiple executors.

For a ReadWriteMany PVC, we don't need this PR. The existing Spark PVC feature can mount a single ReadWriteMany PVC into all executors without any problem, and there is no burden in maintaining it, because it's always a single PVC. In addition, we also support NFS (AWS EFS) mounting.
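
For illustration, a config sketch of that existing pre-created-PVC path with a shared ReadWriteMany claim; the volume name data and the claim name shared-rwx-pvc are hypothetical, and the claim is assumed to exist already:

spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.claimName=shared-rwx-pvc
spark.kubernetes.executor.volumes.persistentVolumeClaim.data.mount.path=/data
spark.kubernetes.executor.volumes.persistentVolumeClaim.data.mount.readOnly=false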

Comment on lines -166 to -181
// Add a OwnerReference to the given resources making the driver pod an owner of them so when
// the driver pod is deleted, the resources are garbage collected.
private def addDriverOwnerReference(driverPod: Pod, resources: Seq[HasMetadata]): Unit = {
  val driverPodOwnerReference = new OwnerReferenceBuilder()
    .withName(driverPod.getMetadata.getName)
    .withApiVersion(driverPod.getApiVersion)
    .withUid(driverPod.getMetadata.getUid)
    .withKind(driverPod.getKind)
    .withController(true)
    .build()
  resources.foreach { resource =>
    val originalMetadata = resource.getMetadata
    originalMetadata.setOwnerReferences(Collections.singletonList(driverPodOwnerReference))
  }
}

Contributor

Thank you for unifying this code :)

@dongjoon-hyun (Member, Author)

Thank you for the review, @holdenk.

@dongjoon-hyun (Member, Author)

Retest this please.

@holdenk (Contributor) commented Sep 23, 2020

Thanks for the clarity, @dongjoon-hyun. LGTM pending Jenkins.

@dongjoon-hyun (Member, Author)

Thanks!

@SparkQA commented Sep 23, 2020

Test build #129050 has finished for PR 29846 at commit 97d80fc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Comment on lines +26 to +27
storageClass: Option[String] = None,
size: Option[String] = None)
Member

Must storageClass and size both be defined or both be empty? Can we define only storageClass, without size?

Member Author

Yes, we need both. In this PR, if either of them is missing, the fallback is the existing PVC mounting behavior: Spark doesn't try to create a PVC and instead assumes a PVC with the given claim name already exists.
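
A minimal sketch of that branching, with hypothetical names (the real logic lives in MountVolumesFeatureStep):

```scala
// Hypothetical model of the PVC volume options discussed above.
case class PVCVolumeConf(
    claimName: String,
    storageClass: Option[String] = None,
    size: Option[String] = None)

// Create a PVC only when both storageClass and size are given;
// otherwise fall back to mounting an existing claim by name.
def shouldCreatePVC(conf: PVCVolumeConf): Boolean =
  (conf.storageClass, conf.size) match {
    case (Some(_), Some(_)) => true
    case _                  => false
  }
```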

@viirya (Member) left a comment

I don't know the K8s details too deeply, but the change looks sensible to me.

@SparkQA commented Sep 23, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33671/

@SparkQA commented Sep 23, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33671/

@dongjoon-hyun (Member, Author)

All tests passed. Merged to master for Apache Spark 3.1.0, due in December 2020.

@dongjoon-hyun deleted the SPARK-32971 branch September 23, 2020 23:47
@dongjoon-hyun (Member, Author)

Thank you, @holdenk and @viirya .

@viirya (Member) commented Sep 24, 2020

@dongjoon-hyun Looks like this broke the Scala 2.13 build. Would you like to fix it?

Error: /home/runner/work/spark/spark/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStep.scala:117: type mismatch;
 found   : scala.collection.mutable.ArrayBuffer[io.fabric8.kubernetes.api.model.HasMetadata]
 required: Seq[io.fabric8.kubernetes.api.model.HasMetadata]
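
For reference, Scala 2.13 changed scala.Seq to alias scala.collection.immutable.Seq, so a mutable ArrayBuffer no longer satisfies Seq[...]; a minimal sketch of the kind of conversion that fixes this:

```scala
import scala.collection.mutable.ArrayBuffer

val buf = ArrayBuffer(1, 2, 3)
val ok: Seq[Int] = buf.toSeq // compiles on both Scala 2.12 and 2.13
// val bad: Seq[Int] = buf   // 2.13 error: ArrayBuffer is not an immutable Seq
```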

@dongjoon-hyun (Member, Author)

Oh, thank you for reporting! Sure!

@dongjoon-hyun (Member, Author)

The follow-up PR is ready: #29859.

holdenk pushed a commit to holdenk/spark that referenced this pull request Oct 27, 2020
[SPARK-32971][K8S] Support dynamic PVC creation/deletion for K8s executors

This PR aims to support dynamic PVC creation and deletion for K8s executors. The PVCs are created with executor pods and deleted when the executor pods are deleted.

**Configuration**
Mostly, this PR reuses the existing PVC volume configs and `storageClass` is added.
```
spark.executor.instances=2
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=OnDemand
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass=gp2
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit=500Gi
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path=/data
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly=false
```

**Executors**
```
$ kubectl get pod -l spark-role=executor
NAME                               READY   STATUS    RESTARTS   AGE
spark-pi-f4d80574b9bb0941-exec-1   1/1     Running   0          2m6s
spark-pi-f4d80574b9bb0941-exec-2   1/1     Running   0          2m6s
```

**PVCs**
```
$ kubectl get pvc
NAME                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
spark-pi-f4d80574b9bb0941-exec-1-pvc-0   Bound    pvc-7d20173f-278b-4c7b-b7e5-7f0ed414ee64   500Gi      RWO            gp2            48s
spark-pi-f4d80574b9bb0941-exec-2-pvc-0   Bound    pvc-1138f00d-87f1-47f4-9b58-ce5d13ea0c3a   500Gi      RWO            gp2            48s
```

**Executor Disk**
```
$ k exec -it spark-pi-f4d80574b9bb0941-exec-1 -- df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme3n1    493G   74M  492G   1% /data
```

```
$ k exec -it spark-pi-f4d80574b9bb0941-exec-1 -- ls /data
blockmgr-81dcebaf-11a7-4d7b-91d6-3c580187d914
lost+found
spark-6be42db8-2c58-4389-b52c-8aeeafe76bd5
```

While SPARK-32655 supports mounting a pre-created PVC, this PR can create PVCs dynamically by itself, which removes a lot of manual effort.

Yes. This is a new feature.

Pass the newly added test cases.

Closes apache#29846 from dongjoon-hyun/SPARK-32971.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 46ff4164511b9515e8fd9f8eaeeea13f27d32bac)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun pushed a commit that referenced this pull request May 8, 2022
[SPARK-39006][K8S] Show a directional error message for executor PVC dynamic allocation failure

### What changes were proposed in this pull request?

This PR aims to show a directional error message for executor PVC dynamic allocation failure.

### Why are the changes needed?

#29846 supports dynamic PVC creation/deletion for K8s executors.
#29557 supports the execId placeholder in executor PVC conf.
If `spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName` is not set with `onDemand` or `SPARK_EXECUTOR_ID`, Spark will keep trying to create the executor pod.
After this PR, Spark shows a directional error message for this situation.
```plain
ERROR ExecutorPodsSnapshotsStoreImpl: Going to stop due to IllegalArgumentException
java.lang.IllegalArgumentException: PVC ClaimName should contain OnDemand or SPARK_EXECUTOR_ID when multiple executors are required
```
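
For illustration, a sketch of the two claim-name forms that pass this check; `data-pvc-SPARK_EXECUTOR_ID` is a hypothetical name, in which the literal `SPARK_EXECUTOR_ID` substring is replaced with each executor's ID (#29557):

```
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=OnDemand
spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=data-pvc-SPARK_EXECUTOR_ID
```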

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Add unit test.

Closes #36374 from dcoliversun/SPARK-39006.

Authored-by: Qian.Sun <qian.sun2020@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>