This repository has been archived by the owner on Mar 3, 2023. It is now read-only.

Add support for Persistent Volumes for stateful storage #3723

Closed
nicknezis opened this issue Oct 25, 2021 · 4 comments · Fixed by #3725, #3747 or #3752

Comments

@nicknezis
Contributor

We would like to add a set of submit parameters for specifying PersistentVolumeClaims and their mount points, similar to the feature found in Spark (described here: https://spark.apache.org/docs/latest/running-on-kubernetes.html#using-kubernetes-volumes).

@nicknezis nicknezis created this issue from a note in Kubernetes Scheduler Improvements (To do) Oct 25, 2021
@surahman
Member

surahman commented Oct 27, 2021

This should be rather straightforward to implement for the CLI. We can use KubernetesController.getConfigItemsByPrefix to collect all the relevant options from the config. As per our discussions, we should be using dynamic provisioning?

Suggested commands, based on the Spark example, are below. Is there anything else we should add?

--config-property heron.kubernetes.persistentVolumeClaim.[volume name].options.claimName=OnDemand
--config-property heron.kubernetes.persistentVolumeClaim.[volume name].options.storageClass=gp
--config-property heron.kubernetes.persistentVolumeClaim.[volume name].options.sizeLimit=500Gi
--config-property heron.kubernetes.persistentVolumeClaim.[volume name].mount.path=/data
--config-property heron.kubernetes.persistentVolumeClaim.[volume name].mount.readOnly=false
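
To make the prefix-based collection concrete, here is a minimal Python sketch of grouping such properties by volume name. It is an illustration only, not Heron's code (which is Java): `group_volume_options` and the volume name `scratch` are hypothetical, and the sketch assumes volume names contain no dots.

```python
from collections import defaultdict

# Prefix taken from the suggested commands above.
PREFIX = "heron.kubernetes.persistentVolumeClaim."


def group_volume_options(config):
    """Group '<prefix><volume>.<section>.<option>' keys by volume name.

    Hypothetical helper for illustration; assumes volume names have no dots.
    """
    volumes = defaultdict(dict)
    for key, value in config.items():
        if not key.startswith(PREFIX):
            continue  # unrelated property, leave it alone
        parts = key[len(PREFIX):].split(".")
        if len(parts) != 3:
            raise ValueError(f"malformed volume property: {key}")
        volume, section, option = parts
        volumes[volume][f"{section}.{option}"] = value
    return dict(volumes)


example = {
    "heron.kubernetes.persistentVolumeClaim.scratch.options.claimName": "OnDemand",
    "heron.kubernetes.persistentVolumeClaim.scratch.options.sizeLimit": "500Gi",
    "heron.kubernetes.persistentVolumeClaim.scratch.mount.path": "/data",
    "heron.topology.name": "ignored",
}
grouped = group_volume_options(example)
```

Rejecting keys that do not split into exactly three parts surfaces typos at submit time instead of silently dropping an option.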

I can start work on this, but I can only take it as far as the point where it needs to be wired into #3710. I will then need to rebase onto that PR and wire it in. At that point I will also need to clean up the test suite and do some general cleanup of the kind that follows a merge conflict.

@surahman
Member

One idea I had is a workflow where users put all their K8s configs, including the pod template, into a directory and then load them into a ConfigMap. The configs that users want loaded into the containers are then specified using --config-property.

@surahman
Member

Looking through the Spark documentation, the following options appear to be supported:

  • options: claimName
  • options: storageClass
  • options: sizeLimit
  • mount: path
  • mount: subPath
  • mount: readOnly

The K8s API itself exposes many more options than these.

@surahman
Member

I have the PVC assembly part of the PR completed and I am now working on wiring all this up to make sure it works correctly with custom Pod Templates.

Commands:

--config-property heron.kubernetes.volumes.persistentVolumeClaim.volumeNameOfChoice.claimName=nameOfVolumeClaim
--config-property heron.kubernetes.volumes.persistentVolumeClaim.volumeNameOfChoice.storageClassName=storageClassNameOfChoice
--config-property heron.kubernetes.volumes.persistentVolumeClaim.volumeNameOfChoice.accessModes=comma,separated,list
--config-property heron.kubernetes.volumes.persistentVolumeClaim.volumeNameOfChoice.sizeLimit=555Gi
--config-property heron.kubernetes.volumes.persistentVolumeClaim.volumeNameOfChoice.volumeMode=volumeModeOfChoice
--config-property heron.kubernetes.volumes.persistentVolumeClaim.volumeNameOfChoice.path=path/to/mount
--config-property heron.kubernetes.volumes.persistentVolumeClaim.volumeNameOfChoice.subPath=sub/path/to/mount

Will generate the PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nameOfVolumeClaim
spec:
  volumeName: volumeNameOfChoice
  accessModes:
    - comma
    - separated
    - list
  volumeMode: volumeModeOfChoice
  resources:
    requests:
      storage: 555Gi
  storageClassName: storageClassNameOfChoice

Entries will be made in the Pod for a Volume and in the executor container for the VolumeMount with the path as well as the subPath, as required.
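
As a rough illustration of those two entries, the Python sketch below builds the Pod `volumes` item and the container `volumeMounts` item as plain dictionaries. The field names (`persistentVolumeClaim.claimName`, `mountPath`, `subPath`) are the standard Kubernetes ones; `build_volume_and_mount` is a hypothetical helper, not the function used in the PR.

```python
def build_volume_and_mount(volume_name, options):
    """Return (volume, mount) dicts for one configured PVC volume.

    Hypothetical helper; `options` uses the option names from the
    commands above (claimName, path, subPath).
    """
    # Pod-level entry: ties the volume name to the claim.
    volume = {
        "name": volume_name,
        "persistentVolumeClaim": {"claimName": options["claimName"]},
    }
    # Container-level entry: where the volume appears in the executor.
    mount = {"name": volume_name, "mountPath": options["path"]}
    if options.get("subPath"):
        mount["subPath"] = options["subPath"]  # only set when requested
    return volume, mount


volume, mount = build_volume_and_mount(
    "volumeNameOfChoice",
    {
        "claimName": "nameOfVolumeClaim",
        "path": "path/to/mount",
        "subPath": "sub/path/to/mount",
    },
)
```

The shared `name` is what links the container's mount back to the Pod's volume, so both entries are derived from the same volume name.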

The commands above are all that I have added for now, but the code is designed so that new PVC properties are easy to support: add an enum value for the property, then add a case to the switch statement that applies it to the actual PVC. This should make the code more maintainable and significantly more extensible.
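
The enum-plus-switch design can be sketched as follows. This is a Python approximation of the pattern, not the Java code in the PR: `PvcOption` and `build_pvc` are hypothetical names, and the `spec.volumeName` field simply mirrors the manifest shown above.

```python
from enum import Enum


class PvcOption(Enum):
    """Supported option names; add a member here to support a new one."""
    CLAIM_NAME = "claimName"
    STORAGE_CLASS_NAME = "storageClassName"
    ACCESS_MODES = "accessModes"
    SIZE_LIMIT = "sizeLimit"
    VOLUME_MODE = "volumeMode"
    PATH = "path"
    SUB_PATH = "subPath"


def build_pvc(volume_name, options):
    """Assemble a PVC manifest dict from one volume's options."""
    metadata = {}
    spec = {"volumeName": volume_name}  # mirrors the manifest above
    for key, value in options.items():
        option = PvcOption(key)  # raises ValueError for unsupported keys
        if option is PvcOption.CLAIM_NAME:
            metadata["name"] = value
        elif option is PvcOption.STORAGE_CLASS_NAME:
            spec["storageClassName"] = value
        elif option is PvcOption.ACCESS_MODES:
            spec["accessModes"] = value.split(",")
        elif option is PvcOption.SIZE_LIMIT:
            spec["resources"] = {"requests": {"storage": value}}
        elif option is PvcOption.VOLUME_MODE:
            spec["volumeMode"] = value
        # PATH and SUB_PATH belong to the VolumeMount, not the PVC.
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": metadata,
        "spec": spec,
    }
```

Supporting another property then takes two small changes: one new enum member and one new branch in `build_pvc`.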
