DriveScale Kubernetes volume driver

This document describes the steps required to install the DriveScale CSI driver on Kubernetes and gives directions on how to debug issues. For details on CSI in Kubernetes, you can check https://kubernetes-csi.github.io/docs/

Step-by-step guide

This document assumes you have a working Kubernetes or OpenShift setup.

  • Ensure that you're running Kubernetes 1.13 or above (1.14+ recommended)
  • Ensure that you're running DriveScale composable platform 3.4.1 or above
  • Download the deployment files (YAML files) from Jenkins:
    • csi-attacher.yaml
    • csi-driver.yaml
    • csi-namespace.yaml
    • csi-provisioner.yaml
    • csi-rbac.yaml
    • drivescale-secret.yaml
    • podwatcher.yaml
    • podwatcher-rbac.yaml
  • Update drivescale-secret.yaml with your DMS username, password, FQDN, and CSI cluster name in base64 format (see the example after this list)
  • Apply the YAML files in the following order:
    • csi-namespace.yaml
    • csi-rbac.yaml
    • podwatcher-rbac.yaml
    • drivescale-secret.yaml
    • csi-driver.yaml
    • csi-attacher.yaml
    • csi-provisioner.yaml
    • podwatcher.yaml
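
A minimal sketch of the installation, assuming the YAML files are in the current directory and using placeholder credential values:

```
# Values for drivescale-secret.yaml must be base64 encoded (placeholders shown)
echo -n 'dms-username' | base64
echo -n 'dms-password' | base64
echo -n 'dms.example.com' | base64        # DMS FQDN
echo -n 'my-csi-cluster' | base64         # CSI cluster name

# Apply the manifests in the order listed above
kubectl apply -f csi-namespace.yaml
kubectl apply -f csi-rbac.yaml
kubectl apply -f podwatcher-rbac.yaml
kubectl apply -f drivescale-secret.yaml
kubectl apply -f csi-driver.yaml
kubectl apply -f csi-attacher.yaml
kubectl apply -f csi-provisioner.yaml
kubectl apply -f podwatcher.yaml
```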

You should be done; you now have a working CSI driver.

Storage class parameters

The storage class accepts the following parameters:

  • encrypt (bool - default false): should the volume be encrypted
  • type (string - hdd / ssd / slice - default hdd): type of volume (hard drive, ssd drive, slice of ssd drive)
  • fsType (string - ext4 / xfs - default ext4): the filesystem type to use on the volume
  • striping (bool - default true): if the plugin cannot fit the volume size on one drive, it will try to create a RAID10 array that can accommodate the total size requested. This is not supported for slices.
  • redundancy (bool - default false): the plugin will create the volume as a RAID1 array to ensure the volume can survive a drive failure
  • maxVolumeSize (int): the maximum size allowed for a volume; the plugin will return an error if a volume larger than this value is requested with that storage class
  • minVolumeSize (int): the minimum size allowed for a volume; the plugin will return an error if a volume smaller than this value is requested with that storage class
  • softDomains (bool - default false): when allocating drives for a volume, whether to allow the use of drives that do not meet the bandwidth domain requirements
  • requiredTags (string): comma separated list of all the tags that the drives / slices used in the volume must have set
  • excludedTags (string): comma separated list of any tag that would exclude a drive from being used in the volume
  • storageGroup (string): comma separated list of storage groups to choose the drives from
  • rpm (int): minimum RPM value of drives used in the cluster (ignored for ssd / slice)
  • raidMaxDrives (int - default 16): how many drives to use in a RAID array at most
  • sgResiliency (bool - default true): should the plugin ensure that RAID volumes will survive a storage group failure (volume creation will fail if not possible). If false, the plugin will still try to provide storage group failure resiliency if it can 
  • jbodResiliency (bool - default true): should the plugin ensure that RAID volumes will survive a JBOD failure (volume creation will fail if not possible). If false, the plugin will still try to provide JBOD failure resiliency if it can
  • networkTransportsAllowed (string): comma separated list of network transports allowed for the volume (allowed values are iscsi, nvmetcp, roce). If left empty, the system will use the most performant transport available.
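
As an illustration, a hypothetical storage class using some of these parameters (the class name and tag value are assumptions; parameter values are passed as strings):

```
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: drivescale-hdd            # hypothetical class name
provisioner: csi.drivescale.com
parameters:
  type: "hdd"
  fsType: "ext4"
  striping: "true"
  redundancy: "false"
  requiredTags: "k8s"             # hypothetical tag
reclaimPolicy: Delete
EOF
```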

Topology support

The CSI driver supports the Topology feature of CSI. The behavior is as follows:

  • A volume or node topology string is a concatenation of its sorted bandwidth domains. The topology object for Kubernetes is a dictionary with a single key "com.drivescale/bwdomain" set to the concatenated value.
  • The plugin only looks at the required topologies, it does not do anything with the preferred topologies request.
  • When creating a volume, it will try to find drives that can access every single bandwidth domain listed in the required topologies

For some more details about Kubernetes and CSI topology, you can check https://kubernetes-csi.github.io/docs/topology.html
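
To see what topology the driver reported for a node, you can inspect the CSINode object and the node labels on recent Kubernetes versions (a sketch; the node name is a placeholder and the exact label value depends on your bandwidth domains):

```
# Topology keys registered by the driver for this node
kubectl get csinode <node-name> -o yaml

# The concatenated bandwidth domain value should also appear as a node label
kubectl get node <node-name> --show-labels | tr ',' '\n' | grep bwdomain
```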

How to debug issues

Kubernetes is not always very convenient when it comes to debugging volume issues. It requires digging through logs in a lot of different places. This section is an attempt at describing how to identify which step failed and where to find logs to diagnose the issue.

Initial running state

Before trying to use volumes, you should ensure that all the elements required by the driver are running properly. You should have one csi-driver-xxx pod per "worker" node (defined through a DaemonSet). Each of those pods contains 2 containers.

  • csi-driver-registrar logs should show that it was able to connect to the unix socket of the csi-drivescale-driver container and that it could call GetPluginInfo.
  • csi-drivescale-driver should display correct parameters for the username, hostname and cluster name used to connect to the DMS (password is not displayed). It should also display the initial call by the driver registrar.

You should have one csi-provisioner-xxx pod (three on k8s 1.14+). Each one contains 2 containers:

  • csi-provisioner should show that it probed the driver for readiness over the shared unix socket, and on k8s 1.14+ there should be an elected leader
  • csi-drivescale-driver should display correct parameters for the DMS and should display the initial csi-provisioner calls

You should have three csi-attacher-xxx pods. Each one contains 2 containers:

  • csi-attacher should show that it probed the driver for readiness over the shared unix socket, and there should be an elected leader
  • csi-drivescale-driver should display correct parameters for the DMS and should display the initial csi-attacher calls
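
A quick way to check this initial state (substitute the namespace created by csi-namespace.yaml and the actual pod names):

```
# All CSI pods should be Running
kubectl get pods -n <csi-namespace> -o wide

# Inspect both containers of a driver pod
kubectl logs -n <csi-namespace> csi-driver-xxx -c csi-driver-registrar
kubectl logs -n <csi-namespace> csi-driver-xxx -c csi-drivescale-driver

# Same idea for the provisioner and attacher pods
kubectl logs -n <csi-namespace> csi-provisioner-xxx -c csi-provisioner
kubectl logs -n <csi-namespace> csi-attacher-xxx -c csi-attacher
```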

Note on CSI timeouts

Until Kubernetes 1.14, CSI call timeouts were statically set to an aggressive 15s. While this works fine for most calls, some steps (like attaching a volume for the first time) can take many minutes (formatting ext4 on a 1TB drive can easily take 5min). Kubernetes will tolerate this, as a call that failed because of a timeout will be reissued after an exponential backoff. However, this means that in some cases it can take a long time before a volume is actually attached, because the exponential backoff delay can become quite high (it can go up to more than 10min...). On Kubernetes 1.14 and later, we are able to set the length of the CSI timeout on the attacher and provisioner. This allows for better reactivity since we don't hit those long backoff timers anymore.

Volume creation

When using volumes, you'll usually start from a PersistentVolumeClaim (PVC).
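
For example, a minimal PVC referencing a DriveScale storage class (the claim and class names are hypothetical):

```
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-claim                  # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: drivescale-hdd  # hypothetical storage class name
  resources:
    requests:
      storage: 100Gi
EOF
```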

The provisioner pods (csi-provisioner-xxx) contain 2 containers (csi-provisioner and csi-drivescale-driver). The csi-provisioner container is a bridge provided by Kubernetes between the CSI driver and the Kubernetes API. It listens for PVCs that are not bound and whose provisioner name is set to the name of the CSI driver (csi.drivescale.com in our case).

When it sees an unbound PVC, it will contact the CSI driver over their shared unix socket and issue a CreateVolume call. This should result in the creation of a PersistentVolume (PV), and this PV will be bound to the PVC.

On the DriveScale side of things, when a volume is created, you should see a new Logical Node (LN) created in the cluster you defined for Kubernetes, whose name is the name of the PV (usually pvc-{uuid}). This node will contain the drives meant to be used for that volume, and its RAID configuration, if any, will already be set.

If the PVC is bound to a PV and you can see the LN in the DMS, then the volume creation happened successfully.

In case this does not happen, you should look at the logs of the elected (k8s 1.14+) csi-provisioner-xxx pod. Start with the csi-provisioner container, which should contain logs stating that it's trying to create the volume and the error message returned by the csi-drivescale-driver, if any. This should already give some details.

For additional details, check the corresponding csi-drivescale-driver container logs and the audit logs on the DMS (look for failed cluster change or resource proposal queries). The volume creation calls will start with a "CSI CreateVolume ..." log line.

In case the csi-provisioner container does not show any activity, first ensure it is the elected provisioner. If so, you'll need to start looking at the logs of the k8s controller (depending on your deployment, it will either be a pod in the kube-system namespace named kube-controller-manager or a service running on the master). This could give you the explanation why the provisioner was not triggered.
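
A sketch of the commands involved (namespace, pod and PVC names are placeholders):

```
# Check whether the PVC got bound and which PV backs it
kubectl describe pvc <pvc-name>
kubectl get pv

# Logs of the elected provisioner pod, both containers
kubectl logs -n <csi-namespace> csi-provisioner-xxx -c csi-provisioner
kubectl logs -n <csi-namespace> csi-provisioner-xxx -c csi-drivescale-driver | grep "CSI CreateVolume"
```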

Volume attach

When Kubernetes decides to use a volume with a pod, it goes through a 3 step process in CSI:

  • Controller publish volume (handled by the elected csi-attacher-xxx pod). During this step, the plugin will attach the node hosting the pod using the volume to the logical node. There won't be any mount point set yet (using the _nomount option). However, if it's the first time the volume is used, it will be formatted at that time (which can take multiple minutes with ext4 on large volumes).
  • Node stage volume (handled by the csi-driver-xxx pod running on the node hosting the pod using the volume). During this step, the plugin will change the mount point of the LN to a "global" mount point for the pod.
  • Node publish volume (handled by the csi-driver-xxx pod running on the node hosting the pod using the volume). During this step, the plugin will bind mount the "global" mount point to a "container" mount point.

A successful attach will result in the LN for the PV being attached to the node hosting the pod and having its mount point set to the "global" mount point. On the node hosting the pod, you'll see that mount point as well as as many bind mounts as there are containers in the pod using the volume.
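
To verify this from the Kubernetes side and on the node itself (PV names usually look like pvc-{uuid}):

```
# The attach operation is tracked by a VolumeAttachment object
kubectl get volumeattachments

# On the node hosting the pod: the "global" mount point plus the bind mounts
mount | grep pvc-
```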

To debug any issue, you must first identify which step failed. To do so, you'll have to retrace the steps from the start.

Start with the elected csi-attacher-xxx pod. It will contain 2 containers, csi-attacher and csi-drivescale-driver.

Similar to the provisioner, start with the csi-attacher container and look for volume attach log lines and potential error messages. For more details, check the csi-drivescale-driver logs. Those steps will start with a "CSI ControllerPublishVolume" log line and, if successful, end with a "CSI ControllerPublishVolume result" log line. On the DMS, look for cluster config changes that failed (look for add/attach server with the server id of the node hosting the pod).
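
For instance (namespace and pod name are placeholders):

```
kubectl logs -n <csi-namespace> csi-attacher-xxx -c csi-attacher
kubectl logs -n <csi-namespace> csi-attacher-xxx -c csi-drivescale-driver | grep "CSI ControllerPublishVolume"
```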

During the attach phase, the plugin waits until the state of the volume on the DMS is "FORMATTED" or later. So beyond errors on config changes, one reason for failure is if the DMS never reports that the LN reached that state. In that case, you should investigate on the node why the node was unable to get there (start with /var/logs/drivescale/ds_agent_zk.log).

In case the csi-attacher container does not show any activity, first ensure it is the elected attacher. If so, you'll need to start looking at the logs of the k8s controller (depending on your deployment, it will either be a pod in the kube-system namespace named kube-controller-manager or a service running on the master). This could give you the explanation why the attacher was not triggered.

If this step succeeded, move to the csi-driver-xxx pod running on the node hosting the pod.

On the pod's node, you can ignore the csi-driver-registrar container in the pod; it's only used to register the driver at start time. Check the csi-drivescale-driver logs for the plugin calls. You should see "CSI NodeStageVolume ..." lines.
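
For instance, to follow both the stage and publish phases described below on the node's driver pod (namespace and pod name are placeholders):

```
kubectl logs -n <csi-namespace> csi-driver-xxx -c csi-drivescale-driver | grep -E "CSI Node(Stage|Publish)Volume"
```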

During the node stage phase, the plugin will ask the DMS to set the mount point of the LN to the "global" mount point and will wait until the state of the volume on the DMS is "MOUNTED". So failure can be due either to a config change failure (check the audit log on the DMS for full details as to why the config change failed) or to the volume never reaching the "MOUNTED" state. In that case, you should investigate on the node why the node was unable to get there (start with /var/logs/drivescale/ds_agent_zk.log).

If there is a "CSI NodeStageVolume result ..." log line, then the stage phase succeeded and you should start looking for log lines related to the node publish phase. 

You should see "CSI NodePublishVolume" lines in the logs of csi-drivescale-driver.

During the node publish phase, there is no interaction with the DMS. The only operations are local to the node (making the bind mount and remounting it read only if the volume is in read only mode). Any debugging of issues in that phase has to be based on the logs of csi-drivescale-driver.

If there is "CSI NodePublishVolume result ..." log line, then the publish phase succeeded and the volume should be available 

If you cannot find any activity in the driver, ensure you're looking at the correct node, and if so, you'll need to check the kubelet logs on that node to understand why the plugin is not being called.
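
On a systemd-based node, the kubelet logs can usually be retrieved as follows (adapt to your distribution and deployment):

```
journalctl -u kubelet | grep -i csi
```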

Volume detach

When Kubernetes decides to stop using a volume with a pod, it goes through a 3 step process in CSI:

  • Node unpublish volume (handled by the csi-driver-xxx pod running on the node hosting the pod using the volume). During this step, the plugin will unmount the "container" bind mount and remove the directory
  • Node unstage volume (handled by the csi-driver-xxx pod running on the node hosting the pod using the volume). During this step, the plugin will change the mount point of the LN in the DMS to the "_nomount" mount point
  • Controller unpublish volume (handled by the elected csi-attacher-xxx pod). During this step, the plugin will detach the node hosting the pod from the volume's LN.

A successful detach will result in the LN for the PV being detached from the node it was previously attached to.

Debugging of issues is similar to the attach phase except that you should trace back your steps. The log lines will have similar text "CSI NodeUnpublishVolume ...", "CSI NodeUnstageVolume ..." and "CSI ControllerUnpublishVolume".
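
The same grep approach works here (namespace and pod names are placeholders):

```
kubectl logs -n <csi-namespace> csi-driver-xxx -c csi-drivescale-driver | grep -E "CSI Node(Unpublish|Unstage)Volume"
kubectl logs -n <csi-namespace> csi-attacher-xxx -c csi-drivescale-driver | grep "CSI ControllerUnpublishVolume"
```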

Volume deletion

A PV is automatically deleted if its reclaim policy is set to Delete and the PVC it is bound to is deleted, or if it is explicitly deleted. In that case, the provisioner will issue a DeleteVolume call to the driver.
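
For example, with the hypothetical claim created earlier:

```
# Deleting the PVC triggers the DeleteVolume call when the reclaim policy is Delete
kubectl delete pvc demo-claim
kubectl get pv    # the corresponding PV should eventually disappear
```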

A successful deletion should remove the PV LN from the cluster.

Debugging is similar to volume creation.
