
nfs: add basic provisioner with create/delete procedures #2948

Merged
merged 8 commits on Mar 28, 2022

Conversation

nixpanic
Member

These NFS Controller and Identity servers are the base for the new
provisioner. The functionality is currently extremely limited, follow-up
PRs will implement various CSI procedures.

CreateVolume is implemented with the bare minimum. This makes it
possible to create a volume, and mount it with the
kubernetes-csi/csi-driver-nfs NodePlugin.

DeleteVolume unexports the volume from the Ceph managed NFS-Ganesha
service. In case the Ceph cluster provides multiple NFS-Ganesha
deployments, things might not work as expected. This is going to be
addressed in follow-up improvements.

Lots of TODO comments need to be resolved before this can be declared
"production ready". Unit- and e2e-tests are missing as well.

Updates: #2913
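
For reference, the flow looks roughly like this (a simplified sketch only; the NFSVolume helpers and the wrapped CephFS controller are illustrative stand-ins, error handling and journalling are omitted):

package nfs

import (
	"context"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// ControllerServer wraps the existing CephFS provisioner. cephfsCS and
// the NFSVolume helpers used below are illustrative, not the exact code
// in this PR.
type ControllerServer struct {
	cephfsCS csi.ControllerServer
}

func (cs *ControllerServer) CreateVolume(ctx context.Context,
	req *csi.CreateVolumeRequest) (*csi.CreateVolumeResponse, error) {
	// 1. create the backing CephFS subvolume with the existing CephFS code
	res, err := cs.cephfsCS.CreateVolume(ctx, req)
	if err != nil {
		return nil, err
	}
	backend := res.GetVolume()

	// 2. export the subvolume through the Ceph managed NFS-Ganesha
	//    cluster that is named in the "cephNFS" storage-class parameter
	nfsVolume, err := NewNFSVolume(ctx, backend.GetVolumeId())
	if err != nil {
		return nil, status.Error(codes.InvalidArgument, err.Error())
	}
	if err = nfsVolume.CreateExport(backend); err != nil {
		return nil, status.Error(codes.Internal, err.Error())
	}

	return &csi.CreateVolumeResponse{Volume: backend}, nil
}

func (cs *ControllerServer) DeleteVolume(ctx context.Context,
	req *csi.DeleteVolumeRequest) (*csi.DeleteVolumeResponse, error) {
	// remove the NFS-export first, then the backing CephFS subvolume
	nfsVolume, err := NewNFSVolume(ctx, req.GetVolumeId())
	if err != nil {
		return nil, status.Error(codes.InvalidArgument, err.Error())
	}
	if err = nfsVolume.DeleteExport(); err != nil {
		return nil, status.Error(codes.Internal, err.Error())
	}

	return cs.cephfsCS.DeleteVolume(ctx, req)
}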


Show available bot commands

These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:

  • /retest ci/centos/<job-name>: retest the <job-name> after unrelated
    failure (please report the failure too!)
  • /retest all: run this in case the CentOS CI failed to start/report any test
    progress or results

@nixpanic added the component/nfs (Issues related to NFS) label on Mar 17, 2022
@nixpanic requested a review from a team on March 17, 2022 13:35
@Rakshith-R
Contributor

At first glance it looks good to me; a few questions:

  • Will we add locks to the Create/Delete funcs like rbd and cephfs?
  • Will there be repeated logs indicating Request and Response?
  • No validation for the presence of server IPs in the volume context that is received/returned?

Comment on lines 7 to 10
attachRequired: false
volumeLifecycleModes:
- Persistent
- Ephemeral
Collaborator

attachRequired is false; do we plan to support Ephemeral too?

Member Author

This is the default configuration from the csi-driver-nfs (NodePlugin)... Not sure if there are any special things needed for Ephemeral support?

Collaborator

I think for Ephemeral the CreateVolume needs to be taken care of in NodePublish: https://kubernetes-csi.github.io/docs/ephemeral-local-volumes.html#implementing-csi-ephemeral-inline-support.

Member Author

right, in that case, we should probably not support Ephemeral volumes from the start
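
For reference, supporting inline ephemeral volumes would mean creating the backing volume from NodePublishVolume. A minimal sketch of the detection step (assuming podInfoOnMount is enabled so the kubelet sets the ephemeral flag in the volume context; not code from this PR):

// sketch: detect a CSI inline ephemeral request in NodePublishVolume;
// the kubelet sets this key when podInfoOnMount is enabled
func isEphemeralRequest(req *csi.NodePublishVolumeRequest) bool {
	return req.GetVolumeContext()["csi.storage.k8s.io/ephemeral"] == "true"
}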


backend := res.Volume

log.DebugLog(ctx, "CephFS volume created: %s", backend)
Collaborator

debug logging of the volume is not required?

Member Author

yes, this can probably be removed

Member Author

Actually, I think it is useful to have this as DebugLog(), each step in the process has such a log message. Will only log the volume-id instead of the whole volume.
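
i.e. something like (sketch):

// log only the volume-id, not the complete csi.Volume struct
log.DebugLog(ctx, "CephFS volume created: %s", backend.GetVolumeId())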

return nil, status.Error(codes.InvalidArgument, err.Error())
}

err = nfsVolume.Connect(cr)
Collaborator

calling Destroy is missing?

Member Author

there was no go-ceph connection yet. With the updated PR go-ceph is used, and Destroy() has been added.
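
The pattern is roughly (simplified sketch of the updated code):

err = nfsVolume.Connect(cr)
if err != nil {
	return nil, status.Error(codes.Internal, err.Error())
}
// free the go-ceph connection again once this procedure returns
defer nfsVolume.Destroy()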

return nil
}

func (nv *NFSVolume) GetExportPath() string {
Collaborator

add comments for all the exported functions?

Member Author

yes, definitely need to do this!

Collaborator

still need to be addressed?

Member Author

yes, indeed!
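
For example (suggested wording only, not the final comment):

// GetExportPath returns the path on the NFS-server under which the
// backing CephFS subvolume is exported.
func (nv *NFSVolume) GetExportPath() string {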

}

// TODO: use new go-ceph API
_, _, err := util.ExecCommand(context.TODO(), "ceph", args...)
Collaborator

open go-ceph issue to track this one?

Member Author

it is at ceph/go-ceph#655

@@ -0,0 +1,74 @@
/*
Copyright 2022 The Ceph-CSI Authors.
Collaborator

why do we have this vendor change?

Member Author

That is mentioned in the commit message: since tools/yamlgen uses the API to generate files under deploy/, we are vendoring our own API (I would like to prevent that, but don't know how).

if err != nil {
log.ErrorLog(ctx, "failed to retrieve admin credentials: %v", err)

return nil, status.Error(codes.InvalidArgument, err.Error())
Collaborator

in case of failures we need to take care of cleaning up the OMAP and the cephfs subvolume?

Member Author

yes, failures should still be handled better. Calls are probably not idempotent at the moment.

Collaborator

Planning to address this in a follow-up PR?

Member Author

not in this PR, it needs a lot of manual testing to address all possible cases. I prefer to get this merged soon, and then iterate on the improvements with smaller PRs.
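
For reference, the eventual cleanup will probably follow the usual pattern, roughly (a sketch only; helper names are illustrative and the actual follow-up may look different):

// sketch: if exporting fails, remove the backing subvolume (and with it
// the OMAP entries) again, so a retried CreateVolume starts clean
if err = nfsVolume.CreateExport(backend); err != nil {
	if _, delErr := cs.cephfsCS.DeleteVolume(ctx, &csi.DeleteVolumeRequest{
		VolumeId: backend.GetVolumeId(),
		Secrets:  req.GetSecrets(),
	}); delErr != nil {
		log.ErrorLog(ctx, "failed to clean up subvolume %q: %v",
			backend.GetVolumeId(), delErr)
	}

	return nil, status.Error(codes.Internal, err.Error())
}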

@nixpanic
Member Author

At first glance it looks good to me; a few questions:

* Will we add locks to the Create/Delete funcs like rbd and cephfs?

I do not think that is required. The calls should be made idempotent (they probably are not yet), and the actual complex volume creation is done in the CephFS subcomponent, which has locking already.

* Will there be repeated logs indicating Request and Response?

no, this is what the logs currently look like:

CreateVolume

I0324 13:03:22.688920       1 utils.go:191] ID: 5 Req-ID: pvc-03913617-ba49-476b-97b0-b9c1aa415bb6 GRPC call: /csi.v1.Controller/CreateVolume
I0324 13:03:22.689196       1 utils.go:195] ID: 5 Req-ID: pvc-03913617-ba49-476b-97b0-b9c1aa415bb6 GRPC request: {"capacity_range":{"required_bytes":1073741824},"name":"pvc-03913617-ba49-476b-97b0-b9c1aa415bb6","parameters":{"cephNFS":"my-nfs","clusterID":"openshift-storage","fsName":"ocs-storagecluster-cephfilesystem","server":"rook-ceph-nfs-my-nfs-a.openshift-storage.svc.cluster.local","volumeNamePrefix":"nfs-export-"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{}},"access_mode":{"mode":5}}]}
I0324 13:03:22.716063       1 omap.go:87] ID: 5 Req-ID: pvc-03913617-ba49-476b-97b0-b9c1aa415bb6 got omap values: (pool="ocs-storagecluster-cephfilesystem-metadata", namespace="csi", name="csi.volumes.default"): map[]
I0324 13:03:22.727461       1 omap.go:155] ID: 5 Req-ID: pvc-03913617-ba49-476b-97b0-b9c1aa415bb6 set omap keys (pool="ocs-storagecluster-cephfilesystem-metadata", namespace="csi", name="csi.volumes.default"): map[csi.volume.pvc-03913617-ba49-476b-97b0-b9c1aa415bb6:c8ebd059-ab72-11ec-b874-0a580a810215])
I0324 13:03:22.731652       1 omap.go:155] ID: 5 Req-ID: pvc-03913617-ba49-476b-97b0-b9c1aa415bb6 set omap keys (pool="ocs-storagecluster-cephfilesystem-metadata", namespace="csi", name="csi.volume.c8ebd059-ab72-11ec-b874-0a580a810215"): map[csi.imagename:nfs-export-c8ebd059-ab72-11ec-b874-0a580a810215 csi.volname:pvc-03913617-ba49-476b-97b0-b9c1aa415bb6])
I0324 13:03:22.731679       1 fsjournal.go:284] ID: 5 Req-ID: pvc-03913617-ba49-476b-97b0-b9c1aa415bb6 Generated Volume ID (0001-0011-openshift-storage-0000000000000001-c8ebd059-ab72-11ec-b874-0a580a810215) and subvolume name (nfs-export-c8ebd059-ab72-11ec-b874-0a580a810215) for request name (pvc-03913617-ba49-476b-97b0-b9c1aa415bb6)
I0324 13:03:22.756543       1 volume.go:228] ID: 5 Req-ID: pvc-03913617-ba49-476b-97b0-b9c1aa415bb6 cephfs: created subvolume group csi
I0324 13:03:22.816809       1 controllerserver.go:343] ID: 5 Req-ID: pvc-03913617-ba49-476b-97b0-b9c1aa415bb6 cephfs: successfully created backing volume named nfs-export-c8ebd059-ab72-11ec-b874-0a580a810215 for request name pvc-03913617-ba49-476b-97b0-b9c1aa415bb6
I0324 13:03:22.816887       1 controllerserver.go:83] ID: 5 Req-ID: pvc-03913617-ba49-476b-97b0-b9c1aa415bb6 CephFS volume created: 0001-0011-openshift-storage-0000000000000001-c8ebd059-ab72-11ec-b874-0a580a810215
I0324 13:03:22.822595       1 omap.go:155] ID: 5 Req-ID: pvc-03913617-ba49-476b-97b0-b9c1aa415bb6 set omap keys (pool="ocs-storagecluster-cephfilesystem-metadata", namespace="csi", name="csi.volume.c8ebd059-ab72-11ec-b874-0a580a810215"): map[csi.nfs.cluster:my-nfs])
I0324 13:03:23.301861       1 cephcmds.go:63] ID: 5 Req-ID: pvc-03913617-ba49-476b-97b0-b9c1aa415bb6 command succeeded: ceph [--id csi-cephfs-provisioner --keyfile=***stripped*** -m 172.30.252.210:6789,172.30.58.127:6789,172.30.62.161:6789 nfs export create cephfs ocs-storagecluster-cephfilesystem my-nfs /0001-0011-openshift-storage-0000000000000001-c8ebd059-ab72-11ec-b874-0a580a810215 /volumes/csi/nfs-export-c8ebd059-ab72-11ec-b874-0a580a810215/c668735f-3626-46f4-ab77-44e64e324111]
I0324 13:03:23.301907       1 controllerserver.go:110] ID: 5 Req-ID: pvc-03913617-ba49-476b-97b0-b9c1aa415bb6 published NFS-export: 0001-0011-openshift-storage-0000000000000001-c8ebd059-ab72-11ec-b874-0a580a810215
I0324 13:03:23.302065       1 utils.go:202] ID: 5 Req-ID: pvc-03913617-ba49-476b-97b0-b9c1aa415bb6 GRPC response: {"volume":{"capacity_bytes":1073741824,"volume_context":{"cephNFS":"my-nfs","clusterID":"openshift-storage","fsName":"ocs-storagecluster-cephfilesystem","server":"rook-ceph-nfs-my-nfs-a.openshift-storage.svc.cluster.local","share":"/0001-0011-openshift-storage-0000000000000001-c8ebd059-ab72-11ec-b874-0a580a810215","subvolumeName":"nfs-export-c8ebd059-ab72-11ec-b874-0a580a810215","subvolumePath":"/volumes/csi/nfs-export-c8ebd059-ab72-11ec-b874-0a580a810215/c668735f-3626-46f4-ab77-44e64e324111","volumeNamePrefix":"nfs-export-"},"volume_id":"0001-0011-openshift-storage-0000000000000001-c8ebd059-ab72-11ec-b874-0a580a810215"}}

DeleteVolume

I0324 13:03:53.585022       1 utils.go:191] ID: 6 Req-ID: 0001-0011-openshift-storage-0000000000000001-c8ebd059-ab72-11ec-b874-0a580a810215 GRPC call: /csi.v1.Controller/DeleteVolume
I0324 13:03:53.585165       1 utils.go:195] ID: 6 Req-ID: 0001-0011-openshift-storage-0000000000000001-c8ebd059-ab72-11ec-b874-0a580a810215 GRPC request: {"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-c8ebd059-ab72-11ec-b874-0a580a810215"}
I0324 13:03:53.588279       1 omap.go:87] ID: 6 Req-ID: 0001-0011-openshift-storage-0000000000000001-c8ebd059-ab72-11ec-b874-0a580a810215 got omap values: (pool="ocs-storagecluster-cephfilesystem-metadata", namespace="csi", name="csi.volume.c8ebd059-ab72-11ec-b874-0a580a810215"): map[csi.nfs.cluster:my-nfs]
I0324 13:03:54.061648       1 cephcmds.go:63] ID: 6 Req-ID: 0001-0011-openshift-storage-0000000000000001-c8ebd059-ab72-11ec-b874-0a580a810215 command succeeded: ceph [--id csi-cephfs-provisioner --keyfile=***stripped*** -m 172.30.252.210:6789,172.30.58.127:6789,172.30.62.161:6789 nfs export delete my-nfs /0001-0011-openshift-storage-0000000000000001-c8ebd059-ab72-11ec-b874-0a580a810215]
I0324 13:03:54.061695       1 controllerserver.go:149] ID: 6 Req-ID: 0001-0011-openshift-storage-0000000000000001-c8ebd059-ab72-11ec-b874-0a580a810215 deleted NFS-export: 0001-0011-openshift-storage-0000000000000001-c8ebd059-ab72-11ec-b874-0a580a810215
I0324 13:03:54.064215       1 omap.go:87] ID: 6 Req-ID: 0001-0011-openshift-storage-0000000000000001-c8ebd059-ab72-11ec-b874-0a580a810215 got omap values: (pool="ocs-storagecluster-cephfilesystem-metadata", namespace="csi", name="csi.volume.c8ebd059-ab72-11ec-b874-0a580a810215"): map[csi.imagename:nfs-export-c8ebd059-ab72-11ec-b874-0a580a810215 csi.volname:pvc-03913617-ba49-476b-97b0-b9c1aa415bb6]
I0324 13:03:54.102219       1 omap.go:123] ID: 6 Req-ID: 0001-0011-openshift-storage-0000000000000001-c8ebd059-ab72-11ec-b874-0a580a810215 removed omap keys (pool="ocs-storagecluster-cephfilesystem-metadata", namespace="csi", name="csi.volumes.default"): [csi.volume.pvc-03913617-ba49-476b-97b0-b9c1aa415bb6]
I0324 13:03:54.102280       1 controllerserver.go:468] ID: 6 Req-ID: 0001-0011-openshift-storage-0000000000000001-c8ebd059-ab72-11ec-b874-0a580a810215 cephfs: successfully deleted volume 0001-0011-openshift-storage-0000000000000001-c8ebd059-ab72-11ec-b874-0a580a810215
I0324 13:03:54.102348       1 utils.go:202] ID: 6 Req-ID: 0001-0011-openshift-storage-0000000000000001-c8ebd059-ab72-11ec-b874-0a580a810215 GRPC response: {}
* No validation for the presence of `server` IPs in the volume context that is received/returned?

It should not be IPs; ideally it is a hostname (but it can be an IP-address too). Not sure it makes sense to validate this. Mounting the volume would fail in kubernetes-csi/csi-driver-nfs in that case, hopefully with a useful error message.
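
If a check gets added later, it would likely stay minimal, e.g. (sketch):

// sketch: only reject an empty "server" parameter, it may be either a
// hostname or an IP-address
server := req.GetParameters()["server"]
if server == "" {
	return nil, status.Error(codes.InvalidArgument,
		"missing required parameter: server")
}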

@Madhu-1 (Collaborator) left a comment

LGTM, left a few questions.


// TODO: use new go-ceph API
_, stderr, err := util.ExecCommand(nv.ctx, "ceph", args...)
if err != nil {
return fmt.Errorf("executing ceph export command failed (%w): %s", err, stderr)
Collaborator

do we need to handle "already exported" errors if we get any? Or is this call idempotent?

Member Author

This returns an error like EEXISTS, but as this is a CLI and not go-ceph, error checking is not as clean. I plan to use go-ceph in a follow-up PR, and then improved error-checking can be added too.
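
Until then, the stop-gap would be matching on the CLI output, something like this (sketch; the exact error text is an assumption and depends on the Ceph version):

_, stderr, err := util.ExecCommand(nv.ctx, "ceph", args...)
if err != nil {
	// treat an already existing export as success so CreateVolume stays
	// idempotent; matching stderr is fragile and only a temporary hack
	if strings.Contains(stderr, "EEXIST") ||
		strings.Contains(stderr, "already exists") {
		return nil
	}

	return fmt.Errorf("executing ceph export command failed (%w): %s", err, stderr)
}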

// TODO: use new go-ceph API
_, stderr, err := util.ExecCommand(nv.ctx, "ceph", args...)
if err != nil {
return fmt.Errorf("executing ceph export command failed (%w): %s", err, stderr)
Collaborator

do we need to handle "already deleted" export errors if we get any? Or is this call idempotent?

Member Author

this needs to be checked once we use go-ceph for these calls

@nixpanic
Member Author

@Madhu-1 @Rakshith-R , I think all comments have been addressed now. Please have a look again. Thanks!

@nixpanic
Member Author

FWIW, partial instructions for setting up are available in #2963

api/deploy/kubernetes/nfs/csidriver.yaml (outdated, resolved)
internal/nfs/driver/driver.go (resolved)
Move the printing of the version and other information to its own
function. This reduces the complexity enough so that golang-ci does not
complain about it anymore.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The API is extended for generation of the NFS CSIDriver object. The
YAML file under deploy/ was created by `yamlgen`.

The contents of the csidriver.yaml file are heavily based on the upstream
CSIDriver from the Kubernetes csi-driver-nfs project.

Because ./tools/yamlgen uses the API, it gets copied under vendor/.
This causes two copies of the API to be included in the repository, but
that cannot be prevented, it seems.

See-also: https://github.com/kubernetes-csi/csi-driver-nfs/blob/master/deploy/csi-nfs-driverinfo.yaml
Signed-off-by: Niels de Vos <ndevos@redhat.com>
These NFS Controller and Identity servers are the base for the new
provisioner. The functionality is currently extremely limited, follow-up
PRs will implement various CSI procedures.

CreateVolume is implemented with the bare minimum. This makes it
possible to create a volume, and mount it with the
kubernetes-csi/csi-driver-nfs NodePlugin.

DeleteVolume unexports the volume from the Ceph managed NFS-Ganesha
service. In case the Ceph cluster provides multiple NFS-Ganesha
deployments, things might not work as expected. This is going to be
addressed in follow-up improvements.

Lots of TODO comments need to be resolved before this can be declared
"production ready". Unit- and e2e-tests are missing as well.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
Deployments can use --type=nfs to deploy the NFS Controller Server
(provisioner).

Signed-off-by: Niels de Vos <ndevos@redhat.com>
NFSVolume instances are short-lived; they only exist for a single gRPC
procedure. It is easier to store the calling Context in the NFSVolume
struct than to pass it to some of the functions that require it.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
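
In practice that looks roughly like this (simplified sketch):

// NFSVolume presents a CephFS subvolume that gets exported through the
// Ceph managed NFS-Ganesha service. Instances only live for the duration
// of a single gRPC procedure, so storing the calling Context is safe.
type NFSVolume struct {
	volumeID string

	// ctx is set by NewNFSVolume() and used by the functions that
	// interact with the Ceph cluster on behalf of this volume
	ctx context.Context
}

func NewNFSVolume(ctx context.Context, volumeID string) (*NFSVolume, error) {
	// TODO: validate the volume-id format
	return &NFSVolume{
		volumeID: volumeID,
		ctx:      ctx,
	}, nil
}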
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
@Rakshith-R requested a review from a team on March 28, 2022 10:31