nfs: add basic provisioner with create/delete procedures #2948
Conversation
For first few glances, it looks good to me,
attachRequired: false
volumeLifecycleModes:
- Persistent
- Ephemeral
attachRequired is false, and do we plan to support Ephemeral as well?
This is the default configuration from the csi-driver-nfs (NodePlugin)... Not sure if there are any special things needed for Ephemeral support?
I think for Ephemeral, CreateVolume needs to be taken care of in NodePublish: https://kubernetes-csi.github.io/docs/ephemeral-local-volumes.html#implementing-csi-ephemeral-inline-support.
right, in that case, we should probably not support Ephemeral volumes from the start
backend := res.Volume

log.DebugLog(ctx, "CephFS volume created: %s", backend)
debug logging of the volume is not required?
yes, this can probably be removed
Actually, I think it is useful to keep this as DebugLog(); each step in the process has such a log message. It will only log the volume-id instead of the whole volume.
return nil, status.Error(codes.InvalidArgument, err.Error())
}

err = nfsVolume.Connect(cr)
calling Destroy is missing?
there was no go-ceph connection yet. With the updated PR go-ceph is used, and Destroy() has been added.
return nil
}

func (nv *NFSVolume) GetExportPath() string {
add comments for all the exported functions?
yes, definitely need to do this!
still need to be addressed?
yes, indeed!
internal/nfs/controller/volume.go
}

// TODO: use new go-ceph API
_, _, err := util.ExecCommand(context.TODO(), "ceph", args...)
open a go-ceph issue to track this one?
it is at ceph/go-ceph#655
@@ -0,0 +1,74 @@
/*
Copyright 2022 The Ceph-CSI Authors.
why do we have this vendor change?
that is mentioned in the commit message: as tools/yamlgen uses the API to generate files under deploy/, we are vendoring our own API (I would like to prevent that, but don't know how).
if err != nil {
log.ErrorLog(ctx, "failed to retrieve admin credentials: %v", err)

return nil, status.Error(codes.InvalidArgument, err.Error())
in case of failures we need to take care of cleaning up the OMAP and the CephFS subvolume?
yes, failures should still be handled better. Calls are probably not idempotent at the moment.
Planning to address this in a followup PR?
not in this PR; it needs a lot of manual testing to address all possible cases. I prefer to get this merged soon, and then iterate on the improvements with smaller PRs.
force-pushed from 8454f46 to 04728c4
I do not think that is required. The calls should be made idempotent (they probably are not yet), and the actual complex volume creation is done in the CephFS subcomponent, which has locking already.
no, this is what the logs currently look like: CreateVolume
DeleteVolume
It should not be IPs; ideally it is a hostname (but it can be an IP address too). Not sure it makes sense to validate this. Mounting the volume will fail in kubernetes-csi/csi-driver-nfs in that case, hopefully with a useful error message.
force-pushed from 04728c4 to dd5f878
LGTM, left a few questions.
attachRequired: false
volumeLifecycleModes:
- Persistent
- Ephemeral
I think for Ephemeral, CreateVolume needs to be taken care of in NodePublish: https://kubernetes-csi.github.io/docs/ephemeral-local-volumes.html#implementing-csi-ephemeral-inline-support.
if err != nil {
log.ErrorLog(ctx, "failed to retrieve admin credentials: %v", err)

return nil, status.Error(codes.InvalidArgument, err.Error())
Planning to address this in a followup PR?
return nil
}

func (nv *NFSVolume) GetExportPath() string {
still need to be addressed?
// TODO: use new go-ceph API
_, stderr, err := util.ExecCommand(nv.ctx, "ceph", args...)
if err != nil {
return fmt.Errorf("executing ceph export command failed (%w): %s", err, stderr)
do we need to handle already-exported errors if we get any, or is this call idempotent?
This returns an error like EEXIST, but as this is a CLI and not go-ceph, error checking is not as clean. I plan to use go-ceph in a follow-up PR, and then improved error checking can be added too.
// TODO: use new go-ceph API
_, stderr, err := util.ExecCommand(nv.ctx, "ceph", args...)
if err != nil {
return fmt.Errorf("executing ceph export command failed (%w): %s", err, stderr)
do we need to handle already-deleted export errors if we get any, or is this call idempotent?
this needs to be checked once we use go-ceph for these calls
force-pushed from dd5f878 to 223bba1
@Madhu-1 @Rakshith-R, I think all comments have been addressed now. Please have a look again. Thanks!
force-pushed from 223bba1 to 1a1cf7d
FWIW, partial instructions for setting up are available in #2963
Move the printing of the version and other information to its own function. This reduces the complexity enough so that golang-ci does not complain about it anymore. Signed-off-by: Niels de Vos <ndevos@redhat.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The API is extended for generation of the NFS CSIDriver object. The YAML file under deploy/ was created by `yamlgen`. The contents of the csidriver.yaml file are heavily based on the upstream CSIDriver from the Kubernetes csi-driver-nfs project. Because ./tools/yamlgen uses the API, it gets copied under vendor/. This causes two copies of the API to be included in the repository, but that cannot be prevented, it seems. See-also: https://github.com/kubernetes-csi/csi-driver-nfs/blob/master/deploy/csi-nfs-driverinfo.yaml Signed-off-by: Niels de Vos <ndevos@redhat.com>
These NFS Controller and Identity servers are the base for the new provisioner. The functionality is currently extremely limited, follow-up PRs will implement various CSI procedures. CreateVolume is implemented with the bare minimum. This makes it possible to create a volume, and mount it with the kubernetes-csi/csi-driver-nfs NodePlugin. DeleteVolume unexports the volume from the Ceph managed NFS-Ganesha service. In case the Ceph cluster provides multiple NFS-Ganesha deployments, things might not work as expected. This is going to be addressed in follow-up improvements. Lots of TODO comments need to be resolved before this can be declared "production ready". Unit- and e2e-tests are missing as well. Signed-off-by: Niels de Vos <ndevos@redhat.com>
Deployments can use --type=nfs to deploy the NFS Controller Server (provisioner). Signed-off-by: Niels de Vos <ndevos@redhat.com>
NFSVolume instances are short lived; they only exist for the duration of a single gRPC procedure. It is easier to store the calling Context in the NFSVolume struct than to pass it to some of the functions that require it. Signed-off-by: Niels de Vos <ndevos@redhat.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
force-pushed from 1a1cf7d to f611e98
These NFS Controller and Identity servers are the base for the new
provisioner. The functionality is currently extremely limited, follow-up
PRs will implement various CSI procedures.
CreateVolume is implemented with the bare minimum. This makes it
possible to create a volume, and mount it with the
kubernetes-csi/csi-driver-nfs NodePlugin.
DeleteVolume unexports the volume from the Ceph managed NFS-Ganesha
service. In case the Ceph cluster provides multiple NFS-Ganesha
deployments, things might not work as expected. This is going to be
addressed in follow-up improvements.
Lots of TODO comments need to be resolved before this can be declared
"production ready". Unit- and e2e-tests are missing as well.
Updates: #2913
Show available bot commands
These commands are normally not required, but in case of issues, leave any of the following bot commands in an otherwise empty comment in this PR:
/retest ci/centos/<job-name>: retest the <job-name> after an unrelated failure (please report the failure too!)
/retest all: run this in case the CentOS CI failed to start/report any test progress or results