PV deletion of cephfsplugin on PVC deletion fails #48

Closed
mkimuram opened this issue Jul 19, 2018 · 11 comments

Comments

@mkimuram
Contributor

If a PV is dynamically provisioned and has the Delete reclaim policy, it should be deleted when the PVC for it is deleted. However, when tested with kubernetes/kubernetes#66388, the deletion fails with the following error messages.
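
For context, the expectation being exercised is: deleting the PVC should make the external-provisioner call DeleteVolume on the plugin, because the bound PV carries the Delete reclaim policy. A minimal client-go sketch of that flow (the namespace, PVC name and kubeconfig path below are made up for illustration, not taken from the test):

```go
// Sketch only: check the reclaim policy of the bound PV, then delete the PVC
// and expect the PV (and the CephFS volume behind it) to be reclaimed.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config") // example path
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	pvc, err := cs.CoreV1().PersistentVolumeClaims("default").Get(context.TODO(), "csi-cephfs-pvc", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	pv, err := cs.CoreV1().PersistentVolumes().Get(context.TODO(), pvc.Spec.VolumeName, metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("reclaim policy:", pv.Spec.PersistentVolumeReclaimPolicy) // expected: Delete

	// Deleting the PVC should trigger the external-provisioner to call
	// DeleteVolume on the CSI plugin and remove the PV.
	if err := cs.CoreV1().PersistentVolumeClaims("default").Delete(context.TODO(), "csi-cephfs-pvc", metav1.DeleteOptions{}); err != nil {
		panic(err)
	}
}
```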

[Error messages in cephfsplugin]

I0718 23:43:29.512515       1 identityserver-default.go:32] Using default GetPluginInnfo
I0718 23:43:29.512539       1 utils.go:102] GRPC response: name:"csi-cephfsplugin" vendor_version:"0.2.0"
I0718 23:43:29.521353       1 utils.go:96] GRPC call: /csi.v0.Controller/DeleteVolume
I0718 23:43:29.521424       1 utils.go:97] GRPC request: volume_id:"csi-cephfs-dyn-5ab6b2f6-8ae4-11e8-bb12-0242ac110004" controller_delete_secrets:<key:"adminID" value:"admin" > controller_delete_secrets:<key:"adminKey" value:"AQDRrKNVbEevChAAEmRC+pW/KBVHxa0w/POILA==" >
E0718 23:43:29.984564       1 controllerserver.go:171] failed to delete ceph user csi-user-5ab6b2f6-8ae4-11e8-bb12-0242ac110004: cephfs: ceph failed with following error: exit status 1
cephfs: ceph output: 2018-07-18 23:43:29.957718 7f4403592700  0 librados: client.admin authentication error (95) Operation not supported
[errno 95] error connecting to the cluster
I0718 23:43:38.727506       1 utils.go:96] GRPC call: /csi.v0.Identity/GetPluginCapabilities
I0718 23:43:38.727607       1 utils.go:97] GRPC request: 
I0718 23:43:38.727725       1 utils.go:102] GRPC response: capabilities:<service:<type:CONTROLLER_SERVICE > > 
I0718 23:43:38.728846       1 utils.go:96] GRPC call: /csi.v0.Controller/ControllerGetCapabilities
I0718 23:43:38.728904       1 utils.go:97] GRPC request: 
I0718 23:43:38.728991       1 controllerserver-default.go:82] Using default ControllerGetCapabilities
I0718 23:43:38.729007       1 utils.go:102] GRPC response: capabilities:<rpc:<type:CREATE_DELETE_VOLUME > > 
I0718 23:43:38.729940       1 utils.go:96] GRPC call: /csi.v0.Identity/GetPluginInfo
I0718 23:43:38.729974       1 utils.go:97] GRPC request: 
I0718 23:43:38.730034       1 identityserver-default.go:32] Using default GetPluginInnfo
I0718 23:43:38.730046       1 utils.go:102] GRPC response: name:"csi-cephfsplugin" vendor_version:"0.2.0" 
I0718 23:43:38.737483       1 utils.go:96] GRPC call: /csi.v0.Controller/DeleteVolume
I0718 23:43:38.737520       1 utils.go:97] GRPC request: volume_id:"csi-cephfs-dyn-5ab6b2f6-8ae4-11e8-bb12-0242ac110004" controller_delete_secrets:<key:"adminID" value:"admin" > controller_delete_secrets:<key:"adminKey" value:"AQDRrKNVbEevChAAEmRC+pW/KBVHxa0w/POILA==" > 
E0718 23:43:38.749523       1 controllerserver.go:166] failed to delete volume csi-cephfs-dyn-5ab6b2f6-8ae4-11e8-bb12-0242ac110004: cephfs: mount failed with following error: exit status 16
cephfs: mount output: mount error 16 = Device or resource busy
E0718 23:43:38.749679       1 utils.go:100] GRPC error: rpc error: code = Internal desc = cephfs: mount failed with following error: exit status 16
cephfs: mount output: mount error 16 = Device or resource busy
@rootfs
Member

rootfs commented Jul 19, 2018

this is the problem

librados: client.admin authentication error (95) Operation not supported
[errno 95] error connecting to the cluster

At the time the user is to be deleted, the volume should have already been deleted, confirming that the credentials and the cluster are working.

@mkimuram what ceph cluster are you using? Could you please get the ceph mon logs and see if there is any message about deleting the user?

@gman0 have you seen this error before?

@gman0
Contributor

gman0 commented Jul 19, 2018

@rootfs no. Indeed, ceph logs could help.

@mkimuram
Contributor Author

@gman0 @rootfs

what ceph cluster are you using?

I'm using the ceph docker image used in e2e testing
(created from https://github.com/kubernetes/kubernetes/tree/master/test/images/volumes-tester/rbd ).

could you please get the ceph mon logs and see if there is any message about deleting the user?

The logs were extracted as shown below. Please check them.
ceph_logs.txt

# kubectl exec -it ceph-server -n e2e-tests-csi-mock-plugin-6hphk -- find /var/log/ceph/ -name "*log"
/var/log/ceph/ceph-osd.1.log
/var/log/ceph/ceph-mon.a.log
/var/log/ceph/ceph-client.admin.log
/var/log/ceph/ceph.log
/var/log/ceph/ceph-osd.0.log
/var/log/ceph/ceph.audit.log
/var/log/ceph/ceph-mds.cephfs.log
# kubectl exec -it ceph-server -n e2e-tests-csi-mock-plugin-6hphk -- find /var/log/ceph/ -name "*log" -print -exec cat {} \; > /tmp/ceph_logs.txt

@rootfs
Member

rootfs commented Jul 19, 2018

there is no auth rm in ceph.audit.log; the last messages are:

2018-07-19 18:20:48.820491 mon.0 172.17.0.3:6789/0 78 : audit [INF] from='client.? 172.17.0.3:0/386396941' entity='client.admin' cmd=[{"prefix": "fs new", "data": "cephfs_data", "fs_name": "cephfs", "metadata": "cephfs_metadata"}]: dispatch
2018-07-19 18:20:51.959515 mon.0 172.17.0.3:6789/0 81 : audit [INF] from='client.? 172.17.0.3:0/386396941' entity='client.admin' cmd='[{"prefix": "fs new", "data": "cephfs_data", "fs_name": "cephfs", "metadata": "cephfs_metadata"}]': finished

@rootfs
Member

rootfs commented Jul 19, 2018

@mkimuram can you post the relevant e2e test console messages too? I wonder if the ceph cluster was (in the process of) being torn down before the PV was deleted in the background.

@rootfs
Member

rootfs commented Jul 19, 2018

@mkimuram this is the message I would look for to ensure the PV is deleted before the cluster is torn down.
https://github.com/mkimuram/kubernetes/blob/b461cb7acad0a0912fda198b8df46cd5ae452f0a/test/e2e/storage/volume_provisioning.go#L166
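
That wait is roughly equivalent to polling until the PV object is gone. A hedged sketch with client-go (the helper name, poll interval, PV name and kubeconfig path are made up; the real e2e test uses the framework's own helpers):

```go
// Rough approximation of "Waiting up to 20m0s for PersistentVolume pvc-xxx to
// get deleted": poll the API server until the PV can no longer be found.
package main

import (
	"context"
	"fmt"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func waitForPVDeleted(cs kubernetes.Interface, pvName string, poll, timeout time.Duration) error {
	return wait.PollImmediate(poll, timeout, func() (bool, error) {
		_, err := cs.CoreV1().PersistentVolumes().Get(context.TODO(), pvName, metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			return true, nil // PV is gone, done waiting
		}
		if err != nil {
			return false, err // unexpected error, stop waiting
		}
		return false, nil // PV still exists, keep polling
	})
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config") // example path
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	if err := waitForPVDeleted(cs, "pvc-5ab6b2f6-8ae4-11e8-bb12-0242ac110004", 5*time.Second, 20*time.Minute); err != nil {
		fmt.Println("PV was not deleted:", err)
	}
}
```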

@mkimuram
Contributor Author

@rootfs

I ran the e2e test again, saving the console log (e2e-console.txt), and waited until the "Waiting up to 20m0s for PersistentVolume pvc-xxx to get deleted" message showed up.
After that, I collected the logs (all_logs.txt) with the script below.

However, I still can't find auth rm.
This time, I also included the logs of csi-external-provisioner and cephfs-driver for the relevant run.
I hope this helps.

# cat getlog.sh 
#! /bin/bash -x
# Collects all logs relevant to the failed PV deletion.
# Requires $namespace (the e2e test namespace) and $cephfsnode (the cephfs node
# plugin pod name) to be exported beforehand, as in the usage below.
date
# Pod logs: ceph server, external-provisioner sidecar, cephfs driver.
kubectl logs ceph-server -n $namespace
kubectl logs csi-cephfs-controller-0 -n $namespace -c csi-external-provisioner
kubectl logs $cephfsnode -n $namespace -c cephfs-driver
# List and dump every ceph log file inside the ceph server pod.
kubectl exec -it ceph-server -n $namespace -- find /var/log/ceph/ -name "*log"
kubectl exec -it ceph-server -n $namespace -- find /var/log/ceph/ -name "*log" -print -exec cat {} \;

# export namespace=e2e-tests-csi-mock-plugin-9sbdt
# export cephfsnode=csi-cephfs-node-jkxxx
# ./getlog.sh > all_logs.txt 2>&1

@rootfs
Member

rootfs commented Jul 19, 2018

right, the timeline matches: in the CSI logs, the volume deletion happened before the cluster was torn down.

@mkimuram
Contributor Author

Sharing logs with debug messages:
all_logs5.txt
e2e-console5.txt

@rootfs
Member

rootfs commented Jul 23, 2018

This is what happened.

The e2e test just creates and deletes the volume; it doesn't publish the volume, so the volume is never mounted during the test.

However, the cephfs controller service doesn't create the ceph user; the node service creates it here. So by the time the volume is to be deleted by the controller service, the user doesn't exist.
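
Put differently, the flow looks roughly like this. This is only a self-contained sketch with placeholder helper names (createCephUser, deleteCephUser, etc.), not the actual ceph-csi v0.2.0 code:

```go
// Hypothetical illustration of the asymmetry: the user is created on the node
// side during publish, but deletion is attempted unconditionally on the
// controller side.
package main

import "fmt"

// Placeholder helpers standing in for the real driver functions.
func createVolume(volID string) error    { fmt.Println("create volume", volID); return nil }
func purgeVolume(volID string) error     { fmt.Println("purge volume", volID); return nil }
func mountVolume(volID string) error     { fmt.Println("mount volume", volID); return nil }
func createCephUser(volID string) error  { fmt.Println("create ceph user for", volID); return nil }
func deleteCephUser(volID string) error  { return fmt.Errorf("ceph user for %s was never created", volID) }

// Controller service: provisions the volume but does NOT create the ceph user.
func controllerCreateVolume(volID string) error {
	return createVolume(volID)
}

// Node service: the ceph user is created only when the volume is published.
// The create/delete-only e2e test never calls this.
func nodePublishVolume(volID string) error {
	if err := createCephUser(volID); err != nil {
		return err
	}
	return mountVolume(volID)
}

// Controller service: deletion assumes the user exists, which is not true for
// a volume that was created but never published.
func controllerDeleteVolume(volID string) error {
	if err := deleteCephUser(volID); err != nil {
		return err // this is where DeleteVolume errors out in the test
	}
	return purgeVolume(volID)
}

func main() {
	volID := "csi-cephfs-dyn-example"
	_ = controllerCreateVolume(volID)
	// nodePublishVolume(volID) is never called by the e2e test.
	if err := controllerDeleteVolume(volID); err != nil {
		fmt.Println("DeleteVolume failed:", err)
	}
}
```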

@gman0

@gman0
Contributor

gman0 commented Jul 25, 2018

@rootfs very nice find, thanks! I'll move this to NodeStageVolume/NodeUnstageVolume.
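
Sketched out, that change would look something like the following. Placeholder helpers again; this is only an illustration of the idea, not the patch that later landed in #62:

```go
// Hypothetical sketch: tie the ceph user's lifetime to staging/unstaging on
// the node, so the controller's DeleteVolume no longer has to remove a user
// that may never have existed.
package main

import "fmt"

// Placeholder helpers; the names are assumptions for illustration only.
func createCephUser(volID string) error { fmt.Println("create ceph user for", volID); return nil }
func deleteCephUser(volID string) error { fmt.Println("delete ceph user for", volID); return nil }
func stageMount(volID string) error     { fmt.Println("stage/mount", volID); return nil }
func unstageUnmount(volID string) error { fmt.Println("unstage/unmount", volID); return nil }

// NodeStageVolume: the user is created right before the volume is staged,
// so it only exists while the volume is actually in use on a node.
func nodeStageVolume(volID string) error {
	if err := createCephUser(volID); err != nil {
		return err
	}
	return stageMount(volID)
}

// NodeUnstageVolume: the same node removes the user again.
func nodeUnstageVolume(volID string) error {
	if err := unstageUnmount(volID); err != nil {
		return err
	}
	return deleteCephUser(volID)
}

func main() {
	volID := "csi-cephfs-dyn-example"
	_ = nodeStageVolume(volID)
	_ = nodeUnstageVolume(volID)
}
```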

mkimuram added a commit to mkimuram/kubernetes that referenced this issue Aug 7, 2018
rootfs closed this as completed in #62 on Aug 8, 2018
agarwal-mudit pushed a commit to agarwal-mudit/ceph-csi that referenced this issue Jul 1, 2021
Sync openshift/master to ceph-csi/devel
agarwal-mudit pushed a commit to agarwal-mudit/ceph-csi that referenced this issue Jan 27, 2022
Sync rhs:release-4.10 with rhs:devel
nixpanic added a commit that referenced this issue Mar 23, 2023
In case creating the CSIAddonsNode for a Pod fails, a retry is done.
There are several possible causes for such failures; these seem to be the
most common ones:

 - User cannot create resource "csiaddonsnodes" in API group "csiaddons.openshift.io"
 - no kind "CSIAddonsNode" is registered for the internal version of group "csiaddons.openshift.io"

In the above cases, either the RBAC is limiting access to the API, or
the CRD is not available in the Kubernetes cluster.

Updates: #48
Signed-off-by: Niels de Vos <ndevos@redhat.com>
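
As an aside, such a retry could be wrapped with client-go's retry helper. A hedged sketch, where createCSIAddonsNode is a placeholder and not the real kubernetes-csi-addons API:

```go
// Sketch: retry creating the CSIAddonsNode object with a default backoff,
// giving up immediately on RBAC denials since retrying will not help until
// permissions change.
package main

import (
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/client-go/util/retry"
)

// Placeholder for whatever call creates the CSIAddonsNode for the Pod; it can
// fail when RBAC denies access or when the CSIAddonsNode CRD is not installed.
func createCSIAddonsNode() error {
	return fmt.Errorf("placeholder: CRD not registered or RBAC forbids creation")
}

func main() {
	err := retry.OnError(retry.DefaultBackoff,
		func(err error) bool { return !apierrors.IsForbidden(err) }, // do not retry RBAC denials
		createCSIAddonsNode)
	if err != nil {
		fmt.Println("creating CSIAddonsNode failed:", err)
	}
}
```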