Describe the bug
I upgraded ceph-csi in my clusters to 3.14.0 last week, for both rbd and cephfs. I then began seeing cephfs volumes fail to mount, with the nodeplugin reporting an invalid argument error.
I'm not sure what the invalid argument actually is; the logs don't make that clear. But I noted that an old issue, #2254, exhibited a similar log line.
I also noted that PR #5090, included in the 3.14.0 release, changed the cephfs mount syntax. I'm not sure whether it's related, but it seemed suspicious.
Environment details
- Image/version of Ceph CSI driver : 3.14.0
- Helm chart version : 3.14.0
- Kernel version : 5.15.0-135-generic on the k8s nodes
- Mounter used for mounting PVC (for cephFS its fuse or kernel, for rbd its krbd or rbd-nbd) : we didn't specify it, so admittedly I'm not sure. I'll update this when I find out
- Kubernetes cluster version : 1.32.3
- Ceph cluster version : 18.2.2
Steps to reproduce
Steps to reproduce the behavior:
- Setup details: 'Deploy the latest release (3.14.0) of ceph-csi for cephfs with mostly default settings', values file below:

  csiConfig:
    - clusterID: $CEPH_CLUSTER_ID
      monitors: $CEPH_MONITORS
      readAffinity:
        enabled: true
        crushLocationLabels:
          - topology.kubernetes.io/region
          - topology.kubernetes.io/zone
  readAffinity:
    enabled: true
    crushLocationLabels:
      - topology.kubernetes.io/region
      - topology.kubernetes.io/zone
  nodeplugin:
    httpMetrics:
      containerPort: 8187
  storageClass:
    create: true
    name: cephfs-nvme
    clusterID: $CEPH_CLUSTER_ID
    fsName: $CEPHFS_FS
    pool: $CEPHFS_POOL
  secret:
    create: true
    adminID: $CEPHFS_USERNAME
    adminKey: $CEPHFS_KEY
  kubeletDir: /var/lib/k0s/kubelet

- Create a PVC with ReadWriteMany, and a Pod that mounts it (a minimal sketch is shown after this list).
- See error
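For reference, here is a minimal sketch of the PVC and Pod used to reproduce this. The resource names (cephfs-test-pvc, cephfs-test), the busybox image, and the 1Gi request are illustrative placeholders; the storageClassName points at the cephfs-nvme class from the values above:

```yaml
# Illustrative only: names, image and size are placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-test-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: cephfs-nvme
---
apiVersion: v1
kind: Pod
metadata:
  name: cephfs-test
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: cephfs-test-pvc
```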
Actual results
ceph-csi failed to mount these volumes, receiving an invalid argument error.
Expected behavior
The cephfs volumes would mount successfully.
Logs
I started seeing some of my Pods that mount cephfs volumes failing to start. The event on the Pod says MountVolume.MountDevice failed for volume "pvc-28619aba-591a-4ba1-9288-17c9e266b5cc" : rpc error: code = Internal desc = rpc error: code = Internal desc = failed to get connection: connecting failed: rados: ret=-22, Invalid argument
I viewed my ceph-csi logs for errors, and saw the following error show up: ID: 11448 Req-ID: 0001-0024-7e7d8f94-38e4-454e-9f86-caf6f2708cd9-0000000000000002-4d521645-f903-440c-b501-bd6b672dfaa1 GRPC error: rpc error: code = Internal desc = rpc error: code = Internal desc = failed to get connection: connecting failed: rados: ret=-22, Invalid argument
That ceph-csi log comes from the nodeplugin component in the csi-cephfsplugin container.
Additional context
The arguments given to the csi-cephfsplugin container are: --nodeid=$(NODE_ID) --type=cephfs --nodeserver=true --pidlimit=-1 --kernelmountoptions= --fusemountoptions= --endpoint=$(CSI_ENDPOINT) --v=5 --drivername=$(DRIVER_NAME) --enable-read-affinity=true --crush-location-labels=topology.kubernetes.io/region,topology.kubernetes.io/zone --logslowopinterval=30s
I noticed that --kernelmountoptions and --fusemountoptions are passed with empty values. Not sure if that's important. We just upgraded from 3.13.1 to 3.14.0 without any values file changes and began seeing this issue.
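In case it helps narrow things down, one thing I'm considering is pinning the mounter explicitly instead of leaving it unset. This is only a sketch and assumes the chart's storageClass block accepts mounter / kernelMountOptions fields (which end up as StorageClass parameters); I haven't verified those key names or tested this:

```yaml
# Sketch only: assumes the ceph-csi-cephfs chart's storageClass block
# accepts mounter/kernelMountOptions and turns them into StorageClass
# parameters. Untested; key names may differ.
storageClass:
  create: true
  name: cephfs-nvme
  clusterID: $CEPH_CLUSTER_ID
  fsName: $CEPHFS_FS
  pool: $CEPHFS_POOL
  mounter: kernel          # pin the kernel mounter instead of leaving it unspecified
  kernelMountOptions: ""   # deliberately empty; only set options here if actually needed
```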