Failing to mount cephfs volumes with invalid argument #5282

@Starttoaster

Description

Describe the bug

I upgraded ceph-csi in my clusters to 3.14.0 last week, for both rbd and cephfs, and began seeing cephfs volumes fail to mount with an "invalid argument" error from the nodeplugin.

I'm not sure what the invalid argument actually is; the logs don't make that clear. But I noted that the old issue #2254 exhibited a similar log line.

I also noted that in the 3.14.0 release, PR #5090 changed the cephfs mount syntax. I'm not sure if it's related, but it seemed suspicious.

Environment details

  • Image/version of Ceph CSI driver : 3.14.0
  • Helm chart version : 3.14.0
  • Kernel version : 5.15.0-135-generic on the k8s nodes
  • Mounter used for mounting PVC (for CephFS it's fuse or kernel; for RBD it's
    krbd or rbd-nbd) : we didn't specify it, so admittedly I'm not sure. I'll update this when I find out
  • Kubernetes cluster version : 1.32.3
  • Ceph cluster version : 18.2.2

Steps to reproduce

Steps to reproduce the behavior:

  1. Setup details: deploy the latest release (3.14.0) of ceph-csi for cephfs with mostly default settings. Values file below:
csiConfig:
  - clusterID: $CEPH_CLUSTER_ID
    monitors: $CEPH_MONITORS
    readAffinity:
      enabled: true
      crushLocationLabels:
        - topology.kubernetes.io/region
        - topology.kubernetes.io/zone

readAffinity:
  enabled: true
  crushLocationLabels:
    - topology.kubernetes.io/region
    - topology.kubernetes.io/zone

nodeplugin:
  httpMetrics:
    containerPort: 8187

storageClass:
  create: true
  name: cephfs-nvme
  clusterID: $CEPH_CLUSTER_ID
  fsName: $CEPHFS_FS
  pool: $CEPHFS_POOL

secret:
  create: true
  adminID: $CEPHFS_USERNAME
  adminKey: $CEPHFS_KEY

kubeletDir: /var/lib/k0s/kubelet

  2. Create a PVC with ReadWriteMany and a Pod that mounts it; a minimal manifest sketch follows this list.
  3. See the error.
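
For reference, here is a minimal PVC and Pod of the kind that triggers the failure. The resource names and image are hypothetical; the storageClassName matches the cephfs-nvme class created by the values file above:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-test-pvc          # hypothetical name
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: cephfs-nvme  # class created by the values file above
---
apiVersion: v1
kind: Pod
metadata:
  name: cephfs-test-pod          # hypothetical name
spec:
  containers:
    - name: app
      image: busybox:1.36        # any image works; the failure happens at mount time
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: cephfs-test-pvc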

Actual results

ceph-csi failed to mount these volumes, receiving an invalid argument error.

Expected behavior

cephfs would mount my volumes.

Logs

I started seeing some of my Pods that mount cephfs volumes failing to start. The event on the Pod says MountVolume.MountDevice failed for volume "pvc-28619aba-591a-4ba1-9288-17c9e266b5cc" : rpc error: code = Internal desc = rpc error: code = Internal desc = failed to get connection: connecting failed: rados: ret=-22, Invalid argument

I viewed my ceph-csi logs for errors, and saw the following error show up: ID: 11448 Req-ID: 0001-0024-7e7d8f94-38e4-454e-9f86-caf6f2708cd9-0000000000000002-4d521645-f903-440c-b501-bd6b672dfaa1 GRPC error: rpc error: code = Internal desc = rpc error: code = Internal desc = failed to get connection: connecting failed: rados: ret=-22, Invalid argument

That ceph-csi log comes from the csi-cephfsplugin container of the nodeplugin component. For what it's worth, ret=-22 is EINVAL returned by librados while establishing the cluster connection.

Additional context

The arguments given to the csi-cephfsplugin container are: --nodeid=$(NODE_ID) --type=cephfs --nodeserver=true --pidlimit=-1 --kernelmountoptions= --fusemountoptions= --endpoint=$(CSI_ENDPOINT) --v=5 --drivername=$(DRIVER_NAME) --enable-read-affinity=true --crush-location-labels=topology.kubernetes.io/region,topology.kubernetes.io/zone --logslowopinterval=30s

I noticed that --kernelmountoptions= and --fusemountoptions= are passed with empty values. I'm not sure if that's important. We just updated from 3.13.1 to 3.14.0 without any values file changes and began seeing this issue.
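
In case the empty mount-option flags matter, one possible workaround sketch is to pin the mounter and options explicitly in the StorageClass. This assumes the chart's storageClass block exposes mounter, kernelMountOptions, and fuseMountOptions keys that map onto the documented ceph-csi StorageClass parameters of the same names; I haven't verified that this avoids the error:

storageClass:
  create: true
  name: cephfs-nvme
  clusterID: $CEPH_CLUSTER_ID
  fsName: $CEPHFS_FS
  pool: $CEPHFS_POOL
  mounter: kernel              # pin the mounter explicitly instead of letting the driver choose
  kernelMountOptions: noatime  # example option only; adjust or omit as needed

With the mounter pinned, at least the fuse-vs-kernel ambiguity noted in the environment details above goes away.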

Labels

  • component/cephfs: issues related to CephFS
  • component/deployment: Helm chart, Kubernetes templates and configuration issues/PRs
  • wontfix: this will not be worked on
