
failed to mount cephfs inside pod #3927

Closed
dbenduga opened this issue Jun 21, 2023 · 11 comments
Labels
component/cephfs (Issues related to CephFS), component/deployment (Helm chart, kubernetes templates and configuration Issues/PRs), wontfix (This will not be worked on)

Comments

@dbenduga

Pod failed to run with the error below. On the other worker nodes CephFS mounts without any issue; the problem is with one worker node only.

[Wed Jun 21 16:00:45 2023] libceph: bad option at 'debug'
[Wed Jun 21 16:00:46 2023] libceph: bad option at 'debug'
[Wed Jun 21 16:00:49 2023] libceph: bad option at 'debug'
[Wed Jun 21 16:00:51 2023] libceph: bad option at 'debug'
[Wed Jun 21 16:00:55 2023] libceph: bad option at 'debug'
[Wed Jun 21 16:01:03 2023] libceph: bad option at 'debug'
[Wed Jun 21 16:01:20 2023] libceph: bad option at 'debug'
[Wed Jun 21 16:01:52 2023] libceph: bad option at 'debug'
[Wed Jun 21 16:02:56 2023] libceph: bad option at 'debug'
[Wed Jun 21 16:04:58 2023] libceph: bad option at 'debug'
[Wed Jun 21 16:07:00 2023] libceph: bad option at 'debug'
[Wed Jun 21 16:09:02 2023] libceph: bad option at 'debug'
[Wed Jun 21 16:11:04 2023] libceph: bad option at 'debug'
[Wed Jun 21 16:13:07 2023] libceph: bad option at 'debug'

Events:
Type Reason Age From Message


Normal SuccessfulAttachVolume 29s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-3fb608ca-e402-4d64-b3f3-7d7b4c619a47"
Warning FailedMount 22s kubelet MountVolume.MountDevice failed for volume "pvc-3fb608ca-e402-4d64-b3f3-7d7b4c619a47" : rpc error: code = Internal desc = an error (exit status 32) occurred while running mount args: [-t ceph 10.x.xx:6789,10.x.x.x:6789,10.x.x.x:6789:/volumes/cephfs_data/csi-vol-b5a94d6a-191d-4083-ad70-37bee074ae53/5cd83e54-89c5-4214-8c71-6c91ff5bdd52 /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-3fb608ca-e402-4d64-b3f3-7d7b4c619a47/globalmount -o name=admin,secretfile=/tmp/csi/keys/keyfile-2256350680,mds_namespace=cephfs,debug,_netdev] stderr: unable to get monitor info from DNS SRV with service name: ceph-mon
2023-06-21T15:17:02.442+0000 7fe8e3f060c0 -1 failed for service _ceph-mon._tcp
mount error 22 = Invalid argument

@Rakshith-R
Contributor

@dbenduga
Remove debug from pv.spec.mountOptions.

Delete and recreate the cephfs SC too, to remove the debug option.

Are you using the cephcsi canary image?
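For context: the kernel CephFS client does not recognize debug as a mount option, which is why dmesg shows libceph: bad option at 'debug' and the mount fails with error 22 (EINVAL); mount(8) then exits with status 32, which is what the CSI driver reports. A minimal reproduction sketch on the affected node, with the monitor addresses, subvolume path and secret file as placeholders taken from the MountDevice error above:

# Hypothetical reproduction; <mons>, <subvolume-path> and the keyfile are placeholders.
mount -t ceph <mons>:/<subvolume-path> /mnt/test \
  -o name=admin,secretfile=/tmp/csi/keys/keyfile-XXXX,mds_namespace=cephfs,debug,_netdev
# fails with "mount error 22 = Invalid argument"; dmesg shows libceph: bad option at 'debug'

mount -t ceph <mons>:/<subvolume-path> /mnt/test \
  -o name=admin,secretfile=/tmp/csi/keys/keyfile-XXXX,mds_namespace=cephfs,_netdev
# should succeed once the unrecognized debug option is dropped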

@nixpanic nixpanic added component/cephfs Issues related to CephFS component/deployment Helm chart, kubernetes templates and configuration Issues/PRs labels Jun 22, 2023
@dbenduga
Author

@Rakshith-R Yes, we are using the cephcsi canary image.

The storage class was created with the debug mountOption. CephFS mounts without any issue everywhere except for pods scheduled on the particular node having this problem.

We cordoned the problematic node and restarted the pod, and the pod started.

We are unable to find anything unusual on that node. Worker nodes are added from a node pool with the same configuration and OS settings.

@Rakshith-R
Contributor


@dbenduga
I think cephcsi was updated to the latest canary image only on that node (you can check the image hash).
Please remove debug from the SC and pv.mountOptions.
That should resolve your problem.
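To compare which cephcsi image (and digest) is actually running on each node, something along these lines should work; the namespace and label selector are assumptions for a default ceph-csi cephfs nodeplugin deployment, so adjust them to your install:

# List nodeplugin pods with node name, image and resolved image digest.
# Namespace and label selector are assumptions; adjust to your deployment.
kubectl get pods -n ceph-csi-cephfs -l app=ceph-csi-cephfs-nodeplugin \
  -o custom-columns='NODE:.spec.nodeName,IMAGE:.spec.containers[*].image,IMAGE_ID:.status.containerStatuses[*].imageID'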

@dbenduga
Author

@Rakshith-R Is it safe to remove mountOptions from the SC? Or should we update the CephFS plugin DaemonSet and change the image from canary to a specific version?

@Rakshith-R
Contributor

@dbenduga
Yes, it is safe to delete and recreate the cephfs SC, removing only the debug mountOption.
You'll also need to edit the existing cephfs PVs to remove debug from pv.spec.mountOptions.

sc.mountOptions should only contain options that are acceptable to both the CephFS mount during NodeStage and the bind mount during NodePublish.

This is a breaking change that will land in release 3.9.
It was done in order to accommodate #3855
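A sketch of cleaning up a single existing PV (the PV name is a placeholder; spec.mountOptions can be patched even though most of the PV spec is immutable):

# Check what the PV currently carries:
kubectl get pv <pv-name> -o jsonpath='{.spec.mountOptions}'

# If debug is the only entry, drop the whole field:
kubectl patch pv <pv-name> --type=json -p='[{"op": "remove", "path": "/spec/mountOptions"}]'

# If other options must stay, replace the list without debug instead, e.g.:
# kubectl patch pv <pv-name> --type=json \
#   -p='[{"op": "replace", "path": "/spec/mountOptions", "value": ["<other-option>"]}]'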

@dbenduga
Author

@Rakshith-R Thanks for your inputs :)
Can we just edit the storage class and comment out the mountOptions instead of deleting and recreating the SC?


apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-fs-sc
allowVolumeExpansion: true
provisioner: cephfs.csi.ceph.com
parameters:
  clusterID: dbf6c837-1024-4a85-8609-9e88333a7447
  fsName: cephfs
  csi.storage.k8s.io/controller-expand-secret-name: csi-fs-secret
  csi.storage.k8s.io/controller-expand-secret-namespace: ceph
  csi.storage.k8s.io/provisioner-secret-name: csi-fs-secret
  csi.storage.k8s.io/provisioner-secret-namespace: ceph
  csi.storage.k8s.io/node-stage-secret-name: csi-fs-secret
  csi.storage.k8s.io/node-stage-secret-namespace: ceph
reclaimPolicy: Delete
#mountOptions:
#  - debug

@Rakshith-R
Contributor


The SC is immutable; you need to delete and recreate it.
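A sketch of the delete-and-recreate, assuming the cleaned manifest above (with the debug mountOption removed) is saved as csi-fs-sc.yaml; deleting a StorageClass does not touch existing PVs or PVCs, only new provisioning is affected:

kubectl delete storageclass csi-fs-sc
kubectl apply -f csi-fs-sc.yaml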

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the wontfix This will not be worked on label Jul 23, 2023
@github-actions

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

@github-actions github-actions bot closed this as not planned Jul 31, 2023
@pehlert

pehlert commented Aug 7, 2023

Just in case someone needs to update all their PVs like I had to:

#!/bin/bash

# Get a list of PVs with the csi-cephfs-sc storage class. Adjust to your needs.
PVs=$(kubectl get pv -o=jsonpath='{.items[?(@.spec.storageClassName=="csi-cephfs-sc")].metadata.name}')

for pv in $PVs; do
  # Use kubectl patch to remove the spec.mountOptions field.
  kubectl patch pv $pv --type=json -p='[{"op": "remove", "path": "/spec/mountOptions"}]'
done
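To verify afterwards that no PV still carries the option (the storage class name here matches the script above; adjust as needed):

# Show PV name, storage class and mount options, and flag any remaining "debug" entries.
kubectl get pv -o custom-columns='NAME:.metadata.name,SC:.spec.storageClassName,OPTS:.spec.mountOptions' \
  | grep debug || echo "no PVs with debug left"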

@Tlmonko

Tlmonko commented Feb 20, 2024

I had the same issue.
Solved it by removing debug from mountOptions in my storage class configuration.
