FailedMount: Unable to attach or mount volumes

**Describe the bug**
In an OpenShift 4.4.5 bare-metal install with Trident, we have seen several cases where a pod that is unscheduled from one worker and rescheduled to another ends up in a state where the new pod cannot attach or mount the volume.

The current workaround is to restart the kubelets. This is a large-scale installation, and the instability is becoming critical.


The logs look like the following:

time="2020-11-03T14:21:03Z" level=debug msg="GRPC call: /csi.v1.Node/NodeUnpublishVolume"
time="2020-11-03T14:21:03Z" level=debug msg="GRPC request: volume_id:\"pvc-9455721f-3a5e-49e1-9844-713bbcb3f5a8\" target_path:\"/var/lib/kubelet/pods/0a61b028-214e-4074-8443-258f74b8b91b/volumes/kubernetes.io~csi/pvc-9455721f-3a5e-49e1-9844-713bbcb3f5a8/mount\" "
time="2020-11-03T14:21:03Z" level=debug msg="Attempting to acquire shared lock (NodeUnpublishVolume-pvc-9455721f-3a5e-49e1-9844-713bbcb3f5a8)." lock=csi_node_server
time="2020-11-03T14:21:03Z" level=debug msg="Acquired shared lock (NodeUnpublishVolume-pvc-9455721f-3a5e-49e1-9844-713bbcb3f5a8)." lock=csi_node_server
time="2020-11-03T14:21:03Z" level=debug msg=">>>> NodeUnpublishVolume" Method=NodeUnpublishVolume Type=CSI_Node
time="2020-11-03T14:21:03Z" level=debug msg="<<<< NodeUnpublishVolume" Method=NodeUnpublishVolume Type=CSI_Node
time="2020-11-03T14:21:03Z" level=debug msg="Released shared lock (NodeUnpublishVolume-pvc-9455721f-3a5e-49e1-9844-713bbcb3f5a8)." lock=csi_node_server
time="2020-11-03T14:21:03Z" level=error msg="GRPC error: rpc error: code = Internal desc = could not check if the target path (/var/lib/kubelet/pods/0a61b028-214e-4074-8443-258f74b8b91b/volumes/kubernetes.io~csi/pvc-9455721f-3a5e-49e1-9844-713bbcb3f5a8/mount) is a directory; stat /var/lib/kubelet/pods/0a61b028-214e-4074-8443-258f74b8b91b/volumes/kubernetes.io~csi/pvc-9455721f-3a5e-49e1-9844-713bbcb3f5a8/mount: stale NFS file handle"
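
To pick the failing calls out of a longer debug log, a small filter along these lines may help. It is only a sketch: it assumes the log lines keep the `key=value` / `key="quoted value"` shape shown above, and the helper names `parse_line` and `stale_handle_errors` are made up for illustration, not part of Trident.

```python
import re

# Matches key=value and key="quoted value" pairs (assumed log shape, see above).
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_line(line):
    """Return a dict of the key/value pairs found in one log line."""
    fields = {}
    for key, raw in _PAIR.findall(line):
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            # Strip the surrounding quotes and unescape embedded \" sequences.
            raw = raw[1:-1].replace('\\"', '"')
        fields[key] = raw
    return fields


def stale_handle_errors(lines):
    """Yield (timestamp, target_path) for error lines that report a stale NFS handle."""
    for line in lines:
        fields = parse_line(line)
        msg = fields.get("msg", "")
        if fields.get("level") == "error" and "stale NFS file handle" in msg:
            match = re.search(r"target path \(([^)]+)\)", msg)
            yield fields.get("time"), (match.group(1) if match else None)
```

Feeding it the lines of a saved log file (for example `stale_handle_errors(open("trident-node.log"))`, where the file name is a placeholder) would list the timestamps and target paths of the affected unpublish calls.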

**Environment**

- Trident version: 20.07.01
- Trident installation flags used: [e.g. -d -n trident --use-custom-yaml]
- Container runtime: [e.g. Docker 19.03.1-CE]
- Kubernetes version: [e.g. 1.15.1]
- Kubernetes orchestrator: OpenShift 4.4.5
- Kubernetes enabled feature gates: [e.g. CSINodeInfo]
- OS: RHCOS
- NetApp backend types: [e.g. CVS for AWS, ONTAP AFF 9.5, HCI 1.7]
- Other:

**To Reproduce**
This appears to happen at random.
I'm currently running a dedicated test case, a container that repeatedly switches between nodes, to make the problem easier to reproduce.
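
Since the failure shows up at random, a check run on each worker during the test could help spot the bad state when it occurs. The sketch below is an assumption-laden illustration, not a Trident tool: it takes the directory layout from the target path in the error log (`<pods dir>/<pod UID>/volumes/kubernetes.io~csi/<volume>/mount`), and the function names `is_stale` and `find_stale_mounts` are hypothetical.

```python
import errno
import os


def is_stale(path):
    """Return True if stat() on path fails with ESTALE (stale file handle)."""
    try:
        os.stat(path)
    except OSError as exc:
        return exc.errno == errno.ESTALE
    return False


def find_stale_mounts(pods_dir="/var/lib/kubelet/pods"):
    """List CSI mount directories under pods_dir whose stat() reports ESTALE.

    The layout is assumed from the path seen in the error log.
    """
    stale = []
    if not os.path.isdir(pods_dir):
        return stale
    for pod_uid in sorted(os.listdir(pods_dir)):
        csi_dir = os.path.join(pods_dir, pod_uid, "volumes", "kubernetes.io~csi")
        if not os.path.isdir(csi_dir):
            continue
        for volume in sorted(os.listdir(csi_dir)):
            mount = os.path.join(csi_dir, volume, "mount")
            if is_stale(mount):
                stale.append(mount)
    return stale
```

Running `find_stale_mounts()` on a node right after a pod moves away would show whether any leftover mount path is already returning a stale handle before the kubelet retries the unpublish.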

**Expected behavior**
Pods have no trouble moving between nodes.

**Additional context**


FailedMount: Unable to attach or mount volumes #483
