[BUG]: Resiliency: Occasional failure unmounting Unity volume for raw block devices via iSCSI #237
Labels
area/csm-resiliency: Issue pertains to the CSM Resiliency module
type/bug: Something isn't working. This is the default label associated with a bug issue.
Bug Description
This issue was discovered while running longevity tests on Unity for CSM Resiliency, where it occurs an estimated 20%-30% of the time. Longevity testing is a looped execution of the integration tests. During development in Q1, the integration tests were run regularly as a single, non-looped iteration with no failures.
Expected behavior:
After inducing an error condition on a node, the managed pods and resources are supposed to be removed and recreated on a healthy node.
Observed behavior:
Occasionally, the unmount operation of a Unity raw block volume attached via iSCSI fails. This was detected during longevity testing after running the scenario 10 times.
Workaround:
See item 2 in the known design limitations: https://dell.github.io/csm-docs/docs/resiliency/design/
The recommended workaround is for the administrator to reboot the affected node.
Logs
NodeUnstageVolume: REP 0621: rpc error: code = Internal desc = runid=621 Volume podmonvol-xxxxxx-iSCSI-apmnnnnnnnnnnn-sv_61383 has been mounted outside the provided target path /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging/podmonvol-nnnnnnnncb"
level=error msg="NodeUnstageVolume failed:
level=error msg="Could not Unmount private block device.
level=info msg="Couldn't completely cleanup node- taint not removed- cleanup will be retried, or a manual reboot is advised"
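When triaging an affected node, it can help to confirm which mount the "mounted outside the provided target path" error refers to. The following is a minimal diagnostic sketch, not the driver's own check: it assumes the leftover mount shows up in /proc/self/mountinfo with the volume name in its mount point, and the function name `stray_mounts` and the sample paths are hypothetical.

```python
import os


def stray_mounts(mountinfo_text, volume_name, target_path):
    """Return mount points that mention volume_name but lie outside target_path.

    mountinfo_text is the content of /proc/self/mountinfo; the fifth
    whitespace-separated field of each line is the mount point.
    Assumption: mount points contain no escaped spaces (\\040).
    """
    target = os.path.normpath(target_path)
    stray = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # not a well-formed mountinfo line
        mount_point = os.path.normpath(fields[4])
        if volume_name not in mount_point:
            continue  # unrelated mount
        if mount_point == target or mount_point.startswith(target + os.sep):
            continue  # mounted where it is expected
        stray.append(mount_point)
    return stray


# Made-up sample data for illustration only; real paths will differ.
SAMPLE = """\
100 30 0:5 /sdb /var/lib/kubelet/staging/vol-a rw shared:1 - devtmpfs devtmpfs rw
101 30 0:5 /sdb /var/lib/kubelet/private/vol-a rw shared:1 - devtmpfs devtmpfs rw
102 30 8:1 / /data rw - ext4 /dev/sda1 rw
"""

if __name__ == "__main__":
    for path in stray_mounts(SAMPLE, "vol-a", "/var/lib/kubelet/staging/vol-a"):
        print(path)
```

On a real node, the text would come from `open("/proc/self/mountinfo").read()`, with the volume name and staging path taken from the NodeUnstageVolume error message.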
Screenshots
No response
Additional Environment Information
No response
Steps to Reproduce
PV unmount fails with "device is busy" after the pod has been deleted.
Expected Behavior
The PV should be unmounted successfully after the pod has been deleted.
CSM Driver(s)
CSI Driver for Unity 2.2
Installation Type
Helm
Container Storage Modules Enabled
Resiliency Podmon 1.1
Container Orchestrator
Kubernetes 1.23
Operating System
CentOS 7