Mount 32 Failure During Upgrade from Trident 22.10 to Trident 23.01 #844

@rohitssingh

We have encountered an issue during the upgrade of our environment that results in specific upgraded pods being unable to mount volumes intermittently (~50% of the time).

Describe the bug
After the software on a Kubernetes node is upgraded (Trident goes from 22.10 to 23.01), the underlying volume fails to mount.

This issue is encountered on a PV that is "multi-mounted" - that is, two separate pods running on the same node are accessing the same PV in filesystem mode.

Here's the specific message that is printed when the upgrade bug occurs:

requestID=f3c8025d-dad5-4cb3-84a1-ad0406182858 requestSource=CSI
time="2023-06-23T17:39:46Z" level=error msg="GRPC error: rpc error: code = Internal desc = unable to mount device; exit status 32"
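
For readers unfamiliar with the number in that message: assuming the "exit status 32" comes straight from the mount(8) command (an assumption on our part, based on the wording of the error), it can be decoded with the exit-code bits documented in the mount(8) man page. A minimal sketch follows; the helper name `decode_mount_exit_status` is ours and is not part of Trident.

```python
# Exit-code bits as documented in the mount(8) man page.
# Values can be OR-ed together, so the status is treated as a bitmask.
MOUNT_EXIT_BITS = {
    1: "incorrect invocation or permissions",
    2: "system error (out of memory, cannot fork, no more loop devices)",
    4: "internal mount bug",
    8: "user interrupt",
    16: "problems writing or locking /etc/mtab",
    32: "mount failure",
    64: "some mount succeeded",
}


def decode_mount_exit_status(status):
    """Return the list of meanings encoded in a mount(8) exit status."""
    if status == 0:
        return ["success"]
    return [meaning for bit, meaning in MOUNT_EXIT_BITS.items() if status & bit]
```

With this table, `decode_mount_exit_status(32)` returns `["mount failure"]`, which is the generic failure code and does not by itself say why the mount failed.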

We believe we have root-caused the underlying problem to a regression introduced by this change:
aa3e565

Environment

  • Trident version: Going from 22.10 to 23.01.
  • OS: Ubuntu
  • NetApp backend types: OTS & ONTAP AFF

To Reproduce
Steps to reproduce the behavior:

  • Set up a pair of pods that share an underlying PV. Ensure that both pods are bound to the same node.
  • Upgrade Trident from 22.10 to 23.01
  • Take a look at the Trident tracking information
  • You'll note that the rawDevicePath field is missing for certain volumes:
root@node0:/var/lib/trident/tracking# cat pvc-7a90e5d5-3b73-45ec-a08c-9963fe04933c.json | jq
{
  "localhost": true,
  "fstype": "ext4",
  "sharedTarget": true,
  "LUKSEncryption": "false",
  "iscsiTargetPortal": "172.0.0.14",
  "iscsiPortals": [
    "172.0.0.5",
    "172.0.0.6",
    "172.0.0.7"
  ],
  "iscsiTargetIqn": "iqn.1992-08.com.netapp:sn.e90faeca0ff711eea04c005056acda88:vs.4",
  "iscsiLunNumber": 5,
  "iscsiInterface": "default",
  "iscsiIgroup": "node0-b3135d6a-2cbf-4383-abea-235403b560e8",
  "useCHAP": true,
  "iscsiUsername": "dude-initiator",
  "iscsiInitiatorSecret": "IAaPKlD6ygOf0AhC",
  "iscsiTargetUsername": "dude-iscsi-target",
  "iscsiTargetSecret": "ZiYqVKFGDN4ouieZ",
  "VolumeTrackingInfoPath": "",
  "stagingTargetPath": "/var/lib/kubelet/plugins/kubernetes.io/csi/csi.trident.netapp.io/8e2c5043cdde8eef0e3d303ef5eaacafa803b3671810b8e46bcb5e3e7fa12964/globalmount",
  "publishedTargetPaths": {
    "/var/lib/kubelet/pods/8a7775fe-14ae-4089-95b9-8e764cef43fc/volumes/kubernetes.io~csi/pvc-7a90e5d5-3b73-45ec-a08c-9963fe04933c/mount": {}
  }
}
  • When we see this bug manifest, rawDevicePath is not populated
  • This results in an exit status 32 error on the next attempt to mount the underlying PV.
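
To check a node for affected volumes without inspecting each file by hand, something like the following could be used. This is only a sketch: it assumes the tracking files live in /var/lib/trident/tracking and are named pvc-*.json as in the example above, and that a missing or empty rawDevicePath is the symptom to look for. The function name `find_missing_raw_device_path` is ours, not part of Trident.

```python
import json
from pathlib import Path


def find_missing_raw_device_path(tracking_dir):
    """Return the names of tracking files with no (or an empty) rawDevicePath."""
    missing = []
    for path in sorted(Path(tracking_dir).glob("pvc-*.json")):
        try:
            info = json.loads(path.read_text())
        except (OSError, json.JSONDecodeError):
            # Skip files that cannot be read or parsed; they need a manual look.
            continue
        # A tracking file is expected to be a JSON object; anything else,
        # or an object without a non-empty rawDevicePath, is reported.
        if not isinstance(info, dict) or not info.get("rawDevicePath"):
            missing.append(path.name)
    return missing
```

Calling `find_missing_raw_device_path("/var/lib/trident/tracking")` on a node lists the volumes whose tracking information lacks the field; the file shown above would be reported.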

Expected behavior

  • On upgrade, rawDevicePath should be present
