
upgrade to 22.10.0 - trident crashes when a volume has state = upgrading  #787

@nitnatsnocER

Describe the bug
During the upgrade from 22.07.0 to 22.10.0, Trident panics with a segmentation fault (see the log below). We use Trident to manage volumes on SolidFire storage via iSCSI from Kubernetes. We do not use the operator; we deploy with our own Helm chart.

Here is the log:

trident-6cff996fdb-khggp trident time="2022-12-05T14:31:21+01:00" level=info msg="Running Trident storage orchestrator." binary=/bin/trident build_time="Mon Oct 31 16:03:20 EDT 2022" version=22.10.0
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Created Kubernetes clients." namespace=kube-system version=v1.24.4
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Initializing metrics frontend." address=":8001"
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Added frontend." name=metrics
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Initializing K8S helper frontend." requestID=63ca2515-8ac1-4f6b-9b88-3b8036e46cc9 requestSource=Internal
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="K8S helper determined the container orchestrator version." gitVersion=v1.24.4 requestID=63ca2515-8ac1-4f6b-9b88-3b8036e46cc9 requestSource=Internal version=1.24
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Added frontend." name=k8s_csi_helper
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Initializing CSI frontend." name=XXXpc1XXX version=22.10.0
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Enabling controller service capability." capability=CREATE_DELETE_VOLUME
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Enabling controller service capability." capability=PUBLISH_UNPUBLISH_VOLUME
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Enabling controller service capability." capability=LIST_VOLUMES
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Enabling controller service capability." capability=CREATE_DELETE_SNAPSHOT
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Enabling controller service capability." capability=LIST_SNAPSHOTS
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Enabling controller service capability." capability=EXPAND_VOLUME
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Enabling controller service capability." capability=CLONE_VOLUME
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Enabling controller service capability." capability=LIST_VOLUMES_PUBLISHED_NODES
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Enabling volume access mode." mode=SINGLE_NODE_WRITER
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Enabling volume access mode." mode=SINGLE_NODE_READER_ONLY
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Enabling volume access mode." mode=MULTI_NODE_READER_ONLY
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Enabling volume access mode." mode=MULTI_NODE_SINGLE_WRITER
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Enabling volume access mode." mode=MULTI_NODE_MULTI_WRITER
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Added frontend." name=csi
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Initializing Trident CRD controller frontend." namespace=kube-system
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Creating event broadcaster."
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Setting up CRD controller event handlers."
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Added frontend." name=crd
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Initializing HTTP REST frontend." address="127.0.0.1:8000"
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Added frontend." name="HTTP REST"
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Initializing HTTPS REST frontend." address=":9443"
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Added frontend." name="HTTPS REST"
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Activating metrics frontend." address=":8001"
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Activating HTTP REST frontend." address="127.0.0.1:8000"
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Activating HTTPS REST frontend." address=":9443"
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Storage driver initialized." driver=solidfire-san requestID=dc24f3fa-1193-415a-981d-1f3f2f16251c requestSource=Internal
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Created new storage backend." backend="&{0xc000707680 solidfire_XXX.XXX.XXX.XXX true online map[default:0xc000952de0 fast:0xc000952e40 slow:0xc000952d80] map[] false}" requestID=dc24f3fa-1193-415a-981d-1f3f2f16251c requestSource=Internal
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Newly added backend satisfies no storage classes." backend=solidfire_XXX.XXX.XXX.XXX requestID=dc24f3fa-1193-415a-981d-1f3f2f16251c requestSource=Internal
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Added an existing backend." backend=solidfire_XXX.XXX.XXX.XXX backendUUID=5ce563b7-6f62-4d8b-9c22-65d816c74938 configRef= handler=Bootstrap online=true persistentBackends.BackendUUID=5ce563b7-6f62-4d8b-9c22-65d816c74938 requestID=dc24f3fa-1193-415a-981d-1f3f2f16251c requestSource=Internal state=online
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Added an existing storage class." handler=Bootstrap requestID=dc24f3fa-1193-415a-981d-1f3f2f16251c requestSource=Internal storageClass=solidfire-default
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Added an existing storage class." handler=Bootstrap requestID=dc24f3fa-1193-415a-981d-1f3f2f16251c requestSource=Internal storageClass=solidfire-fast
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Added an existing storage class." handler=Bootstrap requestID=dc24f3fa-1193-415a-981d-1f3f2f16251c requestSource=Internal storageClass=solidfire-slow
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Added 93 existing volume(s)" requestID=dc24f3fa-1193-415a-981d-1f3f2f16251c requestSource=Internal
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Added an existing node." handler=Bootstrap node=XXXpc1XXX requestID=dc24f3fa-1193-415a-981d-1f3f2f16251c requestSource=Internal
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Added an existing node." handler=Bootstrap node=XXXpc2XXX requestID=dc24f3fa-1193-415a-981d-1f3f2f16251c requestSource=Internal
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Added an existing node." handler=Bootstrap node=XXXpc3XXX requestID=dc24f3fa-1193-415a-981d-1f3f2f16251c requestSource=Internal
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Added an existing node." handler=Bootstrap node=XXXpc4XXX requestID=dc24f3fa-1193-415a-981d-1f3f2f16251c requestSource=Internal
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Added an existing node." handler=Bootstrap node=XXXpc5XXX requestID=dc24f3fa-1193-415a-981d-1f3f2f16251c requestSource=Internal
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Added an existing node." handler=Bootstrap node=XXXpc6XXX requestID=dc24f3fa-1193-415a-981d-1f3f2f16251c requestSource=Internal
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Added an existing node." handler=Bootstrap node=XXXpc7XXX requestID=dc24f3fa-1193-415a-981d-1f3f2f16251c requestSource=Internal
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Added an existing node." handler=Bootstrap node=XXXpc8XXX requestID=dc24f3fa-1193-415a-981d-1f3f2f16251c requestSource=Internal
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Added an existing volume publication." handler=Bootstrap node=XXXpc6XXX requestID=dc24f3fa-1193-415a-981d-1f3f2f16251c requestSource=Internal volume=XXX
. . .
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Added an existing volume publication." handler=Bootstrap node=XXXpc4XXX requestID=dc24f3fa-1193-415a-981d-1f3f2f16251c requestSource=Internal volume=XXX
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Added an existing volume publication." handler=Bootstrap node=XXXpc8XXX requestID=dc24f3fa-1193-415a-981d-1f3f2f16251c requestSource=Internal volume=XXX
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Added an existing volume publication." handler=Bootstrap node=XXXpc7XXX requestID=dc24f3fa-1193-415a-981d-1f3f2f16251c requestSource=Internal volume=XXX
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Added an existing volume publication." handler=Bootstrap node=XXXpc8XXX requestID=dc24f3fa-1193-415a-981d-1f3f2f16251c requestSource=Internal volume=XXX
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=error msg="Transaction monitor blocked by bootstrap error." error="Trident is initializing, please try again later" requestID=dc24f3fa-1193-415a-981d-1f3f2f16251c requestSource=Internal
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Trident bootstrapped successfully."
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=info msg="Activating K8S helper frontend." requestID=f58b0253-a1a2-405b-9b0b-0809e5fad859 requestSource=Internal
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=warning msg="K8S helper could not add a storage class: storage class solidfire-default already exists" name=solidfire-default parameters="map[IOPS:5000 backendType:solidfire-san csi.storage.k8s.io/fstype:xfs provisioningType:thin snapshots:false]" provisioner=csi.trident.netapp.io requestID=14f9356d-42d2-40b4-96a8-b8f6441dc1db requestSource=Kubernetes
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=warning msg="K8S helper could not add a storage class: storage class solidfire-fast already exists" name=solidfire-fast parameters="map[IOPS:7000 backendType:solidfire-san csi.storage.k8s.io/fstype:xfs provisioningType:thin snapshots:false]" provisioner=csi.trident.netapp.io requestID=f135f125-747f-40c4-8702-8a5c955b2f55 requestSource=Kubernetes
trident-6cff996fdb-khggp trident time="2022-12-05T14:31:22+01:00" level=warning msg="K8S helper could not add a storage class: storage class solidfire-slow already exists" name=solidfire-slow parameters="map[IOPS:1500 backendType:solidfire-san csi.storage.k8s.io/fstype:xfs provisioningType:thin snapshots:false]" provisioner=csi.trident.netapp.io requestID=e586a9f1-0210-4b4f-9278-0091a7f5e3ed requestSource=Kubernetes
trident-6cff996fdb-khggp trident panic: runtime error: invalid memory address or nil pointer dereference
trident-6cff996fdb-khggp trident [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x2c5d9c9]
trident-6cff996fdb-khggp trident
trident-6cff996fdb-khggp trident goroutine 1 [running]:
trident-6cff996fdb-khggp trident github.com/netapp/trident/frontend/csi/helpers/kubernetes.(*Plugin).handleFailedPVUpgrades(0xc000600800, {0x3bc4218, 0xc000398750})
trident-6cff996fdb-khggp trident /go/src/github.com/netapp/trident/frontend/csi/helpers/kubernetes/upgrade_pv.go:949 +0x129
trident-6cff996fdb-khggp trident github.com/netapp/trident/frontend/csi/helpers/kubernetes.(*Plugin).Activate(0xc000600800)
trident-6cff996fdb-khggp trident /go/src/github.com/netapp/trident/frontend/csi/helpers/kubernetes/plugin.go:398 +0x4da
trident-6cff996fdb-khggp trident main.main()
trident-6cff996fdb-khggp trident /go/src/github.com/netapp/trident/main.go:433 +0x231d
Then the pod crashes. Downgrading to the previous version, 22.07.0, is possible, and Trident works fine afterwards.

Environment

  • Trident version: 22.10.0
  • Trident installation flags used: [e.g. -d -n trident --use-custom-yaml]
  • Container runtime: containerd://1.6.8
  • Kubernetes version: v1.24.4
  • Kubernetes orchestrator: own distro
  • Kubernetes enabled feature gates: [e.g. CSINodeInfo]
  • OS: CentOS 7, kernel-ml-5.17.0
  • NetApp backend types: SolidFire 12.3.0.958
  • Other:

To Reproduce
Upgrade from 22.07.0 to 22.10.0 while at least one tridentvolume resource has state = upgrading, for example:

kubectl get tridentvolume <volume_name> -ojson

{
  "apiVersion": "trident.netapp.io/v1",
  "backendUUID": "5ce563b7-6f62-4d8b-9c22-65d816c74938",
  "config": {
    "accessInformation": {
      "iscsiInterface": "default",
      "iscsiTargetIqn": "iqn.2010-01.com.solidfire:py74.<volume_name>.xxxx",
      "iscsiTargetPortal": "XXX.XXX.XXX.XXX:xxxx",
      "iscsiVags": [
        7
      ]
    },
    "accessMode": "ReadWriteOnce",
    "blockSize": "4096",
    "cloneSourceSnapshot": "",
    "cloneSourceVolume": "",
    "cloneSourceVolumeInternal": "",
    "encryption": "",
    "fileSystem": "ext4",
    "internalName": "<volume_name>",
    "name": "<volume_name>",
    "protocol": "block",
    "securityStyle": "",
    "size": "21474836480",
    "spaceReserve": "",
    "splitOnClone": "",
    "storageClass": "solidfire-slow",
    "version": "1"
  },
  "kind": "TridentVolume",
  "metadata": {
    "creationTimestamp": "2020-06-04T09:00:21Z",
    "finalizers": [
      "trident.netapp.io"
    ],
    "generation": 2,
    "name": "<volume_name>",
    "namespace": "kube-system",
    "resourceVersion": "508606680",
    "uid": "c9a3640e-5582-40a2-972f-fcc8978a4df1"
  },
  "orphaned": false,
  "pool": "slow",
  "state": "upgrading"
}
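To check ahead of an upgrade whether a cluster has any volumes in this state, something like the following might help. It is only a rough sketch: it assumes the tridentvolume resources live in kube-system (as in the output above), that kubectl returns the usual list object with an "items" array, and that state is a top-level field as shown. The function names are made up for illustration.

```python
import json
import subprocess


def upgrading_volumes(volume_list):
    """Return the names of tridentvolume objects whose top-level state is 'upgrading'."""
    names = []
    for item in volume_list.get("items", []):
        # 'state' sits at the top level of the object, next to 'config' and 'metadata'.
        if item.get("state") == "upgrading":
            names.append(item.get("metadata", {}).get("name", "<unknown>"))
    return names


def fetch_volume_list(namespace="kube-system"):
    """Run kubectl and parse its JSON list output (needs cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "tridentvolume", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Calling upgrading_volumes(fetch_volume_list()) on a machine with cluster access would list the affected volumes; an empty list would match what we see in our dev cluster.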

Expected behavior
Trident should log a clear error message, or otherwise handle this state gracefully, instead of crashing.
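To illustrate the kind of handling meant here, a small sketch follows. It is not Trident code (Trident is written in Go) and I do not know which pointer is nil at upgrade_pv.go:949; the sketch simply assumes, for illustration, that some object related to the volume could not be found. All names in it (VolumeRecord, related, plan_upgrade_recovery) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class VolumeRecord:
    # Hypothetical stand-in for a volume plus the object the upgrade code looks up for it.
    name: str
    state: str
    related: Optional[dict]  # None models a lookup that found nothing


def plan_upgrade_recovery(volumes: List[VolumeRecord]) -> Tuple[List[str], List[str]]:
    """Split 'upgrading' volumes into recoverable names and error messages."""
    recoverable: List[str] = []
    errors: List[str] = []
    for vol in volumes:
        if vol.state != "upgrading":
            continue  # only interrupted upgrades need attention
        if vol.related is None:
            # Report the problem and continue instead of dereferencing a missing object.
            errors.append(
                f"volume {vol.name}: state=upgrading but related object not found; skipping"
            )
            continue
        recoverable.append(vol.name)
    return recoverable, errors
```

With this shape, one inconsistent volume produces a log line and is skipped, while the remaining volumes and the rest of startup are unaffected.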

Additional context
My guess is that this is related to volumes that show "state": "upgrading" when inspecting the tridentvolume resource with
kubectl get tridentvolume <volume_name> -o json|yaml. The upgrade works in our dev cluster, where no volume is in state upgrading; Trident 22.10.0 runs there without issues.
I would also like to know what state = upgrading means. Does it mean the volume was not migrated properly to CSI? Can I somehow stop the process for a volume in state = upgrading and restart it?
