
[BUG]: CSI Driver - issue with creation volume from 1 of the worker nodes #1057

Closed
deshab opened this issue Nov 27, 2023 · 28 comments
Assignees
Labels
area/csi-powerflex Issue pertains to the CSI Driver for Dell EMC PowerFlex type/bug Something isn't working. This is the default label associated with a bug issue.
Milestone

Comments

@deshab

deshab commented Nov 27, 2023

Bug Description

Volume creation fails from one of the worker nodes; we are unable to create volumes on this node. We still have the environment as-is if you want to troubleshoot the issue. Please let us know.

Logs

Error log:
kob-elastic-system 31m Warning ProvisioningFailed persistentvolumeclaim/elasticsearch-data-es-kob-es-hot-8 failed to provision volume with StorageClass "vxflexos-xfs": error generating accessibility requirements: topology map[csi-vxflexos.dellemc.com/0000000000000000:csi-vxflexos.dellemc.com] from selected node "wrk-10-x-x-x" is not in requisite: [map[csi-vxflexos.dellemc.com/187e850d57b03e0f:csi-vxflexos.dellemc.com]]
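The message means the topology segment reported for the selected node (key csi-vxflexos.dellemc.com/0000000000000000) is not among the requisite segments the provisioner computed (only csi-vxflexos.dellemc.com/187e850d57b03e0f). A toy illustration of the membership test behind that message (not the provisioner's actual code; the two key strings are taken from the error above):

```shell
# Topology key reported by the selected node (all-zero system ID).
node_topology='csi-vxflexos.dellemc.com/0000000000000000'
# Requisite topology keys computed by the provisioner (one per line).
requisite='csi-vxflexos.dellemc.com/187e850d57b03e0f'

# Provisioning can only proceed if the node's topology key is a member
# of the requisite set; an exact full-line match stands in for that here.
if printf '%s\n' "$requisite" | grep -qFx "$node_topology"; then
  result="node topology is in requisite; provisioning can proceed"
else
  result="node topology is not in requisite; ProvisioningFailed"
fi
echo "$result"
```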

More info:

I see an extra entry added to the node labels. In what scenario is this added? csi-vxflexos.dellemc.com/0000000000000000=csi-vxflexos.dellemc.com

BAD Node:
beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,csi-vxflexos.dellemc.com/0000000000000000=csi-vxflexos.dellemc.com,csi-vxflexos.dellemc.com/187e850d57b03e0f=csi-vxflexos.dellemc.com,kubernetes.io/arch=amd64,kubernetes.io/hostname=wrk-10-x-x-x,kubernetes.io/os=linux,route-reflector=,topology.kubernetes.io/zone=AZ2

Good Node:
beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,csi-vxflexos.dellemc.com/187e850d57b03e0f=csi-vxflexos.dellemc.com,kubernetes.io/arch=amd64,kubernetes.io/hostname=wrk-10-x-x-x,kubernetes.io/os=linux,route-reflector=,topology.kubernetes.io/zone=AZ3
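The stray label can be surfaced mechanically by diffing the two label sets. A minimal sketch using the label strings above, filtering out fields that legitimately differ per node (hostname, zone):

```shell
# Label strings copied from the bad and good nodes above.
bad='beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,csi-vxflexos.dellemc.com/0000000000000000=csi-vxflexos.dellemc.com,csi-vxflexos.dellemc.com/187e850d57b03e0f=csi-vxflexos.dellemc.com,kubernetes.io/arch=amd64,kubernetes.io/hostname=wrk-10-x-x-x,kubernetes.io/os=linux,route-reflector=,topology.kubernetes.io/zone=AZ2'
good='beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,csi-vxflexos.dellemc.com/187e850d57b03e0f=csi-vxflexos.dellemc.com,kubernetes.io/arch=amd64,kubernetes.io/hostname=wrk-10-x-x-x,kubernetes.io/os=linux,route-reflector=,topology.kubernetes.io/zone=AZ3'

# One label per line, sorted, so the sets can be compared.
printf '%s' "$good" | tr ',' '\n' | sort > /tmp/good.labels

# Labels present only on the bad node, ignoring fields that
# legitimately differ per node (hostname, zone).
extra=$(printf '%s' "$bad" | tr ',' '\n' | sort \
  | grep -Fxv -f /tmp/good.labels \
  | grep -v -e '^kubernetes.io/hostname' -e '^topology.kubernetes.io/zone')
echo "$extra"
```

The same comparison can be run against live nodes using the output of `kubectl get node <name> --show-labels`.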

Screenshots

No response

Additional Environment Information

No response

Steps to Reproduce

Unknown; the cluster is stable.

Expected Behavior

Users should be able to create volumes on all nodes that are part of the Kubernetes cluster.

CSM Driver(s)

CSI Driver PowerFlex

Installation Type

No response

Container Storage Modules Enabled

No response

Container Orchestrator

Kubernetes 1.26.6

Operating System

Ubuntu

@deshab deshab added needs-triage Issue requires triage. type/bug Something isn't working. This is the default label associated with a bug issue. labels Nov 27, 2023
@deshab deshab changed the title [BUG]: issue with creation volume from 1 of the worker nodes [BUG]: CSI Driver - issue with creation volume from 1 of the worker nodes Nov 27, 2023
@csmbot
Collaborator

csmbot commented Nov 27, 2023

@deshab: Thank you for submitting this issue!

The issue is currently awaiting triage. Please make sure you have given us as much context as possible.

If the maintainers determine this is a relevant issue, they will remove the needs-triage label and respond appropriately.


We want your feedback! If you have any questions or suggestions regarding our contributing process/workflow, please reach out to us at container.storage.modules@dell.com.

@deshab
Author

deshab commented Nov 29, 2023

Any update on this issue?

@shanmydell
Collaborator

@deshab: The team will look into this issue shortly.

@shanmydell
Collaborator

@deshab: As it is reporting all zeros, it seems like an SDC issue. Please refer to the KB article here: https://www.dell.com/support/kbdoc/000213824.
Could you please check and add your observations?

@deshab
Author

deshab commented Nov 29, 2023

I'm a Dell employee and I get this message when accessing the KB: "This article is permission based. Find another article."

@bharathsreekanth
Contributor

@deshab Can you share the details on the version of PowerFlex and the CSI driver version?
@suryagupta4 FYI

@suryagupta4

Hi @deshab, can you also share how many arrays were configured in the secret and the SDC version you installed?

@deshab
Author

deshab commented Nov 29, 2023

PowerFlex - 3.6-700.013
CSI Driver - 2.7.0

scli --query_all_sdc | grep xxx
OS Type: LINUX Loaded Version: 3.6.700 Installed Version: 3.6.700

@bharathsreekanth
Contributor

@deshab can you provide @suryagupta4 with the details on how your secrets were configured. Did you use one PowerFlex array or more than one as part of the secret creation?

@shanmydell
Collaborator

@deshab : Requesting your inputs

@deshab
Author

deshab commented Dec 6, 2023

@shanmydell are you referring to Kubernetes secrets or the PowerFlex server side itself? Could you please confirm?
K8s Secrets:
k -n kob-vxflexos get secrets
NAME TYPE DATA AGE
sh.helm.release.v1.vxflexos.v1 helm.sh/release.v1 1 148d
sh.helm.release.v1.vxflexos.v2 helm.sh/release.v1 1 134d
sh.helm.release.v1.vxflexos.v3 helm.sh/release.v1 1 134d
vxflexos-config Opaque 2 149d

@shanmydell
Collaborator

@suryagupta4 : Please look into the inputs provided above

@deshab
Author

deshab commented Dec 6, 2023

@suryagupta4 are you referring to K8s secrets or the PowerFlex server side itself? Could you please confirm this?

@suryagupta4

@deshab the secret YAML from which you created the secret vxflexos-config. A similar issue came in where the systemID was all zeros; it was resolved as part of this Git issue: #1020
Also, can you verify whether you are getting the correct systemID on the worker nodes using this command: /opt/emc/scaleio/sdc/bin/drv_cfg --query_mdm
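As a sketch of that check, the MDM-ID reported by drv_cfg should match the systemID configured in the driver secret; all zeros would mean the SDC never fetched a system ID from the array. The sample output below is taken from this thread, and the field positions assumed by awk are based on it:

```shell
# Sample output of: sudo /opt/emc/scaleio/sdc/bin/drv_cfg --query_mdm
drv_cfg_out='Retrieved 1 mdm(s)
MDM-ID 187e850d57b03e0f SDC ID 10129d110000000b INSTALLATION ID 1314313d3c64cea4 IPs [0]-198.19.60.28 [1]-198.19.56.28'

# systemID configured in the driver secret (vxflexos-config here).
secret_system_id='187e850d57b03e0f'

# Second whitespace-separated field of the MDM-ID line is the system ID.
sdc_system_id=$(printf '%s\n' "$drv_cfg_out" | awk '/^MDM-ID/ {print $2}')
if [ "$sdc_system_id" = "$secret_system_id" ]; then
  echo "OK: SDC reports system ID $sdc_system_id"
else
  echo "MISMATCH: SDC reports '$sdc_system_id', secret has '$secret_system_id'"
fi
```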

@suryagupta4

link: 18734

@deshab
Author

deshab commented Dec 6, 2023

The link for 18734 is missing. Everything looks good here.

Here is output:
sudo /opt/emc/scaleio/sdc/bin/drv_cfg --query_mdm
Retrieved 1 mdm(s)
MDM-ID 187e850d57b03e0f SDC ID 10129d110000000b INSTALLATION ID 1314313d3c64cea4 IPs [0]-198.19.60.28 [1]-198.19.56.28

MDM: "198.19.60.28,198.19.56.28"

- username: "k8suser"
  password: "redacted"
  systemID: "187e850d57b03e0f"
  endpoint: "https://redacted"
  skipCertificateValidation: true
  isDefault: true
  mdm: "198.19.60.28,198.19.56.28"

@suryagupta4

@deshab are you getting the same output on both of the worker nodes?

sudo /opt/emc/scaleio/sdc/bin/drv_cfg --query_mdm
Retrieved 1 mdm(s)
MDM-ID 187e850d57b03e0f SDC ID 10129d110000000b INSTALLATION ID 1314313d3c64cea4 IPs [0]-198.19.60.28 [1]-198.19.56.28

@deshab
Author

deshab commented Dec 6, 2023

Here is the output. This is from the bad node where the PVC is failing; the SDC ID is different:

sudo /opt/emc/scaleio/sdc/bin/drv_cfg --query_mdm
Retrieved 1 mdm(s)
MDM-ID 187e850d57b03e0f SDC ID 10129d0600000000 INSTALLATION ID 1314313d3c64cea4 IPs [0]-198.19.60.28 [1]-198.19.56.28

@suryagupta4

@deshab can you share the node pod logs for the driver container from both nodes?
Command: kubectl logs <node-pod-name> -n <namespace> -c driver
Also, share the output of /opt/emc/scaleio/sdc/bin/drv_cfg --query_version just to confirm the SDC version present on both worker nodes.

@deshab
Author

deshab commented Dec 6, 2023

Working Node: 89-14

sudo /opt/emc/scaleio/sdc/bin/drv_cfg --query_mdm
Retrieved 1 mdm(s)
MDM-ID 187e850d57b03e0f SDC ID 10129d0800000002 INSTALLATION ID 1314313d3c64cea4 IPs [0]-198.19.60.28 [1]-198.19.56.28

sudo /opt/emc/scaleio/sdc/bin/drv_cfg --query_version
DellEMC PowerFlex Version: R3_6.700.103

kubectl logs vxflexos-node-zdmp8 -n kob-vxflexos -c driver
time="2023-12-06T18:12:36Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REQ 118818: XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2023-12-06T18:12:36Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REP 118818: Capabilities=[rpc:<type:EXPAND_VOLUME > rpc:<type:SINGLE_NODE_MULTI_WRITER > ], XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"

Bad Node: 89-15

sudo /opt/emc/scaleio/sdc/bin/drv_cfg --query_mdm
Retrieved 1 mdm(s)
MDM-ID 187e850d57b03e0f SDC ID 10129d0600000000 INSTALLATION ID 1314313d3c64cea4 IPs [0]-198.19.60.28 [1]-198.19.56.28

sudo /opt/emc/scaleio/sdc/bin/drv_cfg --query_version
DellEMC PowerFlex Version: R3_6.700.103

kubectl logs vxflexos-node-pk92r -n kob-vxflexos -c driver
time="2023-11-09T05:31:16Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REQ 46692: XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2023-11-09T05:31:16Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REP 46692: Capabilities=[rpc:<type:EXPAND_VOLUME > rpc:<type:SINGLE_NODE_MULTI_WRITER > ], XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2023-11-09T05:32:39Z" level=info msg="/csi.v1.Node/NodeUnpublishVolume: REQ 46693: VolumeId=187e850d57b03e0f-eff0738e00000004, TargetPath=/var/lib/kubelet/pods/b9f06af4-e405-436e-b9ab-5f9d0541ad74/volumes/kubernetes.io~csi/k8s-62ed9ae573/mount, XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2023-11-09T05:32:39Z" level=info msg="NodeUnpublishVolume volumeID: eff0738e00000004"
time="2023-11-09T05:32:39Z" level=info msg="NodeUnpublishVolume systemID: 187e850d57b03e0f"
time="2023-11-09T05:32:39Z" level=info msg="Volume ID: 187e850d57b03e0f-eff0738e00000004 contains system ID: 187e850d57b03e0f. checkVolumesMap passed"
time="2023-11-09T05:32:39Z" level=info msg="Found matching SDC mapped volume &{187e850d57b03e0f eff0738e00000004 /dev/scinib}"
time="2023-11-09T05:32:39Z" level=info msg="Found private mount for device &service.Device{FullPath:"/dev/scinib", Name:"scinib", RealDev:"/dev/scinib"}, private mount path: /var/lib/kubelet/plugins/vxflexos.emc.dell.com/disks/187e850d57b03e0f-eff0738e00000004 ."
time="2023-11-09T05:32:39Z" level=info msg="Found target mount for device &service.Device{FullPath:"/dev/scinib", Name:"scinib", RealDev:"/dev/scinib"}, target mount path: /var/lib/kubelet/pods/b9f06af4-e405-436e-b9ab-5f9d0541ad74/volumes/kubernetes.io~csi/k8s-62ed9ae573/mount ."
time="2023-11-09T05:32:39Z" level=debug msg="Unmounting /var/lib/kubelet/pods/b9f06af4-e405-436e-b9ab-5f9d0541ad74/volumes/kubernetes.io~csi/k8s-62ed9ae573/mount" CSIRequestID=46693 device=/dev/scinib privTgt=/var/lib/kubelet/plugins/vxflexos.emc.dell.com/disks/187e850d57b03e0f-eff0738e00000004 target="/var/lib/kubelet/pods/b9f06af4-e405-436e-b9ab-5f9d0541ad74/volumes/kubernetes.io~csi/k8s-62ed9ae573/mount"
time="2023-11-09T05:32:39Z" level=info msg="unmount syscall" cmd=umount path="/var/lib/kubelet/pods/b9f06af4-e405-436e-b9ab-5f9d0541ad74/volumes/kubernetes.io~csi/k8s-62ed9ae573/mount"
time="2023-11-09T05:32:39Z" level=debug msg="Unmounting /var/lib/kubelet/plugins/vxflexos.emc.dell.com/disks/187e850d57b03e0f-eff0738e00000004" CSIRequestID=46693 device=/dev/scinib privTgt=/var/lib/kubelet/plugins/vxflexos.emc.dell.com/disks/187e850d57b03e0f-eff0738e00000004 target="/var/lib/kubelet/pods/b9f06af4-e405-436e-b9ab-5f9d0541ad74/volumes/kubernetes.io~csi/k8s-62ed9ae573/mount"
time="2023-11-09T05:32:39Z" level=info msg="unmount syscall" cmd=umount path=/var/lib/kubelet/plugins/vxflexos.emc.dell.com/disks/187e850d57b03e0f-eff0738e00000004
time="2023-11-09T05:32:39Z" level=debug msg="removing directory" directory=/var/lib/kubelet/plugins/vxflexos.emc.dell.com/disks/187e850d57b03e0f-eff0738e00000004
time="2023-11-09T05:32:39Z" level=info msg="/csi.v1.Node/NodeUnpublishVolume: REP 46693: XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2023-11-09T05:32:39Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REQ 46694: XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2023-11-09T05:32:39Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REP 46694: Capabilities=[rpc:<type:EXPAND_VOLUME > rpc:<type:SINGLE_NODE_MULTI_WRITER > ], XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"

@suryagupta4

@deshab it looks like the driver was installed a long time ago, and the log entries I want to see have rotated out. Can you please clean up the pods and PVCs, reinstall the driver, and share the logs again? Please include the output from these commands:

  • k logs vxflexos-node-dsvrs -n vxflexos -c driver | grep "connected" -A 5 -B 5
    This will tell us whether the correct system ID is fetched from the worker SDC. You can do this for both node pods and share a txt file, since the logs may be huge.

  • k describe nodes | grep Labels: -A 10 -B 10
    This will show us the existing labels/topology added to the nodes.

  • k describe csinodes
    This will show us the topologies added to the CSINodes after the driver is installed.

@gallacher gallacher added area/csi-powerflex Issue pertains to the CSI Driver for Dell EMC PowerFlex and removed needs-triage Issue requires triage. labels Dec 6, 2023
@gallacher gallacher added this to the v1.9.0 milestone Dec 6, 2023
@gallacher
Contributor

/sync

1 similar comment
@hoppea2
Collaborator

hoppea2 commented Dec 6, 2023

/sync

@hoppea2
Collaborator

hoppea2 commented Dec 6, 2023

link: 19485

@deshab
Author

deshab commented Dec 6, 2023

@suryagupta4 logs are not available for these commands
k logs vxflexos-node-zdmp8 -n kob-vxflexos -c driver | grep "connected" -A 5 -B 5
k logs vxflexos-node-pk92r -n kob-vxflexos -c driver | grep "connected" -A 5 -B 5

➜ k describe node wrk-89-14 | grep Labels: -A 10 -B 10
Name: wrk-89-14
Roles:
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
csi-vxflexos.dellemc.com/187e850d57b03e0f=csi-vxflexos.dellemc.com
kubernetes.io/arch=amd64
kubernetes.io/hostname=wrk-89-14
kubernetes.io/os=linux
route-reflector=
topology.kubernetes.io/zone=AZ2
Annotations: csi.volume.kubernetes.io/nodeid:
{"csi-vxflexos.dellemc.com":"4C4C4544-0036-4310-804C-C3C04F435733","csi.oneagent.dynatrace.com":"wrk-89-14"}
kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
⚡➜ k describe node wrk-89-15 | grep Labels: -A 10 -B 10
Name: wrk-89-15
Roles:
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
csi-vxflexos.dellemc.com/0000000000000000=csi-vxflexos.dellemc.com
csi-vxflexos.dellemc.com/187e850d57b03e0f=csi-vxflexos.dellemc.com
kubernetes.io/arch=amd64
kubernetes.io/hostname=wrk-89-15
kubernetes.io/os=linux
route-reflector=
topology.kubernetes.io/zone=AZ2
Annotations: csi.volume.kubernetes.io/nodeid:
{"csi-vxflexos.dellemc.com":"4C4C4544-0036-4310-804C-B7C04F435733","csi.oneagent.dynatrace.com":"wrk-89-15"}

@suryagupta4

@deshab since you didn't provide the output of kubectl describe csinodes, I assume you are seeing correct systemIDs and not all zeros in the topology keys. As for the label csi-vxflexos.dellemc.com/0000000000000000=csi-vxflexos.dellemc.com: it gets added when the SDC is unable to connect to PowerFlex and cannot fetch the correct system ID. In that case this label gets added to the worker node and remains there; the next installation of the driver will not remove it, it only adds the correct labels.
You can remove this label from the node and create the PVC and pod again.
Command: k label node <worker-node-name> csi-vxflexos.dellemc.com/0000000000000000-
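Before removing such a label, any stale keys can be picked out of a node's label string mechanically. A sketch (the 16-hex-digit ID length assumed by the sed pattern is based on the IDs seen in this thread):

```shell
# Print csi-vxflexos.dellemc.com topology keys whose ID part is all zeros,
# given a comma-separated label string as shown by
#   kubectl get node <name> --show-labels
stale_keys() {
  printf '%s' "$1" | tr ',' '\n' \
    | sed -n 's|^\(csi-vxflexos\.dellemc\.com/0\{16\}\)=.*|\1|p'
}

labels='csi-vxflexos.dellemc.com/0000000000000000=csi-vxflexos.dellemc.com,csi-vxflexos.dellemc.com/187e850d57b03e0f=csi-vxflexos.dellemc.com'
for key in $(stale_keys "$labels"); do
  # A trailing '-' on the key removes the label from the node.
  echo "kubectl label node <worker-node-name> ${key}-"
done
```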

@suryagupta4

suryagupta4 commented Dec 7, 2023

@bharathsreekanth any inputs here?

@suryagupta4 suryagupta4 self-assigned this Dec 7, 2023
@suryagupta4

Had a customer call on this. The SDC was reporting the correct system ID, but since the driver node pod was not re-deployed, the topology key was still reporting all zeros. Re-spinning the node pod added the correct topology key to the CSINode.
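Putting the thread's resolution together, the fix was two steps: remove the stale all-zero label, then re-spin the driver node pod on that worker so it re-registers its topology. A sketch that only prints the commands for review (the app=vxflexos-node pod label selector is an assumption; check the actual pod labels in your install):

```shell
# Emit the remediation commands for one worker node. Printed rather
# than executed so they can be reviewed first.
remediation_cmds() {
  node=$1; stale_key=$2; ns=$3
  printf 'kubectl label node %s %s-\n' "$node" "$stale_key"
  printf 'kubectl delete pod -n %s -l app=vxflexos-node --field-selector spec.nodeName=%s\n' \
    "$ns" "$node"
}

remediation_cmds wrk-89-15 csi-vxflexos.dellemc.com/0000000000000000 kob-vxflexos
```

Afterwards, kubectl describe csinodes should show only the correct topology key for the node.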
