[BUG]: PowerFlex driver fails to start on RKE #1086

Closed
suryagupta4 opened this issue Dec 20, 2023 · 6 comments

suryagupta4 commented Dec 20, 2023

Bug Description

When the PowerFlex driver is deployed in an environment where the host names differ from the Kubernetes node names, the registrar sidecar in the driver node pod crashes with the error "Unable to fetch the node labels. Error: nodes "ip-10-x-x-x.ec2.internal" not found" and, after multiple restarts, eventually fails with "401 Unauthorized".

Logs

I1211 13:30:09.455665       1 main.go:167] Version: v2.8.0
I1211 13:30:09.455763       1 main.go:168] Running node-driver-registrar in mode=registration
I1211 13:30:09.456406       1 main.go:192] Attempting to open a gRPC connection with: "/csi/csi_sock"
I1211 13:30:09.456464       1 connection.go:164] Connecting to unix:///csi/csi_sock
I1211 13:30:09.457431       1 main.go:199] Calling CSI driver to discover driver name
I1211 13:30:09.457455       1 connection.go:193] GRPC call: /csi.v1.Identity/GetPluginInfo
I1211 13:30:09.457461       1 connection.go:194] GRPC request: {}
I1211 13:30:09.468532       1 connection.go:200] GRPC response: {"manifest":{"commit":"4a8a0ab90c4d56ca0ed00a8dd0aabdd7b526423c","formed":"Wed, 20 Sep 2023 06:15:43 UTC","semver":"2.8.0+dirty","url":"http://github.com/dell/csi-vxflexos"},"name":"csi-vxflexos.dellemc.com","vendor_version":"2.8.0+dirty"}
I1211 13:30:09.468549       1 connection.go:201] GRPC error: <nil>
I1211 13:30:09.468561       1 main.go:209] CSI driver name: "csi-vxflexos.dellemc.com"
I1211 13:30:09.468642       1 node_register.go:53] Starting Registration Server at: /registration/csi-vxflexos.dellemc.com-reg.sock
I1211 13:30:09.468798       1 node_register.go:62] Registration Server started at: /registration/csi-vxflexos.dellemc.com-reg.sock
I1211 13:30:09.468990       1 node_register.go:92] Skipping HTTP server because endpoint is set to: ""
I1211 13:30:11.346796       1 main.go:102] Received GetInfo call: &InfoRequest{}
I1211 13:30:11.347079       1 main.go:109] "Kubelet registration probe created" path="/var/lib/kubelet/plugins/vxflexos.emc.dell.com/registration"
I1211 13:30:11.371780       1 main.go:121] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: rpc error: code = Unknown desc = 401 Unauthorized,}
E1211 13:30:11.371805       1 main.go:123] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: rpc error: code = Unknown desc = 401 Unauthorized, restarting registration container.

Screenshots

No response

Additional Environment Information

No response

Steps to Reproduce

Deploy the PowerFlex driver v2.8.0 in an environment where the host names differ from the Kubernetes node names.

Expected Behavior

The driver node pod should be in the Running state.

CSM Driver(s)

CSI Driver for PowerFlex v2.8.0

Installation Type

No response

Container Storage Modules Enabled

No response

Container Orchestrator

RKE

Operating System

RHEL

suryagupta4 added the type/bug (Something isn't working. This is the default label associated with a bug issue.) and area/csi-powerflex (Issue pertains to the CSI Driver for Dell EMC PowerFlex) labels Dec 20, 2023
suryagupta4 added this to the v1.10.0 milestone Dec 20, 2023
suryagupta4 self-assigned this Dec 20, 2023
@suryagupta4
Author

link: 19702

@thecloudgarage

I am still having this issue. I had no problems with v2.7.0; however, the issue appears with 2.9.1 and 2.9.2.

NOTE: I have no problems when I deploy the CSI with 2.7.0. It works flawlessly!

kubectl get pods -n vxflexos
NAME                                  READY   STATUS             RESTARTS         AGE
vxflexos-controller-c9999ff98-5t8ff   5/5     Running            0                27m
vxflexos-controller-c9999ff98-xgnkt   5/5     Running            0                27m
vxflexos-node-2mcpz                   1/2     CrashLoopBackOff   10 (32s ago)     27m
vxflexos-node-zgvw8                   1/2     Error              10 (5m28s ago)   27m

LAST LINE OF THE ERROR LOG SAYS UNABLE TO FETCH NODE LABELS

time="2024-03-02T05:22:40Z" level=info msg="/csi.v1.Node/NodeGetInfo: REQ 0006: XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2024-03-02T05:22:40Z" level=info msg="Probing all arrays. Number of arrays: 1"
time="2024-03-02T05:22:40Z" level=info msg="default array is set to array ID: d7f6c6427c56ab0f"
time="2024-03-02T05:22:40Z" level=info msg="d7f6c6427c56ab0f is the default array, skipping VolumePrefixToSystems map update. \n"
time="2024-03-02T05:22:40Z" level=info msg="array d7f6c6427c56ab0f probed successfully"
time="2024-03-02T05:22:40Z" level=info msg="configured d7f6c6427c56ab0f" allSystemNames= endpoint="https://10.204.111.71" isDefault=true nasName=0xc00028b570 password="********" skipCertificateValidation=true systemID=d7f6c6427c56ab0f user=admin
time="2024-03-02T05:22:40Z" level=info msg="/csi.v1.Node/NodeGetInfo: REP 0006: rpc error: code = Internal desc = Unable to fetch the node labels. Error: nodes \"ip-172-26-2-141\" not found"

I see in the thread that a particular environment variable, X_CSI_POWERFLEX_KUBE_NODE_NAME, has been set as a fix. I am not sure where an end user is supposed to supply this value. I do not see this setting anywhere in the values template documentation: https://dell.github.io/csm-docs/docs/csidriver/installation/helm/powerflex/

Kindly suggest what should be done.
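
As a rough illustration of what an override such as X_CSI_POWERFLEX_KUBE_NODE_NAME implies, the sketch below resolves the node name from that environment variable and falls back to the OS host name when it is unset. This is not the driver's code (the driver itself is written in Go); the function name resolve_node_name is hypothetical, and this thread does not state how the variable is populated in the pod spec.

```python
import os
import socket


def resolve_node_name(environ=None):
    """Return the name to use when looking up the Kubernetes node object.

    Hypothetical helper: prefers X_CSI_POWERFLEX_KUBE_NODE_NAME (the variable
    named in this thread) and falls back to the OS host name.
    """
    env = os.environ if environ is None else environ
    override = env.get("X_CSI_POWERFLEX_KUBE_NODE_NAME", "").strip()
    if override:
        return override
    # Fallback: the host name, which may differ from the Kubernetes node name.
    return socket.gethostname()


if __name__ == "__main__":
    print(resolve_node_name())
```

With the variable set to the registered node name, a lookup would use that name instead of the host name, which is the situation the "not found" error points to.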

@thecloudgarage

Also adding the kubectl get nodes output
kubectl get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-172-26-2-141.eu-west-1.compute.internal   Ready    <none>   28h   v1.27.6
ip-172-26-2-42.eu-west-1.compute.internal    Ready    <none>   28h   v1.27.6
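
The mismatch is visible when the name in the error log is compared with the node list above: the lookup used "ip-172-26-2-141", while the node is registered under its fully qualified name. A small diagnostic sketch of that comparison follows; the helper name match_node is made up for illustration and is not part of any Dell or Kubernetes tooling.

```python
def match_node(lookup_name, node_names):
    """Classify how lookup_name relates to the registered node names.

    Returns ("exact", name) on an exact match, ("short-name", name) when
    lookup_name equals the first DNS label of a node name, or ("none", None).
    """
    if lookup_name in node_names:
        return ("exact", lookup_name)
    for name in node_names:
        if name.split(".", 1)[0] == lookup_name:
            return ("short-name", name)
    return ("none", None)


# Node names copied from the kubectl output in this comment.
nodes = [
    "ip-172-26-2-141.eu-west-1.compute.internal",
    "ip-172-26-2-42.eu-west-1.compute.internal",
]
result = match_node("ip-172-26-2-141", nodes)
print(result)  # ('short-name', 'ip-172-26-2-141.eu-west-1.compute.internal')
```

A "short-name" result means an exact lookup by host name finds no node object, which is consistent with the "not found" error in the log.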

@adarsh-dell
Contributor

Hi @thecloudgarage ,

According to the CSM 1.10 milestone associated with this issue, the resolution will be implemented in the v2.10.0 driver and subsequent releases of the csi-powerflex driver. Appreciate your comprehensive insights.

Regards,
Adarsh

@wallnerryan

I am running into this as well. I also had no issue with 2.7.0. Is there a workaround to make this work? 2.7.0 does not seem to be an option within https://github.com/dell/csm-operator/tree/v1.5.0/samples

@suryagupta4
Author

> I am running into this as well. I also had no issue with 2.7.0. Is there a workaround to make this work? 2.7.0 does not seem to be an option within https://github.com/dell/csm-operator/tree/v1.5.0/samples

Hi, there is no workaround for this; please switch to the v2.10.0 driver. Thanks.
