
Failed to get CSI client: driver name secrets-store.csi.k8s.io not found in the list of registered CSI drivers #798

Closed
luckerby opened this issue Feb 15, 2022 · 10 comments
Labels: bug, stale

Comments

@luckerby

What steps did you take and what happened:
We are having issues occasionally - once every few weeks - whereby pods on our single Windows node in an AKS cluster fail to start with MountVolume.SetUp failed for volume "secrets-store-inline" : kubernetes.io/csi: mounter.SetUpAt failed to get CSI client: driver name secrets-store.csi.k8s.io not found in the list of registered CSI drivers.

What did you expect to happen:

Anything else you would like to add:
Following the troubleshooting guide, the secrets store driver is successfully registered:

C:\Users\mihai.albert>kubectl get csidriver
NAME                       ATTACHREQUIRED   PODINFOONMOUNT   STORAGECAPACITY   TOKENREQUESTS   REQUIRESREPUBLISH   MODES                  AGE
disk.csi.azure.com         true             false            false             <unset>         false               Persistent             125d
file.csi.azure.com         false            true             false             <unset>         false               Persistent,Ephemeral   125d
secrets-store.csi.k8s.io   false            true             false             <unset>         false               Ephemeral              177d
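Note that kubectl get csidriver only lists the cluster-wide CSIDriver objects; the "list of registered CSI drivers" in the error message refers to what the kubelet has registered on the node itself. A rough proxy for checking that per-node list (node name taken from the events further down) would be something like:

# Lists the CSI drivers the kubelet has registered on this specific node;
# secrets-store.csi.k8s.io should appear here once registration has worked.
kubectl get csinode akswinapp000004 -o jsonpath="{.spec.drivers[*].name}"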

Logs for the 3 containers inside the CSI driver pod are below.

C:\Users\mihai.albert>kubectl logs -n csi secrets-store-csi-driver-windows-xqpbr -c node-driver-registrar
I0214 22:14:48.843834   13304 main.go:164] Version: v2.3.0-0-gde03348b
I0214 22:14:52.463764   13304 main.go:165] Running node-driver-registrar in mode=registration
I0214 22:14:52.463764   13304 main.go:189] Attempting to open a gRPC connection with: "unix://C:\\csi\\csi.sock"
I0214 22:14:52.463764   13304 connection.go:154] Connecting to unix://C:\csi\csi.sock
I0214 22:14:53.585020   13304 main.go:196] Calling CSI driver to discover driver name
I0214 22:14:53.585020   13304 connection.go:183] GRPC call: /csi.v1.Identity/GetPluginInfo
I0214 22:14:53.585020   13304 connection.go:184] GRPC request: {}
I0214 22:14:53.632158   13304 connection.go:186] GRPC response: {"name":"secrets-store.csi.k8s.io","vendor_version":"v0.2.0"}
I0214 22:14:53.632158   13304 connection.go:187] GRPC error: <nil>
I0214 22:14:53.632158   13304 main.go:206] CSI driver name: "secrets-store.csi.k8s.io"
I0214 22:14:53.632158   13304 node_register.go:52] Starting Registration Server at: /registration/secrets-store.csi.k8s.io-reg.sock
I0214 22:14:53.637605   13304 node_register.go:61] Registration Server started at: /registration/secrets-store.csi.k8s.io-reg.sock
I0214 22:14:54.195782   13304 node_register.go:91] Skipping healthz server because HTTP endpoint is set to: ""
C:\Users\mihai.albert>kubectl logs -n csi secrets-store-csi-driver-windows-xqpbr -c secrets-store
I0214 22:14:14.334801    7960 main.go:99] Filtered watch for nodePublishSecretRef secret based on secrets-store.csi.k8s.io/used=true label enabled
I0214 22:14:15.295312    7960 exporter.go:33] metrics backend: prometheus
I0214 22:14:18.889502    7960 main.go:172] starting manager
I0214 22:14:19.421932    7960 secrets-store.go:74] Driver: secrets-store.csi.k8s.io
I0214 22:14:19.421932    7960 secrets-store.go:75] Version: v0.2.0, BuildTime: 2021-08-12-19:06
I0214 22:14:19.421932    7960 secrets-store.go:76] Provider Volume Path: C:\k\secrets-store-csi-providers
I0214 22:14:19.421932    7960 secrets-store.go:77] GRPC supported providers will be dynamically created
I0214 22:14:19.443409    7960 driver.go:80] "Enabling controller service capability" capability="CREATE_DELETE_VOLUME"
I0214 22:14:19.443409    7960 driver.go:90] "Enabling volume access mode" mode="SINGLE_NODE_READER_ONLY"
I0214 22:14:19.443409    7960 driver.go:90] "Enabling volume access mode" mode="MULTI_NODE_READER_ONLY"
I0214 22:14:19.530648    7960 reconciler.go:131] starting rotation reconciler with poll interval: 2m0s
I0214 22:14:19.764279    7960 server.go:111] Listening for connections on address: C:\\csi\\csi.sock
I0214 22:14:26.148721    7960 request.go:668] Waited for 1.0556426s due to client-side throttling, not priority and fairness, request: GET:https://10.176.192.1:443/apis/templates.gatekeeper.sh/v1alpha1?timeout=32s
I0214 22:14:27.532110    7960 secretproviderclasspodstatus_controller.go:227] "reconcile started" spcps="<pod-name>"
<multiple reconcile messages show up here>
.................
C:\Users\mihai.albert>kubectl logs -n csi secrets-store-csi-driver-windows-xqpbr -c liveness-probe
E0214 22:14:26.306661    8212 processstarttime.go:43] Could not get process start time, could not read "/proc": CreateFile /proc: The system cannot find the file specified.
E0214 22:14:28.136104    8212 processstarttime.go:43] Could not get process start time, could not read "/proc": CreateFile /proc: The system cannot find the file specified.
I0214 22:14:28.966210    8212 main.go:149] calling CSI driver to discover driver name
I0214 22:14:28.974609    8212 main.go:155] CSI driver name: "secrets-store.csi.k8s.io"
I0214 22:14:28.974609    8212 main.go:183] ServeMux listening at "0.0.0.0:9808"

Logs for the Azure Key Vault provider pod:

C:\Users\mihai.albert>kubectl logs -n csi key-vault-provider-csi-secrets-store-provider-azure-window6pr7d
I0214 22:13:54.825807    5900 main.go:61] Starting Azure Key Vault Provider version: v0.2.0
I0214 22:13:59.012249    5900 main.go:72] construct pem chain feature enabled
I0214 22:13:59.012249    5900 main.go:75] secrets will be written to filesystem by the CSI driver
I0214 22:13:59.012249    5900 main.go:109] Listening for connections on address: C:\\provider\\azure.sock

Events for one of our pods that uses secrets from an Azure Key Vault are below. When the issue occurs, all the pods that need to be scheduled on Windows and need secrets injected from Azure Key Vault end up in the same state.

C:\Users\mihai.albert>kubectl get events -n dev --field-selector involvedObject.name=<redacted-pod-name>
LAST SEEN   TYPE      REASON        OBJECT                                 MESSAGE
49m         Normal    Scheduled     pod/<redacted-pod-name>   Successfully assigned dev/<redacted-pod-name> to akswinapp000004
18m         Warning   FailedMount   pod/<redacted-pod-name>   MountVolume.SetUp failed for volume "secrets-store-inline" : kubernetes.io/csi: mounter.SetUpAt failed to get CSI client: driver name secrets-store.csi.k8s.io not found in the list of registered CSI drivers
4m22s       Warning   FailedMount   pod/<redacted-pod-name>   Unable to attach or mount volumes: unmounted volumes=[secrets-store-inline], unattached volumes=[secrets-store-inline kube-api-access-4pfdp]: timed out waiting for the condition
40m         Warning   FailedMount   pod/<redacted-pod-name>   Unable to attach or mount volumes: unmounted volumes=[secrets-store-inline], unattached volumes=[kube-api-access-4pfdp secrets-store-inline]: timed out waiting for the condition

I did look into this and this issue but the problems there seem different.

Which access mode did you use to access the Azure Key Vault instance:
[e.g. Service Principal, Pod Identity, User Assigned Managed Identity, System Assigned Managed Identity]
Using the instructions here, we can see the cluster is using a System Assigned Managed Identity:

C:\Users\mihai.albert>az aks show -g <resource-group> -n <cluster-name> --query "servicePrincipalProfile"
{
  "clientId": "msi"
}

C:\Users\mihai.albert>az aks show -g <resource-group> -n <cluster-name> --query "identity"
{
  "principalId": "dfcd54e3-5dd2-4651-9d56-31c8ff2e3a8a",
  "tenantId": "bce02b33-298d-4c84-8712-e25caaf73bda",
  "type": "SystemAssigned",
  "userAssignedIdentities": null
}

Environment:

  • Secrets Store CSI Driver version (image tag): mcr.microsoft.com/oss/kubernetes-csi/csi-node-driver-registrar:v2.3.0
  • Azure Key Vault provider version (image tag): mcr.microsoft.com/oss/azure/secrets-store/provider-azure:v0.2.0
  • Kubernetes version: 1.21.2
  • Cluster type: AKS
luckerby added the bug label on Feb 15, 2022
@luckerby (Author)

Deleting the secrets store CSI driver pod on its own doesn't have any impact on the problem. However, deploying the Helm chart and then deleting the secrets store CSI driver pod - and waiting for a new one to be automatically created - was found to be an effective temporary workaround.
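Roughly, the workaround looks like this (the release name, namespace and the windows.enabled flag are assumptions based on our install; adjust to yours):

# Re-deploy the chart, then delete the stuck Windows driver pod and let the
# DaemonSet recreate it (pod name is whatever the current one happens to be):
helm upgrade --install secrets-store-csi-driver secrets-store-csi-driver/secrets-store-csi-driver --namespace csi --set windows.enabled=true
kubectl delete pod -n csi secrets-store-csi-driver-windows-xqpbr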

The logs for the node-driver-registrar container change slightly once things start working again: there's extra text at the end referring to a "Kubelet registration probe":

C:\Users\mihai.albert>kubectl logs -n csi secrets-store-csi-driver-windows-p8lbk -c node-driver-registrar
I0215 05:11:00.213354   11600 main.go:164] Version: v2.3.0-0-gde03348b
I0215 05:11:05.455237   11600 main.go:165] Running node-driver-registrar in mode=registration
I0215 05:11:05.455237   11600 main.go:189] Attempting to open a gRPC connection with: "unix://C:\\csi\\csi.sock"
I0215 05:11:05.455237   11600 connection.go:154] Connecting to unix://C:\csi\csi.sock
I0215 05:11:07.765693   11600 main.go:196] Calling CSI driver to discover driver name
I0215 05:11:07.765693   11600 connection.go:183] GRPC call: /csi.v1.Identity/GetPluginInfo
I0215 05:11:07.765693   11600 connection.go:184] GRPC request: {}
I0215 05:11:08.330472   11600 connection.go:186] GRPC response: {"name":"secrets-store.csi.k8s.io","vendor_version":"v0.2.0"}
I0215 05:11:08.330472   11600 connection.go:187] GRPC error: <nil>
I0215 05:11:08.330472   11600 main.go:206] CSI driver name: "secrets-store.csi.k8s.io"
I0215 05:11:08.330472   11600 node_register.go:52] Starting Registration Server at: /registration/secrets-store.csi.k8s.io-reg.sock
I0215 05:11:08.332130   11600 node_register.go:61] Registration Server started at: /registration/secrets-store.csi.k8s.io-reg.sock
I0215 05:11:08.447802   11600 node_register.go:91] Skipping healthz server because HTTP endpoint is set to: ""
I0215 05:11:10.836387   11600 main.go:100] Received GetInfo call: &InfoRequest{}
I0215 05:11:10.838386   11600 main.go:107] "Kubelet registration probe created" path="C:\\var\\lib\\kubelet\\plugins\\csi-secrets-store\\registration"
I0215 05:11:11.689962   11600 main.go:100] Received GetInfo call: &InfoRequest{}
I0215 05:11:11.689962   11600 main.go:107] "Kubelet registration probe created" path="C:\\var\\lib\\kubelet\\plugins\\csi-secrets-store\\registration"
I0215 05:11:11.718140   11600 main.go:118] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}

@nilekhc (Contributor) commented Feb 15, 2022

Hey @luckerby, we are tracking issue #639 for Windows nodes; basically, the node-driver-registrar on Windows nodes is not working as expected.

Meanwhile, could you do a couple of things:

  • Since you are on AKS, are you using the managed AKS add-on? It's the preferred way of running the Secrets Store CSI driver on an AKS cluster.
  • If not, could you upgrade the provider to the stable v1.0.0 release? A sketch of both options is below.
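For example, something along these lines (resource group, cluster and chart release names are placeholders):

# Option 1: the managed AKS add-on
az aks enable-addons --addons azure-keyvault-secrets-provider -g <resource-group> -n <cluster-name>

# Option 2: stay self-managed and upgrade the Azure provider chart
# (release name inferred from the provider pod name above; pick the chart
# version that ships provider v1.0.0)
helm repo add csi-secrets-store-provider-azure https://azure.github.io/secrets-store-csi-driver-provider-azure/charts
helm repo update
helm upgrade -n csi key-vault-provider csi-secrets-store-provider-azure/csi-secrets-store-provider-azure --version <chart-version>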

@github-actions (bot) commented Mar 2, 2022

This issue is stale because it has been open 14 days with no activity. Please comment or this will be closed in 7 days.

github-actions bot added the stale label on Mar 2, 2022
@luckerby (Author) commented Mar 7, 2022

@nilekhc We've looked at using the managed add-on, but in terms of actual code it appears to be the same thing we're running now.

@nilekhc (Contributor) commented Mar 7, 2022

Sure, in terms of code they are the same. But with the managed add-on you get additional benefits like Microsoft support, use of the latest version, etc.

Nonetheless, did you try running with the latest version of the driver?

Also, kubernetes/kubernetes#104584, which relates to the node-driver-registrar on Windows, is still being investigated in upstream Kubernetes.

@luckerby (Author) commented Mar 7, 2022

@nilekhc As for #639, the liveness probe workaround referenced there isn't working for us - as in there's nothing restarting the node-driver-registrar once it "stalls".
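A quick way to confirm whether a liveness probe is even defined on the registrar container (DaemonSet name assumed from the pod names above; empty output means no probe):

kubectl get ds -n csi secrets-store-csi-driver-windows -o jsonpath='{.spec.template.spec.containers[?(@.name=="node-driver-registrar")].livenessProbe}'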

As for upgrading, I've noticed that v1.1.0 of the provider has been available for a few days now. But given that the underlying kubernetes/kubernetes#104584 is with the Kubernetes repo - as you very well pointed out - wouldn't we have to wait for that to be fixed first? Or is there some workaround that made it into v1+ of the driver that would bypass that problem?

@nilekhc (Contributor) commented Mar 7, 2022

No. For now there is no workaround other than the liveness probe.

github-actions bot removed the stale label on Mar 8, 2022
@github-actions (bot)

This issue is stale because it has been open 14 days with no activity. Please comment or this will be closed in 7 days.

github-actions bot added the stale label on Mar 22, 2022
@luckerby (Author)

It turns out that the liveness probe wasn't defined because we were using the wrong Helm chart version, as per kubernetes-sigs/secrets-store-csi-driver#911.
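For anyone else hitting this, comparing the deployed chart version against the published ones is a quick sanity check (release name and namespace assumed from our install):

helm list -n csi
helm search repo secrets-store-csi-driver --versions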

I'll close this as it doesn't add any value at this point.

@PradeepKrSi

I faced a similar issue on Linux nodes, but it resolved after using the workaround suggested by luckerby in #798 (comment). I think the code should unregister the old driver and register the new one when a new version of the CSI driver is deployed.

Warning FailedMount 13s (x6 over 28s) kubelet MountVolume.SetUp failed for volume "secrets-store-inline01" : kubernetes.io/csi: mounter.SetUpAt failed to get CSI client: driver name secrets-store.csi.k8s.io not found in the list of registered CSI drivers
