trident-csi pods stuck in ContainerCreating after node reboots #585

Closed
gorantornqvist opened this issue Jun 11, 2021 · 10 comments
@gorantornqvist

Describe the bug
Multiple trident-csi pods are stuck in ContainerCreating after node reboots, with the following errors:
Generated from kubelet on node:
2 times in the last 3 minutes
(combined from similar events): Unable to attach or mount volumes: unmounted volumes=[trident-csi-token-z59nn], unattached volumes=[pods-mount-dir dev-dir host-dir trident-tracking-dir plugin-dir sys-dir certs trident-csi-token-z59nn plugins-mount-dir registration-dir]: timed out waiting for the condition

Generated from kubelet on node
19 times in the last 23 minutes
MountVolume.SetUp failed for volume "trident-csi-token-z59nn" : secret "trident-csi-token-z59nn" not found

If I delete the pod, a new trident-csi pod is created and starts OK, but without manual intervention the original pod hangs forever and other pods on that node that use Trident persistent storage fail to start.

I also noted that when it hangs, the pod references a secret trident-csi-token-z59nn that doesn't exist; after I manually delete the pod and it starts up, the new pod references another secret that actually exists.
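
Roughly, the check and the manual workaround look like this (a sketch; the namespace and pod name are placeholders):

# Confirm that the token secret the stuck pod references is gone
kubectl -n <trident namespace> get secret trident-csi-token-z59nn

# Delete the stuck pod so it is recreated with the current token secret
kubectl -n <trident namespace> delete pod <stuck_trident-csi_pod>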

Environment

  • Trident version: 21.04.0
  • Trident installation flags used: default install using helm
  • Kubernetes version: v1.20.0+df9c838
  • Kubernetes orchestrator: Openshift 4.7.13
  • NetApp backend types: ONTAP-NAS

To Reproduce
Steps to reproduce the behavior:
Reboot node

Expected behavior
The trident-csi pod should start using the correct secret.


@rohit-arora-dev
Contributor

rohit-arora-dev commented Jun 11, 2021

Hello @gorantornqvist

Thanks for reporting this issue. To give you some background: the token secret trident-csi-token-z59nn is created when Trident creates a service account named trident-csi. The Trident deployment and daemonset pods use the service account token for API authentication.
The behaviour in Kubernetes is that if a service account is re-created, the corresponding token is refreshed, but pods still using the old token are not automatically updated. To handle this, Trident automatically re-creates its deployment and daemonset pods when the service account is recreated.

In your case, I am trying to understand:

  1. Was the service account trident-csi re-created? (See the sketch after this list for one way to check.)
    a. If yes, was it before, during, or after the node reboot?
    b. If not, can you consistently reproduce the behaviour, and does it involve just rebooting the Kubernetes node?
  2. The Trident operator logs may also give some insight; sharing them here, on Slack, or via a support case would help as well. Use kubectl -n <trident installation namespace> logs <trident_operator_pod>.
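
For question 1, one rough way to check (a sketch; the namespace and pod name are placeholders):

# creationTimestamp shows whether the service account was re-created;
# .secrets lists the token secret it currently references
kubectl -n <trident installation namespace> get serviceaccount trident-csi -o yaml

# Shows which token secret the stuck pod is still trying to mount
kubectl -n <trident installation namespace> get pod <stuck_trident-csi_pod> -o yaml | grep -A1 secretName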

Please let us know.

Thank you!

@megabreit

Reincarnation of #444 ?

@gnarl
Contributor

gnarl commented Jun 14, 2021

@gorantornqvist, can you provide more information based on @ntap-arorar's comments?

@gorantornqvist
Author

Hi,
Nothing was really done with the trident configuration before this.
I actually encountered the same issue on 2 different clusters, but after restarting the pods the issue was resolved.
I tried restarting each node in these 2 clusters and the issue didn't occur again - it can't be reproduced on demand.

So I guess this could be hard to troubleshoot.

I am OK with closing this, and if it occurs again I will gather all logs from the Trident operator ...

@gnarl
Contributor

gnarl commented Jun 21, 2021

@gorantornqvist, thanks for the feedback. We will reopen this issue if you encounter the problem again.

@gnarl gnarl closed this as completed Jun 21, 2021
@gorantornqvist
Author

Hi,
We encountered this issue again today when updating 2 different openshift clusters.
If I deleted a trident-csi pod, it started working (no need for an operator pod restart).

Attaching operator logs from one of the clusters

trident-operator-86c5b968cb-gz6p9.log

@gnarl gnarl reopened this Jul 20, 2021
@gnarl
Contributor

gnarl commented Oct 22, 2021

@gorantornqvist, we looked at the provided logs and it seems that the cluster was already in a bad state. Our team hasn't been able to reproduce this issue yet. Please let us know if you are still concerned about it.

@gnarl
Contributor

gnarl commented Feb 21, 2023

@gorantornqvist, were you able to resolve your issue?

@gorantornqvist
Author

We haven't encountered this problem again, so this issue can be closed :)

@gnarl
Contributor

gnarl commented Feb 22, 2023

Thanks for the update!
