trident-csi pods stuck in ContainerCreating after node reboots #585
Comments
Hello @gorantornqvist Thanks for reporting this issue. To give you some background, the secret token In your case, I am trying to understand:
Please let us know. Thank you!
Reincarnation of #444?
@gorantornqvist, can you provide more information based on @ntap-arorar's comments?
Hi, so I guess this could be hard to troubleshoot. I am OK with closing this, and if it occurs again I will gather all logs from the trident operator ...
@gorantornqvist, thanks for the feedback. We will reopen this issue if you encounter the problem again.
Hi, attaching operator logs from one of the clusters.
@gorantornqvist, we looked at the provided logs and it seems that the cluster was already in a bad state. Our team hasn't been able to reproduce this issue yet. Please let us know if you are still concerned about this issue.
@gorantornqvist, were you able to resolve your issue?
We haven't encountered this problem again, so this issue can be closed :)
Thanks for the update!
Describe the bug
Multiple trident-csi pods are stuck in ContainerCreating after node reboots, with the error:
Generated from kubelet on node:
2 times in the last 3 minutes
(combined from similar events): Unable to attach or mount volumes: unmounted volumes=[trident-csi-token-z59nn], unattached volumes=[pods-mount-dir dev-dir host-dir trident-tracking-dir plugin-dir sys-dir certs trident-csi-token-z59nn plugins-mount-dir registration-dir]: timed out waiting for the condition
Generated from kubelet on node
19 times in the last 23 minutes
MountVolume.SetUp failed for volume "trident-csi-token-z59nn" : secret "trident-csi-token-z59nn" not found
If I delete the pod, a new trident-csi pod is created and starts OK, but without manual intervention the original pod hangs forever and other pods on that node that use Trident persistent storage fail to start.
I also noted that when the pod hangs, it references a secret, trident-csi-token-z59nn, that doesn't exist. After I manually delete the pod, the replacement pod starts up and references another secret that actually exists.
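To spot affected pods without inspecting each one by hand, the check described above (a pod whose secret volume names a secret that no longer exists) could be sketched roughly as follows. This is only an illustration, not part of Trident: it assumes the pod objects have the shape returned by `kubectl get pods -o json` (secret volumes under `spec.volumes[].secret.secretName`), and the function name, pod names, and the "existing" secret name are hypothetical. Only `trident-csi-token-z59nn` comes from the report.

```python
from typing import Dict, Iterable, List, Set


def pods_with_missing_secrets(
    pods: Iterable[dict], existing_secrets: Set[str]
) -> Dict[str, List[str]]:
    """Map each pod name to the secret names its volumes reference
    that are absent from existing_secrets."""
    stuck: Dict[str, List[str]] = {}
    for pod in pods:
        name = pod.get("metadata", {}).get("name", "<unnamed>")
        missing = []
        for volume in pod.get("spec", {}).get("volumes", []):
            # Only secret-backed volumes carry a "secret" entry.
            secret_name = (volume.get("secret") or {}).get("secretName")
            if secret_name and secret_name not in existing_secrets:
                missing.append(secret_name)
        if missing:
            stuck[name] = missing
    return stuck


# Hypothetical sample data: one pod still points at the deleted token
# secret from the report, the other at a secret that exists.
sample_pods = [
    {
        "metadata": {"name": "trident-csi-old"},
        "spec": {"volumes": [
            {"name": "certs", "emptyDir": {}},
            {"name": "token", "secret": {"secretName": "trident-csi-token-z59nn"}},
        ]},
    },
    {
        "metadata": {"name": "trident-csi-new"},
        "spec": {"volumes": [
            {"name": "token", "secret": {"secretName": "trident-csi-token-example"}},
        ]},
    },
]

print(pods_with_missing_secrets(sample_pods, {"trident-csi-token-example"}))
# prints {'trident-csi-old': ['trident-csi-token-z59nn']}
```

Pods reported by such a check would be candidates for the manual deletion workaround described above. The sketch does not explain why the stale reference arises in the first place.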
Environment
OpenShift 4.7.13
To Reproduce
Steps to reproduce the behavior:
Reboot node
Expected behavior
The trident-csi pod should start using the correct secret.