Describe the bug
After updating the tridentprovisioner to add silenceAutosupport: true, the configured trident-csi-token secret went missing. It's unclear how this happened. The trident-csi pods kept running but complained about the missing secret:
2m6s Warning FailedMount pod/trident-csi-74d6n MountVolume.SetUp failed for volume "trident-csi-token-qkdm2" : secret "trident-csi-token-qkdm2" not found
The token was no longer present, but two other tokens were, one of them unused.
It turned out that trident-csi-token-29c5f was the "correct" one: it was picked up once the pods were killed later.
To Reproduce
Unsure. Possible causes: deletion of the secret or of the deployment. See the Slack discussion.
Expected behavior
In general: The operator should be able to handle and correct such events.
Suggested in Slack:
The operator should re-create the daemonset and deployment pods if the service account is re-created.
Alternatively, in each reconcile loop the operator should verify that the secrets mounted in the daemonset and deployment pods are correct and match the service account's secrets.
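The second suggestion amounts to a set comparison: collect the token secrets currently attached to the service account, then flag any pod that mounts a token secret not in that set. The Go sketch below shows only that comparison. The ServiceAccount and Pod structs are hypothetical, simplified stand-ins (a real operator would read corev1.ServiceAccount and corev1.Pod objects through the Kubernetes API), the "trident-csi-token-" prefix is an assumption based on the secret names in this report, and the pod name trident-csi-abcde is made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Hypothetical, simplified models; a real operator would use the Kubernetes API types.
type ServiceAccount struct {
	Name    string
	Secrets []string // names of the token secrets attached to the account
}

type Pod struct {
	Name          string
	SecretVolumes []string // secret names mounted as volumes in the pod spec
}

// stalePods returns the names of pods that mount a token secret (matched by
// tokenPrefix, an assumed naming convention) that is no longer attached to the
// service account, e.g. because the account was deleted and re-created.
func stalePods(sa ServiceAccount, pods []Pod, tokenPrefix string) []string {
	valid := make(map[string]bool, len(sa.Secrets))
	for _, s := range sa.Secrets {
		valid[s] = true
	}
	var stale []string
	for _, p := range pods {
		for _, v := range p.SecretVolumes {
			// Only this account's token secrets are checked; other secrets are ignored.
			if strings.HasPrefix(v, tokenPrefix) && !valid[v] {
				stale = append(stale, p.Name)
				break
			}
		}
	}
	sort.Strings(stale) // deterministic order for logging or deletion
	return stale
}

func main() {
	sa := ServiceAccount{Name: "trident-csi", Secrets: []string{"trident-csi-token-29c5f"}}
	pods := []Pod{
		{Name: "trident-csi-74d6n", SecretVolumes: []string{"trident-csi-token-qkdm2"}},
		{Name: "trident-csi-abcde", SecretVolumes: []string{"trident-csi-token-29c5f"}},
	}
	fmt.Println(stalePods(sa, pods, "trident-csi-token-")) // [trident-csi-74d6n]
}
```

Pods returned by such a check could then be deleted so the daemonset or deployment recreates them with the current token, which is what the manual workaround did by hand.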
Just want to add more context here:
If a service account is re-created, its tokens are not automatically updated on pods that already exist. So if a service account is re-created, the service account token is never refreshed on the Trident pods, which can lead to the above issue.
I am not sure what led to the above issue in the customer environment, where the service account token was not updated on the daemonset, but I was able to reproduce it by deleting the service account: the operator re-created the service account as part of its auto-heal functionality but did not update the deployment or the daemonset pods.
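One way the operator could notice the re-creation described above is by tracking the service account's UID: Kubernetes assigns a new metadata.uid to every newly created object, so the same name with a different UID means the old account, and the token secret the running pods reference, is gone. The Go sketch below is a hypothetical illustration of that check only; the observedState type and the UID strings are assumptions, and a real operator would read the UID from the ServiceAccount object and persist the previous value somewhere (for example, in the custom resource status).

```go
package main

import "fmt"

// Hypothetical record of what the operator saw on its previous reconcile.
type observedState struct {
	ServiceAccountUID string // empty on the very first reconcile
}

// needsPodRestart reports whether the service account was re-created since the
// last reconcile: the name is unchanged, but Kubernetes gave the new object a new UID.
func needsPodRestart(prev observedState, currentUID string) bool {
	return prev.ServiceAccountUID != "" && prev.ServiceAccountUID != currentUID
}

func main() {
	prev := observedState{ServiceAccountUID: "uid-before-delete"}
	fmt.Println(needsPodRestart(prev, "uid-before-delete"))  // false: same account
	fmt.Println(needsPodRestart(prev, "uid-after-recreate")) // true: account re-created
}
```

This check only detects re-creation; it does not catch the case where the secret alone is deleted, which is why comparing mounted secrets against the account's secrets is the more general check.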
As part of each reconcile loop, the operator should also detect when the service account secrets do not match the secrets mounted in the daemonset or deployment pods, and act on it.
See the discussion in the NetApp Slack: https://netapppub.slack.com/archives/C1E3QH84C/p1599221156108100
Trident operations were blocked because Trident main could no longer communicate with the CSI pods.
Manual workaround: delete all daemonset pods. They were recreated with the correct secret.