Unresolvable Multi-Attach error, after resizing node pools. #111
Comments
I'm having a similar problem after resizing (I had CSI 0.2.0 and updated to 0.3.1 manually with kubectl edit), as in #32: running these commands gives me no log :-/ PS
Thanks folks for the reports. There were a couple of issues with attaching to dead nodes that were fixed with the latest versions of the CSI driver (which I released yesterday as
I have the same problem with this helm chart: https://github.com/helm/charts/tree/master/stable/traefik
@radek-baczynski, do you want to try deleting the old running pod, or setting your deployment strategy to Recreate? That's what I'm using with traefik.
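A minimal sketch of both suggestions, assuming traefik was installed into kube-system with the pod label app=traefik (adjust names and namespace to your install):

```sh
# Option 1: delete the stuck pod so the scheduler recreates it
kubectl -n kube-system delete pod -l app=traefik

# Option 2: switch the Deployment to the Recreate strategy, so the old pod
# is terminated (and its volume detached) before the new one is started.
# The JSON merge patch also drops the rollingUpdate block, which is not
# allowed together with type: Recreate.
kubectl -n kube-system patch deployment traefik --type merge \
  -p '{"spec":{"strategy":{"type":"Recreate","rollingUpdate":null}}}'
```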
Did you try to patch the finalizers? e.g. clear them and delete the objects afterwards. I solved it like this.
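The inline example did not survive in the thread; here is a rough reconstruction of the finalizer workaround with placeholder names, assuming the PVC/PV are the objects that are stuck:

```sh
# See which finalizers are still set on the stuck objects
kubectl get pvc <pvc-name> -o jsonpath='{.metadata.finalizers}'
kubectl get pv  <pv-name>  -o jsonpath='{.metadata.finalizers}'

# Clear the finalizers, then delete the objects and recreate them
kubectl patch pvc <pvc-name> --type merge -p '{"metadata":{"finalizers":null}}'
kubectl patch pv  <pv-name>  --type merge -p '{"metadata":{"finalizers":null}}'
kubectl delete pvc <pvc-name>
kubectl delete pv  <pv-name>
```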
I just tried to reproduce it and was able to see the issue. Unfortunately this is not a csi-digitalocean problem. The driver never gets an attach/detach command, and if you check the logs you'll see nothing. It seems like the CSI sub-system is not properly handling these kinds of issues. I'm trying to check what we can do here, or if there is anything I can do at least. These are the errors I see from the pod:
I'll comment more on this.
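For anyone checking this on their own cluster, the comparison boils down to the pod's events versus the driver's logs. The commands below assume the stock manifests, which deploy the driver into kube-system as a csi-do-node DaemonSet and a csi-do-controller StatefulSet (label and pod names may differ in your release):

```sh
# The Multi-Attach / FailedAttachVolume errors show up as events on the pod
kubectl describe pod <app-pod>

# The driver logs stay empty in this failure mode, because no attach/detach
# call ever reaches the plugin
kubectl -n kube-system logs -l app=csi-do-node -c csi-do-plugin
kubectl -n kube-system logs csi-do-controller-0 -c csi-do-plugin
```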
Change of deploymentStrategy to Recreate
@radek-baczynski I just tried it with "Recreate" and it seems like it's really working well. But I had to wait a couple of minutes until the scheduler had time to resolve the issues:
I wonder if this is the same for
Alright, tested it with
The key is, it's not instantaneous. We need to wait until the reconciler in Kubernetes is aware of the inconsistencies and tries to reconcile to a correct state. In my tests it reconciled to a good state in around 6 minutes.
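If you want to watch that reconciliation instead of guessing, the VolumeAttachment objects and the attach events are the things to look at; the roughly 6 minute delay matches the attach/detach controller's force-detach timeout for a node that went away uncleanly:

```sh
# Shows which node Kubernetes still believes the volume is attached to;
# the stale entry goes away once the controller force-detaches it
kubectl get volumeattachments -w

# The FailedAttachVolume / Multi-Attach events stop once the detach succeeds
kubectl get events --all-namespaces --field-selector reason=FailedAttachVolume -w
```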
I'm on v0.4.0 and have the same scenario as feluxe. Here's how I got there:
This continues whether the volume is attached to the new node or not. How do I resolve this? I left it this way for about 10 hours and have tried scaling the statefulset down/up as well as deleting it. No change to the situation. Cluster info:
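One way to unstick a StatefulSet in that state is to remove the stale VolumeAttachment by hand; a sketch with placeholder names (this is a workaround, not a fix):

```sh
# Stop the workload so nothing is trying to use the volume
kubectl scale statefulset <name> --replicas=0

# Find the VolumeAttachment that still points at the deleted node
kubectl get volumeattachments

# Remove it so the controller can attach the volume to the new node,
# then scale back up
kubectl delete volumeattachment <stale-attachment-name>
kubectl scale statefulset <name> --replicas=1
```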
Thank you @Azuka and @radek-baczynski, it seems I could solve this for traefik (installed with helm) by changing the strategy to Recreate.
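If the chart is managed with helm, the strategy can presumably be set through the chart's values instead of patching the Deployment directly; the deploymentStrategy key is an assumption here, so check the chart's values.yaml first:

```sh
# Value key assumed from the stable/traefik chart; verify with:
#   helm inspect values stable/traefik
helm upgrade traefik stable/traefik \
  --namespace kube-system \
  --reuse-values \
  --set deploymentStrategy.type=Recreate
```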
Newer releases of the CSI plugin address a few bugs, most notably a failure to detach caused by a bug in the external-attacher sidecar. Please update to the latest possible version and file a new issue if the problem continues to happen. Thank you!
What did you do?
I created a digitalocean (preview) cluster via the DO web interface. In the cluster I installed an app that uses an existing block storage volume with a PV/PVC config similar to the one in the example pod-single-existing-volume. So far it worked fine.

Then I resized the node pools (deleted the current pool entirely, added a new pool) using the DO web interface. Having done so, the node on which the app was running and to which the volume was attached was deleted. K8s automatically tried to start the pod on another node, but this isn't working. Pod creation fails with this error:
If I delete the app, everything (including PV/PVC) is removed from the cluster, as it should be. Even the volume is listed as not being attached to any node in the DO web interface. But when I reinstall the app I keep getting the error, even though the PV is shown as being bound:

At this point I'm stuck. I can't find a way to attach the volume to the pod.
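To see where the disagreement lives at this point, it can help to compare what Kubernetes thinks with what DigitalOcean thinks; a sketch with placeholder names:

```sh
# Kubernetes' view: is there still a VolumeAttachment for the old node,
# and what does the PV say?
kubectl get volumeattachments
kubectl get pv <pv-name> -o yaml

# DigitalOcean's view: which droplet (if any) is the volume attached to?
doctl compute volume list
```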
What did you expect to happen?
After the app (incl. PV/PVC) was removed from the cluster and the volume was detached, csi-digitalocean should be able to attach the volume back to a node.

Steps to reproduce:
csi-digitalocean seems to think it's still attached to the deleted node.

Configuration:
CSI version: v0.3.1
Kubernetes version: 12.1
Environment: Digitalocean Kubernetes (Preview)