This repository has been archived by the owner on Sep 18, 2020. It is now read-only.

Persistent volumes in stuck state after reboot #191

Open
dustinmm80 opened this issue Mar 28, 2019 · 5 comments

@dustinmm80

It appears there is a race condition when using persistent volumes: the pod is deleted and the node is rebooted while the attached volume is still in the process of detaching. Once this happens, the persistent volume is stuck and must be manually removed and recreated.

I'm seeing this on AWS with EBS volumes.
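A minimal diagnostic sketch for spotting this state, assuming the Python `kubernetes` client and a working kubeconfig (not part of CLUO, just an aid): it compares the volumes each node reports as in use with those it reports as attached, so a volume that never finished detaching before the reboot stands out.

```python
# Diagnostic sketch (assumes the "kubernetes" Python client and a kubeconfig):
# print the volumes each node reports as in use vs. attached, to spot volumes
# that never finished detaching before the reboot.
from kubernetes import client, config

def print_node_volumes():
    config.load_kube_config()  # or config.load_incluster_config()
    core = client.CoreV1Api()
    for node in core.list_node().items:
        in_use = node.status.volumes_in_use or []
        attached = [v.name for v in (node.status.volumes_attached or [])]
        print(f"{node.metadata.name}:")
        print(f"  in use:   {in_use}")
        print(f"  attached: {attached}")

if __name__ == "__main__":
    print_node_volumes()
```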

@metalmatze
Member

I'm not sure how that relates to CLUO. Are you running its Pods with volumes?

@dustinmm80
Author

dustinmm80 commented Apr 2, 2019

No, not running CLUO with volumes. Is it possible that when the operator terminates the pods, it reboots before the PVs are properly detached?

@embik

embik commented Jun 26, 2019

Hey @dustinmm80, are you by any chance using the CSI implementation of EBS volumes?

I see something similar with the Cinder CSI driver, and I suspect it's related to VolumeAttachment resources, or rather the fact that the CSI components might not be fast enough to detach volumes before CLUO reboots the machine.

Just wanted to check in before investigating this.
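For the CSI case, the detach state lives in the VolumeAttachment objects, so one way to check whether a detach was still pending when the node rebooted is to list them. A minimal sketch, assuming the Python `kubernetes` client (a diagnostic aid only, not something CLUO does):

```python
# Diagnostic sketch: list VolumeAttachment objects with their node, PV and
# attach status, to see whether a detach was still pending at reboot time.
# Assumes the "kubernetes" Python client and a working kubeconfig.
from kubernetes import client, config

def list_volume_attachments():
    config.load_kube_config()
    storage = client.StorageV1Api()
    for va in storage.list_volume_attachment().items:
        pv = va.spec.source.persistent_volume_name
        attached = bool(va.status and va.status.attached)
        print(f"{va.metadata.name}: node={va.spec.node_name} pv={pv} attached={attached}")

if __name__ == "__main__":
    list_volume_attachments()
```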

@embik

embik commented Jun 26, 2019

Thinking about it, I'm not sure it's only related to CSI. But it's probably a major issue with StatefulSets, because Kubernetes won't create a new statefulset-example-0 on another node before the old one has finished deleting.

Only after the new StatefulSet pod is scheduled to a new node do the CSI components start churning and update the VolumeAttachment, which unmounts the volume on the old node and mounts it on the new one. But that process takes a few seconds, and by then the old node is already being rebooted.
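If that is indeed the cause, one conceivable workaround (a sketch only, not something CLUO implements) would be to gate the reboot until no VolumeAttachment for the node still reports being attached, giving the CSI driver time to finish the detach. The node name and timeout below are illustrative:

```python
# Hypothetical pre-reboot gate: wait until no VolumeAttachment for this node
# still reports attached=True, so the CSI driver has time to detach volumes.
# Assumes the "kubernetes" Python client; node name and timeout are made up.
import time
from kubernetes import client, config

def wait_for_detach(node_name: str, timeout: int = 300) -> bool:
    config.load_kube_config()
    storage = client.StorageV1Api()
    deadline = time.time() + timeout
    while time.time() < deadline:
        pending = [
            va.metadata.name
            for va in storage.list_volume_attachment().items
            if va.spec.node_name == node_name and va.status and va.status.attached
        ]
        if not pending:
            return True  # no attachments left, safe to reboot
        print(f"still attached on {node_name}: {pending}")
        time.sleep(5)
    return False

if __name__ == "__main__":
    wait_for_detach("example-node")  # hypothetical node name
```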

@yannh

yannh commented Sep 9, 2019

We had the same issue a few weeks ago. It turned out to be a problem with the newer Nitro instances (c5, t3, ...); AWS investigated and claims to have solved it now. Were you seeing this on Nitro instances? Are you still seeing the problem?
