etcd Pending status (volume mount timed out) #280

garloff · 2022-09-18T10:20:34Z

I have seen the etcd container stuck in Pending status for extended periods (hours), with the event log indicating that its two volumes could not be mounted. That is ridiculous for a volume like

  etcd-certs:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/pki/etcd
    HostPathType:  DirectoryOrCreate

(and likewise for etcd-data, which is /var/lib/etcd).
It only happens occasionally; if we're lucky, the etcd container fails a liveness probe, gets restarted, and then works just fine. But we can't rely on that.
I'll look into reproducing this so that I can add proper logs here.
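
To rule out the trivial explanation, one could check on the affected control-plane node that both host directories exist. This is only a sketch: `check_hostpaths` is a hypothetical helper name, and the two paths are the ones quoted above.

```shell
# Sketch: report whether each given host directory exists.
# check_hostpaths is a hypothetical helper, not part of any existing tooling.
check_hostpaths() {
  local rc=0 dir
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      echo "ok: $dir"
    else
      echo "missing: $dir"
      rc=1
    fi
  done
  return "$rc"
}

# Intended use on the control-plane node (paths taken from the pod spec above):
#   check_hostpaths /etc/kubernetes/pki/etcd /var/lib/etcd
```

If both directories are reported as present while the pod is still Pending, the mount failure is unlikely to come from the host file system itself.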


garloff · 2022-09-18T11:43:56Z

Here's what I see in the error case:

kubectl describe pod -n kube-system etcd-$PREFIX-${CLUSTER_NAME}-control-plane-genc01-XXXX
[...]
  Type     Reason       Age   From     Message
  ----     ------       ----  ----     -------
  Warning  FailedMount  65s   kubelet  Unable to attach or mount volumes: unmounted volumes=[etcd-data etcd-certs], unattached volumes=[etcd-data etcd-certs]: timed out waiting for the condition
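
For spotting this symptom automatically, a small filter over the describe output may be enough. The following is a sketch under assumptions: `has_failed_mount` is a hypothetical helper that reads the event table from stdin and only looks for a `Warning` event with reason `FailedMount`; `ETCD_POD` in the usage comment is a placeholder for the real pod name.

```shell
# Sketch: succeed if the text on stdin contains a Warning/FailedMount event line.
# has_failed_mount is a hypothetical helper; it only pattern-matches the event table.
has_failed_mount() {
  grep -Eq 'Warning[[:space:]]+FailedMount[[:space:]]'
}

# Intended use (ETCD_POD is a placeholder for the actual etcd pod name):
#   kubectl describe pod -n kube-system "$ETCD_POD" | has_failed_mount \
#     && echo "etcd pod reports FailedMount"
```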

garloff · 2022-09-18T11:59:23Z

Restarting the kubelet after kubeadm init avoids the issue, though this feels more like a workaround than a real fix.
On the other hand, if we tweak the etcd heartbeat intervals, the restart is required anyway for the parameters to take effect, so in some way it's the right thing to do.

We need to make sure the new parameters take effect. We do it unconditionally for now; as a side-effect, it does also take care of issue #280, where the etcd pod ridiculously remains in pending due to an allegedly failed mount (on a DirectoryOrCreate HostPath!).
Signed-off-by: Kurt Garloff <kurt@garloff.de>
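
The workaround boils down to restarting the kubelet right after `kubeadm init`. As a rough sketch only: the wrapper name `init_and_restart_kubelet`, the `DRY_RUN` switch, and the config path in the usage comment are assumptions for illustration, and the sketch assumes a systemd-managed kubelet; the actual bootstrap scripts may invoke these steps differently.

```shell
# Sketch: run kubeadm init, then restart the kubelet unconditionally.
# Set DRY_RUN to any non-empty value to print the commands instead of running them.
init_and_restart_kubelet() {
  local config="$1" run="${DRY_RUN:+echo}"
  # $run is intentionally unquoted: it expands to nothing unless DRY_RUN is set.
  $run kubeadm init --config "$config" || return 1
  $run systemctl restart kubelet
}

# Dry run with a hypothetical config path:
#   DRY_RUN=1 init_and_restart_kubelet /tmp/kubeadm.yaml
```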

garloff · 2022-09-20T15:41:24Z

Thanks to merging #282, we no longer tweak etcd's manifest after etcd has (potentially) started, so there is no real reason to restart the kubelet any longer. However, we still keep the restart as a workaround for this issue.
TODO:

Is the restart really still required?
If so, debug the root cause and look for a real fix, likely somewhere upstream.
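
To answer the first TODO item, one option is to drop the restart in a test cluster and poll whether the etcd pod reaches Running within a deadline. A minimal sketch, assuming the pod lives in kube-system as shown earlier; `get_phase` and `wait_until_running` are hypothetical helper names, and the retry count and delay are arbitrary defaults.

```shell
# Sketch: poll the pod phase until it is Running or the attempts run out.
# get_phase is the only part that talks to the cluster.
get_phase() {
  kubectl get pod -n kube-system "$1" -o jsonpath='{.status.phase}'
}

# wait_until_running POD [TRIES] [DELAY_SECONDS]
wait_until_running() {
  local pod="$1" tries="${2:-60}" delay="${3:-10}" i=0
  while [ "$i" -lt "$tries" ]; do
    if [ "$(get_phase "$pod")" = "Running" ]; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Intended use (ETCD_POD is a placeholder for the actual etcd pod name):
#   wait_until_running "$ETCD_POD" 60 10 || echo "etcd still not Running"
```

Since the problem only shows up occasionally, a single successful run proves little; the check would have to be repeated over many cluster creations.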

garloff mentioned this issue Sep 18, 2022

Restart kubelet after tweaking etcd heartbeat. #281

Closed

garloff added the bug and Container labels Sep 18, 2022

tibeer mentioned this issue Mar 29, 2023

Stale issues SovereignCloudStack/issues#11

Open

jschoone added the on hold label Oct 10, 2023

jschoone closed this as completed Oct 10, 2023

jschoone reopened this Oct 11, 2023

jschoone closed this as not planned Oct 11, 2023

jschoone added the Sprint Montreal (2023, cwk 40+41) label Feb 28, 2024
