FCOS doesn't send PrepareForShutdown to Inhibitor-Services #821

Closed
betermieux opened this issue May 7, 2021 · 6 comments

@betermieux

Hello,

I am trying the new graceful node shutdown feature of kubelet 1.21 to drain pods from rebooting FCOS nodes. It doesn't work as expected, and while debugging I traced the problem to a missing PrepareForShutdown D-Bus signal, which logind never sends. I have also tested a proof-of-concept inhibitor (https://trstringer.com/systemd-inhibitor-locks/), and it doesn't receive the signal either.

I am out of ideas; I have never really debugged D-Bus signals before. Any ideas or pointers?

Output of `systemd-inhibit --list`:

```
WHO            UID USER PID   COMM           WHAT     WHY                                        MODE
Inhibitor Test 0   root 51742 inhibit        shutdown Testing systemd inhibitors from Go         delay
NetworkManager 0   root 713   NetworkManager sleep    NetworkManager needs to turn off networks  delay
kubelet        0   root 32385 kubelet        shutdown Kubelet needs time to handle node shutdown delay
```

Watching the system bus with `docker run --rm -ti -v /var/run/dbus/:/var/run/dbus/ --name dbus cmd.cat/dbus-monitor dbus-monitor --system` while executing `shutdown --reboot +1`:

```
…
method call time=1620380639.896423 sender=:1.147 -> destination=org.freedesktop.login1 serial=2 path=/org/freedesktop/login1; interface=org.freedesktop.login1.Manager; member=Inhibit
   string "shutdown"
   string "Inhibitor Test"
   string "Testing systemd inhibitors from Go"
   string "delay"
…
method call time=1620380639.900193 sender=:1.147 -> destination=org.freedesktop.DBus serial=3 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=AddMatch
   string "type='signal',interface='org.freedesktop.login1.Manager',path='/org/freedesktop/login1',member='PrepareForShutdown'"
…
method call time=1620380686.669346 sender=:1.148 -> destination=org.freedesktop.login1 serial=3 path=/org/freedesktop/login1; interface=org.freedesktop.login1.Manager; member=ScheduleShutdown
   string "reboot"
   uint64 1620380746666218
…
```

Regards,
Stefan

@jlebon
Member

jlebon commented May 7, 2021

This might be a better question for the systemd mailing list or IRC channel. It's unlikely to be a Fedora CoreOS specific issue (or do you have reason to believe that's the case?).

@betermieux
Author

Well, it seems to work in other distros, so it might be an issue with FCOS. I will try Ubuntu and Fedora Server and will update this issue with my findings.

@betermieux
Author

OK, I tested the proof-of-concept inhibitor on Fedora CoreOS 33, Fedora Server 33 and Ubuntu 21.04. PrepareForShutdown is sent on Ubuntu, but not on the Fedora variants. Do you know of an upstream Fedora repo where I could report the issue?

Fedora Server 33

```
May 10 02:12:58 localhost.localdomain inhibit[795]: Starting dbus example
May 10 02:12:58 localhost.localdomain inhibit[795]: Inhibitor file descriptor: 7
May 10 02:12:58 localhost.localdomain inhibit[795]: Waiting for shutdown signal
May 10 02:13:08 localhost.localdomain systemd[1]: Stopping Inhibitor test...
May 10 02:13:08 localhost.localdomain systemd[1]: inhibit.service: Succeeded.
May 10 02:13:08 localhost.localdomain systemd[1]: Stopped Inhibitor test.
```

Fedora CoreOS 33

```
May 10 06:06:47 bwcloud4 inhibit[8708]: Starting dbus example
May 10 06:06:47 bwcloud4 inhibit[8708]: Inhibitor file descriptor: 7
May 10 06:06:47 bwcloud4 inhibit[8708]: Waiting for shutdown signal
May 10 06:07:05 bwcloud4 systemd[1]: Stopping Inhibitor test...
May 10 06:07:06 bwcloud4 systemd[1]: inhibit.service: Deactivated successfully.
May 10 06:07:06 bwcloud4 systemd[1]: Stopped Inhibitor test.
```

Ubuntu Server 21.04

```
May 07 18:53:45 osboxes inhibit[3001]: Starting dbus example
May 07 18:53:45 osboxes inhibit[3001]: Inhibitor file descriptor: 7
May 07 18:53:45 osboxes inhibit[3001]: Waiting for shutdown signal
May 07 18:57:01 osboxes inhibit[3001]: Signal: &{:1.7 /org/freedesktop/login1 org.freedesktop.login1.Manager.PrepareForShutdown [true] 5}
May 07 18:57:01 osboxes inhibit[3001]: Closing file descriptor
May 07 18:57:01 osboxes systemd[1]: Stopping Inhibitor test...
May 07 18:57:01 osboxes systemd[1]: inhibit.service: Succeeded.
May 07 18:57:01 osboxes systemd[1]: Stopped Inhibitor test.
```
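On Ubuntu the proof of concept receives the signal, logs it and closes its inhibitor file descriptor. The five printed fields appear to be those of godbus's Signal struct (sender, object path, member name, body, sequence number). The handling step can be sketched in Go without a bus connection, assuming a hypothetical dbusSignal type that mirrors those fields and a release callback standing in for closing the fd returned by Inhibit. In a real program the signals would arrive on the channel registered with godbus's Conn.Signal after an AddMatchSignal subscription equivalent to the AddMatch call in the trace; PrepareForShutdown carries one boolean, true right before the system goes down and false afterwards.

```go
package main

import (
	"errors"
	"fmt"
)

// dbusSignal is a hypothetical stand-in mirroring the fields printed in the
// Ubuntu log: sender, object path, member name, body, sequence number.
type dbusSignal struct {
	Sender   string
	Path     string
	Name     string
	Body     []interface{}
	Sequence uint64
}

const prepareForShutdown = "org.freedesktop.login1.Manager.PrepareForShutdown"

// handleSignal calls release (e.g. closing the inhibitor fd) only for
// PrepareForShutdown(true). It reports whether the lock was released.
func handleSignal(sig *dbusSignal, release func() error) (bool, error) {
	if sig == nil || sig.Name != prepareForShutdown || len(sig.Body) != 1 {
		return false, nil // not the signal we subscribed to
	}
	active, ok := sig.Body[0].(bool)
	if !ok {
		return false, errors.New("PrepareForShutdown: body is not a boolean")
	}
	if !active {
		return false, nil // false is sent afterwards; nothing to release
	}
	// Graceful cleanup would run here, before the delay lock is dropped.
	return true, release()
}

func main() {
	// Mirrors the signal logged on Ubuntu above.
	sig := &dbusSignal{":1.7", "/org/freedesktop/login1", prepareForShutdown, []interface{}{true}, 5}
	released, err := handleSignal(sig, func() error {
		fmt.Println("Closing file descriptor") // stands in for closing the inhibitor fd
		return nil
	})
	fmt.Println("released:", released, "err:", err)
}
```

Keeping the release behind a callback means the decision logic can be exercised without logind; on the Fedora systems in this issue the handler simply never runs, because the signal never arrives.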

@lucab
Contributor

lucab commented May 10, 2021

@betermieux
Author

OK, thanks, we can probably close the issue here.

@jlebon
Member

jlebon commented May 10, 2021

Yeah, a new Bugzilla ticket against the systemd component would bring it to the right folks. :)

dghubble added a commit to poseidon/typhoon that referenced this issue Aug 28, 2022
* Configure Kubelet Graceful Node Shutdown to detect system shutdown
events and stop running containers gracefully when possible
* Allow up to 30s for critical pods to gracefully shutdown
* Allow up to 15s for regular pods to gracefully shutdown
* Node will be marked as NotReady promptly, instead of having to
wait for health checks
* Kubelet uses systemd inhibitor locks to delay shutdown for a limited
number of seconds
* Raise the default max inhibitor time from 5s to 45s

Verify systemd inhibitor locks are present:

```
sudo systemd-inhibit --list
WHO     UID USER PID  COMM    WHAT     WHY                                        MODE
kubelet 0   root 4581 kubelet shutdown Kubelet needs time to handle node shutdown delay
```

Tail journal logs and then shut down a node via systemctl reboot
or via the cloud console to watch containers shut down

Rel:

* https://kubernetes.io/blog/2021/04/21/graceful-node-shutdown-beta/
* https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/
* kubernetes/kubernetes#107043
* coreos/fedora-coreos-tracker#821
* https://www.freedesktop.org/software/systemd/man/systemd-inhibit.html
* https://github.com/kubernetes/kubernetes/blob/release-1.24/pkg/kubelet/nodeshutdown/nodeshutdown_manager_linux.go
* https://github.com/godbus/dbus/blob/master/conn.go
dghubble-robot pushed a commit to poseidon/terraform-onprem-kubernetes that referenced this issue Aug 28, 2022
dghubble-robot pushed a commit to poseidon/terraform-google-kubernetes that referenced this issue Aug 28, 2022
dghubble-robot pushed a commit to poseidon/terraform-azure-kubernetes that referenced this issue Aug 28, 2022
dghubble-robot pushed a commit to poseidon/terraform-aws-kubernetes that referenced this issue Aug 28, 2022
dghubble-robot pushed a commit to poseidon/terraform-digitalocean-kubernetes that referenced this issue Aug 28, 2022
Snaipe pushed a commit to aristanetworks/monsoon that referenced this issue Apr 13, 2023

3 participants