add support for soft-reboot #309911
This needs a VM test. Also, this probably cannot be merged until the 24.05 cutoff has happened.
> A "soft reboot" is similar to a regular reboot, except that it affects userspace only: the service manager shuts down any running services and other units, then optionally switches into a new root file system (mounted to /run/nextroot/), and then passes control to a systemd instance in the new file system which then starts the system up again. The kernel is not rebooted and neither is the hardware, firmware or boot loader. This provides a fast, lightweight mechanism to quickly reset or update userspace, without the latency that a full system reset involves. Moreover, open file descriptors may be passed across the soft reboot into the new system where they will be passed back to the originating services. This allows pinning resources across the reboot, thus minimizing grey-out time further.

Soft-reboot can be used not only as an alternative to kexec and reboot, but also as an alternative to `nixos-rebuild switch` in many cases.

Unlike kexec and reboot, soft-reboot can keep certain resources around across the reboot. For example, socket units can be configured to stay around such that reboots can happen without dropping any connections. Furthermore, file descriptors can be passed across boots using FDSTORE to keep even existing connections intact.

Please see the man page for more info: https://www.freedesktop.org/software/systemd/man/latest/systemd-soft-reboot.service.html#
switch-to-configuration boot will now create a symlink /run/next-system.

systemctl soft-reboot will call /run/next-system/activate just before soft-rebooting, making sure that the system we're rebooting into is activated.

Thus you can do: nixos-rebuild boot && systemctl soft-reboot now
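The two steps described in this commit message can be simulated without touching a real system. The following is a minimal sketch, not the PR's implementation: it uses a scratch directory as a stand-in for `/run`, and the `toplevel` directory and its `activate` script are hypothetical placeholders for a real system closure.

```shell
#!/bin/sh
# Simulates the next-system flow in a scratch directory; never touches the real /run.
set -eu

run_dir=$(mktemp -d)           # stand-in for /run (assumption for this sketch)
toplevel="$run_dir/toplevel"   # stand-in for the new system closure (hypothetical)
mkdir -p "$toplevel"

# Hypothetical activation script that only records that it ran.
printf '#!/bin/sh\necho activated > "%s/activated"\n' "$run_dir" > "$toplevel/activate"
chmod +x "$toplevel/activate"

# Step 1: what `switch-to-configuration boot` does according to this change.
ln -sfn "$toplevel" "$run_dir/next-system"

# Step 2: just before the soft reboot, activate only if the link target provides a script.
if [ -x "$run_dir/next-system/activate" ]; then
    "$run_dir/next-system/activate"
fi

result=$(cat "$run_dir/activated")
echo "$result"
```

The `-n` flag matters when the command runs a second time: without it, `ln` would follow the existing `next-system` link and create the new link inside the old target directory instead of replacing the link.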
We want to get rid of specialFileSystems / earlyMountScript eventually, and there is no need to run this before systemd anymore now that the wrappers themselves are set up in a systemd unit since NixOS#263203.

This is also needed to make soft-reboot work. We want to make sure that /run/wrappers is remounted with the nosuid bit removed on soft-reboot, but because @earlyMountScript@ runs in the initrd, this wouldn't happen.
Neat follow-up; now that Make a version of
@@ -110,6 +110,8 @@
  }

  if ($action eq "boot") {
      # /run/next-system/activate is called when systemctl soft-reboot is called
      system("@coreutils@/bin/ln", "-sfn", "$toplevel", "/run/next-system") == 0 or exit 1;
Should we do this on `"switch"` too?
Doesn't seem necessary. The point of `next-system` is to have a link to activate on soft-reboot, and a `switch` will have already activated the new system. Even on a `soft-reboot`, reactivation shouldn't be necessary.
    DefaultDependencies = false;
    ConditionPathExists = "/run/next-system/activate";
  };
  serviceConfig.ExecStart = "/run/next-system/activate";
Should this fall back to `/run/current-system/activate` if `/run/next-system/activate` doesn't exist?
Same argument as not creating `next-system` on `switch`; the current system is already activated so it shouldn't be necessary.
This fixes the long-standing issue NixOS#50300.
# make sure /run/next-system isn't garbage collected before the next boot
systemd.tmpfiles.rules = [ "L+ /nix/var/nix/gcroots/next-system - - - - /run/next-system" ];

# if /run/next-system exists, activate it before rebooting
Why is this necessary?
Activation happens in the initrd (at least if systemd-initrd is enabled), which soft-reboot doesn't transition into. That's the same reason why we do activation in the initrd before transitioning into stage-2.
If we don't do this, the new system will never get activated.
It seems the minimal required part of this PR is the two additional upstream units and the wrapper mount point. It's enough for the
Using
Moving this back to draft. This needs to cook a bit more, but I'm keeping this PR around for notes and feedback.
I just realized a pretty serious defect: soft-reboot re-execs into but we set which means soft-reboot doesn't execute the new systemd version but into the old one. Maybe we do need to do the
Wait no.. We call However I want to have a NixOS Test that asserts this
        echo "Loading NixOS system via kexec."
        exec kexec --load $p/kernel --initrd=$p/initrd --append="$(cat $p/kernel-params) init=$p/init"
    fi
done
I don't think we should mess with kexec at all in this PR. It's actually got some problems as is and this is only going to make it more confusing.
Yeah, I'll drop the commit (and maybe open a separate PR for it).
systemd.tmpfiles.rules = [ "L+ /nix/var/nix/gcroots/next-system - - - - /run/next-system" ];

# if /run/next-system exists, activate it before rebooting
systemd.services.systemd-soft-reboot-activate-next-system = {
What happened to doing this in `ExecStartPre` in `systemd-soft-reboot.service` with a `-` in the front of the command? That seemed quite elegant to me. With a new unit, I'm less sure that the unit ordering is right.
I copied the ordering from `prepare-kexec`. I'm pretty confident this is right.
With `ExecStartPre=`, it would complain about not being able to connect to journald:
May 07 21:11:02 utm (activate)[6377]: systemd-soft-reboot.service: Failed to connect stdout to the journal socket, ignoring: Connection refused
soft-reboot.target causes systemd-journald.service to exit before it is reached (it has a Conflicts on it) to make sure logs are properly flushed before the switch is made. I think that's what's causing this.
Well, the ordering from `prepare-kexec` isn't exactly relevant, because it isn't really ordered against anything at all except `kexec` itself. I would think we would want activation to happen as late as possible, i.e. after as much as possible has shut down. Maybe this doesn't matter; it would just mimic an actual reboot better if activation happened when everything was shut down.
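For reference, the `ExecStartPre` variant raised in this thread could look roughly like the fragment below. This is only a sketch of the alternative under discussion, not what the PR does, and it assumes the upstream `systemd-soft-reboot.service` unit is shipped so that the NixOS option produces an override for it.

```nix
{
  # Sketch of the alternative discussed in this thread, not the approach taken by the PR.
  systemd.services.systemd-soft-reboot.serviceConfig.ExecStartPre =
    # The leading "-" makes systemd treat a failing exit status of this command as success.
    "-/run/next-system/activate";
}
```

As noted in the reply, output from a command run this late may not reach the journal, because journald has already been stopped by the time the unit starts.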
Description of changes

Usage:

`switch-to-configuration boot` will now prepare a symlink `/run/next-system`, and when `systemctl soft-reboot` is called we will automatically call `/run/next-system/activate` just before the soft-reboot to activate the next system to be booted.

Soft-reboot can be used not only as an alternative to kexec and reboot, but also as an alternative to `nixos-rebuild switch` in many cases.

Unlike kexec and reboot, soft-reboot can keep certain resources around across the reboot. For example, socket units can be configured to stay around so that reboots can happen without the system ever ceasing to accept connections. Furthermore, file descriptors can be passed across boots using FDSTORE to keep even existing connections intact.

Please see the man page for more info: https://www.freedesktop.org/software/systemd/man/latest/systemd-soft-reboot.service.html#
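As an illustration of the FDSTORE point, a service can opt in to the file descriptor store through its unit settings. The fragment below is a minimal sketch: `my-daemon` is a hypothetical service name, the limit of 16 is arbitrary, and whether these two settings are sufficient for a given service to keep its connections across a soft-reboot is an assumption to check against the man page linked above.

```nix
{
  # "my-daemon" is a hypothetical service name used only for illustration.
  systemd.services.my-daemon.serviceConfig = {
    # Let the service park up to 16 file descriptors with the service manager.
    FileDescriptorStoreMax = 16;
    # Keep the stored descriptors while the unit is stopped so they can be handed back later.
    FileDescriptorStorePreserve = "yes";
  };
}
```

The service itself still has to push its descriptors into the store (for example via `sd_notify` with `FDSTORE=1`) and pick them up again on start; the unit settings only permit and preserve the store.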
from Systemd NEWS: