
POC/RFC: Support reinstalls via kexec #791

Draft: wants to merge 1 commit into main

jlebon (Member) commented Feb 22, 2022

This is a proof of concept showing how we can use kexec to support reinstalling, assuming we have access to the rootfs image. (This is closely related to, or exactly the same as, coreos/fedora-coreos-tracker#399, depending on which flavour of "reset" we're talking about.)

Demo:

$ coreos-installer reinstall /dev/vda --platform qemu --insecure
Downloading https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.10/410.84.202112091617-0/x86_64/rhcos-410.84.20.
Downloading https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.10/410.84.202112091617-0/x86_64/rhcos-410.84.202112091617-0-live-rootfs.x86_64.img...
Loading kernel and initramfs with arguments:  coreos.inst.install_dev=/dev/vda console=ttyS0 coreos.inst.platform_id=qemu coreos.inst.insecure
Pivoting
[ 1259.444402] kexec_core: Starting new kernel
[    0.000000] Linux version 4.18.0-305.28.1.el8_4.x86_64 (mockbuild@x86-vm-08.build.eng.bos.redhat.com) (gcc version 8.4.1 20200928 (Re1
[    0.000000] Command line:  coreos.inst.install_dev=/dev/vda console=ttyS0 coreos.inst.platform_id=qemu coreos.inst.insecure
[   15.495212] coreos-installer-service[1066]: coreos-installer install /dev/vda --platform qemu --insecure --fetch-retries infinite
[   15.660528] coreos-installer-service[1066]: Installing Red Hat Enterprise Linux CoreOS 410.84.202112091617-0 (Ootpa) x86_64 (512-byte)
[   15.951946]  vda: vda1 vda2 vda3 vda4
[   15.960578]  vda: vda1 vda2 vda3 vda4
[   16.955472] coreos-installer-service[1066]: Read disk 43.7 MiB/3.7 GiB (1%)
[  100.914384] coreos-installer-service[1066]: Read disk 3.7 GiB/3.7 GiB (100%)
[  106.201139] coreos-installer-service[1066]: Install complete.
[  107.902354] reboot: Restarting system
[  107.907561] reboot: machine restart
...
Red Hat Enterprise Linux CoreOS 410.84.202112091617-0 (Ootpa) 4.10
Ignition: ran on 2021/12/10 23:10:41 UTC (this boot)
Ignition: user-provided config was applied

[core@cosa-devsh ~]$ journalctl --list-boots --no-pager
 0 5588b683714241839237799c72ca28c6 Fri 2021-12-10 23:10:33 UTC—Fri 2021-12-10 23:12:01 UTC

(I've cut out a lot of stuff from the logs there to concentrate on the interesting bits; see this attachment for the full output if you're curious.)

It also supports specifying the target Ignition config and image URL, using a local live initramfs and live rootfs, etc.
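As a rough illustration of the argument translation in the demo (where reinstall /dev/vda --platform qemu --insecure turned into the kernel command line printed on the "Loading kernel and initramfs" line), here is a hypothetical shell helper. The function name build_cmdline and its positional interface are assumptions for illustration, not the PR's actual code; only the coreos.inst.install_dev, coreos.inst.platform_id and coreos.inst.insecure arguments come from the log. Forwarding the Ignition URL as coreos.inst.ignition_url is likewise an assumption about how that option would be passed along.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn reinstall options into the kernel arguments the
# live installer reads. Argument order mirrors the demo output.
# Usage: build_cmdline DEVICE [PLATFORM] [IGNITION_URL] [CONSOLE] [INSECURE=yes|no]
build_cmdline() {
    local dev="$1"
    local platform="${2:-}" ign_url="${3:-}" console="${4:-}" insecure="${5:-no}"
    local args=("coreos.inst.install_dev=${dev}")

    if [[ -n "$console" ]]; then
        args+=("console=${console}")
    fi
    if [[ -n "$platform" ]]; then
        args+=("coreos.inst.platform_id=${platform}")
    fi
    if [[ -n "$ign_url" ]]; then
        # Assumption: the Ignition URL is forwarded as a kernel argument.
        args+=("coreos.inst.ignition_url=${ign_url}")
    fi
    if [[ "$insecure" == yes ]]; then
        args+=("coreos.inst.insecure")
    fi
    # "${args[*]}" joins the array with the first character of IFS (a space).
    printf '%s\n' "${args[*]}"
}
```

For example, build_cmdline /dev/vda qemu "" ttyS0 yes prints the same argument list as the demo: coreos.inst.install_dev=/dev/vda console=ttyS0 coreos.inst.platform_id=qemu coreos.inst.insecure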

FCOS is also supported, though there seems to be a bug in the kexec code there.

jlebon (Member, Author) commented Feb 22, 2022

Apologies for the churn on this. I had to delete my fork which closed the original PR because I was hitting dependabot/dependabot-core#2804 and apparently the only hack for now is to delete and refork.

I haven't addressed any of the comments there yet.

n10u53 commented Nov 8, 2022

I played around with the POC last week, built from jlebon:pr/reinstall. It's a promising feature, but I did run into some trouble.

After quite some confusion, I believe I have traced it back to coreos-installer reinstall failing the installation because of a faulty config, combined with the TTY becoming unresponsive in some way.


When you pass coreos-installer reinstall a faulty config, e.g. coreos-installer reinstall /dev/vda --ignition-url <path/to/.ign>, where:

  • "/dev/vda" is a nonexistent block device. The install command of coreos-installer 0.16.1 handles this correctly.
  • "<path/to/.ign>" is a nonexistent file or an unavailable URL. The latest version of coreos-installer install also handles this correctly.

The installation starts, and eventually the screen simply turns black and never comes back.

I assume the installer runs into the obvious problems with these parameters and cannot recover from them, possibly reporting an error that by then can no longer help anyone. As far as I can see in the QEMU resource manager, the system never actually reboots when it hits this issue.
Power-cycling the machine yourself brings the system back as if it had only ever been rebooted once, apparently without harm.
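One way to avoid this failure mode would be to validate the arguments before pivoting, while the original system and its console are still usable. The sketch below is hypothetical: preflight_check is not part of coreos-installer, and it only checks local Ignition files, passing anything that looks like a URL through unchecked.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, run before pivoting, so that bad arguments
# fail on the current system instead of inside the live environment.
# Usage: preflight_check DEVICE [IGNITION_PATH]
preflight_check() {
    local dev="$1" ign="${2:-}"

    # Only local paths are checked; anything containing "://" is treated as
    # a URL and left for the installer to fetch.
    if [[ -n "$ign" && "$ign" != *://* && ! -r "$ign" ]]; then
        echo "error: cannot read Ignition config: $ign" >&2
        return 1
    fi
    # The install target must exist and be a block device.
    if [[ ! -b "$dev" ]]; then
        echo "error: not a block device: $dev" >&2
        return 1
    fi
    return 0
}
```

Running such a check before loading the new kernel means a typo in /dev/vda or the Ignition path produces an error message on the current console rather than a black screen.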


I have also tried several different --console options, none of which had any effect on this issue. I have tried:

  • ttyS0,
  • ttyS1,
  • pts/11 (reported by QEMU as the serial device),
  • each of the above, both on its own and with ,115200n8 appended.

romfreiman commented

Isn't kexec hardware dependent as well?

jlebon (Member, Author) commented Dec 5, 2022

> Isn't kexec hardware dependent as well?

Is it? I wasn't aware of that.

But anyway, based on feedback and more experimentation, I'm going to drop the kexec approach in favour of doing a reboot into the live environment. The upside is that it should be more reliable than kexec. The downside is that it becomes specific to CoreOS (whereas with kexec a major advantage was that the technique could work on non-CoreOS too).

dustymabe (Member) commented

> Isn't kexec hardware dependent as well?

> Is it? I wasn't aware of that.

Yeah, I don't think it is hardware dependent. IIUC, kdump uses kexec, and we run kdump tests in CI on every platform we support.

> But anyway, based on feedback and more experimentation, I'm going to drop the kexec approach in favour of doing a reboot into the live environment. The upside is that it should be more reliable than kexec. The downside is that it becomes specific to CoreOS (whereas with kexec a major advantage was that the technique could work on non-CoreOS too).

The nice thing about the kexec approach, though, is that we don't have to deal with bootloaders (and all the cross-platform problems that come with them), although this detail from @lucab might cause some headaches.

The kexec approach could be pretty simple too. I used a small script here to help someone out in the discussion forum.
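The script referenced above isn't reproduced here. As a hypothetical sketch of the same idea, the flow amounts to staging the live kernel with kexec and then asking systemd to jump into it. The function names, the DRY_RUN switch and the combined-initramfs approach are assumptions; kexec --load, --initrd and --command-line (from kexec-tools) and systemctl kexec are standard interfaces.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a kexec-based pivot into the live environment.
# Set DRY_RUN=1 to print the kexec/systemctl commands instead of running them.

maybe_run() {
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
        echo "$@"
    else
        "$@"
    fi
}

# Usage: kexec_into_live KERNEL INITRAMFS ROOTFS CMDLINE
kexec_into_live() {
    local kernel="$1" initramfs="$2" rootfs="$3" cmdline="$4"
    local combined
    combined="$(mktemp)" || return 1

    # Assumption: the kernel accepts concatenated cpio archives as a single
    # initramfs, so the live rootfs image is simply appended to it.
    if ! cat "$initramfs" "$rootfs" > "$combined"; then
        rm -f "$combined"
        return 1
    fi

    # Stage the new kernel. The file contents are copied at load time, so
    # the temporary file is no longer needed afterwards.
    if ! maybe_run kexec --load "$kernel" --initrd="$combined" --command-line="$cmdline"; then
        rm -f "$combined"
        return 1
    fi
    rm -f "$combined"

    # Shut down cleanly and jump into the staged kernel.
    maybe_run systemctl kexec
}
```

For example, DRY_RUN=1 kexec_into_live vmlinuz initramfs.img rootfs.img "coreos.inst.install_dev=/dev/vda" prints the two commands it would run without touching the running system. Using systemctl kexec rather than kexec -e lets services stop and filesystems unmount before the jump.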

travier (Member) commented Aug 8, 2023

See https://discussion.fedoraproject.org/t/can-you-pivot-a-fedora-install-to-a-fedora-core-os-install/86546/4 for a "manual" way to do this.
