Skip to content
This repository has been archived by the owner on Aug 11, 2023. It is now read-only.

Wrong kernel output after system update #38

Closed
gilbsgilbs opened this issue Mar 13, 2021 · 18 comments
Closed

Wrong kernel output after system update #38

gilbsgilbs opened this issue Mar 13, 2021 · 18 comments
Labels

Comments

@gilbsgilbs
Copy link

What's the issue:

Sometimes, sbupdate outputs the wrong kernel in /boot partition which results in being unable to boot with an error similar to this:

Warning: /lib/modules/5.8.8-arch1-1/modules.devname not found - ignoring
                      ^ ^ ^ notice old kernel version

See this thread on BBS for more context.

How to reproduce the issue:

(don't try it unless you know what you're doing, it's annoying)

  • Have an old (dangling) kernel version in /var/lib/modules/ along with your current kernel
  • Run pacman -S systemd (I guess it'd also reproduce when running just sbupdate command, but note it does not reproduce with pacman -S linux)
  • Reboot

Additional info:

As I understand the problem, I think the following events happen:

  1. The pacman hook triggers because of the systemd package update
  2. As there's no kernel present in systemd package, sbupdate decides to sign all kernels present in /var/lib/modules/
  3. With some bad luck on the processing order, it gets the old kernel last which results in the old kernel being copied and signed to the output

Some remarks:

  • The issue doesn't reproduce when updating linux package because it catches the new kernel file from standard input, so there's no confusion possible. This trick doesn't work when the hook triggers from a different package such as systemd.
  • Does it really makes sense to process each kernel in a pipeline? As far as I understand, each signed kernel will be overwritten by the next one and eventually, only the last one will remain in the /boot partition. Am I missing something?
  • Not sure what the proper fix to this bug is. Maybe if multiple kernels are found it could pick only the most recent one (based on lstat) and display some warning. And the kernel directory pattern in /var/lib/modules/ could be made configurable somehow.

Thanks.

@Maryse47
Copy link
Contributor

Maryse47 commented Mar 13, 2021

Does it really makes sense to process each kernel in a pipeline? As far as I understand, each signed kernel will be overwritten by the next one and eventually, only the last one will remain in the /boot partition. Am I missing something?

There may be multiple kernels installed (not the old ones but from different packages). You want to process them all.

I wonder how did you managed to have files from old kernel in /var/lib/modules/? Did you copy it manually there instead of using pacman? I think this may the issue to focus on as I doubt sbupdate can guess which one is legitimate and which one is some legacy cruft that slipped through packaging system.

@gilbsgilbs
Copy link
Author

There may be multiple kernels installed (not the old ones but from different packages). You want to process them all.

Yes, thanks, this feature is what I missed.

I wonder how did you managed to have files from old kernel in /var/lib/modules/? Did you copy it manually there instead of using pacman?

Surely not by copying them manually. One possibility is that I used this pacman hook at some point which does this kind of stuff and introduced the issue. IIRC, I uninstalled the hook because I suspected it to be responsible of the issue but it persisted nonetheless (because I didn't think of cleaning up the old kernel modules). AFAIC, I don't think the way that hook works is particularly illegitimate, intrusive or "dirty" and it looks quite heavily used. There was even a topic to move it from AUR to community repo.

I think this may the issue to focus on as I doubt sbupdate can guess which one is legitimate and which one is some legacy cruft that slipped through packaging system.

I concur that it may not be that easy to guess which one is the legitimate kernel, but most importantly the current behavior of sbupdate when it happens is undetermined which surely is not expected: a crash with a clear error message like "you have multiple versions of the same kernel, please remove one" would be better than having potential silent failures that may break the boot. That's why I think it is sbupdate's responsibility to either inform the user that something looks wrong or let the user pick the kernel version they want to sign or pick one version arbitrarily (the most recent one sounds to me like a reasonable trade-off).

I've been struggling with this issue for a while and as the issue is inconsistent (depends on which package you upgrade) and unrelated to signature, it was quite hard to figure out sbupdate was actually involved in it. On top of that, something did go wrong in sbupdate software as it was overwriting the same kernel in one run which looks very much like a bug.

@Maryse47
Copy link
Contributor

AFAIC, I don't think the way that hook works is particularly illegitimate, intrusive or "dirty" and it looks quite heavily used. There was even a topic to move it from AUR to community repo.

I use exact same hook and never had issues like yours so this may be unrelated.

@gilbsgilbs
Copy link
Author

gilbsgilbs commented Mar 13, 2021

I use exact same hook and never had issues like yours so this may be unrelated.

After trying out, this hook does put the two consecutive kernel versions in /usr/lib/modules/ and sbupdate signs them both, overwriting the previous one (this is already dubious). Most of the time it is guaranteed to be harmless because alphabetical order matches versioning order, so the most recent version will be signed last. It starts to get touchy when reaching round versions like 5.9.9 => 5.9.10 or 5.9.14 => 5.10.0 because alphabetical order doesn't match versioning order is such case. When that happens, I reckon sbupdate will make you boot on the old kernel which sounds like a bug. In my case, the dangling kernel (initially created by the hook but never cleaned up as I uninstalled the hook) was very old and wasn't even compatible which is an unfortunate and probably very rare combination of circumstances. But I'd bet you're also affected by the same issue without knowing on round kernel version.

@Maryse47
Copy link
Contributor

Maryse47 commented Mar 13, 2021

But I'd bet you're also affected by the same issue without knowing on round kernel version.

I would rather know if I booted kernel that doesn't have corresponding modules in /usr/lib/modules/. You can't load module from different kernel version so incompatibility isn't an accident but a rule. I'm pretty sure it never happened.

@gilbsgilbs
Copy link
Author

gilbsgilbs commented Mar 13, 2021

Their cleanup service doesn't remove the modules of the currently loaded kernel (thanks to the %v specifier in the SystemD service). That's why you were still able to boot on the old kernel.

@Maryse47
Copy link
Contributor

I also build my own kernels of every single release for several years and I can assure you I know what kernel version I boot because I must test them.

@gilbsgilbs
Copy link
Author

gilbsgilbs commented Mar 13, 2021

I don't know about your specific workflows. As said in the first post, if sbupdate is triggered from the hook following the installation of a kernel package, it'll properly detect the latest kernel and you would fail to reproduce the issue. You need to run sbupdate manually or trigger it from another hook (e.g. upgrade systemd) after having installed the kernel.

From a purely programmatic standpoint, the issue is crystal clear, easy to reproduce and requires no assumption on the user's setup. When run outside of a kernel upgrade hook, sbupdate:

  • uselessly proceeds to sign multiple versions of the same kernel, overwriting the previous one after each pass
  • outputs only the latest kernel in alphabetical order of each kernel kind which doesn't make any sense and is dangerous

This is very easy to reproduce, very easy to understand by just scrolling through the code and I just cannot believe it is the intended or desired behavior. Regardless of the reasons that lead to having multiple versions of the same kernel in /usr/lib/modules.

@Maryse47
Copy link
Contributor

Maryse47 commented Mar 13, 2021

You need to run sbupdate manually or trigger it from another hook (e.g. upgrade systemd) after having installed the kernel.

Which i did hundreds of times for all that years I use both tools.

Their cleanup service doesn't remove the modules of the currently loaded kernel (thanks to the %v specifier in the SystemD service). That's why you were still able to boot on the old kernel.

The error you showed comes from initramfs. Initramfs is created by mkinitcpio (or alternative) which copies modules from latest installed kernel (unless you claim mkinitcpio is affected by the same issue and confuses kernel versions the same way sbupdate does). Therefore it's not possible that old kernel copied by sbupdate will work with initramfs created by mkinitcpio inside the bundled image. It will always fail as you posted in first comment.

Maybe if multiple kernels are found it could pick only the most recent one (based on lstat)

This won't work because after kernel hook restores old kernel backup it will be the latest one (based on lstat) instead of the one coming from package.

From a purely programmatic standpoint, the issue is crystal clear, easy to reproduce and requires no assumption on the user's setup.

It requires assumption about the correct action to do when multiple versions of the same kernel exists on system. Even if you think the issue is obvious the correct action isn't.

It's not always correct to choose the kernel with higher version (even if calculated numerically instead of alphabetically) because user may did a downgrade. It's not always correct to choose the latest timestamp because the older version may be created later as mentioned above in hook backup case. It's possible there is no correct action that will work all the time especially in irregular circumstances.

The question is if scenario with multiple instances of the same kernel can be supported. I'm skeptical if it can, at least not with solutions you provided.

@andreyv
Copy link
Owner

andreyv commented Mar 14, 2021

@gilbsgilbs Thanks for the detailed report.

sbupdate looks specifically for kernel images, like /usr/lib/modules/5.11.6-arch1-1/vmlinuz, which are installed by a corresponding kernel package (linux in this example). The name of the package, which is also used for the combined EFI image, can be found in /usr/lib/modules/5.11.6-arch1-1/pkgbase.

There can be multiple kernels installed, like linux and linux-lts. Each of them installs their own /usr/lib/modules subdirectory and own vmlinuz image. There should be only one vmlinuz per package/pkgbase.

It is expected behavior for sbupdate to process all installed kernels under the above assumptions.

In your case you had two vmlinuz images for one pkgbase. One of them was a stale file which should have been removed by Pacman on upgrade, but wasn't. Of course, you should fix this, so that all vmlinuz files are owned by Pacman.

On the sbupdate side, it makes sense to display an error if multiple vmlinuzes are found for one pkgbase, as it is not clear which one should be used. I'll see how to better implement this.

@andreyv andreyv added the bug label Mar 14, 2021
@gilbsgilbs
Copy link
Author

@andreyv you perfectly summed it up. Just having sbupdate raise an error in such case would look like an intended behavior and would have saved me a lot of hassle.

As progressing through the issue though, I noticed that there is at least one actual real-world use-case where one would have multiple versions of the same "pkgbase" in /usr/lib/modules: kernels-modules-hooks. For users of this hook, supdate behavior is already shaddy but will very rarely cause any apparent issue. If you decide to consistently raise an error, sbupdate may just stop working for them. You can decide this is fine or take an alternative approach such as:

  • only pick kernels that are owned by at least one pacman package and error if there are many for one pkgbase
  • pick all kernels just like today, but if there's a conflict pick the only kernel owned by a package and error only if there are many for one pkgbase
  • always take the most recent version for each pkgbase (according to lstat or semver)
  • let user choose the kernel they want

Let me know if you need some help. I can also submit a PR.

@Maryse47
Copy link
Contributor

Maryse47 commented Mar 14, 2021

On the sbupdate side, it makes sense to display an error if multiple vmlinuzes are found for one pkgbase, as it is not clear which one should be used. I'll see how to better implement this.

This will cause more harm than good as then every single kernel-module-hook user will get it broken at every update and even OP admitted:

I don't think the way that hook works is particularly illegitimate, intrusive or "dirty" and it looks quite heavily used

Currently it works without issues for 99% of time. I wholly disagree this is fine approach.

only pick kernels that are owned by at least one pacman package

This sounds promising - in case there is more than one kernel version per pkgbase always choose one coming from package. I wonder how hard would be to implement it.

error if there are many for one pkgbase

Do you mean if there are many kernels for the same pkgbase owned by actual packages? How this can happen? This wouldn't help for your case.

always take the most recent version for each pkgbase (according to lstat or semver)

As explained earlier this isn't correct.

let user choose the kernel they want

Do you mean after every kernel upgrade you will go to sbupdate.conf and hardcode current kernel version? This doesn't sound much user friendly and it may be easier to take a look at /usr/lib/modules and clean it from cruft.

@gilbsgilbs
Copy link
Author

gilbsgilbs commented Mar 14, 2021

@Maryse47

Do you mean if there are many kernels for the same pkgbase owned by actual packages? How this can happen? This wouldn't help for your case.

That's what I meant. You're right in saying it wouldn't have helped in my specific case (filtering kernels belonging to a package would have been sufficient). The only case I can think of where this would happen is if the user installed two kernels with the same "pkgbase" that weren't properly marked as conflicting, but I agree it sounds already dubious enough that it may be safe to just ignore that case. Though it always feels safer when a program is self-aware of its own limitations and is able to detect inconsistent states (rather than breaking the system by doing something wrong).

Do you mean after every kernel upgrade you will go to sbupdate.conf and hardcode current kernel version?

There may be other ways: e.g. define some rules in sbupdate.conf to automatically pick the proper version or show some interactive prompt (if that's even possible). Not saying I like that approach but that's technically feasible.


That said, I agreed with all your points. I tried to come up with multiple approaches to not restrain the discussion, but I think the two first are superior (leaving apart details on error handling). From the two, the first one might be the easiest to implement yet slightly less flexible and a bit more prone to breaking existing setups. Both solutions would work seamlessly with kernel-module-hook, would have prevented my original issue, shouldn't break any real-world setup and I think they'd improve sbupdate's behavior overall.

@andreyv
Copy link
Owner

andreyv commented Mar 15, 2021

This should be fixed in 4e6d106. You can test the use-boot-vmlinuz branch.

@gilbsgilbs
Copy link
Author

Thanks @andreyv . Just tested it and it worked well (both with and without the hook). Very smart and clean solution, I only see upsides from it.

@andreyv
Copy link
Owner

andreyv commented Mar 19, 2021

Alright, thanks.

@gilbsgilbs
Copy link
Author

gilbsgilbs commented Mar 19, 2021

After thinking of it a bit more, I suspect this solution implies an important security drawback related to #36. An attacker with write access to the /boot partition could swap the kernels at /boot/vmlinuz-* with an unsigned malicious one. If the next update triggers the hook but doesn't include a kernel or if the user runs sbupdate manually, sbupdate will recklessly sign the malicious kernel. It is even possible to increase the odds of a successful attack by swapping the kernels just after a release of the systemd package in arch repos.

@andreyv
Copy link
Owner

andreyv commented Mar 19, 2021

See the comment in the linked issue. /boot should be on a secure filesystem.

Previously the same conditions already applied to /boot/initramfs-*.img. mkinitcpio also accesses /boot/vmlinuz-* to determine installed kernel version.

Probably the related warning in README should be made more prominent.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

3 participants