systemd initrd loads graphics driver too late #202846

Open
ncfavier opened this issue Nov 25, 2022 · 18 comments


@ncfavier
Member

ncfavier commented Nov 25, 2022

The systemd-based initrd parallelises the early boot process, which is good except that it removes the ordering between loading kernel modules (systemd-modules-load.service) and udev (systemd-udevd.service). This causes issues for me because my graphics driver (amdgpu) takes about 3 seconds to load, at which point:

  1. the console font set by the 90-vconsole.rules udev rule is reset to the default, and only restored much later, when reload-systemd-vconsole-setup.service is pulled in by multi-user.target;
  2. as reported here, the console gets scrolled up a few lines while the LUKS password prompt stays floating in the middle of the screen. I don't know if this is a bug with amdgpu, Linux, systemd, or how to even begin troubleshooting it.

Setting

{ boot.initrd.systemd.services.systemd-udevd.after = [ "systemd-modules-load.service" ]; }

fixes both issues.

If this turns out to be the correct solution to this problem then it should be taken upstream, but I wanted to ask opinions here first.

There is also the alternative of considering issues 1 and 2 as bugs (with Linux? amdgpu?) and trying to fix them, which might take a long time.

cc @oxalica @ElvishJerricco @NickCao @lheckemann @dasJ

@NickCao
Member

NickCao commented Nov 25, 2022

Shouldn't udev be well prepared to handle these module load events? It is the dynamic device management daemon, after all. Maybe just tweaking the udev rule a bit to trigger 90-vconsole.rules again after the module load is the right fix. Still, I'm curious about the design behind all this: why we need both a service and a udev rule to get the console right.

@ncfavier
Member Author

maybe just tweaking the udev rule a bit to trigger 90-vconsole.rules again after the module load is the right fix

That would be great, but I can't figure it out: looking at udev debug logs, the only event emitted for vtcon0 is the initial add. I tried adding the same rule for the amdgpu device and for fb* devices, but in both cases systemd-vconsole-setup fails.

why we need both a service and a udev rule to get the console right

I think the udev rule is for initialising the vconsole as soon as it appears, while the service is for reconfiguring it when changes are made to vconsole.conf.

@NickCao
Member

NickCao commented Nov 25, 2022

Searching through the issues led me to this: systemd/systemd#2612

Well, I don't see how this could ever work: we simply don't know whether there will be another KMS driver showing up or not. Device probing is fully async, hence it might appear any time, and there's no point in time where we know everything has shown up. Hence we cannot delay vconsole accordingly.

However it also mentions:

in systemd 231-232 we reworked the vconsole font setup process, fixing various issues

It doesn't point out what the reworked process looks like, though.

@ncfavier
Member Author

I saw that, but it's rather old. I'm pretty sure the rework is systemd/systemd#3742, which is just the current state of things as we know it.

@NickCao
Member

NickCao commented Nov 25, 2022

If that's the rework, it does not seem to address the race conditions, or does it?

@ncfavier
Member Author

At least not the one we care about now.

@NickCao
Member

NickCao commented Nov 25, 2022

From the information I have, the vconsole setup issue is a won't-fix due to the nature of async device probing. Imagine a scenario where a particular motherboard powers up the GPU halfway through the boot process, after systemd-modules-load.service: no ordering of services would be able to handle this. The mysterious scrolling requires further investigation, though.

@NickCao
Member

NickCao commented Nov 25, 2022

If there has to be a fix, I think it's the kernel VT infrastructure's responsibility to either persist the vconsole states across the driver load, or inform udev of the change in order to trigger a reload.

@ElvishJerricco ElvishJerricco added this to To Do in systemd in Stage 1 via automation Nov 25, 2022
@9ary
Contributor

9ary commented Jan 18, 2023

For what it's worth, #210205 should improve the console font situation for HiDPI users by relying on in-kernel font selection instead of loading the font from userspace. However, it doesn't help those who want to customize the font or need a different one for i18n.

I've had a quick look at the fbcon driver, and nothing in there suggests that a resolution change would reset the font to the default. My current working theory is that when loading the gpu driver, a whole new instance of fbcon takes over the existing vt, so all existing state is lost.
I don't know how practical it would be for the kernel to persist that on its end, as opposed to notifying userspace of the change. I think it would be best to start by sending a message to the kernel mailing list and see what they have to say about this.

Also, this doesn't just affect systemd in initrd: the same problem exists when loading GPU drivers in stage 2, and I can also reproduce it on Arch.

@ncfavier
Member Author

sending a message to the kernel mailing list

Do you want to take care of it? You seem more competent than I am.

@9ary
Contributor

9ary commented Jan 18, 2023

I would, but I'm very overwhelmed with other things right now so I'm not really sure that I can give this proper attention.

@rikkaneko

I found that systemd-vconsole-setup is only invoked by the udev rule; the initrd does not include systemd-vconsole-setup.service. The solution would be to include this service in the initrd and add systemd-modules-load.service as a dependency.

@9ary
Contributor

9ary commented Jan 24, 2023

That's incorrect as far as I know. The udev rule exists for event-driven initialization and it is working as intended. This is a kernel problem and it would be much more useful to fix the root cause rather than come up with bandaid workarounds.

@9ary
Contributor

9ary commented Jan 24, 2023

I actually ended up posting to lkml about this because apparently I'm hyperfixating on this anyway.

https://lore.kernel.org/all/CANnEQ3Ef5-XRSVL=RCBuKKhR0oZF+SO2BSSiBigZOyjMeQ7f_g@mail.gmail.com/

@9ary
Contributor

9ary commented Mar 12, 2023

Unfortunately I never got a reply. :( Maybe I should try again.

@ncfavier
Member Author

@9ary
Contributor

9ary commented Mar 21, 2023

Interesting find, I wonder if that commit helps or makes things worse.

Edit: looks like no difference. That revert is present in 6.2.7, but I'm still seeing the same problem. So that change really just removed code that didn't work.

Here's the same patch in mainline: torvalds/linux@12d5796.

@9ary
Contributor

9ary commented Mar 21, 2023

It would be helpful to have a minimal VM repro of this (using virtio-gpu maybe?).

Also here's my understanding of what's happening and what needs to change: during boot, we initially see efifb and the first fbcon instance. Then when the GPU driver loads, a new instance of fbcon spawns on the new framebuffer driver, and takes over the existing vtcon. This discards the font, and because a new vtcon isn't being added, no event hits udev.

One way to work around this problem is to pass the quiet option to the kernel. In that case, the vtcon isn't initialized until fairly late in the boot process, often long enough after the GPU driver has loaded that the font doesn't get disrupted.
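
On NixOS, that workaround might look like the fragment below. This is only a minimal sketch: it assumes the standard boot.kernelParams option is how kernel command-line arguments are set on the affected system, and it sidesteps the font reset rather than fixing it.

```nix
{
  # Workaround sketch, not a fix: with "quiet", the vtcon is initialised
  # later in boot (as described above), usually after the GPU driver has
  # loaded, so the font set by the udev rule is less likely to be discarded.
  # Side effect: most informational kernel messages no longer appear on
  # the console during boot.
  boot.kernelParams = [ "quiet" ];
}
```

Since this relies on timing, it can still lose the race on machines where the GPU driver loads unusually late.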

So what needs to happen is the kernel should emit a "change" event whenever a new backend driver takes over a vtcon.
