Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for new crashkernel.conf file in kernel package #2894

Open
ryncsn opened this issue Jun 10, 2021 · 4 comments
Open

Add support for new crashkernel.conf file in kernel package #2894

ryncsn opened this issue Jun 10, 2021 · 4 comments

Comments

@ryncsn
Copy link

ryncsn commented Jun 10, 2021

Hi, currently a new crashkernel.conf file is being proposed to be added to Fedora/RHEL kernel package:
https://gitlab.com/cki-project/kernel-ark/-/merge_requests/1171

This is a replacement of crashkernel=auto which was an RHEL-only kernel feature, it was added after a long discussion in upstream:
https://lore.kernel.org/linux-mm/20210507010432.IN24PudKT%25akpm@linux-foundation.org/

The idea is simply, each kernel package will install a extra config file: /usr/lib/modules/<kver>/crashkernel.conf
It contains the recommended default crashkernel value of that kernel.

Other packages can use that file to config kernel, like kexec-tools:
https://lists.fedoraproject.org/archives/list/kexec@lists.fedoraproject.org/thread/MKUSELX3GKKXPFNLEGJMWXSS2LADRPMQ/

kexec-tools will introduce a new command kdumpctl reset-crashkernel and will install a kernel install hook.
The new command is a helper to read from that file and reset kernel's command param to the default value.
The kernel install hook will update new installed kernel's crashkernel argument to new crashkernel.conf's value, if current kernel is also using default value from its crashkernel.conf. If user is using a different value, it won't touch the value then.

The problem with rpm-ostree is that, it skips all kernel hooks. So I'm not sure how should we handle this for CoreOS.

The worst impact is crashkernel= simply used the default value, but not being updated automatically when kernel updated, this may result in OOM in kdump kernel if new kernel require more crashkernel value to work.

@cgwalters
Copy link
Member

Thanks for filing this!

The problem with rpm-ostree is that, it skips all kernel hooks.

Yeah, rpm-ostree wants to own the dracut invocation to make it part of the transaction. ostree owns the kernel/initramfs.

We can add new special casing for this somewhere in the stack, or...

So I'm not sure how should we handle this for CoreOS.

Note it's not just CoreOS that uses rpm-ostree, so does e.g. RHEL for Edge.

kexec-tools will introduce a new command kdumpctl reset-crashkernel and will install a kernel install hook.
The new command is a helper to read from that file and reset kernel's command param to the default value.
The kernel install hook will update new installed kernel's crashkernel argument to new crashkernel.conf's value, if current kernel is also using default value from its crashkernel.conf. If user is using a different value, it won't touch the value then.

I only partially follow this, but it sounds like it boils down to:

  1. Tool that computes new kernel argument
  2. Apply new kernel argument

Right? I think only 2) needs to differ for rpm-ostree, it needs to be rpm-ostree kargs instead of e.g. grubby or so.

Or is there more to it?

@cgwalters
Copy link
Member

Note also that as a general rule, anything you do in an e.g. %post that actually wants to affect user data is usually better done as a systemd unit instead. That seems to apply here, particularly because there already is a kdump.service where this update could be performed.

@jlebon
Copy link
Member

jlebon commented Jun 11, 2021

That seems to apply here, particularly because there already is a kdump.service where this update could be performed.

IIUC, I think the issue there is that it would be out-of-step with the update. So e.g. you would update to a new OSTree which ships a new crashkernel.conf, but the new arg doesn't actually make it to the kernel cmdline until the next boot after that. So for one boot, you may be allocating less memory than recommended for the given kernel.

Depending on how tight the tolerances are and how often the values are actually expected to change, that may not be too big an issue. If we really want, we could also have glue code in early initramfs which detects this and edits the bootloader config and immediately reboots, like Ignition kargs.

This is related to ostreedev/ostree#479.

Another approach (though it would require more kernel modifications) would be to split the karg into two: one for the memory sizes/ranges (e.g. crashkernelmem), and one for whether to turn it on or off (e.g. crashkernel=1). Then we could just bake in the crashkernelmem karg at compose time and kdump.service would just add crashkernel.

@ryncsn
Copy link
Author

ryncsn commented Jun 11, 2021

Thank you for looking into this!

Thanks for filing this!

The problem with rpm-ostree is that, it skips all kernel hooks.

Yeah, rpm-ostree wants to own the dracut invocation to make it part of the transaction. ostree owns the kernel/initramfs.

We can add new special casing for this somewhere in the stack, or...

So I'm not sure how should we handle this for CoreOS.

Note it's not just CoreOS that uses rpm-ostree, so does e.g. RHEL for Edge.

That's very helpful info, we have only tested this new idea on RHEL/Fedora, so far it worked pretty well.

kexec-tools will introduce a new command kdumpctl reset-crashkernel and will install a kernel install hook.
The new command is a helper to read from that file and reset kernel's command param to the default value.
The kernel install hook will update new installed kernel's crashkernel argument to new crashkernel.conf's value, if current kernel is also using default value from its crashkernel.conf. If user is using a different value, it won't touch the value then.

I only partially follow this, but it sounds like it boils down to:

  1. Tool that computes new kernel argument
  2. Apply new kernel argument

Right? I think only 2) needs to differ for rpm-ostree, it needs to be rpm-ostree kargs instead of e.g. grubby or so.

Or is there more to it?

Yes, for 2), kexec-tools, it doesn't support ostree yet but should be simple to just call rpm-ostree kargs instead of ostree detected.
The new kernel hook (/usr/lib/kernel/install.d/92-crashkernel.install) also belongs to kexec-tools (may change that to other package if necessary). So kexec-tools package will also want to update kernel cmdline as kernel gets installed.

Using a service, as @jlebon said, will have to make user reboot at least twice after kernel update to get the crashkernel value right. And besides, it actually makes update the crashkernel value more difficult, since if it doesn't happen at kernel installation, kexec-tools couldn't know what crashkernel value the previous kernel is using. It only update crashkernel value if previous kernel is also using default value.

This crashkernel.conf update probably will only happen once or twice for a major release, so it's not expected to happen very often.
So I agree it's might be not a big problem, but still that's not guaranteed so will be better if we can implement this well. I think some more glue code should be good, but not sure if that will make thing ugly from your side?

Breaking the crashkernel cmdlint into two will look really wired from the kernel side, I think upstream is not going to like it, but we can append another crashkernel=0 to override it, or release the memory after boot finished (just echo 0 > /sys/kernel/kexec_crash_size). Release after boot finished might make the boot time memory pressure slightly higher, but usually, that's not a problem I think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants