Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redundant kernel args #199

Open
TommyTran732 opened this issue Feb 7, 2024 · 11 comments
Open

Redundant kernel args #199

TommyTran732 opened this issue Feb 7, 2024 · 11 comments

Comments

@TommyTran732
Copy link

TommyTran732 commented Feb 7, 2024

According to the kernel's documentation:

auto,nosmt:
Mitigate all CPU vulnerabilities, disabling SMT if needed. This is for users who always want to be fully mitigated, even if it means losing SMT. Equivalent to:

  • l1tf=flush,nosmt [X86]
  • mds=full,nosmt [X86]
  • tsx_async_abort=full,nosmt [X86]
  • mmio_stale_data=full,nosmt [X86]
  • retbleed=auto,nosmt [X86]

Why are these other args being explicitly set in /etc/default/grub.d if mitigations=auto,nosmt is already being set?

There is a limit on number of characters that can be in the kernel args as well, and the kernel args we set will just get longer and longer over time. I don't think it is a good idea to waste the precious characters on these redundant args. Either we use mitigations=auto,nosmt (which should be the default anyways), or explicitly spell out which args to set so the user can easily customize them. There is really no reason to have both.

@adrelanos
Copy link
Member

Saving space in kernel command line, reducing command line options, avoiding redundancy would be good.

We might comment these options out and document that inside the settings file instead out deleting these to avoid having this topic come up again.

Waiting a few days before I look into this and welcoming other comments or review.

@monsieuremre
Copy link
Contributor

auto,nosmt
Mitigate all CPU vulnerabilities, disabling SMT
if needed. This is for users who always want to
be fully mitigated, even if it means losing SMT.
Equivalent to:
l1tf=flush,nosmt [X86]
mds=full,nosmt [X86]
tsx_async_abort=full,nosmt [X86]
mmio_stale_data=full,nosmt [X86]
retbleed=auto,nosmt [X86]

Quoting from the documentation. So the man is right. Might as well be less redundant.

@raja-grewal
Copy link
Contributor

Yes this can be done if desired. Note this situation was discussed earlier in Issue #177 (comment).

The primary reason the redundant kernel parameters were kept in place was for potential future proofing. If the default mitigations=auto,nosmt settings were to change and become more lenient in the future, our hardcoded settings would remain fixed and be applied regardless.

However, the concern regarding limited number of characters for kernel arguments was not considered by me. So, if this is a genuine concern, we can easily remove the parameters. Perhaps instead of removing the lines entirely just comment them out so users can verify the correct mitigations are in place if required.

@TommyTran732
Copy link
Author

Should I add a note explicitly saying that these are redundant?

@adrelanos
Copy link
Member

Yes. That would be nice so anyone else looking into this knows it and no need to rehash this discussion. Always good that way.

@raja-grewal
Copy link
Contributor

As a rough example, one way to highlight a downside of not using fine-grained settings can be shown with the l1tf mitigation.

Upon closer inspection, one potential limitation regarding solely using mitigations=auto,nosmt exists if did not to also manually include l1d_flush=on.

If you look closely at the kernel parameters:

    mitigations=
                    [X86,PPC,S390,ARM64] Control optional mitigations for
                    CPU vulnerabilities.  This is a set of curated,
                    arch-independent options, each of which is an
                    aggregation of existing arch-specific options.

                    auto,nosmt
                            Mitigate all CPU vulnerabilities, disabling SMT
                            if needed.  This is for users who always want to
                            be fully mitigated, even if it means losing SMT.
                            Equivalent to: l1tf=flush,nosmt [X86]
                                           mds=full,nosmt [X86]
                                           tsx_async_abort=full,nosmt [X86]
                                           mmio_stale_data=full,nosmt [X86]
                                           retbleed=auto,nosmt [X86]

Now if you focus on the L1 Terminal Fault (L1TF) mitigation parameter:

    l1tf=           [X86] Control mitigation of the L1TF vulnerability on
                          affected CPUs

                    The kernel PTE inversion protection is unconditionally
                    enabled and cannot be disabled.

                    full
                            Provides all available mitigations for the
                            L1TF vulnerability. Disables SMT and
                            enables all mitigations in the
                            hypervisors, i.e. unconditional L1D flush.

                            SMT control and L1D flush control via the
                            sysfs interface is still possible after
                            boot.  Hypervisors will issue a warning
                            when the first VM is started in a
                            potentially insecure configuration,
                            i.e. SMT enabled or L1D flush disabled.

                    full,force
                            Same as 'full', but disables SMT and L1D
                            flush runtime control. Implies the
                            'nosmt=force' command line option.
                            (i.e. sysfs control of SMT is disabled.)

                    flush
                            Leaves SMT enabled and enables the default
                            hypervisor mitigation, i.e. conditional
                            L1D flush.

                            SMT control and L1D flush control via the
                            sysfs interface is still possible after
                            boot.  Hypervisors will issue a warning
                            when the first VM is started in a
                            potentially insecure configuration,
                            i.e. SMT enabled or L1D flush disabled.

                    flush,nosmt

                            Disables SMT and enables the default
                            hypervisor mitigation.

                            SMT control and L1D flush control via the
                            sysfs interface is still possible after
                            boot.  Hypervisors will issue a warning
                            when the first VM is started in a
                            potentially insecure configuration,
                            i.e. SMT enabled or L1D flush disabled.

                    flush,nowarn
                            Same as 'flush', but hypervisors will not
                            warn when a VM is started in a potentially
                            insecure configuration.

                    off
                            Disables hypervisor mitigations and doesn't
                            emit any warnings.
                            It also drops the swap size and available
                            RAM limit restriction on both hypervisor and
                            bare metal.

                    Default is 'flush'.

Notice how the more 'secure' option is using l1tf=full,force rather than l1tf=flush,nosmt. If were to simply use mitigations=auto,nosmt and not simultaneously manually apply l1d_flush=on it would be problematic. Why the latter is not also included in mitigations=auto,nosmt I am not sure.

Overall, unless we are in actual need of additional characters, I think for the time being we should leave the kernel arguments as they are without modification. Maybe consider closing PR #200 and potentially reopening it at future date if required.

@TommyTran732
Copy link
Author

TommyTran732 commented Mar 22, 2024

Well the current default is l1d_flush=on and I am not removing it anyways. I am not advocating for removing everything and only keeping mitigations=auto,nosmt, I am advocating for removing the ones that are known to be redundant.

@raja-grewal
Copy link
Contributor

Personally, I think outsourcing responsibility to mitigations=auto,nosmt is likely not the best option at this point in time.

You still have not addressed this:

If the default mitigations=auto,nosmt settings were to change and become more lenient in the future, our hardcoded settings would remain fixed and be applied regardless.

One could make a strong case that mitigating against CPU vulnerabilities is one of the most important features of a 'hardening' project.

Currently, we have the best of both worlds, mitigations=auto,nosmt enforces many mitigations and potential future mitigations that maintainers of this repository might miss. This is then combined with our hardcoded settings that ensure strict compliance to the highest known standards which in turn makes us resilient to any potential future weakening of mitigations=auto,nosmt.

Unless we are currently in need for more characters for kernel arguments, I do not see opening up this potential exposure is necessary?

@TommyTran732
Copy link
Author

TommyTran732 commented May 22, 2024

If the default mitigations=auto,nosmt settings were to change and become more lenient in the future, our hardcoded settings would remain fixed and be applied regardless.

Well on this point, we are supposed to check what the kernel does as time goes on anyways.

Unless we are currently in need for more characters for kernel arguments, I do not see opening up this potential exposure is necessary?

Another point is that it makes management easier. Having a gigantic list of kargs does make it more painful to deal with when you need to do debugging and stuff.

I did have a quick talk with Madaidan yesterday and he just said he personally isn't sure if the whole hard coding kargs matters that much.

@raja-grewal
Copy link
Contributor

Well on this point, we are supposed to check what the kernel does as time goes on anyways.

Personally, I am not a fan of putting additional burden on volunteer contributors when it can be avoided. I would much rather implement a through general solution that is robust moving forward in time, as opposed to a very acceptable solution that requires somewhat constant vigilance.

Another point is that it makes management easier. Having a gigantic list of kargs does make it more painful to deal with when you need to do debugging and stuff.

I overall very strongly agree with everything you stated in this point. In fact, I don’t think it is possible for anyone to actually disagree with this point.

Overall, in light of all of the above discussion, I will leave it to @adrelanos to decide what approach he considers superior.

@adrelanos
Copy link
Member

Useful would be a feature request against the kernel:
mitigations=strict should always enable the most secure kernel mitigation settings.

If such a feature were to exist, then no such work downstream in security-misc would be needed.


Lower maintenance effort is preferable.

Should I add a note explicitly saying that these are redundant?

This for sure is actionable. I wouldn't mind more comments saying "this options may be superfluous because...". Or 1 longer comment explaining this and then just a tag "## potentially-superfluous".

Then as a follow-up step the potentially superfluous kernel parameters could even be moved to a separate file. Thereby making it easier for those concerned or impacted by the length of the kernel command line to easily delete these.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants