Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

4.2-RC3: OpenBSD 7.3 ISO doesn't boot anymore #8502

Closed
rapenne-s opened this issue Sep 8, 2023 · 30 comments
Closed

4.2-RC3: OpenBSD 7.3 ISO doesn't boot anymore #8502

rapenne-s opened this issue Sep 8, 2023 · 30 comments
Labels
affects-4.2 This issue affects Qubes OS 4.2. C: kernel C: Xen diagnosed Technical diagnosis has been performed (see issue comments). P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. r4.2-host-stable regression A bug in which a supported feature that worked in a prior Qubes OS release has stopped working. T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists.
Milestone

Comments

@rapenne-s
Copy link

How to file a helpful issue

Qubes OS release

4.2-RC3

Brief summary

The amd64boot iso for OpenBSD doesn't boot anymore since I upgraded to 4.2, the same happened for my OpenBSD HVMs that were working fine just before the upgrade.

I'd say it's not OpenBSD fault here because it was working just before the upgrade, and it's still booting fine on any other virtualization software.

7.3 is the last release to date.

Steps to reproduce

OpenBSD_failure

Expected behavior

it boots

Actual behavior

boot is failing

@rapenne-s rapenne-s added P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists. labels Sep 8, 2023
@marmarek
Copy link
Member

marmarek commented Sep 8, 2023

Any non-default settings for that qube? (memory size? some qvm-features?)

@andyhhp
Copy link

andyhhp commented Sep 8, 2023

Sorry, not much to go on here. What's trap 4, because it surely isn't #OF? Any way to figure out what the faulting instruction is?

@rapenne-s
Copy link
Author

No, I also tried with a new qube created from the GUI in the qube manager with the same result. I think it's related to the CPU (Ryzen 5 5600X).

On another 4.2-RC3 with an i5-7300U this is working fine. 🤔

@andyhhp
Copy link

andyhhp commented Sep 8, 2023

What Zen uarch is the Ryzen? First thought is maybe the PKS feature

@rapenne-s
Copy link
Author

My OpenBSD qube (that is already installed) has a more explicit error

OpenBSD_fail_installed

@rapenne-s
Copy link
Author

What Zen uarch is the Ryzen? First thought is maybe the PKS feature

seems to be Zen 3
https://www.amd.com/en/products/cpu/amd-ryzen-5-5600x

@andyhhp
Copy link

andyhhp commented Sep 8, 2023

Huh. So it was a #GP on RDMSR, which is definitely odd

@rapenne-s
Copy link
Author

Could you explain a bit what #GP mean? I'm not familiar with those terms.

@andyhhp
Copy link

andyhhp commented Sep 8, 2023

#GP is general protection fault. Error code of 0 is "misc" and a dumping ground for everything.

@rapenne-s
Copy link
Author

In parallel, I'll report this on OpenBSD mailing list. Could you confirm the HVM are using qemu? (when trying to stop a stuck HVM, the qubes applet provide a way to show qemu logs.. so I guess it's using qemu)

@marmarek
Copy link
Member

marmarek commented Sep 8, 2023

yup, qemu is running as device model (but based on the crash, it's rather irrelevant here)

@andyhhp
Copy link

andyhhp commented Sep 8, 2023

This will most likely be a Xen issue, but I have to admit I'm a bit confused. %cr4.tsd can cause RDMSR to fault, but that should only effect user mode, not supervisor. Xen can intercept the instruction, but we've not had an emulation error reported here that I'm aware of. Xen does in some cases pass #GP back as a last resort error, but I don't see why that would be the case here.

I wonder if there's a weird TSC mode set for the VM, and rdtsc is generally intercepted. When the VM is crashed, can you run xl debug-keys q then gather xl dmesg from dom0 please

@rapenne-s
Copy link
Author

dump.txt

@marmarek
Copy link
Member

marmarek commented Sep 8, 2023

Can you do the same with xl debug-keys s too ?

@andyhhp
Copy link

andyhhp commented Sep 8, 2023

Sorry, I did mean s. (I'm debugging from a phone in breaks while driving, not in front of my usual dev setup)

@andyhhp
Copy link

andyhhp commented Sep 8, 2023

Right - so I'm an idiot so far. That's an RDMSR failing, not RDTSC.

Which MSR is being used in tsc_identify()?

@andyhhp
Copy link

andyhhp commented Sep 8, 2023

So I expect it is

        def = rdmsr(MSR_PSTATEDEF(0));

blowing up in tsc_freq_msr().
Something that has changed recently is Xen is advertising MSR_HWCR.TSCSFREQEL and we don't advertise pstates at all to guests under any circumstance

@DemiMarie
Copy link

@andyhhp does that mean that OpenBSD is peeking at MSRs it has no business looking at, or that Xen is advertising a feature and then injecting #GP when the guest tries to use it?

@andyhhp
Copy link

andyhhp commented Sep 8, 2023

Not sure. I don't have the manuals to hand right now, but I do recall being dubious about the HWCR patch in the first place

@DemiMarie
Copy link

HWCR?

@andyhhp
Copy link

andyhhp commented Sep 8, 2023

I'm fairly sure it was broken in Xen 4.15 by https://lore.kernel.org/xen-devel/0c8043e3-07aa-6242-19bd-07b04f574b87@suse.com/, a series committed over my objections concerning the correctness of the changes.

It appears it was to shut up Linux, which makes different and equally dubious model specific assumptions about the availability of certain MSRs.

  • It is buggy for Linux to declare TSCFREQSEL missing to be a firmware bug - it may legitimately be so due to levelling.
  • It's buggy for Xen to advertise the bit like that - because it's not levelled and not part of the migration stream.
  • It's buggy for OpenBSD to perform any model-specific checks without first checking for !hypervisor.
  • And it's probably buggy for Xen to state "TSC counts at the P0 frequency" without giving the P0 frequency, but the jury is still out on this final point because there's no possible way the guest is going to get to see Pstate information.

@DemiMarie
Copy link

@rapenne-s: I think the reason this was missed for so long is that OpenBSD is not used on Xen very much.

@andrewdavidwong andrewdavidwong added C: kernel C: Xen diagnosed Technical diagnosis has been performed (see issue comments). waiting for upstream This issue is waiting for something from an upstream project to arrive in Qubes. Remove when closed. affects-4.2 This issue affects Qubes OS 4.2. regression A bug in which a supported feature that worked in a prior Qubes OS release has stopped working. labels Sep 9, 2023
@rapenne-s
Copy link
Author

@rapenne-s: I think the reason this was missed for so long is that OpenBSD is not used on Xen very much.

does it mean it's an OpenBSD issue?

@DemiMarie
Copy link

@rapenne-s: I think the reason this was missed for so long is that OpenBSD is not used on Xen very much.

does it mean it's an OpenBSD issue?

From @andyhhp (Xen core developer and x86 expert) it appears that this is a combination of a Xen bug and an OpenBSD bug. Fixing either issue should be enough for OpenBSD to boot, but obviously fixing both would be ideal.

@mlarkin2015
Copy link

I'm fairly sure it was broken in Xen 4.15 by https://lore.kernel.org/xen-devel/0c8043e3-07aa-6242-19bd-07b04f574b87@suse.com/, a series committed over my objections concerning the correctness of the changes.

It appears it was to shut up Linux, which makes different and equally dubious model specific assumptions about the availability of certain MSRs.

  • It is buggy for Linux to declare TSCFREQSEL missing to be a firmware bug - it may legitimately be so due to levelling.
  • It's buggy for Xen to advertise the bit like that - because it's not levelled and not part of the migration stream.
  • It's buggy for OpenBSD to perform any model-specific checks without first checking for !hypervisor.
  • And it's probably buggy for Xen to state "TSC counts at the P0 frequency" without giving the P0 frequency, but the jury is still out on this final point because there's no possible way the guest is going to get to see Pstate information.

I don't understand this part:

  • It's buggy for OpenBSD to perform any model-specific checks without first checking for !hypervisor.

From what I can tell, none of the architecure PRMs/SDMs say that "an operating system should not perform model specific checks without checking !hv". There are certain MSRs that are gated behind specific CPUID bits; an OS should certainly be checking those bits before blindly assuming such MSRs exist (I just made a fix for this for HW power management on AMD last week in OpenBSD). But making the broad claim that "if CPUID_HV is set then all model specific checks should not be performed" is a bit odd.

If that bit is set, what should an OS assume? That it's running on an 80386?

Maybe I'm misunderstanding something in that claim; if the statement was intended to say:

"It's buggy for OpenBSD to read an MSR if the CPUID bit describing presence of that MSR is not set" then I'm 100% in agreement. But saying "if HV is set, then no model specific checks should be performed" is a bit odd.

Can you clarify?

@DemiMarie
Copy link

I’m not @andyhhp, but my understanding is that “model-specific checks” refers to physical CPU models, not to MSRs. A hypervisor is allowed to pick and choose which features it exposes to guests, so a guest cannot assume that a hypervisor is emulating a specific CPU that actually existed. In particular, a guest cannot assume that support for feature X implies support for feature Y unless the architecture specification says so, even if every processor shipped with feature X has also had feature Y.

In this case, Xen claims to support TSCFREQSEL. OpenBSD seems to assume that if TSCFREQSEL is present, then so are P-states. However, Xen does not expose P-states, so OpenBSD’s attempt to access the P-state MSRs faults. This is an OpenBSD bug: OpenBSD must not assume that support for TSCFREQSEL implies P-state support, even if every physical processor that ever shipped with TSCFREQSEL did in fact have P-state support.

As it happens, Xen should not actually be advertising TSCFREQSEL if live migration is enabled. This doesn’t matter for Qubes OS, though, because Qubes OS does not support live migration and so the limitations of live migration are irrelevant.

@andyhhp
Copy link

andyhhp commented Sep 12, 2023

I don't understand this part:

  • It's buggy for OpenBSD to perform any model-specific checks without first checking for !hypervisor.

From what I can tell, none of the architecure PRMs/SDMs say that "an operating system should not perform model specific checks without checking !hv". There are certain MSRs that are gated behind specific CPUID bits; an OS should certainly be checking those bits before blindly assuming such MSRs exist (I just made a fix for this for HW power management on AMD last week in OpenBSD). But making the broad claim that "if CPUID_HV is set then all model specific checks should not be performed" is a bit odd.

If that bit is set, what should an OS assume? That it's running on an 80386?

Maybe I'm misunderstanding something in that claim; if the statement was intended to say:

"It's buggy for OpenBSD to read an MSR if the CPUID bit describing presence of that MSR is not set" then I'm 100% in agreement. But saying "if HV is set, then no model specific checks should be performed" is a bit odd.

Can you clarify?

Some MSRs are architectural; you can rely on CPUID bits declaring their presence. In the case of MSR_HWCR, you can rely on the fact that you're on AMD and Long Mode is available. Architectural MSRs are described in the APM, Vol2 System Programming.

Some MSRs, including MSR_HWCR sadly, have a mix of architectural and model specific bits. The TSC_FREQ_SEL is excluded from the list of architectural bits in APM Vol2 3.2.10 Hardware Configuration Register (HWCR), so it is model specific. i.e. the meaning of that bit is not (or might not be) the same across different AMD CPUs.

The C/Pstate control MSRs are entirely model specific.

And this is the problem. When virtualised, you get given a Family/Model/Stepping at boot, but as you migrate around, you will end up on different hardware, and the meaning of those bits do change. Also, as you migrate around, the P0 frequency will change, which is why VMs specifically don't get given that info in the first place (in the general case - if you're willing to never migrate then it's safe to pass through more hardware details).

Under virt, if you're using the Family/Model/Stepping for anything more than diagnostic information, you're in for an unwelcome surprise when migrating.

@mlarkin2015
Copy link

Some MSRs are architectural; you can rely on CPUID bits declaring their presence. In the case of MSR_HWCR, you can rely on the fact that you're on AMD and Long Mode is available. Architectural MSRs are described in the APM, Vol2 System Programming.

Some MSRs, including MSR_HWCR sadly, have a mix of architectural and model specific bits. The TSC_FREQ_SEL is excluded from the list of architectural bits in APM Vol2 3.2.10 Hardware Configuration Register (HWCR), so it is model specific. i.e. the meaning of that bit is not (or might not be) the same across different AMD CPUs.

The C/Pstate control MSRs are entirely model specific.

And this is the problem. When virtualised, you get given a Family/Model/Stepping at boot, but as you migrate around, you will end up on different hardware, and the meaning of those bits do change. Also, as you migrate around, the P0 frequency will change, which is why VMs specifically don't get given that info in the first place (in the general case - if you're willing to never migrate then it's safe to pass through more hardware details).

Under virt, if you're using the Family/Model/Stepping for anything more than diagnostic information, you're in for an unwelcome surprise when migrating.

Thanks Demi and Andrew for the clarifications. When viewed from the perspective of migration, then this makes a lot more sense, and I agree with the explanations given. I'll take a look where OpenBSD is making these assumptions and see if I can clean those up.

olafhering pushed a commit to olafhering/xen that referenced this issue Sep 18, 2023
OpenBSD 7.3 will unconditionally access HWCR if the TSC is reported as
Invariant, and it will then attempt to also unconditionally access PSTATE0 if
HWCR.TscFreqSel is set (currently the case on Xen).

The motivation for exposing HWCR.TscFreqSel was to avoid warning messages from
Linux.  It has been agreed that Linux should be changed instead to not
complaint about missing HWCR.TscFreqSel when running virtualized.

The relation between HWCR.TscFreqSel and PSTATE0 is not clearly written down in
the PPR, but it's natural for OSes to attempt to fetch the P0 frequency if the
TSC is stated to increment at the P0 frequency.

Exposing PSTATEn (PSTATE0 at least) with all zeroes is not a suitable solution
because the PstateEn bit is read-write, and OSes could legitimately attempt to
set PstateEn=1 which Xen couldn't handle.

Furthermore, the TscFreqSel bit is model specific and was never safe to expose
like this in the first place.  At a minimum it should have had a toolstack
adjustment to know not to migrate such a VM.

Therefore, simply remove the bit.  Note the HWCR itself is an architectural
register, and does need to be accessible by the guest.  Since HWCR contains
both architectural and non-architectural bits, going forward care must be taken
to assert the exposed value is correct on newer CPU families.

Reported-by: Solène Rapenne <solene@openbsd.org>
Link: QubesOS/qubes-issues#8502
Fixes: 14b95b3 ('x86/AMD: expose HWCR.TscFreqSel to guests')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
@andyhhp
Copy link

andyhhp commented Sep 18, 2023

@rapenne-s
Copy link
Author

I updated my dom0 and OpenBSD starts again. Thanks everyone who got involved to solve this!

@andrewdavidwong andrewdavidwong removed the waiting for upstream This issue is waiting for something from an upstream project to arrive in Qubes. Remove when closed. label Sep 21, 2023
@andrewdavidwong andrewdavidwong added this to the Release 4.2 milestone Sep 21, 2023
olafhering pushed a commit to olafhering/xen that referenced this issue Nov 14, 2023
OpenBSD 7.3 will unconditionally access HWCR if the TSC is reported as
Invariant, and it will then attempt to also unconditionally access PSTATE0 if
HWCR.TscFreqSel is set (currently the case on Xen).

The motivation for exposing HWCR.TscFreqSel was to avoid warning messages from
Linux.  It has been agreed that Linux should be changed instead to not
complaint about missing HWCR.TscFreqSel when running virtualized.

The relation between HWCR.TscFreqSel and PSTATE0 is not clearly written down in
the PPR, but it's natural for OSes to attempt to fetch the P0 frequency if the
TSC is stated to increment at the P0 frequency.

Exposing PSTATEn (PSTATE0 at least) with all zeroes is not a suitable solution
because the PstateEn bit is read-write, and OSes could legitimately attempt to
set PstateEn=1 which Xen couldn't handle.

Furthermore, the TscFreqSel bit is model specific and was never safe to expose
like this in the first place.  At a minimum it should have had a toolstack
adjustment to know not to migrate such a VM.

Therefore, simply remove the bit.  Note the HWCR itself is an architectural
register, and does need to be accessible by the guest.  Since HWCR contains
both architectural and non-architectural bits, going forward care must be taken
to assert the exposed value is correct on newer CPU families.

Reported-by: Solène Rapenne <solene@openbsd.org>
Link: QubesOS/qubes-issues#8502
Fixes: 14b95b3 ('x86/AMD: expose HWCR.TscFreqSel to guests')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: e4ca4e2
master date: 2023-09-18 15:07:49 +0200
krystian-hebel pushed a commit to TrenchBoot/xen that referenced this issue Dec 8, 2023
OpenBSD 7.3 will unconditionally access HWCR if the TSC is reported as
Invariant, and it will then attempt to also unconditionally access PSTATE0 if
HWCR.TscFreqSel is set (currently the case on Xen).

The motivation for exposing HWCR.TscFreqSel was to avoid warning messages from
Linux.  It has been agreed that Linux should be changed instead to not
complaint about missing HWCR.TscFreqSel when running virtualized.

The relation between HWCR.TscFreqSel and PSTATE0 is not clearly written down in
the PPR, but it's natural for OSes to attempt to fetch the P0 frequency if the
TSC is stated to increment at the P0 frequency.

Exposing PSTATEn (PSTATE0 at least) with all zeroes is not a suitable solution
because the PstateEn bit is read-write, and OSes could legitimately attempt to
set PstateEn=1 which Xen couldn't handle.

Furthermore, the TscFreqSel bit is model specific and was never safe to expose
like this in the first place.  At a minimum it should have had a toolstack
adjustment to know not to migrate such a VM.

Therefore, simply remove the bit.  Note the HWCR itself is an architectural
register, and does need to be accessible by the guest.  Since HWCR contains
both architectural and non-architectural bits, going forward care must be taken
to assert the exposed value is correct on newer CPU families.

Reported-by: Solène Rapenne <solene@openbsd.org>
Link: QubesOS/qubes-issues#8502
Fixes: 14b95b3 ('x86/AMD: expose HWCR.TscFreqSel to guests')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects-4.2 This issue affects Qubes OS 4.2. C: kernel C: Xen diagnosed Technical diagnosis has been performed (see issue comments). P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. r4.2-host-stable regression A bug in which a supported feature that worked in a prior Qubes OS release has stopped working. T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists.
Projects
None yet
Development

No branches or pull requests

7 participants