OpenBSD SMP kernel crashes bhyve (perhaps only on AMD-V hardware?) #248

Closed
despair86 opened this issue Jan 6, 2020 · 6 comments

@despair86

despair86 commented Jan 6, 2020

After a normal installation of OpenBSD/amd64, any attempt to boot the SMP kernel crashes bhyve, which fails to emulate a conditional branch followed by PAUSE while the guest is spinning up the second CPU:

OS:

[root@sun-srv2 /zones/ed32fa7a-4845-4d37-9b58-dc8f99b4310a/cores]# uname -a
SunOS sun-srv2 5.11 joyent_20191121T115853Z i86pc i386 i86pc Solaris

Hardware:

[root@sun-srv2 ~]# psrinfo -vp
The physical processor has 16 virtual processors (0-15)
  x86 (AuthenticAMD 600F20 family 21 model 2 step 0 clock 2800 MHz)
        AMD Opteron(tm) Processor 6386 SE       [ Socket: G34 ]
The physical processor has 16 virtual processors (16-31)
  x86 (AuthenticAMD 600F20 family 21 model 2 step 0 clock 2800 MHz)
        AMD Opteron(tm) Processor 6386 SE       [ Socket: G34 ]
The physical processor has 16 virtual processors (32-47)
  x86 (AuthenticAMD 600F20 family 21 model 2 step 0 clock 2800 MHz)
        AMD Opteron(tm) Processor 6386 SE       [ Socket: G34 ]
The physical processor has 16 virtual processors (48-63)
  x86 (AuthenticAMD 600F20 family 21 model 2 step 0 clock 2800 MHz)
        AMD Opteron(tm) Processor 6386 SE       [ Socket: G34 ]

Guest console:

probing: pc0 com0 com1 mem[640K 3049M 16M 4M 64K 1024M]
disk: hd0 hd1
>> OpenBSD/amd64 BOOTX64 3.48
switching console to com0
>> OpenBSD/amd64 BOOTX64 3.48
boot>
booting hd0a:/bsd: 12830024+2741264+340000+0+708608 [799870+128+1016856+743598]=0x124d5c8
entry point at 0x1001000
[ using 2561480 bytes of bsd ELF symbol table ]
Copyright (c) 1982, 1986, 1989, 1991, 1993
        The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2020 OpenBSD. All rights reserved.  https://www.OpenBSD.org

OpenBSD 6.6-current (GENERIC.MP) #584: Sat Jan  4 14:08:54 MST 2020
    deraadt@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 4277514240 (4079MB)
avail mem = 4135415808 (3943MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.0 @ 0xbfb58000 (14 entries)
bios0: vendor BHYVE version "1.00" date 03/14/2014
bios0: Joyent SmartDC HVM
acpi0 at bios0: ACPI 4.0
acpi0: sleep states S5
acpi0: tables DSDT FACP HPET APIC MCFG SPCR
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpihpet0 at acpi0: 16777216 Hz
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Opteron(tm) Processor 6386 SE, 2800.79 MHz, 15-02-00
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,HV,NXE,MMXX,PAGE1GB,LONG,LAHF,CMPLEG,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,XOP,SKINIT,FMA4,TBM,TOPEXT,BMI1
cpu0: 64KB 64b/line 2-way I-cache, 16KB 64b/line 4-way D-cache, 2MB 64b/line 16-way L2 cache, 12MB 64b/line 128-way L3 cache
cpu0: ITLB 48 4KB entries fully associative, 24 4MB entries fully associative
cpu0: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu0: smt 0, core 0, package 6
mtrr: CPU supports MTRRs but not enabled by BIOS
cpu0: apic clock running at 134MHz
cpu1 at mainbus0: apid 1 (application processor)

[NOTICE: Zone halted]
[root@sun-srv2 /zones/ed32fa7a-4845-4d37-9b58-dc8f99b4310a/logs]# tail platform.log
{"log":"wrmsr to register 0xc0011029(0x3) on vcpu 0\n","stream":"stderr","time":"2020-01-05T15:25:48.901606000Z"}
{"log":"fbuf frame buffer base: fffffc7feb600000 [sz 16777216]\r\n","stream":"stdout","time":"2020-01-06T01:42:27.439048000Z"}
{"log":"Configuration from /var/run/bhyve/zhyve.cmd:\nnvlist version: 0\n\tbhyve_args = bhyve","stream":"stdout","time":"2020-01-06T01:42:28.186421000Z"}
{"log":" -H -U ed32fa7a-4845-4d37-9b58-dc8f99b4310a -B 1,manufacturer=Joyent,product=SmartDC HVM,version=7.20191121T115853Z,serial=ed32fa7a-4845-4d37-9b58-dc8f99b4310a,sku=001,family=Virtual Machine -s 31,lpc -l bootrom,/usr/share/bhyve/uefi-rom.bin -l com1,/dev/zconsole -l com2,socket,/tmp/vm.ttyb -s 0,hostbridge,model=i440fx -c 4 -m 4096 -s 0:4:0,virtio-blk,/dev/zvol/rdsk/zones/ed32fa7a-4845-4d37-9b58-dc8f99b4310a/disk0 ","stream":"stdout","time":"2020-01-06T01:42:28.186582000Z"}
{"log":"-s 0:4:1,virtio-blk,/dev/zvol/rdsk/zones/ed32fa7a-4845-4d37-9b58-dc8f99b4310a/disk1 -s 6:0,virtio-net-viona,net0 -w -c sockets=4,cores=1,threads=1 -s 30:0,fbuf,vga=off,unix=/tmp/vm.vnc -s 30:1,xhci,tablet SYSbhyve-103\n","stream":"stdout","time":"2020-01-06T01:42:28.186675000Z"}
{"log":"wrmsr to register 0xc0011029(0x3) on vcpu 0\n","stream":"stderr","time":"2020-01-06T01:42:32.258593000Z"}
{"log":"Failed to emulate instruction [0xf7 0x04 0x25 0x00 0x53 0xf0 ","stream":"stderr","time":"2020-01-06T01:42:33.267534000Z"}
{"log":"fbuf frame buffer base: fffffc7feb600000 [sz 16777216]\r\n","stream":"stdout","time":"2020-01-06T01:42:33.267618000Z"}
{"log":"0x81 0x00 0x10 0x00 0x00 0x74 0x08 0xf3 0x90] at 0xffffffff81735790\n","stream":"stderr","time":"2020-01-06T01:42:33.267730000Z"}
{"event":"close","stream":"logfile","time":"2020-01-06T01:43:38.132591000Z"}

The instruction stream is F7 04 25 00 53 F0 81 00 10 00 00 74 08 F3 90: a comparison (a TEST of a memory operand against an immediate), then a conditional branch, followed by PAUSE
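
For anyone who wants to check that reading of the bytes, here is a small hand decoder for this one byte pattern. It is a sketch that assumes 64-bit mode and only accepts the exact encodings seen here (F7 /0 with a disp32-only SIB, JE rel8, PAUSE); it is not a general disassembler and has nothing to do with bhyve's own decoder.

```python
import struct

# The 15 bytes reported by bhyve in the log.
INSN = bytes.fromhex("f704250053f081001000007408f390")


def decode(insn):
    """Decode TEST r/m32, imm32 + JE rel8 + PAUSE from this specific byte pattern."""
    opcode, modrm, sib = insn[0], insn[1], insn[2]
    if opcode != 0xF7 or (modrm >> 3) & 7 != 0:
        raise ValueError("not F7 /0 (TEST r/m32, imm32)")
    if modrm >> 6 != 0 or modrm & 7 != 4:
        raise ValueError("expected mod=00, rm=100 (SIB byte follows)")
    if sib != 0x25:
        raise ValueError("expected SIB 0x25 (no index, no base, disp32 only)")
    # Four bytes of signed displacement, then four bytes of immediate.
    disp, imm = struct.unpack_from("<iI", insn, 3)
    addr = disp & 0xFFFFFFFFFFFFFFFF  # disp32 is sign-extended to 64 bits
    if insn[11] != 0x74:
        raise ValueError("expected JE rel8")
    rel8 = struct.unpack_from("<b", insn, 12)[0]
    if insn[13:15] != b"\xf3\x90":
        raise ValueError("expected PAUSE")
    return [
        f"testl $0x{imm:x}, 0x{addr:x}",
        f"je +0x{rel8:x}",  # relative to the end of the JE instruction
        "pause",
    ]


if __name__ == "__main__":
    for line in decode(INSN):
        print(line)
```

The memory operand's low 12 bits are 0x300 and the mask is bit 12, which happen to match the local APIC ICR-low register offset and its delivery-status bit. That is only an observation from the numbers; nothing in the log confirms what the guest had mapped at that address.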

Booting the single-processor kernel works as expected.

@despair86
Author

despair86 commented Jan 6, 2020

Full zone log and a core file.
platform.log
core.bhyve.15002.gz

https://twitter.com/__rvx86/status/1213829887770976256?s=20
a thread with some more info

@jasonbking

This looks like illumos#12998 which is fixed in 20200813T030805Z or newer. I'd try a platform image with that fix and see if that solves the issue.
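
A quick way to tell whether a host is already on a new enough platform image is to compare the `joyent_<timestamp>` stamp from `uname -a` against the one quoted above. A rough sketch, assuming the stamp always has the `YYYYMMDDTHHMMSSZ` shape shown in this issue (the function names are made up for illustration):

```python
import re
from datetime import datetime

FIXED_IN = "20200813T030805Z"  # platform stamp quoted in the comment above
STAMP_FORMAT = "%Y%m%dT%H%M%SZ"


def platform_stamp(uname_output):
    """Pull the joyent_<timestamp> stamp out of `uname -a` output, or return None."""
    match = re.search(r"joyent_(\d{8}T\d{6}Z)", uname_output)
    return match.group(1) if match else None


def has_fix(uname_output, fixed_in=FIXED_IN):
    """True if the running platform image is at least as new as `fixed_in`."""
    stamp = platform_stamp(uname_output)
    if stamp is None:
        raise ValueError("no joyent_<timestamp> platform stamp found")
    return datetime.strptime(stamp, STAMP_FORMAT) >= datetime.strptime(
        fixed_in, STAMP_FORMAT
    )


if __name__ == "__main__":
    # The uname line from the original report.
    reported = "SunOS sun-srv2 5.11 joyent_20191121T115853Z i86pc i386 i86pc Solaris"
    print(platform_stamp(reported), has_fix(reported))
```

For the host in the original report this comes out as older than the fixed image, so it would need a platform update before retesting.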

@pfmooney

pfmooney commented Sep 4, 2020

> This looks like illumos#12998 which is fixed in 20200813T030805Z or newer. I'd try a platform image with that fix and see if that solves the issue.

There are two failures at play here, AFAICT. The first is the wrmsr failure, which indeed should be helped by illumos#12998. The second is the failure to decode the testl instruction, which is probably used by the OpenBSD debugger while waiting on some MMIO value to change. That would have been addressed in the 2019 Sept sync (merged in March 2020), which added the ability for the bhyve instruction emulation to handle test variants.
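
To make "handle test variants" concrete: when a guest TEST instruction touches emulated MMIO, the emulator has to read the operand itself and produce the same flags the CPU would. The sketch below shows only the architectural semantics of `TEST r/m32, imm32` (AND the operands, discard the result, set ZF/SF/PF, clear CF/OF). It is not bhyve's code, and `emulate_test32` and `spin_step` are hypothetical names.

```python
def emulate_test32(mem_value, imm):
    """Return the flags TEST r/m32, imm32 produces; the AND result is discarded."""
    result = (mem_value & imm) & 0xFFFFFFFF
    return {
        "ZF": result == 0,
        "SF": bool(result >> 31),
        # PF reflects even parity of the low byte of the result.
        "PF": bin(result & 0xFF).count("1") % 2 == 0,
        "CF": False,  # TEST always clears CF and OF
        "OF": False,
    }


def spin_step(mem_value, mask=0x1000):
    """One pass of the testl / je / pause sequence from the failing instruction stream."""
    flags = emulate_test32(mem_value, mask)
    # JE is taken when ZF is set, i.e. when the masked bit is clear.
    return "je taken" if flags["ZF"] else "fall through to pause"


if __name__ == "__main__":
    print(spin_step(0x1000))  # masked bit still set
    print(spin_step(0x0000))  # masked bit clear
```

Without support for this opcode the emulator has no way to produce those flags for the guest, which fits the "Failed to emulate instruction" message in the log.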

@despair86
Author

just got around to testing this with the platform image from a week or two ago, fix is confirmed. thank you so much!

@despair86
Author

despair86 commented Sep 12, 2020

side note: does bhyve expose any kind of keyboard? it has a USB 3.x tablet as a pointing device, but i can't type anything into winlogon(1) or xdm(1) via VNC

on windows at least i can summon the on-screen keyboard until i get RDP enabled
