kvm: Windows 3.1 installation various crashes #1492

Closed
jschwartzenberg opened this issue Jun 8, 2021 · 41 comments
Labels
kvm kvm-related problems

Comments

@jschwartzenberg
Member

Describe the bug
When I try to install Windows 3.1 multiple times in a row:

  • sometimes it works 100% fine
  • sometimes it crashes in one spot
  • sometimes in another spot

To Reproduce
Steps to reproduce the behavior:

  1. Install Windows 3.1 in a fresh environment
  2. Install Windows 3.1 again in a fresh environment
  3. Install Windows 3.1 again in a fresh environment
  4. etc.
    There will be different issues each time. My script still automates enough that I can do many installations in a row within 10-15 to test variants.

Attach the log
Would it make sense to post sets of screenshots plus logs here? Which debug options should I supply to start with? There is nothing notable in the default logs when this occurs now.

A regression?
It was more stable in the past. I could do many installations without running into any issues.

Additional info
This is with MS-DOS 6.22.

One variant that occurred right after playing the first sound when installing SB16 drivers after the Windows installation:
dosemu-right-after-playing-first-part-startup-sound

Another variant in the middle of the Windows installation:
Screenshot_20210608_203902

@stsp
Member

stsp commented Jun 8, 2021

Does this have to do with the
original revert or not?

@stsp
Member

stsp commented Jun 8, 2021

Which debug options should I supply to start with?

-D9+M as a start.

@jschwartzenberg
Member Author

Does this have to do with the
original revert or not?

No, just tested: with commit 1d8ca75 and a revert of 0db958a75d6082f847f421bcab1af1e311c1111a on top, this also occurs.

@stsp
Member

stsp commented Jun 9, 2021

Maybe you can bisect it then, please? :)

@jschwartzenberg
Member Author

Here is a log with -D9+M. Right after the Program Manager popped up during the installation the error occurred:
boot.log.gz

But yeah a bisect is probably faster :)

@stsp
Member

stsp commented Jun 9, 2021

Maybe not, I'll look into the log a bit
later today. Will tell if there is a need
to bisect.

@stsp
Member

stsp commented Jun 9, 2021

Please try native DPMI instead
of KVM. Same problem?

stsp added a commit that referenced this issue Jun 9, 2021
Uninitialized variable, valgrind finds.
I am shocked gcc did not hint about that one!
@stsp
Member

stsp commented Jun 9, 2021

I've fixed something.
Please re-check.

@jschwartzenberg
Member Author

Now dosemu2 crashed:
boot.log.gz

@jschwartzenberg
Member Author

Second attempt:
Screenshot_20210610_192207

I'll look at a bisect.

@stsp
Member

stsp commented Jun 10, 2021

There is something really weird
though: dosemu thinks the LDT
write is happening, when it's not.
I suspect some memory corruption
or whatever.
Bisect would be good, but giving
me an ssh access may do as well,
because in principle I can see where
the problem happens, just no idea how.

@jschwartzenberg
Member Author

You cannot reproduce it at all? I'm simply running the Windows 3.1 installation.

@stsp
Member

stsp commented Jun 10, 2021

I tried a few times, it works, but
maybe it needs 10 times or who
knows what.
Anyway, please try to go before
1eba496 and 42c1823 commits.
Maybe one of these is a problem.

Also have you tried native DPMI
instead of KVM?

@jschwartzenberg
Member Author

Isn't native DPMI the default? I would say it needs about 2-3 times.

@stsp
Member

stsp commented Jun 10, 2021

No, it's not the default.

@jschwartzenberg
Member Author

The issue was between 5fd7c33 and f92c5e9. I am testing each revision 3 times. During the bisect, I ran into this with the second run on 48714f2:
Screenshot_20210610_205532

Not sure whether that counts as bad or good. Going to test more; I will also test with native DPMI.

@jschwartzenberg
Member Author

jschwartzenberg commented Jun 10, 2021

Same error but during WinG install instead of VfW:
Screenshot_20210610_210443

My script first installs WinG and then VfW, so in the run above the WinG installation didn't run into this. The Windows install itself keeps going fine though, so this looked like a good revision with regard to this issue. No, it's a bad revision: the same issue with SYSTEM.DRV occurred in a later run with the same commit.
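Because the crash is intermittent, a revision can only be called good when every repeated run passes, while a single failure is enough to call it bad. A minimal sketch of that rule, using the exit codes that `git bisect run` understands (0 = good, 1 = bad, 125 = skip); the function name and the `run_failed` array are hypothetical and not part of the reporter's script:

```c
#include <stddef.h>

/* Exit codes understood by `git bisect run`. */
enum { BISECT_GOOD = 0, BISECT_BAD = 1, BISECT_SKIP = 125 };

/*
 * run_failed[i] is non-zero when the i-th installation run crashed.
 * (Hypothetical input: the thread does not show how runs are recorded.)
 */
int bisect_verdict(const int *run_failed, size_t n_runs)
{
    size_t i;

    if (n_runs == 0)
        return BISECT_SKIP;        /* nothing was tested: cannot judge */

    for (i = 0; i < n_runs; i++)
        if (run_failed[i])
            return BISECT_BAD;     /* one crash is enough */

    return BISECT_GOOD;            /* all runs passed */
}
```

With three runs per revision, as used above, a flaky revision can still slip through as good, so a higher run count lowers the chance of a wrong bisect step.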

@jschwartzenberg
Member Author

Hehe, indeed I'm getting closer to the point where the default was switched from native to KVM for DPMI :) I had totally missed or forgotten that this was changed somehow.

@jschwartzenberg
Member Author

jschwartzenberg commented Jun 10, 2021

Yeah, I cannot reproduce this with native DPMI. At least now it is clear that with KVM DPMI the issues pop up rather randomly. I hope this is useful info!

@stsp
Member

stsp commented Jun 10, 2021

The issue was between 5fd7c33 and f92c5e9.

Indeed, because KVM was enabled
in that range. :)

@stsp
Member

stsp commented Jun 10, 2021

Could you please try the kvm_syn
branch?

@jschwartzenberg
Member Author

That one crashes before entering the graphical part with:

ERROR: KVM_EXIT_FAIL_ENTRY: hardware_entry_failure_reason = 0x80000021
leavedos_main(kvm_run:859|0) called - shutting down
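The value 0x80000021 can be read as a VMX exit reason, assuming an Intel host where KVM passes the hardware exit reason through unchanged. In that encoding bit 31 flags a failed VM entry and the low 16 bits hold the basic exit reason; 33 (0x21) is "VM-entry failure due to invalid guest state". A small decoding sketch (the struct and function names are illustrative, not dosemu2 or KVM identifiers):

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 31 of the VMX exit reason marks a failed VM entry. */
#define DEMO_VMX_FAILED_ENTRY         (UINT64_C(1) << 31)
/* Basic exit reason 33: VM-entry failure due to invalid guest state. */
#define DEMO_VMX_INVALID_GUEST_STATE  33u

struct fail_entry_info {
    bool     failed_vmentry;   /* bit 31 */
    uint16_t basic_reason;     /* bits 15:0 */
};

struct fail_entry_info decode_fail_entry(uint64_t reason)
{
    struct fail_entry_info info;

    info.failed_vmentry = (reason & DEMO_VMX_FAILED_ENTRY) != 0;
    info.basic_reason   = (uint16_t)(reason & 0xffffu);
    return info;
}

bool is_invalid_guest_state(uint64_t reason)
{
    struct fail_entry_info info = decode_fail_entry(reason);

    return info.failed_vmentry &&
           info.basic_reason == DEMO_VMX_INVALID_GUEST_STATE;
}
```

Under that assumption, `is_invalid_guest_state(0x80000021)` is true, which would mean the processor rejected the guest register state before running any guest code.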

@stsp
Member

stsp commented Jun 11, 2021

OK, out of ideas, and it doesn't
crash for me after 3 reinstalls of
win31.
So I guess vnc would be the way
to debug this.

@stsp
Member

stsp commented Jun 11, 2021

It almost seems like an out-of-sync
LDT. The stack segment somehow
page-faults on an LDT buffer...

stsp added a commit that referenced this issue Jun 11, 2021
We need that extension to be able to write to guest memory.
@stsp
Member

stsp commented Jun 11, 2021

I applied some patch.
It appears we need KVM_CAP_SYNC_MMU
to be able to reliably write to guest
memory.
So there is a (not very big) hope that
this patch will disable KVM for you.
If it doesn't, then it won't help:
it can't help other than by disabling KVM.
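For reference, a capability such as KVM_CAP_SYNC_MMU is normally probed with the KVM_CHECK_EXTENSION ioctl on the KVM device. The following is only a sketch of such a probe, not the code from the commit; the function name and return convention are made up for illustration:

```c
#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Returns 1 if KVM_CAP_SYNC_MMU is reported, 0 if it is not,
 * and -1 if the device cannot be opened or the ioctl fails.
 * dev_path is normally "/dev/kvm".
 */
int kvm_has_sync_mmu(const char *dev_path)
{
    int fd, ret;

    fd = open(dev_path, O_RDWR);
    if (fd < 0)
        return -1;

    /* KVM_CHECK_EXTENSION returns 0 for unsupported, >0 for supported. */
    ret = ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_SYNC_MMU);
    close(fd);

    if (ret < 0)
        return -1;
    return ret > 0 ? 1 : 0;
}
```

A caller could fall back to a non-KVM mode when `kvm_has_sync_mmu("/dev/kvm")` does not return 1, which matches the "disable KVM" effect described above.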

stsp added a commit that referenced this issue Jun 12, 2021
May be needed because we frequently update LDT buffer
from host. Guest should not cache it.
@jschwartzenberg
Member Author

Now the latest devel crashes for me every time as well, before it would enter the graphical part:

ERROR: unexpected CPU exception 0x0e err=0x00000007 cr2=0ab06814 while in vm86 (DOS)

Real-mode state dump:
EIP: f000:0000feb3 ESP: 05a5:000001fa  VFLAGS(b): 00011 00110000 00010111
EAX: 00000808 EBX: 00000810 ECX: 00000001 EDX: 000006d6 VFLAGS(h): 00033017
ESI: 00000118 EDI: 00000008 EBP: 00000efe DS: 9eb0 ES: 05a5 FS: 0000 GS: 0000
FLAGS: CF PF AF IF RF VM  IOPL: 3
STACK: 2f ff 00 f0 86 30 00 00 b0 9e -> 09 f5 00 f0 17 30 eb 0b ba c6 
OPS  : f0 b0 20 e6 a0 e6 20 cd 02 cf -> eb 3f 00 00 00 00 00 00 00 00 
        eb3f                f000:feb3 jmp  short FEF4 ($+3f)

leavedos_main(leavedos_from_sig:593|4) called - shutting down
coopthreads stopped
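Exception 0x0e is a page fault, and its error code is a bit field defined by the x86 architecture, so err=0x00000007 can be decoded mechanically. A short sketch (the struct and function names are illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits. */
struct pf_error {
    bool protection;    /* bit 0: 1 = page present (protection violation), 0 = not present */
    bool write;         /* bit 1: 1 = write access, 0 = read access */
    bool user;          /* bit 2: 1 = access from user mode (CPL 3) */
    bool reserved_bit;  /* bit 3: reserved bit set in a paging structure */
    bool instr_fetch;   /* bit 4: fault on instruction fetch */
};

struct pf_error decode_pf_error(uint32_t err)
{
    struct pf_error e;

    e.protection   = (err & (1u << 0)) != 0;
    e.write        = (err & (1u << 1)) != 0;
    e.user         = (err & (1u << 2)) != 0;
    e.reserved_bit = (err & (1u << 3)) != 0;
    e.instr_fetch  = (err & (1u << 4)) != 0;
    return e;
}
```

For 0x7 the first three flags are set: a user-mode write to a page that is present but not writable, with cr2 giving the faulting linear address.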

@stsp
Member

stsp commented Jun 12, 2021

And if you just start windows, not install?

@jschwartzenberg
Member Author

That doesn't crash.

@stsp
Member

stsp commented Jun 12, 2021

Thanks for ssh.
It seems KVM_SET_SREGS doesn't
work properly for you.
For example this change:

--- a/src/base/emu-i386/kvm.c
+++ b/src/base/emu-i386/kvm.c
@@ -794,8 +794,8 @@ static unsigned int kvm_run(struct vm86_regs *regs)
   struct kvm_regs kregs = {};
   static struct vm86_regs saved_regs;
 
-  if (run->exit_reason != KVM_EXIT_HLT &&
-      memcmp(regs, &saved_regs, sizeof(*regs))) {
+  if (1/*run->exit_reason != KVM_EXIT_HLT &&
+      memcmp(regs, &saved_regs, sizeof(*regs))*/) {
     /* Only set registers if changes happened, usually
        this means a hardware interrupt or sometimes
        a callback, and also for the very first call to boot */

makes dosemu not even start
on your CPU (and obviously on
Andrewbird's one, too), but I checked
on AMD FX and Intel Core i7, and
dosemu works fine with that change.
So I suppose this is some KVM bug
on Core2Duo.

@stsp
Member

stsp commented Jun 12, 2021

Without this change, KVM_SET_SREGS
is done very infrequently, because most
of the time the KVM monitor is setting the
segregs. Which is why the problem is
very random.
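The condition in the diff above amounts to change detection: registers are pushed to the guest only when the last exit was not a HLT and the host-side copy differs from a saved snapshot. A simplified, self-contained sketch of that pattern follows. `struct demo_regs`, the counter, and the point where the snapshot is updated are assumptions for illustration; the real code works on `struct vm86_regs` and issues the KVM register ioctls.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for struct vm86_regs (no padding: all same type). */
struct demo_regs {
    unsigned int eax, ebx, cs, ss;
};

static struct demo_regs saved_regs;  /* snapshot of what the guest last saw */
int push_count;                      /* how often registers were pushed */

/*
 * Returns true when the registers had to be pushed to the guest.
 * A real implementation would call KVM_SET_REGS / KVM_SET_SREGS
 * where push_count is incremented.
 */
bool sync_regs_if_changed(const struct demo_regs *regs, bool exited_on_hlt)
{
    if (exited_on_hlt || memcmp(regs, &saved_regs, sizeof(*regs)) == 0)
        return false;       /* nothing changed on the host side */

    saved_regs = *regs;     /* assumption: snapshot updated on push */
    push_count++;
    return true;
}
```

In this sketch the push path runs only when something has changed, so a fault that exists only in that path would show up rarely and at unpredictable points, consistent with the randomness described above.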

@andrewbird
Member

So is that a kernel problem?

@jschwartzenberg
Member Author

Could it also be a microcode bug?

@jschwartzenberg
Member Author

First run on my other notebook:
image

@stsp
Member

stsp commented Jun 12, 2021

I very much suspect it's an LDT
getting out of sync. But I have no
proof and I don't know how to
even try to prove that.
I'll ask kvm list for help.

stsp added a commit that referenced this issue Jun 13, 2021
This reverts commit 53ef318.

KVM_CAP_SYNC_MMU description in kernel docs seems invalid.
It says:
---
When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of
the memory region are automatically reflected into the guest.
---

But it seems like it is actually only needed when we mmap()
that memory region to something else.
@stsp
Member

stsp commented Jun 14, 2021

Could you please try this change?

--- a/src/base/emu-i386/kvm.c
+++ b/src/base/emu-i386/kvm.c
@@ -283,7 +283,7 @@ void init_kvm_monitor(void)
   mprotect_kvm(MAPPING_KVM, sregs.tr.base + offsetof(struct monitor, code),
               sizeof(monitor->code), PROT_READ | PROT_EXEC);
 
-  sregs.cr0 |= X86_CR0_PE | X86_CR0_PG | X86_CR0_NE | X86_CR0_ET;
+  sregs.cr0 |= X86_CR0_PE | X86_CR0_PG | X86_CR0_NE | X86_CR0_ET | X86_CR0_CD;
   sregs.cr4 |= X86_CR4_VME;
 
   /* setup registers to point to VM86 monitor */

It will at least rule out the
caching problem.
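For readers unfamiliar with the CR0 flags in that diff, the bit positions are fixed by the x86 architecture: PE is bit 0, ET bit 4, NE bit 5, CD bit 30 and PG bit 31. The sketch below builds the same mask with and without CD. The `DEMO_` macros are local stand-ins for the kernel's `X86_CR0_*` constants, and starting from a caller-supplied base value mirrors the `|=` in the diff:

```c
#include <stdint.h>

/* Local copies of the architectural CR0 bit positions. */
#define DEMO_CR0_PE (UINT32_C(1) << 0)   /* protected mode enable */
#define DEMO_CR0_ET (UINT32_C(1) << 4)   /* extension type */
#define DEMO_CR0_NE (UINT32_C(1) << 5)   /* numeric error */
#define DEMO_CR0_CD (UINT32_C(1) << 30)  /* cache disable */
#define DEMO_CR0_PG (UINT32_C(1) << 31)  /* paging */

/* OR the monitor's CR0 flags into base; optionally disable caching. */
uint32_t demo_cr0(uint32_t base, int cache_disable)
{
    uint32_t cr0 = base | DEMO_CR0_PE | DEMO_CR0_PG | DEMO_CR0_NE | DEMO_CR0_ET;

    if (cache_disable)
        cr0 |= DEMO_CR0_CD;
    return cr0;
}
```

Starting from zero, this gives 0x80000031 without CD and 0xC0000031 with it, so the proposed test changes exactly one bit of the guest's CR0.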

@stsp
Member

stsp commented Jun 14, 2021

@sean-jc has suggested disabling
unrestricted guest on the i7 CPU I used
for testing, and indeed, now I can
reproduce the problem myself.
So no need to check for CR0_CD -
I tested it and it changes nothing.

Hope Sean can tell what to do next. :)
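On Intel hosts, unrestricted guest is a `kvm_intel` module parameter, and its current value is usually visible under `/sys/module/kvm_intel/parameters/unrestricted_guest` as `Y` or `N`. A minimal sketch for reading such a boolean parameter follows; the helper name is made up, and the thread does not say how the setting was changed on the test machine:

```c
#include <stdio.h>

/*
 * Reads a boolean module parameter file.
 * Returns 1 for "Y"/"1", 0 for "N"/"0", -1 if the file is
 * missing or holds something else (e.g. kvm_intel not loaded).
 */
int read_bool_param(const char *path)
{
    FILE *f = fopen(path, "r");
    int c;

    if (!f)
        return -1;
    c = fgetc(f);
    fclose(f);

    if (c == 'Y' || c == '1')
        return 1;
    if (c == 'N' || c == '0')
        return 0;
    return -1;
}
```

A result of 0 for the path above would put an otherwise newer CPU into the same mode as hardware without unrestricted guest support, which is how the problem became reproducible here.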

@stsp
Member

stsp commented Jun 14, 2021

@sean-jc I am getting mail rejects
from your address.
JFYI,

@stsp changed the title from "Windows 3.1 installation various crashes" to "kvm: Windows 3.1 installation various crashes" Jun 15, 2021
@stsp added the kvm (kvm-related problems) label Jun 15, 2021
stsp added a commit that referenced this issue Jun 18, 2021
stsp added a commit that referenced this issue Jun 18, 2021
We don't use them (yet) as our guest does not do LTR or LLDT.
But the proper GDT setup can avoid surprises in the future, for
example if KVM adds more sanity checks.
@stsp closed this as completed in be7dea7 Jun 18, 2021
@stsp
Member

stsp commented Jun 18, 2021

This should now be fixed.
I received great help on the
KVM list.
I believe there will be the kvm
patches in kernel, too, but the
fix on our side appeared quite
simple.

If the problem persists please
open another ticket.

@jschwartzenberg
Member Author

Now it crashes before entering the graphical part, I'll open another ticket.

@andrewbird
Member

Cool, with this fix the KVM tests pass for me locally for the first time in many months. Thank you!

@stsp
Member

stsp commented Jun 19, 2021

So we ruled out invalid guest state
(that was haunting us in many tickets),
but the page fault problem is still here.

stsp added a commit that referenced this issue Dec 15, 2022
This reverts commit d5fecf0.

We need SYNC_MMU because we now map dpmi shm behind kvm's back.
[#1839]

3 participants