-
-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FPU save-restore hooks for vm86/native/cpuemu #1886
base: devel
Are you sure you want to change the base?
Conversation
src/base/emu-i386/simx86/cpu-emu.c
Outdated
if (CONFIG_CPUSIM) | ||
fp87_save_fpstate(&vm86_fpu_state); | ||
else | ||
vm86_fpu_state.cwd = (vm86_fpu_state.cwd & ~0x3f) | (TheCPU.fpuc & 0x3f); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this all it saves?
So jit doesn't really support fpu?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JIT saves/restores from/to vm86_fpu_state elsewhere. Just need to sync with fpuc here.
But I think it would be cleaner in the end to eliminate the global vm86_fpu_state variable, and reintroduce TheCPU._fpstate for cpu-emu (removed in 7b31442).
Of course that would mean that kvm_enter etc. would need to pass a vm86_fpu_state pointer, but the state itself can be on the stack in dpmi_set_pm()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a patch to eliminate the global vm86_fpu_state.
The only odd one now is kvm_fpu_update()
. It's identical to kvm_enter()
but I suppose kvm_enter()
could be changed to also alter some KVM monitor fields that never change during vm86/dpmi.
In any case the port i/o function (which was buggy before depending on the backend) needs both a get
and a set
, so we'd need 4 functions per backend? enter/leave/update/getfpstate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so checking a bit further and checking Bochs (as commented), fpu_reset()
does not need to read the FPU state as it should completely reset the FPU (zeroing all other values). The write to port 0xf0 doesn't need to clear the busy bit in the FPU status word, rather it acts like an EOI to the FPU after raising an exception (IRQ13); the code in bios.S does that in the int75 code. The busy bit has to do with fwait behaviour not exceptions.
So i'll get back to this tomorrow, also the failing test case in the CI for simulated vm86 + native DPMI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess if you eliminate the global
state, then enter/leave are not needed,
so all you need is set_fpu_state()/
get_fpu_state()?
Also if you are at it, how about
(here or in another PR) to convert
that if (cpuvm==
mess into the
cpu_backend ops? That of course
should not be limited to just fpu
ops, so quite a bit of a routine work
would be needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But how do you know which
one of those private structs is
valid?
I propose to propagate the flag
from getter to setter - similar to
what you did before, except to
not add such a flag to any struct.
Passing it between 2 funcs is fine.
That way you will eventually have
the full fxsave in a private buffer,
so no need for 2 private buffers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fxsave one is always valid. The fsave one is used on demand, and functions like a cache, to avoid converting back and forth for e.g. every single vm86 call.
But indeed using a flag in the setters/getters is another possibility and may eliminate the need to convert from fsave to fxsave for sure (and fxsave to fsave only for i386 in native dpmi since the kernel needs that).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I mean valid at get and set time)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok so playing with this further I think it's simplest to use a union with fsave and fxsave but with the flag being config.cpufxsr
. And on x86_64 fsave is never used (it could be a union with one field there).
It's theoretically possible to have an i386 kernel that does not save fxsave on a CPU that can, but that's simple enough, force config.cpufxsr = 0 once you see the kernel doesn't save it in sigcontext.
To keep things consistent, the simulator can then save/restore its state directly into fsave or fxsave depending on config.cpufxsr.
I was playing with that but did not have time to finish that today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To keep things consistent, the simulator can then save/restore its state directly into fsave or fxsave
depending on config.cpufxsr.
But that's the problem.
Simulator shouldn't care, and what
if we eventually have tcg as a simulator?
I really think its better to use the separate
flag in getters/setters and try to "accumulate"
the full fxsave state eventually.
I.e. keep exchanging the fxsave state as
now, but have an "incomplete" flag to say
that only the fsave part is trusted.
This will solve the "first-time" problem, and
on non-first time you'll accumulate the full
fxsave state, so that flag will clear.
What do you think about that?
That's a very minimal change against the
current code, as it already does the same
thing.
Tests will probably fail - in that |
7d4a194
to
5436778
Compare
We could zero out fxsave part of the state on i386. Touch only fsave part. Thanks Bart for spotting this.
@@ -257,8 +257,7 @@ static void fpu_io_write(ioport_t port, Bit8u val) | |||
{ | |||
switch (port) { | |||
case 0xf0: | |||
/* not sure about this */ | |||
vm86_fpu_state.swd &= ~0x8000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But wasn't this also from bochs?
Untrigger of an IRQ is usually happening
as a result of clearing some status bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No this wasn't from Bochs. I checked the Bochs source code, and Bochs does the equivalent pic_untrigger(13).
Resetting this bit is wrong, it's not related to IRQ13, rather it dates back to the time when the CPU and FPU were seperate and CPUs needed fwait
which would wait until this (FPU busy) bit was cleared by the FPU.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK but do you want to say that
just untrigger the irq is good idea?
I mean, only 2 possibilities do exist:
- IRQ is edge-delivered (not to confuse
with edge-triggered), in which case
my code was correct and doesn't need
this patch. - IRQ is level-delivered (still edge-triggered),
in which case it gets lowered only together
with some state bit. Maybe that state bit is
fully internal and is not exported to fxsave,
but it should be there if you want to change
the IRQ to level-delivery.
So in bochs is it level-delivered?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it just moves the untrigger from 3 other places in the code to the port handler. So the untrigger happens no matter what since IRQ13 invokes INT75, which does the port i/o, it just happens a bit later with this patch.
Or do you mean that we need something similar to port60_ready
in src/base/dev/misc/8042.c
.
There is some documentation here https://xem.github.io/minix86/manual/intel-x86-and-64-manual-vol1/o_7281d5ea06a5b67a-427.html
Bochs does this:
https://github.com/bochs-emu/Bochs/blob/master/bochs/iodev/extfpuirq.cc
but they don't handle IGNNE either.
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall professional).
OK still no one uses dosemu there, and overall, what future do you see for that kind of freedos? Go to virtualbox? I can only think of the future of a 64bit clang-buildable gdb-debuggable freedos, not more, not less. :)
I do use dosemu2 with the latest FreeDOS kernels. I also have contributed to the kernel, SYS, boot loaders, and SHARE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do use dosemu2 with the latest FreeDOS kernels.
Unless I am mistaken, you have your
own DOS kernel too. So your use of a
freedos kernel is probably not too frequent.
Though if you know why you use freedos
kernel and not fdpp, please speak up. :)
If there is anything better done in freedos,
fdpp can do that better yet. So tell me what
is it, and get it in fdpp after a while.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do use dosemu2 with the latest FreeDOS kernels.
Unless I am mistaken, you have your own DOS kernel too.
No, actually, the RxDOS kernel is not nearly usable generally. For dosemu2 the biggest missing piece is the lack of FS redirector support. I rarely use it and haven't progressed on it for years. My debugger(s) and TSRs are the only programs that I currently develop.
So your use of a freedos kernel is probably not too frequent.
Nearly all developing, testing, and building (including decompress-testing) of DOS software on our server and local Linux desktop box is done using a recent FreeDOS kernel, either in dosemu2 or qemu (or, rarely, Virtual Box 6.x). The DOS desktop box still boots MS-DOS 7.10 by default though, but I rarely use it. (The HP 95LX devices use their own MS-DOS 3.20, and I do not know how viable it is to run another kernel on that.)
Though if you know why you use freedos kernel and not fdpp, please speak up. :) If there is anything better done in freedos, fdpp can do that better yet. So tell me what is it, and get it in fdpp after a while.
Our kernel is portable among PC-compatible machines =) And it is easier for me to modify and debug, though obviously if you babysat me I could probably learn how to do the same or better for fdpp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are only a few tricks needed
to know when debugging fdpp.
Namely, you need to add a local.mak
file into a build dir and write
EXTRA_DEBUG = 1
there, after which
you'll get an access to most freedos
variables with gdb.
If some variable is still missing, you
can get an info about it with a double
underscode, eg:
(gdb) p break_ena
No symbol "break_ena" in current context.
(gdb) p /x __break_ena
$2 = {sym = {<FarPtrBase<char>> = {ptr = {off = 0x337, seg = 0xfa96}},
obj = std::shared_ptr<class ObjIf> (empty) = {get() = 0x0}, nonnull = 0x0}}
(gdb) p lowmem_base[(0xfa96<<4)+0x337]
$3 = 1 '\001'
So you look up the address with 2
underscores and then look up the
value via lowmem_base
.
IIRC Andrew wanted to add a gdb
macros for this all, but that was on
a very early development stages so
I wasn't yet interested.
Andrew, maybe you still want to add
the gdb instrumentation for fdpp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Bart.
I cherry-picked that patch after the
fnclex patches, as in that case it makes
sense. I also added the check if an FPU
irq can be injected, as we discussed
above.
src/dosext/dpmi/dpmi.c
Outdated
@@ -468,7 +468,7 @@ static void dpmi_set_pm(int pm) | |||
} | |||
dpmi_pm = pm; | |||
if (config.cpu_vm != config.cpu_vm_dpmi) { | |||
emu_fpstate fpstate; | |||
static emu_fpstate fpstate; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahha!
So here's the trick!
Global fpstate IS needed, or you
need hacks like this one.
Please just don't remove the global
fpstate then. Adding private fpstate
to cpu_emu etc is still fine, but please
update the good old vm86_fpu_state
here in this func, instead of making
things static.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't agree about the global fpstate here. It's not like it magically contained more!
The global fpstate also started "static", then got updated at init time with savefpstate from the host (not portable!), which on i386 also did not fill in all the fields! So we had exactly the same issue with global fpstate as with local fpstate!
I don't like the trick either but then the proper solution is that the "get" functions fill in ALL the state and not just parts right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we had exactly the same issue with global fpstate as with local fpstate!
No, unless you correct the above
to say "local static fpstate". Then
they are more identical, but still
global state has many advantages,
like being available for save/restore,
being available for reset by 0xf0 and
reboot, and all that.
Neither the global fpstate or static
local fpstate has that crash problem
because they both can accumulate
and keep the proper values from whatever
back-end provides such values. If you
took full fpstate from 1 backend, then
it doesn't matter that you only take part
from back-end 2 - its still fine, the parts
from back-end 1 are preserved.
I don't like the trick either but then the proper solution is that the "get" functions fill in ALL the state and
not just parts right?
That's absolutely contrary to what I
did, and I don't see how does this solve
the problem. My idea is trivial (and
hopefully works): just preserve the
right parts of the state if you managed
to take it from either back-end. Get from
another back-end only in what he is
certain to be valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, even though I am against
having any duality in a global state,
the backend getters can very well
return a flag specifying what state
have they filled. Maybe they can
even be provided with 2 pointers:
one for fxsave and another for
fsave, and they fill up only 1 of
them, saying which one.
Setters should accept any and
locally save the full state even if
they can use only fsave part.
In this case in all back-ends the
full fxsave state will eventually
accumulate, so they will be able
to return it, unless both back-ends
use just fsave.
If we ever want to fill up the global
state in that scheme, then we call
the getter and it will most certainly
return the full fxsave state, as it was
probably "accumulated". If not -
convert to fxsave and store in a
global state. But global state may
not be needed at all.
What do you think?
Or is this what you already done?
Maybe so, as you seem to store
both already...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The duality (if needed) is all private at the moment in this PR.
e.g. for true_vm86, "set" stores everything in private fxsave, and converts to fsave if needed.
then it'll use either fsave or fxsave
then with "get", if fsave was used, the fsave fields are merged with the fxsave fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the cost of that approach is
to never share the full fxsave?
src/dosext/dpmi/dpmi.c
Outdated
static void dpmi_set_pm(int pm) | ||
{ | ||
static emu_fpstate fpstate; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, lets just leave vm86_fpu_state
in
then.
Similar to the KVM hooks.
That's the correct way, as int75 does an "out" to port f0 (idea from Bochs).
Rely on passing a temporary struct on the stack instead, state is only in backends (static for vm86, in VM for KVM, signal stack for native DPMI, TheCPU for cpu-emu).
This wasn't needed for kernel consumption but is needed when the actual frstor instruction is used.
This way it's zero initialized, and fsave_to_fxsave doesn't leave garbage. Fixes crash combining simulator and native DPMI.
Instead of enter/leave, for clarity.
5436778
to
99071c3
Compare
Only use the 512 byte structure when really needed (for explicit use of fxsave and fxrstor in true_vm86 and JIT); other places just need copies of the relevant state (x87 + first 8 XMM registers only).
Eliminates static _scp_fpregs pointer to sigcontext.
This way even without an int2 FP exception handler installed an FP exception won't hang.
@bartoldeman did you get a chance to try rebasing my ci-12 QEMU FPU test branch on top of your FPU work? I did it locally rather than in Github Actions but I'm still getting failures including some that appear to be crashes.
The individual tests are named as
and you can run them individually as say:
And the failure there is
ajb@polly:/clients/common/dosemu2.git$ tail -14 test_dos.PPDOSGITTestCase.test_fpu_qemu_fisttp_JIT_JIT.xpt C:>testit.bat C:>c:\fputest
|
@andrewbird thank for that work. No I haven't yet. Will get back to this on Friday or later. |
Some program may rely on IGNNE being set by the port 0xf0, and disable interrupts before raising an FPU exception. In that case our fnclex work-around in bios.S is not going to be called so we can get stuck in a SIGFPE flood. Just terminate dosemu is that case. Other options may include the force unmasking, or just call the inthandler directly, or clear the pending exception in an FPU state. Since the problem is fairly theoretical, use simplest approach, as was discussed with Bart.
Hi Bart, I cherry-picked a few patches So I hope that resolves the most contended |
outb %al, $0xf0 | ||
movb $0x20, %al | ||
outb %al, $0xa0 | ||
outb %al, $0x20 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied, thanks!
handler hasn't already done so, so we won't | ||
get stuck (in real mode IGNNE would prevent | ||
further exceptions but we don't emulate that) */ | ||
iret |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied, thanks!
This avoids copying FPU state when it's not necessary, similar to how its done for KVM.
I tested with real vm86 to make sure it all works!