Correctly emulate FPU concurrent execution timings #2022

Merged
merged 5 commits into from Jan 29, 2022

@goshhhy goshhhy commented Jan 29, 2022

Summary

since the original 8087, the x87 floating point unit has acted as a separate execution unit from the main processor, a feature which Intel called "concurrent execution": when an FPU instruction is run, the main processor has to spend a few cycles handing the instruction information and any relevant data (values from memory, etc.) to the FPU. after that, for the remainder of the time that the FPU is running the instruction, the main CPU is free to continue running integer work.

naive emulation of x86 processors does not account for this. e.g. when running the following routine on an emulated i486,

  fdiv %st(1), %st(0)
  movl $0xFFFFFFFF, %eax
  movl $0xFFFFFFFF, %edx
  imull %edx
  fstps (%ecx)

a naive fpu emulation (assuming that imull is calculated correctly for the input values as 42 cycles, rather than averaged) would run this routine in 124 cycles - 73 for the fdiv, 1 for each mov, 42 for the imull, and 7 for the fstps.

on real hardware, while the fdiv does take 73 cycles if the fpu is in 80-bit precision mode, only 3 of those will stall the main cpu - 70 cycles are concurrent. this means that, overall, the above routine would actually take 80 cycles: the fdiv takes 73 cycles, the movls and imull run concurrently with the fdiv and therefore take a combined effective 0 cycles (but take up 44 total of the concurrent execution time), and then the fstps takes 7 cycles.
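the arithmetic above can be checked with a small timeline model. this is only a sketch under the cycle counts quoted in this description (73-cycle fdiv with a 3-cycle handoff, 44 cycles of integer work, 7-cycle fstps); the function names are made up for illustration and are not part of 86box.

```c
#include <assert.h>

static int max_int(int a, int b) { return a > b ? a : b; }

/* naive model: every instruction blocks the cpu for its full duration. */
int naive_cycles(int fpu_total, int int_work, int fpu_next)
{
    return fpu_total + int_work + fpu_next;
}

/*
 * concurrent model: the cpu only stalls for the handoff (fpu_stall), then
 * runs int_work while the fpu keeps going. the next fpu instruction cannot
 * start until both the cpu and the fpu are free.
 */
int concurrent_cycles(int fpu_total, int fpu_stall, int int_work, int fpu_next)
{
    int cpu_free_at;
    int fpu_free_at;

    assert(fpu_stall >= 0 && fpu_stall <= fpu_total);

    cpu_free_at = fpu_stall + int_work; /* integer work overlaps the fdiv */
    fpu_free_at = fpu_total;            /* fdiv completes here */

    return max_int(cpu_free_at, fpu_free_at) + fpu_next;
}
```

with the numbers from the routine, `naive_cycles(73, 44, 7)` gives 124 and `concurrent_cycles(73, 3, 44, 7)` gives 80. if the integer work were longer than the concurrent window, the second model would charge the excess normally.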

this set of patches adds support into 86box for emulating this properly, and adds timing information for the i486 taken from its datasheet.

running this with some of the tests i've been developing in qmark (a testing program i use as part of developing 486quake), everything appears to work properly: the timings look more like they should and no longer set off my timing-related emulation checks (one behavioral check still fires).

before

[screenshot: Screen Shot 2022-01-29 at 04 16 33]

after

[screenshot: Screen Shot 2022-01-29 at 07 53 02]

note: this patchset is only tested with the interpreter and the new dynarec - i am not able to test the old dynarec.

References

i486 Processor Programmer's Reference Manual

goshhhy and others added 5 commits Jan 29, 2022
add CLOCK_CYCLES_FPU, which does exactly what CLOCK_CYCLES already did.

add CONCURRENCY_CYCLES, which sets fpu_cycles, which is the number of
available concurrent execution cycles that the integer unit can do
"free" work in while the fpu is executing.

adjust CLOCK_CYCLES so that if there are fpu_cycles, the cycle count is
subtracted from fpu_cycles instead of cycles, emulating the behavior of
these concurrent cycles being "free" as on real hardware.
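the accounting described in this commit message can be sketched roughly as follows. this is not the 86box source: the struct, the function names, and the starting budget of 1000 are hypothetical, functions are used in place of macros for readability, and the handling of integer work that overruns the concurrent window is an assumption, since the commit message does not spell it out.

```c
#include <assert.h>

/* hypothetical timing state; field names echo the commit message. */
typedef struct {
    int cycles;     /* remaining cycle budget, counts down as work is done */
    int fpu_cycles; /* concurrent cycles still available for "free" work */
} cpu_timing_t;

/* charge an fpu instruction's full cost to the main budget. */
void clock_cycles_fpu(cpu_timing_t *t, int c)
{
    assert(c >= 0);
    t->cycles -= c;
}

/* record how many cycles of the current fpu instruction can overlap. */
void concurrency_cycles(cpu_timing_t *t, int c)
{
    assert(c >= 0);
    t->fpu_cycles = c;
}

/* charge an integer instruction; absorbed while the window is open. */
void clock_cycles(cpu_timing_t *t, int c)
{
    assert(c >= 0);
    if (t->fpu_cycles > 0) {
        t->fpu_cycles -= c;
        if (t->fpu_cycles < 0) {
            /* assumption: work past the window is charged normally. */
            t->cycles += t->fpu_cycles;
            t->fpu_cycles = 0;
        }
    } else {
        t->cycles -= c;
    }
}

/* replays the routine from the summary and returns the cycles consumed. */
int replay_example(void)
{
    cpu_timing_t t = { 1000, 0 };

    clock_cycles_fpu(&t, 73);   /* fdiv: full cost */
    concurrency_cycles(&t, 70); /* 70 of those cycles are concurrent */
    clock_cycles(&t, 1);        /* movl */
    clock_cycles(&t, 1);        /* movl */
    clock_cycles(&t, 42);       /* imull */
    clock_cycles_fpu(&t, 7);    /* fstps */

    return 1000 - t.cycles;
}
```

in this sketch `replay_example()` consumes 80 cycles, matching the real-hardware figure in the summary: the three integer instructions use up 44 of the 70 concurrent cycles and cost nothing against the main budget.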
@goshhhy goshhhy marked this pull request as ready for review Jan 29, 2022
@OBattler OBattler merged commit 0545de2 into 86Box:master Jan 29, 2022
0 of 36 checks passed