Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync dh fpu and normal fpu #2248

Closed
wants to merge 14 commits into from
Closed

Conversation

finalpatch
Copy link
Contributor

When the dynamic core uses normal core to run instructions, we need to

  1. Copy the dh fpu state to normal fpu before entering the normal core
  2. After returning from normal core, copy the normal fpu state to dh fpu

This addresses #2247

@FeralChild64
Copy link
Collaborator

FeralChild64 commented Jan 24, 2023

Indeed, the patch fixes the issue. Strangely, I can't reproduce the problem with 0.80.1, only with GIT master. And it is the guest software (Dos Navigator) which crashes, not DOSBox Staging itself (but I am using a Linux build):

Screenshot

@finalpatch
Copy link
Contributor Author

finalpatch commented Jan 24, 2023

@FeralChild64 this only happens if the dynamic core is using the dh fpu implementation. There are 3 different fpu code paths in dosbox, the normal core that interprets everything, the dynamic core without dh fpu that runs recompiled code but calls the normal fpu functions, and then the new dh fpu that does not share the same functions and data structures with normal core. I'm able to reproduce in windows stable release too.

Regarding whether it crashes dosbox or the guest app, I guess the overflow can happen in both normal core or dynamic core. If this happens in normal core it triggers the assertion, but if it happens in dynamic core, it'll probably trigger some sort of exception in the guest app.

src/cpu/core_dyn_x86.cpp Outdated Show resolved Hide resolved
src/cpu/core_dyn_x86.cpp Outdated Show resolved Hide resolved
src/cpu/core_dyn_x86.cpp Outdated Show resolved Hide resolved
src/fpu/fpu.cpp Outdated Show resolved Hide resolved
@johnnovak
Copy link
Member

johnnovak commented Jan 25, 2023

Do we notify the mainline DOSBox folks and @joncampbell123 out of courtesy @kcgen when a core emulation related important fix like this one lands in our repo?

@kcgen
Copy link
Member

kcgen commented Jan 25, 2023

Yup, it's rare (that we have these), but I've previously informed QBix via PM. Will do the same when we eventually merge this.

Another route would be is if @finalpatch was planning to submit a .patch file to sourceforge (for SVN or 0.74.3).

@kcgen
Copy link
Member

kcgen commented Jan 25, 2023

@finalpatch , all that's left is squashing all the intermediate commits down to one (I don't think we need to have this split across multiple.. it's a nice and small susinct change).

@finalpatch
Copy link
Contributor Author

@finalpatch , all that's left is squashing all the intermediate commits down to one (I don't think we need to have this split across multiple.. it's a nice and small susinct change).

I saw some CI check failures, but they do not seem to be build or test failures.

@finalpatch
Copy link
Contributor Author

Yup, it's rare (that we have these), but I've informed QBix via PM. Will do the same when we eventually merge this.

Another route would be is if @finalpatch was planning to submit a .patch file to sourceforge (for SVN or 0.74.3).

I can send them a patch file for sure. But it should be easy enough for them to just apply the changes themselves, it's not very large after all.

@GranMinigun
Copy link
Contributor

Are there any performance implications for other software?

@Grounded0
Copy link
Collaborator

For performance profiling my enhanced DOSBENCH can be found here:
https://anttipeltola.eu/misc/files/DOSBENCH.ZIP

@kcgen
Copy link
Member

kcgen commented Jan 25, 2023

2023-01-25_07-29

(text quote)

"On testing the latest commits in the above patch with core=dynamic, there is a significant
performance penalty, such as for win32 games as available to a Win95 guest OS. It may be
worth verifying in other environments. Otherwise, a better option is to test performance (and
the above fpu state issue) while running core=dynamic with a build based on the non-x86 fpu
(instead of the x86 fpu emulation)."

@finalpatch , have you found any DOS games that cause continuous cache lookup failures or have frequently-invalidated code blocks? Maybe self-modifying code would do this?

Unfortunately all of the above benchmarks never traverse this code path. I've also tested on x86-64 and ARM64 builds on native hardware using the dynamic and dynrec cores.

@kcgen
Copy link
Member

kcgen commented Jan 25, 2023

I wasn't able to reproduce this with my existing copy of DOS Navigator.

I re-downloaded DN from https://dnosp.com/e_index.php today, just to start with a clean slate.

I grabbed the real-mode version (they start up faster and typically are smaller). Again.. couldn't reproduce it.

So I tried the DPMI version, and finally hit the issue.

2023-01-25_09-05

@finalpatch
Copy link
Contributor Author

@kcgen I changed the class and assignment call around CPU_Core_Normal_Run() to straight function calls.

There are a few other instances of CPU_Core_Normal_Run() in this file, I'm not entirely sure if they all need the same treatment. The fix is currently only applied to the one that was directly causing the problem that I can reproduce.

@finalpatch
Copy link
Contributor Author

Okay, answering my own question, we actually need to have this fpu sync around all the other CPU_Core_Normal_Run() calls.

Without these DOS debuggers won't be able to see FPU changes from dynamic core. After adding this, I'm now able to single step in Turbo Debugger and observe the FPU window updating in realtime on the dynamic core.

@finalpatch
Copy link
Contributor Author

Turbo Debugger showing FPU on dynamic core (didn't work previously)
Screenshot 2023-01-26 193914

@kcgen
Copy link
Member

kcgen commented Jan 26, 2023

Wonderful to see the turbo debugger working correctly now! This is the level of validation that shows this is right approach. 🚀

@finalpatch
Copy link
Contributor Author

Unless you find new issues, I'm happy with the state of the PR. I'll leave it to you guys to merge it when you are ready.

@finalpatch
Copy link
Contributor Author

There is definitely opportunity to clean up the FPU emulation code. I don't see any good reason to have 2 completely separate FPU states. Code would be much simpler and more efficient if we just have one.

@kcgen kcgen self-requested a review January 26, 2023 11:54
Copy link
Member

@kcgen kcgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR, @finalpatch, and working through the review. I learned a lot along the way - thanks for all the detailed explanations!

@kcgen
Copy link
Member

kcgen commented Jan 26, 2023

Oh, one last change: would be able to squash the 14 commits down to one? I think one is enough to capture this change (and provide a bisectable point).

@rderooy
Copy link
Collaborator

rderooy commented Jan 26, 2023

FYI, I ran a benchmark on DOSBox-X with Win98. You can see the results here: joncampbell123/dosbox-x#3965 (comment)

@finalpatch
Copy link
Contributor Author

finalpatch commented Jan 26, 2023

Thanks for this PR, @finalpatch, and working through the review. I learned a lot along the way - thanks for all the detailed explanations!

I think you can configure github to squash merge pull requests:
https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/configuring-pull-request-merges/configuring-commit-squashing-for-pull-requests

I do not have merge permission so it's up to you how you want to merge it.

@Grounded0
Copy link
Collaborator

Grounded0 commented Jan 26, 2023

@rderooy Thanks for running those benchmarks. We can absorb a hit on 16-bit clean and 16/32-bit mixed code but not for 32-bit clean code because that's where the most performance demand is. I approve based on evidence provided.

@FeralChild64
Copy link
Collaborator

@kcgen IMHO this is a good candidate for backporting to 0.80.2 - we have no idea how many glitches or sporadic crashes this bug causes.

@kcgen
Copy link
Member

kcgen commented Jan 26, 2023

I think you can configure github to squash merge pull requests. I do not have merge permission so it's up to you how you want to merge it.

We don't like using GitHub's auto-squash feature.

In part because it's a derivative: it's not what you (as the author) actually put forward for merging. Instead, it creates a huge list of the all the squashed commits.

In general, we want PRs to be in final shape, bisectable, and nicely named. Squashing isn't the norm, but it felt like the right granularity (for this PR) if anyone ever needs to bisect back to it.

There are merge conflicts, so GitHub is blocking on my side:

2023-01-26_08-50

That's OK - I can do it manually. Here's a log (just for the record)

Checkout a new local branch on the main repo to hold your commits

$ git checkout -b fp/sync-fpu-1 main

Switched to a new branch 'fp/sync-fpu-1'

Import changes from your repo

git pull git@github.com:finalpatch/dosbox-staging.git main

remote: Enumerating objects: 75, done.
remote: Counting objects: 100% (75/75), done.
remote: Compressing objects: 100% (22/22), done.
remote: Total 75 (delta 55), reused 66 (delta 53), pack-reused 0
Unpacking objects: 100% (75/75), 18.75 KiB | 206.00 KiB/s, done.
From github.com:finalpatch/dosbox-staging
 * branch                main       -> FETCH_HEAD
Updating 0e791175e..b8b8157e8
Fast-forward
 include/fpu.h            | 13 +++++++++++++
 src/cpu/core_dyn_x86.cpp | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++-----
 src/fpu/fpu.cpp          | 12 ++++++++++++
 3 files changed, 76 insertions(+), 5 deletions(-)

Rebase your feature branch against main

git rebase -i main

pick 5d3e5b3f3 Sync dh fpu and normal fpu
pick bf0a01d6c refactor
pick ab3e77583 format
pick 6cac6456c fix unaligned access
pick c2b808652 fixes
pick 3a56fbf4a comments
pick 50695e7ea tighten the scope of auto_fpu_sync
pick 53037c2d7 fix formatting
pick 23b749aa0 refactor
pick 85548f04e revert incorrect index change
pick 6638986b6 Revert "revert incorrect index change"
pick f5f7d576f Revert "Revert "revert incorrect index change""
pick b8b8157e8 protected other instances of CPU_Core_Normal_Run()

Fix conflicts

Auto-merging src/cpu/core_dyn_x86.cpp
CONFLICT (content): Merge conflict in src/cpu/core_dyn_x86.cpp
error: could not apply 23b749aa0... refactor
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply 23b749aa0... refactor

Squash intermediate commits

git rebase -i main

pick 69ef75db2 Sync dh fpu and normal fpu
squash e7f34e11c refactor
squash 779fabd8f format
squash aaab71ba2 fix unaligned access
squash c25f0f1a6 fixes
squash af2308099 comments
squash 326c8f3fd tighten the scope of auto_fpu_sync
squash 90b683c49 fix formatting
squash 945c281c6 refactor
squash e1986864a revert incorrect index change
squash fdc725bbc Revert "revert incorrect index change"
squash 0512a5564 Revert "Revert "revert incorrect index change""
squash df3b1b9fb protected other instances of CPU_Core_Normal_Run()

# Rebase 0e791175e..df3b1b9fb onto 0e791175e (13 commands)
#
# Commands:
# p, pick <commit> = use commit
# r, reword <commit> = use commit, but edit the commit message
# e, edit <commit> = use commit, but stop for amending
# s, squash <commit> = use commit, but meld into previous commit

Reformat the commit message

Before

# This is a combination of 13 commits.
# This is the 1st commit message:

Sync dh fpu and normal fpu
# This is the commit message #2:

refactor

# This is the commit message #3:

format

# This is the commit message #4:

fix unaligned access

# This is the commit message #5:

fixes

# This is the commit message #6:

comments

# This is the commit message #7:

tighten the scope of auto_fpu_sync

# This is the commit message #8:

fix formatting

# This is the commit message #9:

refactor

# This is the commit message #10:

revert incorrect index change

# This is the commit message #11:

Revert "revert incorrect index change"

This reverts commit 85548f04e62f4d2b42cd415ad741f1faf4e1f842.

# This is the commit message #12:

Revert "Revert "revert incorrect index change""

This reverts commit 6638986b6bc1f4fd5810c3477d0efc4e39765855.

# This is the commit message #13:

protected other instances of CPU_Core_Normal_Run()

# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#

After

Summarized from the PR discussion ...

Sync dh FPU and normal FPU (#2247)

When the dynamic core uses normal core to run instructions, we
need to:

1. Copy the dh FPU state to normal FPU before entering the
   normal core.

2. After returning from normal core, copy the normal FPU state
   to dh FPU.

Known affected programs (when using "core = dynamic"):

- DOS Navigator when repeating running bad commands on the
  command prompt.                       

- Turbo Debugger when showing the FPU state window and
  stepping forward line-by-line.             

This only happens if the dynamic core is using the dh FPU
implementation. There are three different FPU code paths in
DOSBox, the normal core that interprets everything, the
dynamic core without dh FPU that runs recompiled code but
calls the normal FPU functions, and then the new dh FPU that
does not share the same functions and data structures with
normal core.

This normal core run branch is triggered when the code block
was modified by the running program for more than 4 times. The
first 3 times the dyn core will just recompile the guest code.
Starting from the 4th time it will decide this part is
frequently changed so there is no point to refill the dynrec
cache and just call the normal core to run it.

Performance impact and testing:

- DOS games: Quake, Doom, Elite II (and more) plus all DOS
  benchmarks in PERFDOS2 did not follow this code path
  (presumably because they're not repeatedly modifying the same
  code block), and therefore there is no performance impact.

- Win98 tests by @rderooy using DOSBox-X show a ~20% and 5% 
  impact. https://github.com/joncampbell123/dosbox-x/pull/3965#issuec>

Future optimizations:

- The first syncing (from dh FPU to normal FPU) be made
  conditional based on the dynamic core's "state_used" flag
  (however, this flag is currently reset by the prior hardware
  FPU saver, so that needs fixing too).

- The second syncing (from normal FPU back to dh FPU) cannot 
  be made conditional yet because there is no flag indicating
  if the normal FPU was used, therefore this would need to be
  added.

Apply code formatting rules

./scripts/format-commit.sh

git version 2.37.2
Ubuntu clang-format version 16.0.0 (++20230115043638+df5fc4504b86-1~exp1~20230115163648.253)
Using paths relative to: /usr/src/dosbox-staging
clang-format -i -lines=118:122 -lines=128:135 -lines=145:147 "include/fpu.h"
clang-format -i -lines=251:269 -lines=285:285 -lines=300:304 -lines=313:333 -lines=390:396 -lines=419:419 -lines=421:421 "src/cpu/core_dyn_x86.cpp"
clang-format -i -lines=44:56 "src/fpu/fpu.cpp"

clang-format formatted some code for you.

Run 'git diff' to see what changed.
Run 'git commit -a --amend' to save the result.

Build and test

Also checked a UBSAN build -- all still running clean.

Sign off

git commit --amend --signoff

Perform a fast-forward merge to main

git merge --ff-only fp/sync-fpu-1

Updating 0e791175e..fd14ee5e4
Fast-forward
 include/fpu.h            | 19 ++++++++++++++++++-
 src/cpu/core_dyn_x86.cpp | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 src/fpu/fpu.cpp          | 14 ++++++++++++++

This produces a linear commit sequence on the main branch, without the merge commit appearing.

Push

git push

Ok - done.

@kcgen
Copy link
Member

kcgen commented Jan 26, 2023

Merge performed manually outside of Github; closing.

@kcgen kcgen closed this Jan 26, 2023
@finalpatch
Copy link
Contributor Author

finalpatch commented Jan 26, 2023

@kcgen I just realise we need to move this

    if (dyn_dh_fpu.state_used) {
        gen_dh_fpu_save();
    }

into copy_dh_fpu_to_normal()
so that we can be sure it's copying the state is up to date.

Another thing is to put dh fpu related code inside a

#if defined(X86_DYNFPU_DH_ENABLED)

block so that code is still compilable when it's not defined (using the old dyn fpu)

See the other PR I made for X joncampbell123/dosbox-x#3969

@kcgen
Copy link
Member

kcgen commented Jan 26, 2023

Great catches, @finalpatch.

@finalpatch
Copy link
Contributor Author

Great catches, @finalpatch.

Would you be able to apply these changes?

@kcgen
Copy link
Member

kcgen commented Jan 27, 2023

I'll try and then include you for review.

@johnnovak
Copy link
Member

Nice work everyone! 💯 🚀

@FeralChild64
Copy link
Collaborator

@finalpatch For the future:

  1. It is best to keep your main exactly the same as Staging main, no additional commits
  2. Features are best to be developed on branches, not on the main branch
  3. Never merge the main branch to your feature branches, always do git rebase main while being on the feature branch

I did the mistake of developing a feature on the main branch too, in another project - the only reason it worked was because I was left as the only contributor.

@johnnovak
Copy link
Member

You might want to familiarise yourself with the "Git rebase workflow" that we use @finalpatch

https://www.themoderncoder.com/a-better-git-workflow-with-rebase/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working CPU/FPU emulation Issues related to CPU or FPU emulation
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Need to sync dh_fpu and normal fpu when the dynamic core uses normal core to run some instructions
7 participants