Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JitArm64: Set flush-to-zero/rounding mode and improve float/double conversion accuracy #9458

Merged
merged 12 commits into from Apr 25, 2021

Conversation

JosJuice
Copy link
Member

@JosJuice JosJuice commented Jan 20, 2021

Fixes https://bugs.dolphin-emu.org/issues/12388. Might also fix other games that have problems with float, paired, float loadstore, or paired loadstore instructions in JitArm64, but I haven't tested any.

Left to do:

  • I need to fix the ARM64 MSVC CMake build
  • It would be useful if someone could test that ARM64 MSVC builds don't horribly crash when you start a game
  • With this change, a white square shows up on the title screen of Sonic Colors Implement accurate single/double conversion
  • Make accurate single/double conversion faster
  • Unit tests for single/double conversion
  • Sonic is falling through grind rails now? fselx needs to handle RW clobbering flags
  • Get PR PPCAnalyst: Rework the store-safe logic #9486 merged

@JosJuice

This comment has been minimized.

@JosJuice JosJuice force-pushed the arm-fpu-round branch 3 times, most recently from a3db9e9 to 29aeb17 Compare January 20, 2021 19:59
@JosJuice

This comment has been minimized.

@JosJuice

This comment has been minimized.

@JosJuice JosJuice changed the title Implement ArmFPURoundMode.cpp JitArm64: Set flush-to-zero/rounding mode and improve float/double conversion accuracy Jan 21, 2021
@JosJuice JosJuice marked this pull request as draft January 22, 2021 22:20
(2 << 22), // -inf
};

const u64 base = default_fpcr & ~(0b111 << 22);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it really OK to use a static const base value here? some reason not to just read the register at write-time?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I did it this way just because the x64 equivalent doesn't read the register at write-time – I don't see any reason why reading the register wouldn't work. What would the problem be with using a static const base value?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hypothetically something else could change the bits not being overridden here after process start, and then those settings would be lost. but idk if it's a problem.

@JosJuice JosJuice force-pushed the arm-fpu-round branch 3 times, most recently from 634565b to 4d73b95 Compare January 25, 2021 21:43
@JosJuice JosJuice force-pushed the arm-fpu-round branch 2 times, most recently from 521dca3 to ebc5743 Compare January 28, 2021 21:52
@JosJuice JosJuice force-pushed the arm-fpu-round branch 3 times, most recently from 8880fe9 to 1c84a4d Compare February 2, 2021 20:46
@JosJuice JosJuice force-pushed the arm-fpu-round branch 2 times, most recently from c376ef0 to 001522b Compare February 13, 2021 17:57
@JosJuice JosJuice marked this pull request as ready for review February 13, 2021 17:58
@JosJuice JosJuice force-pushed the arm-fpu-round branch 2 times, most recently from 4b98d18 to 0b00a21 Compare February 15, 2021 22:02
@JosJuice

This comment has been minimized.

@JosJuice

This comment has been minimized.

@caseyjbrett

This comment has been minimized.

@JosJuice

This comment has been minimized.

@caseyjbrett

This comment has been minimized.

// (if enabled in FPSCR) like almost any float operation does. We accomplish this by adding 0.0,
// which should be cheaper than FCVT 32 -> 64 followed by FCVT 64 -> 32.
m_float_emit.MOVI(8, EncodeRegToSingle(V0), 0);
m_float_emit.FADD(EncodeRegToSingle(VD), EncodeRegToSingle(VB), EncodeRegToSingle(V0));
Copy link
Member

@Tilka Tilka Apr 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will turn -0.0 into 0.0. To avoid this, add -0.0 instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I was not aware of that :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed now, assuming I've used MOVI correctly.

Copy link
Contributor

@merryhime merryhime Apr 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exception if FPSCR.RMode is TowardsMinusInfinity, in which case you'd want to add +0.0 to prevent +0.0 from being converted to -0.0.

(Tests?)

Copy link
Member Author

@JosJuice JosJuice Apr 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... Maybe I should just go with the double FCVT then. Having to re-emit code when the rounding mode changes seems annoying, and branching based on the rounding mode is probably worse for the performance than double FCVT. This case hopefully won't be triggered often anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Briefly grepping the codebase apparently we don't currently adjust FPCR based on guest fpscr, which is slightly surprising.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what the first commit of this PR fixes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I've pushed a new approach (as the last commit of this PR this time). It simply adds an additional check to the FMOV path so that we only use it if the register is store-safe.

Fixes https://bugs.dolphin-emu.org/issues/12388. Might also fix
other games that have problems with float/paired instructions
in JitArm64, but I haven't tested any.
This simplifies some of the following commits. It does require
an extra register, but hey, we have 32 of them.

Something I think would be nice to add to the register cache
in the future is the ability to keep both the single and double
version of a guest register in two different host registers
when that is useful. That way, the extra register we write to
here can be read by a later instruction, saving us from
having to perform the same conversion again.
Preparation for following commits.

This commit intentionally doesn't touch paired stores,
since paired stores are supposed to flush to zero.
(Consistent with Jit64.)
Needed because the next commit will make RW clobber flags.
Our old conversion approach became a lot more inaccurate when
enabling flush-to-zero, to the point of obviously breaking games.
If we can prove that FCVT will provide a correct conversion,
we can use FCVT. This makes the common case a bit faster
and the less likely cases (unfortunately including zero,
which FCVT actually can convert correctly) a bit slower.
I haven't observed this breaking any game, but it didn't match
the behavior of the interpreter as far as I could tell from
reading the code, in that denormals weren't being flushed.
@JMC47
Copy link
Contributor

JMC47 commented Apr 25, 2021

This has been reviewed and has been sitting for another two weeks. This has a lot of important fixes for players on AArch64 devices and if a regression sneaks in, the developer is active to fix any minor regressions.

@JMC47 JMC47 merged commit 5da85f3 into dolphin-emu:master Apr 25, 2021
10 checks passed
@JosJuice JosJuice deleted the arm-fpu-round branch April 25, 2021 14:24
@theofficialgman
Copy link

theofficialgman commented May 20, 2021

posting this here just so that its documented in the correct place instead of here #9666 and can be referred to as necessary:

I tested both 14066 (the merge associated with this PR) and the previous version 14053 to determine there is significant performance regressions. The performance drop about 13% in the games I've tested (double dash, mario kart wii, and windwaker tested).

@JMC47
Copy link
Contributor

JMC47 commented May 20, 2021

Thank you for documenting it here.

It should be known that accurate floating point math is needed for replays in double dash!! and Mario Kart Wii to sync. The bomb bounces/physics also rely on floating point math in Wind Waker. For the games to behave correctly, these changes are necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
10 participants