JitArm64: Set flush-to-zero/rounding mode and improve float/double conversion accuracy #9458
        (2 << 22),  // -inf
    };

    const u64 base = default_fpcr & ~(0b111 << 22);
Is it really OK to use a static const base value here? Is there some reason not to just read the register at write-time?
I think I did it this way just because the x64 equivalent doesn't read the register at write-time – I don't see any reason why reading the register wouldn't work. What would the problem be with using a static const base value?
Hypothetically, something else could change the bits that aren't being overridden here after process start, and then those settings would be lost. But I don't know if it's a problem.
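To make the two options in this thread concrete, here is a minimal sketch of the "read the register at write-time" variant, written as plain C++ rather than emitted JIT code. The names `ComputeFPCR`, `RMODE_TABLE`, `guest_rn` and `guest_ni` are hypothetical and do not come from the PR. The bit positions are the architectural ones (FPCR.RMode in bits 23:22, FPCR.FZ in bit 24), and the table order follows PowerPC's FPSCR.RN encoding (nearest, zero, +inf, -inf). Mapping the guest's non-IEEE bit to FZ is an assumption based on the PR title.

```cpp
#include <array>
#include <cstdint>

// AArch64 FPCR fields touched here: RMode (bits 23:22) and FZ (bit 24).
constexpr std::uint64_t FPCR_FZ = std::uint64_t{1} << 24;
constexpr std::uint64_t FPCR_OVERRIDDEN_BITS = std::uint64_t{0b111} << 22;

// Indexed by PowerPC FPSCR.RN; values use the AArch64 RMode encoding.
constexpr std::array<std::uint64_t, 4> RMODE_TABLE = {
    std::uint64_t{0} << 22,  // nearest
    std::uint64_t{3} << 22,  // zero
    std::uint64_t{1} << 22,  // +inf
    std::uint64_t{2} << 22,  // -inf
};

// Hypothetical helper: the caller passes in whatever FPCR holds right now,
// so bits outside 24:22 that changed after process start are preserved.
constexpr std::uint64_t ComputeFPCR(std::uint64_t current_fpcr, unsigned guest_rn,
                                    bool guest_ni)
{
  const std::uint64_t base = current_fpcr & ~FPCR_OVERRIDDEN_BITS;
  return base | RMODE_TABLE[guest_rn & 3] | (guest_ni ? FPCR_FZ : 0);
}
```

With a static const base, `current_fpcr` would instead be a value captured once at startup. The result is identical unless something else modifies the other FPCR bits later, which is the hypothetical raised above.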
    // (if enabled in FPSCR) like almost any float operation does. We accomplish this by adding 0.0,
    // which should be cheaper than FCVT 32 -> 64 followed by FCVT 64 -> 32.
    m_float_emit.MOVI(8, EncodeRegToSingle(V0), 0);
    m_float_emit.FADD(EncodeRegToSingle(VD), EncodeRegToSingle(VB), EncodeRegToSingle(V0));
This will turn -0.0 into 0.0. To avoid this, add -0.0 instead.
Thanks, I was not aware of that :)
Should be fixed now, assuming I've used MOVI correctly.
Exception if FPSCR.RMode is TowardsMinusInfinity, in which case you'd want to add +0.0 to prevent +0.0 from being converted to -0.0.
(Tests?)
Hmm... Maybe I should just go with the double FCVT then. Having to re-emit code when the rounding mode changes seems annoying, and branching based on the rounding mode is probably worse for the performance than double FCVT. This case hopefully won't be triggered often anyway.
From briefly grepping the codebase, it appears we don't currently adjust FPCR based on the guest FPSCR, which is slightly surprising.
That's what the first commit of this PR fixes.
Okay, I've pushed a new approach (as the last commit of this PR this time). It simply adds an additional check to the FMOV path so that we only use it if the register is store-safe.
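For reference, the signed-zero concern raised earlier in this thread can be reproduced on the host with standard C++. This is only an illustration of IEEE 754 addition under different rounding modes, not code from the PR. `AddInMode` and `PreservesZeroSign` are hypothetical names, and the sketch assumes the toolchain honors `std::fesetround` for the float additions (the `volatile` variables are there to keep the compiler from folding them).

```cpp
#include <cfenv>
#include <cmath>

// Adds a + b with the given host rounding mode active, then restores
// round-to-nearest. The volatile variables force the addition to happen
// at run time, between the two fesetround calls.
float AddInMode(int mode, float a, float b)
{
  std::fesetround(mode);
  volatile float va = a;
  volatile float vb = b;
  volatile float sum = va + vb;
  std::fesetround(FE_TONEAREST);
  return sum;
}

// True when "x + addend" keeps the sign of x for both +0.0 and -0.0.
bool PreservesZeroSign(int mode, float addend)
{
  return !std::signbit(AddInMode(mode, 0.0f, addend)) &&
         std::signbit(AddInMode(mode, -0.0f, addend));
}
```

Under round-to-nearest only an addend of -0.0 preserves both zeros, while under round-toward-negative-infinity only +0.0 does. That is why a single fixed addend cannot be correct for every guest rounding mode.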
Fixes https://bugs.dolphin-emu.org/issues/12388. Might also fix other games that have problems with float/paired instructions in JitArm64, but I haven't tested any.
This simplifies some of the following commits. It does require an extra register, but hey, we have 32 of them. Something I think would be nice to add to the register cache in the future is the ability to keep both the single and double version of a guest register in two different host registers when that is useful. That way, the extra register we write to here can be read by a later instruction, saving us from having to perform the same conversion again.
Preparation for following commits. This commit intentionally doesn't touch paired stores, since paired stores are supposed to flush to zero. (Consistent with Jit64.)
Needed because the next commit will make RW clobber flags.
Our old conversion approach became a lot more inaccurate when enabling flush-to-zero, to the point of obviously breaking games.
If we can prove that FCVT will provide a correct conversion, we can use FCVT. This makes the common case a bit faster and the less likely cases (unfortunately including zero, which FCVT actually can convert correctly) a bit slower.
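As an illustration of what "proving" a conversion safe might look like, here is a minimal sketch of an exponent-range test for a double that is about to be narrowed to single precision. It is an assumption made for illustration, not the check this PR emits: the thread does not show the actual condition, and the helper name `ExponentFitsNormalSingle` is hypothetical. It is consistent with the commit message in one respect: zero fails the test even though a plain conversion would handle it correctly.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical check: true when the double's biased exponent lies in the
// range that maps to a normal single (unbiased -126..127, i.e. biased
// 897..1150 in double format). Zero, denormals, infinities and NaNs all
// fail the test and would take a slower path.
bool ExponentFitsNormalSingle(double value)
{
  std::uint64_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  const std::uint64_t biased_exponent = (bits >> 52) & 0x7FF;
  return biased_exponent >= 897 && biased_exponent <= 1150;
}
```

A test of this shape needs only a couple of integer instructions, which fits the described trade-off of a faster common case and slower rare cases.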
I haven't observed this breaking any game, but it didn't match the behavior of the interpreter as far as I could tell from reading the code, in that denormals weren't being flushed.
This has been reviewed and has been sitting for another two weeks. It has a lot of important fixes for players on AArch64 devices, and the developer is active and can fix any minor regressions that sneak in.
Posting this here so that it's documented in the correct place instead of in #9666 and can be referred to as necessary: I tested both 14066 (the merge associated with this PR) and the previous version, 14053, and found a significant performance regression. Performance drops by about 13% in the games I tested (Double Dash, Mario Kart Wii, and Wind Waker).
Thank you for documenting it here. It should be known that accurate floating-point math is needed for replays in Double Dash!! and Mario Kart Wii to sync. The bomb bounces/physics in Wind Waker also rely on floating-point math. For the games to behave correctly, these changes are necessary.
Fixes https://bugs.dolphin-emu.org/issues/12388. Might also fix other games that have problems with float, paired, float loadstore, or paired loadstore instructions in JitArm64, but I haven't tested any.
Left to do:
- With this change, a white square shows up on the title screen of Sonic Colors
- Implement accurate single/double conversion
- Sonic is falling through grind rails now?
- fselx needs to handle RW clobbering flags