New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SPU LLVM: Partial revert for FM/FMA changes and other improvements #8338
Conversation
- Revert changes to FM and FMA instructions - Allow non accurate/approx FMA family instructions to use native FMA - Minor optimization for FMA ops with a constant 0 multiply
@@ -7355,7 +7355,7 @@ class spu_llvm_recompiler : public spu_recompiler_base, public cpu_translator | |||
{ | |||
const u32 exponent = data._u32[i] & 0x7f800000u; | |||
|
|||
if (data._u32[i] > 0x7f7fffffu || !exponent) | |||
if (data._u32[i] >= 0x7f7fffffu || !exponent) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing was really wrong in this line, in fact it can be even written as if (data._s32[i] < 0 || !exponent)
as an optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory there's nothing wrong with if (data._s32[i] < 0 || !exponent)
but we produce "extended range" values differently than real hardware so it might not always be safe. If we had 100% accurate softfloat for all other instructions it'd be fine, but it's safer to match the output that the "non constant" path will produce I think.
Likewise, >= 0x7f7fffff
vs > 0x7f7fffff
is similar, and this one doesn't really matter, but for my peace of mind I'd rather match the "non constant" path behavior here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I just remembered, I originally wrote it in a similar way (Checked for any positive number that isn't 0), but there were issues in a lot of games. It's specifically because it was producing different results for values larger than 0x7f7fffff
, changing it to the current implementation fixed all of those issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a new comment so it doesn't get forgotten in the future.
Fixes exploding vertices introduced by #8316 in NCAA 14 |
GTA IV needs accurate FM again in order to not fall out of the world. |
Pushed some more optimizations |
This PR causes regression in Ratchet and Clank Tools of Destruction, in which you cannot complete the Voron Asteroid Belt run. EDIT: [BCUS98127] Ratchet & Clank Future: Tools of Destruction Crashlog: RPCS3.zip Savegame: BCUS98127_SAVE_14.zip For the savegame, once you load it, you have to get at least halfway through the level, right after the port where you control Clank on the rear turret. |
There's another crash (also in a flying mission, related?) just before going into the asteroid, same spot every time. |
Retest regressions with #8454 |
I found it |
Download the azure artifacts... |
Yeah.....I found it.....god this stuff makes me feel so stupid. EDIT: testing now ""RPCS3 v0.0.10-89d98ae3 Alpha | patch-8 | Firmware version: 4.86"" |
Seems to be working, got through the savegame without crashing twice. |
Is the regression issue being worked on? |
There was a silly issue where FCGT could produce bad results if constant
0x7f7fffff
was used as an input, but it's unlikely that any game hit this casePartially reverts the changes made in SPU LLVM: Use clamping helpers for FMA32x4 and FM #8316
Sadly RDR and some other games were effected negatively by the changes in that PR. This PR reverts enough to make both RDR and LBP2 happy, but maybe something else is still regressed.
also:
edit:
op.ra
==op.rb