SPU LLVM: Partial revert for FM/FMA changes and other improvements #8338

Merged
merged 4 commits on Jun 6, 2020

Whatcookie
Member

@Whatcookie Whatcookie commented Jun 4, 2020

  • Fix a theoretical issue with FCGT optimizations
    There was a silly issue where FCGT could produce bad results if the constant 0x7f7fffff was used as an input, but it's unlikely that any game hit this case.
  • Partial revert for FM/FMA changes and other improvements
    Partially reverts the changes made in "SPU LLVM: Use clamping helpers for FMA32x4 and FM" (#8316).

Sadly, RDR and some other games were negatively affected by the changes in that PR. This PR reverts enough to make both RDR and LBP2 happy, but something else may still be regressed.

also:

  • Allow non-accurate/approx FMA family instructions to use native FMA
  • Minor optimization for FMA ops with a constant 0 multiply
  • Fixes an unreported regression with shadows in NCAA 14
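
For context on why allowing native FMA can change results: a fused multiply-add rounds once, while a separate multiply followed by an add rounds twice. The following is a minimal host-side sketch using standard C++ `std::fma`. It is not RPCS3 code, and the helper names `mul_then_add` and `fused` are hypothetical.

```cpp
#include <cmath>

// Hypothetical helpers for illustration only (not taken from RPCS3).

// Multiply, round the product to float, then add: two roundings.
inline float mul_then_add(float a, float b, float c)
{
    volatile float product = a * b; // volatile keeps the compiler from fusing the two steps
    return product + c;
}

// Fused multiply-add: the exact product a*b is added to c and rounded once.
inline float fused(float a, float b, float c)
{
    return std::fma(a, b, c);
}
```

For example, with a = b = 1 + 2^-23 and c = -(1 + 2^-22), the exact product is 1 + 2^-22 + 2^-46. Under round-to-nearest the two-step version loses the 2^-46 term and returns 0, while the fused version returns 2^-46. Differences of this size are the kind of thing that can matter to a game's SPU code.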

edit:

  • Added additional optimizations for FCGT
  • Added an optimization for FM when op.ra == op.rb

- Revert changes to FM and FMA instructions
- Allow non accurate/approx FMA family instructions to use native FMA
- Minor optimization for FMA ops with a constant 0 multiply
@@ -7355,7 +7355,7 @@ class spu_llvm_recompiler : public spu_recompiler_base, public cpu_translator
 {
     const u32 exponent = data._u32[i] & 0x7f800000u;

-    if (data._u32[i] > 0x7f7fffffu || !exponent)
+    if (data._u32[i] >= 0x7f7fffffu || !exponent)
Contributor

@elad335 elad335 Jun 4, 2020

Nothing was really wrong with this line; in fact, it could even be written as if (data._s32[i] < 0 || !exponent) as an optimization.

Member Author

@Whatcookie Whatcookie Jun 4, 2020

In theory there's nothing wrong with if (data._s32[i] < 0 || !exponent) but we produce "extended range" values differently than real hardware so it might not always be safe. If we had 100% accurate softfloat for all other instructions it'd be fine, but it's safer to match the output that the "non constant" path will produce I think.

Likewise, >= 0x7f7fffff vs > 0x7f7fffff is similar, and this one doesn't really matter, but for my peace of mind I'd rather match the "non constant" path behavior here too.

Member Author

Actually, I just remembered: I originally wrote it in a similar way (checking for any positive number that isn't 0), but there were issues in a lot of games. That was specifically because it produced different results for values larger than 0x7f7fffff; changing it to the current implementation fixed all of those issues.

Member Author

Added a new comment so it doesn't get forgotten in the future.
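
To make the three conditions discussed in this thread easier to compare, here is a minimal standalone sketch. Only the boolean expressions come from the diff and the review comments; the function names are hypothetical, `std::uint32_t` stands in for `u32`, the sign test is written as a bit test rather than a signed load, and what the recompiler does when the condition holds is not modelled.

```cpp
#include <cstdint>

// Hypothetical names; only the conditions themselves come from the thread.

// Condition before this PR.
constexpr bool cond_old(std::uint32_t bits)
{
    const std::uint32_t exponent = bits & 0x7f800000u;
    return bits > 0x7f7fffffu || !exponent;
}

// Condition after this PR: 0x7f7fffff itself now matches as well.
constexpr bool cond_new(std::uint32_t bits)
{
    const std::uint32_t exponent = bits & 0x7f800000u;
    return bits >= 0x7f7fffffu || !exponent;
}

// Sign-only variant suggested in review (equivalent to data._s32[i] < 0).
constexpr bool cond_sign_only(std::uint32_t bits)
{
    const std::uint32_t exponent = bits & 0x7f800000u;
    return (bits >> 31) != 0 || !exponent;
}

// The old and new checks disagree only at 0x7f7fffff.
static_assert(!cond_old(0x7f7fffffu) && cond_new(0x7f7fffffu), "boundary case");
// The sign-only variant disagrees for positive patterns above 0x7f7fffff.
static_assert(cond_new(0x7f800000u) && !cond_sign_only(0x7f800000u), "large positive");
```

Because the comparison is unsigned, every bit pattern with the sign bit set already satisfies `bits > 0x7f7fffffu`, so the sign-only variant differs from the merged check only for positive patterns from 0x7f7fffff through 0x7fffffff. That is the range the author reports as having caused problems.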

@connorwalks

Fixes exploding vertices introduced by #8316 in NCAA 14

@xddxd
Contributor

xddxd commented Jun 4, 2020

GTA IV needs accurate FM again in order to not fall out of the world.

@Whatcookie
Member Author

Pushed some more optimizations

@AluminumHaste

AluminumHaste commented Jun 15, 2020

This PR causes a regression in Ratchet & Clank Future: Tools of Destruction: you cannot complete the Voron Asteroid Belt run.
Builds from the PRs before this one work fine.

EDIT: [BCUS98127] Ratchet & Clank Future: Tools of Destruction
Ingame since 2018-09-24 (PR #5162), Wiki Page
Build Info
RPCS3 v0.0.10-10468-dcf5c06d Alpha | HEAD | FW 4.86 | Windows 10 1909
AMD Ryzen 9 3900X | 24 Threads | 15.93 GiB RAM | TSC: 3.800GHz | AVX+ | FMA3
GPU: AMD Radeon RX 5700 XT (20.3.1 - 20.5.1)
Per-game CPU Settings
PPU Decoder:    Recompiler/LLVM
SPU Decoder:    Recompiler/LLVM
SPU Lower Thread Priority:  [ ]
SPU Loop Detection:         [x]
Thread Scheduler:           [ ]
SPU Threads:               Auto
SPU Block Size:            Safe
Accurate xfloat:            [ ]
Force CPU Blit:             [ ]
Lib Mode:      Liblv2.sprx only
Per-game GPU Settings
Renderer:                Vulkan
Resolution:            1280x720
Resolution Scale:           300
Res Scale Threshold:         16
Anti-Aliasing:         Disabled
Anisotropic Filter:          16
RSX Buffers:                WCB
Shader Mode:              Async
ZCull:                     Full
Frame Limit:              VSync
Fatal Error
Thread terminated due to fatal error: Dead FIFO commands queue state has been detected!
Try increasing "Driver Wake-Up Delay" setting in Advanced settings.
(in file d:\a\1\s\rpcs3\Emu\RSX\RSXThread.cpp:2325)
Important Settings to Review
ℹ️ Enabling Thread Scheduler may or may not increase performance
Notes
⚠️ This RPCS3 build is 1 week old, please consider updating it
⚠️ To change custom configuration, Right-click on the game, then Configure
ℹ️ Game version: v1.00
ℹ️ Main hash: PPU-c14042df6304d3e420a9917e6f8e5fc05cc38b4c
Log from AluminumHaste | 205231765424570369
| Discord attachment | Parsed 100%

Crashlog: RPCS3.zip

Savegame: BCUS98127_SAVE_14.zip

For the savegame: once you load it, you have to get at least halfway through the level, right after the part where you control Clank on the rear turret.

@AluminumHaste

There's another crash (also in a flying mission, related?) just before going into the asteroid, same spot every time.

BCUS98127_SAVE_3.zip

@elad335
Contributor

elad335 commented Jun 17, 2020

Retest regressions with #8454

@AluminumHaste

AluminumHaste commented Jun 18, 2020

Okay, I would like to, but dumb question: how?
In SVN I can just check out a given build number; how do you do that with TortoiseGit? I looked in the repo browser, but I don't see #8454. I'm guessing that's because it hasn't been merged into master yet?

I found it

@illusion0001
Contributor

Download the azure artifacts...

@AluminumHaste

AluminumHaste commented Jun 18, 2020

> Download the azure artifacts...

Yeah.....I found it.....god this stuff makes me feel so stupid.

EDIT: testing now "RPCS3 v0.0.10-89d98ae3 Alpha | patch-8 | Firmware version: 4.86"

@AluminumHaste

Seems to be working: I got through the savegame twice without crashing.

@CoolEmuGuy

Regression in Dynasty Warriors: Gundam Reborn. The models now rotate endlessly when the LLVM Recompiler is selected and accurate xfloat is disabled.

@CoolEmuGuy

Is the regression issue being worked on?
