Add various SPU instruction patterns #13897

RipleyTom · 2023-05-20T15:10:51Z

Add accurate_re intrinsic
Add pattern for accurate_re * value = full division
Add variants of fast division
Add variants of sqrt
Fix pattern recognition by using peek_through_bitcasts (@Nekotekina's fix)

Accurate has no shortcuts.

Patterns for approximate:

FREST + FI => fully accurate calculation except for denormals => results in spu_re(a)
FRSQEST + FI => fully accurate calculation except for denormals => results in spu_rsqrte(a)

FMA(FNMS(div <> spu_re(div), float_value) <> spu_re(div), spu_re(div)):
Results in 1/div, which the SPU documentation guarantees to be within 1 ulp, so we shortcut to a direct 1/div (also within 1 ulp on modern CPUs); this seems a safe shortcut. The pattern is tested for two values of float_value: 1.0f and 1.00000011920928955078125f (1.0f with the lowest fraction bit set). We may need to change the shortcut to use that specific value, but I have yet to see anything where this is an issue => results in re_accurate(div)
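
The reciprocal pattern above has the shape of one Newton-Raphson step applied to an estimate. A minimal scalar sketch follows, assuming FNMS(a, b, c) computes c - a * b, FMA(a, b, c) computes a * b + c, and float_value is 1.0f. `rough_reciprocal` is a hypothetical stand-in for the FREST + FI estimate (it is not the SPU lookup table), and the host's round-to-nearest `std::fma` replaces the SPU's own rounding, so this only illustrates why the refined value lands next to a direct 1/div.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for FREST + FI: the host reciprocal with its low
// 8 mantissa bits cleared, leaving an estimate good to roughly 15 bits.
float rough_reciprocal(float x)
{
    float r = 1.0f / x;
    std::uint32_t bits;
    std::memcpy(&bits, &r, sizeof(bits));
    bits &= 0xFFFFFF00u;
    std::memcpy(&r, &bits, sizeof(r));
    return r;
}

// Assumed semantics: FNMS(a, b, c) = c - a * b, FMA(a, b, c) = a * b + c.
float fnms(float a, float b, float c) { return std::fma(-a, b, c); }
float fma32(float a, float b, float c) { return std::fma(a, b, c); }

// FMA(FNMS(div, est, 1.0f), est, est) = est + est * (1 - div * est)
float refined_reciprocal(float div)
{
    const float est = rough_reciprocal(div);
    return fma32(fnms(div, est, 1.0f), est, est);
}

// Distance in units in the last place between two finite floats of the same sign.
std::int32_t ulp_distance(float a, float b)
{
    std::int32_t ia, ib;
    std::memcpy(&ia, &a, sizeof(ia));
    std::memcpy(&ib, &b, sizeof(ib));
    return std::abs(ia - ib);
}
```

On an IEEE 754 host, `ulp_distance(refined_reciprocal(3.0f), 1.0f / 3.0f)` stays at or below 1, while the unrefined `rough_reciprocal(3.0f)` is much further from `1.0f / 3.0f`.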

FMA(FNMS(spu_rsqrte(src) <> FM(0.5 <> spu_rsqrte(src)), float_value) <> FM(0.5 <> spu_rsqrte(src)), src * spu_rsqrte(src)):
Results in fsqrt(fabs(src)). I doubt this one is accurate within 1 ulp, but games seem happy enough with a direct shortcut to fsqrt(abs(src))

Patterns for relaxed:

All the approximate patterns +
FREST + FI => execute cpu intrinsic for reciprocal (unsafe for multiplayer) => results in spu_re(a)
FRSQEST + FI => execute cpu intrinsic for reciprocal square root (unsafe for multiplayer) => results in spu_rsqrte(a)

FMA(FNMS(FM(diva <> spu_re(divb)), divb, diva) <> spu_re(divb), FM(diva <> spu_re(divb))):
Results in diva/divb, probably not accurate within 1 ulp, as some games don't like this shortcut.

FM(re_accurate(divb) <> diva):
Results in diva/divb. The difference between diva * (1/divb) and diva/divb appears too significant to not create issues in some games.
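
The two relaxed division shapes can be compared on a host CPU. The sketch below assumes IEEE 754 binary32 with round-to-nearest and the semantics FNMS(a, b, c) = c - a * b and FMA(a, b, c) = a * b + c; the function names are hypothetical, and `1.0f / divb` stands in for spu_re / re_accurate. It says nothing about what SPU hardware returns, only that multiplying by a reciprocal and dividing directly are not the same operation.

```cpp
#include <cmath>

// diva * (1 / divb): the FM(re_accurate(divb), diva) shape.
float mul_by_reciprocal(float diva, float divb)
{
    const float re = 1.0f / divb; // stand-in for re_accurate(divb)
    return diva * re;
}

// FMA(FNMS(q, divb, diva), re, q) with q = FM(diva, re), i.e.
// q + re * (diva - q * divb), assuming FNMS(a, b, c) = c - a * b.
float refined_quotient(float diva, float divb)
{
    const float re = 1.0f / divb; // stand-in for spu_re(divb)
    const float q = diva * re;
    const float residual = std::fma(-q, divb, diva);
    return std::fma(residual, re, q);
}
```

With diva = 5.0f and divb = 3.0f, `mul_by_reciprocal` lands one ulp above `5.0f / 3.0f`, while `refined_quotient` matches the direct divide for that input.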

Ordinary205 · 2023-05-20T15:53:34Z

The Need for Speed Most Wanted audio is broken on this PR Build.
RPCS3.log.gz

jgt11 · 2023-06-01T16:59:29Z

Testing needed?

GitHubProUser67 · 2023-06-11T13:48:21Z

This pull request seems to have a positive impact on PlayStation Home:

The MLAA anti-aliasing now works with UI elements, whereas before it produced a completely black screen.

elad335 · 2023-07-15T17:01:47Z

rpcs3/Emu/Cell/SPURecompiler.cpp

+			return false;
+		};
+
+		if (check_accurate_reciprocal_pattern_for_float(1.0f))


maybe this is the pattern that breaks stuff?

elad335 · 2023-08-06T08:54:52Z

I disabled one pattern, needs testing.

Ordinary205 · 2023-08-06T09:30:09Z

Same results, the NFS audio is still broken on the latest PR build.
RPCS3.log.gz

RipleyTom · 2024-01-25T00:33:33Z

Rebased on Accurate FI PR and found the problematic shortcut, NFS now works.

Linear524 · 2024-01-25T02:03:55Z

Can't wait for this to be merged into master branch ! ))
A lot of GT6 softlocks are gone with this PR ! (maybe all of them)

Jonathan44062 · 2024-01-25T03:33:15Z

The PR breaks Resistance 2
Master:

PR:

RPCS3.log.gz

RipleyTom · 2024-01-26T08:21:36Z

Moved the spu_re_accurate(b) * a = a/b patterns to relaxed as they were the ones creating issues in Resistance 2.

Asinin3 · 2024-01-27T02:44:51Z

This PR doesn't fix audio in Blur, I'm afraid. It still needs Accurate Xfloat to stop audio from randomly disappearing completely and for game stability (tested on the latest commit only).

RipleyTom · 2024-01-28T02:20:00Z

I couldn't find any regression on my games. Accurate now should be more accurate than it was, approximate may be very slightly slower but work with more games and relaxed should be faster than it was.

elad335 · 2024-01-28T11:19:13Z

Can you add a description of the instruction patterns themselves to the PR description, such as FREST(x) => FI(x) = 1/x or something?
That would make it easier to follow and review.

elad335 · 2024-01-28T19:30:53Z

> 1.00000011920928955078125f(1.0f with lowest fraction bit set)

If this is what it is, I'd rather have a bit representation such as bit_cast<f32>(bit_cast<u32>(1.f) + 1) instead of relying on the compiler to round it correctly; it is also more intuitive to read.
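
A minimal sketch of that bit-level spelling, for illustration only: `bits_as` is a hypothetical memcpy-based stand-in for the codebase's bit_cast (C++20's `std::bit_cast` behaves the same way), and plain `float` / `std::uint32_t` are used in place of f32 / u32.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret the bytes of one type as another of equal size.
template <typename To, typename From>
To bits_as(const From& from)
{
    static_assert(sizeof(To) == sizeof(From), "size mismatch");
    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}

// 1.0f with the lowest fraction bit set, built from its bit pattern.
float one_plus_one_ulp()
{
    return bits_as<float>(bits_as<std::uint32_t>(1.0f) + 1u);
}
```

The result has the bit pattern 0x3F800001 and compares equal to the decimal literal 1.00000011920928955078125f, which is exactly 1 + 2^-23.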

elad335 · 2024-02-06T08:33:47Z

rpcs3/Emu/Cell/SPULLVMRecompiler.cpp

+				if (std::get<0>(res))
+					return res;
+
+				res = match_expr(a, fm(fsplat<f32[4]>(0.5), MT));


Since the order doesn't matter, this change should be in the llvm_mul struct, not here.

RipleyTom: But we're testing for intrinsics, not for llvm_mul. And llvm_mul is not generated until the intrinsic has been evaluated.

elad335: Okay, I will implement a mechanism for it then.

cipherxof · 2024-04-19T02:01:23Z

This PR introduced a new deadlock in MGO (and presumably MGS4) when using relaxed xfloat.

It seems to be stemming from the FI instruction, and the deadlock happens in control_task.spu.task.

Previously MGO only needed approximate FM, FMA, and FNMS.

GalaxyGaming2000 · 2024-04-26T16:10:10Z

Well this broke lbp2 brainy cakes level making story no longer beatable

aikhalaf · 2024-04-26T16:59:43Z

> Well this broke lbp2 brainy cakes level making story no longer beatable

the level was already broken since february if im not wrong

cipherxof · 2024-04-26T18:23:25Z

> the level was already broken since february if im not wrong

This PR was merged in February.

GalaxyGaming2000 · 2024-04-27T02:52:18Z

> the level was already broken since february if im not wrong

well downgrading to the version before this fixes it and it was merged in feb so uh

aikhalaf · 2024-04-27T06:02:10Z

> > the level was already broken since february if im not wrong

> well downgrading to the version before this fixes it and it was merged in feb so uh

lmao excuse my foolishness

RipleyTom force-pushed the new_spu_patterns branch from 4091658 to 3a6a342 Compare May 20, 2023 15:11

Megamouse added the CPU label May 24, 2023

elad335 reviewed Jul 15, 2023

RipleyTom force-pushed the new_spu_patterns branch from f065404 to debc9dc Compare January 25, 2024 00:32

RipleyTom mentioned this pull request Jan 25, 2024

[performance tests needed] Accurate FI #15089

Closed

RipleyTom changed the title ~~Add various SPU patterns~~ [Testing needed] Add various SPU patterns Jan 25, 2024

RipleyTom force-pushed the new_spu_patterns branch from debc9dc to 728e8eb Compare January 25, 2024 00:54

RipleyTom force-pushed the new_spu_patterns branch from 728e8eb to 1189088 Compare January 26, 2024 08:19

RipleyTom mentioned this pull request Jan 27, 2024

[Regression] MGS4 Grey Screen bug #15112

Closed

Accurate FI

db5df32

RipleyTom force-pushed the new_spu_patterns branch from 1189088 to 46100e1 Compare January 28, 2024 02:14

RipleyTom marked this pull request as ready for review January 28, 2024 02:14

elad335 self-requested a review January 28, 2024 11:52

Add various SPU patterns

8b16917

RipleyTom force-pushed the new_spu_patterns branch from 46100e1 to 8b16917 Compare January 28, 2024 19:58

elad335 reviewed Feb 6, 2024

Merge branch 'master' into new_spu_patterns

f37a3df

elad335 merged commit 65d93c9 into RPCS3:master Feb 6, 2024
6 checks passed

elad335 changed the title ~~[Testing needed] Add various SPU patterns~~ Add various SPU instruction patterns Feb 6, 2024

JimScript mentioned this pull request Feb 11, 2024

[Regression] Misoriented effects with Xfloat Approximate in Ratchet & Clank: Into the Nexus after #13897 #15186

Closed

uhwot mentioned this pull request Feb 11, 2024

[Regression] NPEA00324 LittleBigPlanet 2, Gas doesn't disappear (#13897) #15178

Closed
