Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add various SPU instruction patterns #13897

Merged
merged 3 commits into from Feb 6, 2024
Merged

Conversation

RipleyTom
Copy link
Contributor

@RipleyTom RipleyTom commented May 20, 2023

  • Add accurate_re intrinsic
  • Add pattern for accurate_re * value = full divison
  • Add variants of fast division
  • Add variants of sqrt
  • Fix pattern recognition by using peek_through_bitcasts (@Nekotekina's fix)

Accurate has no shortcuts.

Patterns for approximate:

FREST + FI => fully accurate calculation except for denormals => results in spu_re(a)
FRSQEST + FI => fully accurate calculation except for denormals => results in spu_rsqrte(a)

FMA(FNMS(div <> spu_re(div), float_value) <> spu_re(div), spu_re(div)):
Results in 1/div which is guaranteed per spu doc to be within 1 ulp so we shortcut to direct 1/div (also within 1 ulp on modern cpus), this seems a safe shortcut. It is tested for 2 values, 1.0f and 1.00000011920928955078125f(1.0f with lowest fraction bit set). We may need to change the shortcut to use that specific value but I have yet to see anything where this is an issue => results in re_accurate(div)

FMA(FNMS(spu_rsqrte(src) <> FM(0.5 <> spu_rsqrte(src)), float_value) <> FM(0.5 <> spu_rsqrte(src)), src * spu_rsqrte(src)):
Results in fsqrt(fabs(src)), I doubt this one is accurate within 1ulp but games seems happy enough with a direct shortcut to fsqrt(abs(src))

Patterns for relaxed:

All the approximate patterns +
FREST + FI => execute cpu intrinsic for reciprocal(unsafe for multiplayer) => results in spu_re(a)
FRSQEST + FI => execute cpu intrinsic for reciprocal square root(unsafe for multiplayer) => results in spu_rsqrte(a)

FMA(FNMS(FM(diva<> spu_re(divb)), divb, diva) <> spu_re(divb), FM(diva<> spu_re(divb))):
Results in diva/divb, probably not accurate within 1ulp as some games don't like this shortcut.

FM(re_accurate(divb) <> diva):
Results in diva/divb, the difference between diva * (1/divb) and diva/divb appears too significant to not create issues in some games.

@Ordinary205
Copy link
Contributor

The Need for Speed Most Wanted audio is broken on this PR Build.
RPCS3.log.gz

@Megamouse Megamouse added the CPU label May 24, 2023
@jgt11
Copy link

jgt11 commented Jun 1, 2023

Testing needed?

@GitHubProUser67
Copy link

This Pull Request has a positive impact on PlayStation Home it seems :

image

Now the MLAA anti-aliasing works with UI elements while before, it was a complete black-screen.

return false;
};

if (check_accurate_reciprocal_pattern_for_float(1.0f))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this is the pattern that breaks stuff?

@elad335
Copy link
Contributor

elad335 commented Aug 6, 2023

I disabled one pattern, needs testing.

@Ordinary205
Copy link
Contributor

Same results, the NFS audio is still broken on the latest PR build.
RPCS3.log.gz

@RipleyTom
Copy link
Contributor Author

Rebased on Accurate FI PR and found the problematic shortcut, NFS now works.

@RipleyTom RipleyTom changed the title Add various SPU patterns [Testing needed] Add various SPU patterns Jan 25, 2024
@Linear524
Copy link

Can't wait for this to be merged into master branch ! ))
A lot of GT6 softlocks are gone with this PR ! (maybe all of them)

@Jonathan44062
Copy link

The PR breaks Resistance 2
Master:
image
PR:
image

RPCS3.log.gz

@RipleyTom
Copy link
Contributor Author

Moved the spu_re_accurate(b) * a = a/b patterns to relaxed as they were the ones creating issues in Resistance 2.

@Asinin3
Copy link
Contributor

Asinin3 commented Jan 27, 2024

This PR doesn't fix audio in Blur I'm afraid, it still needs Accurate Xfloat to stop audio from randomly disappearing completely and game stability. (tested latest commit only).

@RipleyTom RipleyTom marked this pull request as ready for review January 28, 2024 02:14
@RipleyTom
Copy link
Contributor Author

I couldn't find any regression on my games. Accurate now should be more accurate than it was, approximate may be very slightly slower but work with more games and relaxed should be faster than it was.

@elad335
Copy link
Contributor

elad335 commented Jan 28, 2024

Can you add description of the instruction patterns themselfs in the pr description. Such as FREST(x) => FI(x) = 1/x or sonething
So it would be easier to follow and review

@elad335 elad335 self-requested a review January 28, 2024 11:52
@elad335
Copy link
Contributor

elad335 commented Jan 28, 2024

1.00000011920928955078125f(1.0f with lowest fraction bit set)

If this is what it is I rather have a bits representation such as bit_cast<f32>(bit_cast<u32>(1.f) + 1) instead of relying on the compiler to round it correctly and being more intuiative to read.

if (std::get<0>(res))
return res;

res = match_expr(a, fm(fsplat<f32[4]>(0.5), MT));
Copy link
Contributor

@elad335 elad335 Feb 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the order doesnt matter, this change should be in llvm_mul struct not here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we're testing for intrinsics not for llvm_mul. And llvm_mul is not generated until the intrinsic has been evaluated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I will implement a mechanism for it then.

@elad335 elad335 merged commit 65d93c9 into RPCS3:master Feb 6, 2024
6 checks passed
@elad335 elad335 changed the title [Testing needed] Add various SPU patterns Add various SPU instruction patterns Feb 6, 2024
@cipherxof
Copy link
Contributor

This PR introduced a new deadlock in MGO (and presumably MGS4) when using relaxed xfloat.

It seems to be stemming from the FI instruction, and the deadlock happens in control_task.spu.task.

Previously MGO only needed approximate FM, FMA, and FNMS.

@GalaxyGaming2000
Copy link

Well this broke lbp2 brainy cakes level making story no longer beatable

@aikhalaf
Copy link

Well this broke lbp2 brainy cakes level making story no longer beatable

the level was already broken since february if im not wrong

@cipherxof
Copy link
Contributor

the level was already broken since february if im not wrong

This PR was merged in February.

@GalaxyGaming2000
Copy link

the level was already broken since february if im not wrong

well downgrading to the version before this fixes it and it was merged in feb so uh

@aikhalaf
Copy link

the level was already broken since february if im not wrong

well downgrading to the version before this fixes it and it was merged in feb so uh

lmao excuse my foolishness

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet