Add various SPU instruction patterns #13897
4091658 to 3a6a342
The Need for Speed Most Wanted audio is broken on this PR build.
Testing needed?
rpcs3/Emu/Cell/SPURecompiler.cpp
    return false;
};

if (check_accurate_reciprocal_pattern_for_float(1.0f))
Maybe this is the pattern that breaks stuff?
I disabled one pattern, needs testing.
Same results, the NFS audio is still broken on the latest PR build.
f065404 to debc9dc
Rebased on the Accurate FI PR and found the problematic shortcut, NFS now works.
debc9dc to 728e8eb
Can't wait for this to be merged into the master branch! ))
728e8eb to 1189088
Moved the spu_re_accurate(b) * a = a/b patterns to relaxed, as they were the ones creating issues in Resistance 2.
This PR doesn't fix audio in Blur, I'm afraid. It still needs Accurate Xfloat to stop audio from randomly disappearing completely and for game stability (tested latest commit only).
1189088 to 46100e1
I couldn't find any regressions in my games. Accurate should now be more accurate than it was, approximate may be very slightly slower but should work with more games, and relaxed should be faster than it was.
Can you add a description of the instruction patterns themselves to the PR description? Something like FREST(x) => FI(x) = 1/x.
If this is what it is, I'd rather have a bits representation such as
46100e1 to 8b16917
if (std::get<0>(res))
    return res;

res = match_expr(a, fm(fsplat<f32[4]>(0.5), MT));
Since the order doesn't matter, this change should be in the llvm_mul struct, not here.
But we're testing for intrinsics, not for llvm_mul. And llvm_mul is not generated until the intrinsic has been evaluated.
Okay, I will implement a mechanism for it then.
This PR introduced a new deadlock in MGO (and presumably MGS4) when using relaxed xfloat. It seems to be stemming from the FI instruction, and the deadlock happens in Previously MGO only needed approximate FM, FMA, and FNMS.
Well, this broke the LBP2 brainy cakes level, making the story no longer beatable.
The level was already broken since February, if I'm not wrong.
This PR was merged in February.
Well, downgrading to the version before this fixes it, and it was merged in Feb, so uh
lmao excuse my foolishness
Accurate has no shortcuts.
Patterns for approximate:
FREST + FI => fully accurate calculation except for denormals => results in spu_re(a)
FRSQEST + FI => fully accurate calculation except for denormals => results in spu_rsqrte(a)
FMA(FNMS(div <> spu_re(div), float_value) <> spu_re(div), spu_re(div)):
Results in 1/div, which the SPU documentation guarantees to be within 1 ulp, so we shortcut to a direct 1/div (also within 1 ulp on modern CPUs). This seems a safe shortcut. The pattern is tested for 2 values of float_value: 1.0f and 1.00000011920928955078125f (1.0f with the lowest fraction bit set). We may need to change the shortcut to use that specific value, but I have yet to see anything where this is an issue => results in re_accurate(div)
FMA(FNMS(spu_rsqrte(src) <> FM(0.5 <> spu_rsqrte(src)), float_value) <> FM(0.5 <> spu_rsqrte(src)), src * spu_rsqrte(src)):
Results in fsqrt(fabs(src)). I doubt this one is accurate within 1 ulp, but games seem happy enough with a direct shortcut to fsqrt(fabs(src)).
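To illustrate why the reciprocal pattern above is treated as 1/div, here is a minimal scalar sketch of the refinement step it matches. Everything in it is an assumption made for illustration: `spu_fnms`, `spu_fma`, `rough_re`, `refined_re` and `rel_error` are hypothetical helpers, not RPCS3 functions, and `rough_re` only imitates a low-precision FREST + FI estimate by truncating the mantissa of an exact reciprocal.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical scalar stand-ins for the SPU instructions named in the pattern.
static float spu_fnms(float a, float b, float c) { return c - a * b; } // FNMS: c - a*b
static float spu_fma(float a, float b, float c) { return a * b + c; }  // FMA:  a*b + c

// Assumed stand-in for the FREST + FI estimate: take 1/x and clear the
// low 11 mantissa bits, leaving roughly 12 good bits.
static float rough_re(float x)
{
    float r = 1.0f / x;
    std::uint32_t bits;
    std::memcpy(&bits, &r, sizeof(bits));
    bits &= 0xFFFFF800u;
    std::memcpy(&r, &bits, sizeof(r));
    return r;
}

// The matched shape: FMA(FNMS(div, re, c), re, re) = re + re * (c - div * re).
// With c = 1.0f this is one Newton-Raphson step towards 1/div.
static float refined_re(float div, float c)
{
    const float re = rough_re(div);
    return spu_fma(spu_fnms(div, re, c), re, re);
}

// Relative error against 1/div evaluated in double precision.
static double rel_error(float approx, float div)
{
    const double exact = 1.0 / static_cast<double>(div);
    return std::fabs(static_cast<double>(approx) - exact) / exact;
}
```

Comparing `rel_error(rough_re(3.0f), 3.0f)` with `rel_error(refined_re(3.0f, 1.0f), 3.0f)` shows the step pulling the estimate close to a direct division. The second tested constant is the float right after 1.0f, i.e. `std::nextafter(1.0f, 2.0f)`.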
Patterns for relaxed:
All the approximate patterns +
FREST + FI => execute cpu intrinsic for reciprocal (unsafe for multiplayer) => results in spu_re(a)
FRSQEST + FI => execute cpu intrinsic for reciprocal square root (unsafe for multiplayer) => results in spu_rsqrte(a)
FMA(FNMS(FM(diva <> spu_re(divb)), divb, diva) <> spu_re(divb), FM(diva <> spu_re(divb))):
Results in diva/divb, probably not accurate within 1 ulp, as some games don't like this shortcut.
FM(re_accurate(divb) <> diva):
Results in diva/divb. The difference between diva * (1/divb) and diva/divb appears too significant not to create issues in some games.
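The gap between diva * (1/divb) and diva/divb can be seen with plain IEEE-754 floats. The sketch below is only an illustration under that assumption (round-to-nearest single precision, no fast-math). `div_direct`, `div_via_reciprocal` and `ulp_distance` are hypothetical helpers, not code from this PR.

```cpp
#include <cstdint>
#include <cstring>

// Direct division: one correctly rounded operation.
static float div_direct(float a, float b) { return a / b; }

// Division via reciprocal: two roundings (1/b, then the product).
static float div_via_reciprocal(float a, float b)
{
    // volatile makes sure the reciprocal is rounded to float before the multiply
    volatile float re = 1.0f / b;
    return a * re;
}

// Number of representable floats between two finite values of the same sign.
static std::int64_t ulp_distance(float x, float y)
{
    std::int32_t ix, iy;
    std::memcpy(&ix, &x, sizeof(ix));
    std::memcpy(&iy, &y, sizeof(iy));
    const std::int64_t d = static_cast<std::int64_t>(ix) - static_cast<std::int64_t>(iy);
    return d < 0 ? -d : d;
}
```

For a = 5.0f and b = 3.0f the two paths land on adjacent floats: 1.0f / 3.0f rounds up, and that excess pushes the product one float above 5.0f / 3.0f. For exactly representable cases such as 8.0f and 2.0f both paths agree.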