Accurate frest & frsqest #15079

Merged
merged 2 commits into from Jan 23, 2024

RipleyTom
Contributor

frest and frsqest should now match the PS3 results 1:1.
This could fix odd graphical issues caused by SPU inaccuracy.

@RipleyTom RipleyTom force-pushed the frest_frsqest_accurate branch 2 times, most recently from 96b369c to df6c8f7, on January 22, 2024 01:57
const auto eval_sign = eval(extract(a_sign, i));

value_t<u32> r_fraction = load_const<u32>(m_spu_frest_fraction_lut, eval_fraction);
value_t<u32> r_exponent = load_const<u32>(m_spu_frest_exponent_lut, eval_exponent);
Contributor

what happened to the gather version?

Contributor Author

I looked into it a bit more. The only intrinsic that would make sense here is llvm.masked.gather (Intrinsic::masked_gather), and it is pretty much what I already do, except that it requires a vector of pointers as a starting point (see https://llvm.org/docs/LangRef.html#llvm-masked-gather-intrinsics ). Of note, the LangRef describes what the intrinsic is equivalent to with an all-true mask, which is very similar to what I already do (except that I do the pointer calculations in the middle).
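To illustrate the equivalence described above, here is a minimal plain C++ sketch (not the LLVM IR code from this PR). It compares an all-true gather over a vector of pointers with a per-lane table lookup that does the pointer arithmetic inside the loop. The names `gather_all_true` and `lookup_per_lane`, the lane count, and any table contents passed in are hypothetical and chosen only for illustration; the real LUTs live in RPCS3.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 4; // assumption: four 32-bit lanes per register

// Reference behaviour of a gather whose mask is all true:
// every result lane is a plain load through the matching pointer.
std::array<std::uint32_t, kLanes> gather_all_true(
    const std::array<const std::uint32_t*, kLanes>& ptrs)
{
    std::array<std::uint32_t, kLanes> out{};
    for (std::size_t i = 0; i < kLanes; i++)
    {
        assert(ptrs[i] != nullptr);
        out[i] = *ptrs[i];
    }
    return out;
}

// Per-lane variant: extract the index, compute base + index, load, insert.
// The pointer calculation happens "in the middle" instead of up front.
template <std::size_t N>
std::array<std::uint32_t, kLanes> lookup_per_lane(
    const std::array<std::uint32_t, N>& lut,
    const std::array<std::uint32_t, kLanes>& idx)
{
    std::array<std::uint32_t, kLanes> out{};
    for (std::size_t i = 0; i < kLanes; i++)
    {
        assert(idx[i] < N); // the index must stay inside the table
        out[i] = *(lut.data() + idx[i]);
    }
    return out;
}
```

Both functions return the same lanes whenever `ptrs[i] == lut.data() + idx[i]`; the only difference is where the address computation takes place.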

Contributor

So what's preventing this solution from being used here? You can load the base pointers into a 512-bit vector, add the index, and load.

Contributor

If the number of pointers (8) is the problem, you can try forcing the upper 4 indices to be equal to the lowest index so that only 4 memory locations are accessed.
Whether this benefits performance should be tested.
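The suggestion above can be sketched in plain C++ as follows. This is only an illustration of the index trick, not LLVM IR and not code from this PR: eight 64-bit pointers fill a 512-bit vector, the four unused upper lanes reuse the lowest index, and the pointer vector is built as base plus index. The helper names `widen_indices`, `make_pointers`, and `distinct_addresses` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

constexpr std::size_t kUsedLanes = 4;   // lanes that carry real indices
constexpr std::size_t kGatherLanes = 8; // eight 64-bit pointers = 512 bits

// Widen four indices to eight by copying the lowest index (lane 0)
// into the four upper lanes that carry no real data.
std::array<std::uint32_t, kGatherLanes> widen_indices(
    const std::array<std::uint32_t, kUsedLanes>& idx)
{
    std::array<std::uint32_t, kGatherLanes> wide{};
    for (std::size_t i = 0; i < kGatherLanes; i++)
        wide[i] = i < kUsedLanes ? idx[i] : idx[0];
    return wide;
}

// Build the pointer vector: the same base in every lane plus the lane's index.
template <std::size_t N>
std::array<const std::uint32_t*, kGatherLanes> make_pointers(
    const std::array<std::uint32_t, N>& lut,
    const std::array<std::uint32_t, kGatherLanes>& idx)
{
    std::array<const std::uint32_t*, kGatherLanes> ptrs{};
    for (std::size_t i = 0; i < kGatherLanes; i++)
    {
        assert(idx[i] < N); // the index must stay inside the table
        ptrs[i] = lut.data() + idx[i];
    }
    return ptrs;
}

// Count how many different memory locations the gather would touch.
std::size_t distinct_addresses(
    const std::array<const std::uint32_t*, kGatherLanes>& ptrs)
{
    return std::set<const std::uint32_t*>(ptrs.begin(), ptrs.end()).size();
}
```

With four different input indices, the widened pointer vector still refers to at most four distinct addresses. Whether a real 8-lane gather built this way is faster than four scalar loads is exactly the open question raised in the comment and would need measuring.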

Contributor Author

@RipleyTom RipleyTom Jan 23, 2024

I tried Intrinsic::vp_gather, which seemed just as good (though it returns an array of u64), but I can't get it to work: it crashes during IR production while parsing the parameters of the CreateCall. If someone thinks they can make it work, they're welcome to try.

rpcs3/Emu/Cell/SPUThread.h (outdated, resolved)
@Megamouse Megamouse added the CPU label Jan 22, 2024
@Asinin3
Contributor

Asinin3 commented Jan 22, 2024

Could this also fix physics issues, e.g. falling through the ground in GTA IV, without Accurate XFloat?

@MarioSonic2987
Contributor

It doesn't fix broken collision in GTA IV without using Accurate XFloat:
image

@JimScript

inFamous has weird shadow striping with Accurate XFloat in this PR:

inFamousSPUStripes.mp4

It doesn't happen with Approximate, though I don't know whether this PR fixes the gameplay issues the game has without Accurate.

@Ordinary205
Contributor

After testing this PR, audio in NFS Most Wanted and The Run doesn't break when using Approximate XFloat.
NFS Most Wanted working
RPCS3.log.gz

@Ordinary205
Contributor

Watch Dogs also works fine.
Watch Dogs working
RPCS3.log.gz

@RipleyTom RipleyTom marked this pull request as ready for review January 22, 2024 18:19
@JimScript

inFamous still has the Accurate XFloat bug, and it also seems to affect ASMJIT. Here are some logs for both:
RPCS3_inFamous_SPU_Accurate_Stripes.log
RPCS3_inFamous_SPU_ASMJIT_Stripes.log

@Ordinary205
Contributor

However, testing NFS Most Wanted with the ASMJIT recompiler shows an audio regression (loud, broken noises) that doesn't happen on master builds.
RPCS3.log.gz
The LLVM recompiler works fine.

@RipleyTom
Contributor Author

Yes, I just noticed that ASMJIT doesn't implement FI at all, and the results of frest/frsqest rely on it.

@RipleyTom RipleyTom force-pushed the frest_frsqest_accurate branch 2 times, most recently from db251fd to 998ee39, on January 23, 2024 04:22
@RipleyTom
Contributor Author

Added an FI implementation for ASMJIT and removed the special case for accurate xfloat in FI, which just forwarded the value.

@JimScript

The inFamous striping issue has been resolved.

@elad335 elad335 merged commit d33955c into RPCS3:master Jan 23, 2024
6 checks passed
@RipleyTom RipleyTom deleted the frest_frsqest_accurate branch January 26, 2024 02:48
7 participants