SPU LLVM: Use 512bit xorsum for SPU verification #16642
Merged
- Provides a 2-3% uplift in SPU limited titles
- Removes the full_width_avx512 option
- Adds a precise spu verification option, for debugging (config file only)
Use a 512-bit wide "xorsum" hash (even on machines with only 128-bit or 256-bit SIMD) in place of a full comparison for SPU verification. In theory, a hash collision is nearly impossible here. A human could craft two programs with matching hashes by hand pretty easily, but luckily for us, all PS3 programs were written a long time ago.
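To illustrate why a hand-made collision is easy, here is a minimal scalar sketch, not the code from this PR. It models the 512-bit accumulator as eight 64-bit lanes and XORs every 64-byte block into it. The names `Hash512`, `xorsum512` and `swapped_blocks_collide` are made up for the example. Because XOR is commutative, reordering blocks leaves the sum unchanged, so swapping two blocks gives a different program with the same hash.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// One 512-bit value, modelled as eight 64-bit lanes (hypothetical type).
using Hash512 = std::array<std::uint64_t, 8>;

// XOR all complete 64-byte blocks of `data` together, lane by lane.
// Trailing bytes that do not fill a whole block are ignored in this sketch.
Hash512 xorsum512(const std::uint8_t* data, std::size_t size)
{
    Hash512 acc{};
    for (std::size_t off = 0; off + 64 <= size; off += 64)
    {
        std::uint64_t lanes[8];
        std::memcpy(lanes, data + off, sizeof(lanes)); // safe unaligned load
        for (std::size_t i = 0; i < 8; i++)
            acc[i] ^= lanes[i];
    }
    return acc;
}

// Build a 3-block buffer, swap its first two blocks, and report whether
// the two different buffers still produce the same xorsum.
bool swapped_blocks_collide()
{
    std::vector<std::uint8_t> a(192);
    for (std::size_t i = 0; i < a.size(); i++)
        a[i] = static_cast<std::uint8_t>(i * 7 + 1);

    std::vector<std::uint8_t> b = a;
    std::swap_ranges(b.begin(), b.begin() + 64, b.begin() + 64);

    return a != b && xorsum512(a.data(), a.size()) == xorsum512(b.data(), b.size());
}
```

A real implementation would do the same accumulation with SIMD registers. The scalar lanes here only keep the example portable.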
With a truly random distribution of numbers, the chance of collision should be astronomically small (1 in 2^512?), but I'm not sure how that chance changes when the input is valid SPU opcodes rather than random data. Either way, I don't think that collisions are likely, and the performance uplift is real. Just to be safe, the xorsum path is only taken if there are at least 3 64-byte blocks to hash.
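The block-count guard can be sketched as follows. This is a portable illustration under stated assumptions, not the JIT-emitted code from this PR. `CachedProgram`, `verify`, `kMinBlocksForHash` and the `precise` flag are hypothetical names, and the flag's behaviour (forcing a full comparison) is an assumption about what the precise verification option does.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Hash512 = std::array<std::uint64_t, 8>; // eight 64-bit lanes = 512 bits

constexpr std::size_t kBlockSize = 64;       // bytes per hashed block
constexpr std::size_t kMinBlocksForHash = 3; // threshold named in the description

// Hypothetical record of a compiled SPU program's reference code.
struct CachedProgram
{
    const std::uint8_t* code; // reference copy of the program bytes
    std::size_t size;         // assumed to be a multiple of 64 in this sketch
    Hash512 hash;             // xorsum of `code`, computed once at compile time
};

// XOR every complete 64-byte block into a 512-bit accumulator.
Hash512 xorsum512(const std::uint8_t* data, std::size_t size)
{
    Hash512 acc{};
    for (std::size_t off = 0; off + kBlockSize <= size; off += kBlockSize)
    {
        std::uint64_t lanes[8];
        std::memcpy(lanes, data + off, sizeof(lanes));
        for (std::size_t i = 0; i < 8; i++)
            acc[i] ^= lanes[i];
    }
    return acc;
}

// Check that the live bytes still match the program that was compiled.
bool verify(const CachedProgram& ref, const std::uint8_t* live, bool precise)
{
    // Short programs (and the assumed precise mode) get a full comparison.
    if (precise || ref.size / kBlockSize < kMinBlocksForHash)
        return std::memcmp(ref.code, live, ref.size) == 0;

    // Otherwise compare one 512-bit hash instead of every byte.
    return xorsum512(live, ref.size) == ref.hash;
}
```

The hashed path still reads every live byte, but it never has to load the reference copy, and it ends in a single comparison.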
The full_width_avx512 option is also removed. Even on CPUs that suffer from severe AVX-512 downclocking, 512-bit wide SPU verification was never an issue, because it only uses simple bitwise instructions, which are very power efficient.
Before: (78.4 FPS)

After: (80.0 FPS)

And yes, the uplift is similar on both AVX2 and AVX-512 targets.