Avoid using PDEP and PEXT on AMD Zen #8586
https://uops.info/html-instr/PEXT_R64_R64_R64.html confirms that PEXT is slow on Zen / Zen 2. PDEP is similarly slow, so we should make sure we don't emit either of them when we have a better alternative.
d0ac61e to 3c62714
For some unknown reason PDEP and PEXT are ridiculously slow on AMD Zen architecture.
This was causing severe slowdown in some games.
And this is actually just the best case. In the worst case, the latency is more than 289 cycles: https://twitter.com/uops_info/status/1202950247900684290
Hopefully this is the cause of users on AMD machines suddenly saying Dolphin is really slow in particular cases...
PDEP and PEXT latency issues are fixed on Zen 3. That said, since the family was bumped (Zen 3 is now family 25) and this code checks exclusively for family == 23, Zen 3 is already using the fast path for PDEP and PEXT. Basically, if anyone else spots that Zen 3 has fixed its PDEP and PEXT issues, don't worry, no changes are required. It might be wise to update the comment though, and mention that it only applies to Zen 1-2.
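For anyone following along, here is a rough sketch of what a family-based check like this could look like. It is not Dolphin's actual CPUDetect code: `DisplayFamily` and `HasSlowPdepPext` are hypothetical names, and the caller is assumed to have already read EAX from CPUID leaf 1 and compared the vendor string against "AuthenticAMD".

```cpp
#include <cstdint>

// Hypothetical helper, not taken from Dolphin's CPUDetect.
// Decodes the display family from the EAX value returned by CPUID leaf 1.
// The extended family field (bits 27:20) is only added when the base
// family field (bits 11:8) is 0xF.
constexpr std::uint32_t DisplayFamily(std::uint32_t cpuid_1_eax)
{
  return ((cpuid_1_eax >> 8) & 0xF) == 0xF ?
             ((cpuid_1_eax >> 8) & 0xF) + ((cpuid_1_eax >> 20) & 0xFF) :
             ((cpuid_1_eax >> 8) & 0xF);
}

// Hypothetical predicate: family 23 (0x17) covers Zen 1 and Zen 2, while
// Zen 3 reports family 25 (0x19) and therefore falls through to the fast path.
// is_amd is assumed to come from a separate vendor string check.
constexpr bool HasSlowPdepPext(bool is_amd, std::uint32_t cpuid_1_eax)
{
  return is_amd && DisplayFamily(cpuid_1_eax) == 23;
}
```

With an illustrative EAX value that encodes family 0x17, such as `0x00870F10`, `HasSlowPdepPext(true, ...)` is true; with one that encodes family 0x19, such as `0x00A20F10`, it is false, which is why an exact match on family 23 needs no change for Zen 3.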
I've been going over the CPUDetect code recently and realized that some dolphin jit code checks edit: oh nvm, realized the issue only impacts a subset of BMI2.
For some unknown reason PDEP and PEXT are ridiculously slow on AMD Zen architecture, which is making DoubleToSingle an extremely costly operation on these CPUs, enough to cause severe slowdown in games. Unsure whether the usage in VertexLoaderX64 had any noticeable impact, but it's also been changed as a precautionary measure.
This should fix https://bugs.dolphin-emu.org/issues/11964
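To illustrate why a "better alternative" exists at all: when the PEXT mask is a compile-time constant made of a few contiguous runs of bits, the extraction can be written as plain shifts and ANDs. The sketch below is not the code from this PR. The mask `0xC7FFFFFFE0000000` is an assumption, derived here from the IEEE 754 field positions for a double-to-single bit repack (sign plus top exponent bit, then the low 7 exponent bits and the top 23 mantissa bits), and both function names are hypothetical. `SoftwarePext` is only a bit-by-bit reference to compare against, not something to emit in a JIT.

```cpp
#include <cstdint>

// Reference bit-by-bit PEXT: gathers the bits of `value` selected by `mask`
// into the low bits of the result, lowest mask bit first.
constexpr std::uint64_t SoftwarePext(std::uint64_t value, std::uint64_t mask)
{
  std::uint64_t result = 0;
  for (std::uint64_t out_bit = 1; mask != 0; out_bit <<= 1)
  {
    const std::uint64_t lowest = mask & (~mask + 1);  // isolate lowest set mask bit
    if (value & lowest)
      result |= out_bit;
    mask &= mask - 1;  // clear that mask bit
  }
  return result;
}

// Assumed mask: bits 63..62 and bits 58..29 of the double's bit pattern.
constexpr std::uint64_t kAssumedMask = 0xC7FFFFFFE0000000ULL;

// Same extraction as PEXT with kAssumedMask, using two shifts and two ANDs:
// bits 63..62 move to 31..30, and bits 58..29 move to 29..0.
constexpr std::uint32_t RepackWithShifts(std::uint64_t double_bits)
{
  return static_cast<std::uint32_t>(((double_bits >> 32) & 0xC0000000u) |
                                    ((double_bits >> 29) & 0x3FFFFFFFu));
}
```

For example, the bit pattern of the double 1.0 (`0x3FF0000000000000`) repacks to `0x3F800000`, the bit pattern of the single 1.0, and both functions agree on it. This only covers values that are representable as normal singles; denormals and other special cases need separate handling.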