Avoid using PDEP and PEXT on AMD Zen #8586

Merged
merged 4 commits into from Jan 27, 2020

Techjar
Contributor

@Techjar Techjar commented Jan 26, 2020

For some unknown reason, PDEP and PEXT are ridiculously slow on the AMD Zen architecture, which makes DoubleToSingle an extremely costly operation on these CPUs, enough to cause severe slowdown in games. I'm unsure whether the usage in VertexLoaderX64 had any noticeable impact, but it has also been changed as a precautionary measure.

This should fix https://bugs.dolphin-emu.org/issues/11964
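
To illustrate what avoiding PEXT can look like, here is a minimal C++ sketch. It assumes the extraction mask is the constant 0xC7FFFFFFE0000000 (bits 63-62 and 58-29 of the double's bit pattern); that mask is an assumption made for illustration and is not taken from the PR diff, and the names `SoftPext`, `kAssumedMask`, and `PackDoubleBits` are hypothetical. Because such a mask is constant and consists of two contiguous runs of bits, two shift-and-mask steps can produce the same result as PEXT.

```cpp
#include <cstdint>

// Portable reference for what PEXT computes: gather the bits of `src`
// selected by `mask` into the low bits of the result, preserving order.
constexpr std::uint64_t SoftPext(std::uint64_t src, std::uint64_t mask)
{
  std::uint64_t result = 0;
  for (std::uint64_t out_bit = 1; mask != 0; out_bit <<= 1)
  {
    const std::uint64_t lowest = mask & (~mask + 1);  // lowest set bit of mask
    if (src & lowest)
      result |= out_bit;
    mask &= mask - 1;  // clear that bit and continue with the next one
  }
  return result;
}

// Assumed mask (illustrative only): bits 63-62 and bits 58-29.
constexpr std::uint64_t kAssumedMask = 0xC7FFFFFFE0000000ULL;

// The same extraction with plain shifts and ANDs, so no PEXT is required.
constexpr std::uint32_t PackDoubleBits(std::uint64_t x)
{
  return static_cast<std::uint32_t>(((x >> 32) & 0xC0000000ULL) |
                                    ((x >> 29) & 0x3FFFFFFFULL));
}
```

For the bit pattern of the double 1.0 (0x3FF0000000000000), both `SoftPext(x, kAssumedMask)` and `PackDoubleBits(x)` yield 0x3F800000. The generic loop is only a reference for checking the shift-based version; it is not meant as a fast path.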

@Techjar Techjar changed the title from "[TEST] Remove usage of PEXT from D2S ASM routine" to "Jit64: Don't use PEXT in DoubleToSingle on AMD Zen" Jan 27, 2020
@delroth
Member

delroth commented Jan 27, 2020

https://uops.info/html-instr/PEXT_R64_R64_R64.html confirms that PEXT is slow on Zen / Zen 2. PDEP is similarly slow, so we should make sure we don't emit these if we have a better alternative.

@Techjar Techjar force-pushed the d2s-no-pext branch 3 times, most recently from d0ac61e to 3c62714 January 27, 2020 02:50
@Techjar Techjar changed the title from "Jit64: Don't use PEXT in DoubleToSingle on AMD Zen" to "Avoid using PDEP and PEXT on AMD Zen" Jan 27, 2020
For some unknown reason PDEP and PEXT are ridiculously slow on AMD Zen
architecture.
This was causing severe slowdown in some games.
@Tilka Tilka merged commit f36c735 into dolphin-emu:master Jan 27, 2020
@andreas-abel

> https://uops.info/html-instr/PEXT_R64_R64_R64.html confirms the fact that PEXT is slow on Zen / Zen 2.

And this is actually just the best case. In the worst case, the latency is more than 289 cycles: https://twitter.com/uops_info/status/1202950247900684290

@JMC47
Contributor

JMC47 commented Jan 27, 2020

Hopefully this is the cause of users on AMD machines suddenly saying Dolphin is really slow in particular cases...

@MayImilae
Contributor

MayImilae commented Nov 8, 2020

The PDEP and PEXT latency issues are fixed on Zen 3. That said, since the family number was bumped (Zen 3 is now family 25) and this code checks exclusively for family == 23, Zen 3 is already using the fast path for PDEP and PEXT. Basically, if anyone else spots that Zen 3 has fixed its PDEP and PEXT issues, don't worry: no changes are required.

It might be wise to update the comment though, and mention it is only Zen 1-2.
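
The family check described above can be sketched roughly as follows. This is a hedged illustration, not Dolphin's actual CPUDetect code: `DisplayFamily` and `HasFastPdepPext` are hypothetical names, and the vendor and BMI2 flags are passed in as plain booleans rather than queried from the CPU. The field layout used for decoding (base family in bits 11-8 and extended family in bits 27-20 of the EAX value from CPUID leaf 1) is the standard one.

```cpp
#include <cstdint>

// Decode the display family from the EAX value returned by CPUID leaf 1.
// The extended family field is added only when the base family is 0xF.
constexpr unsigned DisplayFamily(std::uint32_t cpuid1_eax)
{
  const unsigned base = (cpuid1_eax >> 8) & 0xF;
  const unsigned extended = (cpuid1_eax >> 20) & 0xFF;
  return base == 0xF ? base + extended : base;
}

// Hypothetical helper: treat PDEP/PEXT as fast unless this is an AMD CPU
// of family 23 (17h), the only family the check singles out.
constexpr bool HasFastPdepPext(bool has_bmi2, bool is_amd, unsigned family)
{
  return has_bmi2 && !(is_amd && family == 23);
}
```

With a check of this shape, an AMD CPU reporting family 25 falls through to the fast path without any further change, which matches the behavior described in the comment above.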

@shuffle2
Contributor

shuffle2 commented Jul 19, 2022

I've been going over the CPUDetect code recently and realized that some Dolphin JIT code checks bBMI2 while other code checks bFastBMI2. Is that intentional (so that some paths on family 17h still use BMI2), or would it make sense to remove bFastBMI2 and force bBMI2 to false on family 17h?

edit: oh nvm, realized the issue only impacts a subset of BMI2.
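
One way to picture why two flags can coexist is the sketch below. It assumes a capability struct with the two flag names mentioned above and a hypothetical `CanEmit` helper and `Bmi2Op` enum; none of this is taken from Dolphin's source. Only PDEP and PEXT are gated on the "fast" flag, while the remaining BMI2 instructions depend on plain BMI2 support.

```cpp
// Hypothetical capability flags; the names mirror bBMI2 / bFastBMI2 from
// the discussion, but this struct is illustrative only.
struct CpuCaps
{
  bool bBMI2 = false;      // CPU supports the BMI2 instruction set
  bool bFastBMI2 = false;  // BMI2 is supported and PDEP/PEXT are not slow
};

// The instructions that make up the BMI2 extension.
enum class Bmi2Op { BZHI, MULX, PDEP, PEXT, RORX, SARX, SHLX, SHRX };

// Decide whether a code generator should emit the given instruction.
constexpr bool CanEmit(Bmi2Op op, const CpuCaps& caps)
{
  switch (op)
  {
  case Bmi2Op::PDEP:
  case Bmi2Op::PEXT:
    return caps.bFastBMI2;  // only the affected subset needs the fast flag
  default:
    return caps.bBMI2;      // the rest only need BMI2 to be present
  }
}
```

Under this sketch, a Zen 1-2 machine would have `bBMI2` set and `bFastBMI2` clear, so instructions such as SHLX or RORX would still be emitted while PDEP and PEXT would take the fallback path.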
