New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SPU LLVM: add AVX-512 SPU verification #10113
Conversation
rpcs3/rpcs3qt/settings_dialog.cpp
Outdated
@@ -202,6 +202,10 @@ settings_dialog::settings_dialog(std::shared_ptr<gui_settings> gui_settings, std | |||
m_emu_settings->EnhanceCheckBox(ui->accurateXFloat, emu_settings_type::AccurateXFloat); | |||
SubscribeTooltip(ui->accurateXFloat, tooltips.settings.accurate_xfloat); | |||
|
|||
m_emu_settings->EnhanceCheckBox(ui->fullWidthAVX512, emu_settings_type::FullWidthAVX512); | |||
SubscribeTooltip(ui->fullWidthAVX512, tooltips.settings.full_width_avx512); | |||
ui->fullWidthAVX512->setDisabled(!utils::has_avx512()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can just say setEnabled without the !.
But I guess that's preference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I didn't notice that there's a setEnabled. I was looking just looking at the LLVMdfma option. I'll just use that instead then
Hmm, do those new CPU have 512-bit bus? From my tests with asmjit (which has these paths disabled) it was significantly slower than its AVX2 counterpart. That's for skylake-x though, I don't know about newer processors. ASMJIT has those paths disabled, I guess you can enable one of them with this setting. |
I think both the newer cpus and skylake-x can do two 512bit loads per clock, but I believe they can only achieve that speed if the data is in the L1 cache.
ok. |
Also new code path for LLVM is almost the same as existing one, can you change existing to be more "variable" instead? In a sense of not hardcoding vector size (could be 128, 256, 512, maybe 1024 in future). |
Hmm, I'm not really sure on how to make the code more "variable" in a clean way. Is there something obvious I'm missing? |
Ah, ArrayRef constructor... I believe you can specify length manually as a second argument, just checked. |
- This is hidden behind a new setting, as some cpus may downclock agressively when executing 512 wide instructions
should be good now |
This PR adds a new setting, called "Full Width AVX-512". At the moment only 512 bit wide SPU verification is hidden behind this option, but in the future more code can be put behind it.
This code needs to be hidden behind a setting thanks to some cpus that downclock upon executing 512 bit wide AVX-512 code. While newer cpus like the 11900K don't experience any downclocking (see https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html), older cpus based on skylake-x will aggressively downclock, and the setting is better off for these cpus.
Users may also prefer this option disabled if they've overclocked their CPU with a negative AVX-512 offset.
This new setting is off by default and cannot be toggled if the users cpu doesn't support AVX-512.
For an example this reduces the code size of the largest .obj file in the mandelbrot homebrew from 29.5kb down to 27.1kb.