Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPU LLVM: add AVX-512 SPU verification #10113

Merged
merged 1 commit into from Apr 16, 2021
Merged

Conversation

Whatcookie
Copy link
Member

This PR adds a new setting, called "Full Width AVX-512". At the moment only 512 bit wide SPU verification is hidden behind this option, but in the future more code can be put behind it.

This code needs to be hidden behind a setting thanks to some cpus that downclock upon executing 512 bit wide AVX-512 code. While newer cpus like the 11900K don't experience any downclocking (see https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html), older cpus based on skylake-x will aggressively downclock, and the setting is better off for these cpus.

Users may also prefer this option disabled if they've overclocked their CPU with a negative AVX-512 offset.

This new setting is off by default and cannot be toggled if the users cpu doesn't support AVX-512.

For an example this reduces the code size of the largest .obj file in the mandelbrot homebrew from 29.5kb down to 27.1kb.

@@ -202,6 +202,10 @@ settings_dialog::settings_dialog(std::shared_ptr<gui_settings> gui_settings, std
m_emu_settings->EnhanceCheckBox(ui->accurateXFloat, emu_settings_type::AccurateXFloat);
SubscribeTooltip(ui->accurateXFloat, tooltips.settings.accurate_xfloat);

m_emu_settings->EnhanceCheckBox(ui->fullWidthAVX512, emu_settings_type::FullWidthAVX512);
SubscribeTooltip(ui->fullWidthAVX512, tooltips.settings.full_width_avx512);
ui->fullWidthAVX512->setDisabled(!utils::has_avx512());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can just say setEnabled without the !.
But I guess that's preference

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I didn't notice that there's a setEnabled. I was looking just looking at the LLVMdfma option. I'll just use that instead then

@Nekotekina
Copy link
Member

Nekotekina commented Apr 14, 2021

Hmm, do those new CPU have 512-bit bus? From my tests with asmjit (which has these paths disabled) it was significantly slower than its AVX2 counterpart. That's for skylake-x though, I don't know about newer processors.

ASMJIT has those paths disabled, I guess you can enable one of them with this setting.

@Whatcookie
Copy link
Member Author

I think both the newer cpus and skylake-x can do two 512bit loads per clock, but I believe they can only achieve that speed if the data is in the L1 cache.

ASMJIT has those paths disabled, I guess you can enable one of them with this setting.

ok.

@Nekotekina
Copy link
Member

Also new code path for LLVM is almost the same as existing one, can you change existing to be more "variable" instead? In a sense of not hardcoding vector size (could be 128, 256, 512, maybe 1024 in future).

@Whatcookie
Copy link
Member Author

Hmm, I'm not really sure on how to make the code more "variable" in a clean way.
For example the size of the shufflevector is based off of the size of the indices array. I can copy to smaller arrays as needed, but the code becomes full of branches and is difficult to read as a result.

Is there something obvious I'm missing?

@Nekotekina
Copy link
Member

Ah, ArrayRef constructor... I believe you can specify length manually as a second argument, just checked.

- This is hidden behind a new setting, as some cpus may downclock agressively when executing 512 wide instructions
@Whatcookie
Copy link
Member Author

should be good now

@Nekotekina Nekotekina merged commit 0a7df9d into RPCS3:master Apr 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants