New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SIMD: Allow to switch off vectorized gather/scatter #15875
Conversation
@@ -175,6 +176,12 @@ set(CMAKE_INSTALL_RPATH_USE_LINK_PATH "ON" CACHE BOOL | |||
) | |||
mark_as_advanced(CMAKE_INSTALL_RPATH_USE_LINK_PATH) | |||
|
|||
option(DEAL_II_WITH_VECTORIZATION_GATHER | |||
"For the x86 compilation target, the use of SIMD gather/scatter instructions can be much slower than using scalar loads. This includes a wide range of Intel hardware (in particular, server processors of the Broadwell, Skylake, Cascade Lake, and Ice Lake families released between 2015 and 2021). While the default is to aggressively use these instructions, this variable can be used to disable their use if deemed to give better performance." | |||
ON |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can figure out a way to disable automatically for Intel but enable for AMD later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure - this is really tricky. For example, my login node is unaffected but my compute nodes are very much affected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, that is not something we can know. In situations like this you have to specify things manually.
Can you add a changelog as well? |
@kronbichler I suggest we simply merge after the renaming. I have changed the configuration to set the toggle to false and we see what happens. |
1d000e5
to
bacc9c6
Compare
Changelog added, thank you for the remark. |
bacc9c6
to
6d92ba3
Compare
@kronbichler The results are split: https://dealii.org/performance_tests//reports/render.html?#!53ac485e6801587a97bce6ebca3cd11dc4f6ecb4.md Most of the workloads improved - but did not fully recover. Noticeably, with clang-16 some workloads regressed further. |
This is an attempt for a mitigation against the extremely annoying Intel GDS mitigation (aka Downfall): Add a flag to control the use of the
gather
andscatter
instructions viaVectorizedArray
, which should shield our code base against the use of these instructions. Note that the performance tests https://dealii.org/performance_tests//reports/render.html?#!5824d3360e2ca5440695b4ac2da40146c42317a6.md discussed in #15873 document a performance degradation of our matrix-vector products by 20-40% (timing-step_37/matvec_double
goes from 1.11s to 1.58s in the average case). As these are some of the highest-performing components, we need to take action now. The compiler might still insertgather
instructions sometimes, because compilers will also try to use the opportunities and in fact those will only disappear once compilers have learned about the Downfall degradation, but this is out of my control. As I promised in #15873 (comment), we might also do some changes on a larger scale.@tamiko with this PR, you should be able to set
DEAL_II_USE_VECTORIZATION_GATHER=OFF
. I would be curious what the performance gets with this PR.