Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIMD: Allow to switch off vectorized gather/scatter #15875

Merged
merged 2 commits into from Aug 14, 2023

Conversation

kronbichler
Copy link
Member

@kronbichler kronbichler commented Aug 14, 2023

This is an attempt for a mitigation against the extremely annoying Intel GDS mitigation (aka Downfall): Add a flag to control the use of the gather and scatter instructions via VectorizedArray, which should shield our code base against the use of these instructions. Note that the performance tests https://dealii.org/performance_tests//reports/render.html?#!5824d3360e2ca5440695b4ac2da40146c42317a6.md discussed in #15873 document a performance degradation of our matrix-vector products by 20-40% (timing-step_37/matvec_double goes from 1.11s to 1.58s in the average case). As these are some of the highest-performing components, we need to take action now. The compiler might still insert gather instructions sometimes, because compilers will also try to use the opportunities and in fact those will only disappear once compilers have learned about the Downfall degradation, but this is out of my control. As I promised in #15873 (comment), we might also do some changes on a larger scale.

@tamiko with this PR, you should be able to set DEAL_II_USE_VECTORIZATION_GATHER=OFF. I would be curious what the performance gets with this PR.

@@ -175,6 +176,12 @@ set(CMAKE_INSTALL_RPATH_USE_LINK_PATH "ON" CACHE BOOL
)
mark_as_advanced(CMAKE_INSTALL_RPATH_USE_LINK_PATH)

option(DEAL_II_WITH_VECTORIZATION_GATHER
"For the x86 compilation target, the use of SIMD gather/scatter instructions can be much slower than using scalar loads. This includes a wide range of Intel hardware (in particular, server processors of the Broadwell, Skylake, Cascade Lake, and Ice Lake families released between 2015 and 2021). While the default is to aggressively use these instructions, this variable can be used to disable their use if deemed to give better performance."
ON
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can figure out a way to disable automatically for Intel but enable for AMD later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure - this is really tricky. For example, my login node is unaffected but my compute nodes are very much affected.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that is not something we can know. In situations like this you have to specify things manually.

@tjhei
Copy link
Member

tjhei commented Aug 14, 2023

Can you add a changelog as well?

include/deal.II/base/config.h.in Outdated Show resolved Hide resolved
@tamiko
Copy link
Member

tamiko commented Aug 14, 2023

@kronbichler I suggest we simply merge after the renaming. I have changed the configuration to set the toggle to false and we see what happens.

@kronbichler
Copy link
Member Author

Can you add a changelog as well?

Changelog added, thank you for the remark.

@masterleinad masterleinad merged commit 53ac485 into dealii:master Aug 14, 2023
15 checks passed
@tamiko
Copy link
Member

tamiko commented Aug 14, 2023

@kronbichler The results are split: https://dealii.org/performance_tests//reports/render.html?#!53ac485e6801587a97bce6ebca3cd11dc4f6ecb4.md

Most of the workloads improved - but did not fully recover. Noticeably, with clang-16 some workloads regressed further.

@kronbichler kronbichler deleted the gather_mitigation branch August 14, 2023 20:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants