New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce DEAL_II_VECTORIZATION_WIDTH_IN_BITS #9705
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine with me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this change. Would you mind to quickly test my suggestion of "deprecating" the old macro?
#define DEAL_II_COMPILER_VECTORIZATION_LEVEL 1 | ||
#else | ||
#define DEAL_II_COMPILER_VECTORIZATION_LEVEL 0 | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind to test quickly whether it is possible to write something like
#if DEAL_II_COMPILER_HAS_DIAGNOSTIC_PRAGMA
// ...
# define DEAL_II_COMPILER_VECTORIZATION_LEVEL _Pragma ("GCC warning \"The DEAL_II_COMPILER_VECTORIZATION_LEVEL macro is deprecated\"") 3
// ...
#else
// what you have
#endif
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try that now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works like a charm, I will add this together with the renaming of the variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the word HIGHEST_N_VECTORIZATION_BITS
difficult to read. I think for a number of bits, LARGEST
would actually be the better choice. In either case, the N_
somewhere in the middle is difficult to mentally parse. What if you used LARGEST_VECTORIZATION_SIZE_IN_BITS
?
include/deal.II/base/config.h.in
Outdated
@@ -116,7 +116,21 @@ | |||
*/ | |||
|
|||
#cmakedefine DEAL_II_WORDS_BIGENDIAN | |||
#define DEAL_II_COMPILER_VECTORIZATION_LEVEL @DEAL_II_COMPILER_VECTORIZATION_LEVEL@ | |||
#define DEAL_II_HIGHEST_N_VECTORIZATION_BITS @DEAL_II_HIGHEST_N_VECTORIZATION_BITS@ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind adding a comment here documenting what the macro is supposed to be represent (or where else to find documentation)?
How about something like |
I like |
5c4ad63
to
74dcfa3
Compare
I don't think that is necessary, especially for anyone familiar with hardware. |
SET(DEAL_II_EXPAND_REAL_SCALARS_VECTORIZED | ||
"${DEAL_II_EXPAND_REAL_SCALARS_VECTORIZED}" "VectorizedArray<double,2>" "VectorizedArray<float,4>") | ||
SET(DEAL_II_EXPAND_FLOAT_VECTORIZED "${DEAL_II_EXPAND_FLOAT_VECTORIZED}" "VectorizedArray<float,4>") | ||
ENDIF() | ||
|
||
IF((${DEAL_II_COMPILER_VECTORIZATION_LEVEL} GREATER 1) AND ( DEAL_II_HAVE_AVX OR DEAL_II_HAVE_ALTIVEC)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happened with altivec here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AltiVec is 128 bits, so the condition cannot be true, and the altivec case goes to the VECTORIZATION_WIDTH_IN_BITS > 0
case. It took me a while to understand the condition, but I think we simply missed it; in any case, I think we could also skip the AVX
and AVX512
parts here because the macro tells us what we have.
include/deal.II/base/utilities.h
Outdated
* list of possible return values is: | ||
* | ||
* <table> | ||
* <tr> | ||
* <td><tt>VECTORIZATION_LEVEL</tt></td> | ||
* <td><tt>N_VECTORIZATION_BITS</tt></td> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you want to update this caption?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did, because I do not think the VECTORIZATION_LEVEL
makes much sense now any more. I was a bit unsure if we should completely remove this column now that we have the name VECTORIZATION_WIDTH_IN_BITS
and two columns with the same information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for my cryptic comment: I think you need to update this column name to reflect the rename to VECTORIZATION_WIDTH
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or rather, keep it as vectorization level if we decide to keep the second macro or delete the redundant column otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll undo this change for now.
include/deal.II/base/numbers.h
Outdated
2; | ||
#elif DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 3 && defined(__AVX512F__) | ||
#elif DEAL_II_VECTORIZATION_WIDTH_IN_BITS >= 512 && defined(__AVX512F__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these && defined(bla)
necessary? I thought we catch this in vectorization.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, we should be getting into these errors here:
dealii/include/deal.II/base/vectorization.h
Lines 46 to 55 in 61a022c
# if DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 2 && defined(__SSE2__) && \ | |
!defined(__AVX__) | |
# error \ | |
"Mismatch in vectorization capabilities: AVX was detected during configuration of deal.II and switched on, but it is apparently not available for the file you are trying to compile at the moment. Check compilation flags controlling the instruction set, such as -march=native." | |
# endif | |
# if DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 3 && defined(__SSE2__) && \ | |
!defined(__AVX512F__) | |
# error \ | |
"Mismatch in vectorization capabilities: AVX-512F was detected during configuration of deal.II and switched on, but it is apparently not available for the file you are trying to compile at the moment. Check compilation flags controlling the instruction set, such as -march=native." | |
# endif |
So I will simply remove the second part.
include/deal.II/base/numbers.h
Outdated
@@ -81,13 +81,13 @@ namespace internal | |||
* Maximal vector length of VectorizedArray for double. | |||
*/ | |||
constexpr static unsigned int max_width = | |||
#if DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 1 && defined(__ALTIVEC__) | |||
#if DEAL_II_VECTORIZATION_WIDTH_IN_BITS >= 128 && defined(__ALTIVEC__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is altivec a special case here? Can't we just use the width and sort all entries here?
include/deal.II/base/config.h.in
Outdated
*/ | ||
#ifdef DEAL_II_COMPILER_HAS_DIAGNOSTIC_PRAGMA | ||
# if DEAL_II_VECTORIZATION_WIDTH_IN_BITS == 512 | ||
# define DEAL_II_COMPILER_VECTORIZATION_LEVEL _Pragma ("GCC warning \"The DEAL_II_COMPILER_VECTORIZATION_LEVEL macro is deprecated\"") 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don' think there is any harm in keeping this macro around (sorry, you already put some effort into this).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel strongly about it but I guess we have slightly different opinions here. - let's make a poll if we want to deprecate this now with 👍 or just keep it around 👎
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same, I don't feel strongly about it. It just means getting annoying warnings in every code using matrix-free (and adding a bunch of #ifdefs in codes that should compile with 9.1 as well).
74dcfa3
to
53a690f
Compare
ENDIF() | ||
|
||
IF(DEAL_II_HAVE_ALTIVEC) | ||
SET(DEAL_II_COMPILER_VECTORIZATION_LEVEL 1) | ||
SET(DEAL_II_VECTORIZATION_WIDTH_IN_BITS 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this not 128?
@@ -297,7 +297,7 @@ ENDIF() | |||
# | |||
|
|||
IF(DEAL_II_WITH_CUDA) | |||
SET(DEAL_II_COMPILER_VECTORIZATION_LEVEL 0) | |||
SET(DEAL_II_VECTORIZATION_WIDTH_IN_BITS 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated to this PR but does this mean that CPU vectorization is disabled when compiling with CUDA support?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
include/deal.II/base/vectorization.h
Outdated
|
||
# if DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 2 && defined(__SSE2__) && \ | ||
# if DEAL_II_VECTORIZATION_WIDTH_IN_BITS >= 256 && defined(__SSE2__) && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why you need to check for the presence of SSE2 macro here. I know, that was here before...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem we try to address here is mostly user projects that set the CPU flags on their own. What can happen is that deal.II is compiled with say AVX512, but then a user project forgets to add -march=native
and falls back to SSE2. This leads to very strange errors as the size of data structures differs between the compiled deal.II code sitting in libdeal_II.so
and the user code.
The second thing this is supposed to prevent is the case when switching between different CPUs with -march=native
. One could blame the user for it, but I like to speak out when something goes wrong. I assume we should put this discussion in the file, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, but you mean specifically the SSE2 macro here, given that x86-64 is the only instruction set right now with 256 bits or more? I think the reasoning for using it was to prevent additional trouble when we have another instruction set (like ARM's SVE) where we could have 256 bits (or 512) but not defined AVX. I guess even that case could be commented - or maybe replaced by a more intuitive #ifdef __x86_64__
(or the 32 bit equivalent, if anyone cares) to explain what we intend to identify here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My question is why the SSE2 check is necessary. It should also warn if the user compiles deal.II with AVX and the own program without any parallelization (so SSE2 is not defined). I think this should still produce an error, so I would remove && defined(SSE2)
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least on x86-64, __SSE2__
should always be defined (well, I am not completely sure about MSVC) even if we have explicitly disabled vectorization by e.g. setting DEAL_II_HAVE_SSE2=OFF
because it is part of x86-64. But I do not insist on this topic, so let me remove it and remember that there is something here - well, the first one to try a non x86 architecture with 256+ bit vectors will get this error 😉
include/deal.II/base/vectorization.h
Outdated
!defined(__AVX__) | ||
# error \ | ||
"Mismatch in vectorization capabilities: AVX was detected during configuration of deal.II and switched on, but it is apparently not available for the file you are trying to compile at the moment. Check compilation flags controlling the instruction set, such as -march=native." | ||
# endif | ||
# if DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 3 && defined(__SSE2__) && \ | ||
# if DEAL_II_VECTORIZATION_WIDTH_IN_BITS >= 512 && defined(__SSE2__) && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
@@ -925,7 +925,7 @@ vectorized_transpose_and_store(const bool add_into, | |||
// for safety, also check that __AVX512F__ is defined in case the user manually | |||
// set some conflicting compile flags which prevent compilation | |||
|
|||
# if DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 3 && defined(__AVX512F__) | |||
# if DEAL_II_VECTORIZATION_WIDTH_IN_BITS >= 512 && defined(__AVX512F__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we remove the define
here and fail instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, it is to hedge for the non-x86-64 case.
@@ -2125,7 +2125,7 @@ vectorized_transpose_and_store(const bool add_into, | |||
|
|||
# endif | |||
|
|||
# if DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 2 && defined(__AVX__) | |||
# if DEAL_II_VECTORIZATION_WIDTH_IN_BITS >= 256 && defined(__AVX__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here and below
@@ -30,31 +30,31 @@ static_assert(std::is_standard_layout<VectorizedArray<float>>::value && | |||
std::is_trivial<VectorizedArray<float>>::value, | |||
"VectorizedArray<float> must be a POD type"); | |||
|
|||
#if DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 1 && !defined(DEAL_II_MSVC) | |||
# if DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 3 && defined(__AVX512F__) | |||
#if DEAL_II_VECTORIZATION_WIDTH_IN_BITS >= 128 && !defined(DEAL_II_MSVC) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know why we exclude MSVC? This should have a comment here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some systems require the explicit instantiation of the static constexpr
variant, others like MSVC do explicitly not want it. Let me find the respective change in the history.
53a690f
to
b728c51
Compare
@tjhei I addressed your comments except for the two places mentioned above where I instead added more comments to the code why it looks like that. I also adapted the step-37 and step-48 output and wrote a changelog entry. Should be complete now. |
Code looks good now. I would propose again to keep the old #define around without deprecation. |
I added the two settings as separate commits to be able to identify the respective places slightly more easily. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to merge to me. Thanks for applying all the fixes I am asking for.
d905491
to
8c2fc54
Compare
Any additional comments? |
@@ -0,0 +1,5 @@ | |||
Deprecated: The macro DEAL_II_COMPILER_VECTORIZATION_LEVEL (using values 1, 2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you update this (we don't deprecate after all)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, let us move this to minor instead.
x86-64 is currently the only architecture and the file needs to be extended in case we want to support other architectures as well.
8c2fc54
to
89be6b8
Compare
Based on the discussion in #7321, this PR changes the unintuitive
DEAL_II_COMPILER_VECTORIZATION_LEVEL
withDEAL_II_HIGHEST_N_VECTORIZATION_BITS
. I think this name is a good description of what happens; I do not think it makes sense to split doubles and floats because all systems where we support vectorization have the same number of bits for doubles and floats, and bits is also the marketing name for it rather than "length" where one could be confused by the number of doubles in the SIMD array versus bits. Opinions? Once we agree on the name, I will add a changelog in the incompatibilities section.Fixes #7321.