Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce DEAL_II_VECTORIZATION_WIDTH_IN_BITS #9705

Merged
merged 5 commits into from Mar 27, 2020

Conversation

kronbichler
Copy link
Member

Based on the discussion in #7321, this PR changes the unintuitive DEAL_II_COMPILER_VECTORIZATION_LEVEL with DEAL_II_HIGHEST_N_VECTORIZATION_BITS. I think this name is a good description of what happens; I do not think it makes sense to split doubles and floats because all systems where we support vectorization have the same number of bits for doubles and floats, and bits is also the marketing name for it rather than "length" where one could be confused by the number of doubles in the SIMD array versus bits. Opinions? Once we agree on the name, I will add a changelog in the incompatibilities section.

Fixes #7321.

Copy link
Member

@peterrum peterrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with me!

Copy link
Member

@tamiko tamiko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this change. Would you mind to quickly test my suggestion of "deprecating" the old macro?

#define DEAL_II_COMPILER_VECTORIZATION_LEVEL 1
#else
#define DEAL_II_COMPILER_VECTORIZATION_LEVEL 0
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind to test quickly whether it is possible to write something like

#if  DEAL_II_COMPILER_HAS_DIAGNOSTIC_PRAGMA
  // ...
  # define DEAL_II_COMPILER_VECTORIZATION_LEVEL _Pragma ("GCC warning \"The DEAL_II_COMPILER_VECTORIZATION_LEVEL macro is deprecated\"") 3
  // ...
#else 
  // what you have  
#endif 

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try that now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works like a charm, I will add this together with the renaming of the variable.

Copy link
Member

@bangerth bangerth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the word HIGHEST_N_VECTORIZATION_BITS difficult to read. I think for a number of bits, LARGEST would actually be the better choice. In either case, the N_ somewhere in the middle is difficult to mentally parse. What if you used LARGEST_VECTORIZATION_SIZE_IN_BITS?

@@ -116,7 +116,21 @@
*/

#cmakedefine DEAL_II_WORDS_BIGENDIAN
#define DEAL_II_COMPILER_VECTORIZATION_LEVEL @DEAL_II_COMPILER_VECTORIZATION_LEVEL@
#define DEAL_II_HIGHEST_N_VECTORIZATION_BITS @DEAL_II_HIGHEST_N_VECTORIZATION_BITS@
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding a comment here documenting what the macro is supposed to be represent (or where else to find documentation)?

@tjhei
Copy link
Member

tjhei commented Mar 22, 2020

How about something like VECTORIZATION_WIDTH or VECTORIZATION_WIDTH_IN_BITS?

@kronbichler
Copy link
Member Author

I like DEAL_II_VECTORIZATION_WIDTH_IN_BITS - the only question is whether we insert a LARGEST or MAX before that because we do technically support narrower SIMD arrays as well, it's just that we need a flag for the widest. But I think this could be covered by comments at the place where we set the macro.

@kronbichler kronbichler changed the title Introduce DEAL_II_HIGHEST_N_VECTORIZATION_BITS Introduce DEAL_II_VECTORIZATION_WIDTH_IN_BITS Mar 23, 2020
@tjhei
Copy link
Member

tjhei commented Mar 23, 2020

the only question is whether we insert a LARGEST or MAX before that because we do technically support narrower SIMD arrays as well

I don't think that is necessary, especially for anyone familiar with hardware.

SET(DEAL_II_EXPAND_REAL_SCALARS_VECTORIZED
"${DEAL_II_EXPAND_REAL_SCALARS_VECTORIZED}" "VectorizedArray<double,2>" "VectorizedArray<float,4>")
SET(DEAL_II_EXPAND_FLOAT_VECTORIZED "${DEAL_II_EXPAND_FLOAT_VECTORIZED}" "VectorizedArray<float,4>")
ENDIF()

IF((${DEAL_II_COMPILER_VECTORIZATION_LEVEL} GREATER 1) AND ( DEAL_II_HAVE_AVX OR DEAL_II_HAVE_ALTIVEC))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happened with altivec here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AltiVec is 128 bits, so the condition cannot be true, and the altivec case goes to the VECTORIZATION_WIDTH_IN_BITS > 0 case. It took me a while to understand the condition, but I think we simply missed it; in any case, I think we could also skip the AVX and AVX512 parts here because the macro tells us what we have.

* list of possible return values is:
*
* <table>
* <tr>
* <td><tt>VECTORIZATION_LEVEL</tt></td>
* <td><tt>N_VECTORIZATION_BITS</tt></td>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you want to update this caption?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did, because I do not think the VECTORIZATION_LEVEL makes much sense now any more. I was a bit unsure if we should completely remove this column now that we have the name VECTORIZATION_WIDTH_IN_BITS and two columns with the same information.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for my cryptic comment: I think you need to update this column name to reflect the rename to VECTORIZATION_WIDTH.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or rather, keep it as vectorization level if we decide to keep the second macro or delete the redundant column otherwise.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll undo this change for now.

2;
#elif DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 3 && defined(__AVX512F__)
#elif DEAL_II_VECTORIZATION_WIDTH_IN_BITS >= 512 && defined(__AVX512F__)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these && defined(bla) necessary? I thought we catch this in vectorization.h

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, we should be getting into these errors here:

# if DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 2 && defined(__SSE2__) && \
!defined(__AVX__)
# error \
"Mismatch in vectorization capabilities: AVX was detected during configuration of deal.II and switched on, but it is apparently not available for the file you are trying to compile at the moment. Check compilation flags controlling the instruction set, such as -march=native."
# endif
# if DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 3 && defined(__SSE2__) && \
!defined(__AVX512F__)
# error \
"Mismatch in vectorization capabilities: AVX-512F was detected during configuration of deal.II and switched on, but it is apparently not available for the file you are trying to compile at the moment. Check compilation flags controlling the instruction set, such as -march=native."
# endif

So I will simply remove the second part.

@@ -81,13 +81,13 @@ namespace internal
* Maximal vector length of VectorizedArray for double.
*/
constexpr static unsigned int max_width =
#if DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 1 && defined(__ALTIVEC__)
#if DEAL_II_VECTORIZATION_WIDTH_IN_BITS >= 128 && defined(__ALTIVEC__)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is altivec a special case here? Can't we just use the width and sort all entries here?

*/
#ifdef DEAL_II_COMPILER_HAS_DIAGNOSTIC_PRAGMA
# if DEAL_II_VECTORIZATION_WIDTH_IN_BITS == 512
# define DEAL_II_COMPILER_VECTORIZATION_LEVEL _Pragma ("GCC warning \"The DEAL_II_COMPILER_VECTORIZATION_LEVEL macro is deprecated\"") 3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don' think there is any harm in keeping this macro around (sorry, you already put some effort into this).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel strongly about it but I guess we have slightly different opinions here. - let's make a poll if we want to deprecate this now with 👍 or just keep it around 👎

Copy link
Member

@tjhei tjhei Mar 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, I don't feel strongly about it. It just means getting annoying warnings in every code using matrix-free (and adding a bunch of #ifdefs in codes that should compile with 9.1 as well).

ENDIF()

IF(DEAL_II_HAVE_ALTIVEC)
SET(DEAL_II_COMPILER_VECTORIZATION_LEVEL 1)
SET(DEAL_II_VECTORIZATION_WIDTH_IN_BITS 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this not 128?

@@ -297,7 +297,7 @@ ENDIF()
#

IF(DEAL_II_WITH_CUDA)
SET(DEAL_II_COMPILER_VECTORIZATION_LEVEL 0)
SET(DEAL_II_VECTORIZATION_WIDTH_IN_BITS 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated to this PR but does this mean that CPU vectorization is disabled when compiling with CUDA support?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we had problems with the CUDA compiler when we had intrinsics on that were difficult to resolve, see #7655 and #7542.


# if DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 2 && defined(__SSE2__) && \
# if DEAL_II_VECTORIZATION_WIDTH_IN_BITS >= 256 && defined(__SSE2__) && \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why you need to check for the presence of SSE2 macro here. I know, that was here before...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem we try to address here is mostly user projects that set the CPU flags on their own. What can happen is that deal.II is compiled with say AVX512, but then a user project forgets to add -march=native and falls back to SSE2. This leads to very strange errors as the size of data structures differs between the compiled deal.II code sitting in libdeal_II.so and the user code.
The second thing this is supposed to prevent is the case when switching between different CPUs with -march=native. One could blame the user for it, but I like to speak out when something goes wrong. I assume we should put this discussion in the file, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, but you mean specifically the SSE2 macro here, given that x86-64 is the only instruction set right now with 256 bits or more? I think the reasoning for using it was to prevent additional trouble when we have another instruction set (like ARM's SVE) where we could have 256 bits (or 512) but not defined AVX. I guess even that case could be commented - or maybe replaced by a more intuitive #ifdef __x86_64__ (or the 32 bit equivalent, if anyone cares) to explain what we intend to identify here.

Copy link
Member

@tjhei tjhei Mar 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question is why the SSE2 check is necessary. It should also warn if the user compiles deal.II with AVX and the own program without any parallelization (so SSE2 is not defined). I think this should still produce an error, so I would remove && defined(SSE2) here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least on x86-64, __SSE2__ should always be defined (well, I am not completely sure about MSVC) even if we have explicitly disabled vectorization by e.g. setting DEAL_II_HAVE_SSE2=OFF because it is part of x86-64. But I do not insist on this topic, so let me remove it and remember that there is something here - well, the first one to try a non x86 architecture with 256+ bit vectors will get this error 😉

!defined(__AVX__)
# error \
"Mismatch in vectorization capabilities: AVX was detected during configuration of deal.II and switched on, but it is apparently not available for the file you are trying to compile at the moment. Check compilation flags controlling the instruction set, such as -march=native."
# endif
# if DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 3 && defined(__SSE2__) && \
# if DEAL_II_VECTORIZATION_WIDTH_IN_BITS >= 512 && defined(__SSE2__) && \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@@ -925,7 +925,7 @@ vectorized_transpose_and_store(const bool add_into,
// for safety, also check that __AVX512F__ is defined in case the user manually
// set some conflicting compile flags which prevent compilation

# if DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 3 && defined(__AVX512F__)
# if DEAL_II_VECTORIZATION_WIDTH_IN_BITS >= 512 && defined(__AVX512F__)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we remove the define here and fail instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, it is to hedge for the non-x86-64 case.

@@ -2125,7 +2125,7 @@ vectorized_transpose_and_store(const bool add_into,

# endif

# if DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 2 && defined(__AVX__)
# if DEAL_II_VECTORIZATION_WIDTH_IN_BITS >= 256 && defined(__AVX__)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here and below

@@ -30,31 +30,31 @@ static_assert(std::is_standard_layout<VectorizedArray<float>>::value &&
std::is_trivial<VectorizedArray<float>>::value,
"VectorizedArray<float> must be a POD type");

#if DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 1 && !defined(DEAL_II_MSVC)
# if DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 3 && defined(__AVX512F__)
#if DEAL_II_VECTORIZATION_WIDTH_IN_BITS >= 128 && !defined(DEAL_II_MSVC)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know why we exclude MSVC? This should have a comment here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some systems require the explicit instantiation of the static constexpr variant, others like MSVC do explicitly not want it. Let me find the respective change in the history.

source/base/vectorization.cc Outdated Show resolved Hide resolved
tests/base/utilities_02b.cc Outdated Show resolved Hide resolved
@kronbichler
Copy link
Member Author

@tjhei I addressed your comments except for the two places mentioned above where I instead added more comments to the code why it looks like that. I also adapted the step-37 and step-48 output and wrote a changelog entry. Should be complete now.

@tjhei
Copy link
Member

tjhei commented Mar 24, 2020

Code looks good now. I would propose again to keep the old #define around without deprecation.

@kronbichler
Copy link
Member Author

I added the two settings as separate commits to be able to identify the respective places slightly more easily.

Copy link
Member

@tjhei tjhei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to merge to me. Thanks for applying all the fixes I am asking for.

@kronbichler
Copy link
Member Author

Any additional comments?

@@ -0,0 +1,5 @@
Deprecated: The macro DEAL_II_COMPILER_VECTORIZATION_LEVEL (using values 1, 2,
Copy link
Member

@tjhei tjhei Mar 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update this (we don't deprecate after all)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, let us move this to minor instead.

x86-64 is currently the only architecture and the file needs to be extended
in case we want to support other architectures as well.
@tjhei tjhei merged commit 9063b76 into dealii:master Mar 27, 2020
@kronbichler kronbichler deleted the vectorization_name_bits branch October 1, 2020 11:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use better names for SIMD widths in vectorization.h
5 participants