Introduce DEAL_II_VECTORIZATION_WIDTH_IN_BITS #9705

kronbichler · 2020-03-21T16:30:54Z

Based on the discussion in #7321, this PR replaces the unintuitive DEAL_II_COMPILER_VECTORIZATION_LEVEL with DEAL_II_HIGHEST_N_VECTORIZATION_BITS. I think this name is a good description of what happens. I do not think it makes sense to split doubles and floats, because all systems where we support vectorization have the same number of bits for doubles and floats. Bits is also the marketing name for it, as opposed to "length", where one could confuse the number of doubles in the SIMD array with the number of bits. Opinions? Once we agree on the name, I will add a changelog entry in the incompatibilities section.

Fixes #7321.

peterrum

Fine with me!

tamiko

I like this change. Would you mind quickly testing my suggestion of "deprecating" the old macro?

tamiko · 2020-03-22T17:46:47Z

include/deal.II/base/config.h.in

+#define DEAL_II_COMPILER_VECTORIZATION_LEVEL 1
+#else
+#define DEAL_II_COMPILER_VECTORIZATION_LEVEL 0
+#endif


Would you mind quickly testing whether it is possible to write something like

#if DEAL_II_COMPILER_HAS_DIAGNOSTIC_PRAGMA
// ...
#  define DEAL_II_COMPILER_VECTORIZATION_LEVEL _Pragma ("GCC warning \"The DEAL_II_COMPILER_VECTORIZATION_LEVEL macro is deprecated\"") 3
// ...
#else
// what you have
#endif

I will try that now.

Works like a charm, I will add this together with the renaming of the variable.

bangerth

I find the word HIGHEST_N_VECTORIZATION_BITS difficult to read. I think for a number of bits, LARGEST would actually be the better choice. In either case, the N_ somewhere in the middle is difficult to mentally parse. What if you used LARGEST_VECTORIZATION_SIZE_IN_BITS?

bangerth · 2020-03-22T21:40:32Z

include/deal.II/base/config.h.in

@@ -116,7 +116,21 @@
 */

 #cmakedefine DEAL_II_WORDS_BIGENDIAN
-#define DEAL_II_COMPILER_VECTORIZATION_LEVEL @DEAL_II_COMPILER_VECTORIZATION_LEVEL@
+#define DEAL_II_HIGHEST_N_VECTORIZATION_BITS @DEAL_II_HIGHEST_N_VECTORIZATION_BITS@


Would you mind adding a comment here documenting what the macro is supposed to represent (or where else to find documentation)?

tjhei · 2020-03-22T22:22:15Z

How about something like VECTORIZATION_WIDTH or VECTORIZATION_WIDTH_IN_BITS?

kronbichler · 2020-03-23T05:40:29Z

I like DEAL_II_VECTORIZATION_WIDTH_IN_BITS - the only question is whether we insert a LARGEST or MAX before that because we do technically support narrower SIMD arrays as well, it's just that we need a flag for the widest. But I think this could be covered by comments at the place where we set the macro.

tjhei · 2020-03-23T12:40:53Z

the only question is whether we insert a LARGEST or MAX before that because we do technically support narrower SIMD arrays as well

I don't think that is necessary, especially for anyone familiar with hardware.

tjhei · 2020-03-23T12:41:30Z

cmake/configure/configure_vectorization.cmake

   SET(DEAL_II_EXPAND_REAL_SCALARS_VECTORIZED 
      "${DEAL_II_EXPAND_REAL_SCALARS_VECTORIZED}" "VectorizedArray<double,2>" "VectorizedArray<float,4>")
   SET(DEAL_II_EXPAND_FLOAT_VECTORIZED  "${DEAL_II_EXPAND_FLOAT_VECTORIZED}" "VectorizedArray<float,4>")
 ENDIF()

-IF((${DEAL_II_COMPILER_VECTORIZATION_LEVEL} GREATER 1) AND ( DEAL_II_HAVE_AVX OR DEAL_II_HAVE_ALTIVEC))


what happened with altivec here?

AltiVec is 128 bits, so the condition cannot be true, and the altivec case goes to the VECTORIZATION_WIDTH_IN_BITS > 0 case. It took me a while to understand the condition, but I think we simply missed it; in any case, I think we could also skip the AVX and AVX512 parts here because the macro tells us what we have.

tjhei · 2020-03-23T12:43:54Z

include/deal.II/base/utilities.h

     * list of possible return values is:
     *
     * <table>
     * <tr>
-     *   <td><tt>VECTORIZATION_LEVEL</tt></td>
+     *   <td><tt>N_VECTORIZATION_BITS</tt></td>


do you want to update this caption?

I did, because I do not think VECTORIZATION_LEVEL makes much sense any more. I was a bit unsure whether we should completely remove this column, now that we have the name VECTORIZATION_WIDTH_IN_BITS and two columns with the same information.

Sorry for my cryptic comment: I think you need to update this column name to reflect the rename to VECTORIZATION_WIDTH.

Or rather, keep it as vectorization level if we decide to keep the second macro or delete the redundant column otherwise.

OK, I'll undo this change for now.

tjhei · 2020-03-23T12:44:28Z

include/deal.II/base/numbers.h

      2;
-#elif DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 3 && defined(__AVX512F__)
+#elif DEAL_II_VECTORIZATION_WIDTH_IN_BITS >= 512 && defined(__AVX512F__)


are these && defined(bla) necessary? I thought we catch this in vectorization.h

You are right, we should be getting into these errors here:

dealii/include/deal.II/base/vectorization.h

Lines 46 to 55 in 61a022c

#  if DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 2 && defined(__SSE2__) && \
    !defined(__AVX__)
#    error \
      "Mismatch in vectorization capabilities: AVX was detected during configuration of deal.II and switched on, but it is apparently not available for the file you are trying to compile at the moment. Check compilation flags controlling the instruction set, such as -march=native."
#  endif
#  if DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 3 && defined(__SSE2__) && \
    !defined(__AVX512F__)
#    error \
      "Mismatch in vectorization capabilities: AVX-512F was detected during configuration of deal.II and switched on, but it is apparently not available for the file you are trying to compile at the moment. Check compilation flags controlling the instruction set, such as -march=native."
#  endif

So I will simply remove the second part.

tjhei · 2020-03-23T12:44:58Z

include/deal.II/base/numbers.h

@@ -81,13 +81,13 @@ namespace internal
     * Maximal vector length of VectorizedArray for double.
     */
    constexpr static unsigned int max_width =
-#if DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 1 && defined(__ALTIVEC__)
+#if DEAL_II_VECTORIZATION_WIDTH_IN_BITS >= 128 && defined(__ALTIVEC__)


why is altivec a special case here? Can't we just use the width and sort all entries here?
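To illustrate the question: once the width is known in bits, the number of lanes follows from the size of the number type, with no per-instruction-set case. This is only a minimal sketch and not the code in this PR; `lanes_for` is a hypothetical helper, and it assumes the usual 64-bit `double` and 32-bit `float`.

```cpp
#include <climits>

// Hypothetical helper (not a deal.II function): number of SIMD lanes of type
// Number that fit into a register of the given width in bits. A width of 0
// stands for "vectorization disabled", which leaves a single scalar lane.
template <typename Number>
constexpr unsigned int
lanes_for(const unsigned int width_in_bits)
{
  return width_in_bits == 0 ?
           1u :
           static_cast<unsigned int>(width_in_bits /
                                     (sizeof(Number) * CHAR_BIT));
}

// 128 bits hold two doubles or four floats, which matches the
// VectorizedArray<double,2> and VectorizedArray<float,4> entries in the
// CMake hunk quoted earlier in this thread.
static_assert(lanes_for<double>(128) == 2, "two doubles in 128 bits");
static_assert(lanes_for<float>(128) == 4, "four floats in 128 bits");
```

With such a helper, something like `max_width` could be written as `lanes_for<double>(DEAL_II_VECTORIZATION_WIDTH_IN_BITS)`, provided the macro is defined by the configuration.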

tjhei · 2020-03-23T12:46:59Z

include/deal.II/base/config.h.in

+ */
+#ifdef DEAL_II_COMPILER_HAS_DIAGNOSTIC_PRAGMA
+#  if DEAL_II_VECTORIZATION_WIDTH_IN_BITS == 512
+#  define DEAL_II_COMPILER_VECTORIZATION_LEVEL _Pragma ("GCC warning \"The DEAL_II_COMPILER_VECTORIZATION_LEVEL macro is deprecated\"") 3


I don't think there is any harm in keeping this macro around (sorry, you already put some effort into this).

I don't feel strongly about it, but I guess we have slightly different opinions here. Let's take a poll: 👍 to deprecate this now, 👎 to just keep it around.

Same, I don't feel strongly about it. It just means getting annoying warnings in every code using matrix-free (and adding a bunch of #ifdefs in codes that should compile with 9.1 as well).
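For reference, keeping the old macro without the warning could look roughly like the sketch below. It is an assumption-laden illustration, not the final content of `config.h.in`: the correspondence 128/256/512 bits to levels 1/2/3 is read off the diff hunks in this thread, and the value 256 is only a stand-in so that the snippet compiles on its own.

```cpp
// Stand-in for the value that the deal.II configuration would provide;
// 256 is an arbitrary choice for this sketch.
#ifndef DEAL_II_VECTORIZATION_WIDTH_IN_BITS
#  define DEAL_II_VECTORIZATION_WIDTH_IN_BITS 256
#endif

// Derive the old macro from the new one as a plain number, without the
// _Pragma-based deprecation warning.
#if DEAL_II_VECTORIZATION_WIDTH_IN_BITS == 512
#  define DEAL_II_COMPILER_VECTORIZATION_LEVEL 3
#elif DEAL_II_VECTORIZATION_WIDTH_IN_BITS == 256
#  define DEAL_II_COMPILER_VECTORIZATION_LEVEL 2
#elif DEAL_II_VECTORIZATION_WIDTH_IN_BITS == 128
#  define DEAL_II_COMPILER_VECTORIZATION_LEVEL 1
#else
#  define DEAL_II_COMPILER_VECTORIZATION_LEVEL 0
#endif

// The old macro always ends up in the range the existing #if tests expect.
static_assert(DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 0 &&
                DEAL_II_COMPILER_VECTORIZATION_LEVEL <= 3,
              "legacy level out of range");
```

User code that must build against both 9.1 and newer versions could then keep testing `DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 2` unchanged.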

tjhei · 2020-03-24T15:14:58Z

cmake/checks/check_01_cpu_features.cmake

 ENDIF()

 IF(DEAL_II_HAVE_ALTIVEC)
-  SET(DEAL_II_COMPILER_VECTORIZATION_LEVEL 1)
+  SET(DEAL_II_VECTORIZATION_WIDTH_IN_BITS 1)


why is this not 128?

tjhei · 2020-03-24T15:16:03Z

cmake/checks/check_01_cpu_features.cmake

@@ -297,7 +297,7 @@ ENDIF()
 #

 IF(DEAL_II_WITH_CUDA)
-  SET(DEAL_II_COMPILER_VECTORIZATION_LEVEL 0)
+  SET(DEAL_II_VECTORIZATION_WIDTH_IN_BITS 0)


unrelated to this PR but does this mean that CPU vectorization is disabled when compiling with CUDA support?

Yes, we had problems with the CUDA compiler when we had intrinsics on that were difficult to resolve, see #7655 and #7542.

tjhei · 2020-03-24T15:19:19Z

include/deal.II/base/vectorization.h


-#  if DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 2 && defined(__SSE2__) && \
+#  if DEAL_II_VECTORIZATION_WIDTH_IN_BITS >= 256 && defined(__SSE2__) && \


I don't understand why you need to check for the presence of SSE2 macro here. I know, that was here before...

The problem we try to address here is mostly user projects that set the CPU flags on their own. What can happen is that deal.II is compiled with say AVX512, but then a user project forgets to add -march=native and falls back to SSE2. This leads to very strange errors as the size of data structures differs between the compiled deal.II code sitting in libdeal_II.so and the user code.
The second thing this is supposed to prevent is the case when switching between different CPUs with -march=native. One could blame the user for it, but I like to speak out when something goes wrong. I assume we should put this discussion in the file, right?

Ah, but you mean specifically the SSE2 macro here, given that x86-64 is the only instruction set right now with 256 bits or more? I think the reasoning for using it was to prevent additional trouble when we have another instruction set (like ARM's SVE) where we could have 256 bits (or 512) but not defined AVX. I guess even that case could be commented - or maybe replaced by a more intuitive #ifdef __x86_64__ (or the 32 bit equivalent, if anyone cares) to explain what we intend to identify here.

My question is why the SSE2 check is necessary. It should also warn if the user compiles deal.II with AVX and their own program without any vectorization (so __SSE2__ is not defined). I think this should still produce an error, so I would remove && defined(__SSE2__) here.

At least on x86-64, __SSE2__ should always be defined (well, I am not completely sure about MSVC) even if we have explicitly disabled vectorization by e.g. setting DEAL_II_HAVE_SSE2=OFF because it is part of x86-64. But I do not insist on this topic, so let me remove it and remember that there is something here - well, the first one to try a non x86 architecture with 256+ bit vectors will get this error 😉
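The check discussed here compares two widths: the one deal.II was configured with and the one the current translation unit can actually use. The sketch below expresses that comparison as `constexpr` functions; `available_simd_bits` and `is_consistent` are hypothetical names and not part of deal.II, and the real header uses `#error` directives rather than `static_assert`.

```cpp
// Hypothetical helper: widest SIMD width in bits that the compiler flags of
// the current translation unit allow, judged from predefined macros.
constexpr unsigned int
available_simd_bits()
{
#if defined(__AVX512F__)
  return 512;
#elif defined(__AVX__)
  return 256;
#elif defined(__SSE2__) || defined(__ALTIVEC__)
  return 128;
#else
  return 0;
#endif
}

// Hypothetical helper: the library was configured for configured_bits; user
// code is only compatible if it can use registers at least that wide.
// Otherwise the size of vectorized data structures differs between the
// compiled library and the user code.
constexpr bool
is_consistent(const unsigned int configured_bits,
              const unsigned int available_bits)
{
  return available_bits >= configured_bits;
}

// A library built for AVX-512 combined with user code limited to SSE2 is the
// failure mode described above.
static_assert(!is_consistent(512, 128), "mismatch must be detected");
static_assert(is_consistent(128, 256), "wider user flags are fine");
```

In a real header, the configured width would come from `DEAL_II_VECTORIZATION_WIDTH_IN_BITS`, and a failed check would point the user to flags such as `-march=native`.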

tjhei · 2020-03-24T15:19:30Z

include/deal.II/base/vectorization.h

    !defined(__AVX__)
 #    error \
      "Mismatch in vectorization capabilities: AVX was detected during configuration of deal.II and switched on, but it is apparently not available for the file you are trying to compile at the moment. Check compilation flags controlling the instruction set, such as -march=native."
 #  endif
-#  if DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 3 && defined(__SSE2__) && \
+#  if DEAL_II_VECTORIZATION_WIDTH_IN_BITS >= 512 && defined(__SSE2__) && \


tjhei · 2020-03-24T15:19:59Z

include/deal.II/base/vectorization.h

@@ -925,7 +925,7 @@ vectorized_transpose_and_store(const bool                            add_into,
 // for safety, also check that __AVX512F__ is defined in case the user manually
 // set some conflicting compile flags which prevent compilation

-#  if DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 3 && defined(__AVX512F__)
+#  if DEAL_II_VECTORIZATION_WIDTH_IN_BITS >= 512 && defined(__AVX512F__)


should we remove the define here and fail instead?

Again, it is to hedge for the non-x86-64 case.

tjhei · 2020-03-24T15:20:08Z

include/deal.II/base/vectorization.h

@@ -2125,7 +2125,7 @@ vectorized_transpose_and_store(const bool                        add_into,

 #  endif

-#  if DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 2 && defined(__AVX__)
+#  if DEAL_II_VECTORIZATION_WIDTH_IN_BITS >= 256 && defined(__AVX__)


same here and below

tjhei · 2020-03-24T15:21:00Z

source/base/vectorization.cc

@@ -30,31 +30,31 @@ static_assert(std::is_standard_layout<VectorizedArray<float>>::value &&
                std::is_trivial<VectorizedArray<float>>::value,
              "VectorizedArray<float> must be a POD type");

-#if DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 1 && !defined(DEAL_II_MSVC)
-#  if DEAL_II_COMPILER_VECTORIZATION_LEVEL >= 3 && defined(__AVX512F__)
+#if DEAL_II_VECTORIZATION_WIDTH_IN_BITS >= 128 && !defined(DEAL_II_MSVC)


Do you know why we exclude MSVC? This should have a comment here.

Some systems require the explicit instantiation of the static constexpr variant, others like MSVC do explicitly not want it. Let me find the respective change in the history.
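As background for the comment requested here, this is a minimal sketch of the language rule involved. `WidthTraits` and `max_width_ref` are hypothetical stand-ins, not the deal.II classes, and the MSVC exclusion simply mirrors the guard in the diff; the exact reason for it is not spelled out in this thread.

```cpp
// Hypothetical class standing in for a type with a static constexpr member.
struct WidthTraits
{
  static constexpr unsigned int max_width = 4;
};

// Before C++17, a static constexpr data member that is odr-used needs an
// out-of-class definition like this one. From C++17 on, the member is
// implicitly inline and the extra definition is redundant.
#if __cplusplus < 201703L && !defined(_MSC_VER)
constexpr unsigned int WidthTraits::max_width;
#endif

// Binding the member to a const reference is an odr-use: without a
// definition, linking can fail on pre-C++17 compilers.
inline const unsigned int &
max_width_ref()
{
  return WidthTraits::max_width;
}
```

Using the member only as a compile-time constant, for example as a template argument, does not require the out-of-class definition.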

source/base/vectorization.cc

tests/base/utilities_02b.cc

kronbichler · 2020-03-24T17:05:10Z

@tjhei I addressed your comments except for the two places mentioned above where I instead added more comments to the code why it looks like that. I also adapted the step-37 and step-48 output and wrote a changelog entry. Should be complete now.

tjhei · 2020-03-24T17:39:04Z

Code looks good now. I would propose again to keep the old #define around without deprecation.

kronbichler · 2020-03-24T17:50:10Z

I added the two settings as separate commits to be able to identify the respective places slightly more easily.

tjhei

Looks good to merge to me. Thanks for applying all the fixes I am asking for.

kronbichler · 2020-03-26T20:33:41Z

Any additional comments?

tjhei · 2020-03-26T21:02:43Z

doc/news/changes/incompatibilities/20200324MartinKronbichler

@@ -0,0 +1,5 @@
+Deprecated: The macro DEAL_II_COMPILER_VECTORIZATION_LEVEL (using values 1, 2,


can you update this (we don't deprecate after all)?

Right, let us move this to minor instead.

x86-64 is currently the only architecture and the file needs to be extended in case we want to support other architectures as well.

kronbichler added SIMD ready to test labels Mar 21, 2020

kronbichler added this to the Release 9.2 milestone Mar 21, 2020

peterrum approved these changes Mar 21, 2020


tamiko approved these changes Mar 22, 2020


bangerth reviewed Mar 22, 2020


kronbichler force-pushed the vectorization_name_bits branch from 5c4ad63 to 74dcfa3 Compare March 23, 2020 07:17

kronbichler changed the title ~~Introduce DEAL_II_HIGHEST_N_VECTORIZATION_BITS~~ Introduce DEAL_II_VECTORIZATION_WIDTH_IN_BITS Mar 23, 2020

tjhei reviewed Mar 23, 2020


kronbichler force-pushed the vectorization_name_bits branch from 74dcfa3 to 53a690f Compare March 23, 2020 14:26

tjhei reviewed Mar 24, 2020


kronbichler force-pushed the vectorization_name_bits branch from 53a690f to b728c51 Compare March 24, 2020 16:46

tjhei approved these changes Mar 24, 2020


kronbichler added 2 commits March 25, 2020 07:11

Introduce DEAL_II_VECTORIZATION_WIDTH_IN_BITS

473ed33

Update example output

ac13153

kronbichler force-pushed the vectorization_name_bits branch from d905491 to 8c2fc54 Compare March 25, 2020 06:11

tjhei reviewed Mar 26, 2020


kronbichler added 3 commits March 26, 2020 22:27

Changelog

e298697

Do not deprecate DEAL_II_COMPILER_VECTORIZATION_LEVEL for now

d5d63e2

Unconditionally bail out on non-AVX 256 vectorization without SSE2.

89be6b8

x86-64 is currently the only architecture and the file needs to be extended in case we want to support other architectures as well.

kronbichler force-pushed the vectorization_name_bits branch from 8c2fc54 to 89be6b8 Compare March 26, 2020 21:28

tjhei approved these changes Mar 26, 2020


tjhei merged commit 9063b76 into dealii:master Mar 27, 2020

kronbichler deleted the vectorization_name_bits branch October 1, 2020 11:16
