workaround for Intel C++ codegen bug with interleave_second

commit f398ddb886cd4c9526276431dcadbeb066c9fd00 (1 parent: 6c95ab3)
Author: Mathias Gaunard (mgaunard)

modules/boost/simd/swar/include/boost/simd/swar/functions/simd/sse/avx/interleave_second.hpp (9 changed lines)
@@ -47,9 +47,16 @@ namespace boost { namespace simd { namespace ext
 
     BOOST_FORCEINLINE result_type operator()(A0 const& a0, A0 const& a1) const
     {
+      // workaround for bad ICC optimisation
+      #ifdef __INTEL_COMPILER
+      __m256d volatile lo = _mm256_unpacklo_pd(a0,a1);
+      #else
+      __m256d lo = _mm256_unpacklo_pd(a0,a1);
+      #endif
+
       // 0x31 is SCR1[128:255]|SRC2[128:255] according to Intel AVX manual
       // The result of unpack_*_pd puts parts in the proper pairs beforehand
-      return _mm256_permute2f128_pd ( _mm256_unpacklo_pd(a0,a1)
+      return _mm256_permute2f128_pd ( lo
                                     , _mm256_unpackhi_pd(a0,a1)
                                     , 0x31
                                     );
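
Note on the data flow: the commit leaves the computation itself unchanged. It only names the intermediate `lo` and marks it `volatile` when building with ICC. To make the comments about `0x31` and the unpack pairs easier to follow, here is a minimal scalar sketch of what the three intrinsics compute on four doubles. It is an illustration only, not code from the repository: `vec4d`, `unpacklo`, `unpackhi`, `permute2f128` and `interleave_second_model` are hypothetical names, and the permute model assumes the zeroing bits (3 and 7) of the immediate are clear, as they are for `0x31`.

```cpp
#include <array>

// Hypothetical scalar stand-in for a 256-bit register holding four doubles.
// Elements 0-1 form the low 128-bit lane, elements 2-3 the high lane.
using vec4d = std::array<double, 4>;

// Model of _mm256_unpacklo_pd: pairs the low element of each 128-bit lane.
vec4d unpacklo(vec4d const& a, vec4d const& b)
{
  return {a[0], b[0], a[2], b[2]};
}

// Model of _mm256_unpackhi_pd: pairs the high element of each 128-bit lane.
vec4d unpackhi(vec4d const& a, vec4d const& b)
{
  return {a[1], b[1], a[3], b[3]};
}

// Picks element i of one 128-bit lane using a 2-bit selector:
// 0 = s1 low lane, 1 = s1 high lane, 2 = s2 low lane, 3 = s2 high lane.
double pick(vec4d const& s1, vec4d const& s2, int sel, int i)
{
  vec4d const& src = (sel & 2) ? s2 : s1;
  return src[(sel & 1) * 2 + i];
}

// Model of _mm256_permute2f128_pd, assuming the zeroing bits of imm are clear.
// Bits [1:0] select the low result lane, bits [5:4] select the high one.
vec4d permute2f128(vec4d const& s1, vec4d const& s2, int imm)
{
  int lo = imm & 3;
  int hi = (imm >> 4) & 3;
  return {pick(s1, s2, lo, 0), pick(s1, s2, lo, 1),
          pick(s1, s2, hi, 0), pick(s1, s2, hi, 1)};
}

// Same shape as the patched operator(): unpack first, then keep the
// high lane of each unpacked result (imm 0x31).
vec4d interleave_second_model(vec4d const& a0, vec4d const& a1)
{
  vec4d lo = unpacklo(a0, a1);
  return permute2f128(lo, unpackhi(a0, a1), 0x31);
}
```

With `a0 = {0, 1, 2, 3}` and `a1 = {4, 5, 6, 7}`, the model yields `{2, 6, 3, 7}`: the upper halves of the two inputs, interleaved. The unpack steps already place `a0[2], a1[2]` and `a0[3], a1[3]` in the high lanes, so the permute only has to gather those two lanes.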
