-
Notifications
You must be signed in to change notification settings - Fork 333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Build fails on GCC 5.4 with "invalid register operand for `vmovdqu'" #58
Comments
This is indeed a bug in GCC, present up to 6.4. Another way to fix it, without invoking newer instruction sets, is to replace the stores here by: _mm256_mask_storeu_epi32(&out[0 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[0]));
_mm256_mask_storeu_epi32(&out[1 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[1]));
_mm256_mask_storeu_epi32(&out[2 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[2]));
_mm256_mask_storeu_epi32(&out[3 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[3]));
_mm256_mask_storeu_epi32(&out[4 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[4]));
_mm256_mask_storeu_epi32(&out[5 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[5]));
_mm256_mask_storeu_epi32(&out[6 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[6]));
_mm256_mask_storeu_epi32(&out[7 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[7]));
_mm256_mask_storeu_epi32(&out[8 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[8]));
_mm256_mask_storeu_epi32(&out[9 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[9]));
_mm256_mask_storeu_epi32(&out[10 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[10]));
_mm256_mask_storeu_epi32(&out[11 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[11]));
_mm256_mask_storeu_epi32(&out[12 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[12]));
_mm256_mask_storeu_epi32(&out[13 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[13]));
_mm256_mask_storeu_epi32(&out[14 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[14]));
_mm256_mask_storeu_epi32(&out[15 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[15])); |
In case it helps anyone, I had something similar happen which I think is this bug: https://www.mail-archive.com/bug-binutils@gnu.org/msg30569.html In any case, using the -mavx512bw option also resolved my issue until a time I can update the toolchain we use.
|
The new assembly implementations added in b6b3c27 also seems to work. |
…n CentOS 8 GCC 8.3.1 20191121 (Red Hat 8.3.1-5) was erroring out with: Error: unsupported instruction `vmovdqu' BLAKE3-team/BLAKE3#58 advised this was previously caused by a compiler bug up to 6.3, but that clearly isn't applicable here. But, the fix/workaround posted in comment BLAKE3-team/BLAKE3#58 (comment) works.
When trying to build the C implementation of BLAKE3 on Ubuntu 16.04 LTS (used by Travis) the build fails when compiling blake3_avx512.c:
Looking at the generated assembler code, GCC generates
vmovdqu %ymm17, (%rax)
which is an invalid instruction as far as I can tell (VEX encoded instead of EVEX and can thus only access ymm0-ymm15). So it looks to be a compiler bug. But, if I add-mavx512bw
GCC instead usesvmovdqu8 %ymm17, (%rax)
which compiles.I assume that
get_cpu_features()
should be updated to if-mavx512bw
is to be used, but other than that, is there any downside with using it?This error is seen in the CI of ccache, see ccache/ccache#519.
The text was updated successfully, but these errors were encountered: