Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build fails on GCC 5.4 with "invalid register operand for `vmovdqu'" #58

Closed
erijo opened this issue Feb 10, 2020 · 3 comments · Fixed by #62
Closed

Build fails on GCC 5.4 with "invalid register operand for `vmovdqu'" #58

erijo opened this issue Feb 10, 2020 · 3 comments · Fixed by #62

Comments

@erijo
Copy link
Contributor

erijo commented Feb 10, 2020

When trying to build the C implementation of BLAKE3 on Ubuntu 16.04 LTS (used by Travis) the build fails when compiling blake3_avx512.c:

gcc -O3 -Wall -Wextra -std=c11 -pedantic  -c blake3_avx512.c -o blake3_avx512.o -mavx512f -mavx512vl
/tmp/ccaq5stm.s: Assembler messages:
/tmp/ccaq5stm.s:3763: Error: invalid register operand for `vmovdqu'
/tmp/ccaq5stm.s:3765: Error: invalid register operand for `vmovdqu'
Makefile:40: recipe for target 'blake3_avx512.o' failed
make: *** [blake3_avx512.o] Error 1

Looking at the generated assembler code, GCC generates vmovdqu %ymm17, (%rax) which is an invalid instruction as far as I can tell (VEX encoded instead of EVEX and can thus only access ymm0-ymm15). So it looks to be a compiler bug. But, if I add -mavx512bw GCC instead uses vmovdqu8 %ymm17, (%rax) which compiles.

I assume that get_cpu_features() should be updated to if -mavx512bw is to be used, but other than that, is there any downside with using it?

This error is seen in the CI of ccache, see ccache/ccache#519.

@sneves
Copy link
Collaborator

sneves commented Feb 10, 2020

This is indeed a bug in GCC, present up to 6.4. Another way to fix it, without invoking newer instruction sets, is to replace the stores here by:

    _mm256_mask_storeu_epi32(&out[0 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[0]));
    _mm256_mask_storeu_epi32(&out[1 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[1]));
    _mm256_mask_storeu_epi32(&out[2 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[2]));
    _mm256_mask_storeu_epi32(&out[3 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[3]));
    _mm256_mask_storeu_epi32(&out[4 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[4]));
    _mm256_mask_storeu_epi32(&out[5 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[5]));
    _mm256_mask_storeu_epi32(&out[6 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[6]));
    _mm256_mask_storeu_epi32(&out[7 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[7]));
    _mm256_mask_storeu_epi32(&out[8 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[8]));
    _mm256_mask_storeu_epi32(&out[9 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[9]));
    _mm256_mask_storeu_epi32(&out[10 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[10]));
    _mm256_mask_storeu_epi32(&out[11 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[11]));
    _mm256_mask_storeu_epi32(&out[12 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[12]));
    _mm256_mask_storeu_epi32(&out[13 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[13]));
    _mm256_mask_storeu_epi32(&out[14 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[14]));
    _mm256_mask_storeu_epi32(&out[15 * sizeof(__m256i)], (__mmask8)-1, _mm512_castsi512_si256(padded[15]));

@dlegaultbbry
Copy link

In case it helps anyone, I had something similar happen which I think is this bug:

https://www.mail-archive.com/bug-binutils@gnu.org/msg30569.html
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=97ed31ae00ea83410f9daf61ece8a606044af365

In any case, using the -mavx512bw option also resolved my issue until a time I can update the toolchain we use.

/tmp/ccBsfqcL.s: Assembler messages:
/tmp/ccBsfqcL.s:49783: Error: unsupported instruction `vmovdqu'
/tmp/ccBsfqcL.s:49819: Error: unsupported instruction `vmovdqu'
/tmp/ccBsfqcL.s:49871: Error: unsupported instruction `vmovdqu'
/tmp/ccBsfqcL.s:49878: Error: unsupported instruction `vmovdqu'
/tmp/ccBsfqcL.s:49885: Error: unsupported instruction `vmovdqu' 

@erijo
Copy link
Contributor Author

erijo commented Feb 12, 2020

The new assembly implementations added in b6b3c27 also seems to work.

willbryant added a commit to willbryant/digest-blake3 that referenced this issue Oct 18, 2020
…n CentOS 8

GCC 8.3.1 20191121 (Red Hat 8.3.1-5) was erroring out with:

	Error: unsupported instruction `vmovdqu'

BLAKE3-team/BLAKE3#58 advised this was previously caused by a compiler bug up to 6.3, but that clearly isn't applicable here. But, the fix/workaround posted in comment BLAKE3-team/BLAKE3#58 (comment) works.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants