clang14.0.7 from Android NDK crashes when compiling bf16-gemm-1x4c8-minmax-neonbf16-bfdot.c #4775

georgthegreat · 2023-05-04T16:13:18Z

Hi.

I am trying to update xnnpack to 2023-04-18 / 71f70de in order to update tensorflow-lite to 2.12.0.

Unfortunately clang for android segfaults when compiling src/bf16-gemm/gen/bf16-gemm-1x4c8-minmax-neonbf16-bfdot.c for android-armv7a.

I am able to reproduce this:

${NDK}/llvm-toolchain/bin/clang --target=armv7a-linux-androideabi21 -isystem ${NDK}/sources/cxx-stl/llvm-libc++abi/include -fsigned-char -march=armv7-a -mfloat-abi=softfp -c -o ${SRC}/contrib/restricted/google/xnnpack/src/bf16-gemm/gen/bf16-gemm-6x8c2-minmax-neonbf16-bfdot-lane-ld128.c.o -I${SRC} -I${SRC}/contrib/libs/fxdiv -I${SRC}/contrib/libs/pthreadpool/include -I${SRC}/contrib/restricted/google/xnnpack/include -I${SRC}/contrib/libs/fxdiv/include -I${SRC}/contrib/restricted/cpuinfo/include -I${SRC}/contrib/restricted/google/xnnpack/src -I${SRC}/contrib/libs/cxxsupp/libcxx/include -I${SRC}/contrib/libs/fp16/include -I${SRC}/contrib/libs/zlib/include -I${SRC}/contrib/libs/double-conversion -I${SRC}/contrib/libs/libc_compat/include/ifaddrs -fdebug-prefix-map=${SRC}=/-B -Xclang -fdebug-compilation-dir -Xclang /tmp -pipe -O3 -g -fexceptions -fno-common -fcolor-diagnostics -faligned-allocation -fdebug-default-version=4 -ffunction-sections -fdata-sections -w  -DXNN_ENABLE_ARM_BF16=1 -DXNN_ENABLE_ARM_DOTPROD=1 -DXNN_ENABLE_ARM_FP16_SCALAR=1 -DXNN_ENABLE_ARM_FP16_VECTOR=1 -DXNN_ENABLE_ASSEMBLY=1 -DXNN_ENABLE_DWCONV_MULTIPASS=0 -DXNN_ENABLE_GEMM_M_SPECIALIZATION=1 -DXNN_ENABLE_JIT=0 -DXNN_ENABLE_MEMOPT=1 -DXNN_ENABLE_RISCV_VECTOR=1 -DXNN_ENABLE_SPARSE=1 -DXNN_LOG_LEVEL=0 -Werror-implicit-function-declaration -nostdinc++ -fPIE -fPIE -march=armv8.2-a+bf16 -mfpu=neon-fp-armv8 ${SRC}/contrib/restricted/google/xnnpack/src/bf16-gemm/gen/bf16-gemm-6x8c2-minmax-neonbf16-bfdot-lane-ld128.c

I have tested NDK's r25b and r25c, but the segfault still persists and there is no way to fix this.

fatal error: error in backend: Cannot select: 0x90a21a8: v4bf16 = extract_subvector 0x909e178, Constant:i32<0>, /home/thegeorg/arcadia/contrib/restricted/google/xnnpack/src/bf16-gemm/gen/bf
16-gemm-6x8c2-minmax-neonbf16-bfdot-lane-ld128.c:173:33
  0x909e178: v8bf16,i32,ch = ARMISD::VLD1_UPD<(load (s128) from %ir.362, align 2)> 0x8f06068, 0x8cafc48, 0x8cb05a0, Constant:i32<1>, /home/thegeorg/arcadia/contrib/restricted/google/xnnpack
/src/bf16-gemm/gen/bf16-gemm-6x8c2-minmax-neonbf16-bfdot-lane-ld128.c:163:32
    0x8cafc48: i32,ch = CopyFromReg 0x8f06068, Register:i32 %84, /home/thegeorg/arcadia/contrib/restricted/google/xnnpack/src/bf16-gemm/gen/bf16-gemm-6x8c2-minmax-neonbf16-bfdot-lane-ld128.
c:163:32
      0x8cb0260: i32 = Register %84
    0x8cb05a0: i32,ch = CopyFromReg 0x8f06068, Register:i32 %70, /home/thegeorg/arcadia/contrib/restricted/google/xnnpack/src/bf16-gemm/gen/bf16-gemm-6x8c2-minmax-neonbf16-bfdot-lane-ld128.
c:158:89
      0x8cafe50: i32 = Register %70
    0x909e380: i32 = Constant<1>
  0x90b3a08: i32 = Constant<0>
In function: xnn_bf16_gemm_minmax_ukernel_6x8c2__neonbf16_bfdot_lane_ld128
PLEASE submit a bug report to https://github.com/android-ndk/ndk/issues and include the crash backtrace, preprocessed source, and associated run script.

@Maratyszcza, could you, please, help me to find a workaround for this?

The text was updated successfully, but these errors were encountered:

leleliu008 · 2023-05-10T12:29:19Z

step1:

sed -i 's|-march=armv6 -mfpu=vfp|-march=armv7-a -mfpu=neon|' CMakeLists.txt

step2:
add cmake option: -DXNNPACK_ENABLE_ARM_BF16=OFF

successfully built now, but don't know if the right way.

ngzhian · 2023-05-10T16:29:16Z

Manage to get the repro steps down to:

android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang --target=armv7a-linux-androideabi21 -Iinclude -Isrc -O1 -march=armv8.2-a+bf16 -mfpu=neon-fp-armv8 src/bf16-gemm/gen/bf16-gemm-6x8c2-minmax-neonbf16-bfdot-lane-ld128.c

Note that -O1, -O2, -O3 fails, but -O0 is fine.

taking a closer look

ngzhian · 2023-05-10T16:30:44Z

add cmake option: -DXNNPACK_ENABLE_ARM_BF16=OFF is the easiest way, it doesn't include the problematic c file.
@georgthegreat if you are getting blocked, that's probably the easiest way.

ngzhian · 2023-05-10T16:54:57Z

https://godbolt.org/z/656PqT6o1
I think it is a clang bug, 16.0.0 fixes it.

ngzhian · 2023-05-10T17:32:30Z

I found that I can workaround the bug with changing:

const uint32x4_t va0c01 = vdupq_lane_u32(vreinterpret_u32_bf16(vget_low_bf16(va0)), 0);

to

const uint32x4_t va0c01 = vdupq_lane_u32(vget_low_u32(vreinterpretq_u32_bf16(va0)), 0);

just the ordering of interpret and get_low.

ngzhian · 2023-05-10T17:42:00Z

vout0x0123 = vreinterpret_bf16_u16(vext_u16(vreinterpret_u16_bf16(vout0x0123), vreinterpret_u16_bf16(vout0x0123), 2)); is still a problem though

I think i can change it to vout0x0123 = vdup_lane_bf16(vout0x0123, 2);

ngzhian · 2023-05-10T18:15:47Z

Hm, looks like we don't test building of these microkernels: https://github.com/google/XNNPACK/blob/master/scripts/build-android-armv7.sh#L59

We might also need to update the ndk version we are testing with: https://github.com/google/XNNPACK/blob/master/.github/workflows/build.yml#LL172C28-L172C28

georgthegreat · 2023-05-10T18:20:10Z

Hm, looks like we don't test building of these microkernels:

There is even a comment about this specific issue:

# BF16 instructions cause ICE in Android NDK compiler

ngzhian · 2023-05-10T19:53:56Z

I think the easiest fix is for you to add this compile flag when building XNNPACK -DXNNPACK_ENABLE_ARM_BF16=OFF, until the ndk ships a newer version of clang.

georgthegreat · 2023-05-10T20:11:50Z

Yep, this is what I did for now.

georgthegreat · 2023-05-12T15:33:36Z

NDK maintainers claim that canary builds with clang16 can compile this code.
I think there is nothing to be done from xnnpack side, so this may be closed if there is no workaround / nobody wants to search for it.

RobertFlatt · 2023-05-31T23:23:18Z

As I read the posts above there are two possible workarounds, either could be implemented in XNNPACK.

Just waiting for some future NDK where some future Clang will be fixed, is I think an insufficient response.

Maratyszcza · 2023-06-01T08:45:44Z

The recommended work-around is to disable ARM BF16 extensions using either -DXNNPACK_ENABLE_ARM_BF16=OFF option (for CMake) or --define xnn_enable_arm_bf16=false option (for Bazel).

RobertFlatt · 2023-06-11T03:50:28Z

Using tensorflow 2.13.0-rc1 and -DXNNPACK_ENABLE_ARM_BF16=OFF

cmake  -DCMAKE_TOOLCHAIN_FILE=../android-ndk-r25c/build/cmake/android.toolchain.cmake -DANDROID_ABI=armeabi-v7a -DTFLITE_ENABLE_XNNPACK=ON -DXNNPACK_ENABLE_ARM_BF16=OFF ../tensorflow_src/tensorflow/lite  
cmake --build . -j

Does not workaround the issue.

And we should not expect it to because this is defined as the default behavior https://github.com/google/XNNPACK/blob/master/scripts/build-android-armv7.sh#L59 and we know the default fails to build.
Not clear why it was suggested as a workaround.

I've been trying to get some attention paid to this for 3.5 months #4348
Please implement a fix.
It seems reasonable to suggest that: the relationship between the code and the build tools is everywhere and always the responsibility of the programmer responsible for the code.

georgthegreat mentioned this issue May 9, 2023

[Bug]: backend cannot select in neon code android/ndk#1883

Closed

leleliu008 mentioned this issue May 10, 2023

armeabi-v7a assembler error #4348

Open

RobertFlatt mentioned this issue May 31, 2023

armeabi-v7a assembler error tensorflow/tensorflow#59970

Open

williamdias mentioned this issue Jul 25, 2023

Linking an Android library with TFLite GPU using CMake causes undefined symbol errors tensorflow/tensorflow#61312

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

clang14.0.7 from Android NDK crashes when compiling bf16-gemm-1x4c8-minmax-neonbf16-bfdot.c #4775

clang14.0.7 from Android NDK crashes when compiling bf16-gemm-1x4c8-minmax-neonbf16-bfdot.c #4775

georgthegreat commented May 4, 2023

leleliu008 commented May 10, 2023

ngzhian commented May 10, 2023 •

edited

ngzhian commented May 10, 2023

ngzhian commented May 10, 2023

ngzhian commented May 10, 2023 •

edited

ngzhian commented May 10, 2023 •

edited

ngzhian commented May 10, 2023 •

edited

georgthegreat commented May 10, 2023 •

edited

ngzhian commented May 10, 2023

georgthegreat commented May 10, 2023

georgthegreat commented May 12, 2023

RobertFlatt commented May 31, 2023

Maratyszcza commented Jun 1, 2023

RobertFlatt commented Jun 11, 2023 •

edited

clang14.0.7 from Android NDK crashes when compiling bf16-gemm-1x4c8-minmax-neonbf16-bfdot.c #4775

clang14.0.7 from Android NDK crashes when compiling bf16-gemm-1x4c8-minmax-neonbf16-bfdot.c #4775

Comments

georgthegreat commented May 4, 2023

leleliu008 commented May 10, 2023

ngzhian commented May 10, 2023 • edited

ngzhian commented May 10, 2023

ngzhian commented May 10, 2023

ngzhian commented May 10, 2023 • edited

ngzhian commented May 10, 2023 • edited

ngzhian commented May 10, 2023 • edited

georgthegreat commented May 10, 2023 • edited

ngzhian commented May 10, 2023

georgthegreat commented May 10, 2023

georgthegreat commented May 12, 2023

RobertFlatt commented May 31, 2023

Maratyszcza commented Jun 1, 2023

RobertFlatt commented Jun 11, 2023 • edited

ngzhian commented May 10, 2023 •

edited

ngzhian commented May 10, 2023 •

edited

ngzhian commented May 10, 2023 •

edited

ngzhian commented May 10, 2023 •

edited

georgthegreat commented May 10, 2023 •

edited

RobertFlatt commented Jun 11, 2023 •

edited