armeabi-v7a assembler error #59970

Open
RobertFlatt opened this issue Mar 13, 2023 · 21 comments
Labels
comp:lite TF Lite related issues comp:lite-xnnpack TensorFlow Lite XNNPack related issues stat:awaiting tensorflower Status - Awaiting response from tensorflower type:build/install Build and install issues type:feature Feature requests

Comments

@RobertFlatt

RobertFlatt commented Mar 13, 2023


Issue Type

Bug

Have you reproduced the bug with TF nightly?

No

Source

source

Tensorflow Version

v2.12.0-rc1

Custom Code

No

OS Platform and Distribution

Ubuntu 22.04

Mobile device

N/A

Python version

N/A

Bazel version

Using CMake

GCC/Compiler version

Clang, NDK 25b

CUDA/cuDNN version

No response

GPU model and memory

No response

Current Behaviour?

Building tensorflow lite (v2.12.0-rc1) for Android armeabi-v7a using CMake and NDK 25b, I get the following invalid assembly code error:

tflite-runtime/tensorflow/lite/tools/pip_package/gen/tflite_pip/python3/cmake_build/xnnpack/src/xnnpack/math.h:311:13: error: invalid output constraint '=t' in asm
      : [i] "=t" (i)

The cause is here (the link points to a more recent freeze): https://github.com/google/XNNPACK/blob/test_515720556/src/xnnpack/math.h#L332

Android arm64-v8a builds and runs without error. With an earlier tensorflow lite version (v2.8.0) both armeabi-v7a and arm64-v8a built and ran without error.

As I read it '=t' is documented as a valid constraint for "ARM family", but the assembler thinks this is not the case.
https://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html#Simple-Constraints
https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints

XNNPACK support said that the compiler flag -mfpu=vfp is required to enable the assembly code (google/XNNPACK#4348 (comment)), and that the flag was set. They then suggested, without a reference, that this was a Clang bug, and did not offer a workaround.

Further investigation suggested Clang was not the issue google/XNNPACK#4348 (comment)

The CMake build script covers eight conditions for various 32-bit Arm devices. Only two of these (both -march=armv6) set the required flag. The -mfpu=vfp flag is not set for -march=armv7-a, which I suspect is the cause of this issue. https://github.com/google/XNNPACK/blob/master/CMakeLists.txt#L546-L553
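
One way to check this suspicion is to look at the flags the failing file is actually compiled with. The sketch below is not part of the XNNPACK or TensorFlow build. `last_flag` is a hypothetical helper that reads a compile command on stdin and prints the last argument with a given prefix, on the assumption (true for GCC and Clang as far as I know) that a later `-march=` or `-mfpu=` overrides an earlier one.

```shell
# last_flag PREFIX
# Hypothetical helper: read a compile command on stdin and print the last
# argument starting with PREFIX (for example -mfpu=), or "(not set)".
last_flag() {
  prefix=$1
  # Put every argument on its own line, keep the matching ones, take the last.
  tr -s ' \t' '\n\n' | grep -- "^${prefix}" | tail -n 1 | grep . || echo "(not set)"
}
```

For example, if the generator prints full command lines, `cmake --build . --verbose 2>&1 | grep -e ' -c .*armsimd32' | head -n 1 | last_flag -mfpu=` should report the FPU setting used for the first armsimd32 compile command.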

XNNPACK support responded, but we did not communicate successfully (as shown by google/XNNPACK#4348 (comment) and google/XNNPACK#4348 (comment)), and we did not reach a resolution. Since tflite depends on XNNPACK, I am looking for a resolution here. Thank you.



Standalone code to reproduce the issue

This is a build issue, no extra code.

Relevant log output

tflite-runtime/tensorflow/lite/tools/pip_package/gen/tflite_pip/python3/cmake_build/xnnpack/src/xnnpack/math.h:311:13: error: invalid output constraint '=t' in asm
      : [i] "=t" (i)
@google-ml-butler google-ml-butler bot added type:bug Bug type:support Support issues labels Mar 13, 2023
@pjpratik pjpratik added type:build/install Build and install issues comp:lite TF Lite related issues comp:lite-xnnpack TensorFlow Lite XNNPack related issues and removed type:support Support issues labels Mar 14, 2023
@pjpratik pjpratik assigned sachinprasadhs and unassigned pjpratik Mar 15, 2023
@sachinprasadhs sachinprasadhs added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Mar 21, 2023
@RobertFlatt
Author

Hi,

Is there any way to track stat:awaiting tensorflower ?

Thanks

@sachinprasadhs
Contributor

Hi, Any update or information will be posted in this issue thread.

@pjpratik pjpratik self-assigned this Apr 22, 2023
@pjpratik
Contributor

Hi @RobertFlatt

I tried building for armeabi-v7a on r2.12 using the Android cross-compilation build instructions and was able to build successfully without any error. Please find the screenshot below.

[Screenshot 2023-04-22 at 9 11 52 AM]

With XNNPACK using the command
cmake -DCMAKE_TOOLCHAIN_FILE=android-ndk-r25/build/cmake/android.toolchain.cmake -DANDROID_ABI=armeabi-v7a tensorflow/lite -DTFLITE_ENABLE_XNNPACK=ON

Can you please let us know if you are still facing the issue?

Thanks.

@pjpratik pjpratik added stat:awaiting response Status - Awaiting response from author and removed stat:awaiting tensorflower Status - Awaiting response from tensorflower labels Apr 22, 2023
@RobertFlatt
Author

RobertFlatt commented Apr 25, 2023

> Can you please let us know if you are still facing the issue?

yes ☹️

  • I can replicate your result (I used git clone -b v2.12.0 --single-branch https://github.com/tensorflow/tensorflow.git tensorflow_src and NDK 25b), but a subsequent cmake --build . -j fails with the same error reported in the original post:
In file included from /home/bobf/ex/tflite_build/xnnpack/src/qc8-gemm/gen/qc8-gemm-1x1c4-minmax-fp32-armsimd32.c:15:
/home/bobf/ex/tflite_build/xnnpack/src/xnnpack/math.h:316:13: error: invalid output constraint '=t' in asm
      : [i] "=t" (i)
  • FYI my use case is slightly different as I am building a pip package for Python (yes, on Android), but the use case in your post illustrates presumably the same issue.

Edit: same result with NDK "current LTS release" 25c https://github.com/android/ndk/wiki#current-lts-release

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Apr 25, 2023
@sachinprasadhs sachinprasadhs added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Apr 26, 2023
@pkgoogle

I was able to replicate on r2.13 branch:

cmake -DCMAKE_TOOLCHAIN_FILE=~/Android/Sdk/ndk/25.2.9519653/build/cmake/android.toolchain.cmake  -DANDROID_ABI=armeabi-v7a ../tensorflow/lite -DTFLITE_ENABLE_XNNPACK=ON
cmake --build . -j
...
gmake[2]: *** [_deps/xnnpack-build/CMakeFiles/microkernels-all.dir/build.make:16694: _deps/xnnpack-build/CMakeFiles/microkernels-all.dir/src/qs8-igemm/gen/qs8-igemm-1x2c4-minmax-fp32-armsimd32.c.o] Error 1
In file included from In file included from /usr/local/google/home/pisethk/tensorflow/tflite_build/xnnpack/src/qu8-gemm/gen/qu8-gemm-1x1c4-minmax-fp32-armsimd32.c/usr/local/google/home/pisethk/tensorflow/tflite_build/xnnpack/src/qs8-vlrelu/gen/qs8-vlrelu-armsimd32-x4.c::1515:
:
/usr/local/google/home/pisethk/tensorflow/tflite_build/xnnpack/src/xnnpack/math.h:332:13/usr/local/google/home/pisethk/tensorflow/tflite_build/xnnpack/src/xnnpack/math.h: :332:error: 13: invalid output constraint '=t' in asm
error: invalid output constraint '=t' in asm
      : [i] "=t" (i)
            ^
      : [i] "=t" (i)
            ^
gmake[2]: *** [_deps/xnnpack-build/CMakeFiles/microkernels-all.dir/build.make:16680: _deps/xnnpack-build/CMakeFiles/microkernels-all.dir/src/qs8-igemm/gen/qs8-igemm-1x1c4-minmax-fp32-armsimd32.c.o] Error 1
1 error generated.
1 error generated.
1 error generated.
gmake[2]: *** [_deps/xnnpack-build/CMakeFiles/microkernels-all.dir/build.make:16554: _deps/xnnpack-build/CMakeFiles/microkernels-all.dir/src/qc8-gemm/gen/qc8-gemm-2x2c4-minmax-fp32-armsimd32.c.o] Error 1
gmake[2]: *** [_deps/xnnpack-build/CMakeFiles/microkernels-all.dir/build.make:16666: _deps/xnnpack-build/CMakeFiles/microkernels-all.dir/src/qs8-gemm/gen/qs8-gemm-2x2c4-minmax-fp32-armsimd32.c.o] Error 1
In file included from /usr/local/google/home/pisethk/tensorflow/tflite_build/xnnpack/src/qs8-vlrelu/gen/qs8-vlrelu-armsimd32-x8.c:15:
/usr/local/google/home/pisethk/tensorflow/tflite_build/xnnpack/src/xnnpack/math.h:332:13: error: invalid output constraint '=t' in asm
      : [i] "=t" (i)
            ^
1 error generated.
gmake[2]: *** [_deps/xnnpack-build/CMakeFiles/microkernels-all.dir/build.make:16708: _deps/xnnpack-build/CMakeFiles/microkernels-all.dir/src/qs8-igemm/gen/qs8-igemm-2x1c4-minmax-fp32-armsimd32.c.o] Error 1
In file included from /usr/local/google/home/pisethk/tensorflow/tflite_build/xnnpack/src/qu8-igemm/gen/qu8-igemm-1x1c4-minmax-fp32-armsimd32.c:15:
/usr/local/google/home/pisethk/tensorflow/tflite_build/xnnpack/src/xnnpack/math.h:332:13: gmake[2]: *** [_deps/xnnpack-build/CMakeFiles/microkernels-all.dir/build.make:16736: _deps/xnnpack-build/CMakeFiles/microkernels-all.dir/src/qs8-vcvt/gen/qs8-vcvt-armsimd32-x4.c.o] Error 1
error: invalid output constraint '=t' in asm
gmake[2]: *** [_deps/xnnpack-build/CMakeFiles/microkernels-all.dir/build.make:16750: _deps/xnnpack-build/CMakeFiles/microkernels-all.dir/src/qs8-vcvt/gen/qs8-vcvt-armsimd32-x8.c.o] Error 1
      : [i] "=t" (i)
            ^
In file included from /usr/local/google/home/pisethk/tensorflow/tflite_build/xnnpack/src/qu8-gemm/gen/qu8-gemm-2x2c4-minmax-fp32-armsimd32.c:15:
/usr/local/google/home/pisethk/tensorflow/tflite_build/xnnpack/src/xnnpack/math.h:332:13: error: invalid output constraint '=t' in asm
      : [i] "=t" (i)
            ^
1 error generated.
1 error generated.
In file included from /usr/local/google/home/pisethk/tensorflow/tflite_build/xnnpack/src/qu8-vcvt/gen/qu8-vcvt-armsimd32-x4.c:15:
/usr/local/google/home/pisethk/tensorflow/tflite_build/xnnpack/src/xnnpack/math.h:332:13: error: invalid output constraint '=t' in asm
gmake[2]: *** [_deps/xnnpack-build/CMakeFiles/microkernels-all.dir/build.make:16596: _deps/xnnpack-build/CMakeFiles/microkernels-all.dir/src/qc8-igemm/gen/qc8-igemm-2x1c4-minmax-fp32-armsimd32.c.o] Error 1
      : [i] "=t" (i)
            ^
In file included from /usr/local/google/home/pisethk/tensorflow/tflite_build/xnnpack/src/qu8-vlrelu/gen/qu8-vlrelu-armsimd32-x4.c:15:
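
The log above is hard to read because `cmake --build . -j` lets the compile jobs run in parallel, so their output interleaves. A minimal sketch for getting a readable summary, assuming a Makefile-style log like the one above. `failed_sources` is a hypothetical helper, not a CMake or TensorFlow tool.

```shell
# failed_sources
# Hypothetical helper: read a make/gmake build log on stdin and print each
# C source whose object file failed to build, once, in sorted order.
failed_sources() {
  # Keep only the "<object>.c.o] Error N" fragment of each failure line,
  # drop the "] Error N" suffix, then drop the CMake target directory prefix.
  grep -o '[^ ]*\.c\.o] Error [0-9]*' \
    | sed 's/\.o] Error [0-9]*$//' \
    | sed 's|.*\.dir/||' \
    | sort -u
}
```

Rebuilding with `cmake --build . -j 1 2>&1 | tee build.log` and then running `failed_sources < build.log` should list each failing microkernel once. In the log above they all appear to be armsimd32 files.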

@RobertFlatt
Author

There seems to have been some analysis

google/XNNPACK#4775 (comment)
and
google/XNNPACK#4775 (comment)

But being outside Google, I can't see whether this has resulted in an updated implementation.

And from some of those posts I infer the answer is 'just wait for some future NDK where some future Clang will be fixed'. I would consider that an insufficient response.

@pkgoogle

pkgoogle commented Jun 1, 2023

@RobertFlatt

Does the recommended workaround in both those threads suffice for you so far?

@pkgoogle

pkgoogle commented Jun 1, 2023

Hi @RobertFlatt

Android NDK version=25 is actually not currently supported, the official documentation https://www.tensorflow.org/lite/android/lite_build currently recommends version 21e.

Can you try with that version instead?

@pkgoogle pkgoogle added stat:awaiting response Status - Awaiting response from author type:feature Feature requests and removed stat:awaiting tensorflower Status - Awaiting response from tensorflower type:bug Bug labels Jun 1, 2023
@RobertFlatt
Author

RobertFlatt commented Jun 2, 2023

@pkgoogle

Thanks for your interest.

For the recommended workaround google/XNNPACK#4775 (comment) -DXNNPACK_ENABLE_ARM_BF16=OFF :

EDIT:

Due to a typo in my NDK specification, my previous comments (below) were based on a build using gcc, not Clang, and are thus incorrect.

cmake -DCMAKE_TOOLCHAIN_FILE=../android-ndk-r25c/build/cmake/android.toolchain.cmake -DANDROID_ABI=armeabi-v7a -DTFLITE_ENABLE_XNNPACK=ON -DXNNPACK_ENABLE_ARM_BF16=OFF ../tensorflow_src/tensorflow/lite

Fails with

/home/bobf/ex/tflite_build/xnnpack/src/qc8-igemm/gen/qc8-igemm-1x2c4-minmax-fp32-armsimd32.c:15:
/home/bobf/ex/tflite_build/xnnpack/src/xnnpack/math.h:316:13: error: invalid output constraint '=t' in asm
      : [i] "=t" (i)
            ^

I think the issue still exists. Can you confirm?

==================================
THESE PREVIOUS COMMENTS ARE INCORRECT
The compile flag does enable an error free compile with the test case above, which is a good first step.

But I'm still exploring unexpected issues.

For example the value of the flag (ON/OFF) is ignored, the flag's presence disables the bogus assembly code error - the argument is always interpreted as OFF.

So maybe this will be usable, I need to do more testing.

But I still think the project build script should implement the fix. Because the long term viability and side effects of a workaround are not visible to an end user. And the core issue which is build interaction with Clang is clearly in the domain of the tool developers.

===================================

> Android NDK version=25 is actually not currently supported, the official documentation https://www.tensorflow.org/lite/android/lite_build currently recommends version 21e.

I think the referenced page has bit rot.

The default NDK is 25c as shown here https://developer.android.com/ndk/downloads
This is confirmed by the tflite CMake build which defaults to 25c if an NDK is not set.

I use 25c.

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Jun 2, 2023
@sachinprasadhs sachinprasadhs removed their assignment Jun 2, 2023
@pkgoogle

pkgoogle commented Jun 2, 2023

Hi @RobertFlatt,

While there is always some bit rot in documentation, I have confirmed that that part is accurate. We more or less only support NDK 19, 20, 21. We do not currently support NDK version=25 for now, as such we can't help with this issue unless we are seeing this issue with the recommended version.

@pkgoogle pkgoogle added the stat:awaiting response Status - Awaiting response from author label Jun 2, 2023
@pkgoogle pkgoogle self-assigned this Jun 2, 2023
@RobertFlatt
Author

> We do not currently support NDK version=25 for now,

This statement is clearly wrong.

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Jun 3, 2023
@sachinprasadhs
Contributor

Hi @RobertFlatt ,

Please check the configuration file below, which lists the supported NDK versions.

tensorflow/configure.py

Lines 38 to 41 in 6088a22

_SUPPORTED_ANDROID_NDK_VERSIONS = [
    19, 20, 21
]

@sachinprasadhs sachinprasadhs added the stat:awaiting response Status - Awaiting response from author label Jun 5, 2023
@RobertFlatt
Author

@sachinprasadhs

OK, but those are the NDK versions for Bazel https://github.com/tensorflow/tensorflow/blob/master/configure.py#L737-L738
The issue at hand is CMake.

The following simple tests use Tensorflow v2.13.0-rc1

Moving away from the issue at hand we use arm64-v8a to test CMake builds with 3 NDKs

Both NDK LTS r25c and r23b build without error, try:

cmake  -DCMAKE_TOOLCHAIN_FILE=../android-ndk-r25c/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a  ../tensorflow_src/tensorflow/lite 
cmake --build . -j
cmake  -DCMAKE_TOOLCHAIN_FILE=../android-ndk-r23b/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a  ../tensorflow_src/tensorflow/lite 
cmake --build . -j

However r19c fails with

clang: error: the clang compiler does not support '-march=armv8.2-a+bf16'

cmake  -DCMAKE_TOOLCHAIN_FILE=../android-ndk-r19c/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a  ../tensorflow_src/tensorflow/lite 
cmake --build . -j

You can easily validate this for yourself using the commands above.
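
To run the same comparison across several NDKs without one configuration leaking into the next, each NDK can get its own build directory, since CMake caches the toolchain chosen at first configure. A minimal sketch, assuming the directory layout used in the commands above (NDKs and `tensorflow_src` as siblings of the build directories). `print_build_matrix` is a hypothetical helper that only prints the commands, so they can be reviewed before running.

```shell
# print_build_matrix ABI NDK_DIR...
# Hypothetical helper: for each NDK directory (given relative to the build
# directory, e.g. ../android-ndk-r25c), print one line that configures and
# builds TensorFlow Lite in a build directory of its own.
print_build_matrix() {
  abi=$1; shift
  src=../tensorflow_src/tensorflow/lite   # assumed source location
  for ndk in "$@"; do
    dir="build-$(basename "$ndk")-$abi"
    toolchain="$ndk/build/cmake/android.toolchain.cmake"
    printf '%s\n' "mkdir -p $dir && (cd $dir && cmake -DCMAKE_TOOLCHAIN_FILE=$toolchain -DANDROID_ABI=$abi $src && cmake --build . -j)"
  done
}
```

For instance, `print_build_matrix arm64-v8a ../android-ndk-r25c ../android-ndk-r23b ../android-ndk-r19c` prints three command lines, which can be piped to `sh` once they look right.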

Returning to the issue at hand we use armeabi-v7a and CMake to test the proposed workaround

The proposed workaround -DXNNPACK_ENABLE_ARM_BF16=OFF

cmake  -DCMAKE_TOOLCHAIN_FILE=../android-ndk-r25c/build/cmake/android.toolchain.cmake -DANDROID_ABI=armeabi-v7a -DTFLITE_ENABLE_XNNPACK=ON -DXNNPACK_ENABLE_ARM_BF16=OFF ../tensorflow_src/tensorflow/lite  
cmake --build . -j

But this fails in the usual asm : [i] "=t" (i) way.

Presumably I don't understand how to use the workaround.

Please explain how to use -DXNNPACK_ENABLE_ARM_BF16=OFF, and if you could test the explanation as well, that might save us some back and forth. Thank you.

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Jun 11, 2023
@RobertFlatt
Author

RobertFlatt commented Jun 12, 2023

Further investigation reveals that -DXNNPACK_ENABLE_ARM_BF16=OFF does not work around the issue.

And we should not expect it to because this is defined in the default armeabi-v7a build behavior https://github.com/google/XNNPACK/blob/master/scripts/build-android-armv7.sh#L59 and we know the default fails to build.

Not clear why it was suggested as a workaround, as adding it clearly has no impact.
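
Whether a `-D` option reached the configuration at all can be checked in the build directory's `CMakeCache.txt`. A minimal sketch follows. `cache_entry` is a hypothetical helper, and the interpretation of the type field is an assumption: as far as I know, CMake stores a command-line `-D` entry that the project never declares with type `UNINITIALIZED`, while a declared `option()` shows up as `BOOL`.

```shell
# cache_entry NAME [FILE]
# Hypothetical helper: print "<TYPE> <VALUE>" for cache variable NAME from a
# CMakeCache.txt (default: ./CMakeCache.txt); print nothing if it is absent.
cache_entry() {
  name=$1
  file=${2:-CMakeCache.txt}
  # Cache lines have the form NAME:TYPE=VALUE
  sed -n "s/^${name}:\([A-Z]*\)=\(.*\)/\1 \2/p" "$file"
}
```

Running `cache_entry XNNPACK_ENABLE_ARM_BF16` after the configure step above would then show which value the build actually recorded for the flag.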

This not-a-workaround was apparently suggested here google/XNNPACK#4775 (comment) by the same account that provided non-resolutions to this issue here google/XNNPACK#4348 (comment) , and here google/XNNPACK#4348 (comment) , and (in retrospect) here google/XNNPACK#4348 (comment) . Two of these non-resolutions are referenced in the first post in this issue.

This code used to work, but somebody broke it between 2.9 and 2.12 .

Enough already! Please, a resolution.

@pkgoogle

@terryheo can you please take a look? Also can we verify NDK supported versions for CMake Android workflow?

@pkgoogle pkgoogle added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Jun 12, 2023
@RobertFlatt
Author

@pkgoogle
5 weeks have passed, any clarity on this issue?

skottmckay added a commit to microsoft/onnxruntime that referenced this issue Nov 11, 2023
### Description
Use different march flag to work around what appears to be a clang issue.

See tensorflow/tensorflow#59970 for links to various relevant pieces of info/discussions.
@derpda

derpda commented Dec 8, 2023

Trying to compile v2.15.0 for armeabi-v7a with NDK 25c (officially supported now, I believe), I am running into the same problem.
Applying the fixes from microsoft/onnxruntime@8d298f6 to XNNPACK's CMakeLists.txt gets me past the first issue, but the build later fails during XNNPACK microkernel compilation with the (rather cryptic) error below:

Long cryptic error:
[ 23%] Building C object _deps/xnnpack-build/CMakeFiles/microkernels-all.dir/src/bf16-gemm/gen/bf16-gemm-1x4c8-minmax-neonbf16-bfdot.c.o
fatal error: error in backend: Cannot select: 0x83465a8: v4bf16 = ARMISD::VEXT 0x836ab38, 0x836ab38, Constant:i32<2>, xnnpack/src/bf16-gemm/gen/bf16-gemm-1x4c8-minmax-neonbf16-bfdot.c:119:22
  0x836ab38: v4bf16,ch = CopyFromReg 0x84e41b8, Register:v4bf16 %54, xnnpack/src/bf16-gemm/gen/bf16-gemm-1x4c8-minmax-neonbf16-bfdot.c:117:9
    0x851e1f0: v4bf16 = Register %54
  0x836ab38: v4bf16,ch = CopyFromReg 0x84e41b8, Register:v4bf16 %54, xnnpack/src/bf16-gemm/gen/bf16-gemm-1x4c8-minmax-neonbf16-bfdot.c:117:9
    0x851e1f0: v4bf16 = Register %54
  0x836afb0: i32 = Constant<2>
In function: xnn_bf16_gemm_minmax_ukernel_1x4c8__neonbf16_bfdot
PLEASE submit a bug report to https://github.com/android-ndk/ndk/issues and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang --target=armv7-none-linux-androideabi26 --sysroot=/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/sysroot -DEIGEN_MPL2_ONLY -DFXDIV_USE_INLINE_ASSEMBLY=0 -DNOMINMAX=1 -DPTHREADPOOL_NO_DEPRECATED_API=1 -DXNN_ENABLE_ARM_BF16=1 -DXNN_ENABLE_ARM_DOTPROD=1 -DXNN_ENABLE_ARM_FP16_SCALAR=1 -DXNN_ENABLE_ARM_FP16_VECTOR=1 -DXNN_ENABLE_ARM_I8MM=1 -DXNN_ENABLE_ASSEMBLY=1 -DXNN_ENABLE_CPUINFO=1 -DXNN_ENABLE_DWCONV_MULTIPASS=0 -DXNN_ENABLE_GEMM_M_SPECIALIZATION=1 -DXNN_ENABLE_JIT=0 -DXNN_ENABLE_MEMOPT=1 -DXNN_ENABLE_RISCV_VECTOR=1 -DXNN_ENABLE_SPARSE=1 -I/home/peter/dev/sandbox/tensorflow/tensorflow-2.15.0/third_party/xla/third_party/tsl -I/home/peter/dev/sandbox/tensorflow/build_2.15.0_ndk_25c_armeabi-v7a/opencl_headers -I/home/peter/dev/sandbox/tensorflow/build_2.15.0_ndk_25c_armeabi-v7a/vulkan_headers/include -I/home/peter/dev/sandbox/tensorflow/tensorflow-2.15.0/tensorflow/lite/delegates/gpu/common -I/home/peter/dev/sandbox/tensorflow/tensorflow-2.15.0/tensorflow/lite/delegates/gpu/common/task -I/home/peter/dev/sandbox/tensorflow/build_2.15.0_ndk_25c_armeabi-v7a/xnnpack/src -I/home/peter/dev/sandbox/tensorflow/build_2.15.0_ndk_25c_armeabi-v7a/pthreadpool-source/include -I/home/peter/dev/sandbox/tensorflow/build_2.15.0_ndk_25c_armeabi-v7a/FXdiv-source/include -I/home/peter/dev/sandbox/tensorflow/build_2.15.0_ndk_25c_armeabi-v7a/FP16-source/include -g -DANDROID -fdata-sections -ffunction-sections -funwind-tables -fstack-protector-strong -no-canonical-prefixes -D_FORTIFY_SOURCE=2 -march=armv7-a -mthumb -Wformat -Werror=format-security -O3 -DNDEBUG -std=c99 -fPIC -O2 -pthread -fno-math-errno -marm -march=armv8.2-a+bf16 -mfpu=neon-fp-armv8 -MD -MT _deps/xnnpack-build/CMakeFiles/microkernels-all.dir/src/bf16-gemm/gen/bf16-gemm-1x4c8-minmax-neonbf16-bfdot.c.o -MF 
CMakeFiles/microkernels-all.dir/src/bf16-gemm/gen/bf16-gemm-1x4c8-minmax-neonbf16-bfdot.c.o.d -o CMakeFiles/microkernels-all.dir/src/bf16-gemm/gen/bf16-gemm-1x4c8-minmax-neonbf16-bfdot.c.o -c /home/peter/dev/sandbox/tensorflow/build_2.15.0_ndk_25c_armeabi-v7a/xnnpack/src/bf16-gemm/gen/bf16-gemm-1x4c8-minmax-neonbf16-bfdot.c
1.	<eof> parser at end of file
2.	Code generation
3.	Running pass 'Function Pass Manager' on module '/home/peter/dev/sandbox/tensorflow/build_2.15.0_ndk_25c_armeabi-v7a/xnnpack/src/bf16-gemm/gen/bf16-gemm-1x4c8-minmax-neonbf16-bfdot.c'.
4.	Running pass 'ARM Instruction Selection' on function '@xnn_bf16_gemm_minmax_ukernel_1x4c8__neonbf16_bfdot'
 #0 0x00000000047d91d8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x47d91d8)
 #1 0x00000000047d8340 llvm::sys::RunSignalHandlers() (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x47d8340)
 #2 0x00000000047a3dc3 (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x47a3dc3)
 #3 0x00000000047a3d7b (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x47a3d7b)
 #4 0x00000000047d7a87 llvm::sys::Process::Exit(int, bool) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x47d7a87)
 #5 0x00000000040dc70a (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x40dc70a)
 #6 0x0000000003083072 llvm::report_fatal_error(llvm::Twine const&, bool) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x3083072)
 #7 0x000000000282b5f5 llvm::SelectionDAGISel::CannotYetSelect(llvm::SDNode*) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x282b5f5)
 #8 0x0000000006cf4e77 (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x6cf4e77)
 #9 0x000000000641f425 (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x641f425)
#10 0x0000000005e86b63 llvm::SelectionDAGISel::DoInstructionSelection() (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x5e86b63)
#11 0x0000000005e8710a llvm::SelectionDAGISel::CodeGenAndEmitDAG() (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x5e8710a)
#12 0x0000000006417d3c llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x6417d3c)
#13 0x0000000006457ad3 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x6457ad3)
#14 0x00000000064572df (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x64572df)
#15 0x0000000005d9faea llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x5d9faea)
#16 0x0000000005da0113 llvm::FPPassManager::runOnFunction(llvm::Function&) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x5da0113)
#17 0x0000000005d9fc6f llvm::FPPassManager::runOnModule(llvm::Module&) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x5d9fc6f)
#18 0x00000000063aa794 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x63aa794)
#19 0x00000000065d6968 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, std::__1::unique_ptr<llvm::raw_pwrite_stream, std::__1::default_delete<llvm::raw_pwrite_stream> >) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x65d6968)
#20 0x00000000060524d5 (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x60524d5)
#21 0x0000000005ea25a9 clang::ParseAST(clang::Sema&, bool, bool) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x5ea25a9)
#22 0x00000000063c128d clang::FrontendAction::Execute() (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x63c128d)
#23 0x00000000063c112d clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x63c112d)
#24 0x00000000063c1541 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x63c1541)
#25 0x00000000066a9f54 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x66a9f54)
#26 0x00000000066a6de3 (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x66a6de3)
#27 0x00000000066a6c92 (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x66a6c92)
#28 0x00000000066a6c61 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x66a6c61)
#29 0x00000000066a69f4 clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, bool*) const (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x66a69f4)
#30 0x00000000066a685f clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&) const (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x66a685f)
#31 0x00000000066a66f2 clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::__1::pair<int, clang::driver::Command const*> >&) (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x66a66f2)
#32 0x00000000066752ee main (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x66752ee)
#33 0x00007f6833640cd0 (/usr/lib/libc.so.6+0x27cd0)
#34 0x00007f6833640d8a __libc_start_main (/usr/lib/libc.so.6+0x27d8a)
#35 0x00000000064cce69 _start (/home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang+0x64cce69)
clang: error: clang frontend command failed with exit code 70 (use -v to see invocation)
Android (9352603, based on r450784d1) clang version 14.0.7 (https://android.googlesource.com/toolchain/llvm-project 4c603efb0cca074e9238af8b4106c30add4418f6)
Target: armv7-none-linux-android26
Thread model: posix
InstalledDir: /home/peter/dev/sandbox/tensorflow/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin
clang: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang: note: diagnostic msg: /tmp/bf16-gemm-1x4c8-minmax-neonbf16-bfdot-463081.c
clang: note: diagnostic msg: /tmp/bf16-gemm-1x4c8-minmax-neonbf16-bfdot-463081.sh
clang: note: diagnostic msg: 

********************
make[2]: *** [_deps/xnnpack-build/CMakeFiles/microkernels-all.dir/build.make:49874: _deps/xnnpack-build/CMakeFiles/microkernels-all.dir/src/bf16-gemm/gen/bf16-gemm-1x4c8-minmax-neonbf16-bfdot.c.o] Error 70
make[1]: *** [CMakeFiles/Makefile2:6653: _deps/xnnpack-build/CMakeFiles/microkernels-all.dir/all] Error 2
make: *** [Makefile:136: all] Error 2

Any update regarding this?

----- Update
Building with NDK 21e also fails, with the error below:

[  1%] Building C object _deps/xnnpack-build/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/neoni8mm.c.o
clang: error: the clang compiler does not support '-march=armv8.2-a+i8mm'
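
Older NDKs fail in this thread because their Clang rejects newer `-march` values outright. A small sketch for collecting those rejected flags from a saved build log, assuming the message format shown above. `unsupported_flags` is a hypothetical helper.

```shell
# unsupported_flags
# Hypothetical helper: read a build log on stdin and print each flag that
# clang reported as unsupported, once, in sorted order.
unsupported_flags() {
  sed -n "s/.*the clang compiler does not support '\([^']*\)'.*/\1/p" | sort -u
}
```

Comparing the output of `unsupported_flags < build.log` between NDK releases shows which architecture extensions each toolchain is missing.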

@pfk-beta

I was trying to build according to these docs: https://www.tensorflow.org/lite/guide/build_cmake_arm, but it failed. I used the NDK in the same way as you and got the same error: invalid output constraint '=t' in asm. The Bazel build completed without error.

@zihaomu

zihaomu commented Mar 12, 2024

Same issue; tested on r2.16 with NDK 26.1.10909125.

@zihaomu

zihaomu commented Mar 12, 2024

Hi @pfk-beta, @RobertFlatt, I found a workaround. You can directly modify the XNNPACK source code inside the tflite build folder.
[image]

The main idea is to bypass the error code and use the default branch.

@zihaomu

zihaomu commented Mar 14, 2024

related onnx issue: google/XNNPACK#6164, pr: google/XNNPACK#6179

kleiti pushed a commit to kleiti/onnxruntime that referenced this issue Mar 22, 2024