
Enabling XNNPACK with Raspberry Pi Zero/W #60282

Closed
@samveen

Description


Issue Type

Build/Install

Have you reproduced the bug with TF nightly?

No

Source

source

Tensorflow Version

2.12.0

Custom Code

No

OS Platform and Distribution

Linux Raspberrypi OS 32-bit (Debian bullseye)

Mobile device

Raspberry Pi Zero W

Python version

3.9.2

Bazel version

cmake 3.18.4

GCC/Compiler version

GNU c++ (Raspbian 10.2.1-6+rpi1) 10.2.1 20210110

CUDA/cuDNN version

NA

GPU model and memory

NA

Current Behaviour?

The TFLite build instructions for the Raspberry Pi Zero/Zero W state that the following flags should be part of CFLAGS/CXXFLAGS:

-march=armv6 -mfpu=vfp -mfloat-abi=hard -funsafe-math-optimizations

As per the XNNPACK README, XNNPACK supports running on ARMv6 with VFP, which is exactly the CPU in the Raspberry Pi Zero W. However, all build instructions for the Raspberry Pi Zero explicitly request disabling XNNPACK. Given the documented support for this target, I tried to build TFLite with XNNPACK enabled.
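
(For context, not from the TFLite docs: a quick way to confirm what the Zero W's ARM1176 core actually provides is to check the kernel's CPU feature flags on the device itself; it reports vfp but neither neon nor the dot-product extension.)

# Run on the Pi Zero W: inspect the CPU model and feature flags.
# Expect 'vfp' to be present and 'neon'/'asimddp' to be absent.
grep -E 'model name|Features' /proc/cpuinfo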

When XNNPACK is enabled, its sub-build appends the following conflicting flags to the compiler invocation:
-marm -march=armv8.2-a+dotprod -mfpu=neon-fp-armv8
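
(Note, not part of the failing invocation itself: GCC only honours the last -march/-mfpu given on a command line, so these appended flags silently override the ARMv6 ones supplied via CMAKE_C_FLAGS. Which pair wins can be confirmed with the Pi's native GCC; the flag values below just mirror the log.)

# The later -march/-mfpu pair takes effect: the reported target is
# armv8.2-a+dotprod, not the armv6/vfp the Pi Zero W can actually execute.
cc -march=armv6 -mfpu=vfp -marm -march=armv8.2-a+dotprod -mfpu=neon-fp-armv8 \
   -Q --help=target | grep -E '(march|mfpu)='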

Please document/extend the CMake build instructions so that TFLite builds correctly with XNNPACK enabled for the Raspberry Pi Zero/Zero W.
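
As a starting point, here is an untested sketch of the configuration I would expect such documentation to describe, assuming the XNNPACK_ENABLE_ARM_* CMake options (which appear to drive the XNN_ENABLE_ARM_* defines visible in the log below) are passed through to the XNNPACK sub-build; the Python include paths from the original invocation are omitted for brevity:

# Untested: the XNNPACK_ENABLE_ARM_* switches below are assumptions inferred
# from the XNN_ENABLE_ARM_* compile definitions in the build log.
cmake \
   -DCMAKE_C_FLAGS='-march=armv6 -mfpu=vfp -mfloat-abi=hard -funsafe-math-optimizations' \
   -DCMAKE_CXX_FLAGS='-march=armv6 -mfpu=vfp -mfloat-abi=hard -funsafe-math-optimizations' \
   -DCMAKE_SYSTEM_NAME=Linux \
   -DCMAKE_SYSTEM_PROCESSOR=armv6 \
   -DTFLITE_ENABLE_XNNPACK=ON \
   -DXNNPACK_ENABLE_ARM_BF16=OFF \
   -DXNNPACK_ENABLE_ARM_DOTPROD=OFF \
   -DXNNPACK_ENABLE_ARM_FP16_SCALAR=OFF \
   -DXNNPACK_ENABLE_ARM_FP16_VECTOR=OFF \
   /home/samveen/tensorflow/build/../tensorflow/lite

Whether these switches are enough to drop the armv8 -march/-mfpu overrides for the affected microkernel targets is exactly what I would like the documentation to confirm.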

Standalone code to reproduce the issue

cmake \
   -DCMAKE_C_FLAGS='-march=armv6 -mfpu=vfp -mfloat-abi=hard -funsafe-math-optimizations -I/usr/include/python3.9 -I/usr/lib/python3/dist-packages/pybind11/include -I/usr/lib/python3/dist-packages/numpy/core/include' \
   -DCMAKE_CXX_FLAGS='-march=armv6 -mfpu=vfp -mfloat-abi=hard -funsafe-math-optimizations -I/usr/include/python3.9 -I/usr/lib/python3/dist-packages/pybind11/include -I/usr/lib/python3/dist-packages/numpy/core/include'  \
   -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON \
   -DCMAKE_SYSTEM_NAME=Linux \
   -DCMAKE_SYSTEM_PROCESSOR=armv6  \
   -DTFLITE_ENABLE_XNNPACK=ON \
   /home/samveen/tensorflow/build/../tensorflow/lite
...
cmake --build . --verbose -t _pywrap_tensorflow_interpreter_wrapper

Relevant log output

/usr/bin/gmake  -f _deps/xnnpack-build/CMakeFiles/microkernels-prod.dir/build.make _deps/xnnpack-build/CMakeFiles/microkernels-prod.dir/build
gmake[3]: Entering directory '/home/samveen/tensorflow/build/gen/tflite_pip/python3/cmake_build'
[ 47%] Building C object _deps/xnnpack-build/CMakeFiles/microkernels-prod.dir/src/amalgam/neondot.c.o
cd /home/samveen/tensorflow/build/gen/tflite_pip/python3/cmake_build/_deps/xnnpack-build && /usr/bin/cc -DEIGEN_MPL2_ONLY -DFXDIV_USE_INLINE_ASSEMBLY=0 -DNOMINMAX=1 -DPTHREADPOOL_NO_DEPRECATED_API=1 -DXNN_ENABLE_ARM_BF16=1 -DXNN_ENABLE_ARM_DOTPROD=1 -DXNN_ENABLE_ARM_FP16_SCALAR=1 -DXNN_ENABLE_ARM_FP16_VECTOR=1 -DXNN_ENABLE_ASSEMBLY=1 -DXNN_ENABLE_DWCONV_MULTIPASS=0 -DXNN_ENABLE_GEMM_M_SPECIALIZATION=1 -DXNN_ENABLE_JIT=0 -DXNN_ENABLE_MEMOPT=1 -DXNN_ENABLE_RISCV_VECTOR=1 -DXNN_ENABLE_SPARSE=1 -I/home/samveen/tensorflow/build/gen/tflite_pip/python3/cmake_build/xnnpack/src -I/home/samveen/tensorflow/build/gen/tflite_pip/python3/cmake_build/pthreadpool-source/include -I/home/samveen/tensorflow/build/gen/tflite_pip/python3/cmake_build/FXdiv-source/include -I/home/samveen/tensorflow/build/gen/tflite_pip/python3/cmake_build/FP16-source/include -march=armv6 -mfpu=vfp -mfloat-abi=hard -funsafe-math-optimizations -I/usr/include/python3.9 -I/usr/lib/python3/dist-packages/pybind11/include -I/usr/lib/python3/dist-packages/numpy/core/include -O3 -DNDEBUG -fPIC -Wno-psabi -O2 -pthread -std=c99  -marm  -march=armv8.2-a+dotprod -mfpu=neon-fp-armv8  -o CMakeFiles/microkernels-prod.dir/src/amalgam/neondot.c.o -c /home/samveen/tensorflow/build/gen/tflite_pip/python3/cmake_build/xnnpack/src/amalgam/neondot.c
/tmp/ccotLSur.s: Assembler messages:
/tmp/ccotLSur.s:63: Error: selected processor does not support `vsdot.s8 q8,q12,d7[0]' in ARM mode
/tmp/ccotLSur.s:65: Error: selected processor does not support `vsdot.s8 q9,q10,d7[0]' in ARM mode
/tmp/ccotLSur.s:68: Error: selected processor does not support `vsdot.s8 q11,q10,d7[0]' in ARM mode
/tmp/ccotLSur.s:71: Error: selected processor does not support `vsdot.s8 q14,q10,d7[0]' in ARM mode
/tmp/ccotLSur.s:74: Error: selected processor does not support `vsdot.s8 q8,q10,d7[1]' in ARM mode
/tmp/ccotLSur.s:77: Error: selected processor does not support `vsdot.s8 q9,q10,d7[1]' in ARM mode
/tmp/ccotLSur.s:80: Error: selected processor does not support `vsdot.s8 q11,q10,d7[1]' in ARM mode
...

Metadata

Labels

TF 2.12 (For issues related to Tensorflow 2.12)
comp:lite (TF Lite related issues)
comp:lite-xnnpack (TensorFlow Lite XNNPack related issues)
stat:awaiting tensorflower (Status - Awaiting response from tensorflower)
type:build/install (Build and install issues)
