failing build of recent TensorFlow easyconfigs on AWS Graviton3 (`aarch64/neoverse_v1`) #18899

boegel · 2023-10-01T15:28:10Z

TensorFlow-2.13.0-foss-2022b.eb fails with:

  /mnt/shared/home/boegel/easybuild/graviton2/software/GCCcore/12.2.0/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections -MD -MF bazel-out/aarch64-opt/bin/external/XNNPACK/_objs/asm_microkernels/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.pic.d -fPIC '-DBAZEL_CURRENT_REPOSITORY="XNNPACK"' -iquote external/XNNPACK -iquote bazel-out/aarch64-opt/bin/external/XNNPACK -isystem external/XNNPACK/include -isystem bazel-out/aarch64-opt/bin/external/XNNPACK/include -isystem external/XNNPACK/src -isystem bazel-out/aarch64-opt/bin/external/XNNPACK/src -Wno-all -Wno-extra -Wno-deprecated -Wno-deprecated-declarations -Wno-ignored-attributes -Wno-array-bounds -Wunused-result '-Werror=unused-result' -Wswitch '-Werror=switch' '-Wno-error=unused-but-set-variable' -DAUTOLOAD_DYNAMIC_KERNELS -O2 -ftree-vectorize '-mcpu=native' -fno-math-errno -fPIC -fPIC -Iinclude -Isrc '-march=armv8.2-a+fp16+dotprod' -O2 -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c external/XNNPACK/src/qc8-gemm/gen/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S -o bazel-out/aarch64-opt/bin/external/XNNPACK/_objs/asm_microkernels/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.pic.o)
# Configuration: 0d4560b813c94b91f71e59ae9e182e4612b959f02afc2e8b9422054dd2bd321f
# Execution platform: @local_execution_config_platform//:platform
cc1: warning: switch '-mcpu=zeus+crypto+sha3+sm4+nodotprod+noprofile+nopauth' conflicts with '-march=armv8.2-a+fp16+dotprod' switch
external/XNNPACK/src/qc8-gemm/gen/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S: Assembler messages:
external/XNNPACK/src/qc8-gemm/gen/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S:54: Error: selected processor does not support `sdot v28.4s,v16.16b,v0.4b[0]'
external/XNNPACK/src/qc8-gemm/gen/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S:56: Error: selected processor does not support `sdot v29.4s,v17.16b,v0.4b[0]'
external/XNNPACK/src/qc8-gemm/gen/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S:58: Error: selected processor does not support `sdot v30.4s,v18.16b,v0.4b[0]'
external/XNNPACK/src/qc8-gemm/gen/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S:60: Error: selected processor does not support `sdot v31.4s,v19.16b,v0.4b[0]'
external/XNNPACK/src/qc8-gemm/gen/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S:62: Error: selected processor does not support `sdot v28.4s,v4.16b,v0.4b[1]'
external/XNNPACK/src/qc8-gemm/gen/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S:64: Error: selected processor does not support `sdot v29.4s,v5.16b,v0.4b[1]'
external/XNNPACK/src/qc8-gemm/gen/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S:66: Error: selected processor does not support `sdot v30.4s,v6.16b,v0.4b[1]'
external/XNNPACK/src/qc8-gemm/gen/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S:68: Error: selected processor does not support `sdot v31.4s,v7.16b,v0.4b[1]'
external/XNNPACK/src/qc8-gemm/gen/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S:125: Error: selected processor does not support `sdot v28.4s,v16.16b,v0.4b[0]'
external/XNNPACK/src/qc8-gemm/gen/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S:127: Error: selected processor does not support `sdot v29.4s,v17.16b,v0.4b[0]'
external/XNNPACK/src/qc8-gemm/gen/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S:129: Error: selected processor does not support `sdot v30.4s,v18.16b,v0.4b[0]'
external/XNNPACK/src/qc8-gemm/gen/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S:131: Error: selected processor does not support `sdot v31.4s,v19.16b,v0.4b[0]'
SUBCOMMAND: # @XNNPACK//:asm_microkernels [action 'Compiling src/qc8-gemm/gen/qc8-gemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-cortex-a53.S', configuration: 0d4560b813c94b91f71e59ae9e182e4612b959f02afc2e8b9422054dd2bd321f, execution platform: @local_execution_config_platform//:platform]

Similar problems were observed in EESSI build environment for:

TensorFlow-2.13.0-foss-2022b.eb (see {2023.06}[foss/2022b] TensorFlow 2.13.0 EESSI/software-layer#347)
TensorFlow-2.11.0-foss-2022a.eb (see {2023.06}[foss/2022a] TensorFlow 2.11.0 EESSI/software-layer#346)
TensorFlow-2.8.4-foss-2021b.eb (see {2023.06}[foss/2021b] TensorFlow 2.8.4 EESSI/software-layer#343)
TensorFlow-2.7.1-foss-2021b.eb (see {2023.06}[foss/2021b] TensorFlow v2.7.1 EESSI/software-layer#321)


boegel · 2023-10-01T15:29:54Z

This warning:

cc1: warning: switch '-mcpu=zeus+crypto+sha3+sm4+nodotprod+noprofile+nopauth' conflicts with '-march=armv8.2-a+fp16+dotprod' switch

suggests that the -mcpu=native used by EasyBuild conflicts with the -march=armv8.2-a+fp16+dotprod that the TensorFlow build system somehow injects, in particular the nodotprod vs dotprod part...
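To make the conflict explicit, here is a minimal Python sketch that parses the cc1 warning quoted above and lists the features that one switch enables while the other disables. The helper name `conflicting_features` is hypothetical and only meant for illustration; it is not part of EasyBuild, TensorFlow or GCC.

```python
import re

def conflicting_features(warning: str) -> dict:
    """Return features that one switch enables and the other disables."""
    # Pull the two quoted switches ('-mcpu=...' and '-march=...') out of the warning.
    switches = dict(re.findall(r"'-(mcpu|march)=([^']+)'", warning))
    # Drop the base CPU/architecture name; keep only the +feature modifiers.
    mcpu = set(switches["mcpu"].split("+")[1:])
    march = set(switches["march"].split("+")[1:])
    conflicts = {}
    for feat in march:
        if "no" + feat in mcpu:
            conflicts[feat] = "enabled by -march, disabled by -mcpu"
    for feat in mcpu:
        if "no" + feat in march:
            conflicts[feat] = "enabled by -mcpu, disabled by -march"
    return conflicts

warning = (
    "cc1: warning: switch '-mcpu=zeus+crypto+sha3+sm4+nodotprod+noprofile+nopauth' "
    "conflicts with '-march=armv8.2-a+fp16+dotprod' switch"
)
print(conflicting_features(warning))
```

For the warning above, the only feature reported is dotprod, which matches the `sdot` instructions the assembler rejects.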

boegel · 2023-10-01T16:11:51Z

Seems like tensorflow/tensorflow#53449 fixes this. It is included in TensorFlow v2.14.0, so I'll give that fix a try with TensorFlow 2.13.0 and older...

boegel · 2023-10-01T18:22:03Z

It looks like tensorflow/tensorflow#53449 is specific to building TensorFlow Lite with CMake, so it doesn't help at all when building TensorFlow with Bazel like we're doing.

@Flamefire Any suggestions here? How can I prevent the -mcpu=native that EasyBuild includes in $CXXFLAGS from being passed down to the XNNPACK component of TensorFlow (while keeping it for everything else)?

boegel · 2023-10-01T19:22:45Z

Hmm, I was going to try adding -mcpu=native to nocopts in build_defs.bzl in XNNPACK, but it seems that support for nocopts was removed from Bazel a long time ago even though the docs still mention it 🤦, see bazelbuild/bazel#8706

boegel · 2023-10-02T06:41:52Z

Also asked about this upstream at XNNPACK: google/XNNPACK#5566

boegel · 2023-10-02T14:58:13Z

A workaround here could be to replace the use of -mcpu=native with -march=native -mtune=native by configuring EasyBuild with --optarch="march=native -mtune=native" when installing TensorFlow on aarch64/* (except aarch64/generic), but I'm not sure that's wise, see also https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/compiler-flags-across-architectures-march-mtune-and-mcpu
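As an illustration of what this workaround amounts to at the level of compiler flags, here is a small Python sketch that rewrites a flag string the same way. The function name `replace_mcpu_native` is hypothetical; EasyBuild applies `--optarch` through its own toolchain logic, not through code like this.

```python
import shlex

def replace_mcpu_native(flags: str) -> str:
    """Swap -mcpu=native for the -march=native -mtune=native pair."""
    out = []
    for tok in shlex.split(flags):
        if tok == "-mcpu=native":
            # Same two flags that --optarch="march=native -mtune=native" is meant to produce.
            out.extend(["-march=native", "-mtune=native"])
        else:
            out.append(tok)
    return shlex.join(out)

print(replace_mcpu_native("-O2 -ftree-vectorize -mcpu=native -fno-math-errno"))
```

All other flags are passed through unchanged and in their original order.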

boegel · 2023-10-04T06:52:07Z

The XNNPACK issue is fixed with the changes in easybuilders/easybuild-easyblocks#3011, but there's more trouble, in particular this:

  /mnt/shared/home/boegel/easybuild/neoverse_v1/software/GCCcore/11.3.0/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections '-std=c++0x' -MD -MF bazel-out/aarch64-opt/bin/tensorflow/compiler/xla/service/cpu/_objs/runtime_single_threaded_conv2d/runtime_single_threaded_conv2d.pic.d '-frandom-seed=bazel-out/aarch64-opt/bin/tensorflow/compiler/xla/service/cpu/_objs/runtime_single_threaded_conv2d/runtime_single_threaded_conv2d.pic.o' -fPIC -DEIGEN_MPL2_ONLY '-DEIGEN_MAX_ALIGN_BYTES=64' -DGEMM_KERNEL_H '-DEIGEN_ALTIVEC_USE_CUSTOM_PACK=0' '-DEIGEN_NEON_GEBP_NR=4' -iquote . -iquote bazel-out/aarch64-opt/bin -iquote external/com_google_absl -iquote bazel-out/aarch64-opt/bin/external/com_google_absl -iquote external/eigen_archive -iquote bazel-out/aarch64-opt/bin/external/eigen_archive -isystem external/eigen_archive -isystem bazel-out/aarch64-opt/bin/external/eigen_archive -Wno-all -Wno-extra -Wno-deprecated -Wno-deprecated-declarations -Wno-ignored-attributes -Wno-unknown-warning -Wno-array-parameter -Wno-stringop-overflow -Wno-array-bounds -Wunused-result '-Werror=unused-result' -DAUTOLOAD_DYNAMIC_KERNELS -O2 -ftree-vectorize -fno-math-errno -fPIC -fPIC '-std=c++17' -DEIGEN_AVOID_STL_ARRAY '-mcpu=native' -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c tensorflow/compiler/xla/service/cpu/runtime_single_threaded_conv2d.cc -o bazel-out/aarch64-opt/bin/tensorflow/compiler/xla/service/cpu/_objs/runtime_single_threaded_conv2d/runtime_single_threaded_conv2d.pic.o)
# Configuration: 86547f283a1d059053877fee5e6069b279735e8af740087f6f2b9dde52ab09ac
# Execution platform: @local_execution_config_platform//:platform
/tmp/eb-z0316qi3/ccd9Mksr.s: Assembler messages:
/tmp/eb-z0316qi3/ccd9Mksr.s:254: Error: register number out of range 0 to 15 at operand 3 -- `fmla v20.8h,v23.8h,v22.h[0]'
/tmp/eb-z0316qi3/ccd9Mksr.s:258: Error: register number out of range 0 to 15 at operand 3 -- `fmla v17.8h,v23.8h,v22.h[1]'

It seems like this requires an update of Eigen, or a backport of https://gitlab.com/libeigen/eigen/-/merge_requests/1104

Flamefire · 2023-10-24T08:35:35Z

Yes, it seems a patch to (the downloaded) Eigen will solve this. See e.g. TensorFlow-2.11.0_fix-eigen-gemm-on-PPC.patch for how to do that. Basically:

find the Eigen commit used by TF by looking at the tf_http_archive( name = "eigen_archive", call in TF (it might even be in a variable EIGEN_COMMIT)
check out that Eigen version
apply/backport the fix
save the diff as a patch in tensorflow/third_party/eigen3
add the patch to the patch_file list argument of tf_http_archive
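The first step can be sketched in Python as below, assuming the commit is stored in an `EIGEN_COMMIT` variable as a full 40-character hash. The helper name `find_eigen_commit` and the `sample` text are hypothetical; the sample is not a copy of the real TensorFlow file, and in practice you would read the relevant .bzl file from the TensorFlow source tree instead.

```python
import re

def find_eigen_commit(bzl_text: str):
    """Return the 40-character Eigen commit hash, or None if not found."""
    match = re.search(r'EIGEN_COMMIT\s*=\s*"([0-9a-f]{40})"', bzl_text)
    return match.group(1) if match else None

# Hypothetical excerpt, for illustration only.
sample = '''
def repo():
    EIGEN_COMMIT = "b0f877f8e01e90a5b0f3a79d46ea234899f8b499"
    tf_http_archive(
        name = "eigen_archive",
    )
'''
print(find_eigen_commit(sample))
```

If the function returns None, the commit is probably embedded directly in the archive URL rather than in a variable, and has to be looked up by hand.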

I just checked TF 2.13.0 and it uses Eigen b0f877f8e01e90a5b0f3a79d46ea234899f8b499 where that code fixed by your linked MR is not present anymore, i.e. it already contains that MR.

So sorry, I have no idea where/why that is an issue with TF 2.13.0

boegel added the problem report label Oct 1, 2023

boegel added this to the 4.x milestone Oct 1, 2023

boegel mentioned this issue Oct 1, 2023

Fix XNNPACK build failure with when -mcpu compiler switch is set tensorflow/tensorflow#53449

Merged

boegel mentioned this issue Oct 2, 2023

filtering out -mcpu=native when building with Bazel on Arm 64-bit (aarch64) google/XNNPACK#5566

Closed

This was referenced Oct 2, 2023

{2023.06}[foss/2021b] TensorFlow 2.8.4 EESSI/software-layer#343

Closed

{2023.06}[foss/2022a] TensorFlow 2.11.0 EESSI/software-layer#346

Closed

boegel mentioned this issue Oct 3, 2023

enhance TensorFlow easyblock to avoid use of -mcpu=native for XNNPACK component when building on aarch64 easybuilders/easybuild-easyblocks#3011

Merged

boegel changed the title ~~failing installation of recent TensorFlow easyconfigs on AWS Graviton3 (aarch64/neoverse_v1)~~ failing build of recent TensorFlow easyconfigs on AWS Graviton3 (aarch64/neoverse_v1) Oct 11, 2023

akesandgren closed this as completed in easybuilders/easybuild-easyblocks#3011 Oct 25, 2023
