failing build of recent TensorFlow easyconfigs on AWS Graviton3 (aarch64/neoverse_v1) #18899

Closed
boegel opened this issue Oct 1, 2023 · 8 comments · Fixed by easybuilders/easybuild-easyblocks#3011

@boegel
Member

boegel commented Oct 1, 2023

TensorFlow-2.13.0-foss-2022b.eb fails with:

  /mnt/shared/home/boegel/easybuild/graviton2/software/GCCcore/12.2.0/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections -MD -MF bazel-out/aarch64-opt/bin/external/XNNPACK/_objs/asm_microkernels/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.pic.d -fPIC '-DBAZEL_CURRENT_REPOSITORY="XNNPACK"' -iquote external/XNNPACK -iquote bazel-out/aarch64-opt/bin/external/XNNPACK -isystem external/XNNPACK/include -isystem bazel-out/aarch64-opt/bin/external/XNNPACK/include -isystem external/XNNPACK/src -isystem bazel-out/aarch64-opt/bin/external/XNNPACK/src -Wno-all -Wno-extra -Wno-deprecated -Wno-deprecated-declarations -Wno-ignored-attributes -Wno-array-bounds -Wunused-result '-Werror=unused-result' -Wswitch '-Werror=switch' '-Wno-error=unused-but-set-variable' -DAUTOLOAD_DYNAMIC_KERNELS -O2 -ftree-vectorize '-mcpu=native' -fno-math-errno -fPIC -fPIC -Iinclude -Isrc '-march=armv8.2-a+fp16+dotprod' -O2 -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c external/XNNPACK/src/qc8-gemm/gen/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S -o bazel-out/aarch64-opt/bin/external/XNNPACK/_objs/asm_microkernels/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.pic.o)
# Configuration: 0d4560b813c94b91f71e59ae9e182e4612b959f02afc2e8b9422054dd2bd321f
# Execution platform: @local_execution_config_platform//:platform
cc1: warning: switch '-mcpu=zeus+crypto+sha3+sm4+nodotprod+noprofile+nopauth' conflicts with '-march=armv8.2-a+fp16+dotprod' switch
external/XNNPACK/src/qc8-gemm/gen/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S: Assembler messages:
external/XNNPACK/src/qc8-gemm/gen/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S:54: Error: selected processor does not support `sdot v28.4s,v16.16b,v0.4b[0]'
external/XNNPACK/src/qc8-gemm/gen/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S:56: Error: selected processor does not support `sdot v29.4s,v17.16b,v0.4b[0]'
external/XNNPACK/src/qc8-gemm/gen/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S:58: Error: selected processor does not support `sdot v30.4s,v18.16b,v0.4b[0]'
external/XNNPACK/src/qc8-gemm/gen/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S:60: Error: selected processor does not support `sdot v31.4s,v19.16b,v0.4b[0]'
external/XNNPACK/src/qc8-gemm/gen/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S:62: Error: selected processor does not support `sdot v28.4s,v4.16b,v0.4b[1]'
external/XNNPACK/src/qc8-gemm/gen/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S:64: Error: selected processor does not support `sdot v29.4s,v5.16b,v0.4b[1]'
external/XNNPACK/src/qc8-gemm/gen/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S:66: Error: selected processor does not support `sdot v30.4s,v6.16b,v0.4b[1]'
external/XNNPACK/src/qc8-gemm/gen/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S:68: Error: selected processor does not support `sdot v31.4s,v7.16b,v0.4b[1]'
external/XNNPACK/src/qc8-gemm/gen/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S:125: Error: selected processor does not support `sdot v28.4s,v16.16b,v0.4b[0]'
external/XNNPACK/src/qc8-gemm/gen/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S:127: Error: selected processor does not support `sdot v29.4s,v17.16b,v0.4b[0]'
external/XNNPACK/src/qc8-gemm/gen/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S:129: Error: selected processor does not support `sdot v30.4s,v18.16b,v0.4b[0]'
external/XNNPACK/src/qc8-gemm/gen/qc8-gemm-1x16c4-minmax-fp32-asm-aarch64-neondot-ld64.S:131: Error: selected processor does not support `sdot v31.4s,v19.16b,v0.4b[0]'
SUBCOMMAND: # @XNNPACK//:asm_microkernels [action 'Compiling src/qc8-gemm/gen/qc8-gemm-4x16-minmax-fp32-asm-aarch64-neon-mlal-lane-cortex-a53.S', configuration: 0d4560b813c94b91f71e59ae9e182e4612b959f02afc2e8b9422054dd2bd321f, execution platform: @local_execution_config_platform//:platform]

Similar problems were observed in the EESSI build environment for:

@boegel boegel added this to the 4.x milestone Oct 1, 2023
@boegel
Member Author

boegel commented Oct 1, 2023

This warning:

cc1: warning: switch '-mcpu=zeus+crypto+sha3+sm4+nodotprod+noprofile+nopauth' conflicts with '-march=armv8.2-a+fp16+dotprod' switch

suggests that the -mcpu=native used by EasyBuild conflicts with the -march=armv8.2-a+fp16+dotprod that the TensorFlow build system somehow injects, in particular the nodotprod vs dotprod part...
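
The conflict can be made explicit by comparing the feature modifiers in the two switches quoted in the warning. Below is a minimal sketch in Python. It assumes the GCC convention that a "no" prefix on a modifier disables that feature, and the helper names are hypothetical (they are not part of EasyBuild or GCC):

```python
def parse_features(switch):
    """Split a GCC -mcpu/-march switch into (enabled, disabled) feature sets."""
    # Drop the '-mcpu=' / '-march=' prefix, then the base CPU/architecture name.
    value = switch.split("=", 1)[1]
    modifiers = value.split("+")[1:]
    # Assumption: a modifier starting with 'no' disables the named feature.
    enabled = {m for m in modifiers if not m.startswith("no")}
    disabled = {m[2:] for m in modifiers if m.startswith("no")}
    return enabled, disabled


def conflicting_features(switch_a, switch_b):
    """Return features that one switch enables and the other disables."""
    on_a, off_a = parse_features(switch_a)
    on_b, off_b = parse_features(switch_b)
    return (on_a & off_b) | (on_b & off_a)


# The two switches exactly as they appear in the cc1 warning.
mcpu = "-mcpu=zeus+crypto+sha3+sm4+nodotprod+noprofile+nopauth"
march = "-march=armv8.2-a+fp16+dotprod"
print(conflicting_features(mcpu, march))  # {'dotprod'}
```

For the switches in this build, the only feature the two disagree on is dotprod, which matches the sdot instructions the assembler rejects.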

@boegel
Member Author

boegel commented Oct 1, 2023

It seems like tensorflow/tensorflow#53449, which is included in TensorFlow v2.14.0, fixes this. I'll give that a try with TensorFlow 2.13.0 and older...

@boegel
Member Author

boegel commented Oct 1, 2023

It looks like tensorflow/tensorflow#53449 is specific to building TensorFlow Lite with CMake, so it doesn't help at all when building TensorFlow with Bazel like we're doing.

@Flamefire Any suggestions here? How can I prevent the -mcpu=native that EasyBuild includes in $CXXFLAGS from being passed down to the XNNPACK component of TensorFlow (while keeping it for everything else)?

@boegel
Member Author

boegel commented Oct 1, 2023

Hmm, I was going to try adding mcpu=native to nocopts in build_defs.bzl in XNNPACK, but it seems that support for nocopts was removed from Bazel a long time ago, even though the docs still mention it 🤦; see bazelbuild/bazel#8706

@boegel
Member Author

boegel commented Oct 2, 2023

Also asked about this upstream at XNNPACK: google/XNNPACK#5566

@boegel
Member Author

boegel commented Oct 2, 2023

A workaround here could be to replace -mcpu=native with -march=native -mtune=native, by configuring EasyBuild with --optarch="march=native -mtune=native" when installing TensorFlow on aarch64/* (except aarch64/generic). I'm not sure that's wise, though; see also https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/compiler-flags-across-architectures-march-mtune-and-mcpu
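
To illustrate what this workaround does to the compiler command line, here is a small Python sketch that rewrites a flags string the same way. The function name is hypothetical, and this is not how EasyBuild implements --optarch; it only shows the intended substitution:

```python
import shlex


def split_mcpu_native(flags):
    """Replace -mcpu=native with -march=native -mtune=native in a flags string."""
    out = []
    # shlex.split also strips the single quotes seen in the Bazel log.
    for flag in shlex.split(flags):
        if flag == "-mcpu=native":
            out.extend(["-march=native", "-mtune=native"])
        else:
            out.append(flag)
    return " ".join(out)


# Flags excerpt taken from the failing compiler command above.
print(split_mcpu_native("-O2 -ftree-vectorize '-mcpu=native' -fno-math-errno"))
# -O2 -ftree-vectorize -march=native -mtune=native -fno-math-errno
```

With the flags split this way, a later -march=... option on the same command line no longer competes with an -mcpu value.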

@boegel
Member Author

boegel commented Oct 4, 2023

The XNNPACK issue is fixed with the changes in easybuilders/easybuild-easyblocks#3011, but there's more trouble, in particular this:

  /mnt/shared/home/boegel/easybuild/neoverse_v1/software/GCCcore/11.3.0/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections '-std=c++0x' -MD -MF bazel-out/aarch64-opt/bin/tensorflow/compiler/xla/service/cpu/_objs/runtime_single_threaded_conv2d/runtime_single_threaded_conv2d.pic.d '-frandom-seed=bazel-out/aarch64-opt/bin/tensorflow/compiler/xla/service/cpu/_objs/runtime_single_threaded_conv2d/runtime_single_threaded_conv2d.pic.o' -fPIC -DEIGEN_MPL2_ONLY '-DEIGEN_MAX_ALIGN_BYTES=64' -DGEMM_KERNEL_H '-DEIGEN_ALTIVEC_USE_CUSTOM_PACK=0' '-DEIGEN_NEON_GEBP_NR=4' -iquote . -iquote bazel-out/aarch64-opt/bin -iquote external/com_google_absl -iquote bazel-out/aarch64-opt/bin/external/com_google_absl -iquote external/eigen_archive -iquote bazel-out/aarch64-opt/bin/external/eigen_archive -isystem external/eigen_archive -isystem bazel-out/aarch64-opt/bin/external/eigen_archive -Wno-all -Wno-extra -Wno-deprecated -Wno-deprecated-declarations -Wno-ignored-attributes -Wno-unknown-warning -Wno-array-parameter -Wno-stringop-overflow -Wno-array-bounds -Wunused-result '-Werror=unused-result' -DAUTOLOAD_DYNAMIC_KERNELS -O2 -ftree-vectorize -fno-math-errno -fPIC -fPIC '-std=c++17' -DEIGEN_AVOID_STL_ARRAY '-mcpu=native' -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c tensorflow/compiler/xla/service/cpu/runtime_single_threaded_conv2d.cc -o bazel-out/aarch64-opt/bin/tensorflow/compiler/xla/service/cpu/_objs/runtime_single_threaded_conv2d/runtime_single_threaded_conv2d.pic.o)
# Configuration: 86547f283a1d059053877fee5e6069b279735e8af740087f6f2b9dde52ab09ac
# Execution platform: @local_execution_config_platform//:platform
/tmp/eb-z0316qi3/ccd9Mksr.s: Assembler messages:
/tmp/eb-z0316qi3/ccd9Mksr.s:254: Error: register number out of range 0 to 15 at operand 3 -- `fmla v20.8h,v23.8h,v22.h[0]'
/tmp/eb-z0316qi3/ccd9Mksr.s:258: Error: register number out of range 0 to 15 at operand 3 -- `fmla v17.8h,v23.8h,v22.h[1]'

It seems like this requires an update of Eigen, or a backport of https://gitlab.com/libeigen/eigen/-/merge_requests/1104
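
As the assembler message indicates, the indexed (by-element) operand of a half-precision fmla must be one of v0 to v15, so v22.h[0] cannot be encoded. Here is a minimal Python sketch that flags such operands in an instruction string; the regular expression and function name are illustrative assumptions, not part of any toolchain:

```python
import re

# Matches by-element operands with half-precision lanes, e.g. "v22.h[0]".
BY_ELEMENT_H = re.compile(r"\bv(\d+)\.h\[\d+\]")


def out_of_range_operands(instruction):
    """Return register numbers above 15 used as .h[index] operands."""
    return [int(n) for n in BY_ELEMENT_H.findall(instruction) if int(n) > 15]


# The two instructions quoted in the assembler errors above.
print(out_of_range_operands("fmla v20.8h,v23.8h,v22.h[0]"))  # [22]
print(out_of_range_operands("fmla v17.8h,v23.8h,v22.h[1]"))  # [22]
```

An instruction whose indexed operand is in the low half of the register file, such as v7.h[0], yields an empty list.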

@boegel boegel changed the title from "failing installation of recent TensorFlow easyconfigs on AWS Graviton3 (aarch64/neoverse_v1)" to "failing build of recent TensorFlow easyconfigs on AWS Graviton3 (aarch64/neoverse_v1)" Oct 11, 2023
@Flamefire
Contributor

Yes, it seems a patch to (the downloaded) Eigen will solve this. See e.g. TensorFlow-2.11.0_fix-eigen-gemm-on-PPC.patch for how to do that. Basically:

  • find the Eigen commit used by TF by looking at the tf_http_archive( name = "eigen_archive", call in TF (it might even be in a variable EIGEN_COMMIT)
  • check out that Eigen version
  • apply/backport the fix
  • save the diff as a patch in tensorflow/third_party/eigen3
  • add the patch to the patch_file list argument of tf_http_archive
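
The first step above can be sketched in Python as follows. This assumes the commit is assigned to a variable named EIGEN_COMMIT, as suggested above; the function name and the sample text are hypothetical and only modeled on that description, not copied from the TensorFlow sources:

```python
import re


def find_eigen_commit(bzl_text):
    """Return the hash assigned to EIGEN_COMMIT in Starlark source, or None."""
    match = re.search(
        r'^\s*EIGEN_COMMIT\s*=\s*"([0-9a-f]{7,40})"', bzl_text, re.MULTILINE
    )
    return match.group(1) if match else None


# Hypothetical excerpt of a workspace file defining the Eigen archive.
sample = '''
def repo():
    EIGEN_COMMIT = "b0f877f8e01e90a5b0f3a79d46ea234899f8b499"
    tf_http_archive(
        name = "eigen_archive",
    )
'''
print(find_eigen_commit(sample))
```

If the variable is absent, the function returns None, and the commit has to be read from the tf_http_archive arguments by hand.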

I just checked TF 2.13.0: it uses Eigen b0f877f8e01e90a5b0f3a79d46ea234899f8b499, in which the code fixed by your linked MR is no longer present, i.e. it already contains that MR.

So sorry, I have no idea where/why that is an issue with TF 2.13.0
