gfortran 11 ARM-darwin (Apple M1) build failure #3222

Closed
fxcoudert opened this issue May 6, 2021 · 21 comments · Fixed by #3223

Comments

@fxcoudert

We are seeing an openblas build failure (in Homebrew: Homebrew/homebrew-core#74843 (comment)) with the latest gcc/gfortran 11.1 port to Apple M1 (ARM-darwin). The failure is:

2021-05-04T22:06:45.3605470Z ranlib ../../../libopenblasp-r0.3.15.a
2021-05-04T22:06:45.3606170Z make[1]: warning: -jN forced in submake: disabling jobserver mode.
2021-05-04T22:06:45.3606690Z perl ./gensymbol osx arm64 _ 0 0  0 0 0 0 "" "" 1 0 1 1 1 1 > osx.def
2021-05-04T22:06:45.3611100Z gfortran -O2 -Wall -frecursive -fno-optimize-sibling-calls -fopenmp -fPIC -march=armv8.3-a  -all_load -headerpad_max_install_names -install_name "/private/tmp/openblas-20210504-81004-e68xc6/OpenBLAS-0.3.15/exports/../libopenblas.0.dylib" -dynamiclib -o ../libopenblasp-r0.3.15.dylib ../libopenblasp-r0.3.15.a -Wl,-exported_symbols_list,osx.def  -L/opt/homebrew/Cellar/gcc/11.1.0/lib/gcc/11/gcc/aarch64-apple-darwin20/11.1.0 -L/opt/homebrew/Cellar/gcc/11.1.0/lib/gcc/11/gcc/aarch64-apple-darwin20/11.1.0/../../.. -Wl,-rpath,,loader_path -Wl,-rpath,/opt/homebrew/Cellar/gcc/11.1.0/lib/gcc/11/gcc/aarch64-apple-darwin20/11.1.0 -Wl,-rpath,/opt/homebrew/Cellar/gcc/11.1.0/lib/gcc/11/gcc/aarch64-apple-darwin20/11.1.0/../../.. -L/opt/homebrew/Cellar/gcc/11.1.0/lib/gcc/11/gcc/aarch64-apple-darwin20/11.1.0 -L/opt/homebrew/Cellar/gcc/11.1.0/lib/gcc/11/gcc/aarch64-apple-darwin20/11.1.0/../../.. -Wl,-rpath,,loader_path -Wl,-rpath,/opt/homebrew/Cellar/gcc/11.1.0/lib/gcc/11/gcc/aarch64-apple-darwin20/11.1.0 -Wl,-rpath,/opt/homebrew/Cellar/gcc/11.1.0/lib/gcc/11/gcc/aarch64-apple-darwin20/11.1.0/../../..  -lgfortran -lgomp -lm -lSystem -lgfortran -lgomp -lm -lSystem  
2021-05-04T22:06:45.3615250Z ld: file not found: loader_path
2021-05-04T22:06:45.3615640Z collect2: error: ld returned 1 exit status
2021-05-04T22:06:45.3616290Z make[1]: *** [libopenblasp-r0.3.15.dylib] Error 1
2021-05-04T22:06:45.3616750Z make: *** [shared] Error 2

The compiler appears otherwise functional, so I think it's specific to OpenBLAS. The problem is the -Wl,-rpath,,loader_path argument that is passed (twice), and I do not understand where it comes from.

The compiler driver itself can, in some circumstances, pass -Wl,-rpath,@loader_path which is a valid option. But somehow it seems that this @ gets mangled into a , making the whole thing invalid.

I haven't managed, however, to find where in the OpenBLAS build machinery this could be happening. Can someone point me in the right direction?

@fxcoudert
Author

The only place I can see is this:
https://github.com/xianyi/OpenBLAS/blob/5f998efd7b4edbeef7d68f282e5581145640eff4/f_check#L273

which is using @ as a special character for its weird parsing of arguments. I'm not sure how or why it gets triggered (are we passing a -rpath with no -Wl in the first place?), but in our case it will definitely screw up the existing @.

@fxcoudert
Author

If that's the case, a possible patch is:

diff --git a/f_check b/f_check
index 2c0d7fcb..20990e05 100644
--- a/f_check
+++ b/f_check
@@ -314,11 +314,11 @@ if ($link ne "") {
 
     $link =~ s/\-Y\sP\,/\-Y/g;
     
-    $link =~ s/\-R\s*/\-rpath\@/g;
+    $link =~ s/\-R\s*/\-rpath#/g;
 
-    $link =~ s/\-rpath\s+/\-rpath\@/g;
+    $link =~ s/\-rpath\s+/\-rpath#/g;
 
-    $link =~ s/\-rpath-link\s+/\-rpath-link\@/g;
+    $link =~ s/\-rpath-link\s+/\-rpath-link#/g;
 
     @flags = split(/[\s\,\n]/, $link);
     # remove leading and trailing quotes from each flag.
@@ -344,13 +344,13 @@ if ($link ne "") {
        }
 
 
-       if ($flags =~ /^\-rpath\@/) {
-           $flags =~ s/\@/\,/g;
+       if ($flags =~ /^\-rpath#/) {
+           $flags =~ s/#/\,/g;
            $linker_L .= "-Wl,". $flags . " " ;
        }
 
-       if ($flags =~ /^\-rpath-link\@/) {
-           $flags =~ s/\@/\,/g;
+       if ($flags =~ /^\-rpath-link#/) {
+           $flags =~ s/#/\,/g;
            $linker_L .= "-Wl,". $flags . " " ;
        }
        if ($flags =~ /-lgomp/ && $ENV{"CC"} =~ /clang/) {

unless # needs escaping into \# in Perl regexps?
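
In Perl, `#` starts a comment inside a regex only under the `/x` modifier, so it should not need escaping in the substitutions above. The effect of swapping the placeholder can be checked with a small shell sketch. This is a simplified re-creation, not the real f_check code: `rewrite_rpath` is a hypothetical helper that models only the `-rpath` handling, and the sample link line is made up.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of OpenBLAS): models only the -rpath
# handling, with the placeholder character passed as the first argument.
rewrite_rpath() {
  local ph=$1 link=$2 flag out=""
  link=${link//-rpath /-rpath$ph}          # glue flag and argument together
  for flag in ${link//,/ }; do             # split on whitespace and commas
    if [[ $flag == -rpath"$ph"* ]]; then
      # turn every placeholder back into "," as the Perl s///g does
      flag=$(printf '%s' "$flag" | tr "$ph" ',')
      out+="-Wl,$flag "
    fi
  done
  printf '%s\n' "${out% }"
}

rewrite_rpath '@' "-rpath @loader_path"   # old placeholder
rewrite_rpath '#' "-rpath @loader_path"   # placeholder from the patch
```

With `@` as the placeholder the sketch prints `-Wl,-rpath,,loader_path`, the invalid argument from the build log; with `#` it prints `-Wl,-rpath,@loader_path`.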

@martin-frbg
Collaborator

martin-frbg commented May 6, 2021

Probably sufficient to remove (or enclose in an if ($vendor == OPEN64)) the lines at https://github.com/xianyi/OpenBLAS/blob/f497bb949bf262d3a97d33958f43945384428043/f_check#L347-L355
(which git blame tells me were put in 8 years ago to fix some problem with Open64).
I do not recall encountering this with the M1 in the gcc compile farm. I am not sure right now which compiler version that machine has or had installed (and it appears to be down today).

@carlocab

carlocab commented May 6, 2021

Thanks, @martin-frbg. I've tried applying the patch you suggest (i.e. just removing those two if blocks). Do you know why we're hitting this now but not previously?

@martin-frbg
Collaborator

No idea, actually. Maybe something changed in the default arguments used by gfortran in 11.1? What f_check does here is parse the verbose output from building a small test program, so something in the behaviour of the compiler seems to have changed.
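
To see what f_check would be parsing, one can look at the `-rpath` arguments in the verbose link line directly. The sketch below assumes the driver prints the collect2/ld command line on stderr when invoked with `-v`; the helper name `extract_rpath_args`, the file name `ftest.f90` and the sample link line are placeholders for illustration.

```bash
#!/usr/bin/env bash
# Print every "-rpath <dir>" pair found in the text on stdin.
extract_rpath_args() {
  grep -oE -e '-rpath[[:space:]]+[^[:space:]]+' || true
}

# On a machine with gfortran (ftest.f90 is any small Fortran program):
#   gfortran -v ftest.f90 2>&1 | extract_rpath_args
# Self-contained demonstration on a made-up link line:
printf '%s\n' 'collect2 -dynamic -rpath @loader_path -rpath /usr/local/lib -lgfortran' \
  | extract_rpath_args
```

If `-rpath @loader_path` shows up here, the compiler driver is adding it on its own, which matches the change in GCC 11.1 described in this thread.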

@fxcoudert
Author

fxcoudert commented May 6, 2021

@martin-frbg Yes, in the latest version of the GCC M1 port, the driver will pass -Wl,-rpath,@loader_path in some cases (I don't think it did that previously).

If you remove the two if blocks, I think you probably want to rework the code above (https://github.com/xianyi/OpenBLAS/blob/f497bb949bf262d3a97d33958f43945384428043/f_check#L317 and later lines) not to introduce @ in the first place

@martin-frbg
Collaborator

I was not sure if the @ was used as a placeholder, or if it had a specific meaning in perl regex... something else will probably be needed as a substitute (unless enclosing all the naughty code in "if not open64" is sufficient as a workaround)

@martin-frbg
Collaborator

The gccfarm machine (still) has gcc-10 from homebrew (10.2.1 alias Homebrew GCC 10.2.0_4)

@fxcoudert
Author

@martin-frbg yes, because we can't ship GCC 11.1.0 in Homebrew until we fix this openblas failure :)

@martin-frbg
Collaborator

martin-frbg commented May 7, 2021

Heh. So now I have the added superpower of halting compiler deployment? Hopefully #3223 fixes it without new side effects. (I realize now that it is basically the same as your proposal, just using a different character for the replacement. I was too busy with completely unrelated work yesterday.)

@sidgupta234

Has this been resolved? Seems to be getting similar error.

perl ./gensymbol osx arm64 _ 0 0  0 0 0 0 "" "" 1 0 1 1 1 1 > osx.def
gfortran -O2 -Wall -frecursive -fno-optimize-sibling-calls -fPIC -march=armv8-a  -all_load -headerpad_max_install_names -install_name "/Users/siddharthg/Github-dev/kaldi/tools/OpenBLAS/exports/../libopenblas.0.dylib" -dynamiclib -o ../libopenblas_armv8-r0.3.13.dylib ../libopenblas_armv8-r0.3.13.a -Wl,-exported_symbols_list,osx.def  -L/opt/homebrew/Cellar/gcc/11.3.0_2/bin/../lib/gcc/11/gcc/aarch64-apple-darwin21/11 -L/opt/homebrew/Cellar/gcc/11.3.0_2/bin/../lib/gcc/11/gcc -L/opt/homebrew/Cellar/gcc/11.3.0_2/bin/../lib/gcc/11/gcc/aarch64-apple-darwin21/11/../../.. -Wl,-rpath,,loader_path -Wl,-rpath,/opt/homebrew/Cellar/gcc/11.3.0_2/lib/gcc/11/gcc/aarch64-apple-darwin21/11 -Wl,-rpath,/opt/homebrew/Cellar/gcc/11.3.0_2/lib/gcc/11/gcc -Wl,-rpath,/opt/homebrew/Cellar/gcc/11.3.0_2/lib/gcc/11  -lgfortran -lemutls_w -lquadmath -lemutls_w -lSystem
ld: file not found: loader_path
collect2: error: ld returned 1 exit status
make[1]: *** [libopenblas_armv8-r0.3.13.dylib] Error 1
make: *** [shared] Error 2

@martin-frbg
Collaborator

Has this been resolved? Seems to be getting similar error.
make[1]: *** [libopenblas_armv8-r0.3.13.dylib] Error 1

Looks like you are trying to build an outdated version that is even older than the one the problem was originally seen with. Fix went into 0.3.16, current version is 0.3.20
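
A build script could guard against this by refusing versions older than the fix. The following is a minimal sketch, not something taken from OpenBLAS or Kaldi: `version_ge` is a hypothetical helper that compares dotted numeric versions with up to three components, and `OPENBLAS_VERSION` is assumed to hold the version being built.

```bash
#!/usr/bin/env bash
# Return success if dotted version $1 is greater than or equal to $2.
version_ge() {
  local IFS=.
  local -a a=($1) b=($2)    # split "0.3.16" into (0 3 16)
  local i
  for i in 0 1 2; do
    # 10# forces base 10 so a component like "08" is not read as octal
    if (( 10#${a[i]:-0} > 10#${b[i]:-0} )); then return 0; fi
    if (( 10#${a[i]:-0} < 10#${b[i]:-0} )); then return 1; fi
  done
  return 0
}

OPENBLAS_VERSION=${OPENBLAS_VERSION:-0.3.13}   # example value
if version_ge "$OPENBLAS_VERSION" 0.3.16; then
  echo "OpenBLAS $OPENBLAS_VERSION includes the f_check fix"
else
  echo "OpenBLAS $OPENBLAS_VERSION predates the fix (0.3.16)"
fi
```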

@sidgupta234

sidgupta234 commented Jun 28, 2022

Fix went into 0.3.16, current version is 0.3.20

I changed OPENBLAS_VERSION=0.3.13 to OPENBLAS_VERSION=0.3.20 in extras/install_openblas.sh and now get the following error:

Undefined symbols for architecture arm64:
  "___chkstk_darwin", referenced from:
      _sgemv_ in libopenblas_neoversen2-r0.3.20.a(sgemv.o)
      _sger_ in libopenblas_neoversen2-r0.3.20.a(sger.o)
      _cblas_sgemv in libopenblas_neoversen2-r0.3.20.a(cblas_sgemv.o)
      _cblas_sger in libopenblas_neoversen2-r0.3.20.a(cblas_sger.o)
      _dgemv_ in libopenblas_neoversen2-r0.3.20.a(dgemv.o)
      _dger_ in libopenblas_neoversen2-r0.3.20.a(dger.o)
      _cblas_dgemv in libopenblas_neoversen2-r0.3.20.a(cblas_dgemv.o)
      ...
ld: symbol(s) not found for architecture arm64
collect2: error: ld returned 1 exit status
make[1]: *** [libopenblas_neoversen2-r0.3.20.dylib] Error 1
make: *** [shared] Error 2

@martin-frbg
Collaborator

Try setting MACOSX_DEPLOYMENT_TARGET=11.0; this is probably a bug in Xcode.

@sidgupta234

Still getting the same error. The install script now looks like this:

#!/usr/bin/env bash

OPENBLAS_VERSION=0.3.20
MACOSX_DEPLOYMENT_TARGET=11.0
WGET=${WGET:-wget}

set -e

if ! command -v gfortran 2>/dev/null; then
  echo "$0: gfortran is not installed.  Please install it, e.g. by:"
  echo " apt-get install gfortran"
  echo "(if on Debian or Ubuntu), or:"
  echo " yum install gcc-gfortran"
  echo "(if on RedHat/CentOS).  On a Mac, if brew is installed, it's:"
  echo " brew install gfortran"
  exit 1
fi


tarball=OpenBLAS-$OPENBLAS_VERSION.tar.gz

rm -rf xianyi-OpenBLAS-* OpenBLAS OpenBLAS-*.tar.gz

if [ -d "$DOWNLOAD_DIR" ]; then
  cp -p "$DOWNLOAD_DIR/$tarball" .
else
  url=$($WGET -qO- "https://api.github.com/repos/xianyi/OpenBLAS/releases/tags/v${OPENBLAS_VERSION}" | python -c 'import sys,json;print(json.load(sys.stdin)["tarball_url"])')
  test -n "$url"
  $WGET -t3 -nv -O $tarball "$url"
fi

tar xzf $tarball
mv xianyi-OpenBLAS-* OpenBLAS

make PREFIX=$(pwd)/OpenBLAS/install USE_LOCKING=1 USE_THREAD=0 -C OpenBLAS all install
if [ $? -eq 0 ]; then
   echo "OpenBLAS is installed successfully."
   rm $tarball
fi

@martin-frbg
Collaborator

Strange. Maybe export MACOSX_DEPLOYMENT_TARGET, or try putting it on the command line before the make (as MACOSX_DEPLOYMENT_TARGET=11.0 make PREFIX=...). This problem has come up before, and so far that solution worked. I am less sure now whether it is a bug or a feature, seeing that we have a default deployment target in Makefile.system that has already been updated a few times in the past.
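
The difference between these variants comes down to how the shell passes variables to child processes: a plain assignment inside a script stays local to that shell unless the variable is exported. A minimal sketch, using `sh -c` as a stand-in for `make` (an assumption made so the example runs anywhere):

```bash
#!/usr/bin/env bash
# Command run in a child process: report what the child sees.
probe='echo "${MACOSX_DEPLOYMENT_TARGET:-unset}"'

unset MACOSX_DEPLOYMENT_TARGET        # start from a clean state
MACOSX_DEPLOYMENT_TARGET=11.0         # plain assignment, as in the script
plain=$(sh -c "$probe")

export MACOSX_DEPLOYMENT_TARGET       # now part of the environment
exported=$(sh -c "$probe")

unset MACOSX_DEPLOYMENT_TARGET
inline=$(MACOSX_DEPLOYMENT_TARGET=11.0 sh -c "$probe")   # prefix form

printf 'plain=%s exported=%s inline=%s\n' "$plain" "$exported" "$inline"
```

This prints `plain=unset exported=11.0 inline=11.0`. In the install script above, the assignment near the top is the plain form, so `make` would not see it unless the variable was already exported in the calling environment.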

@sidgupta234

Thanks a lot! I am on version 12.4 so setting MACOSX_DEPLOYMENT_TARGET=12.4 worked!

@martin-frbg
Collaborator

Confusing that 11.0 was not sufficient. I was just about to make that the new default, but if it needs to match whatever latest version one has installed now there seems to be little point.

@sidgupta234

In the installation script, can't we take the version from the client system using sw_vers -productVersion and then pass it to MACOSX_DEPLOYMENT_TARGET?
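
A minimal sketch of that idea follows. Whether the deployment target really has to match the installed macOS version is not settled in this thread, so this is one possible workaround rather than a recommended setting. `detect_deployment_target` is a hypothetical helper, and the script leaves the variable untouched on systems without `sw_vers`.

```bash
#!/usr/bin/env bash
# Print the macOS product version, or fail if sw_vers is unavailable.
detect_deployment_target() {
  command -v sw_vers >/dev/null 2>&1 || return 1
  sw_vers -productVersion
}

if target=$(detect_deployment_target); then
  # export so that a later make invocation inherits the value
  export MACOSX_DEPLOYMENT_TARGET="$target"
  echo "Using MACOSX_DEPLOYMENT_TARGET=$MACOSX_DEPLOYMENT_TARGET"
else
  echo "sw_vers not found; leaving MACOSX_DEPLOYMENT_TARGET unchanged"
fi
```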

@martin-frbg
Collaborator

You're telling me - I'm not a Mac user so no idea how the version mismatch comes about (or whether sw_vers and deployment target always have to be the same)

@sidgupta234

whether sw_vers and deployment target always have to be the same

Ah, yeah, that's the confusion! I have been on Linux throughout and have only now started with Mac. I hope someone from the macOS community can shed some light on this.
