toolchain: Error linking ELPA with MKL #376

Closed
roastduck opened this issue May 22, 2019 · 7 comments

@roastduck

I'm using CP2K 6.1. When I build ELPA with the toolchain script, linking against MKL fails with the following errors:

  FCLD     test_real_double_hermitian_multiply_1stage_all_layouts
/opt/spack/spack-avx512/opt/spack/linux-debian9-x86_64/gcc-8.2.0/intel-mkl-2019.1.144-a4wlgyiw6bj7f3too3q6wf4ly3n5mjc4/compilers_and_libraries_2019.1.144/linux/mkl/lib/intel64/libmkl_sequential.a(dlaed3_seq.o): In function `mkl_lapack_dlaed3':
dlaed3_omp_gen.f:(.text+0x3): undefined reference to `mkl_lapack_xdlaed3'
/opt/spack/spack-avx512/opt/spack/linux-debian9-x86_64/gcc-8.2.0/intel-mkl-2019.1.144-a4wlgyiw6bj7f3too3q6wf4ly3n5mjc4/compilers_and_libraries_2019.1.144/linux/mkl/lib/intel64/libmkl_sequential.a(dlaed2_seq.o): In function `mkl_lapack_dlaed2':
dlaed2_omp_gen.f:(.text+0x3): undefined reference to `mkl_lapack_xdlaed2'
/opt/spack/spack-avx512/opt/spack/linux-debian9-x86_64/gcc-8.2.0/intel-mkl-2019.1.144-a4wlgyiw6bj7f3too3q6wf4ly3n5mjc4/compilers_and_libraries_2019.1.144/linux/mkl/lib/intel64/libmkl_sequential.a(dgeqrf_pf_seq.o): In function `mkl_lapack_dgeqrf_pf':
dgeqrf_pf_omp.c:(.text+0xa3): undefined reference to `mkl_lapack_xdgeqrf_pf'
dgeqrf_pf_omp.c:(.text+0xd9): undefined reference to `mkl_lapack_xdgeqrf_pf'

The error message is long, so I only pasted the first few errors.

I dug into this error and found that it is caused by the loss of the -Wl,--start-group and -Wl,--end-group pair. install_mkl.sh wraps the MKL library string in -Wl,--start-group and -Wl,--end-group so that the MKL libraries can be linked in any order, but somewhere during ELPA's configure step this pair gets lost. According to the config.log generated by ELPA, the pair is still present in the first few commands, up to the point where ac_cv_fc_libs is determined.

I checked the master branch, and this issue does not seem to be fixed there yet.
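A minimal sketch of what the group pair does and how its loss can be detected. The MKL_LIBS layout mirrors the static string visible in the config.log below; the MKLROOT default and the helpers count_flag and drop_group are hypothetical illustrations, not part of the CP2K toolchain, and drop_group only simulates the pair being dropped rather than reproducing ELPA's configure logic.

```sh
#!/bin/sh
# Disable globbing so unquoted word splitting below never expands paths.
set -f

# Placeholder default; the toolchain determines the real MKL location.
MKLROOT="${MKLROOT:-/opt/intel/mkl}"
libdir="${MKLROOT}/lib/intel64"

# The static MKL archives reference each other's symbols, so GNU ld must
# rescan them until nothing new resolves; the group pair requests that.
MKL_LIBS="-Wl,--start-group ${libdir}/libmkl_gf_lp64.a ${libdir}/libmkl_core.a ${libdir}/libmkl_sequential.a -Wl,--end-group -lpthread -lm -ldl"

# count_flag STRING FLAG: how many whitespace-separated words equal FLAG.
count_flag() {
  n=0
  for word in $1; do
    [ "$word" = "$2" ] && n=$((n + 1))
  done
  echo "$n"
}

# has_group STRING: true if STRING has at least one start/end pair and
# the number of openings matches the number of closings.
has_group() {
  opens=$(count_flag "$1" "-Wl,--start-group")
  closes=$(count_flag "$1" "-Wl,--end-group")
  [ "$opens" -gt 0 ] && [ "$opens" -eq "$closes" ]
}

# drop_group STRING: STRING with the group delimiters removed, simulating
# the state reported in config.log after ac_cv_fc_libs.
drop_group() {
  out=""
  for word in $1; do
    case "$word" in
      -Wl,--start-group|-Wl,--end-group) ;;
      *) out="${out:+$out }$word" ;;
    esac
  done
  echo "$out"
}

stripped_libs=$(drop_group "$MKL_LIBS")

if has_group "$MKL_LIBS"; then echo "MKL_LIBS: group intact"; fi
if ! has_group "$stripped_libs"; then echo "stripped_libs: group missing"; fi
```

Without the group, ld scans each archive once in command-line order, so a symbol such as mkl_lapack_xdlaed3 needed by libmkl_sequential.a but defined in an archive already passed over stays undefined, which matches the errors above.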

@OndrejMarsalek
Contributor

I came across the same issue in the current development version. The group delimiters somehow get misplaced along the way. Looking at it more closely, I wonder if it is a deeper issue. In my case, the configure call reported in config.log is:

$ ../configure --prefix=/home/software/cp2k/cp2k-dev-217922840/tools/toolchain/install/elpa-2017.05.003 --libdir=/home/software/cp2k/cp2k-dev-217922840/tools/toolchain/install/elpa-2017.05.003/lib --enable-openmp=no --enable-shared=no --enable-static=yes --disable-option-checking --enable-avx=yes --enable-avx2=yes --enable-avx512=no FC=mpif90 CC=mpicc CXX=mpic++ FCFLAGS=-O2 -ftree-vectorize -g -fno-omit-frame-pointer -march=native  -m64 -I/home/software/spack/opt/spack/linux-debian9-x86_64/gcc-8.2.0/intel-mkl-2019.3.199-juiukqyaqu2fmzenwuxv6uyvelyrdxwm/compilers_and_libraries_2019.3.199/linux/mkl/include -I/home/software/spack/opt/spack/linux-debian9-x86_64/gcc-8.2.0/intel-mkl-2019.3.199-juiukqyaqu2fmzenwuxv6uyvelyrdxwm/compilers_and_libraries_2019.3.199/linux/mkl/include/fftw  -ffree-line-length-none -mavx2 -mfma -msse4 CFLAGS=-O2 -ftree-vectorize -g -fno-omit-frame-pointer -march=native  -m64 -I/home/software/spack/opt/spack/linux-debian9-x86_64/gcc-8.2.0/intel-mkl-2019.3.199-juiukqyaqu2fmzenwuxv6uyvelyrdxwm/compilers_and_libraries_2019.3.199/linux/mkl/include -I/home/software/spack/opt/spack/linux-debian9-x86_64/gcc-8.2.0/intel-mkl-2019.3.199-juiukqyaqu2fmzenwuxv6uyvelyrdxwm/compilers_and_libraries_2019.3.199/linux/mkl/include/fftw  -mavx2 -mfma -msse4 CXXFLAGS=-O2 -ftree-vectorize -g -fno-omit-frame-pointer -march=native  -m64 -I/home/software/spack/opt/spack/linux-debian9-x86_64/gcc-8.2.0/intel-mkl-2019.3.199-juiukqyaqu2fmzenwuxv6uyvelyrdxwm/compilers_and_libraries_2019.3.199/linux/mkl/include -I/home/software/spack/opt/spack/linux-debian9-x86_64/gcc-8.2.0/intel-mkl-2019.3.199-juiukqyaqu2fmzenwuxv6uyvelyrdxwm/compilers_and_libraries_2019.3.199/linux/mkl/include/fftw  -mavx2 -mfma -msse4 LDFLAGS=-Wl,--enable-new-dtags  -L'/home/software/cp2k/cp2k-dev-217922840/tools/toolchain/install/scalapack-2.0.2/lib' -Wl,-rpath='/home/software/cp2k/cp2k-dev-217922840/tools/toolchain/install/scalapack-2.0.2/lib'  LIBS=-lscalapack -Wl,--start-group 
/home/software/spack/opt/spack/linux-debian9-x86_64/gcc-8.2.0/intel-mkl-2019.3.199-juiukqyaqu2fmzenwuxv6uyvelyrdxwm/compilers_and_libraries_2019.3.199/linux/mkl/lib/intel64/libmkl_gf_lp64.a /home/software/spack/opt/spack/linux-debian9-x86_64/gcc-8.2.0/intel-mkl-2019.3.199-juiukqyaqu2fmzenwuxv6uyvelyrdxwm/compilers_and_libraries_2019.3.199/linux/mkl/lib/intel64/libmkl_core.a /home/software/spack/opt/spack/linux-debian9-x86_64/gcc-8.2.0/intel-mkl-2019.3.199-juiukqyaqu2fmzenwuxv6uyvelyrdxwm/compilers_and_libraries_2019.3.199/linux/mkl/lib/intel64/libmkl_sequential.a -Wl,--end-group -lpthread -lm -ldl

which contains the following env var:

LIBS=-lscalapack -Wl,--start-group /home/software/spack/opt/spack/linux-debian9-x86_64/gcc-8.2.0/intel-mkl-2019.3.199-juiukqyaqu2fmzenwuxv6uyvelyrdxwm/compilers_and_libraries_2019.3.199/linux/mkl/lib/intel64/libmkl_gf_lp64.a /home/software/spack/opt/spack/linux-debian9-x86_64/gcc-8.2.0/intel-mkl-2019.3.199-juiukqyaqu2fmzenwuxv6uyvelyrdxwm/compilers_and_libraries_2019.3.199/linux/mkl/lib/intel64/libmkl_core.a /home/software/spack/opt/spack/linux-debian9-x86_64/gcc-8.2.0/intel-mkl-2019.3.199-juiukqyaqu2fmzenwuxv6uyvelyrdxwm/compilers_and_libraries_2019.3.199/linux/mkl/lib/intel64/libmkl_sequential.a -Wl,--end-group -lpthread -lm -ldl

This is unexpected to me, as it links against vanilla ScaLAPACK rather than the one from MKL (-lmkl_scalapack_lp64) used in multiple arch files. Should I be calling the toolchain installer differently? My command line was:

./install_cp2k_toolchain.sh --mpi-mode=mpich --with-mpich=system --with-mkl=system --with-cmake=system --with-sirius=no

@OndrejMarsalek
Contributor

It looks like the issue with ScaLAPACK from Netlib vs MKL is a separate thing (#435), but the issue with ELPA and MKL remains.

@OndrejMarsalek
Contributor

I was able to work around the issue in a hackish way by following the instructions in the ELPA installation documentation:
https://gitlab.mpcdf.mpg.de/elpa/elpa/wikis/INSTALL
and setting SCALAPACK_LDFLAGS and SCALAPACK_FCFLAGS as recommended there, only replacing MKL_HOME with MKLROOT.

A proper solution would derive these from the variables the toolchain determines, but this at least shows the direction to take.
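A sketch of the workaround, modelled on the MKL example in the ELPA INSTALL instructions with MKL_HOME replaced by MKLROOT. The exact library list is an assumption (GNU Fortran interface mkl_gf_lp64, sequential threading, and the Intel MPI BLACS, which is the one MKL offers for MPICH-compatible MPIs); check it against the MKL link-line recommendations for your own compiler/MPI/threading combination. The configure options are taken from the config.log above, and the command is only printed, not run.

```sh
#!/bin/sh
# Placeholder default; normally MKLROOT is set by MKL's environment script.
MKLROOT="${MKLROOT:-/opt/intel/mkl}"
mkl_libdir="${MKLROOT}/lib/intel64"

# Assumed shared-library link list: ScaLAPACK, GNU Fortran interface,
# sequential threading layer, core, and BLACS for an MPICH-style MPI.
mkl_libs="-lmkl_scalapack_lp64 -lmkl_gf_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread -lm -ldl"

# Shared libraries need no --start-group/--end-group wrapper; the rpath
# lets the resulting binaries find MKL at run time.
SCALAPACK_LDFLAGS="-L${mkl_libdir} ${mkl_libs} -Wl,-rpath,${mkl_libdir}"
SCALAPACK_FCFLAGS="-L${mkl_libdir} ${mkl_libs} -I${MKLROOT}/include"
export SCALAPACK_LDFLAGS SCALAPACK_FCFLAGS

# Print the flags and the configure call for inspection before running it.
echo "SCALAPACK_LDFLAGS=${SCALAPACK_LDFLAGS}"
echo "SCALAPACK_FCFLAGS=${SCALAPACK_FCFLAGS}"
echo ../configure --enable-openmp=no --enable-shared=no --enable-static=yes FC=mpif90 CC=mpicc
```

Deriving mkl_libdir and mkl_libs from what install_mkl.sh already detects, instead of hard-coding them, would be the "proper solution" mentioned above.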

@OndrejMarsalek
Contributor

Confusingly, I see a lot of warnings of the type:

*** Warning: Linking the shared library libelpa_openmp.la against the
*** static library /home/software/spack/opt/spack/linux-debian9-x86_64/gcc-8.2.0/intel-mkl-2019.3.199-bln5npz5ynyysxfs6dsmwaq2cyggl2eg/compilers_and_libraries_2019.3.199/linux/mkl/lib/intel64/libmkl_blacs_openmpi_lp64.a is not portable!

even though it is configured with --enable-shared=no.

@OndrejMarsalek
Contributor

OndrejMarsalek commented Jun 28, 2019

It looks like ELPA simply does not handle those static libraries and the --start-group/--end-group flags. When I change install_mkl.sh so that MKL_LIBS uses the dynamic libraries instead, ELPA builds fine, and the "not portable" warnings above go away. CP2K itself also builds; it is just dynamically linked against MKL. I don't know whether the current toolchain intentionally avoids that.
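A sketch of the change described above, not the actual install_mkl.sh: MKL_LINK_MODE is a hypothetical switch, the MKLROOT default is a placeholder, and the library names follow the sequential GNU Fortran variant seen in the config.log earlier in this thread.

```sh
#!/bin/sh
# Placeholder default; the toolchain determines the real MKL location.
MKLROOT="${MKLROOT:-/opt/intel/mkl}"
libdir="${MKLROOT}/lib/intel64"

# Hypothetical switch between the original static link line and a
# dynamic one; defaults to dynamic.
MKL_LINK_MODE="${MKL_LINK_MODE:-dynamic}"

case "$MKL_LINK_MODE" in
  static)
    # Static archives depend on each other cyclically, so they are
    # wrapped in a linker group (the form ELPA's configure mishandles).
    MKL_LIBS="-Wl,--start-group ${libdir}/libmkl_gf_lp64.a ${libdir}/libmkl_core.a ${libdir}/libmkl_sequential.a -Wl,--end-group -lpthread -lm -ldl"
    ;;
  dynamic)
    # Shared libraries need no group; the rpath lets binaries find MKL
    # at run time without setting LD_LIBRARY_PATH.
    MKL_LIBS="-L${libdir} -Wl,-rpath,${libdir} -lmkl_gf_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl"
    ;;
  *)
    echo "unknown MKL_LINK_MODE: ${MKL_LINK_MODE}" >&2
    exit 1
    ;;
esac

echo "MKL_LIBS=${MKL_LIBS}"
```

The trade-off is the one noted in the comment: the resulting CP2K binary depends on the MKL shared libraries at run time instead of carrying them statically.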

@dev-zero dev-zero added this to the v7.1 milestone Jul 12, 2019
@dev-zero dev-zero added this to Needs triage in Toolchain Bugfixing Dec 3, 2019
@dev-zero dev-zero moved this from Needs triage to High priority in Toolchain Bugfixing Dec 3, 2019
@OndrejMarsalek
Contributor

It seems this should be resolved by 0d7f2f7. Shall we close it?

@oschuett
Member

Yes, this seems to have been fixed.

Toolchain Bugfixing automation moved this from High priority to Closed Mar 29, 2020