This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

OpenMP Error #17641

Closed
icemelon opened this issue Feb 20, 2020 · 65 comments · Fixed by #17751
Labels
Bug CMake CMake related bugs/issues/improvements

Comments

@icemelon
Member

Description

The compiled MXNet library links against two OpenMP runtimes at once: libomp and libiomp.

Error Message


OMP: Error #15: Initializing libiomp5.so, but found libomp.so already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked
into the program. That is dangerous, since it can degrade performance or cause
incorrect results. The best thing to do is to ensure that only a single OpenMP
runtime is linked into the process, e.g. by avoiding static linking of the
OpenMP runtime in any library. As an unsafe, unsupported, undocumented
workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to
allow the program to continue to execute, but that may cause crashes or silently
produce incorrect results. For more information, please see
http://www.intel.com/software/products/support/.
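One way to confirm the "multiple copies of the OpenMP runtime" condition on Linux is to look at which shared objects are mapped into the running process. The following is a minimal sketch, not part of MXNet: the function names `openmp_runtimes` and `loaded_openmp_runtimes` are hypothetical, and it assumes the runtimes carry their usual file names (`libomp.so`, `libiomp5.so`, `libgomp.so`).

```python
import os
import re

# Assumed file names of the LLVM, Intel, and GNU OpenMP runtimes.
_OPENMP_RE = re.compile(r"^(libomp|libiomp5|libgomp)\.so")


def openmp_runtimes(maps_text):
    """Return sorted, de-duplicated paths of OpenMP runtimes found in maps_text.

    maps_text is expected to be in the format of /proc/<pid>/maps.
    """
    found = set()
    for line in maps_text.splitlines():
        # File-backed mappings have six fields; the last one is the path.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue
        path = fields[5].strip()
        if _OPENMP_RE.match(os.path.basename(path)):
            found.add(path)
    return sorted(found)


def loaded_openmp_runtimes():
    """Inspect the current process; returns [] where /proc is unavailable."""
    try:
        with open("/proc/self/maps") as handle:
            return openmp_runtimes(handle.read())
    except OSError:
        return []


if __name__ == "__main__":
    # Import the library under suspicion before this point, then:
    for path in loaded_openmp_runtimes():
        print(path)
```

If more than one distinct runtime is printed after the library has been imported and exercised, the process is in the state the error message warns about.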

To Reproduce

I have both the Intel MKL and MKLDNN libraries installed on Ubuntu 18.04. Compiling MXNet with the following config leads to the error shown above.

cmake -DUSE_CUDA=0 -DUSE_CUDNN=0 -DUSE_MKLDNN=1 -DCMAKE_BUILD_TYPE=Release -GNinja ..
ninja -v

What have you tried to solve it?

After I deleted 3rdparty/openmp, and recompiled mxnet, this error no longer occurs.

Environment

Ubuntu 18.04, installed with Intel MKL and MKLDNN library.

@icemelon icemelon added the Bug label Feb 20, 2020
@sl1pkn07
Contributor

sl1pkn07 commented Feb 21, 2020

That is because git clone --recursive pulls in openmp, and the cmake script does not check whether to use the system OpenMP or not.

https://github.com/apache/incubator-mxnet/blob/9dcf71d8fe33f77ed316a95fcffaf1f7f883ff70/CMakeLists.txt#L390-L430

Unfortunately, openmp doesn't ship a pkg-config file or cmake files for cmake to work with.

It seems it needs its own cmake module to search for it.

@TaoLv
Member

TaoLv commented Feb 21, 2020

Same issue: #17366

@sl1pkn07
Contributor

The same thing happens with intel-dnnl.

Some distros already provide an intel-dnnl package, but mxnet forces a download of the sources again.

@leezu
Contributor

leezu commented Feb 21, 2020

@cjolivier01 you previously vetoed changing the omp configuration in cmake build, due to a race condition that had not been fixed. As that has been fixed, are you OK with proceeding to prefer system OMP for the CMake build by default? Or what is your recommendation?

Static build should still statically build omp.

@leezu leezu added the CMake CMake related bugs/issues/improvements label Feb 21, 2020
@leezu
Contributor

leezu commented Feb 21, 2020

@sl1pkn07 given the rapid development of intel-dnnl, MXNet expects a fixed version of intel-dnnl. It's quite unlikely that the system provides that particular version, but patches to improve detection are welcome. Do you want to contribute a PR? But let's track this in a separate issue.

@cjolivier01
Member

  1. What is pulling in libiomp5.so ?
  2. Since when is libomp being linked in statically? I am not aware of this ever being the case.

@cjolivier01
Member

cjolivier01 commented Feb 21, 2020

btw, cmake files have min cmake at 3.13, but default 18.04 cmake install is cmake 3.10. Does anyone know what the deal is with 3.13? Ubuntu 18.04 is a pretty widely-used release...
I changed back to 3.10 and it seems to build ok.

@cjolivier01
Member

@cjolivier01 you previously vetoed changing the omp configuration in cmake build, due to a race condition that had not been fixed. As that has been fixed, are you OK with proceeding to prefer system OMP for the CMake build by default? Or what is your recommendation?

Static build should still statically build omp.

Not actually. Due to no legitimate reason to remove it.

@leezu
Contributor

leezu commented Feb 21, 2020

What is pulling in libiomp5.so ?

MKL

btw, cmake files have min cmake at 3.13, but default 18.04 cmake install is cmake 3.10. Does anyone know what the deal is with 3.13? Ubuntu 18.04 is a pretty widely-used release...
I changed back to 3.10 and it seems to build ok.

Just pip install cmake as per the doc https://mxnet.apache.org/get_started/ubuntu_setup. It'd be harder to explain when users require 3.13 and when 3.X, or 3.Y, than to uniformly require a recent version. There are various bugs fixed in 3.13 that affect MXNet use-cases (eg cuda, https://cmake.org/cmake/help/latest/policy/CMP0077.html for llvm openmp subproject)
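The version requirement discussed here can be checked before configuring. This is only a sketch with hypothetical helper names (`parse_version`, `meets_minimum`); it assumes the text being checked has the shape of the first line printed by `cmake --version`, for example "cmake version 3.10.2".

```python
import re


def parse_version(text):
    """Extract the first dotted version number in text as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(found, required):
    """Return True if version string `found` is at least `required`."""
    found_v, required_v = parse_version(found), parse_version(required)
    # Pad with zeros so that (3, 13) and (3, 13, 0) compare as equal.
    width = max(len(found_v), len(required_v))
    found_v += (0,) * (width - len(found_v))
    required_v += (0,) * (width - len(required_v))
    return found_v >= required_v


if __name__ == "__main__":
    # Versions mentioned in this thread, checked against the 3.13 minimum.
    for line in ("cmake version 3.10.2", "cmake version 3.16.4"):
        print(line, "->", meets_minimum(line, "3.13"))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes once a component reaches two digits.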

Not actually. Due to no legitimate reason to remove it.

Speed up developer build. No need to build llvm openmp if system openmp is present.

@cjolivier01
Member

openmp is like a 4-5-second build.

On my desktop machine it's < 3:
real 0m2.940s
user 0m42.446s
sys 0m5.442s

@cjolivier01
Member

I installed mkl, but the build does not appear to pick it up. Is there a way to force it?

@cjolivier01
Member

cjolivier01 commented Feb 21, 2020

Actually, I don't see this behavior when it does pull in mkl/pulling in the other omp (this is Ubuntu 18.04):

[chriso@chriso-ripper:~/src/mxnet/build (master)]ldd libmxnet.so 
        linux-vdso.so.1 (0x00007ffcbdf3b000)
        libmkl_rt.so => /opt/intel/mkl/lib/intel64/libmkl_rt.so (0x00007fb399dd8000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fb399bd0000)
        libomp.so => /home/chriso/src/mxnet/build/3rdparty/openmp/runtime/src/libomp.so (0x00007fb3998ea000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fb3996e6000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fb3994c7000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fb39913e000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fb398da0000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fb398b88000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb398797000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fb3a078c000)

I don't see libmkl_rt.so pulling in libiomp5:

[chriso@chriso-ripper:~/src/mxnet/build (master)]ldd /opt/intel/mkl/lib/intel64/libmkl_rt.so
        linux-vdso.so.1 (0x00007ffd6c5cc000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fc85058d000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc85019c000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fc850e71000)
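The `ldd` listings above can be checked mechanically for OpenMP runtimes. The sketch below uses hypothetical helpers (`parse_ldd`, `openmp_deps`) and assumes the usual "name => path (address)" layout of `ldd` output. Note that `ldd` only reports dependencies recorded at link time, so a library opened later with dlopen would not be listed.

```python
import re

# Trailing load address printed by ldd, e.g. "(0x00007fb3998ea000)".
_ADDRESS_RE = re.compile(r"\s*\(0x[0-9a-fA-F]+\)\s*$")


def parse_ldd(output):
    """Map each library name in ldd output to its resolved path (or None)."""
    deps = {}
    for line in output.splitlines():
        if "=>" not in line:
            # The vdso and the dynamic loader lines have no "name => path".
            continue
        name, _, rest = line.partition("=>")
        path = _ADDRESS_RE.sub("", rest).strip()
        deps[name.strip()] = path if path and path != "not found" else None
    return deps


def openmp_deps(deps):
    """Return the names in deps that look like OpenMP runtimes."""
    prefixes = ("libomp", "libiomp", "libgomp")
    return sorted(name for name in deps if name.startswith(prefixes))


if __name__ == "__main__":
    sample = """\
        libmkl_rt.so => /opt/intel/mkl/lib/intel64/libmkl_rt.so (0x00007fb399dd8000)
        libomp.so => /home/chriso/src/mxnet/build/3rdparty/openmp/runtime/src/libomp.so (0x00007fb3998ea000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fb3996e6000)
"""
    print(openmp_deps(parse_ldd(sample)))
```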

@leezu
Contributor

leezu commented Feb 21, 2020

I think libmkl_rt may dlopen libiomp as per oneapi-src/oneDNN#230 (comment), but I haven't looked into this further yet

@cjolivier01
Member

Linking in any version of omp statically would probably be a bad idea, since startup order would be important.

@cjolivier01
Member

Clearly it does not:
[chriso@chriso-ripper:~/src/mxnet/build (master)]ldd /opt/intel/mkl/lib/intel64/libmkl_rt.so
linux-vdso.so.1 (0x00007ffd6c5cc000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fc85058d000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc85019c000)
/lib64/ld-linux-x86-64.so.2 (0x00007fc850e71000)

@leezu
Contributor

leezu commented Feb 21, 2020

Yes, that's why dlopen.

@sl1pkn07
Contributor

and with readelf -a <lib> | grep NEEDED ?

@cjolivier01
Member

Can you supply a script to reproduce this error? I am not able to reproduce it.

@sl1pkn07
Contributor

i'm using system openmp and no mkl-dnnl. sorry @icemelon9?

@leezu
Contributor

leezu commented Feb 21, 2020

@sl1pkn07 please open a separate issue for your problem. This issue is about MKL.

@leezu
Contributor

leezu commented Feb 21, 2020

@icemelon9 please provide a reproducer to trigger the error message.

@icemelon
Member Author

Sorry about the late response. Here's the script to reproduce the error message.

import numpy as np
import mxnet as mx

a = mx.nd.array(np.random.uniform(size=(1024, 128)).astype('float32'))
b = mx.nd.array(np.random.uniform(size=(128, 1024)).astype('float32'))
c = mx.nd.dot(a, b)
c.wait_to_read()

The following shows shared library used by libmxnet on my machine.

mxnet git:(master) ldd build/libmxnet.so
        linux-vdso.so.1 (0x00007ffd8b467000)
        libmkl_rt.so => /opt/intel/mkl/lib/intel64/libmkl_rt.so (0x00007f1abac81000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f1abaa79000)
        libopencv_imgcodecs.so.3.2 => /usr/lib/x86_64-linux-gnu/libopencv_imgcodecs.so.3.2 (0x00007f1aba840000)
        libopencv_imgproc.so.3.2 => /usr/lib/x86_64-linux-gnu/libopencv_imgproc.so.3.2 (0x00007f1aba2ef000)
        libopencv_core.so.3.2 => /usr/lib/x86_64-linux-gnu/libopencv_core.so.3.2 (0x00007f1ab9eb4000)
        libomp.so => /home/ubuntu/repo/mxnet/build/3rdparty/openmp/runtime/src/libomp.so (0x00007f1ab9bce000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f1ab99ca000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f1ab97ab000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f1ab9422000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f1ab9084000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f1ab8e6c000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1ab8a7b000)
        ...
mxnet git:(master) readelf -a build/libmxnet.so| grep NEEDED
 0x0000000000000001 (NEEDED)             Shared library: [libmkl_rt.so]
 0x0000000000000001 (NEEDED)             Shared library: [librt.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_imgcodecs.so.3.2]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_imgproc.so.3.2]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_core.so.3.2]
 0x0000000000000001 (NEEDED)             Shared library: [libomp.so]
 0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [ld-linux-x86-64.so.2]
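The NEEDED entries above can also be pulled out programmatically, for instance to assert in a build check that only one OpenMP runtime is recorded. This is a rough sketch with a hypothetical helper name (`needed_libraries`); it assumes the line layout shown in the `readelf` output above. As with `ldd`, a library that is only opened at runtime via dlopen would not appear among these entries.

```python
import re

# Matches lines such as:
#  0x0000000000000001 (NEEDED)             Shared library: [libomp.so]
_NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library:\s+\[([^\]]+)\]")


def needed_libraries(readelf_output):
    """Return the NEEDED library names in the order readelf printed them."""
    return _NEEDED_RE.findall(readelf_output)


if __name__ == "__main__":
    sample = """\
 0x0000000000000001 (NEEDED)             Shared library: [libmkl_rt.so]
 0x0000000000000001 (NEEDED)             Shared library: [libomp.so]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
"""
    names = needed_libraries(sample)
    print([n for n in names if "omp" in n])
```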

@cjolivier01
Member

 0x0000000000000001 (NEEDED)             Shared library: [libmkl_rt.so]
 0x0000000000000001 (NEEDED)             Shared library: [librt.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_imgcodecs.so.3.2]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_imgproc.so.3.2]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_core.so.3.2]
 0x0000000000000001 (NEEDED)             Shared library: [libomp.so]
 0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [ld-linux-x86-64.so.2]

@cjolivier01
Member

(pytorch) [chriso@chriso-ripper:~/src/mxnet (master)]python
Python 3.6.10 |Anaconda, Inc.| (default, Jan  7 2020, 21:14:29) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import mxnet as mx
>>> 
>>> a = mx.nd.array(np.random.uniform(size=(1024, 128)).astype('float32'))
>>> b = mx.nd.array(np.random.uniform(size=(128, 1024)).astype('float32'))
>>> c = mx.nd.dot(a, b)
>>> c.wait_to_read()
>>> exit()

Still can't reproduce.
Can you send the entire cmake config log?

@leezu
Contributor

leezu commented Feb 24, 2020

CI can also reproduce this issue. I switched CI to testing CMake builds instead of Makefile build in #17645 and the Python MKLDNN + MKL Pipeline fails with this issue: Log of test failure and Raw log of test failure

and Raw log of build

@cjolivier01 the build log contains the output of cmake configuration.

That pipeline relies on the following build

build_ubuntu_cpu_mkldnn_mkl() {
    set -ex
    cd /work/build
    cmake \
        -DCMAKE_BUILD_TYPE="RelWithDebInfo" \
        -DUSE_MKL_IF_AVAILABLE=ON \
        -DBLAS="MKL" \
        -DUSE_TVM_OP=ON \
        -DUSE_CUDA=OFF \
        -DUSE_CPP_PACKAGE=ON \
        -G Ninja /work/mxnet
    ninja
}

@icemelon
Member Author

Here is the cmake log.

build git:(master) ✗ cmake -DUSE_CUDA=0 -DUSE_CUDNN=0 -DUSE_MKLDNN=1 -DCMAKE_BUILD_TYPE=Release -GNinja ..
-- The C compiler identification is GNU 7.4.0
-- The CXX compiler identification is GNU 7.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- CMAKE_CROSSCOMPILING FALSE
-- CMAKE_HOST_SYSTEM_PROCESSOR x86_64
-- CMAKE_SYSTEM_PROCESSOR x86_64
-- CMAKE_SYSTEM_NAME Linux
-- CMake version '3.16.4' using generator 'Ninja'
-- Performing Test SUPPORT_CXX11
-- Performing Test SUPPORT_CXX11 - Success
-- Performing Test SUPPORT_CXX0X
-- Performing Test SUPPORT_CXX0X - Success
-- MKL-DNN compat: set DNNL_BUILD_EXAMPLES to MKLDNN_BUILD_EXAMPLES with value `OFF`
-- MKL-DNN compat: set DNNL_BUILD_TESTS to MKLDNN_BUILD_TESTS with value `OFF`
-- MKL-DNN compat: set DNNL_ENABLE_JIT_PROFILING to MKLDNN_ENABLE_JIT_PROFILING with value `OFF`
-- MKL-DNN compat: set DNNL_LIBRARY_TYPE to MKLDNN_LIBRARY_TYPE with value `STATIC`
-- MKL-DNN compat: set DNNL_ARCH_OPT_FLAGS to MKLDNN_ARCH_OPT_FLAGS with value ``
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- GPU support is disabled
-- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE)
-- Found Git: /usr/bin/git (found version "2.17.1")
-- Intel(R) VTune(TM) Amplifier JIT profiling disabled
-- Found MKL: /opt/intel/mkl/include
-- Found MKL (include: /opt/intel/mkl/include, lib: /opt/intel/mkl/lib/intel64/libmkl_rt.so
-- Found OpenCV: /usr (found version "3.2.0") found components: core highgui imgproc imgcodecs
-- OpenCV 3.2.0 found (/usr/share/OpenCV)
--  OpenCV_LIBS=opencv_core;opencv_highgui;opencv_imgproc;opencv_imgcodecs
-- Performing Test OPENMP_HAVE_WERROR_FLAG
-- Performing Test OPENMP_HAVE_WERROR_FLAG - Success
-- Performing Test OPENMP_HAVE_STD_GNUPP11_FLAG
-- Performing Test OPENMP_HAVE_STD_GNUPP11_FLAG - Success
-- Performing Test OPENMP_HAVE_STD_CPP11_FLAG
-- Performing Test OPENMP_HAVE_STD_CPP11_FLAG - Success
-- Found PythonInterp: /home/ubuntu/anaconda3/envs/tvm/bin/python (found version "3.6.9")
-- Cannot find llvm-lit.
-- Please put llvm-lit in your PATH, set OPENMP_LLVM_LIT_EXECUTABLE to its full path, or point OPENMP_LLVM_TOOLS_DIR to its directory.
CMake Warning at 3rdparty/openmp/cmake/OpenMPTesting.cmake:22 (message):
  The check targets will not be available!
Call Stack (most recent call first):
  3rdparty/openmp/cmake/OpenMPTesting.cmake:40 (find_standalone_test_dependencies)
  3rdparty/openmp/CMakeLists.txt:49 (include)


-- Performing Test LIBOMP_HAVE_FNO_EXCEPTIONS_FLAG
-- Performing Test LIBOMP_HAVE_FNO_EXCEPTIONS_FLAG - Success
-- Performing Test LIBOMP_HAVE_FNO_RTTI_FLAG
-- Performing Test LIBOMP_HAVE_FNO_RTTI_FLAG - Success
-- Performing Test LIBOMP_HAVE_X_CPP_FLAG
-- Performing Test LIBOMP_HAVE_X_CPP_FLAG - Success
-- Performing Test LIBOMP_HAVE_WCAST_QUAL_FLAG
-- Performing Test LIBOMP_HAVE_WCAST_QUAL_FLAG - Success
-- Performing Test LIBOMP_HAVE_WNO_UNUSED_FUNCTION_FLAG
-- Performing Test LIBOMP_HAVE_WNO_UNUSED_FUNCTION_FLAG - Success
-- Performing Test LIBOMP_HAVE_WNO_UNUSED_LOCAL_TYPEDEF_FLAG
-- Performing Test LIBOMP_HAVE_WNO_UNUSED_LOCAL_TYPEDEF_FLAG - Failed
-- Performing Test LIBOMP_HAVE_WNO_UNUSED_VALUE_FLAG
-- Performing Test LIBOMP_HAVE_WNO_UNUSED_VALUE_FLAG - Success
-- Performing Test LIBOMP_HAVE_WNO_UNUSED_VARIABLE_FLAG
-- Performing Test LIBOMP_HAVE_WNO_UNUSED_VARIABLE_FLAG - Success
-- Performing Test LIBOMP_HAVE_WNO_SWITCH_FLAG
-- Performing Test LIBOMP_HAVE_WNO_SWITCH_FLAG - Success
-- Performing Test LIBOMP_HAVE_WNO_COVERED_SWITCH_DEFAULT_FLAG
-- Performing Test LIBOMP_HAVE_WNO_COVERED_SWITCH_DEFAULT_FLAG - Failed
-- Performing Test LIBOMP_HAVE_WNO_DEPRECATED_REGISTER_FLAG
-- Performing Test LIBOMP_HAVE_WNO_DEPRECATED_REGISTER_FLAG - Failed
-- Performing Test LIBOMP_HAVE_WNO_SIGN_COMPARE_FLAG
-- Performing Test LIBOMP_HAVE_WNO_SIGN_COMPARE_FLAG - Success
-- Performing Test LIBOMP_HAVE_WNO_GNU_ANONYMOUS_STRUCT_FLAG
-- Performing Test LIBOMP_HAVE_WNO_GNU_ANONYMOUS_STRUCT_FLAG - Failed
-- Performing Test LIBOMP_HAVE_WNO_UNKNOWN_PRAGMAS_FLAG
-- Performing Test LIBOMP_HAVE_WNO_UNKNOWN_PRAGMAS_FLAG - Success
-- Performing Test LIBOMP_HAVE_WNO_MISSING_FIELD_INITIALIZERS_FLAG
-- Performing Test LIBOMP_HAVE_WNO_MISSING_FIELD_INITIALIZERS_FLAG - Success
-- Performing Test LIBOMP_HAVE_WNO_MISSING_BRACES_FLAG
-- Performing Test LIBOMP_HAVE_WNO_MISSING_BRACES_FLAG - Success
-- Performing Test LIBOMP_HAVE_WNO_COMMENT_FLAG
-- Performing Test LIBOMP_HAVE_WNO_COMMENT_FLAG - Success
-- Performing Test LIBOMP_HAVE_WNO_SELF_ASSIGN_FLAG
-- Performing Test LIBOMP_HAVE_WNO_SELF_ASSIGN_FLAG - Failed
-- Performing Test LIBOMP_HAVE_WNO_VLA_EXTENSION_FLAG
-- Performing Test LIBOMP_HAVE_WNO_VLA_EXTENSION_FLAG - Failed
-- Performing Test LIBOMP_HAVE_WNO_FORMAT_PEDANTIC_FLAG
-- Performing Test LIBOMP_HAVE_WNO_FORMAT_PEDANTIC_FLAG - Failed
-- Performing Test LIBOMP_HAVE_WSTRINGOP_OVERFLOW_FLAG
-- Performing Test LIBOMP_HAVE_WSTRINGOP_OVERFLOW_FLAG - Success
-- Performing Test LIBOMP_HAVE_MSSE2_FLAG
-- Performing Test LIBOMP_HAVE_MSSE2_FLAG - Success
-- Performing Test LIBOMP_HAVE_FTLS_MODEL_FLAG
-- Performing Test LIBOMP_HAVE_FTLS_MODEL_FLAG - Success
-- Performing Test LIBOMP_HAVE_MMIC_FLAG
-- Performing Test LIBOMP_HAVE_MMIC_FLAG - Failed
-- Performing Test LIBOMP_HAVE_M32_FLAG
-- Performing Test LIBOMP_HAVE_M32_FLAG - Failed
-- Performing Test LIBOMP_HAVE_X_FLAG
-- Performing Test LIBOMP_HAVE_X_FLAG - Success
-- Performing Test LIBOMP_HAVE_WARN_SHARED_TEXTREL_FLAG
-- Performing Test LIBOMP_HAVE_WARN_SHARED_TEXTREL_FLAG - Success
-- Performing Test LIBOMP_HAVE_AS_NEEDED_FLAG
-- Performing Test LIBOMP_HAVE_AS_NEEDED_FLAG - Success
-- Performing Test LIBOMP_HAVE_VERSION_SCRIPT_FLAG
-- Performing Test LIBOMP_HAVE_VERSION_SCRIPT_FLAG - Success
-- Performing Test LIBOMP_HAVE_STATIC_LIBGCC_FLAG
-- Performing Test LIBOMP_HAVE_STATIC_LIBGCC_FLAG - Success
-- Performing Test LIBOMP_HAVE_Z_NOEXECSTACK_FLAG
-- Performing Test LIBOMP_HAVE_Z_NOEXECSTACK_FLAG - Success
-- Performing Test LIBOMP_HAVE_FINI_FLAG
-- Performing Test LIBOMP_HAVE_FINI_FLAG - Success
-- Found Perl: /usr/bin/perl (found version "5.26.1")
-- Performing Test LIBOMP_HAVE_VERSION_SYMBOLS
-- Performing Test LIBOMP_HAVE_VERSION_SYMBOLS - Success
-- Performing Test LIBOMP_HAVE___BUILTIN_FRAME_ADDRESS
-- Performing Test LIBOMP_HAVE___BUILTIN_FRAME_ADDRESS - Success
-- Performing Test LIBOMP_HAVE_WEAK_ATTRIBUTE
-- Performing Test LIBOMP_HAVE_WEAK_ATTRIBUTE - Success
-- Looking for include files windows.h, psapi.h
-- Looking for include files windows.h, psapi.h - not found
-- Looking for EnumProcessModules in psapi
-- Looking for EnumProcessModules in psapi - not found
-- LIBOMP: Operating System     -- Linux
-- LIBOMP: Target Architecture  -- x86_64
-- LIBOMP: Build Type           -- Release
-- LIBOMP: Library Kind         -- SHARED
-- LIBOMP: Library Type         -- normal
-- LIBOMP: Fortran Modules      -- FALSE
-- LIBOMP: Build                -- 20140926
-- LIBOMP: Use Stats-gathering  -- FALSE
-- LIBOMP: Use Debugger-support -- FALSE
-- LIBOMP: Use ITT notify       -- TRUE
-- LIBOMP: Use OMPT-support     -- TRUE
-- LIBOMP: Use OMPT-optional  -- TRUE
-- LIBOMP: Use Adaptive locks   -- TRUE
-- LIBOMP: Use quad precision   -- TRUE
-- LIBOMP: Use TSAN-support     -- FALSE
-- LIBOMP: Use Hwloc library    -- FALSE
-- Looking for sqrt in m
-- Looking for sqrt in m - found
-- Looking for __atomic_load_1
-- Looking for __atomic_load_1 - not found
-- Looking for __atomic_load_1 in atomic
-- Looking for __atomic_load_1 in atomic - found
-- check-libomp does nothing.
-- check-ompt does nothing.
-- check-openmp does nothing.
USE_LAPACK is ON
CMake Warning at 3rdparty/googletest/googletest/CMakeLists.txt:47 (project):
  VERSION keyword not followed by a value or was followed by a value that
  expanded to nothing.


-- Found GTest: gtest
-- Looking for clock_gettime in rt
-- Looking for clock_gettime in rt - found
-- Looking for fopen64
-- Looking for fopen64 - not found
-- Looking for C++ include cxxabi.h
-- Looking for C++ include cxxabi.h - found
-- Looking for nanosleep
-- Looking for nanosleep - found
-- Looking for backtrace
-- Looking for backtrace - found
-- backtrace facility detected in default set of libraries
-- Found Backtrace: /usr/include
-- Check if the system is big endian
-- Searching 16 bit integer
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of unsigned short
-- Check size of unsigned short - done
-- Using unsigned short
-- Check if the system is big endian - little endian
-- /home/ubuntu/repo/mxnet/3rdparty/dmlc-core/cmake/build_config.h.in -> include/dmlc/build_config.h
-- Performing Test SUPPORT_MSSE2
-- Performing Test SUPPORT_MSSE2 - Success
-- Found OpenMP_C: -fopenmp
-- Found OpenMP_CXX: -fopenmp
-- Found OpenMP: TRUE
-- Performing Test SUPPORT_MSSE3
-- Performing Test SUPPORT_MSSE3 - Success
-- Determining F16C support
-- Performing Test COMPILER_SUPPORT_MF16C
-- Performing Test COMPILER_SUPPORT_MF16C - Success
-- Configuring done
-- Generating done
-- Build files have been written to: /home/ubuntu/repo/mxnet/build

@cjolivier01
Member

This is with latest 2020 version of mkl?

@cjolivier01
Member

@TaoLv
what version of gcc?

@cjolivier01 this will happen with any version of gcc

@TaoLv which version of gcc?

@leezu
Contributor

leezu commented Feb 25, 2020

even if I remove the openmp build in CMakeLists.txt and build with clang, I get that warning, since it pulls in libomp from clang

@cjolivier01 How can we stop clang from pulling in libomp? It doesn't have an effect when static linking, as the symbols are already resolved, but it would be better to not pull libomp in in the first place.

@cjolivier01
Member

Stopping clang/others from linking to omp seems like the tail wagging the dog. I think we should consider other options, such as making the static mkl build work, or somehow stopping mkl from being so "clever".

@TaoLv
Member

TaoLv commented Feb 26, 2020

@TaoLv which version of gcc?

It's 4.8.5 on centos. Do you want me to try a higher version or exclude opencv from the build?

-DMKL_USE_STATIC_LIBS=ON is currently broken on my system and in particular doesn't link iomp statically.

@leezu, with this flag on, I would expect only MKL libraries to be statically linked while omp runtime is dynamically linked to mxnet.so. That's how we handle the omp linkage of DNNL.

@TaoLv
Member

TaoLv commented Feb 26, 2020

@leezu what's the error of statically linking MKL libraries?

@TaoLv
Member

TaoLv commented Feb 26, 2020

@leezu, previously we thought Intel was not distributing an iomp static library: #8532 (comment). But from the linked issue, even if we fix the omp runtime conflict inside mxnet, we may still encounter conflicts in downstream projects.

@cjolivier01
Member

cjolivier01 commented Feb 26, 2020

@leezu what's the error of statically linking MKL libraries?

for me it was some link error on some secondary thing like cpp unit test or something like that. libmxnet.so built successfully and the test script was successful. probably not too hard to fix.

@TaoLv
Member

TaoLv commented Feb 26, 2020

Thank you @cjolivier01 . That's exactly what I just observed.

@leezu
Contributor

leezu commented Feb 26, 2020

@TaoLv @cjolivier01 it's not hard to fix. If you look above, I posted the patch to fix it 12 hours ago.

An improved version of that patch is in #17645

@TaoLv
Member

TaoLv commented Feb 26, 2020

Thanks @leezu! It seems that we have got a consensus to address this issue?

@cjolivier01
Member

There's a lot of stuff in that PR; I would prefer a more targeted PR.
Also, I think the best fix is to link statically to MKL as we discussed, which as far as I can tell is not addressed in this PR, although I didn't read every line. I will review some more when I get to my day job, but it would be better to have a small, targeted PR.

@cjolivier01
Member

cjolivier01 commented Feb 26, 2020

I will post a PR in the next day or two that addresses this and also the clang issue, as well as transitive omp dependencies which may also cause the error due to mkl behaving foolishly.

@leezu
Contributor

leezu commented Feb 26, 2020

@cjolivier01 the PR consists of two commits. Only the second commit is related to omp and implements the conclusion from the discussion in this issue.

I have removed the second commit and disabled testing the MKL cmake builds. I look forward to your improved fix, thanks for contributing that.

leezu added a commit that referenced this issue Feb 28, 2020
The following Makefile based builds are preserved
1) staticbuild scripts
2) Docs builds. Language binding specific build logic requires further changes
3) Jetson build. Jetpack 3.3 toolchain based on Cuda 9.0 causes 'Internal
   Compiler Error (codegen): "there was an error in verifying the lgenfe
   output!"' errors with cmake. This seems to be a known issue in Cuda 9.0 and
   we need to update Jetpack toolchain to work around it.
4) MKL builds. Waiting for fix of #17641

All Makefile based builds are marked with a "Makefile" postfix in the title.

Improvements to CMake build
- Enable -Werror for RelWithDebugInfo build in analogy to "make DEV=1" build
- Add USE_LIBJPEG_TURBO to CMake build
- Improve finding Python 3 executable

Changes to CI setup
- Install protobuf and zmq where missing
- Install up-to-date CMake on Centos 7
- Don't use RelWithDebInfo on Android builds, as gcc 4.9 throws
  -Wdelete-non-virtual-dtor

Code changes
- Disable warnings introduced by GCC7 via #pragma GCC diagnostic
@leezu
Contributor

leezu commented Mar 2, 2020

@cjolivier01 thanks for volunteering to contribute the PR! Do you have any status update?

@leezu
Contributor

leezu commented Mar 3, 2020

@cjolivier01 please prioritize the PR, as this affects other users. For example #17733

Let me know if I may resubmit the MKL static linkage commit earlier included in #17645.

@cjolivier01
Member

cjolivier01 commented Mar 3, 2020 via email

leezu added a commit to leezu/mxnet that referenced this issue Mar 3, 2020
leezu added a commit that referenced this issue Mar 4, 2020
* Fix MKL static link & default to static link on unix

Fixes #17641

* Test cmake MKL build on CI
MoisesHer pushed a commit to MoisesHer/incubator-mxnet that referenced this issue Apr 10, 2020
MoisesHer pushed a commit to MoisesHer/incubator-mxnet that referenced this issue Apr 10, 2020
anirudh2290 pushed a commit to anirudh2290/mxnet that referenced this issue May 29, 2020
anirudh2290 pushed a commit to anirudh2290/mxnet that referenced this issue May 29, 2020