This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Unable to cross compile for aarch64 cuda with USE_CPP_PACKAGE=ON #20222

Closed
cyrusbehr opened this issue Apr 27, 2021 · 3 comments

Comments

@cyrusbehr

cyrusbehr commented Apr 27, 2021

This issue builds on a previous issue here.

My goal is to cross compile on an x86 host for an aarch64 target, and build mxnet with CUDA and C++ package support.

Following the suggestion in that previous issue, I am compiling the library inside the image built from the mxnet Dockerfile.build.jetson, which can be found here.

When I use the build commands in the linked issue, the compilation succeeds:

    cmake \
        -DCMAKE_TOOLCHAIN_FILE=${CMAKE_TOOLCHAIN_FILE} \
        -DUSE_CUDA=ON \
        -DMXNET_CUDA_ARCH="5.2" \
        -DUSE_OPENCV=OFF \
        -DUSE_OPENMP=ON \
        -DUSE_LAPACK=OFF \
        -DUSE_BLAS=Open \
        -DCMAKE_BUILD_TYPE=Release \
        -G Ninja /work/mxnet

However, I need to build the CPP package.
I am therefore using the following cmake command:

    cmake \
        -DCMAKE_TOOLCHAIN_FILE=${CMAKE_TOOLCHAIN_FILE} \
        -DUSE_OPENMP=ON \
        -DUSE_BLAS=Open \
        -DUSE_CUDA=ON \
        -DUSE_CUDNN=ON \
        -DMXNET_CUDA_ARCH="5.3;6.2;7.2" \
        -DENABLE_CUDA_RTC=OFF \
        -DCMAKE_BUILD_TYPE=Release \
        -DUSE_F16C=OFF \
        -GNinja \
        -DUSE_LAPACK=OFF \
        -DUSE_JEMALLOC=OFF \
        -DUSE_CPP_PACKAGE=ON \
        -DUSE_SIGNAL_HANDLER=OFF \
        -DUSE_OPENCV=OFF \
        -DUSE_MKL_IF_AVAILABLE=OFF \
        -DUSE_MKLDNN=OFF \
        -DBUILD_CPP_EXAMPLES=OFF \
        -DCMAKE_INSTALL_PREFIX=./packaged \
        ..

The build gets to step 579/601 then fails with the following message:

[579/601] cd /mxnet/cpp-package/scripts && echo Running:\ OpWrapperGenerator.py && python OpWrapperGenerator.py /mxnet/build_aarch64_cuda/libmxnet.so
FAILED: cpp-package/CMakeFiles/cpp_package_op_h ../cpp-package/include/mxnet-cpp/op.h cpp-package/MAIN_DEPENDENCY cpp-package/mxnet 
cd /mxnet/cpp-package/scripts && echo Running:\ OpWrapperGenerator.py && python OpWrapperGenerator.py /mxnet/build_aarch64_cuda/libmxnet.so
Running: OpWrapperGenerator.py
Traceback (most recent call last):
  File "OpWrapperGenerator.py", line 433, in <module>
    raise(e)
OSError: /mxnet/build_aarch64_cuda/libmxnet.so: cannot open shared object file: No such file or directory
ninja: build stopped: subcommand failed.

Note, if I run ls -la on the build directory, it shows that libmxnet.so is indeed there:

root@aa10ef4d3442:/mxnet/build_aarch64_cuda# ls -la
total 1348340
drwxr-xr-x 10 root root      4096 Apr 27 01:29 .
drwxr-xr-x 31 root root      4096 Apr 27 00:54 ..
-rw-r--r--  1 root root   1760312 Apr 27 01:29 .ninja_deps
-rw-r--r--  1 root root     62173 Apr 27 01:29 .ninja_log
drwxr-xr-x  5 root root      4096 Apr 27 00:54 3rdparty
-rw-r--r--  1 root root     43678 Apr 27 00:54 CMakeCache.txt
drwxr-xr-x 14 root root      4096 Apr 27 00:54 CMakeFiles
-rw-r--r--  1 root root       384 Apr 27 00:54 CTestTestfile.cmake
-rw-r--r--  1 root root      2677 Apr 27 00:54 DartConfiguration.tcl
drwxr-xr-x  3 root root      4096 Apr 27 00:54 Testing
drwxr-xr-x  2 root root      4096 Apr 27 00:54 bin
-rw-r--r--  1 root root    925363 Apr 27 00:54 build.ninja
drwxr-xr-x  2 root root      4096 Apr 27 00:54 cmake
-rw-r--r--  1 root root      4581 Apr 27 00:54 cmake_install.cmake
drwxr-xr-x  3 root root      4096 Apr 27 00:54 cpp-package
-rw-r--r--  1 root root         0 Apr 27 00:54 dummy.c
drwxr-xr-x  2 root root      4096 Apr 27 01:24 lib
-rwxr-xr-x  1 root root    918400 Apr 27 01:23 libcustomop_gpu_lib.so
-rwxr-xr-x  1 root root    229752 Apr 27 00:54 libcustomop_lib.so
-rw-r--r--  1 root root 919883526 Apr 27 01:29 libmxnet.a
-rwxr-xr-x  1 root root 455852192 Apr 27 01:29 libmxnet.so
-rwxr-xr-x  1 root root    207504 Apr 27 00:54 libpass_lib.so
-rwxr-xr-x  1 root root    257408 Apr 27 00:54 libsubgraph_lib.so
-rwxr-xr-x  1 root root    233936 Apr 27 01:24 libtransposecsr_lib.so
-rwxr-xr-x  1 root root    234272 Apr 27 00:54 libtransposerowsp_lib.so
drwxr-xr-x  3 root root      4096 Apr 27 00:54 tests
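
The listing shows the file is on disk, but that alone does not guarantee that the loader on the build host will accept it: a shared library can exist and still fail to load, for example when it targets a different architecture or one of its dependencies is missing. Below is a minimal sketch of a check that separates the two cases. It assumes Python's standard `ctypes` module, which goes through `dlopen` on Linux. The helper name `probe_shared_library` is hypothetical and not part of MXNet.

```python
import ctypes
import os


def probe_shared_library(path):
    """Report whether `path` exists and whether this host's loader accepts it.

    Hypothetical helper for illustration; not part of MXNet.
    """
    exists = os.path.isfile(path)
    try:
        # On Linux this goes through dlopen(); it raises OSError on failure.
        ctypes.CDLL(path)
        loadable, error = True, None
    except OSError as exc:
        loadable, error = False, str(exc)
    return {"exists": exists, "loadable": loadable, "error": error}
```

If this reports that the file exists but is not loadable, the problem lies with the loader rather than with a missing file. Note that a successful `ctypes.CDLL` call really loads the library into the current process, so this is best run in a throwaway interpreter.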

This is using version 1.8.0 of mxnet. This looks like a bug to me. Thoughts?

@leezu perhaps you can comment, as you were very helpful with the previous issue.

EDIT
I tried building without CUDA and that also failed. The issue has to do with enabling the cpp package when cross compiling.

@barry-jin
Contributor

Hi @cyrusbehr, this cross-compilation issue with the cpp-package has existed since #13303, because OpWrapperGenerator.py tries to dlopen libmxnet.so (built for the aarch64 target) on the x86 host. A workaround is to compile twice: first build for an x86 target and let OpWrapperGenerator.py generate the op.h that the cpp-package requires. Then comment out https://github.com/apache/incubator-mxnet/blob/2d9c3b5e0a12bcafeb2f23eb25f5ef40fd7977b9/cpp-package/CMakeLists.txt#L15
and cross-compile for the aarch64 target.
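
One way to confirm this diagnosis is to compare the architecture recorded in the library's ELF header with that of the host. The sketch below is an illustration only: it reads the standard `e_machine` field at offset 18 of the ELF header. `elf_machine` and `built_for_host` are hypothetical helper names, and the comparison assumes a Linux host where `platform.machine()` returns names such as `x86_64` or `aarch64`.

```python
import platform
import struct

# Subset of e_machine values defined by the ELF specification.
ELF_MACHINES = {3: "x86", 40: "arm", 62: "x86_64", 183: "aarch64"}


def elf_machine(path):
    """Return the target architecture recorded in an ELF file's header."""
    with open(path, "rb") as f:
        header = f.read(20)
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError(f"{path} is not an ELF file")
    # EI_DATA (byte 5) gives the byte order: 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown ({machine})")


def built_for_host(path):
    """True if the library's architecture matches the machine running Python.

    Assumes platform.machine() uses the same names as ELF_MACHINES (Linux).
    """
    return elf_machine(path) == platform.machine()
```

For the build in this issue, one would expect `elf_machine("/mxnet/build_aarch64_cuda/libmxnet.so")` to report `aarch64` while the x86 host reports `x86_64`, which is consistent with the generator being unable to load the library.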

@cyrusbehr
Author

Ok sounds good, let me give that a shot.

@cyrusbehr
Author

This did the trick, thanks.
