amp_C.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNSt15__exception_ptr13exception_ptr9_M_addrefEv #1335

Closed
stas00 opened this issue Mar 17, 2022 · 3 comments
Labels
bug Something isn't working

Comments


stas00 commented Mar 17, 2022

Describe the Bug

amp_C fails to load when apex is built against pytorch-1.10.2

python -c "import torch; import amp_C"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: /home/stas/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/amp_C.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNSt15__exception_ptr13exception_ptr9_M_addrefEv

It works just fine if I build against pytorch-1.11.0

Also I have just moved from ubuntu 20.04 to 21.10, and I had no problem using apex built against pt-1.10 until now.

Minimal Steps/Code to Reproduce the Bug

Install:

conda install pytorch==1.10.2 torchtext torchaudio cudatoolkit=11.3 -c pytorch
git clone https://github.com/NVIDIA/apex/
cd apex
rm -rf build
MAX_JOBS=12 TORCH_CUDA_ARCH_LIST="6.1;8.6" pip install --global-option="--cpp_ext" --global-option="--cuda_ext" --no-cache -v --disable-pip-version-check . 

Diagnosis:

LD_LIBRARY_PATH=/mnt/nvme0/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/torch/lib/ ldd /home/stas/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/amp_C.cpython-38-x86_64-linux-gnu.so
        linux-vdso.so.1 (0x00007ffebf1f9000)
        libgtk3-nocsd.so.0 => /lib/x86_64-linux-gnu/libgtk3-nocsd.so.0 (0x00007ff38e806000)
        libc10.so => /mnt/nvme0/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/torch/lib/libc10.so (0x00007ff38e586000)
        libtorch.so => /mnt/nvme0/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/torch/lib/libtorch.so (0x00007ff38e384000)
        libtorch_cpu.so => /mnt/nvme0/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so (0x00007ff382d67000)
        libtorch_python.so => /mnt/nvme0/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/torch/lib/libtorch_python.so (0x00007ff381874000)
        libcudart.so.11.0 => /home/stas/anaconda3/envs/py38-pt110/lib/libcudart.so.11.0 (0x00007ff3815d7000)
        libc10_cuda.so => /mnt/nvme0/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/torch/lib/libc10_cuda.so (0x00007ff381388000)
        libtorch_cuda_cu.so => /mnt/nvme0/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cu.so (0x00007ff33c0cd000)
        libtorch_cuda_cpp.so => /mnt/nvme0/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so (0x00007ff2f41cc000)
        libstdc++.so.6 => /home/stas/anaconda3/envs/py38-pt110/lib/libstdc++.so.6 (0x00007ff2f4055000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff2f3f71000)
        libgcc_s.so.1 => /home/stas/anaconda3/envs/py38-pt110/lib/libgcc_s.so.1 (0x00007ff2f3f5d000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff2f3d35000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff2f3d30000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff2f3d2b000)
        libgomp.so.1 => /mnt/nvme0/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/torch/lib/../../../../libgomp.so.1 (0x00007ff2f3cfc000)
        /lib64/ld-linux-x86-64.so.2 (0x00007ff38ee0b000)
        libtorch_cuda.so => /mnt/nvme0/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so (0x00007ff2f3ae8000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ff2f3ae3000)
        libmkl_intel_lp64.so => /mnt/nvme0/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/torch/lib/../../../../libmkl_intel_lp64.so (0x00007ff2f2f44000)
        libmkl_gnu_thread.so => /mnt/nvme0/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/torch/lib/../../../../libmkl_gnu_thread.so (0x00007ff2f13b9000)
        libmkl_core.so => /mnt/nvme0/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/torch/lib/../../../../libmkl_core.so (0x00007ff2ecf49000)
        libshm.so => /mnt/nvme0/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/torch/lib/libshm.so (0x00007ff2ecd41000)
        libnvToolsExt.so.1 => /mnt/nvme0/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/torch/lib/../../../../libnvToolsExt.so.1 (0x00007ff2ecb37000)
        libcusparse.so.11 => /mnt/nvme0/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/torch/lib/../../../../libcusparse.so.11 (0x00007ff2de84b000)
        libcurand.so.10 => /mnt/nvme0/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/torch/lib/../../../../libcurand.so.10 (0x00007ff2d90bc000)
        libcusolver.so.11 => /mnt/nvme0/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/torch/lib/../../../../libcusolver.so.11 (0x00007ff2cc1f5000)
        libcublas.so.11 => /mnt/nvme0/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/torch/lib/../../../../libcublas.so.11 (0x00007ff2c4bb2000)
        libcufft.so.10 => /mnt/nvme0/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/torch/lib/../../../../libcufft.so.10 (0x00007ff2b93b7000)
        libcublasLt.so.11 => /mnt/nvme0/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/torch/lib/../../../../libcublasLt.so.11 (0x00007ff2a89a5000)
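
In the `ldd` output above, `libstdc++.so.6` resolves to the copy inside the conda environment, not to a system copy under `/lib/x86_64-linux-gnu`. A small filter makes that easier to spot in a long listing. This is only a sketch: `resolved_lib` is a hypothetical helper name, and it assumes the usual `name => path (address)` layout of `ldd` lines.

```shell
# Print the path that ldd resolved for a given soname.
# Reads ldd output on stdin; lines without "=>" (e.g. linux-vdso) are skipped.
resolved_lib() {
    awk -v name="$1" '$1 == name && $2 == "=>" { print $3 }'
}

# Usage (path shortened here, substitute the real location of the extension):
#   ldd /path/to/amp_C.cpython-38-x86_64-linux-gnu.so | resolved_lib libstdc++.so.6
```

If the printed path points into the conda environment, the C++ runtime loaded at import time may be older than the one shipped with the compiler that built the extension.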

Missing symbol:

nm /home/stas/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/amp_C.cpython-38-x86_64-linux-gnu.so | grep _ZNSt15__exception_ptr13exception_ptr9_M_addrefEv
                 U _ZNSt15__exception_ptr13exception_ptr9_M_addrefEv
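
The `U` in the `nm` output marks the symbol as undefined in `amp_C`, so some other library has to provide it at load time. The sketch below lists all undefined symbols from `nm` output; `undefined_symbols` is a hypothetical helper name, and it assumes the default `nm` layout where undefined entries have no address column.

```shell
# Print the names of undefined symbols from nm output read on stdin.
# Undefined entries look like "                 U name", so the type is field 1.
undefined_symbols() {
    awk '$1 == "U" { print $2 }'
}

# Usage (path shortened here, substitute the real location of the extension):
#   nm /path/to/amp_C.cpython-38-x86_64-linux-gnu.so | undefined_symbols | c++filt
```

Piped through `c++filt`, the symbol from this report should demangle to `std::__exception_ptr::exception_ptr::_M_addref()`. To see whether a particular `libstdc++.so.6` exports it, something like `nm -D --defined-only /path/to/libstdc++.so.6 | grep _M_addref` can be run against both the conda copy and the system copy.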

I have rebuilt it about 5 times to make sure but something is still missing.

Thank you!

Environment

Collecting environment information...
PyTorch version: 1.10.2
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 21.10 (x86_64)
GCC version: (Ubuntu 11.2.0-7ubuntu2) 11.2.0
Clang version: 13.0.0-2
CMake version: version 3.21.3
Libc version: glibc-2.34

Python version: 3.8.11 (default, Aug  3 2021, 15:09:35)  [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.13.0-35-generic-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: 11.5.119
GPU models and configuration: 
GPU 0: NVIDIA GeForce GTX 1070 Ti
GPU 1: NVIDIA GeForce RTX 3090

Nvidia driver version: 510.47.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.3.2
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.2
[pip3] torch==1.10.2
[pip3] torch-scatter==2.0.9
[pip3] torchaudio==0.10.2
[pip3] torchtext==0.11.2
[pip3] torchvision==0.11.3
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               11.3.1               h2bc3f7f_2
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] mkl                       2021.4.0           h06a4308_640
[conda] mkl-service               2.4.0            py38h7f8727e_0
[conda] mkl_fft                   1.3.1            py38hd3c417c_0
[conda] mkl_random                1.2.2            py38h51133e4_0
[conda] mypy-extensions           0.4.3                    pypi_0    pypi
[conda] numpy                     1.20.3                   pypi_0    pypi
[conda] numpy-base                1.21.2           py38h79a1101_0
[conda] pytorch                   1.10.2          py3.8_cuda11.3_cudnn8.2.0_0    pytorch
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] torch-scatter             2.0.9                    pypi_0    pypi
[conda] torchaudio                0.10.2               py38_cu113    pytorch
[conda] torchtext                 0.11.2                     py38    pytorch
[conda] torchvision               0.11.3               py38_cu113    pytorch

ptrblck commented Mar 18, 2022

I cannot reproduce the issue on Ubuntu 20.04 with CUDA 11.3 (the same runtime as used for the conda binaries) using the provided commands.

Also I have just moved from ubuntu 20.04 to 21.10

Since Ubuntu 21.10 ships with GCC 11, you might be hitting issues such as this one.
Based on your setup information you are using CUDA 11.5.1, which supports Ubuntu 18.04 and 20.04 as described here, so I would stick to the supported OS releases.


stas00 commented Mar 18, 2022

Yes, I didn't have that issue with Ubuntu 20.04. Thank you very much for identifying the underlying issue, @ptrblck!

And yes, I installed CUDA on Ubuntu 21.10 from the official 20.04 .deb files, since those are the newest .deb files available.

But surprisingly it works just fine with pt-1.11, so it's OK. I only needed pt-1.10 to reproduce an issue someone else reported, and that is the one thing I can't do right now.

Ubuntu 22.04 should be out in a month or so, so I guess I can re-test once NVIDIA releases cuda .deb files for it.

So I will close this for now.

stas00 closed this as completed Mar 18, 2022

stas00 commented Mar 18, 2022

So no, my pt-1.11 apex build only happened to work because I built it before the upgrade to Ubuntu 21.10. I have now tried to rebuild it, and it fails the same way as in the OP.

So here is the proper solution for those on Ubuntu 21.10, which comes with gcc/g++-11.

You need to install gcc/g++-10 and switch to this version:

# install both compiler versions side by side
sudo apt -y install gcc-10 g++-10
sudo apt -y install gcc-11 g++-11
# register both versions as alternatives (the trailing number is the priority)
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-10 10
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-10 10
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-11 11
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-11 11
# now choose the default gcc / g++ to be version 10 when prompted
sudo update-alternatives --config gcc
sudo update-alternatives --config g++
# test
gcc --version
g++ --version

Derived from this guide
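
Since the failure only shows up at import time, after a long build, it may help to check the active compiler before rebuilding apex. The following is a minimal sketch, not part of apex: `gcc_major_at_most` is a hypothetical helper, and the limit of 10 simply mirrors the workaround above.

```shell
# Succeed only if the GCC version string in $2 has a major version <= $1.
# Works for both "11" and "11.2.0" style strings.
gcc_major_at_most() {
    [ "${2%%.*}" -le "$1" ]
}

# Usage, before running the pip install command for apex:
#   gcc_major_at_most 10 "$(gcc -dumpversion)" || echo "default gcc is newer than 10"
```

`sudo update-alternatives --set gcc /usr/bin/gcc-10` (and the same for `g++`) selects the version without the interactive prompt, which is handy in scripts.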
