Install error when compile the lib #214
Comments
I think I have found the problem. It was caused by the wrong CUDA version, since my server has multiple CUDA versions installed. After I changed the CUDA path in .bashrc, apex compiled successfully.
I'm currently adding logic to setup.py that will print a warning if the version of CUDA used to compile the extensions differs from the version of CUDA that was used to compile the PyTorch binaries present on your system, which should help catch cases like this.
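As a rough illustration of the kind of check described above (not the actual code in apex's setup.py), the sketch below parses the release line printed by `nvcc -V` and compares it with a PyTorch CUDA version string such as the one exposed by `torch.version.cuda`. The function names are hypothetical, and comparing only the major and minor numbers is an assumption.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Return (major, minor) from `nvcc -V` text, or None if no release line is found."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def parse_torch_cuda(version_string):
    """Return (major, minor) from a string like '10.0.130', or None for CPU-only builds."""
    if not version_string:
        return None
    parts = version_string.split(".")
    return int(parts[0]), int(parts[1])


def cuda_versions_match(nvcc_output, torch_cuda_version):
    """True only if both versions could be parsed and agree on major.minor."""
    nvcc_release = parse_nvcc_release(nvcc_output)
    torch_release = parse_torch_cuda(torch_cuda_version)
    if nvcc_release is None or torch_release is None:
        return False
    return nvcc_release == torch_release


def warn_on_mismatch(nvcc_output, torch_cuda_version):
    """Print a warning (hypothetical wording) when the two versions disagree."""
    if not cuda_versions_match(nvcc_output, torch_cuda_version):
        print(
            "Warning: nvcc reports CUDA "
            f"{parse_nvcc_release(nvcc_output)} but PyTorch was built with "
            f"{torch_cuda_version}; extensions may fail to build or load."
        )
```

On a real system the first argument would typically come from `subprocess.check_output(["nvcc", "-V"], text=True)` and the second from `torch.version.cuda`; both are kept as plain parameters here so the comparison can be exercised without a GPU toolchain.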
Hi, I probably have the same problem as you...
Installing collected packages: apex
Running setup.py install for apex ... error
Complete output from command /opt/.miniconda/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-req-build-b0pvvy97/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" --cpp_ext --cuda_ext install --record /tmp/pip-record-4os3snlg/install-record.txt --single-version-externally-managed --compile:
torch.__version__ = 1.0.1.post2
Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
from /usr/local/cuda/bin
Pytorch binaries were compiled with Cuda 10.0.130
running install
running build
running build_py
copying apex/__init__.py -> build/lib.linux-x86_64-3.7/apex
copying apex/parallel/sync_batchnorm.py -> build/lib.linux-x86_64-3.7/apex/parallel
copying apex/parallel/__init__.py -> build/lib.linux-x86_64-3.7/apex/parallel
copying apex/parallel/optimized_sync_batchnorm_kernel.py -> build/lib.linux-x86_64-3.7/apex/parallel
copying apex/parallel/optimized_sync_batchnorm.py -> build/lib.linux-x86_64-3.7/apex/parallel
copying apex/parallel/distributed.py -> build/lib.linux-x86_64-3.7/apex/parallel
copying apex/amp/_amp_state.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/handle.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/frontend.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/__init__.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/scaler.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/utils.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/wrap.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/rnn_compat.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/_initialize.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/amp.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/opt.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/fp16_utils/__init__.py -> build/lib.linux-x86_64-3.7/apex/fp16_utils
copying apex/fp16_utils/fp16util.py -> build/lib.linux-x86_64-3.7/apex/fp16_utils
copying apex/fp16_utils/fp16_optimizer.py -> build/lib.linux-x86_64-3.7/apex/fp16_utils
creating build/lib.linux-x86_64-3.7/apex/multi_tensor_apply
copying apex/multi_tensor_apply/__init__.py -> build/lib.linux-x86_64-3.7/apex/multi_tensor_apply
copying apex/multi_tensor_apply/multi_tensor_apply.py -> build/lib.linux-x86_64-3.7/apex/multi_tensor_apply
copying apex/normalization/fused_layer_norm.py -> build/lib.linux-x86_64-3.7/apex/normalization
copying apex/optimizers/fused_adam.py -> build/lib.linux-x86_64-3.7/apex/optimizers
copying apex/optimizers/fp16_optimizer.py -> build/lib.linux-x86_64-3.7/apex/optimizers
copying apex/amp/lists/torch_overrides.py -> build/lib.linux-x86_64-3.7/apex/amp/lists
running build_ext
building 'amp_C' extension
gcc -pthread -B /opt/.miniconda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/torch/csrc/api/include -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/TH -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/.miniconda/include/python3.7m -c csrc/amp_C_frontend.cpp -o build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o -O3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/local/cuda/bin/nvcc -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/torch/csrc/api/include -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/TH -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/.miniconda/include/python3.7m -c csrc/multi_tensor_scale_kernel.cu -o build/temp.linux-x86_64-3.7/csrc/multi_tensor_scale_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
/usr/local/cuda/bin/nvcc -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/torch/csrc/api/include -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/TH -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/.miniconda/include/python3.7m -c csrc/multi_tensor_axpby_kernel.cu -o build/temp.linux-x86_64-3.7/csrc/multi_tensor_axpby_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
g++ -pthread -shared -B /opt/.miniconda/compiler_compat -L/opt/.miniconda/lib -Wl,-rpath=/opt/.miniconda/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o build/temp.linux-x86_64-3.7/csrc/multi_tensor_scale_kernel.o build/temp.linux-x86_64-3.7/csrc/multi_tensor_axpby_kernel.o -L/usr/local/cuda/lib64 -lcudart -o build/lib.linux-x86_64-3.7/amp_C.cpython-37m-x86_64-linux-gnu.so
/opt/.miniconda/compiler_compat/ld: build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o: unable to initialize decompress status for section .debug_info
/opt/.miniconda/compiler_compat/ld: build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o: unable to initialize decompress status for section .debug_info
/opt/.miniconda/compiler_compat/ld: build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o: unable to initialize decompress status for section .debug_info
/opt/.miniconda/compiler_compat/ld: build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o: unable to initialize decompress status for section .debug_info
build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o: file not recognized: file format not recognized
collect2: error: ld returned 1 exit status
error: command 'g++' failed with exit status 1
----------------------------------------
Command "/opt/.miniconda/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-req-build-b0pvvy97/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" --cpp_ext --cuda_ext install --record /tmp/pip-record-4os3snlg/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-req-build-b0pvvy97/
Update: Without using It works with AMP but warns as
@moskomule I think installing apex with --cuda_ext --cpp_ext is necessary. I guess this problem is related to your CUDA setup. In my case, I first checked my path using "nvcc -V"; it reported CUDA 9.0, but I found that the link in ~/.bashrc was invalid. Maybe you should check this.
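A minimal sketch of the check suggested above, assuming the CUDA location is a path such as `/usr/local/cuda` that may be a symlink: it reports which `nvcc` is first on `PATH` and whether a given CUDA path actually resolves. The helper names are hypothetical.

```python
import os
import shutil


def find_nvcc():
    """Return the full path of the first `nvcc` on PATH, or None if there is none."""
    return shutil.which("nvcc")


def inspect_cuda_link(path):
    """Describe a CUDA install path; a dangling symlink yields exists=False."""
    return {
        "path": path,
        "is_symlink": os.path.islink(path),
        "exists": os.path.exists(path),      # follows symlinks
        "resolved": os.path.realpath(path),  # final target, even if it is missing
    }


if __name__ == "__main__":
    print("nvcc on PATH:", find_nvcc())
    # /usr/local/cuda is the location seen in the build log; adjust as needed.
    print(inspect_cuda_link(os.environ.get("CUDA_HOME", "/usr/local/cuda")))
```

If `is_symlink` is true while `exists` is false, the link points at a CUDA directory that is no longer there, which matches the invalid-link situation described in the comment.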
Thanks, in case of the warning above, I used
@moskomule You should make sure to use the pip install command
instead of
Also, before reinstalling Apex, you need to make sure any old conflicting installs are removed, and if you installed using the direct setup.py command, you also need to make sure stale
Thank you. So far, I have found that it fails to build on Ubuntu 18.04 but succeeds on Ubuntu 16.04.
I have run into this problem
@DangerousY Could you please post the complete stack trace so that we could have a look?
Hi, I have executed the above command, but the program reports the following error.
@chccgiven This error is usually thrown if you forget the folder location at the end of the
@ptrblck Good afternoon! I am trying to install apex but always get this error:
error
Do you know what the issue may be? Thanks in advance!
Hi, when I try to build the newest version of apex, I get an error like the following:
" python -u -c "import setuptools, tokenize;file='/tmp/pip-req-build-hq7t6roo/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" --cpp_ext --cuda_ext install --record /tmp/pip-record-ic8t29gs/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-req-build-hq7t6roo/ "
I made sure to follow the readme.md, but the error persists. Can you give me some suggestions on how to handle it? Thank you very much!