Install error when compile the lib #214

Closed
Lausannen opened this issue Mar 21, 2019 · 13 comments

@Lausannen

Hi, when I try to build the newest version of apex, I get the following error:
"python -u -c "import setuptools, tokenize;__file__='/tmp/pip-req-build-hq7t6roo/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" --cpp_ext --cuda_ext install --record /tmp/pip-record-ic8t29gs/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-req-build-hq7t6roo/
I made sure to follow the README.md, but the error persists. Could you give me some suggestions on how to handle it? Thank you very much!

@Lausannen
Author

I think I have found the problem. It was caused by the wrong CUDA version being used, since my server has multiple CUDA versions installed. After I changed the CUDA path in .bashrc, apex compiled successfully.
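
For anyone else on a machine with several CUDA toolkits, here is a minimal sketch of how one might list every nvcc visible on PATH, in lookup order, to see which toolkit a build is likely to pick up first. The helper name find_nvcc_candidates is hypothetical and not part of apex or PyTorch, and it only inspects PATH. As far as I know, PyTorch's extension builder also honors the CUDA_HOME environment variable, so treat this as a first diagnostic rather than the full story.

```python
import os


def find_nvcc_candidates(path_env=None):
    """Return every executable named nvcc found on PATH, in lookup order.

    Hypothetical helper for illustration; not part of apex or PyTorch.
    """
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    candidates = []
    for directory in path_env.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        nvcc = os.path.join(directory, "nvcc")
        if os.path.isfile(nvcc) and os.access(nvcc, os.X_OK):
            candidates.append(nvcc)
    return candidates


if __name__ == "__main__":
    # The first entry is the nvcc that a plain `nvcc -V` would run.
    for position, nvcc in enumerate(find_nvcc_candidates()):
        print(position, nvcc)
```

If the first entry is not the toolkit you expect, reordering PATH (for example in .bashrc, as described above) is the kind of change that fixed it in my case.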

@mcarilli
Contributor

I'm currently adding logic to the setup.py that will print a warning if the version of Cuda that's being used to compile the extensions is different from the version of Cuda that was used to compile the Pytorch binaries present on your system, which should help catch cases like this.
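
To illustrate the idea, here is a rough sketch of such a check. It is not the actual setup.py logic, and the function names nvcc_release and cuda_versions_match are hypothetical. It assumes the toolkit version can be read from the "release X.Y" text that nvcc -V prints, and that the PyTorch side is a version string such as the value of torch.version.cuda.

```python
import re


def nvcc_release(nvcc_output):
    """Extract (major, minor) from the text printed by `nvcc -V`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return match.group(1), match.group(2)


def cuda_versions_match(nvcc_output, torch_cuda_version):
    """Compare major.minor of the nvcc toolkit with PyTorch's CUDA version.

    torch_cuda_version is a string such as "10.0.130" or "10.2",
    for example the value of torch.version.cuda.
    """
    torch_major_minor = tuple(torch_cuda_version.split(".")[:2])
    return nvcc_release(nvcc_output) == torch_major_minor


if __name__ == "__main__":
    # Sample line in the format nvcc prints; hard-coded for illustration.
    sample = "Cuda compilation tools, release 10.0, V10.0.130"
    if not cuda_versions_match(sample, "10.0.130"):
        print("Warning: nvcc and PyTorch use different CUDA versions")
```

Comparing only major.minor is a design choice in this sketch; patch-level differences are ignored.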

@moskomule

moskomule commented Mar 25, 2019

Hi, I probably have the same problem as you:

...
Installing collected packages: apex
  Running setup.py install for apex ... error
    Complete output from command /opt/.miniconda/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-req-build-b0pvvy97/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" --cpp_ext --cuda_ext install --record /tmp/pip-record-4os3snlg/install-record.txt --single-version-externally-managed --compile:
    torch.__version__  =  1.0.1.post2

    Compiling cuda extensions with
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2018 NVIDIA Corporation
    Built on Sat_Aug_25_21:08:01_CDT_2018
    Cuda compilation tools, release 10.0, V10.0.130
    from /usr/local/cuda/bin

    Pytorch binaries were compiled with Cuda 10.0.130

    running install
    running build
    running build_py
    copying apex/__init__.py -> build/lib.linux-x86_64-3.7/apex
    copying apex/parallel/sync_batchnorm.py -> build/lib.linux-x86_64-3.7/apex/parallel
    copying apex/parallel/__init__.py -> build/lib.linux-x86_64-3.7/apex/parallel
    copying apex/parallel/optimized_sync_batchnorm_kernel.py -> build/lib.linux-x86_64-3.7/apex/parallel
    copying apex/parallel/optimized_sync_batchnorm.py -> build/lib.linux-x86_64-3.7/apex/parallel
    copying apex/parallel/distributed.py -> build/lib.linux-x86_64-3.7/apex/parallel
    copying apex/amp/_amp_state.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/amp/handle.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/amp/frontend.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/amp/__init__.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/amp/scaler.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/amp/utils.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/amp/wrap.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/amp/rnn_compat.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/amp/_initialize.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/amp/amp.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/amp/opt.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/fp16_utils/__init__.py -> build/lib.linux-x86_64-3.7/apex/fp16_utils
    copying apex/fp16_utils/fp16util.py -> build/lib.linux-x86_64-3.7/apex/fp16_utils
    copying apex/fp16_utils/fp16_optimizer.py -> build/lib.linux-x86_64-3.7/apex/fp16_utils
    creating build/lib.linux-x86_64-3.7/apex/multi_tensor_apply
    copying apex/multi_tensor_apply/__init__.py -> build/lib.linux-x86_64-3.7/apex/multi_tensor_apply
    copying apex/multi_tensor_apply/multi_tensor_apply.py -> build/lib.linux-x86_64-3.7/apex/multi_tensor_apply
    copying apex/normalization/fused_layer_norm.py -> build/lib.linux-x86_64-3.7/apex/normalization
    copying apex/optimizers/fused_adam.py -> build/lib.linux-x86_64-3.7/apex/optimizers
    copying apex/optimizers/fp16_optimizer.py -> build/lib.linux-x86_64-3.7/apex/optimizers
    copying apex/amp/lists/torch_overrides.py -> build/lib.linux-x86_64-3.7/apex/amp/lists
    running build_ext
    building 'amp_C' extension
    gcc -pthread -B /opt/.miniconda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/torch/csrc/api/include -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/TH -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/.miniconda/include/python3.7m -c csrc/amp_C_frontend.cpp -o build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o -O3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    /usr/local/cuda/bin/nvcc -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/torch/csrc/api/include -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/TH -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/.miniconda/include/python3.7m -c csrc/multi_tensor_scale_kernel.cu -o build/temp.linux-x86_64-3.7/csrc/multi_tensor_scale_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
    /usr/local/cuda/bin/nvcc -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/torch/csrc/api/include -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/TH -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/.miniconda/include/python3.7m -c csrc/multi_tensor_axpby_kernel.cu -o build/temp.linux-x86_64-3.7/csrc/multi_tensor_axpby_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
    g++ -pthread -shared -B /opt/.miniconda/compiler_compat -L/opt/.miniconda/lib -Wl,-rpath=/opt/.miniconda/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o build/temp.linux-x86_64-3.7/csrc/multi_tensor_scale_kernel.o build/temp.linux-x86_64-3.7/csrc/multi_tensor_axpby_kernel.o -L/usr/local/cuda/lib64 -lcudart -o build/lib.linux-x86_64-3.7/amp_C.cpython-37m-x86_64-linux-gnu.so
    /opt/.miniconda/compiler_compat/ld: build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o: unable to initialize decompress status for section .debug_info
    /opt/.miniconda/compiler_compat/ld: build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o: unable to initialize decompress status for section .debug_info
    /opt/.miniconda/compiler_compat/ld: build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o: unable to initialize decompress status for section .debug_info
    /opt/.miniconda/compiler_compat/ld: build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o: unable to initialize decompress status for section .debug_info
    build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o: file not recognized: file format not recognized
    collect2: error: ld returned 1 exit status
    error: command 'g++' failed with exit status 1

    ----------------------------------------
Command "/opt/.miniconda/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-req-build-b0pvvy97/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" --cpp_ext --cuda_ext install --record /tmp/pip-record-4os3snlg/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-req-build-b0pvvy97/

@moskomule

Update: I was able to install apex when not running inside tmux.

AMP works, but it prints the following warning: Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ImportError('/opt/.miniconda/lib/python3.7/site-packages/amp_C.cpython-37m-x86_64-linux-gnu.so: undefined symbol: __cudaPopCallConfiguration').

@Lausannen
Author

@moskomule I think installing apex with --cuda_ext --cpp_ext is necessary, and I guess this problem is related to your CUDA setup. In my case, I first checked my toolkit with "nvcc -V", which reported CUDA 9.0, but then I found that the CUDA link in ~/.bashrc was invalid. Maybe you should check this.

@moskomule

moskomule commented Mar 25, 2019

Thanks. For the warning above, I did install with --cuda_ext --cpp_ext, and the installation itself seemed to finish successfully. The warning still appeared when running AMP.

@mcarilli
Contributor

mcarilli commented Mar 25, 2019

@moskomule You should make sure to use the pip install command

$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .

instead of

$ python setup.py install --cpp_ext --cuda_ext

Also, before reinstalling Apex, you need to make sure any old conflicting installs are removed, and if you installed using the direct setup.py command, you also need to make sure stale apex/build and apex.egg-info are removed. Try

$ pip uninstall apex
$ pip uninstall apex (repeat until you're sure it's uninstalled...)
$ cd apex
$ rm -rf build
$ rm -rf apex.egg-info
$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .
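
The cleanup part of the steps above (the two rm -rf commands) could also be scripted. This is only a sketch: remove_stale_artifacts is a hypothetical helper, not something shipped with apex, and it assumes the stale directories are named build and apex.egg-info at the top level of the checkout, as in the commands above.

```python
import shutil
from pathlib import Path


def remove_stale_artifacts(repo_dir, names=("build", "apex.egg-info")):
    """Delete leftover build directories inside repo_dir.

    Returns the names that were actually removed. Hypothetical helper
    for illustration; it does not uninstall anything from site-packages.
    """
    removed = []
    for name in names:
        target = Path(repo_dir) / name
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(name)
    return removed


if __name__ == "__main__":
    # Run from the parent of the checkout; "apex" is the clone directory.
    print(remove_stale_artifacts("apex"))
```

The pip uninstall and pip install steps still need to be run separately.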

@moskomule

Thank you. So far, I have found that it fails to build on Ubuntu 18.04 but succeeds on Ubuntu 16.04.

@DangerousY

Following @mcarilli's instructions above (the repeated pip uninstall, removing build and apex.egg-info, then the pip install command with --cpp_ext and --cuda_ext), I ran into this problem:
ERROR: Command errored out with exit status 1: /home/zyx/anaconda3/envs/maskrcnn_benchmark/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-i7tiph4m/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-i7tiph4m/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' --cpp_ext --cuda_ext install --record /tmp/pip-record-4x7z98cw/install-record.txt --single-version-externally-managed --compile Check the logs for full command output.

@ptrblck
Contributor

ptrblck commented Sep 26, 2019

@DangerousY Could you please post the complete stack trace so that we could have a look?

@chccgiven

Hi, I ran the commands from @mcarilli's comment above, but pip reports the following error:
ERROR: You must give at least one requirement to install (see "pip help install")
I am on Ubuntu 18.04. Can you help me? Thank you!

@ptrblck
Contributor

ptrblck commented Sep 30, 2019

@chccgiven This error is usually raised if you omit the folder location at the end of the pip install command (the trailing dot, or alternatively ./).

@maschasap

maschasap commented Jun 5, 2020

@ptrblck Good afternoon! I am trying to install apex, but I always get this error:

/tmp/pip-jp2_qt25-build/setup.py:51: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
  warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")

Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
from /usr/local/cuda/bin

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-jp2_qt25-build/setup.py", line 130, in <module>
    check_cuda_torch_binary_vs_bare_metal(torch.utils.cpp_extension.CUDA_HOME)
  File "/tmp/pip-jp2_qt25-build/setup.py", line 85, in check_cuda_torch_binary_vs_bare_metal
    "https://github.com/NVIDIA/apex/pull/323#discussion_r287021798.  "
RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries.  Pytorch binaries were compiled with Cuda 10.2.
In some cases, a minor-version mismatch will not cause later errors:  https://github.com/NVIDIA/apex/pull/323#discussion_r287021798.  You can try commenting out this check (at your own risk).

error
Cleaning up...
Removing source in /tmp/pip-jp2_qt25-build
Command "/raid/akim/myenv/bin/python3.6 -u -c "import setuptools, tokenize;__file__='/tmp/pip-jp2_qt25-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" --cpp_ext --cuda_ext install --record /tmp/pip-ylohle8g-record/install-record.txt --single-version-externally-managed --compile --install-headers /raid/akim/myenv/include/site/python3.6/apex" failed with error code 1 in /tmp/pip-jp2_qt25-build/
Exception information:
Traceback (most recent call last):
  File "/raid/akim/myenv/lib/python3.6/site-packages/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/raid/akim/myenv/lib/python3.6/site-packages/pip/commands/install.py", line 360, in run
    prefix=options.prefix_path,
  File "/raid/akim/myenv/lib/python3.6/site-packages/pip/req/req_set.py", line 784, in install
    **kwargs
  File "/raid/akim/myenv/lib/python3.6/site-packages/pip/req/req_install.py", line 878, in install
    spinner=spinner,
  File "/raid/akim/myenv/lib/python3.6/site-packages/pip/utils/__init__.py", line 725, in call_subprocess
    % (command_desc, proc.returncode, cwd))
pip.exceptions.InstallationError: Command "/raid/akim/myenv/bin/python3.6 -u -c "import setuptools, tokenize;__file__='/tmp/pip-jp2_qt25-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" --cpp_ext --cuda_ext install --record /tmp/pip-ylohle8g-record/install-record.txt --single-version-externally-managed --compile --install-headers /raid/akim/myenv/include/site/python3.6/apex" failed with error code 1 in /tmp/pip-jp2_qt25-build/

Do you know what the issue may be? Thanks in advance!
