Install error when compile the lib #214

Lausannen · 2019-03-21T12:11:24Z

Hi, when I try to build the newest version of apex, I get the following error:
" python -u -c "import setuptools, tokenize;file='/tmp/pip-req-build-hq7t6roo/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" --cpp_ext --cuda_ext install --record /tmp/pip-record-ic8t29gs/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-req-build-hq7t6roo/ "
I made sure to follow the README.md, but the error persists. Can you give me some suggestions on how to handle it? Thank you very much!

Lausannen · 2019-03-21T16:16:58Z

I think I have found the problem. It was caused by the wrong CUDA version being used, since my server has multiple CUDA versions installed. After I changed the CUDA path in .bashrc, apex compiled successfully.

mcarilli · 2019-03-21T23:01:54Z

I'm currently adding logic to the setup.py that will print a warning if the version of Cuda that's being used to compile the extensions is different from the version of Cuda that was used to compile the Pytorch binaries present on your system, which should help catch cases like this.
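
A minimal sketch of the kind of check described above, not the actual code in apex's setup.py. The helper names (`parse_nvcc_release`, `parse_torch_cuda`, `cuda_versions_match`, `check_local_install`) are hypothetical; the only external facts assumed are that `nvcc -V` prints a `release X.Y` field (as in the logs in this thread) and that PyTorch exposes its build-time CUDA version as `torch.version.cuda`.

```python
import re
import subprocess


def parse_nvcc_release(nvcc_output):
    """Return (major, minor) from the 'release X.Y' field of `nvcc -V` output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' field found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def parse_torch_cuda(version_string):
    """Return (major, minor) from a string such as '10.0.130' or '10.2'."""
    major, minor = version_string.split(".")[:2]
    return int(major), int(minor)


def cuda_versions_match(nvcc_output, torch_cuda_version):
    """True when nvcc and the PyTorch binaries agree on major.minor."""
    return parse_nvcc_release(nvcc_output) == parse_torch_cuda(torch_cuda_version)


def check_local_install(nvcc="nvcc"):
    """Compare the nvcc found on PATH with the installed PyTorch build."""
    import torch  # imported lazily so the helpers above work without PyTorch

    if torch.version.cuda is None:
        print("This PyTorch build has no CUDA support.")
        return
    nvcc_output = subprocess.check_output([nvcc, "-V"], universal_newlines=True)
    if cuda_versions_match(nvcc_output, torch.version.cuda):
        print("nvcc and PyTorch agree on CUDA", torch.version.cuda)
    else:
        print("Warning: nvcc and PyTorch use different CUDA versions")
```

Calling `check_local_install()` in the environment where apex will be built should reveal a mismatch before the compile step starts.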

moskomule · 2019-03-25T03:51:01Z

Hi, I probably have the same problem as you...

...
Installing collected packages: apex
  Running setup.py install for apex ... error
    Complete output from command /opt/.miniconda/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-req-build-b0pvvy97/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" --cpp_ext --cuda_ext install --record /tmp/pip-record-4os3snlg/install-record.txt --single-version-externally-managed --compile:
    torch.__version__  =  1.0.1.post2

    Compiling cuda extensions with
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2018 NVIDIA Corporation
    Built on Sat_Aug_25_21:08:01_CDT_2018
    Cuda compilation tools, release 10.0, V10.0.130
    from /usr/local/cuda/bin

    Pytorch binaries were compiled with Cuda 10.0.130

    running install
    running build
    running build_py
    copying apex/__init__.py -> build/lib.linux-x86_64-3.7/apex
    copying apex/parallel/sync_batchnorm.py -> build/lib.linux-x86_64-3.7/apex/parallel
    copying apex/parallel/__init__.py -> build/lib.linux-x86_64-3.7/apex/parallel
    copying apex/parallel/optimized_sync_batchnorm_kernel.py -> build/lib.linux-x86_64-3.7/apex/parallel
    copying apex/parallel/optimized_sync_batchnorm.py -> build/lib.linux-x86_64-3.7/apex/parallel
    copying apex/parallel/distributed.py -> build/lib.linux-x86_64-3.7/apex/parallel
    copying apex/amp/_amp_state.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/amp/handle.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/amp/frontend.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/amp/__init__.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/amp/scaler.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/amp/utils.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/amp/wrap.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/amp/rnn_compat.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/amp/_initialize.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/amp/amp.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/amp/opt.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/fp16_utils/__init__.py -> build/lib.linux-x86_64-3.7/apex/fp16_utils
    copying apex/fp16_utils/fp16util.py -> build/lib.linux-x86_64-3.7/apex/fp16_utils
    copying apex/fp16_utils/fp16_optimizer.py -> build/lib.linux-x86_64-3.7/apex/fp16_utils
    creating build/lib.linux-x86_64-3.7/apex/multi_tensor_apply
    copying apex/multi_tensor_apply/__init__.py -> build/lib.linux-x86_64-3.7/apex/multi_tensor_apply
    copying apex/multi_tensor_apply/multi_tensor_apply.py -> build/lib.linux-x86_64-3.7/apex/multi_tensor_apply
    copying apex/normalization/fused_layer_norm.py -> build/lib.linux-x86_64-3.7/apex/normalization
    copying apex/optimizers/fused_adam.py -> build/lib.linux-x86_64-3.7/apex/optimizers
    copying apex/optimizers/fp16_optimizer.py -> build/lib.linux-x86_64-3.7/apex/optimizers
    copying apex/amp/lists/torch_overrides.py -> build/lib.linux-x86_64-3.7/apex/amp/lists
    running build_ext
    building 'amp_C' extension
    gcc -pthread -B /opt/.miniconda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/torch/csrc/api/include -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/TH -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/.miniconda/include/python3.7m -c csrc/amp_C_frontend.cpp -o build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o -O3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    /usr/local/cuda/bin/nvcc -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/torch/csrc/api/include -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/TH -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/.miniconda/include/python3.7m -c csrc/multi_tensor_scale_kernel.cu -o build/temp.linux-x86_64-3.7/csrc/multi_tensor_scale_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
    /usr/local/cuda/bin/nvcc -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/torch/csrc/api/include -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/TH -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/.miniconda/include/python3.7m -c csrc/multi_tensor_axpby_kernel.cu -o build/temp.linux-x86_64-3.7/csrc/multi_tensor_axpby_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
    g++ -pthread -shared -B /opt/.miniconda/compiler_compat -L/opt/.miniconda/lib -Wl,-rpath=/opt/.miniconda/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o build/temp.linux-x86_64-3.7/csrc/multi_tensor_scale_kernel.o build/temp.linux-x86_64-3.7/csrc/multi_tensor_axpby_kernel.o -L/usr/local/cuda/lib64 -lcudart -o build/lib.linux-x86_64-3.7/amp_C.cpython-37m-x86_64-linux-gnu.so
    /opt/.miniconda/compiler_compat/ld: build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o: unable to initialize decompress status for section .debug_info
    /opt/.miniconda/compiler_compat/ld: build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o: unable to initialize decompress status for section .debug_info
    /opt/.miniconda/compiler_compat/ld: build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o: unable to initialize decompress status for section .debug_info
    /opt/.miniconda/compiler_compat/ld: build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o: unable to initialize decompress status for section .debug_info
    build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o: file not recognized: file format not recognized
    collect2: error: ld returned 1 exit status
    error: command 'g++' failed with exit status 1

    ----------------------------------------
Command "/opt/.miniconda/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-req-build-b0pvvy97/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" --cpp_ext --cuda_ext install --record /tmp/pip-record-4os3snlg/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-req-build-b0pvvy97/

moskomule · 2019-03-25T04:52:44Z

Update: Without using tmux, I could install apex.

It works with AMP, but it emits the following warning: Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ImportError('/opt/.miniconda/lib/python3.7/site-packages/amp_C.cpython-37m-x86_64-linux-gnu.so: undefined symbol: __cudaPopCallConfiguration').
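
To see the underlying ImportError directly instead of through the AMP warning, one can try importing the compiled extension by hand. The sketch below is illustrative only: `try_import` and `check_apex_extensions` are hypothetical helper names, `amp_C` is the extension module named in the warning above, and importing `torch` first is an assumption (the extension is built against PyTorch's libraries).

```python
import importlib


def try_import(module_name):
    """Return (True, None) if the module imports, else (False, error message)."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # Undefined-symbol failures in a compiled extension also surface here.
        return False, str(exc)
    return True, None


def check_apex_extensions(extension_names=("amp_C",)):
    """Print whether each compiled extension can be imported."""
    ok, error = try_import("torch")
    if not ok:
        print("torch failed to import:", error)
        return
    for name in extension_names:
        ok, error = try_import(name)
        print(name, "OK" if ok else "FAILED: " + error)
```

If `check_apex_extensions()` reports an undefined symbol, the extension was most likely compiled against a different CUDA toolkit or PyTorch build than the one being loaded.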

Lausannen · 2019-03-25T07:02:59Z

@moskomule I think installing apex with --cuda_ext --cpp_ext is necessary. I guess this problem is related to your CUDA setup. In my case, I first checked my toolkit with "nvcc -V", which reported CUDA 9.0, but then I found that the link in ~/.bashrc was invalid. Maybe you should check this.
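
A small sketch of the check suggested above: find which nvcc the shell would pick up and where it really lives once symlinks are resolved. The function names are hypothetical, and reading the `CUDA_HOME` environment variable is an assumption about how the machine is configured.

```python
import os
import shutil


def locate_nvcc(search_path=None):
    """Return (found_path, resolved_path) for the first nvcc on the search path.

    search_path is an os.pathsep-separated string; None means the current PATH.
    Returns None when no executable named nvcc is found.
    """
    found = shutil.which("nvcc", path=search_path)
    if found is None:
        return None
    # realpath follows symlinks such as /usr/local/cuda -> /usr/local/cuda-X.Y
    return found, os.path.realpath(found)


def report_cuda_environment():
    """Print the CUDA-related settings that affect which toolkit gets used."""
    print("CUDA_HOME =", os.environ.get("CUDA_HOME"))
    location = locate_nvcc()
    if location is None:
        print("nvcc was not found on PATH")
    else:
        print("nvcc on PATH:", location[0])
        print("resolves to: ", location[1])
```

Running `report_cuda_environment()` in the same shell used for the install shows whether the PATH entry from ~/.bashrc points at the intended toolkit.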

moskomule · 2019-03-25T09:07:09Z

Thanks. For the warning above, I did install with --cuda_ext --cpp_ext, and the installation itself seemed to finish successfully. The warning only appeared when running AMP.

mcarilli · 2019-03-25T22:19:50Z

@moskomule You should make sure to use the pip install command

$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .

instead of

$ python setup.py install --cpp_ext --cuda_ext

Also, before reinstalling Apex, you need to make sure any old conflicting installs are removed, and if you installed using the direct setup.py command, you also need to make sure stale apex/build and apex.egg-info are removed. Try

$ pip uninstall apex
$ pip uninstall apex  # repeat until you're sure it's uninstalled
$ cd apex
$ rm -rf build
$ rm -rf apex.egg-info
$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .
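
The two `rm -rf` steps above can also be scripted. This is a minimal sketch, assuming the directory names `build` and `apex.egg-info` from the commands above; the function name `remove_stale_artifacts` is hypothetical.

```python
import os
import shutil

# Directories left behind by an earlier `python setup.py install`.
STALE_NAMES = ("build", "apex.egg-info")


def remove_stale_artifacts(repo_dir):
    """Delete stale build directories inside repo_dir; return the names removed."""
    removed = []
    for name in STALE_NAMES:
        path = os.path.join(repo_dir, name)
        if os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(name)
    return removed
```

For example, `remove_stale_artifacts("apex")` run from the parent directory of the checkout removes both directories if they exist and leaves everything else untouched.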

moskomule · 2019-04-01T05:30:32Z

Thank you. So far, I have found that it fails to build on Ubuntu 18.04 but succeeds on Ubuntu 16.04.

DangerousY · 2019-09-26T08:13:01Z

I followed the steps from @mcarilli's comment above (uninstall apex, remove the stale build and apex.egg-info directories, then reinstall with the pip install command), but I ran into this problem:
ERROR: Command errored out with exit status 1: /home/zyx/anaconda3/envs/maskrcnn_benchmark/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-i7tiph4m/setup.py'"'"'; file='"'"'/tmp/pip-req-build-i7tiph4m/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' --cpp_ext --cuda_ext install --record /tmp/pip-record-4x7z98cw/install-record.txt --single-version-externally-managed --compile Check the logs for full command output.

ptrblck · 2019-09-26T10:34:06Z

@DangerousY Could you please post the complete stack trace so that we could have a look?

chccgiven · 2019-09-30T11:48:47Z

Hi, I executed the commands from @mcarilli's comment above (uninstall apex, remove build and apex.egg-info, then the pip install command), but the program reports the following error:
ERROR: You must give at least one requirement to install (see "pip help install")
My Ubuntu version is 18.04, can you help me? Thank you!

ptrblck · 2019-09-30T14:47:48Z

@chccgiven This error is usually thrown if you forget the folder location at the end of the pip install command (the dot at the end, or alternatively ./).
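
One way to avoid dropping the trailing dot is to build the command as an argument list, so the source directory is an explicit final element. This is only a sketch: `build_pip_install_command` is a hypothetical helper, the options are the ones quoted earlier in this thread, and invoking pip as `sys.executable -m pip` is an assumption about the environment.

```python
import sys


def build_pip_install_command(source_dir="."):
    """Return the pip install command from this thread as an argument list."""
    if not source_dir:
        # Without a final path, pip has no requirement to install.
        raise ValueError("source_dir must name the apex checkout, e.g. '.'")
    return [
        sys.executable, "-m", "pip", "install", "-v", "--no-cache-dir",
        "--global-option=--cpp_ext",
        "--global-option=--cuda_ext",
        source_dir,  # the dot (or ./) that is easy to forget
    ]
```

The list can be passed to `subprocess.check_call(build_pip_install_command("."))` from inside the apex checkout.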

maschasap · 2020-06-05T19:26:27Z

@ptrblck Good afternoon! I am trying to install apex, but I always get this error:

/tmp/pip-jp2_qt25-build/setup.py:51: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
  warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")

Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
from /usr/local/cuda/bin

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-jp2_qt25-build/setup.py", line 130, in <module>
    check_cuda_torch_binary_vs_bare_metal(torch.utils.cpp_extension.CUDA_HOME)
  File "/tmp/pip-jp2_qt25-build/setup.py", line 85, in check_cuda_torch_binary_vs_bare_metal
    "https://github.com/NVIDIA/apex/pull/323#discussion_r287021798.  "
RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries.  Pytorch binaries were compiled with Cuda 10.2.
In some cases, a minor-version mismatch will not cause later errors:  https://github.com/NVIDIA/apex/pull/323#discussion_r287021798.  You can try commenting out this check (at your own risk).

error
Cleaning up...
Removing source in /tmp/pip-jp2_qt25-build
Command "/raid/akim/myenv/bin/python3.6 -u -c "import setuptools, tokenize;file='/tmp/pip-jp2_qt25-build/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" --cpp_ext --cuda_ext install --record /tmp/pip-ylohle8g-record/install-record.txt --single-version-externally-managed --compile --install-headers /raid/akim/myenv/include/site/python3.6/apex" failed with error code 1 in /tmp/pip-jp2_qt25-build/
Exception information:
Traceback (most recent call last):
File "/raid/akim/myenv/lib/python3.6/site-packages/pip/basecommand.py", line 215, in main
status = self.run(options, args)
File "/raid/akim/myenv/lib/python3.6/site-packages/pip/commands/install.py", line 360, in run
prefix=options.prefix_path,
File "/raid/akim/myenv/lib/python3.6/site-packages/pip/req/req_set.py", line 784, in install
**kwargs
File "/raid/akim/myenv/lib/python3.6/site-packages/pip/req/req_install.py", line 878, in install
spinner=spinner,
File "/raid/akim/myenv/lib/python3.6/site-packages/pip/utils/init.py", line 725, in call_subprocess
% (command_desc, proc.returncode, cwd))
pip.exceptions.InstallationError: Command "/raid/akim/myenv/bin/python3.6 -u -c "import setuptools, tokenize;file='/tmp/pip-jp2_qt25-build/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" --cpp_ext --cuda_ext install --record /tmp/pip-ylohle8g-record/install-record.txt --single-version-externally-managed --compile --install-headers /raid/akim/myenv/include/site/python3.6/apex" failed with error code 1 in /tmp/pip-jp2_qt25-build/

Do you know what the issue may be? Thanks in advance!

Lausannen closed this as completed Mar 21, 2019
