cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries. #550

Closed
antgr opened this issue Oct 19, 2019 · 12 comments

Comments

@antgr

antgr commented Oct 19, 2019

NVIDIA-SMI 435.21 Driver Version: 435.21 CUDA Version: 10.1

pip3 install --user -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
/usr/lib/python3.7/site-packages/pip/_internal/commands/install.py:217: UserWarning: Disabling all use of wheels due to the use of --build-options / --global-options / --install-options.
cmdoptions.check_install_build_global(options)
Created temporary directory: /tmp/pip-ephem-wheel-cache-lamo85tb
Created temporary directory: /tmp/pip-req-tracker-du8h9cpx
Created requirements tracker '/tmp/pip-req-tracker-du8h9cpx'
Created temporary directory: /tmp/pip-install-wkd21x48
Processing /home/polykratis/mt-dnn/apex
Created temporary directory: /tmp/pip-req-build-j8rxyv0b
Added file:///home/polykratis/mt-dnn/apex to build tracker '/tmp/pip-req-tracker-du8h9cpx'
Running setup.py (path:/tmp/pip-req-build-j8rxyv0b/setup.py) egg_info for package from file:///home/polykratis/mt-dnn/apex
Running command python setup.py egg_info
torch.__version__ = 1.1.0
running egg_info
creating pip-egg-info/apex.egg-info
writing pip-egg-info/apex.egg-info/PKG-INFO
writing dependency_links to pip-egg-info/apex.egg-info/dependency_links.txt
writing top-level names to pip-egg-info/apex.egg-info/top_level.txt
writing manifest file 'pip-egg-info/apex.egg-info/SOURCES.txt'
reading manifest file 'pip-egg-info/apex.egg-info/SOURCES.txt'
writing manifest file 'pip-egg-info/apex.egg-info/SOURCES.txt'
/tmp/pip-req-build-j8rxyv0b/setup.py:43: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")
Source in /tmp/pip-req-build-j8rxyv0b has version 0.1, which satisfies requirement apex==0.1 from file:///home/polykratis/mt-dnn/apex
Removed apex==0.1 from file:///home/polykratis/mt-dnn/apex from build tracker '/tmp/pip-req-tracker-du8h9cpx'
Installing collected packages: apex
Created temporary directory: /tmp/pip-record-ehkfvhhb
Running setup.py install for apex ... Running command /usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-req-build-j8rxyv0b/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" --cpp_ext --cuda_ext install --record /tmp/pip-record-ehkfvhhb/install-record.txt --single-version-externally-managed --compile --user --prefix=
torch.__version__ = 1.1.0
/tmp/pip-req-build-j8rxyv0b/setup.py:43: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")

Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
from /usr/bin

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-req-build-j8rxyv0b/setup.py", line 100, in <module>
    check_cuda_torch_binary_vs_bare_metal(torch.utils.cpp_extension.CUDA_HOME)
  File "/tmp/pip-req-build-j8rxyv0b/setup.py", line 77, in check_cuda_torch_binary_vs_bare_metal
    "https://github.com/NVIDIA/apex/pull/323#discussion_r287021798.  "
RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries.  Pytorch binaries were compiled with Cuda 9.0.176.
In some cases, a minor-version mismatch will not cause later errors:  https://github.com/NVIDIA/apex/pull/323#discussion_r287021798.  You can try commenting out this check (at your own risk).

error
Cleaning up...
Removing source in /tmp/pip-req-build-j8rxyv0b
Removed build tracker '/tmp/pip-req-tracker-du8h9cpx'
Command "/usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-req-build-j8rxyv0b/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" --cpp_ext --cuda_ext install --record /tmp/pip-record-ehkfvhhb/install-record.txt --single-version-externally-managed --compile --user --prefix=" failed with error code 1 in /tmp/pip-req-build-j8rxyv0b/
Exception information:
Traceback (most recent call last):
File "/usr/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 179, in main
status = self.run(options, args)
File "/usr/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 421, in run
strip_file_prefix=options.strip_file_prefix,
File "/usr/lib/python3.7/site-packages/pip/_internal/req/__init__.py", line 57, in install_given_reqs
**kwargs
File "/usr/lib/python3.7/site-packages/pip/_internal/req/req_install.py", line 949, in install
spinner=spinner,
File "/usr/lib/python3.7/site-packages/pip/_internal/utils/misc.py", line 771, in call_subprocess
% (command_desc, proc.returncode, cwd))
pip._internal.exceptions.InstallationError: Command "/usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-req-build-j8rxyv0b/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" --cpp_ext --cuda_ext install --record /tmp/pip-record-ehkfvhhb/install-record.txt --single-version-externally-managed --compile --user --prefix=" failed with error code 1 in /tmp/pip-req-build-j8rxyv0b/
1 location(s) to search for versions of pip:

antgr changed the title from "uda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries." to "cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries." on Oct 19, 2019
@dhpollack

You need to download CUDA 9.0 and then point your CUDA_HOME and PATH environment variables at that folder. If you want to keep multiple versions of CUDA side by side, you can download the .tar version of CUDA and extract it manually. Otherwise, you can install PyTorch 1.3, which is built against CUDA 10.1.
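
The mismatch in the log above (nvcc reports release 10.0, while the PyTorch binaries were compiled with CUDA 9.0.176) can be checked before starting a build. The following is only a minimal sketch of such a comparison, not apex's actual check; the function names are made up for illustration, and the two sample strings are copied from the log in this issue. On a real machine you would feed it the output of nvcc -V and the value of torch.version.cuda.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Return (major, minor) from the text printed by `nvcc -V`, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def parse_torch_cuda(version_string):
    """Return (major, minor) from a string such as torch.version.cuda."""
    parts = version_string.split(".")
    return int(parts[0]), int(parts[1])


def versions_match(nvcc_output, torch_cuda):
    """True when nvcc and the PyTorch binaries agree on major.minor."""
    return parse_nvcc_release(nvcc_output) == parse_torch_cuda(torch_cuda)


# Sample values copied from the log in this issue.
NVCC_OUTPUT = "Cuda compilation tools, release 10.0, V10.0.130"
TORCH_CUDA = "9.0.176"

print(versions_match(NVCC_OUTPUT, TORCH_CUDA))  # False
```

Only the major and minor numbers are compared here, so a difference in the patch component (for example 10.0.130 against 10.0.176) would not be flagged.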

@antgr
Author

antgr commented Oct 21, 2019

Thank you

antgr closed this as completed on Oct 21, 2019
@pedrocolon93

In case anyone else hits this problem, what worked for me was running pip uninstall torch a few times and then reinstalling with conda. It seems I had older versions of PyTorch installed that apex was picking up. After uninstalling three times, pip reported that torch was no longer installed, and I then installed it through conda. It worked after this.
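
To see whether several copies of torch are installed at once, something like the following might help. It is a rough sketch that assumes Python 3.8 or newer (for importlib.metadata); find_installs is a hypothetical helper name, not part of pip or PyTorch.

```python
from importlib import metadata


def find_installs(package_name):
    """List (version, location) for every installed distribution with this name."""
    wanted = package_name.lower().replace("_", "-")
    found = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name and name.lower().replace("_", "-") == wanted:
            # locate_file("") resolves to the directory the distribution lives in.
            found.append((dist.version, str(dist.locate_file(""))))
    return found


# More than one line of output suggests leftover copies of torch.
for version, location in find_installs("torch"):
    print(version, location)
```

This only looks at the interpreter it runs under, so it should be run with the same python that the apex build uses.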

@ashesh-0

ashesh-0 commented Jun 7, 2020

Has anyone figured out how to install apex with

CUDA Version 10.0*
torch==1.5.0

It would be immensely helpful if there were a way to install apex with the most recent versions of CUDA and torch. I cannot downgrade CUDA to a lower version. Thanks!

@hansen7

hansen7 commented Jun 15, 2020

Has anyone figured out how to install apex with

CUDA Version 10.0*
torch==1.5.0

It would be immensely helpful if there were a way to install apex with the most recent versions of CUDA and torch. I cannot downgrade CUDA to a lower version. Thanks!

I think PyTorch 1.5.0 is compiled with CUDA 10.2

@ashesh-0

Has anyone figured out how to install apex with

CUDA Version 10.0*
torch==1.5.0

It would be immensely helpful if there were a way to install apex with the most recent versions of CUDA and torch. I cannot downgrade CUDA to a lower version. Thanks!

I think PyTorch 1.5.0 is compiled with CUDA 10.2

Yeah, that is correct. Since I could not upgrade CUDA, I downgraded PyTorch instead.
Running conda install gxx_linux-64 followed by conda install pytorch torchvision cudatoolkit=10.0 -c pytorch did the trick for me.

@gauenk

gauenk commented Sep 14, 2020

I fixed this issue by running export CUDA_HOME=/usr/local/cuda-10.2/

@pyaf

pyaf commented Dec 2, 2020

So, I had only cuda-10.0 installed on my system (just /usr/local/cuda-10.0), but I had installed a PyTorch build for cuda-11.0, which is why the compilation was throwing this error. I installed the cuda-11.0 toolkit (toolkit only, I didn't touch the drivers), so I then had two CUDA versions on my system (which is completely fine, you just need to point to the one you want to use at compile time). After this, I ran export CUDA_HOME=/usr/local/cuda-11.0/ and tried compiling again. It worked!
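
When two toolkits are installed side by side, the same selection can be made per build instead of through a global export. This is a small sketch under stated assumptions: cuda_build_env is a hypothetical helper, and the toolkit is assumed to live under /usr/local/cuda-11.0 as in the comment above.

```python
import os


def cuda_build_env(cuda_home, base_env=None):
    """Return a copy of the environment with CUDA_HOME set and its bin dir first on PATH."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_HOME"] = cuda_home
    bin_dir = os.path.join(cuda_home, "bin")
    existing = env.get("PATH", "")
    # Put the chosen toolkit's nvcc ahead of any other nvcc on PATH.
    env["PATH"] = bin_dir + os.pathsep + existing if existing else bin_dir
    return env


# A tiny base environment is used here so the result is easy to read.
env = cuda_build_env("/usr/local/cuda-11.0", base_env={"PATH": "/usr/bin"})
print(env["CUDA_HOME"])
print(env["PATH"])
```

The returned dictionary can be passed as the env argument of subprocess.run when launching the pip install command, which leaves the shell's own environment untouched.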

@Lavenderjiang

After a long time of Googling, I found that each CUDA version is compatible with only certain gcc versions. In my case, I was using CUDA 10.2, and downgrading gcc to 6.1 solved this problem.
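
A compatibility check of this kind could be sketched as below. The helper names are hypothetical, the input is assumed to be the output of gcc -dumpversion, and the limit of gcc 8 for CUDA 10.2 is an assumption that should be confirmed against NVIDIA's installation guide for the toolkit you actually use.

```python
def gcc_major(dumpversion_output):
    """Return the major version from the output of `gcc -dumpversion`."""
    return int(dumpversion_output.strip().split(".")[0])


def gcc_supported(dumpversion_output, max_supported_major):
    """True when the gcc major version does not exceed the toolkit's limit."""
    return gcc_major(dumpversion_output) <= max_supported_major


# Assumption: CUDA 10.2 accepts gcc up to major version 8.
MAX_GCC_FOR_CUDA_10_2 = 8

print(gcc_supported("6.1.0", MAX_GCC_FOR_CUDA_10_2))  # True
print(gcc_supported("9.3.0", MAX_GCC_FOR_CUDA_10_2))  # False
```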

@serg06

serg06 commented Mar 6, 2022

Thanks guys! I was able to install Apex for my conda PyTorch installation with your help. Here is the full step by step:

  • Figure out the correct CUDA version with python -c "import torch; print(torch.version.cuda)"
  • Find the corresponding installer here
  • Download and execute the runfile
    • Uncheck everything except the toolkit
    • Go into additional options -> toolkit and uncheck everything there, such as creating symbolic links and desktop shortcuts
  • Finally, run the pip install command, but prefix it with CUDA_HOME=/usr/local/cuda-{your-version-here}/.
    • For example, I was installing CUDA 11.3, so my full command was CUDA_HOME=/usr/local/cuda-11.3 pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
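
The first and last steps above can be tied together in a few lines. This is only a sketch: cuda_home_for and install_command_line are made-up names, the /usr/local/cuda-{version} layout is the one used in the steps above, and the version string is hard-coded so the example runs without PyTorch installed (normally it would come from torch.version.cuda).

```python
def cuda_home_for(torch_cuda, prefix="/usr/local"):
    """Map a version such as '11.3' or '9.0.176' to a toolkit directory."""
    major, minor = torch_cuda.split(".")[:2]
    return f"{prefix}/cuda-{major}.{minor}"


def install_command_line(torch_cuda):
    """Build the shell line from the steps above for the given CUDA version."""
    pip_args = [
        "pip", "install", "-v", "--disable-pip-version-check", "--no-cache-dir",
        '--global-option="--cpp_ext"', '--global-option="--cuda_ext"', "./",
    ]
    return f"CUDA_HOME={cuda_home_for(torch_cuda)} " + " ".join(pip_args)


# Hard-coded here; on a real system use: import torch; torch.version.cuda
print(cuda_home_for("11.3"))  # /usr/local/cuda-11.3
print(install_command_line("11.3"))
```

The script only prints the command rather than running it, so the directory can be checked by hand before the build starts.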

@daafonsecato

In addition to the steps in #550 (comment), remember to put the specific version you want in the install command, for example sudo apt-get -y install cuda-11-3

@Rainbowman0

You can use the nvidia-smi and nvcc -V commands to check whether the CUDA version reported by the NVIDIA driver is consistent with the CUDA compiler version. If they are not consistent, this error will be reported. For example, my previous setup (shown in the two screenshots attached to the original comment) led to the same error.
