Can't install on Google Colab #627

Closed
hadaev8 opened this issue Nov 24, 2019 · 6 comments

Comments

@hadaev8

hadaev8 commented Nov 24, 2019

https://colab.research.google.com/drive/19irKL4JTWyyPG70QGsihn9vx5966zQq8

It worked fine a while ago.

@bamps53

bamps53 commented Nov 25, 2019

Same here. FYI, when I omit --global-option="--cpp_ext" --global-option="--cuda_ext", the installation works.

@NaleRaphael

The default PyTorch version in Google Colab has been updated to 1.3.1, which is compiled with CUDA 10.1.
However, the CUDA toolkit in the Colab runtime is 10.0.130 (you can check it with this command: !cat /usr/local/cuda/version.txt).

So, when you install apex, you may see the following error message (run the pip command without -q to see it):

...
Compiling cuda extensions with
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2018 NVIDIA Corporation
    Built on Sat_Aug_25_21:08:01_CDT_2018
    Cuda compilation tools, release 10.0, V10.0.130
    from /usr/local/cuda/bin

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-req-build-_saohd3c/setup.py", line 100, in <module>
        check_cuda_torch_binary_vs_bare_metal(torch.utils.cpp_extension.CUDA_HOME)
      File "/tmp/pip-req-build-_saohd3c/setup.py", line 77, in check_cuda_torch_binary_vs_bare_metal
        "https://github.com/NVIDIA/apex/pull/323#discussion_r287021798.  "
    RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries.  Pytorch binaries were compiled with Cuda 10.1.243.
    In some cases, a minor-version mismatch will not cause later errors:  https://github.com/NVIDIA/apex/pull/323#discussion_r287021798.  You can try commenting out this check (at your own risk).
...
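To make the mismatch concrete, here is a minimal sketch of this kind of version comparison. It is not the actual code from apex's setup.py; the function names are hypothetical, and I am assuming that only the major and minor numbers are compared:

```python
import re


def parse_nvcc_release(nvcc_output):
    """Return (major, minor) from the 'release X.Y' part of `nvcc -V` output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("could not find a CUDA release in the nvcc output")
    return int(match.group(1)), int(match.group(2))


def major_minor(version):
    """Return (major, minor) from a version string such as '10.1.243'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def cuda_versions_match(nvcc_output, torch_cuda_version):
    """Compare the nvcc release with the CUDA version PyTorch was built with."""
    return parse_nvcc_release(nvcc_output) == major_minor(torch_cuda_version)


# Line taken from the log above; the version strings are the ones from this thread.
NVCC_OUTPUT = "Cuda compilation tools, release 10.0, V10.0.130"

print(cuda_versions_match(NVCC_OUTPUT, "10.1.243"))  # False
print(cuda_versions_match(NVCC_OUTPUT, "10.0.130"))  # True
```

On a real runtime you could pass the decoded output of nvcc -V and torch.version.cuda instead of the hard-coded strings.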

To solve this problem, you can install another version of PyTorch that was compiled with CUDA 10.0.
In my case, I chose torch 1.2.0 + torchvision 0.4.0:

!pip install https://download.pytorch.org/whl/cu100/torch-1.2.0-cp36-cp36m-manylinux1_x86_64.whl && pip install https://download.pytorch.org/whl/cu100/torchvision-0.4.0-cp36-cp36m-manylinux1_x86_64.whl

Or you can pick the version you want from the official archive (choose the links with the prefix cu100 and the suffix cp36-cp36m-manylinux1_x86_64): https://download.pytorch.org/whl/torch_stable.html
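If it helps, the naming pattern of the two wheel links above can be sketched as a small helper. The pattern is only inferred from those two URLs, the function name is made up for illustration, and other releases in the archive may be named differently, so check the archive page before relying on it:

```python
BASE_URL = "https://download.pytorch.org/whl"


def wheel_url(package, version, cuda_tag="cu100",
              python_tag="cp36-cp36m", platform_tag="manylinux1_x86_64"):
    """Build a wheel URL following the pattern of the two links above (assumed)."""
    filename = f"{package}-{version}-{python_tag}-{platform_tag}.whl"
    return f"{BASE_URL}/{cuda_tag}/{filename}"


print(wheel_url("torch", "1.2.0"))
print(wheel_url("torchvision", "0.4.0"))
```

The two printed URLs are the same ones used in the pip command above.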

After reinstalling PyTorch, you can install apex again (there is no need to omit the extension options --global-option="--cpp_ext" --global-option="--cuda_ext"), and it should work.
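For reference, the two ways of installing mentioned in this thread (with and without the extension options) can be sketched like this. The helper name and the ./apex checkout path are assumptions for illustration, and -v and --no-cache-dir are optional pip flags I added so the build log is visible:

```python
def apex_install_command(source_dir="./apex", with_extensions=True):
    """Build the pip argument list for installing apex from a local checkout.

    source_dir is a hypothetical path to a cloned apex repository.
    """
    cmd = ["pip", "install", "-v", "--no-cache-dir"]
    if with_extensions:
        # The extension options discussed above; without them pip builds
        # apex without the C++/CUDA extensions.
        cmd += ["--global-option=--cpp_ext", "--global-option=--cuda_ext"]
    cmd.append(source_dir)
    return cmd


print(" ".join(apex_install_command()))
print(" ".join(apex_install_command(with_extensions=False)))
```

The returned list can be passed to subprocess.run(cmd, check=True), or the printed line can be pasted into a Colab cell after a leading !.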

@henrique

Downgrading and restarting the runtime works fine! Thanks.
https://colab.research.google.com/drive/1drodd29aL2B8ufcb0gwrBBhGDvPBaDha

I guess you can close this issue

hadaev8 closed this as completed Jan 19, 2020

@kushagra1198

My Colab has CUDA 10.1.243. Which PyTorch version should I install?

@NaleRaphael

NaleRaphael commented Jun 25, 2020

Hi @kushagra1198, apex should now work fine without any further configuration.

Currently, PyTorch on Colab is also compiled with CUDA 10.1.243, so the versions match. You can check the CUDA version in the runtime with the following code snippet:

# Adapted from `apex/setup.py`: print the version of the nvcc that PyTorch's
# extension builder will use.
import subprocess
from torch.utils.cpp_extension import CUDA_HOME

print(subprocess.check_output([CUDA_HOME + "/bin/nvcc", "-V"]))

# You should see something like this:
# b'nvcc: NVIDIA (R) Cuda compiler driver\nCopyright (c) 2005-2019 NVIDIA Corporation\nBuilt on Sun_Jul_28_19:07:16_PDT_2019\nCuda compilation tools, release 10.1, V10.1.243\n'

Feel free to let me know if it doesn't work.

@ManosMpampis

With the latest changes, I tried to run the official pytorchbearer Google Colab notebook at: https://colab.research.google.com/github/pytorchbearer/torchbearer/blob/master/docs/_static/notebooks/apex_torchbearer.ipynb#scrollTo=kaCrlsfk-UDw

When I install apex with the CUDA extensions, the installation seems to be stuck in a loop, printing a lot of messages like:
/usr/local/cuda/bin/nvcc -I/usr/local/lib/python3.7/dist-packages/torch/include -I/usr/local/lib/python3.7/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.7/dist-packages/torch/include/TH -I/usr/local/lib/python3.7/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.7m -c csrc/multi_tensor_sgd_kernel.cu -o build/temp.linux-x86_64-3.7/csrc/multi_tensor_sgd_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_37,code=compute_37 -gencode=arch=compute_37,code=sm_37 -std=c++14
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
/usr/local/cuda/bin/nvcc -I/usr/local/lib/python3.7/dist-packages/torch/include -I/usr/local/lib/python3.7/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.7/dist-packages/torch/include/TH -I/usr/local/lib/python3.7/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.7m -c csrc/multi_tensor_scale_kernel.cu -o build/temp.linux-x86_64-3.7/csrc/multi_tensor_scale_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_37,code=compute_37 -gencode=arch=compute_37,code=sm_37 -std=c++14
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
Running setup.py install for apex


6 participants