Building PyTorch w/o Docker? #337

Open
gateway opened this issue Jan 2, 2019 · 22 comments

@gateway

gateway commented Jan 2, 2019

Hi, I'm trying to get my AMD system set up to run some PyTorch software. I'd prefer not to have to mess with Docker; is there a reason I need to?

Is there a way to build this w/o docker?

@iotamudelta

Sure: install the dependencies listed in the Docker files, then follow the subsequent build steps.

@iotamudelta iotamudelta self-assigned this Jan 4, 2019
@iamkucuk

> Sure: install the dependencies listed in the Docker files, then follow the subsequent build steps.

An installation script would be very helpful. I would be grateful if you could provide one for the community!

@iamkucuk

Any progress on that?

@Delaunay

@iotamudelta can you point me to the Dockerfile you are referring to? Is it that one?

This is what I did to compile pytorch:

  1. Install the PyTorch dependencies, rocm-dev, and a bunch of ROCm libraries. CMake will gracefully tell you which ones are missing.

  2. Execute ./.jenkins/caffe2/build.sh
    It hipifies the caffe2 source code, generating the missing files required for the compilation.
    You might be able to just run python tools/amd_build/build_amd.py, but I have not tried it alone.

  3. Compile pytorch as usual: python setup.py develop.

The compilation is still going, so I am not sure this is all I needed to do, but it looks good so far.
hipcc uses a lot of memory; I hit a few OOM errors, which made me restart with make -j 1.
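
For reference, the sequence above as shell commands (a minimal sketch; the apt package names are assumptions from my setup, and CMake will report whatever is still missing):

# 1. ROCm dev stack and math libraries (names may differ per distro)
sudo apt install rocm-dev rocblas miopen-hip rocrand rocfft rocsparse

# 2. Hipify the caffe2/ATen sources (the Jenkins script wraps this step)
./.jenkins/caffe2/build.sh
# possibly sufficient on its own (untested here):
python tools/amd_build/build_amd.py

# 3. Build and install in development mode
python setup.py develop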

@iotamudelta

@Delaunay yes, that Dockerfile is part of it. I'd recommend using https://github.com/ROCmSoftwarePlatform/pytorch/blob/master/docker/caffe2/jenkins/build.sh with "py2-clang7-rocmdeb-ubuntu16.04" as the argument if you build your own Docker image. A standalone Dockerfile is here: https://raw.githubusercontent.com/wiki/ROCmSoftwarePlatform/pytorch/Dockerfile

Yes, just running python tools/amd_build/build_amd.py is sufficient to hipify the full source.

How much RAM do you have? A good rule of thumb seems to be MAX_JOBS=(RAM in GB)/4.
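
In shell terms, that rule of thumb looks like this (a sketch; assumes GNU free is available):

# one parallel compile job per ~4 GB of RAM
export MAX_JOBS=$(( $(free -g | awk '/^Mem:/ {print $2}') / 4 ))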

@Delaunay

I only have 8 GB on that machine.
I was able to compile pytorch with ninja (without it the installation fails),
but the version I compiled is not functional.

Do you know whether this is an issue with the build configuration, or whether the kernel is really missing?
Thanks

>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.get_device_name(0)
'Ellesmere [Radeon RX 470/480/570/570X/580/580X]'
>>> torch.cuda.max_memory_allocated(0)
1024
>>> t = torch.zeros((10, 10, 10), dtype=torch.float32)
>>> t.cuda()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/setepenre/rocm_pytorch/torch/tensor.py", line 70, in __repr__
    return torch._tensor_str._str(self)
  File "/home/setepenre/rocm_pytorch/torch/_tensor_str.py", line 285, in _str
    tensor_str = _tensor_str(self, indent)
  File "/home/setepenre/rocm_pytorch/torch/_tensor_str.py", line 203, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/home/setepenre/rocm_pytorch/torch/_tensor_str.py", line 89, in __init__
    nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
  File "/home/setepenre/rocm_pytorch/torch/functional.py", line 222, in isfinite
    return (tensor == tensor) & (tensor.abs() != inf)
RuntimeError: No device code available for function: _Z21kernelPointwiseApply3I10TensorEQOpIfhEhffjLi1ELi1ELi1EEv10OffsetInfoIT0_T3_XT4_EES2_IT1_S4_XT5_EES2_IT2_S4_XT6_EES4_T_

@iotamudelta

@Delaunay what GPU do you have? We currently need to compile specifically for a microarchitecture (changes to that are incoming). Before building, export HCC_AMDGPU_TARGET set to your uarch: either gfx803 (which we do not support well in PT; if you find issues, please report them), gfx900 (Vega 56/Vega 64 generation, these work well), or gfx906 (Radeon VII, this should also work well).
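
For example (a sketch; gfx803 matches the RX 470/480/570/580 family reported above):

# target the Polaris (gfx803) microarchitecture before building
export HCC_AMDGPU_TARGET=gfx803
python setup.py develop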

@Delaunay

Thanks, I recompiled it overnight for gfx803. It is working now.
I only have one test failing on my side. Is that expected?
If not, I can open another ticket and gather info on it.

======================================================================
FAIL: test_multinomial_invalid_probs_cuda (test_cuda.TestCuda)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/setepenre/rocm_pytorch/test/common_utils.py", line 296, in wrapper
    method(*args, **kwargs)
  File "/home/setepenre/rocm_pytorch/test/test_cuda.py", line 2223, in test_multinomial_invalid_probs_cuda
    self._spawn_method(test_method, torch.Tensor([1, -1, 1]))
  File "/home/setepenre/rocm_pytorch/test/test_cuda.py", line 2203, in _spawn_method
    self.fail(e)
AssertionError: False

@iotamudelta

Yeah, that test works for me on gfx906, so please do open a ticket. I don't have a gfx803 setup currently, but I'll try to have a look when I do and have time. In the meantime, we can discuss in that ticket how to root-cause it.

Is that the only failing test? That'd be better than I thought, to be honest.

@Delaunay

Delaunay commented Feb 25, 2019

This is what I got on my side overall with PYTORCH_TEST_WITH_ROCM=1 python test/run_test.py

  • test_autograd: 919 tests in 161s (6 skipped)
  • test_cuda: 154 tests in 19s (77 skipped, 1 failed)

I also ran resnet18 & resnet50. I will do more testing later, but for now the timings look great.


For anyone stumbling upon this thread,
here are the rough steps describing how to compile without Docker (see the sketch after this list):

  1. Install ROCm here
  2. Install the PyTorch dependencies (I recommend using Ninja)
  3. Install the ROCm PyTorch dependencies (some might already be installed)
    • rocrand, hiprand, rocblas, miopen, miopengemm, rocfft, rocsparse, rocm-cmake, rocm-dev, rocm-device-libs, rocm-libs, hcc, hip_base, hip_hcc, hip-thrust
  4. Clone the PyTorch repository
  5. 'Hipify' the PyTorch source by executing python tools/amd_build/build_amd.py
  6. Optionally set export USE_NINJA=1 and export MAX_JOBS=N (N = (RAM in GB)/4)
  7. python setup.py [develop|install]
  8. Make sure everything is working with PYTORCH_TEST_WITH_ROCM=1 python test/run_test.py
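
Collected into one script (a sketch under the assumptions above: ROCm and the step-3 packages are already installed, and the repository is the ROCm fork referenced in this thread):

# steps 4-8 in one go
git clone --recursive https://github.com/ROCmSoftwarePlatform/pytorch
cd pytorch
python tools/amd_build/build_amd.py                 # step 5: hipify
export USE_NINJA=1                                  # step 6 (optional)
export MAX_JOBS=2                                   # (RAM in GB) / 4
python setup.py install                             # step 7
PYTORCH_TEST_WITH_ROCM=1 python test/run_test.py    # step 8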

@iamkucuk

> (Delaunay's build steps, quoted in full. At the time they also included a step 6: pick the architecture to compile for by setting HCC_AMDGPU_TARGET, e.g. gfx906 for Radeon VII, gfx900 for Vega, or gfx803 for the Radeon RX 470/480/570/570X/580/580X.)

Finally a proper answer! I can't thank you enough for this! Will try it ASAP!

@iamkucuk

> (Delaunay's build steps, quoted in full again.)

Quick questions: I don't have any info about Ninja. Is this the package manager you are talking about?
Is there documentation on how to use it, and does using pip instead of Ninja cause any trouble?
Where can I find a ROCm alternative for the magma-cuda dependency, or should I just ignore it?

@Delaunay

Delaunay commented Feb 27, 2019

Ninja is just a build system that PyTorch can use to compile itself; you do not have to use it.
It is explained here.

ROCm has rocblas and miopen for linear algebra and machine learning primitives, respectively.
I did not see anything about Magma when I installed pytorch.
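
To opt in (a sketch; installing Ninja via pip is just one way to get the binary, and pip itself is a package manager, not a build system):

pip install ninja         # the Ninja build tool
export USE_NINJA=1        # tell the PyTorch build to generate Ninja files
python setup.py develop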

@masahi

masahi commented Feb 28, 2019

@Delaunay thanks for the info, I managed to build pytorch from source on my box! I should mention that I had to install the Thrust HIP port to build caffe2.

@Delaunay

Thanks, I updated the list of dependencies.

@hameerabbasi

#337 (comment) doesn't seem to work for me. I get this error no matter what I try:

  By not providing "Findhip.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "hip", but
  CMake did not find one.

I'm willing to help debug the issue, I have all dependencies already installed.
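
A possible starting point (an assumption, not a confirmed fix: HIP ships its CMake config files under the ROCm prefix, so pointing CMake at them may help):

# paths are assumptions; adjust to where ROCm is installed
export CMAKE_PREFIX_PATH=/opt/rocm:/opt/rocm/hip:$CMAKE_PREFIX_PATH
python setup.py develop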

@iotamudelta

@Delaunay could you remove step #6 pertaining to HCC_AMDGPU_TARGET? The default is multi-arch now, and it's a debug flag of the compiler that I'd rather we not continue to exploit. :-)

@Delaunay

nice, I updated it

@iamkucuk

@Delaunay Hi mate! I'm trying to build pytorch your way; however, I'm experiencing some issues. Here is my script, can you check it out? https://gist.github.com/iamkucuk/c8f74ec6d4f91804d6ff3d1006f26040

@iotamudelta

We added documentation for host installs here: https://github.com/ROCmSoftwarePlatform/pytorch/wiki/Building-PyTorch-for-ROCm#option-4-install-directly-on-host

Please note that this requires good knowledge of your operating system and its package manager, and that step 4) unfortunately makes alterations to the ROCm install itself; we are hoping to fix the latter in the future.

@iamkucuk

> We added documentation for host installs here: https://github.com/ROCmSoftwarePlatform/pytorch/wiki/Building-PyTorch-for-ROCm#option-4-install-directly-on-host

Why don't you provide a script for the full installation process? PyTorch is becoming more popular, especially in the academic world.

@dagamayank

> Why don't you provide a script for the full installation process? PyTorch is becoming more popular, especially in the academic world.
