Pytorch Install Instructions #515

Closed
Delaunay opened this issue Oct 25, 2019 · 9 comments

@Delaunay

📚 Documentation

The PyTorch build instructions are a bit outdated:

  1. New ROCm dependency: roctracer (ROCTX)

rock-dkms rocm-dev rocm-libs miopen-hip hipsparse rocthrust hipcub rccl roctracer-dev

  2. New sed command for rccl

sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/rccl/lib/cmake/rccl/rccl-config.cmake

I also had to add the RCCL library directory to the linker search path with export LIBRARY_PATH="/opt/rocm/rccl/lib/".
CMake reported that it had found RCCL, but ld -lrccl was not able to find it.
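
For anyone applying the sed fix by hand, here is a minimal sketch that wraps it in a small function. The function name fix_rccl_config and the .bak backup are my own additions, not part of the wiki instructions. The substitution itself is the one quoted above.

```shell
# Hypothetical helper (not from the wiki): rewrite find_dependency(hip) to
# find_dependency(HIP) in a CMake config file, keeping a .bak backup copy.
fix_rccl_config() {
    config="$1"
    if [ ! -f "$config" ]; then
        echo "not found: $config" >&2
        return 1
    fi
    # In a basic regular expression the parentheses are literal characters.
    sed -i.bak 's/find_dependency(hip)/find_dependency(HIP)/g' "$config"
}
```

With the path from this issue it would be called as fix_rccl_config /opt/rocm/rccl/lib/cmake/rccl/rccl-config.cmake, most likely with root permissions. Adjust the path if ROCm is installed elsewhere.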

@ghostplant

Is there a prebuilt version for pytorch >= 1.0?

@iotamudelta

@ghostplant yes, see https://hub.docker.com/r/rocm/pytorch/tags

@ghostplant

ghostplant commented Nov 9, 2019

@iotamudelta I tried that, but the torch library is not included in the Python runtime:

$ docker run -it --rm --network=host --privileged rocm/pytorch:rocm2.9_ubuntu18.04_py3.6 python3 -c 'import torch'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'torch'

@jithunnair-amd
Collaborator

jithunnair-amd commented Nov 9, 2019

Please try the ones with '_pytorch' in their tag, and specify 'python3.6' instead of 'python3' (sometimes the dockers have python3.5 linked to python3). HTH
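
A rough sketch of how to check this from inside a container: try each candidate interpreter and print the ones that can import the module. The function name find_interpreters_with_module is hypothetical, and the interpreter names you pass in are up to you.

```shell
# Hypothetical helper: print every interpreter from the argument list that
# exists on PATH and can import the given module.
find_interpreters_with_module() {
    module="$1"
    shift
    for py in "$@"; do
        # Skip names that are not installed, then try the import quietly.
        if command -v "$py" >/dev/null 2>&1 &&
           "$py" -c "import $module" >/dev/null 2>&1; then
            echo "$py"
        fi
    done
}
```

For example, find_interpreters_with_module torch python3 python3.6 lists whichever of the two interpreters has torch installed, and prints nothing if neither does.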

@ghostplant

@jithunnair-amd Thanks. It seems that for rocm-2.9, only the py2.7 image contains PyTorch; the py3.6 image doesn't.

@jithunnair-amd
Collaborator

Seems okay on my end:

user@machine:~$ rundocker --name DELETE_ME rocm/pytorch:rocm2.9_ubuntu16.04_py3.6_pytorch
root@d3811f1dcda9:/# python3 --version
Python 3.5.2
root@d3811f1dcda9:/# python3.6
Python 3.6.9 (default, Jul  3 2019, 15:36:16)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__file__
'/root/.local/lib/python3.6/site-packages/torch/__init__.py'

@Delaunay
Author

You're both correct: the ubuntu18.04 image does not have PyTorch, but the ubuntu16.04 one does.

sudo docker run -it rocm/pytorch:rocm2.9_ubuntu18.04_py3.6 bash
root@0070379bf58b:/# python3.6 -c "import torch"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'torch'
sudo docker run -it rocm/pytorch:rocm2.9_ubuntu16.04_py3.6_pytorch bash
root@58934992cf7d:/# python3.6 -c "import torch"
root@58934992cf7d:/# 

@jithunnair-amd
Collaborator

@Delaunay Wiki has been updated for the sed command and dependency update. However, you shouldn't need to modify your LIBRARY_PATH for rccl, as the cmake build should find the rccl library and use the absolute path in the build. Can you please try again and let us know if you still observe the issue with finding rccl library? You can use VERBOSE=1 during the build and capture the log file to observe which build command uses '-lrccl', if any.
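
As a sketch of the log check, assuming the verbose build output was saved to a file (for example by piping the build through tee; the file name build.log is only a placeholder), something like this lists the commands that pass -lrccl. The function name find_lrccl_commands is hypothetical.

```shell
# Hypothetical helper: print the numbered lines of a captured build log
# that pass -lrccl to the linker.
find_lrccl_commands() {
    log="$1"
    # -e keeps grep from treating the leading dash of the pattern as an option.
    grep -n -e '-lrccl' "$log"
}
```

If find_lrccl_commands build.log prints nothing (grep exits with status 1), no command in the log used -lrccl, which would suggest the build linked rccl by absolute path as expected.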

@Delaunay
Author

I will close the issue. I no longer have access to the machine with AMD GPUs, so I won't be able to test the RCCL linking issue again.
I assume it was probably just an issue with my setup.
