ROCm 3.0 support #550

Open
sebpuetz opened this issue Dec 20, 2019 · 11 comments

Comments

@sebpuetz

sebpuetz commented Dec 20, 2019

I updated to ROCm 3.0, which broke the PyTorch installation, as I expected. I recompiled the wheel following the instructions in the Wiki. The build succeeded, but importing torch in Python throws an ImportError:

$ python -c "import torch"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/seb/.pyenv/versions/3.7.3/lib/python3.7/site-packages/torch/__init__.py", line 81, in <module>
    from torch._C import *
ImportError: libhc_am.so.3: cannot open shared object file: No such file or directory

Does the master branch in this repo support ROCm 3.0 yet?


For completeness' sake the commands used to compile & install:

git clone git@github.com:ROCmSoftwarePlatform/pytorch
cd pytorch
git submodule update --init --recursive
cd /data/pytorch/
python tools/amd_build/build_amd.py
sudo sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/rocsparse/lib/cmake/rocsparse/rocsparse-config.cmake
sudo sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/rocfft/lib/cmake/rocfft/rocfft-config.cmake
sudo sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/miopen/lib/cmake/miopen/miopen-config.cmake
sudo sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config.cmake
pip install enum34 numpy pyyaml setuptools typing cffi future hypothesis
export PYTORCH_ROCM_ARCH=gfx906
USE_ROCM=1 USE_LMDB=1 USE_OPENCV=1 MAX_JOBS=14 python setup.py bdist_wheel
pip install dist/torch-1.4.0a0+97e1b3c-cp37-cp37m-linux_x86_64.whl
python -c "import torch"
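The four sed commands above all apply the same substitution to different CMake config files. As a minimal sketch, assuming you want to preview the change before editing files under /opt/rocm as root, the same rewrite can be expressed in Python. The helper names `patch_find_dependency` and `preview_patch` are made up for this illustration and are not part of the PyTorch build tooling.

```python
import re
from typing import List

# Same pattern as the sed expression: the lowercase name inside find_dependency().
_PATTERN = re.compile(r"find_dependency\(hip\)")


def patch_find_dependency(text: str) -> str:
    """Return text with find_dependency(hip) rewritten to find_dependency(HIP)."""
    return _PATTERN.sub("find_dependency(HIP)", text)


def preview_patch(path: str) -> List[str]:
    """List the lines of the file at `path` that the substitution would change."""
    with open(path, encoding="utf-8") as fh:
        return [line.rstrip("\n") for line in fh if _PATTERN.search(line)]
```

Calling `preview_patch` on one of the config files named in the sed commands shows which lines would be touched without modifying anything. Lines that already say `find_dependency(HIP)` are left alone, so the substitution is safe to repeat.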
@iotamudelta

PyTorch was tested with ROCm 3.0; however, you will need to apply this patch: pytorch@3a7ecd3.

@sebpuetz
Author

Tried the following, but it didn't solve the issue:

git merge 3a7ecd3
USE_ROCM=1 USE_LMDB=1 USE_OPENCV=1 MAX_JOBS=14 python setup.py bdist_wheel
pip install dist/torch-1.4.0a0+97e1b3c-cp37-cp37m-linux_x86_64.whl --force
python -c "import torch"

I didn't clean up the build directory before trying this, so I'm currently recompiling from a clean state. I'll comment again if the problem persists.

@sebpuetz
Author

sebpuetz commented Dec 20, 2019

Same error:

$ python -c "import torch"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/seb/.pyenv/versions/3.7.3/lib/python3.7/site-packages/torch/__init__.py", line 81, in <module>
    from torch._C import *
ImportError: libhc_am.so.3: cannot open shared object file: No such file or directory

@bcahlit

bcahlit commented Dec 21, 2019

I have the same problem after upgrading Ubuntu.

I tried merging 3a7ecd3, but the error is still there.

There are libhc_am.so files in my ROCm installation:

(base) ➜ pytorch git:(master) ✗ find /opt/rocm |grep libhc_am
/opt/rocm/hcc/lib/libhc_am.so.2
/opt/rocm/hcc/lib/libhc_am.so.2.10
/opt/rocm/hcc/lib/libhc_am.so
/opt/rocm/lib/libhc_am.so.2
/opt/rocm/lib/libhc_am.so.2.10
/opt/rocm/lib/libhc_am.so
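The listing above only contains libhc_am.so.2 and libhc_am.so.2.10, while the ImportError asks for libhc_am.so.3. A minimal sketch of the same check in Python, assuming you just want to compare the requested soname against what a directory contains. The function names are hypothetical, and the directory you pass in is up to you (for example one of the paths shown by find).

```python
import os
import re
from typing import List


def available_sonames(lib_dir: str, stem: str) -> List[str]:
    """Return the sorted file names in lib_dir of the form '<stem>.so[.N[.M...]]'."""
    pattern = re.compile(re.escape(stem) + r"\.so(\.\d+)*$")
    return sorted(name for name in os.listdir(lib_dir) if pattern.match(name))


def has_soname(lib_dir: str, soname: str) -> bool:
    """True if a file (or symlink target) named exactly `soname` exists in lib_dir."""
    return os.path.exists(os.path.join(lib_dir, soname))
```

For the situation in this thread, `has_soname("/opt/rocm/lib", "libhc_am.so.3")` would be expected to return False as long as only the version 2 files are installed. This only inspects a single directory; it does not replicate the dynamic linker's full search order.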

@se11en

se11en commented Dec 22, 2019

I had the same problem, too.
But after I built torchvision, the problem was gone.
Here is how I built torchvision:

git clone https://github.com/pytorch/vision
cd vision
python setup.py install
cd ..

After this, everything went fine.
Note that my OS is Ubuntu 16.04 and the ROCm version is 3.0, and I installed
rock-dkms rocm-dev rocm-libs miopen-hip hipsparse rocthrust hipcub rccl roctracer-dev before building and added an extra sed command:
sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/rccl/lib/cmake/rccl/rccl-config.cmake
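Since the package list turned out to matter, here is a minimal sketch for checking which of those packages are missing on a Debian or Ubuntu system. It assumes `dpkg-query` is on the PATH and relies on its `${Package}` and `${Status}` format fields; the function names and the `REQUIRED` list (copied from the comment above) are illustrative only.

```python
import shutil
import subprocess
from typing import Iterable, List, Set

# Package names copied from the comment above.
REQUIRED = [
    "rock-dkms", "rocm-dev", "rocm-libs", "miopen-hip", "hipsparse",
    "rocthrust", "hipcub", "rccl", "roctracer-dev",
]


def parse_installed(dpkg_output: str) -> Set[str]:
    """Parse lines of '<package> <status>' and keep the fully installed packages."""
    installed = set()
    for line in dpkg_output.splitlines():
        name, _, status = line.partition(" ")
        if status.strip() == "install ok installed":
            installed.add(name)
    return installed


def missing_packages(required: Iterable[str], installed: Set[str]) -> List[str]:
    """Return the required packages that are not installed, in the given order."""
    return [pkg for pkg in required if pkg not in installed]


def query_dpkg() -> str:
    """Return dpkg-query output, or an empty string if dpkg-query is unavailable."""
    if shutil.which("dpkg-query") is None:
        return ""
    result = subprocess.run(
        ["dpkg-query", "-W", "-f=${Package} ${Status}\\n"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout
```

On a machine with dpkg, `missing_packages(REQUIRED, parse_installed(query_dpkg()))` gives the names still to be installed. On systems without dpkg, `query_dpkg` returns an empty string, so every package is reported as missing.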

@sebpuetz
Author

Note that my OS is Ubuntu 16.04 and the ROCm version is 3.0, and I installed
rock-dkms rocm-dev rocm-libs miopen-hip hipsparse rocthrust hipcub rccl roctracer-dev before building and added an extra sed command

This was actually enough to resolve the issue. Running sudo apt install rock-dkms rocm-dev rocm-libs miopen-hip hipsparse rocthrust hipcub rccl roctracer-dev installed a few new packages that apparently weren't part of the requirements before?

Thanks @se11en for documenting this.

@sebpuetz
Author

sebpuetz commented Dec 22, 2019

Although this did fix the import error, there is now stack smashing going on:

import torch
torch.zeros(100).cuda()
*** stack smashing detected ***: <unknown> terminated
[1]    10291 abort (core dumped)  ipython

Any insights on that?
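Because the abort takes the whole ipython session down with it, it can help to run the failing snippet in a child interpreter and inspect how it died. A minimal sketch, assuming a POSIX system where a negative return code means the child was killed by a signal; `run_isolated` and `describe` are hypothetical helper names.

```python
import signal
import subprocess
import sys
from typing import Tuple


def run_isolated(code: str, timeout: float = 120.0) -> Tuple[int, str]:
    """Run `code` in a fresh Python interpreter and return (returncode, stderr)."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    return result.returncode, result.stderr


def describe(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable description."""
    if returncode < 0:
        try:
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            return "killed by signal %d" % -returncode
    return "exited with status %d" % returncode
```

For the case reported here, one would call `run_isolated("import torch; torch.zeros(100).cuda()")` and pass the return code to `describe`. The captured stderr should then contain the stack smashing message, which makes it easier to paste into a report.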

@bcahlit

bcahlit commented Dec 22, 2019

Thanks @se11en, I installed hcc and it works.

@sebpuetz I use an RX 580 and don't see that problem.

@sebpuetz
Author

I'm on Ubuntu 18.04 and Radeon VII

@se11en

se11en commented Dec 22, 2019

torch.zeros(100).cuda()

Maybe you can try rebuilding PyTorch after adding the extra packages.

>>> import torch
>>> torch.zeros(100).cuda()
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0.], device='cuda:0')

My GPU is a Radeon VII, too. I don't know whether your problem is related to the OS version; the kernel in Ubuntu 18.04 is Linux 5.0+, which is newer. Not every test passes for me, but my code runs well, so I am OK with the situation.

@iotamudelta

Upstream CI was updated to ROCm 3.1.1, and the compilation above was ifdef'd based on the ROCm version.
