RuntimeError: HIP error when running ResNet-50 on AMD PRO W7900 GPU with PyTorch #1398

Closed
liangyong928 opened this issue Apr 23, 2024 · 4 comments


@liangyong928

The following code runs normally on an AMD PRO W7900 GPU:

import torch
device = torch.device("cuda")
x = torch.randn(128,10,224,224).to(device)
model = torch.nn.Conv2d(10, 64, 5).to(device)
output = model(x)
print(output.device)

However, when running the code below, I encounter an error:

import torch
import torchvision.models as models
from torchvision.models import ResNet50_Weights
x_large = torch.randn(128, 3, 224, 224)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
#device = torch.device("cpu")
weights = ResNet50_Weights.IMAGENET1K_V1
model = models.resnet50(weights=weights).to(device)
model.eval()
x_large = x_large.to(device)
output = model(x_large)
print(output.device)

The error message is as follows:

Traceback (most recent call last):
  File "/root/test/testnew2.py", line 11, in <module>
    output = model(x_large)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torchvision/models/resnet.py", line 285, in forward
    return self._forward_impl(x)
  File "/usr/local/lib/python3.10/dist-packages/torchvision/models/resnet.py", line 278, in _forward_impl
    x = self.avgpool(x)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/pooling.py", line 1194, in forward
    return F.adaptive_avg_pool2d(input, self.output_size)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py", line 1228, in adaptive_avg_pool2d
    return torch._C._nn.adaptive_avg_pool2d(input, _output_size)
RuntimeError: HIP error: the operation cannot be performed in the present state
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.

Why does the first block of code run without any issues while the second block throws an error when using the AMD PRO W7900 GPU for computation? I would appreciate any insights or suggestions for resolving this issue.
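
A minimal sketch that isolates the operation named in the traceback (assuming ResNet-50's standard [N, 2048, 7, 7] feature map for 224x224 inputs); if this also fails, the problem is in the pooling kernel rather than in the model code:

import torch
import torch.nn.functional as F

device = torch.device("cuda")
# ResNet-50 reduces a 224x224 input to a [N, 2048, 7, 7] feature map,
# which is the tensor shape reaching the avgpool layer in the traceback.
feat = torch.randn(128, 2048, 7, 7, device=device)
out = F.adaptive_avg_pool2d(feat, (1, 1))
torch.cuda.synchronize()  # force any asynchronous HIP error to surface here
print(out.shape, out.device)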

@briansp2020

What versions of ROCm and PyTorch are you using? It helps to provide as much information as possible when asking for help.
I just tried it with ROCm 6.1 and a PyTorch nightly build on a 7900XTX, and it seems fine...
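
A minimal sketch (assuming a ROCm build of PyTorch, where torch.version.hip is set) that prints the relevant version information:

import torch

# Report the PyTorch build, the HIP runtime it was built against,
# and the GPU it sees.
print("torch:", torch.__version__)
print("HIP:", torch.version.hip)  # None on non-ROCm builds
print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))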

@liangyong928 (Author)

@briansp2020
I installed PyTorch following the "Install PyTorch via PIP" section of https://rocm.docs.amd.com/projects/radeon/en/latest/docs/install/install-pytorch.html with these commands:

wget https://repo.radeon.com/rocm/manylinux/rocm-rel-6.0.2/torch-2.1.2+rocm6.0-cp310-cp310-linux_x86_64.whl
wget https://repo.radeon.com/rocm/manylinux/rocm-rel-6.0.2/torchvision-0.16.1+rocm6.0-cp310-cp310-linux_x86_64.whl
pip3 install --force-reinstall torch-2.1.2+rocm6.0-cp310-cp310-linux_x86_64.whl torchvision-0.16.1+rocm6.0-cp310-cp310-linux_x86_64.whl

The versions of Python and PyTorch are as follows:

root@yong-System:/home/yong# python3
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'2.1.2+rocm6.0'

For ROCm, I followed the "Option A: Graphics usecase" section of https://rocm.docs.amd.com/projects/radeon/en/latest/docs/install/install-radeon.html# and ran:

sudo apt update
wget https://repo.radeon.com/amdgpu-install/23.40.2/ubuntu/jammy/amdgpu-install_6.0.60002-1_all.deb
sudo apt install ./amdgpu-install_6.0.60002-1_all.deb
sudo amdgpu-install -y --usecase=graphics,rocm
sudo usermod -a -G render,video $LOGNAME

After installation, the ROCm version is 6.0.2:

root@yong-System:/home/yong# ls /opt/rocm
rocm/       rocm-6.0.2/ 
root@yong-System:/home/yong# ls /opt/rocm-6.0.2/
amdgcn  bin  bin.bak  include  lib  libexec  llvm  share

@briansp2020

I'd try the newly released PyTorch 2.3:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0

or the nightly version

pip3 install --pre --force-reinstall torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.0
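
After reinstalling, a quick check along these lines (a minimal sketch; weights=None just skips the ImageNet weight download, and the small batch keeps it fast) would confirm the new build runs the avgpool path that previously failed:

import torch
import torchvision.models as models

# Confirm the upgraded build and the HIP runtime it was built against.
print(torch.__version__, torch.version.hip)

device = torch.device("cuda")
# weights=None avoids downloading the ImageNet checkpoint; random weights
# are enough to exercise the convolution and adaptive_avg_pool2d kernels.
model = models.resnet50(weights=None).to(device).eval()
x = torch.randn(8, 3, 224, 224, device=device)
with torch.no_grad():
    out = model(x)
torch.cuda.synchronize()
print(out.shape, out.device)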

@liangyong928 (Author)

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0

Thanks, the issue has been resolved.
