
ROCm 5.xx ever planning to include gfx90c GPUs? #1743

Closed
shridharkini6 opened this issue May 23, 2022 · 30 comments
@shridharkini6

Hi,
The official Docker images for PyTorch and TensorFlow are available only for gfx900 (Vega10-type GPUs: MI25, Vega 56, Vega 64), gfx906 (Vega20-type GPUs: MI50, MI60), gfx908 (MI100), gfx90a (MI200), and gfx1030 (Navi21).

When is gfx90c support expected?
Thanks

@Bengt

Bengt commented May 27, 2022

Hi, @shridharkini6!

Thanks for your request. Since I am not an employee at AMD, I have no insight into what is planned there internally. However, at least some amount of library coverage seems to be a prerequisite for extending the Docker images to this class of GPUs, which are integrated into the CPU (an "APU" in AMD's lingo), and I do not see any support for gfx90c as a TARGET in any of the public libraries. See my pull request for an attempt at a complete overview of the state of library support. PyTorch uses RCCL and MIOpen to run on ROCm, and so does TensorFlow. MIOpen in turn uses rocBLAS as its backend. For the available TARGETs, see the CMakeLists.txt of rocBLAS and the CMakeLists.txt of RCCL, respectively. As you can see, there is no support for gfx90c, and in fact none for any other APU.

This aligns with what can be gathered from public sources, namely that AMD is focusing on the products which hyperscalers and supercomputer customers are currently buying. I personally think this is fair enough, as those customers seem to be rather feature-sensitive. Starting from those high-profile customers, consider the following leaky pipe of support:

  • Enterprise
    ("Instinct"-branded products intended for hyperscalers and supercomputer customers, usually sold in servers or racks)
  • Professional
    ("Radeon PRO"-branded products intended for CAD and such use cases, usually sold in workstations)
  • Desktop
    ("Radeon"-branded products intended for demanding users like gamers and video editors, sold as dGPU components or pre-built systems)
  • APUs
    ("Ryzen with Radeon Graphics"-branded products intended for lighter workloads like office PCs and thin/light laptops)

Things might change a bit with the Ryzen 7000 line of desktop processors, which are announced to include a chiplet-ish GPU in the IO die. Such an arrangement does not currently fit into this leaky support pipe, but I would also not hold my breath for any kind of revolution. My bet would be on support gradually improving, as it has (not without setbacks) in the past.
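To make the library-coverage point concrete, here is a minimal, illustrative Python sketch. The target set below only approximates rocBLAS's AMDGPU_TARGETS around the ROCm 5.x era (the CMakeLists.txt mentioned above is the authoritative list), and the helper name is mine:

```python
# Illustrative only: this set approximates rocBLAS's AMDGPU_TARGETS around
# ROCm 5.x; consult the project's CMakeLists.txt for the authoritative list.
ROCBLAS_TARGETS = {"gfx900", "gfx906", "gfx908", "gfx90a", "gfx1030"}

def has_library_support(gfx_target: str) -> bool:
    """Check whether a gfx target appears in the (approximate) target set."""
    return gfx_target in ROCBLAS_TARGETS

print(has_library_support("gfx900"))  # True
print(has_library_support("gfx90c"))  # False: no APU target is listed
```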

@ffleader1

I do not think supporting an APU is AMD's top priority when even Navi 22 and Navi 23 are not supported. Also, AMD pulled the plug on supporting APUs a long time ago. So quite frankly, to answer your question, it is... never.

@AGenchev

AGenchev commented May 31, 2022

@ffleader1 That is not a very clever move by AMD, because they have nothing positioned against Nvidia's Jetson line of hardware. So we buy Nvidia APUs even though they are not very FOSS-friendly.

@langyuxf

langyuxf commented Jun 8, 2022

Here is a workaround to run PyTorch on gfx90c: build PyTorch for gfx900 and override gfx90c to gfx900 at runtime.

Build PyTorch:
$ git clone https://github.com/pytorch/pytorch.git  
$ cd pytorch  
$ git submodule update --init --recursive
$ sudo pip3 install -r requirements.txt
$ sudo pip3 install enum34 numpy pyyaml setuptools typing cffi future hypothesis typing_extensions
$ sudo python3 tools/amd_build/build_amd.py
$ sudo PYTORCH_ROCM_ARCH=gfx900 USE_ROCM=1 MAX_JOBS=4 python3 setup.py install

Run an example
$ git clone https://github.com/pytorch/examples.git
$ cd examples/mnist
$ sudo pip3 install -r requirements.txt
$ sudo HSA_OVERRIDE_GFX_VERSION=9.0.0 python3 main.py
...
Train Epoch: 14 [51200/60000 (85%)]     Loss: 0.027863
Train Epoch: 14 [51840/60000 (86%)]     Loss: 0.017484
Train Epoch: 14 [52480/60000 (87%)]     Loss: 0.021983
Train Epoch: 14 [53120/60000 (88%)]     Loss: 0.003217
Train Epoch: 14 [53760/60000 (90%)]     Loss: 0.011038
Train Epoch: 14 [54400/60000 (91%)]     Loss: 0.007962
Train Epoch: 14 [55040/60000 (92%)]     Loss: 0.018526
Train Epoch: 14 [55680/60000 (93%)]     Loss: 0.001039
Train Epoch: 14 [56320/60000 (94%)]     Loss: 0.017513
Train Epoch: 14 [56960/60000 (95%)]     Loss: 0.028949
Train Epoch: 14 [57600/60000 (96%)]     Loss: 0.028286
Train Epoch: 14 [58240/60000 (97%)]     Loss: 0.064388
Train Epoch: 14 [58880/60000 (98%)]     Loss: 0.002042
Train Epoch: 14 [59520/60000 (99%)]     Loss: 0.002829

Test set: Average loss: 0.0280, Accuracy: 9921/10000 (99%)

Notes:
1. Disable some power features for gfx90c:
sudo modprobe amdgpu ppfeaturemask=0xfff73fff
2. ROCm: https://docs.amd.com/bundle/ROCm-Downloads-Guide-v5.0/page/ROCm_Installation.html
3. PyTorch: branch master, commit 815532d40c25e81d8c09b3c36403016bea394aee
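For context on why HSA_OVERRIDE_GFX_VERSION=9.0.0 maps to gfx900: the variable encodes a target as major.minor.stepping, and the gfx names pack the same three numbers, with the last two characters read as single hex digits. A small Python sketch under that common reading of the naming scheme (the helper names are mine):

```python
def parse_gfx_target(name: str) -> tuple:
    """Split a name like 'gfx90c' into (major, minor, stepping).

    The last two characters are single hex digits (minor and stepping);
    whatever sits between 'gfx' and them is the decimal major version.
    """
    digits = name.removeprefix("gfx")
    return int(digits[:-2]), int(digits[-2], 16), int(digits[-1], 16)

def override_string(major: int, minor: int, stepping: int) -> str:
    """Format a target triple as an HSA_OVERRIDE_GFX_VERSION value."""
    return f"{major}.{minor}.{stepping}"

print(parse_gfx_target("gfx90c"))    # (9, 0, 12)
print(override_string(9, 0, 0))      # '9.0.0'  -> report the GPU as gfx900
print(override_string(10, 3, 0))     # '10.3.0' -> report the GPU as gfx1030
```

So overriding to 9.0.0 makes the runtime treat the gfx90c APU as the supported gfx900 target, which works here because the two share an ISA.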

@langyuxf

langyuxf commented Jun 9, 2022

You can also use the PyTorch Docker image on gfx90c. Just run it like this. @shridharkini6

$ git clone https://github.com/pytorch/examples.git
$ cd examples/mnist
$ pip3 install -r requirements.txt
$ HSA_OVERRIDE_GFX_VERSION=9.0.0 python3 main.py

Note:
Your video memory should be at least 2GB.

@ffleader1

Maybe you have not tried it, but do you think your method will work with unsupported GPUs, like gfx1031 for example?

@langyuxf

langyuxf commented Jun 9, 2022

You may try; run it like this:

$ HSA_OVERRIDE_GFX_VERSION=10.3.0 python3 main.py

@ffleader1

ffleader1 commented Jun 9, 2022

Wait, I am a bit confused. Maybe I am missing something, but your example is about running a PyTorch example, right? How do you get ROCm to install on gfx90c or gfx1031 in the first place?
Thank you.

@langyuxf

langyuxf commented Jun 9, 2022

1. Docker with PyTorch and ROCm installed: https://docs.amd.com/bundle/AMD-Deep-Learning-Guide-v5.1.3/page/Deep_Learning_Frameworks.html
2. ROCm installation guide: https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.0/page/Overview_of_ROCm_Installation_Methods.html

@ffleader1

I have not tried Docker, but for ROCm I am pretty sure the install will only succeed if your GPU is supported, i.e. the ROCm installation will not work on a gfx1031 or lower.

@shridharkini6
Author

@xfyucg I followed your method, but it looks to me like training is using only the CPU, not the GPU.

import torch
t = torch.tensor([5, 5, 5], dtype=torch.int64, device='cuda')

throws an error like

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.7/site-packages/torch/cuda/__init__.py", line 216, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

Thanks

@langyuxf

langyuxf commented Jun 13, 2022

Run it like this; it works well on my Cezanne platform:

lang@lang-test:~/Videos/pytorch$ HSA_OVERRIDE_GFX_VERSION=9.0.0 python3
Python 3.8.10 (default, Mar 15 2022, 12:22:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> t = torch.tensor([5, 5, 5], dtype=torch.int64, device='cuda')
>>> 

@shridharkini6
Author

I tried this as well and ended up with the same error.

@langyuxf

@shridharkini6 Can you paste the output of $ rocminfo here?

@shridharkini6
Author

Here is the rocminfo output:

ROCk module is loaded

HSA System Attributes

Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE

==========
HSA Agents


Agent 1


Name: AMD Ryzen 7 4700U with Radeon Graphics
Uuid: CPU-XX
Marketing Name: AMD Ryzen 7 4700U with Radeon Graphics
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2000
BDFID: 0
Internal Node ID: 0
Compute Unit: 8
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 7612028(0x74267c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 7612028(0x74267c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 7612028(0x74267c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:


Agent 2


Name: gfx90c
Uuid: GPU-XX
Marketing Name:
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 1024(0x400) KB
Chip ID: 5686(0x1636)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1600
BDFID: 1024
Internal Node ID: 1
Compute Unit: 7
SIMDs per CU: 4
Shader Engines: 1
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 524288(0x80000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx90c:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
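As an aside, the ISA line in output like the above is the easiest place to read the target off programmatically, e.g. to decide which HSA_OVERRIDE_GFX_VERSION value to try. A hypothetical helper (names mine), assuming the rocminfo output has been captured as text:

```python
import re

def gfx_targets(rocminfo_text: str) -> list:
    """Extract gfx targets from rocminfo ISA lines such as
    'Name: amdgcn-amd-amdhsa--gfx90c:xnack-'."""
    return re.findall(r"amdgcn-amd-amdhsa--(gfx[0-9a-f]+)", rocminfo_text)

sample = "Name: amdgcn-amd-amdhsa--gfx90c:xnack-"
print(gfx_targets(sample))  # ['gfx90c']
```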

@langyuxf

@shridharkini6 Are you using Docker? If yes, try starting your container like this:

sudo docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G rocm/pytorch:latest

@shridharkini6
Author

I have tried the same; I used the rocm/pytorch:latest-base Docker image.

@langyuxf

langyuxf commented Jun 15, 2022

According to https://docs.amd.com/bundle/AMD-Deep-Learning-Guide-v5.1.3/page/Deep_Learning_Frameworks.html, Option 3 (Install PyTorch Using PyTorch ROCm Base Docker Image):

docker pull rocm/pytorch:latest-base

NOTE: This downloads the base container, which does not contain PyTorch.

So please use rocm/pytorch:latest:

docker pull rocm/pytorch:latest

docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G rocm/pytorch:latest

sudo modprobe amdgpu ppfeaturemask=0xfff73fff

HSA_OVERRIDE_GFX_VERSION=9.0.0 python3

@ffleader1

ffleader1 commented Jun 15, 2022

His hardware is not supported, and neither is yours, I think. APUs in general do not work. Docker won't change unsatisfied hardware prerequisites.

@langyuxf

No, gfx90c uses the same ISA as gfx900, so for gfx90c you just override it to gfx900; that actually works.
He used rocm/pytorch:latest-base, so he had to build PyTorch for ROCm himself.

@shridharkini6
Author

@xfyucg I have followed all the procedures you suggested, i.e. used rocm/pytorch:latest-base and compiled PyTorch from source, but I get the same error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.7/site-packages/torch/cuda/__init__.py", line 216, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

@langyuxf

Maybe some environment issue; that is hard to debug, and building PyTorch yourself is error-prone.
Why not use rocm/pytorch:latest? It is simpler and also the recommended way.

@shridharkini6
Author

@xfyucg Yes, I tried with rocm/pytorch:latest as well; it throws similar errors. I suspect it could be an issue with the base libraries, as @Bengt mentioned.

@langyuxf

No. If you install and start the Docker container (rocm/pytorch:latest) correctly, you will get an error like the following:

root@0f962c3a9d38:/var/lib/jenkins# python3
Python 3.7.13 (default, Mar 29 2022, 02:18:16)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
"hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"
Aborted (core dumped)
root@0f962c3a9d38:~#

After overriding gfx90c to gfx900:

root@0f962c3a9d38:/var/lib/jenkins# HSA_OVERRIDE_GFX_VERSION=9.0.0 python3
Python 3.7.13 (default, Mar 29 2022, 02:18:16)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>>

@langyuxf

langyuxf commented Jun 22, 2022

Make sure the amdgpu kernel-mode driver is installed. If you use a generic kernel on Ubuntu 20.04, install the amdgpu kernel-mode driver as follows:

sudo apt-get update
wget https://repo.radeon.com/amdgpu-install/22.10.3/ubuntu/focal/amdgpu-install_22.10.3.50103-1_all.deb 
sudo apt-get install ./amdgpu-install_22.10.3.50103-1_all.deb

amdgpu-install --usecase=dkms

@lucasew

lucasew commented Dec 28, 2022

Try updating your system's kernel to a version newer than 6.0 and run the commands with the following environment variable set:

HSA_OVERRIDE_GFX_VERSION=9.0.0

You can use export HSA_OVERRIDE_GFX_VERSION=9.0.0 in the shell you are running the commands in to propagate the environment variable to child processes. That is what allowed the rocm/pytorch container to not crash on import or on simple tensor operations like torch.tensor([[1,2],[3,4]]).to(torch.device('cuda')).

I tested this on NixOS, branch 22.11, kernel 6.0.13 and latest rocm/pytorch container with a Ryzen 5600G.
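The same override can also be applied from inside a Python script instead of the shell, as long as it happens before torch is imported (the runtime appears to read the variable once at initialization, so setting it later has no effect). A minimal sketch:

```python
import os

# Equivalent to `export HSA_OVERRIDE_GFX_VERSION=9.0.0` for this one process.
# Set it before importing torch; the HSA runtime reads it during startup.
os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "9.0.0")

# import torch                      # only after the override is in place
# torch.tensor([[1, 2], [3, 4]]).to(torch.device('cuda'))
print(os.environ["HSA_OVERRIDE_GFX_VERSION"])
```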

@jithunnair-amd
Contributor

CC @hongxiayang

@abhimeda
Collaborator

@shridharkini6 Hi, is your issue resolved on the latest ROCm? If so, can we close this ticket?

@nixrunner

Is this still applicable to the latest ROCm?

@ppanchad-amd

@shridharkini6 Unfortunately your APU (gfx90c) is not currently supported in the latest ROCm. Thanks!

@ppanchad-amd closed this as not planned on Jun 17, 2024.