
ROCm 5.xx ever planning to include GFX1012 navi 14 RDNA GPUs? #1735

Open
darkar18 opened this issue May 6, 2022 · 44 comments

Comments

@darkar18

darkar18 commented May 6, 2022

I know there have been numerous issues opened over the last two years by people with Navi 14 (gfx1012) GPUs who are having trouble using GPU-accelerated ML frameworks. With all due respect, I believe it is high time the ROCm team worked on a build that makes high-performance computing available for the RX 5500M, RX 5500 XT, RX 5500, RX 5600 and RX 5600 XT GPUs.

ROCm 5.xx can be installed and built successfully according to the manuals, but what use is that if I can't use GPU acceleration in supported versions of PyTorch?
I believe the ROCm stack is a platform that was created to field AMD GPU cards in reply to Nvidia's dominance in this area.
It is therefore surprising to learn that the ROCm stack supports Nvidia GPUs in ML frameworks but not native AMD GPUs.
There are many users with RDNA GPUs, and it was not the right decision for the Radeon team to skip RDNA and jump straight to RDNA2/3 cards.
Our trust in you is greatly at stake!

@saadrahim
Member

I apologize to you in advance. There is no plan to officially add support for gfx1012 in ROCm.

The only option available for users with Navi 1x series GPUs is to build from source. You may get a working solution that way, but confirming that the numerical accuracy of the stack is sufficient is left to the end user.

@darkar18
Author

darkar18 commented Jun 2, 2022

Could you attach some manuals or references so that we can try it out ourselves?

@hanzy1110

Any guide/resource to build the stack from source?
Thanks!

@erkinalp

Certain parts build unmodified, others build with certain patches applied. See https://github.com/xuhuisheng/rocm-build/tree/master/navi14

@darkar18
Author

darkar18 commented Nov 14, 2022 via email

@serhii-nakon

serhii-nakon commented May 6, 2023

Hello, you can build the OpenCL backend for the CPU version of PyTorch from this repository: https://github.com/artyom-beilis/pytorch_dlprim . I have tested it and it works with my RX 5500M.

I am going to open a pull request to add a Dockerfile and a docker-compose.yaml to make it quicker to build and test.

@set-soft

I apologize to you in advance. There is no plan to officially add support for gfx1012 in ROCm.

Hi @saadrahim! I can understand that AMD doesn't have enough resources and/or interest to officially support these boards. But we all know that faking the card model makes them usable; the only big annoyance is the lack of a .kdb file with the precompiled kernels. This makes the start-up of things like the Stable Diffusion WebUI take around 3 minutes on my system. Perhaps some unofficial .kdb files could help people. Just a wish.

@saadrahim
Member

Can you let me know what .kdb file you are looking for? I am not familiar with what is missing that is causing your delays.

I was impressed to see the work behind https://github.com/xuhuisheng/rocm-build/tree/master/navi14.

@set-soft

Can you let me know what .kdb file you are looking for? I am not familiar with what is missing that is causing your delays.

The error message states that I need to install gfx1030_11.kdb, containing precompiled kernels.
For some reason the MIOpen library included with PyTorch 1.13.1 doesn't cache anything; I took a look at ~/.cache/miopen and found just an empty dir.
Not having the precompiled kernels and not having anything cached means that every time I start a fresh Docker container, the first run of Stable Diffusion WebUI takes 3 to 4 minutes to compile them. At least, that is what all the sources I found explain.
Of course I can't install gfx1030_11.kdb: it doesn't exist, and it is even a fake name. I'm forcing HSA_OVERRIDE_GFX_VERSION=10.3.0 to get unofficial support for Navi 14 (gfx1012). I also think this is not just a matter of names; I can't take the gfx1030_36 database and rename it to gfx1030_11.
This is why I'm asking if there is a chance to get unofficial (use at your own risk) databases for Navi 14.
Note: I'm not using PyTorch 2.x because I get a memory fault. I tried 2.0.0, 2.0.1 and a 2.1.0 nightly; all of them die running python micro_benchmarking_pytorch.py --network alexnet (same with Stable Diffusion, but the benchmark is easier to run on a fresh Docker image that is half finished).

I was impressed to see the work behind https://github.com/xuhuisheng/rocm-build/tree/master/navi14.

Thanks, I saw it. I haven't tried it yet because compiling the whole ROCm + PyTorch stack looks like an adventure to me. I need to free an unknown amount of disk space (if the official Docker images for ROCm + PyTorch take 29 GiB, I can't even imagine how much disk space I'll need for the code with debug symbols, repeated at least twice as objects and libs). I have only 16 GiB of RAM, so it will use swap a lot; I have 24 GiB of SSD swap, which I guess will do.

@amayra

amayra commented May 27, 2023

As the owner of an RX 5500 XT 8 GB: this is my first AMD GPU, and my last one too.

For Stable Diffusion and CUDA, I'm going with an NV GPU in my next upgrade.

I hope AMD thinks more about why it's always second place in the GPU market.

@set-soft

Hi @amayra !
I got the impression that ROCm is aimed at the data center, not the personal computer segment.
It also looks like AMD doesn't dedicate resources to making it usable on the desktop.
They could use volunteers by just giving away some hardware and dedicating a few people to coordinate enthusiasts, but they don't.
Big corporations usually fail at this; they only see the big numbers and don't realize that they need a wider target. After all, nobody knows whether those users will be the ones making the big buying decisions in the near future.
I also think they aren't paying attention to boards that would show poor cost-to-performance ratios, just because those ratios could be used against them. The RX 5500 XT (without half-precision floating-point support) may be such a case.

@amayra

amayra commented May 30, 2023

Hi @amayra ! I got the impression that ROCm is aimed at the data center target, not the personal computer segment. Also looks like AMD doesn't dedicate resources to make it usable for the desktop. They could use volunteers just giving away some hardware and also dedicate a few people to coordinate enthusiasts, but they don't. Big corporations usually fail at this, they only see the big numbers, and don't realize that they must have a wider target, after all nobody knows if those users will be the ones that will be making the big buying decisions in the near future. I also think that they aren't paying attention to boards that will give poor cost to performance ratios, just because these ratios could be used against them. And RX 5500 XT (without half precision floating point support) may be the case.

It's sad to see CUDA work for most Nvidia GPUs while even the RX 5700 XT is not supported here with ROCm.

@serhii-nakon

Hello, I have just installed Debian 12 and the https://packages.debian.org/bookworm/hipcc package, and now I can successfully use HIP inside Blender with my RX 5500M.

Screenshot from 2023-07-23 18-01-04
Screenshot from 2023-07-23 18-10-40

@serhii-nakon

Looks like Debian builds ROCm with RX 5500 support by default.

@darkar18
Author

darkar18 commented Jul 24, 2023 via email

@serhii-nakon

serhii-nakon commented Jul 24, 2023

You just need to install the https://packages.debian.org/bookworm/hipcc package after installing Debian; it should then pull in all the other packages as dependencies.

Also, it looks like Arch Linux provides ROCm with all cards enabled by default.
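A minimal sketch of those steps on Debian 12 (the hipconfig check is just one quick way to confirm the toolchain landed; the device-node names are the standard amdgpu ones):

```shell
# Debian 12 "bookworm": the OS-packaged HIP compiler pulls in the rest
# of the ROCm runtime as dependencies.
sudo apt install hipcc

# Sanity checks after installation:
hipconfig --version     # prints the HIP version if the toolchain is found
ls /dev/kfd /dev/dri    # compute and render device nodes must exist
```

Your user typically also needs to be in the video group (and on some setups the render group) to access those device nodes.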

@serhii-nakon

serhii-nakon commented Jul 28, 2023

A quick update: by default, ROCm only very partially supports HIP on gfx1012.
I just recompiled PyTorch with gfx1012 support, and now it requires files when running the mnist test, and those files do not exist for this card.

PS: Looks like this patch restores the required files: https://github.com/xuhuisheng/rocm-build/blob/master/patch/22.tensile-gfx1012-1.patch

Screenshot from 2023-07-28 17-32-34

@serhii-nakon

serhii-nakon commented Jul 28, 2023

Here are the docker-compose and Dockerfile that I used for testing: docker.zip
When I have enough time I will try https://github.com/xuhuisheng/rocm-build; it looks like it should work almost fine.

@serhii-nakon

PPS: I successfully completed the mnist test using the OpenCL backend from this project: https://github.com/artyom-beilis/pytorch_dlprim

Here is a Dockerfile with PyTorch and this backend:
docker_cl.zip

Screenshot from 2023-07-28 17-57-14

@serhii-nakon

serhii-nakon commented Aug 9, 2023

I successfully rebuilt ROCm 5.4.3 (rccl, rocsparse, rocblas, rocfft and rocrand; all other components default) and PyTorch 2.1, and completed the mnist test using HIP/ROCm.

PS: To build PyTorch you need at least 32 GB of RAM (or swap).

Screenshot from 2023-08-09 14-11-36

Docker_rocm543_Pytorch21git.zip
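For readers who want to reproduce such a build, the rough shape is sketched below. Assumptions: ROCm 5.4.3 is already installed and about 32 GB of RAM or swap is free. PYTORCH_ROCM_ARCH and tools/amd_build/build_amd.py are PyTorch's standard knobs for ROCm builds; the heavy steps are left commented because they take hours:

```shell
# Compile PyTorch's GPU kernels for Navi 14 only.
export PYTORCH_ROCM_ARCH=gfx1012
# Fewer parallel compile jobs keeps peak memory use down.
export MAX_JOBS=4

# Heavy steps, commented out here:
# git clone --recursive https://github.com/pytorch/pytorch
# cd pytorch
# python tools/amd_build/build_amd.py   # "hipify" the CUDA sources to HIP
# python setup.py bdist_wheel           # produces a .whl under dist/
```

Lowering MAX_JOBS trades build time for peak memory, which is what makes the build feasible on machines with less RAM plus swap.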

@set-soft

set-soft commented Aug 9, 2023

Hi @serhii-nakon !
Can you upload the Docker image to Docker Hub or the GitHub registry? It might save a lot of time for other people, especially those with less than 32 GiB of RAM.

@serhii-nakon

Possibly I will, but it needs a simple refactor first to minimize the size of the container.

@darkar18
Author

darkar18 commented Aug 9, 2023

@serhii-nakon thank you so much for your efforts! Is there any way I (with 8 GB of RAM on Ubuntu) can use ROCm?

@serhii-nakon

You can use the already pre-built Docker image, but you cannot build it with 8 GB of RAM.

@darkar18
Author

darkar18 commented Aug 9, 2023

Will future versions support building with 8 GB of RAM? Will you guide me if there's anything I can do?

@serhii-nakon

I mean that if I upload a Docker image with PyTorch and ROCm already pre-built, you can use it; the main issue is rebuilding PyTorch from source, because that uses 8-25 GB of RAM while building.

@serhii-nakon

I have uploaded the image. Please also test PyTorch Audio, because I did not have enough time to do it (I only compiled it).
https://hub.docker.com/r/serhiin/rocm_gfx1012_pytorch

@serhii-nakon

serhii-nakon commented Aug 11, 2023

PS: Sorry for my mistake. I added only one tag, and it is not latest, which is why you could not pull it (I have updated the description with the correct instructions). Please use the tag ubuntu2004_rocm543_pytorch21, or in full: serhiin/rocm_gfx1012_pytorch:ubuntu2004_rocm543_pytorch21

@serhii-nakon

I have tested it with diffusers and it works, but it sometimes runs out of memory due to the 4 GB of VRAM.

@megumintyan

@serhii-nakon can you post the PyTorch .whl files?

@serhii-nakon

serhii-nakon commented Aug 22, 2023

@megumintyan You can extract them from the Docker container. But you also need to rebuild some parts of ROCm (the Docker image already has them).

Better to use Docker, where all the parts are already configured and just work.

@megumintyan

@serhii-nakon I can't find them inside the container. Also, it takes up 70 GB.

@serhii-nakon

serhii-nakon commented Aug 22, 2023

@megumintyan My mistake, I did not build the .whl files.

@serhii-nakon

serhii-nakon commented Sep 1, 2023

I uploaded a build based on Ubuntu 22.04 and minimized the container/image size (10 GB uncompressed, 2 GB compressed).
https://hub.docker.com/r/serhiin/rocm_gfx1012_pytorch/tags

@kernel2008

kernel2008 commented Oct 1, 2023

I uploaded build with Ubuntu 22.04 and minimized container/image size (10GB uncompressed and 2GB compressed size) https://hub.docker.com/r/serhiin/rocm_gfx1012_pytorch/tags

@serhii-nakon Thanks. When using the rocm_gfx1012_pytorch image on an AMD Radeon Pro W5500, the GPU device cannot be used by torch:

  • sudo rocminfo
    Agent 2 Name: gfx1012 Uuid: GPU-XX Marketing Name: AMD Radeon Pro W5500
  • rocm-smi
    ======================= ROCm System Management Interface =======================
    ================================= Concise Info =================================
    GPU  Temp (DieEdge)  AvgPwr  SCLK  MCLK    Fan     Perf  PwrCap  VRAM%  GPU%
    0    36.0c           3.0W    0Mhz  100Mhz  24.71%  auto  105.0W  3%     0%
    ================================================================================
    ============================= End of ROCm SMI Log ==============================
  • pytorch
    >>> import torch
    >>> torch.cuda.is_available()
    False

@serhii-nakon

I uploaded build with Ubuntu 22.04 and minimized container/image size (10GB uncompressed and 2GB compressed size) https://hub.docker.com/r/serhiin/rocm_gfx1012_pytorch/tags

@serhii-nakon Thanks. When using the rocm_gfx1012_pytorch image on an AMD Radeon Pro W5500, the GPU device cannot be used by torch:

  • sudo rocminfo
    Agent 2 Name: gfx1012 Uuid: GPU-XX Marketing Name: AMD Radeon Pro W5500
  • rocm-smi
    ======================= ROCm System Management Interface =======================
    ================================= Concise Info =================================
    GPU  Temp (DieEdge)  AvgPwr  SCLK  MCLK    Fan     Perf  PwrCap  VRAM%  GPU%
    0    36.0c           3.0W    0Mhz  100Mhz  24.71%  auto  105.0W  3%     0%
    ================================================================================
    ============================= End of ROCm SMI Log ==============================
  • pytorch
    >>> import torch
    >>> torch.cuda.is_available()
    False

Please check the permissions inside the container, as described on the DockerHub page. Also make sure that you provided the /dev/dri/* and /dev/kfd devices.

@serhii-nakon

Possibly you need to upgrade or downgrade the kernel or firmware (I use Linux 6.4 and the latest AMD firmware). Also make sure that your CPU supports PCIe atomics (I know they are required).

@kernel2008

Possibly you need to upgrade or downgrade the kernel or firmware (I use Linux 6.4 and the latest AMD firmware). Also make sure that your CPU supports PCIe atomics (I know they are required).

@serhii-nakon Thanks a lot! I can run the torch mnist example on the Radeon Pro W5500, but the Debian 12 kernel crashes intermittently during execution (python mnist/main.py or other apps). The crashes also occur when I set multi-user.target mode.

  • hardware
    AMD Ryzen 5 2600 + ASUS B450M + AMD Radeon Pro W5500
  • uname -a
    Linux debma12 6.1.0-12-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.52-1 (2023-09-07) x86_64 GNU/Linux
  • blender use amd gpu for render ok
  • docker run -it --rm --device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined --group-add video --entrypoint /bin/bash serhiin/rocm_gfx1012_pytorch:ubuntu2204_rocm543_pytorch21
    >>> import torch
    >>> torch.cuda.is_available()
    True

@serhii-nakon

@kernel2008 If you have multiple cards, that can cause crashes.

@fir3-1ce

fir3-1ce commented Jan 9, 2024

Has this been fixed yet? I have Navi 14 and rocminfo doesn't show any errors, so does that mean it works? OpenCL is still broken on Ubuntu; I'm trying to weigh my options here.

@cgmb
Collaborator

cgmb commented Feb 14, 2024

Navi 14 is not supported by AMD's official packages, but it is enabled by default in the OS packages for ROCm provided on Debian 13 and Ubuntu 23.10 and later. However, not all libraries provided by ROCm have been packaged in this way. The libraries available are sufficient to run AI tools like llama-cpp on Navi 14 hardware, but not PyTorch.
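As a concrete example of the above, building llama-cpp against the OS-packaged ROCm libraries looks roughly like this. This is a sketch: the Debian package names and the cmake option are assumptions to verify against the current llama.cpp docs, since the HIP flag has been renamed across releases:

```shell
# Debian 13 / Ubuntu 23.10+: ROCm libraries come from the OS archive,
# with Navi 14 (gfx1012) enabled by default.
sudo apt install hipcc librocblas-dev cmake git

# Build llama.cpp with its HIP backend (older releases used
# -DLLAMA_HIPBLAS=ON, newer ones -DGGML_HIP=ON):
git clone https://github.com/ggerganov/llama.cpp
cmake -B build -S llama.cpp -DGGML_HIP=ON
cmake --build build -j

# Offload layers to the GPU at run time with -ngl, e.g.:
# ./build/bin/llama-cli -m model.gguf -ngl 32
```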

@serhii-nakon

serhii-nakon commented Mar 28, 2024

@cgmb Can you tell me how long the ROCm team supports each card? For example, how long will the RX 7900 XTX be supported?

PS: I want to buy one, but I'm not sure, because I worry that the RX 7900 XTX might run into the same problems as the RX 5000 series.

@cgmb
Collaborator

cgmb commented Apr 2, 2024

@cgmb Can you provide how long ROCm team support every card? For example how long RX7900XTX will be supported?

Unfortunately, I do not know that myself.

PS: I want to buy this, but not sure, because worry does RX7900XTX not cause the same problem like with RX5000

Well, there are two things that are different:

  1. The RX 7900 XTX has official support for ROCm from AMD, which is something that the RX 5000 series never had. When considering Navi 31 support, I think Navi 21 is a better comparison than Navi 10, 12 or 14.

  2. There is a much larger community using ROCm these days. I'm hopeful this will help to prevent regressions beyond AMD's official support cycles. That is partly why I've been helping to build the Debian ROCm Team's CI system.

Speaking of which, I recently added a Radeon Pro W5700 (gfx1010) worker to the Debian ROCm CI. The results on that worker should be mostly representative of all RDNA 1 hardware, but I do have a Radeon Pro W5500 (gfx1012) that I would like to add. However, I need to either figure out how to get PCIe passthrough working with the W5500 or buy a server dedicated to testing that GPU.

If anyone in the community knows what tricks are needed to get PCIe passthrough working for the W5500, W5700, or MI60 on Debian 12, that may help me to reuse an existing server for testing gfx1012.

@serhii-nakon

serhii-nakon commented Apr 2, 2024

@cgmb Hello, thank you very much for your answer. I thought it had official support in the past, but it looks like it did not. It now makes more sense why there is no support.
