
ROCm 5.xx ever planning to include GFX1012 navi 14 RDNA GPUs? #1735

Open
darkar18 opened this issue May 6, 2022 · 44 comments

Comments

@darkar18

darkar18 commented May 6, 2022

I know there have been numerous issues opened over the last two years by people with Navi 14 (gfx1012) GPUs who are having trouble using GPU-accelerated ML frameworks. With all due respect, I believe it is high time the ROCm team worked on a build that makes high-performance computing available for the RX 5500M, RX 5500 XT, RX 5500, RX 5600 and RX 5600 XT GPUs.

ROCm 5.xx can be installed and built successfully according to the manuals, but what use is that if I can't use GPU acceleration in supported versions of PyTorch?
I believe the ROCm stack is a platform that was created to field AMD GPU cards in reply to Nvidia's dominance in this area.
It is therefore surprising to learn that the ROCm stack supports Nvidia GPUs in ML frameworks but not native AMD GPUs.
There are many users with RDNA GPUs, and it was not the right decision for the Radeon team to skip RDNA and jump straight to RDNA2/3 cards.
Our trust in you is greatly at stake!

@saadrahim
Member

I apologize to you in advance. There is no plan to officially add support for gfx1012 in ROCm.

The only option available for users with Navi 1x series GPUs is to build from source. You may get a working solution that way, but confirming that the numerical accuracy of the stack is sufficient is left to the end user.

@darkar18
Author

darkar18 commented Jun 2, 2022

Could you attach some manuals or references so that we can try it out ourselves?

@hanzy1110

Any guide/resource to build the stack from source?
Thanks!

@erkinalp

Certain parts build unmodified, others build with certain patches applied. See https://github.com/xuhuisheng/rocm-build/tree/master/navi14

@darkar18
Author

darkar18 commented Nov 14, 2022 via email

@serhii-nakon

serhii-nakon commented May 6, 2023

Hello, you can build the OpenCL backend for the CPU version of PyTorch from this repository: https://github.com/artyom-beilis/pytorch_dlprim . I have tested it and it works with my RX 5500M.

I am going to open a pull request to add a Dockerfile and a docker-compose.yaml to make it quicker to build and test.

@set-soft

I apologize to you in advance. There is no plan to officially add support for gfx1012 in ROCm.

Hi @saadrahim! I can understand that AMD doesn't have enough resources and/or interest to officially support these boards. But we all know that faking the card model makes them usable; the only big annoyance is the lack of a .kdb file with the precompiled kernels. This makes the start-up of things like the Stable Diffusion WebUI take around 3 minutes on my system. Perhaps some unofficial .kdb files could help people. Just a wish.

@saadrahim
Member

Can you let me know what .kdb file you are looking for? I am not familiar with what is missing that is causing your delays.

I was impressed to see the work behind https://github.com/xuhuisheng/rocm-build/tree/master/navi14.

@set-soft

Can you let me know what .kdb file you are looking for? I am not familiar with what is missing that is causing your delays.

The error message states that I need to install gfx1030_11.kdb, containing precompiled kernels.
For some reason the MIOpen library included with PyTorch 1.13.1 doesn't cache anything; I took a look at ~/.cache/miopen and found just an empty dir.
Not having the precompiled kernels and not having anything cached means that every time I start a fresh Docker container, the first run of Stable Diffusion WebUI takes 3 to 4 minutes to compile them. At least, that is what all the sources I found explain.
Of course I can't install gfx1030_11.kdb: it doesn't exist, and it is even a fake name. I'm forcing HSA_OVERRIDE_GFX_VERSION=10.3.0 to get unofficial support for Navi 14 (gfx1012). I also think this is not just a matter of names; I can't take the gfx1030_36 database and rename it to gfx1030_11.
This is why I'm asking if there is a chance to get unofficial (use at your own risk) databases for Navi 14.
Note: I'm not using PyTorch 2.x because I get a memory fault. I tried 2.0.0, 2.0.1 and a 2.1.0 nightly; all of them die running python micro_benchmarking_pytorch.py --network alexnet (same with Stable Diffusion, but the benchmark is easier to run on a fresh Docker image that is half finished).

I was impressed to see the work behind https://github.com/xuhuisheng/rocm-build/tree/master/navi14.

Thanks, I saw it. I haven't tried it yet because compiling the whole ROCm + PyTorch stack looks like an adventure to me. I need to free an unknown amount of disk space (if the official Docker images for ROCm + PyTorch take 29 GiB, I can't even imagine how much disk space I'll need for the code with debug symbols, repeated at least twice as objects and libs). I have only 16 GiB of RAM, so it will use swap a lot; I have 24 GiB of SSD swap, which I guess will do.

@amayra

amayra commented May 27, 2023

As the owner of an RX 5500 XT 8 GB: this is my first AMD GPU, and my last one too.

For Stable Diffusion and CUDA, I'm going with an NV GPU in my next upgrade.

I hope AMD thinks more about why it's always second place in the GPU market.

@set-soft

Hi @amayra !
I got the impression that ROCm is aimed at the data center, not the personal computer segment.
It also looks like AMD doesn't dedicate resources to making it usable on the desktop.
They could use volunteers by just giving away some hardware and dedicating a few people to coordinate enthusiasts, but they don't.
Big corporations usually fail at this; they only see the big numbers and don't realize that they need a wider target. After all, nobody knows whether those users will be the ones making the big buying decisions in the near future.
I also think they aren't paying attention to boards that would show poor cost-to-performance ratios, just because those ratios could be used against them. The RX 5500 XT (without half-precision floating-point support) may be such a case.

@amayra

amayra commented May 30, 2023

Hi @amayra ! I got the impression that ROCm is aimed at the data center target, not the personal computer segment. Also looks like AMD doesn't dedicate resources to make it usable for the desktop. They could use volunteers just giving away some hardware and also dedicate a few people to coordinate enthusiasts, but they don't. Big corporations usually fail at this, they only see the big numbers, and don't realize that they must have a wider target, after all nobody knows if those users will be the ones that will be making the big buying decisions in the near future. I also think that they aren't paying attention to boards that will give poor cost to performance ratios, just because these ratios could be used against them. And RX 5500 XT (without half precision floating point support) may be the case.

It's sad to see CUDA work for most Nvidia GPUs while even the RX 5700 XT is not supported here with ROCm.

@serhii-nakon

Hello, I have just installed Debian 12 and the https://packages.debian.org/bookworm/hipcc package, and now I can successfully use HIP inside Blender with my RX 5500M.

Screenshot from 2023-07-23 18-01-04
Screenshot from 2023-07-23 18-10-40

@serhii-nakon

Looks like Debian builds ROCm with RX 5500 support by default.

@darkar18
Author

darkar18 commented Jul 24, 2023 via email

@serhii-nakon

serhii-nakon commented Jul 24, 2023

You just need to install the https://packages.debian.org/bookworm/hipcc package after installing Debian; it should then pull in all the other packages as dependencies.

Also, it looks like Arch Linux provides ROCm with all cards enabled by default.
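A minimal sketch of those steps on Debian 12 (the hipconfig check is just one quick way to confirm the toolchain landed; the device-node names are the standard amdgpu ones):

```shell
# Debian 12 "bookworm": the OS-packaged HIP compiler pulls in the rest
# of the ROCm runtime as dependencies.
sudo apt install hipcc

# Sanity checks after installation:
hipconfig --version     # prints the HIP version if the toolchain is found
ls /dev/kfd /dev/dri    # compute and render device nodes must exist
```

Your user typically also needs to be in the video group (and on some setups the render group) to access those device nodes.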

@serhii-nakon

serhii-nakon commented Jul 28, 2023

A quick update: by default, ROCm only very partially supports HIP on gfx1012.
I just recompiled PyTorch with gfx1012 support, and now it requires files when running the mnist test, and those files do not exist for this card.

PS: Looks like this patch restores the required files: https://github.com/xuhuisheng/rocm-build/blob/master/patch/22.tensile-gfx1012-1.patch

Screenshot from 2023-07-28 17-32-34

@serhii-nakon

serhii-nakon commented Jul 28, 2023

Here are the docker-compose and Dockerfile that I used for testing: docker.zip
When I have enough time I will try https://github.com/xuhuisheng/rocm-build; it looks like it should work almost fine.

@serhii-nakon

PPS: I successfully completed the mnist test using the OpenCL backend from this project: https://github.com/artyom-beilis/pytorch_dlprim

Here is a Dockerfile with PyTorch and this backend:
docker_cl.zip

Screenshot from 2023-07-28 17-57-14

@serhii-nakon

serhii-nakon commented Aug 9, 2023

I successfully rebuilt ROCm 5.4.3 (rccl, rocsparse, rocblas, rocfft and rocrand; all other components default) and PyTorch 2.1, and completed the mnist test using HIP/ROCm.

PS: To build PyTorch you need at least 32 GB of RAM (or swap).

Screenshot from 2023-08-09 14-11-36

Docker_rocm543_Pytorch21git.zip
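For readers who want to reproduce such a build, the rough shape is sketched below. Assumptions: ROCm 5.4.3 is already installed and about 32 GB of RAM or swap is free. PYTORCH_ROCM_ARCH and tools/amd_build/build_amd.py are PyTorch's standard knobs for ROCm builds; the heavy steps are left commented because they take hours:

```shell
# Compile PyTorch's GPU kernels for Navi 14 only.
export PYTORCH_ROCM_ARCH=gfx1012
# Fewer parallel compile jobs keeps peak memory use down.
export MAX_JOBS=4

# Heavy steps, commented out here:
# git clone --recursive https://github.com/pytorch/pytorch
# cd pytorch
# python tools/amd_build/build_amd.py   # "hipify" the CUDA sources to HIP
# python setup.py bdist_wheel           # produces a .whl under dist/
```

Lowering MAX_JOBS trades build time for peak memory, which is what makes the build feasible on machines with less RAM plus swap.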

@set-soft

set-soft commented Aug 9, 2023

Hi @serhii-nakon !
Can you upload the Docker image to Docker Hub or the GitHub registry? It might save a lot of time for other people, especially those with less than 32 GiB of RAM.

@serhii-nakon

Possibly I will, but it needs a simple refactor first to minimize the size of the container.

@darkar18
Author

darkar18 commented Aug 9, 2023

@serhii-nakon thank you so much for your efforts! Is there any way I (with 8 GB of RAM on Ubuntu) can use ROCm?

@serhii-nakon

You can use the already pre-built Docker image, but you cannot build it with 8 GB of RAM.

@darkar18
Author

darkar18 commented Aug 9, 2023

Will future versions support building with 8 GB of RAM? Will you guide me if there's anything I can do?

@serhii-nakon

I mean that if I upload a Docker image with PyTorch and ROCm already pre-built, you can use it; the main issue is rebuilding PyTorch from source, because that uses 8-25 GB of RAM while building.

@serhii-nakon

I have uploaded the image. Please also test PyTorch Audio, because I did not have enough time to do it (I only compiled it).
https://hub.docker.com/r/serhiin/rocm_gfx1012_pytorch

@serhii-nakon

serhii-nakon commented Aug 11, 2023

PS: Sorry for my mistake. I added only one tag, and it is not latest, which is why you could not pull it (I have updated the description with the correct instructions). Please use the tag ubuntu2004_rocm543_pytorch21, or in full: serhiin/rocm_gfx1012_pytorch:ubuntu2004_rocm543_pytorch21

@serhii-nakon

I have tested it with diffusers and it works, but it sometimes runs out of memory due to the 4 GB of VRAM.

@megumintyan

@serhii-nakon can you post the PyTorch .whl files?

@serhii-nakon

serhii-nakon commented Aug 22, 2023

@megumintyan You can extract them from the Docker container. But you also need to rebuild some parts of ROCm (the Docker image already has them).

Better to use Docker, where all the parts are already configured and just work.

@megumintyan

@serhii-nakon I can't find them inside the container. Also, it takes up 70 GB.

@serhii-nakon

serhii-nakon commented Aug 22, 2023

@megumintyan My mistake, I did not build the .whl files.

@serhii-nakon

serhii-nakon commented Sep 1, 2023

I uploaded a build based on Ubuntu 22.04 and minimized the container/image size (10 GB uncompressed, 2 GB compressed).
https://hub.docker.com/r/serhiin/rocm_gfx1012_pytorch/tags

@kernel2008

kernel2008 commented Oct 1, 2023

I uploaded build with Ubuntu 22.04 and minimized container/image size (10GB uncompressed and 2GB compressed size) https://hub.docker.com/r/serhiin/rocm_gfx1012_pytorch/tags

@serhii-nakon Thanks. When using the rocm_gfx1012_pytorch image on an AMD Radeon Pro W5500, the GPU device cannot be used by torch:

  • sudo rocminfo
    Agent 2 Name: gfx1012 Uuid: GPU-XX Marketing Name: AMD Radeon Pro W5500
  • rocm-smi
    ======================= ROCm System Management Interface =======================
    ================================= Concise Info =================================
    GPU  Temp (DieEdge)  AvgPwr  SCLK  MCLK    Fan     Perf  PwrCap  VRAM%  GPU%
    0    36.0c           3.0W    0Mhz  100Mhz  24.71%  auto  105.0W  3%     0%
    ================================================================================
    ============================= End of ROCm SMI Log ==============================
  • pytorch
    >>> import torch
    >>> torch.cuda.is_available()
    False

@serhii-nakon

I uploaded build with Ubuntu 22.04 and minimized container/image size (10GB uncompressed and 2GB compressed size) https://hub.docker.com/r/serhiin/rocm_gfx1012_pytorch/tags

@serhii-nakon Thanks. When using the rocm_gfx1012_pytorch image on an AMD Radeon Pro W5500, the GPU device cannot be used by torch:

  • sudo rocminfo
    Agent 2 Name: gfx1012 Uuid: GPU-XX Marketing Name: AMD Radeon Pro W5500
  • rocm-smi
    ======================= ROCm System Management Interface =======================
    ================================= Concise Info =================================
    GPU  Temp (DieEdge)  AvgPwr  SCLK  MCLK    Fan     Perf  PwrCap  VRAM%  GPU%
    0    36.0c           3.0W    0Mhz  100Mhz  24.71%  auto  105.0W  3%     0%
    ================================================================================
    ============================= End of ROCm SMI Log ==============================
  • pytorch
    >>> import torch
    >>> torch.cuda.is_available()
    False

Please check the permissions inside the container, as described on the DockerHub page. Also make sure that you provided the /dev/dri/* and /dev/kfd devices.

@serhii-nakon

Possibly you need to upgrade or downgrade the kernel or firmware (I use Linux 6.4 and the latest AMD firmware). Also make sure that your CPU supports PCIe atomics (I know they are required).

@kernel2008

Possibly you need to upgrade or downgrade the kernel or firmware (I use Linux 6.4 and the latest AMD firmware). Also make sure that your CPU supports PCIe atomics (I know they are required).

@serhii-nakon Thanks a lot! I can run the torch mnist example on the Radeon Pro W5500, but the Debian 12 kernel crashes intermittently during execution (python mnist/main.py or other apps). The crashes also occur when I set multi-user.target mode.

  • hardware
    AMD Ryzen 5 2600 + ASUS B450M + AMD Radeon Pro W5500
  • uname -a
    Linux debma12 6.1.0-12-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.52-1 (2023-09-07) x86_64 GNU/Linux
  • blender use amd gpu for render ok
  • docker run -it --rm --device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined --group-add video --entrypoint /bin/bash serhiin/rocm_gfx1012_pytorch:ubuntu2204_rocm543_pytorch21
    >>> import torch
    >>> torch.cuda.is_available()
    True

@serhii-nakon

@kernel2008 If you have multiple cards, that can cause crashes.

@fir3-1ce

fir3-1ce commented Jan 9, 2024

Has this been fixed yet? I have Navi 14 and rocminfo doesn't show any errors, so does that mean it works? OpenCL is still broken on Ubuntu; I'm trying to weigh my options here.

@cgmb
Collaborator

cgmb commented Feb 14, 2024

Navi 14 is not supported by AMD's official packages, but it is enabled by default in the OS packages for ROCm provided on Debian 13 and Ubuntu 23.10 and later. However, not all libraries provided by ROCm have been packaged in this way. The libraries available are sufficient to run AI tools like llama-cpp on Navi 14 hardware, but not PyTorch.
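As a concrete example of the above, building llama-cpp against the OS-packaged ROCm libraries looks roughly like this. This is a sketch: the Debian package names and the cmake option are assumptions to verify against the current llama.cpp docs, since the HIP flag has been renamed across releases:

```shell
# Debian 13 / Ubuntu 23.10+: ROCm libraries come from the OS archive,
# with Navi 14 (gfx1012) enabled by default.
sudo apt install hipcc librocblas-dev cmake git

# Build llama.cpp with its HIP backend (older releases used
# -DLLAMA_HIPBLAS=ON, newer ones -DGGML_HIP=ON):
git clone https://github.com/ggerganov/llama.cpp
cmake -B build -S llama.cpp -DGGML_HIP=ON
cmake --build build -j

# Offload layers to the GPU at run time with -ngl, e.g.:
# ./build/bin/llama-cli -m model.gguf -ngl 32
```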

@serhii-nakon

serhii-nakon commented Mar 28, 2024

@cgmb Can you tell me how long the ROCm team supports each card? For example, how long will the RX 7900 XTX be supported?

PS: I want to buy one, but I'm not sure, because I worry that the RX 7900 XTX might run into the same problems as the RX 5000 series.

@cgmb
Collaborator

cgmb commented Apr 2, 2024

@cgmb Can you provide how long ROCm team support every card? For example how long RX7900XTX will be supported?

Unfortunately, I do not know that myself.

PS: I want to buy this, but not sure, because worry does RX7900XTX not cause the same problem like with RX5000

Well, there are two things that are different:

  1. The RX 7900 XTX has official support for ROCm from AMD, which is something that the RX 5000 series never had. When considering Navi 31 support, I think Navi 21 is a better comparison than Navi 10, 12 or 14.

  2. There is a much larger community using ROCm these days. I'm hopeful this will help to prevent regressions beyond AMD's official support cycles. That is partly why I've been helping to build the Debian ROCm Team's CI system.

Speaking of which, I recently added a Radeon Pro W5700 (gfx1010) worker to the Debian ROCm CI. The results on that worker should be mostly representative of all RDNA 1 hardware, but I do have a Radeon Pro W5500 (gfx1012) that I would like to add. However, I need to either figure out how to get PCIe passthrough working with the W5500 or buy a server dedicated to testing that GPU.

If anyone in the community knows what tricks are needed to get PCIe passthrough working for the W5500, W5700, or MI60 on Debian 12, that may help me to reuse an existing server for testing gfx1012.

@serhii-nakon

serhii-nakon commented Apr 2, 2024

@cgmb Hello, thank you very much for your answer. I thought it had official support in the past, but it looks like it did not. It now makes more sense why there is no support.
