ROCm 5.xx ever planning to include GFX1012 navi 14 RDNA GPUs? #1735
Comments
I apologize to you in advance: there is no plan to officially add support for gfx1012 in ROCm. The only option available for users with Navi 1x series GPUs is to build from source. You may get a working solution that way, but confirming that the numerical accuracy of the stack is sufficient is left to the end user. |
Could you attach some manuals or references so that we can try it out ourselves? |
Any guide/resource to build the stack from source? |
Certain parts build unmodified, others build with certain patches applied. See https://github.com/xuhuisheng/rocm-build/tree/master/navi14 |
Hey, thanks for the resources! Have you tested it? How about performance and compatibility? |
Hello, you can build the OpenCL backend for the CPU version of PyTorch from this repository: https://github.com/artyom-beilis/pytorch_dlprim. I have tested it and it works with my RX5500M. I am going to open a pull request to add a Dockerfile and docker-compose.yaml to make it quicker to build and test. |
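For readers who want to try this route, a minimal sketch of what using the pytorch_dlprim backend from Python can look like once the library has been built. The library path and device string below are assumptions that depend on the pytorch_dlprim version; check its README for the exact names.

```python
# Hypothetical usage sketch for the pytorch_dlprim OpenCL backend.
# The .so path is a placeholder for wherever you built the library, and the
# "privateuseone" device key is an assumption -- newer versions of the project
# may expose a differently named device.
import torch

torch.ops.load_library("/opt/pytorch_dlprim/build/libpt_ocl.so")  # placeholder path

dev = torch.device("privateuseone:0")
x = torch.randn(1024, 1024, device=dev)
y = torch.randn(1024, 1024, device=dev)
print((x @ y).sum().item())  # if this prints a number, the backend is working
```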
Hi @saadrahim! I can understand that AMD doesn't have enough resources and/or interest in officially supporting these boards. But we all know that faking the card model makes them usable; the only big annoyance is the lack of a .kdb file with the precompiled kernels. This makes the start-up of things like the Stable Diffusion WebUI take around 3 minutes on my system. Perhaps some unofficial .kdb files could help people. Just a wish. |
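For context, the card-model faking mentioned above is usually done with the HSA_OVERRIDE_GFX_VERSION environment variable, which makes the ROCm runtime treat a gfx1012 card as gfx1030. A minimal sketch follows; as noted earlier in the thread, numerical correctness on spoofed hardware is not guaranteed, so validate your workload.

```python
# Sketch: present a gfx1012 (Navi 14) card to ROCm as gfx1030 so stock ROCm
# builds of PyTorch will use it. The override must be set before the HIP
# runtime initializes, i.e. before importing torch. Results on unsupported
# hardware are not guaranteed to be correct.
import os

os.environ["HSA_OVERRIDE_GFX_VERSION"] = "10.3.0"  # pretend to be gfx1030

import torch

print(torch.cuda.is_available())       # ROCm devices are exposed via the CUDA API
print(torch.cuda.get_device_name(0))   # should report the Navi 14 board
x = torch.ones(4, device="cuda")
print((x * 2).cpu())
```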
Can you let me know what .kdb file you are looking for? I am not familiar with what is missing that is causing your delays. I was impressed to see the work behind https://github.com/xuhuisheng/rocm-build/tree/master/navi14. |
The error message states that I need to install gfx1030_11.kdb, containing precompiled kernels.
Thanks, I saw it. I didn't try it yet because compiling the whole of ROCm + PyTorch looks like an adventure to me; I need to free an unknown amount of disk space (if the official Docker images for ROCm + PyTorch take 29 GiB, I can't even imagine how much space I'll need for the code with debug symbols, duplicated at least twice as objects and libs). I have only 16 GiB of RAM and it will use swap a lot; I have 24 GiB of SSD swap, which I guess will do. |
As the owner of an RX 5500 XT 8 GB, this is my first AMD GPU and my last one too. For Stable Diffusion and CUDA I'm going with an NVIDIA GPU in my next upgrade. I hope AMD thinks more about why it's always in second place in the GPU market.
Hi @amayra ! |
It's sad to see CUDA work for most NVIDIA GPUs while even the RX 5700 XT is not supported here with ROCm.
Hello, I have just installed Debian 12 and the https://packages.debian.org/bookworm/hipcc package, and I can now successfully use HIP inside Blender with my RX5500M.
Looks like Debian built ROCm with RX5500 support by default.
Thanks a lot for sharing. So does this mean I need to install ROCm 5.x separately after the Debian installation, or does it work out of the box? |
You just need to install the https://packages.debian.org/bookworm/hipcc package after installing Debian; it should pull in all the other packages as dependencies. It also looks like Arch Linux provides ROCm for all cards by default. |
A small update: by default, ROCm only very partially supports HIP on GFX1012. PS: Looks like this patch restores the required files: https://github.com/xuhuisheng/rocm-build/blob/master/patch/22.tensile-gfx1012-1.patch |
Here are the docker-compose and Dockerfile that I used for testing: docker.zip |
PPS: I successfully completed the MNIST test using the OpenCL backend from this project: https://github.com/artyom-beilis/pytorch_dlprim. Here is a Dockerfile with PyTorch and this backend. |
I successfully rebuilt the ROCm 5.4.3 rccl, rocsparse, rocblas, rocfft, and rocrand components (all other components are stock), built PyTorch 2.1, and completed the MNIST test using HIP/ROCm. PS: To build PyTorch you need at least 32 GB of RAM (or swap). |
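For anyone wanting a quick check along the lines of the MNIST run mentioned above without downloading the dataset, here is a self-contained smoke test on random data; it assumes a ROCm build of PyTorch in which the GPU is exposed through the cuda device.

```python
# Self-contained smoke test for a ROCm/HIP build of PyTorch. Random data
# stands in for MNIST so nothing needs to be downloaded; finishing without a
# HIP error is the point of the exercise.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
print("running on:", device)

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
).to(device)

opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    x = torch.randn(64, 1, 28, 28, device=device)    # fake MNIST batch
    y = torch.randint(0, 10, (64,), device=device)   # fake labels
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

print("final loss:", loss.item())
```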
Hi @serhii-nakon ! |
Possibly I will do it, but it needs a simple refactor to minimize the size of the container. |
@serhii-nakon thank you so much for your efforts! Is there any way I (with 8 GB of RAM on Ubuntu) can use ROCm? |
You can use an already pre-built Docker image, but you cannot build one with 8 GB of RAM. |
Will future versions support building with 8 GB of RAM? Will you guide me if there's anything that I can do? |
I mean that if I upload a Docker image with PyTorch and ROCm already pre-built, you can use it; the main issue is rebuilding PyTorch from source, because that uses 8-25 GB of RAM while building. |
I have uploaded this image. Please also test PyTorch Audio, because I did not have enough time to do it (I only compiled it). |
PS: Sorry for my mistake; I added only one tag and it is none. |
I have tested it with diffusers and it works, but it sometimes runs out of memory due to the 4 GB of VRAM. |
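For cards with only 4 GB of VRAM, the usual diffusers memory-saving options (half precision and attention slicing) can reduce the out-of-memory errors. A sketch, with the model id left as a placeholder for whichever checkpoint you actually use:

```python
# Sketch of memory-saving settings for diffusers on a 4 GB card.
# The model id is a placeholder -- substitute the checkpoint you use.
import torch
from diffusers import StableDiffusionPipeline

model_id = "path-or-hub-id-of-your-sd-checkpoint"  # placeholder

pipe = StableDiffusionPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # halves weight and activation memory
)
pipe = pipe.to("cuda")           # ROCm GPUs are exposed via the cuda device
pipe.enable_attention_slicing()  # trades some speed for much lower peak VRAM

image = pipe("a photo of an astronaut riding a horse",
             num_inference_steps=20).images[0]
image.save("out.png")
```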
@serhii-nakon can you post pytorch .whl files? |
@megumintyan You can extract them from the Docker container. But you also need to rebuild some parts of ROCm (the Docker image already has them). It's better to use Docker for this, where all the parts are already configured and it just works. |
@serhii-nakon I can't find them inside the container. Also, it takes up 70 GB. |
@megumintyan My mistake, I did not build the .whl files. |
I uploaded a build on Ubuntu 22.04 and minimized the container/image size (10 GB uncompressed, 2 GB compressed). |
@serhii-nakon thanks. When using the rocm_gfx1012_pytorch image on an AMD Radeon Pro W5500 device, the GPU device cannot be used by torch.
|
Please check the permissions inside the container, as described on the DockerHub page. Also make sure that you provided the /dev/dri/* and /dev/kfd devices. |
Possibly you need to upgrade or downgrade the kernel or firmware (I use Linux 6.4 and the latest AMD firmware). Also make sure that your CPU supports PCIe atomics (I know they are required). |
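The device-node and permission checks described in the two comments above can be scripted. A small host-side sketch, assuming the standard ROCm device nodes (/dev/kfd and /dev/dri/renderD*) and the usual render/video group setup:

```python
# Quick host-side check that the ROCm device nodes exist and are accessible.
# The same nodes must be passed into the container (e.g. via --device or the
# devices: section of docker-compose).
import glob
import grp
import os

nodes = ["/dev/kfd"] + sorted(glob.glob("/dev/dri/renderD*"))
for node in nodes:
    if not os.path.exists(node):
        print(f"{node}: missing")
    elif os.access(node, os.R_OK | os.W_OK):
        print(f"{node}: ok")
    else:
        print(f"{node}: exists, but not readable/writable by this user")

# Membership in the render/video groups is what usually grants access.
user_groups = {grp.getgrgid(g).gr_name for g in os.getgroups()}
print("in render group:", "render" in user_groups)
print("in video group:", "video" in user_groups)
```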
@serhii-nakon Thanks a lot! I can run the torch MNIST example on a Radeon Pro W5500, but the Debian 12 kernel crashes intermittently during execution (python mnist/main.py or other apps). The crashes also occur when I set multi-user.target mode.
|
@kernel2008 If you have multiple cards, that can cause crashes. |
Has this been fixed yet? I have Navi14 but |
Navi 14 is not supported by AMD's official packages, but it is enabled by default in the OS packages for ROCm provided on Debian 13 and Ubuntu 23.10 and later. However, not all libraries provided by ROCm have been packaged in this way. The libraries available are sufficient to run AI tools like llama-cpp on Navi 14 hardware, but not PyTorch. |
@cgmb Can you tell me how long the ROCm team supports each card? For example, how long will the RX 7900 XTX be supported? PS: I want to buy one, but I'm not sure, because I worry that the RX 7900 XTX will cause the same problems as the RX 5000 series. |
Unfortunately, I do not know that myself.
Well, there are two things that are different:
Speaking of which, I recently added a Radeon Pro W5700 (gfx1010) worker to the Debian ROCm CI. The results on that worker should be mostly representative of all RDNA 1 hardware, but I do have a Radeon Pro W5500 (gfx1012) that I would like to add. However, I need to either figure out how to get PCIe passthrough working with the W5500 or buy a server dedicated to testing that GPU. If anyone in the community knows what tricks are needed to get PCIe passthrough working for the W5500, W5700, or MI60 on Debian 12, that may help me to reuse an existing server for testing gfx1012. |
@cgmb Hello, thank you very much for your answer. I thought that it had official support in the past, but it looks like it did not. Now it makes more sense why there is no support for now. |
I know there have been numerous issues opened over the last two years where people with Navi 14 (gfx1012) architectures have trouble using GPU-accelerated ML frameworks. With all respect, I believe it is high time the ROCm team worked on a build that makes high-performance computing available for the RX 5500M, RX 5500 XT, RX 5500, RX 5600, and RX 5600 XT GPUs.
ROCm 5.xx has been successfully installed and built according to the manuals, but what use is that if I can't use GPU acceleration on supported versions of PyTorch with it?
The ROCm stack, I believe, is a platform that was created to bring AMD GPU cards forward as an answer to NVIDIA's dominance in this field.
It is hence very surprising to me that the ROCm stack supports NVIDIA GPUs in ML frameworks but not native AMD GPUs.
There are many users with RDNA GPUs, and it was not the right decision for the Radeon team to skip RDNA and jump straight to RDNA 2/3 cards.
Our trust in you is greatly at stake!!!