Regression in ROCm 5.3 and newer for gfx1010 #2527
Comments
What's your motivation to use a newer ROCm? Do you expect better performance?
Well, for example to be able to use the official pytorch builds instead of using old nightlies or compiling from source.
Also, PyTorch deleted their rocm5.2 repo, so all that's available now is the broken 5.6. Edit: Never mind, I missed that the relevant repositories are specific to Python 3.10.
This is not an atomics issue for gfx1010 users:

```
grep flags /sys/class/kfd/kfd/topology/nodes/*/io_links/0/properties
flags 1
```

For gfx1010 users, my understanding is that there is no official target. pytorch/pytorch#103973 caught my eye (and likely that of other users with similar consumer GPUs) because it's the first issue stating that functionality was lost beyond torch 1.13 + rocm5.2 that has had a thorough looking at. In addition, the (greatly appreciated) work of @jeffdaily was the first step we've seen with regards to bumping up the usable version of PyTorch for us. I understand that the work required to isolate and undo whatever memory access changes took place between 5.2 and 5.3 is probably more than what it's worth, considering AMD's stance on maintaining compatibility for older GPUs, as well as the risk of breaking the actually supported gfx1030 GPUs. Therefore we've been left to fend for ourselves a little. That's how the issue got a little hijacked, apologies for the intrusion there. I would say that this issue would be the appropriate place for any continued conversation on the matter. 👍
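For anyone else checking their own link, the `flags` value from that file can be decoded in a few lines. A minimal sketch, assuming the bit meanings from the Linux kernel's `kfd_crat.h` (0x1 = link enabled, 0x4/0x8 = 32/64-bit PCIe atomics not supported):

```python
# Decode the "flags" value from /sys/class/kfd/kfd/topology/nodes/*/io_links/*/properties.
# Bit meanings assumed from the Linux kernel's kfd_crat.h:
CRAT_IOLINK_FLAGS_ENABLED = 0x1            # link is usable at all
CRAT_IOLINK_FLAGS_NO_ATOMICS_32_BIT = 0x4  # 32-bit PCIe atomics NOT supported
CRAT_IOLINK_FLAGS_NO_ATOMICS_64_BIT = 0x8  # 64-bit PCIe atomics NOT supported

def atomics_ok(flags: int) -> bool:
    """True when the link is enabled and neither 'no atomics' bit is set."""
    if not flags & CRAT_IOLINK_FLAGS_ENABLED:
        return False
    return not flags & (CRAT_IOLINK_FLAGS_NO_ATOMICS_32_BIT
                        | CRAT_IOLINK_FLAGS_NO_ATOMICS_64_BIT)

print(atomics_ok(1))   # "flags 1" as reported above: atomics are fine
print(atomics_ok(13))  # enabled link, but both no-atomics bits are set
```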
Have you tried to simulate gfx906, like:
Results below:
Stack trace from coredump here:
Thanks for trying. The next thing we can try is to build pytorch on rocm from source. Since you don't have the PCIe atomics issue, we will use the official pytorch repository. Here are the instructions:
(2) clone pytorch inside your docker container
(3) run the test again inside the pytorch folder.
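A rough sketch of steps (2) and (3), under the assumption that the container already has the ROCm toolchain installed; `PYTORCH_ROCM_ARCH` is the variable PyTorch's build uses to select gfx targets, and the final test command is a placeholder since the thread doesn't show the exact reproducer:

```shell
# (2) clone pytorch inside the ROCm docker container
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch

# Build for gfx1010 explicitly; the official wheels don't target it.
export PYTORCH_ROCM_ARCH=gfx1010
python3 tools/amd_build/build_amd.py   # hipify the CUDA sources for ROCm
python3 setup.py develop               # build and install in-place

# (3) re-run your failing reproducer from this folder (placeholder)
```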
I have compiled torch 2.2.0a0+git6849d75 and torchvision 0.17.0a0+4433680.
If a stack trace could be of use, please let me know and I'll figure out how to set up coredumpctl in the docker container.
Any news on this? Do you need more info?
I tried it also with the new ROCm 6.0; it doesn't really seem to change much. It works fine with an old nightly build of pytorch 2.0 compiled on ROCm 5.2, but crashes on the latest --pre pytorch.
I made this work with HSA_OVERRIDE_GFX_VERSION=9.4.0 (and I had to find this out purely by trial and error).
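The override is just an environment variable that the ROCm runtime reads at startup, so it has to be set before the process that initializes HIP launches. A sketch (the 9.4.0 value is from the trial and error above, not an officially documented mapping; 10.3.0 is the more common override for RDNA1 cards pretending to be gfx1030):

```shell
# Must be exported before any HIP/ROCm code runs in the process.
export HSA_OVERRIDE_GFX_VERSION=9.4.0
# Then launch the workload from the same shell, e.g.:
# python3 main.py
```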
I changed the gpu on my machine and cannot verify that. @kmsedu can you? Also, are you sure you were actually running the latest pytorch version? In automatic1111's webui there's a workaround that makes it default to an older pytorch version compiled on ROCm 5.2 on older Navi cards like the RX 5700 XT (I know that well because... I wrote that workaround).
Hey there. No, I'm not running on the latest version of pytorch.
Since gfx1010 is not in the supported gfx target list (https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html#supported-gpus), the latest versions may not work for your gpu.
Ok, then you have just the same problem. We know that anything compiled using ROCm 5.2 or older works just fine on that card.
I know. But it's still weird that a gpu which worked perfectly fine with pytorch compiled for an older ROCm version + HSA_OVERRIDE_GFX_VERSION=10.3.0 (even if not officially supported) suddenly stops working with everything compiled using something newer. This is also true for other software which relies on ROCm, like llama-cpp with hipBLAS support.
When you use
If you're on Debian 13 or Ubuntu 23.10 or later, use libhipblas-dev. The OS-provided packages have gfx1010 enabled.
Isn't that just the same as installing the rocm stack (or at least part of it)? It depends on rocBLAS, which depends on HIP, which depends on the HSA runtime, and so on.
HIP works fine on gfx1010. It's mainly just that the math libraries in AMD's binary packages are (mostly) not built for that architecture. When I packaged rocBLAS for Debian, I specified for it to be built for gfx1010. I also packaged the test suites for rocBLAS and hipBLAS, and ran them on both the RX 5700 XT and the Radeon W5700. All tests passed. Of course, nobody has packaged MIOpen for Debian yet, so while the OS packages should be sufficient for llama-cpp, they are not yet sufficient for something like PyTorch.
Last time I tried, I had a memory access error (just like with the newer pytorch versions) when trying to load a model in llama.cpp with both hipBLAS and CLBlast offloading, while the latter worked fine on Windows. I had the same problem on both Arch Linux and Ubuntu.
Anyway, I'm perfectly aware of that. You are right, that card was never supposed to work on ROCm in the first place, and it only works with some older pytorch builds thanks to a workaround. But after this reply on November 17th I was expecting to see something on this matter anyway.
To be clear, on Ubuntu were you using
A better path to getting gfx1010 enabled in PyTorch would be to build the ROCm math and AI libraries for gfx1010 (or gfx10-1-generic). That is probably not going to happen in AMD's official packages, but there are other groups building and distributing ROCm packages. I can't speak for other distributions, but I expect to have it enabled later this year on Debian. With that said, my work with Debian is strictly volunteer work (on top of my full-time job), so don't expect it to happen quickly.
Ok, now it's clearer. I was also thinking that the HSA override flag was needed for rocBLAS too, because I couldn't use it natively on gfx1010 since the libs for 1010 were missing from the official packages. I also just found this PR, merged just 5 days ago, which makes life a bit simpler for compiling the Tensile libs for 1010.
Now that ROCm 6.1 is out, I tried it with the latest pytorch nightly (which is still built with ROCm 6.0) and this is the error I get when trying to run ComfyUI:
Setting HSA_OVERRIDE_GFX_VERSION to 10.3.0 still doesn't work: it just maxes out the GPU clock and graphics pipeline but doesn't actually do anything.
Pytorch wasn't building because of ROCm/aotriton#18, so I hacked it out and made pytorch compile without aotriton. This actually works and I'm able to run ComfyUI! Notes:
I'm trying again with aotriton patched with the fix @xinyazhang suggested, to see if that helps with a more straightforward build and possibly with performance.
OK, after a lot of trial and error I've managed to get a consistent set of steps. Steps:
Caveats:
The key takeaway from this exercise is that I would have probably been better off had I not nuked my Windows install and just used DirectML. Needless to say, it was both educational and frustrating. Anyway, hopefully this should help others until AMD releases ROCm 6.2 with the fallback libraries in the official rocBLAS packages and in the docker images used to build pytorch, and pytorch themselves build wheels against those.
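One possible way to reproduce the "hacked it out" build without patching build files by hand (an assumption on my part; the author removed aotriton manually): on ROCm, aotriton is pulled in for the flash/memory-efficient attention kernels, which PyTorch's build system lets you switch off via environment variables:

```shell
# Assumption: disabling the attention kernels at build time avoids the
# aotriton dependency entirely on ROCm builds.
export USE_FLASH_ATTENTION=0
export USE_MEM_EFF_ATTENTION=0
export PYTORCH_ROCM_ARCH=gfx1010

# Then build as usual, e.g.:
# python3 setup.py develop
```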
That's amazing, could you share the pytorch files you've built? I'm trying to build on Debian Unstable, where the rocm libraries support gfx1010 by default, but I'm unable to build correctly (I've probably messed something up in the setup process of the pytorch packages).
I'll try to build a wheel from my pytorch setup.
@daniandtheweb here you go: https://drive.google.com/file/d/1Y2kQ3bnoihs892tHOpXHkvfMQJH_gYa9/view?usp=drive_link
OK, more instability.
I suppose I'll have to build torchsde manually for that to work.
OK, can confirm that the
@Zakhrov try taking a look at this: https://lists.debian.org/debian-ai/2024/02/msg00164.html
As you might be aware from this documentation (https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html#supported-gpus), gfx1010 is not among the supported gfx architectures, and therefore the behavior is undefined. You can close this issue.
I'm well aware of the supported gfx architectures. What we are following is the advice given here: #1735 (comment)
Update: building rocSPARSE and comgr from source with @GZGavinZhao's patch available here: GZGavinZhao/ROCm-CompilerSupport@3419d51 got the SDE ksamplers to work in ComfyUI. Performance is still slower than with ROCm 5.2 (probably because of missing MIOpen and Composable Kernel).
Ok, in all seriousness, this issue should be resolved if ROCm/Tensile#1897 is cherry-picked into a release. I'll open an issue there (ROCm/Tensile#1916). @Zakhrov if you really want to test gfx1010 support, you can try on Solus (the distro that I'm a maintainer for) with Docker. Note that the docker image is experimental/community-maintained, so this shouldn't be used for anything serious, just for testing purposes:

```shell
# I personally use podman because I don't need to deal with sudo permission issues,
# but if you're more comfortable with Docker, replace `podman` with `sudo docker`
podman run -it --device=/dev/kfd --device=/dev/dri --group-add=video --group-add=render --group-add=nobody --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined silkeh/solus:devel bash
```

Inside the container, run the following to install PyTorch with ROCm support:

```shell
sudo eopkg ur && sudo eopkg up --ignore-comar -y # similar to sudo apt-get update && sudo apt-get upgrade
sudo eopkg it --ignore-comar pytorch python-torchaudio python-torchvision python-torchtext rocm-info -y
```

Now you can proceed to run ComfyUI as normal. We didn't test against ComfyUI, but we did test against Fooocus, so the process should be similar. I assume you would need to create a virtual environment:

```shell
# Before everything, check that ROCm is not completely broken
rocminfo
# Assume you're already inside the Fooocus directory
python3 -m venv venv --system-site-packages
source venv/bin/activate
# You may get warnings about dependencies failing to be uninstalled/updated, but in practice we haven't found this to be problematic
pip3 install -r requirements.txt
# Now run Fooocus!
```

If you're familiar with Nix, after NixOS/nixpkgs#298388 is merged, you should also be able to use that.
I'm currently on openSUSE 15.5 (which is compatible with packages built for SLES 15 SP5) and I run rocm on bare metal for the most part. Your patch works fine for everything except torchsde workloads - that needs
That's what I manually patched in to
I'm surprised that you need to rebuild
I assume that you know you can pass the CMake flag
Wait, @Zakhrov are you not on
I have a Radeon RX 5600M, which shows as gfx1010 in rocminfo. Maybe I built rocBLAS wrong the first time around.
Good news, ROCm/Tensile#1897 will be included in the ROCm 6.2 release, judging from the
Note that all the system packages on Ubuntu 24.04 have gfx1010 enabled. However, to use PyTorch you still need MIOpen. Once
@DGdev91 Has your issue been resolved? If so, please close the ticket. Thanks!
I can't really test it anymore, as I changed the gpu some months ago (I now have a 7900 XT, working fine). But according to @GZGavinZhao, the fix will be included in the 6.2 release, so I was waiting for someone to test that version as soon as it's released before closing.
Does anyone have an ETA for ROCm 6.2?
@Zakhrov So were you able to build it with gfx1010? I could not build it (MIOpen). Do I need CK to be already built?
There is already a PR going for that part: pytorch/pytorch#125230 (comment)
Ok, I actually managed to build the whole stack for gfx1010, with the latest pytorch and all the ROCm develop branches, so I thought I would post an update on that. Thanks a lot anyway.
@waheedi Amazing, would it be possible to upload the pytorch wheels somewhere?
@veyn3141 Amazing what, man :). The wheel on its own is not going to help, as it would be bundled with some libraries that you won't have with a standard rocm installation, so I don't think it would be of any help. But I also hit a blocker for building the last 200 tasks of torch, and right now I'm a bit stuck. #3445
Since pytorch 2 was officially released, I haven't been able to run it on my 5700 XT, while I was previously able to use it just fine on pytorch 1.13.1 by setting "export HSA_OVERRIDE_GFX_VERSION=10.3.0".
There are many users reporting the same issue on the 5000 series, for example:
AUTOMATIC1111/stable-diffusion-webui#6420
--precision full and --no-half are also needed because it seems the card can't use fp16 on linux/rocm, as already reported here: #1857
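For reference, a sketch of how those flags are typically passed with the stock webui launcher (assuming the standard `COMMANDLINE_ARGS` mechanism from webui-user.sh; adjust to however you start the webui):

```shell
# Full-precision fallback for cards where fp16 produces errors or garbage
# output on linux/rocm (see #1857).
export COMMANDLINE_ARGS="--precision full --no-half"
# Then launch as usual:
# ./webui.sh
```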
I also read about the PCIe atomics requirement, following this issue: pytorch/pytorch#103973
But that doesn't seem to be my case. The command "grep flags /sys/class/kfd/kfd/topology/nodes/*/io_links/0/properties" returns:
Also, I tried to compile pytorch using the new "-mprintf-kind=buffered" flag, but it didn't change anything.
Finally, I recently found out that pytorch 2 works just fine on gfx1010 if it's compiled with ROCm 5.2, as suggested here: pytorch/pytorch#106728