[Bug]: Seg Fault with ROCM 7900 XT #14763
Comments
Your torch ROCm version seems to be quite old; try updating to at least 5.7. I have a 7900 XTX, though I still haven't managed to get it working.
First of all, use the latest ROCm version. I suggest using the official amdgpu installer tool and following the official instructions: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html Since you just changed your GPU, try deleting your venv folder so that all the packages are downloaded again. Also, make sure there isn't any customization in webui-user.sh (for example, the HSA_OVERRIDE flag). You don't need it anymore.
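A quick way to check for leftover overrides is to scan the launch settings file (`webui-user.sh` in a standard checkout) for active lines that mention `HSA_OVERRIDE`. The following is only a minimal sketch: `find_hsa_overrides` is a hypothetical helper name, and the sample script contents are made up for illustration.

```python
def find_hsa_overrides(script_text):
    """Return (line_number, line) pairs for active lines mentioning HSA_OVERRIDE."""
    hits = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # commented-out lines (and the shebang) have no effect
        if "HSA_OVERRIDE" in stripped:
            hits.append((number, stripped))
    return hits


# Hypothetical file contents, not taken from the issue.
sample = """#!/bin/bash
# export HSA_OVERRIDE_GFX_VERSION=10.3.0
export HSA_OVERRIDE_GFX_VERSION=10.3.0
export COMMANDLINE_ARGS="--medvram"
"""

print(find_hsa_overrides(sample))
```

To check a real install, read the file first, for example with `pathlib.Path("webui-user.sh").read_text()`, and pass the result to the helper. The same variable can also be exported from a shell profile, which this scan would not see.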
I have solved this by updating the venv's torch and torchvision versions to the latest nightlies. I am also running the latest ROCm driver (6.0.2). There are multiple ways to update the versions, but the way I elected to do it is by updating my
Then I deleted the
Just to reiterate, I have stable-diffusion-webui working with my 7900 XT with little effort. The maintainers should be able to get this working by updating the install script only. If new torch or ROCm versions become available, you can view the available torch versions on the torch pip index: https://download.pytorch.org/whl/nightly/rocm6.0 (you can also replace rocm6.0 in that URL with a newer or older ROCm version to match your driver version)
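The URL substitution described above can be sketched as follows. This is one possible approach, not necessarily what the commenter did: `nightly_index_url` and `pip_upgrade_command` are hypothetical helper names, and the pip invocation is an assumption built from the standard `--pre`, `--upgrade`, and `--index-url` options.

```python
import sys

NIGHTLY_BASE = "https://download.pytorch.org/whl/nightly"


def nightly_index_url(rocm_version):
    """Build the nightly wheel index URL for a ROCm version such as '6.0'."""
    return f"{NIGHTLY_BASE}/rocm{rocm_version}"


def pip_upgrade_command(rocm_version, python=sys.executable):
    """Return the argv list for upgrading torch and torchvision from that index."""
    return [
        python, "-m", "pip", "install", "--pre", "--upgrade",
        "torch", "torchvision",
        "--index-url", nightly_index_url(rocm_version),
    ]


# Print the command instead of running it, so it can be reviewed first.
print(" ".join(pip_upgrade_command("6.0", python="python")))
```

Run the printed command with the venv's own interpreter (or with the venv activated) so the packages land in the webui environment rather than the system Python.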
I'm on RX 7800 XT, ROCm So using
For the 7900 XT and 7900 XTX, the HSA_OVERRIDE_GFX_VERSION flag isn't needed at all.
Hey @DGdev91, thanks for your reply :-) I already tried that, since I had read your previous comment as well, and it doesn't work. It causes the following error:
I only used 10.3.0 because it was recommended often and I didn't understand its meaning. As far as I understand now, 11.0.0 is the closest version to my card that's officially supported, right? I still have a memory leak: after a couple of runs my 32 GB of RAM is full, so I have to restart the program. But that's off-topic and I will search for related issues. Update: I fixed the memory leak by omitting the
Well, good to know then. ... There's also a patch, merged some weeks ago, which should in theory make the default config build the Tensile libs for many "not fully supported" archs, so that flag should no longer be needed in the next ROCm release. But for now, just keep it.
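To summarize the override discussion in code form: the 7900 XT and 7900 XTX run without the flag, while the RX 7800 XT commenter kept an override, with 11.0.0 being the value discussed. A minimal sketch, assuming a hypothetical helper `launch_env` that prepares the environment for a child process; the base environments shown are made up.

```python
import os

OVERRIDE_VAR = "HSA_OVERRIDE_GFX_VERSION"


def launch_env(override=None, base=None):
    """Copy an environment and set or clear the GFX override in the copy."""
    env = dict(os.environ if base is None else base)
    if override is None:
        env.pop(OVERRIDE_VAR, None)  # drop a stale value such as 10.3.0
    else:
        env[OVERRIDE_VAR] = override
    return env


# 7900 XT / XTX: a leftover override from an older card is removed.
env_7900 = launch_env(None, base={OVERRIDE_VAR: "10.3.0", "PATH": "/usr/bin"})

# RX 7800 XT: the value discussed in this thread is applied.
env_7800 = launch_env("11.0.0", base={"PATH": "/usr/bin"})

print(env_7900)
print(env_7800)
```

The resulting dictionary can be passed to `subprocess.run(["./webui.sh"], env=...)`, assuming the command is run from the webui checkout directory. Working on a copy keeps the override out of the parent shell, which makes it easier to compare runs with and without the flag.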
Checklist
What happened?
Fresh install on Ubuntu 22.04 goes well. However, shortly after the webui HTTP server starts up (and launches a browser window that successfully loads from the server), the server crashes with the following error. I have tested this with ROCm 5.7, 6.0, and 6.0.1. I am running text-generation-webui successfully on the ROCm device (so I think it's not an overall system config issue), and the device is detected properly. I previously had a 6700 XT installed that ran stable-diffusion-webui well, but the new 7900 XT does not.
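For anyone reproducing this, a detection check of the kind mentioned above might look like the sketch below, run with the venv's Python. `rocm_torch_report` is a hypothetical helper name; it relies on ROCm builds of PyTorch exposing the GPU through the `torch.cuda` API and reporting their HIP version in `torch.version.hip`.

```python
def rocm_torch_report():
    """Collect basic facts about the installed torch build and visible GPU."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None on builds that were not compiled against ROCm/HIP.
        "hip_version": getattr(torch.version, "hip", None),
        "gpu_available": torch.cuda.is_available(),
    }
    if report["gpu_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


print(rocm_torch_report())
```

As this report shows, a successful detection does not rule out a crash later on, but the `hip_version` field does reveal whether the venv's torch build matches the installed ROCm release.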
Steps to reproduce the problem
What should have happened?
WebUI should start up normally and load a model.
What browsers do you use to access the UI ?
No response
Sysinfo
sysinfo-2024-01-25-23-06.json
Console logs
Additional information
No response