
hipGetDeviceCount error with AMD gpu #1122

Closed
el-ef opened this issue Sep 26, 2022 · 5 comments
Labels
bug-report Report of a bug, yet to be confirmed

Comments

@el-ef

el-ef commented Sep 26, 2022

Installing on Linux with an RX 6900 XT leads to a hipGetDeviceCount error.

...
Launching Web UI with arguments: --precision full --no-half
/home/mint/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/cuda/__init__.py:83: UserWarning: HIP initialization: Unexpected error from hipGetDeviceCount(). Did you run some cuda functions before calling NumHipDevices() that might have already set an error? Error 101: hipErrorInvalidDevice (Triggered internally at ../c10/hip/HIPFunctions.cpp:110.)
return torch._C._cuda_getDeviceCount() > 0
Warning: caught exception 'Unexpected error from hipGetDeviceCount(). Did you run some cuda functions before calling NumHipDevices() that might have already set an error? Error 101: hipErrorInvalidDevice', memory monitor disabled
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions
...

I made a clean install with "--skip-torch-cuda-test" added to ARGS at line 15 of launch.py. The web UI appears to run fine so far.
Just thought I should mention the HIP error.
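
If you'd rather not edit launch.py: the check's own error message mentions a COMMANDLINE_ARGS variable, so passing the flag through the environment should be equivalent (just a sketch, I haven't tried it on every setup):

COMMANDLINE_ARGS="--skip-torch-cuda-test" python launch.py --precision full --no-half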

Kind regards
el-ef

@el-ef el-ef added the bug-report Report of a bug, yet to be confirmed label Sep 26, 2022
@garrett

garrett commented Sep 26, 2022

It's working for me on Linux (Fedora 37 beta) with an AMD RX 6700 XT using ROCm, but I have to use a special variable to override:

HSA_OVERRIDE_GFX_VERSION=10.3.0 python3.10 launch.py --precision full --no-half --listen --medvram

(I haven't tried skipping the torch cuda test.)
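
To avoid retyping the override on every launch, exporting it once in the shell (or in whatever script starts the web UI) should have the same effect:

export HSA_OVERRIDE_GFX_VERSION=10.3.0
python3.10 launch.py --precision full --no-half --listen --medvram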

I also have Torch installed with:

python3.10 -m pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/rocm5.1.1

(I'm specifying the version of Python as Fedora 37 comes with Python 3.11.0rc2 by default, which doesn't work well with this, so I have Python 3.10 parallel-installed on my system.)
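
A quick way to check that the ROCm build of Torch actually got picked up is to look at torch.version.hip, which is only set on ROCm builds (rough sketch):

python3.10 -c "import torch; print(torch.__version__, torch.version.hip, torch.cuda.is_available())"

If torch.version.hip prints None, the CPU/CUDA wheel was installed instead of the ROCm one.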


It doesn't work "out of the box", and I agree with you that it would be nice if it did.

I do also get the same error as you if I don't use that variable above. It's:

RuntimeError: Error running command.
Command: "/usr/bin/python3.10" -c "import torch; assert torch.cuda.is_available(), 'Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check'"
Error code: -6
stdout: <empty>
stderr: "hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"

Trying to use --skip-torch-cuda-test doesn't work for me, however. I get the same error message.

@el-ef
Author

el-ef commented Sep 26, 2022

Hi,
I got it working by doing a fresh pull of the release as described in the AMD guide, then downloading the .ckpt file, dropping it in the models folder, and adding the skip-cuda flag to launch.py. After that I repeated the AMD install instructions, prepared the venv etc., and ran:
TORCH_COMMAND='pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/rocm5.1.1' python launch.py --precision full --no-half
The UI runs fine so far. It is pretty slow though: running "blue ocean" as the prompt with default settings takes about 2 minutes to render. Is that considered normal with AMD hardware at the moment?

AMD 5800X
AMD RX 6900 XT
32GB RAM
Ubuntu (Mint 21)
Python 3.10.6

Is your setup running any faster?

Regards
el-ef

@garrett

garrett commented Sep 26, 2022

Wow. Two minutes is quite long. It takes about 6-7 seconds for a normal 512x512 render with defaults here (depending on whether medvram is enabled).


The very first time you run it, it needs to "warm up" for just under a minute, as ROCm (at least on Fedora) doesn't ship precompiled MIOpen kernels for our cards. It complains about this with a URL that doesn't offer a solution (except for Ubuntu... and even then, I'm not sure whether your card or mine is supported).

MIOpen(HIP): Warning [SQLiteBase] Missing system database file: gfx1030_20.kdb Performance may degrade. Please follow instructions to install: https://github.com/ROCmSoftwarePlatform/MIOpen#installing-miopen-kernels-package

First run time: Time taken: 51.58s (this includes that warmup time).

But after that, it's fast. About 5 seconds for each default txt2img render.

Total progress: 100%|███████████████████████████| 20/20 [00:04<00:00,  4.08it/s]

blue ocean
Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 20056055, Size: 512x512, Model hash: 7460a6fa

Time taken: 5.17s

Torch active/reserved: 6256/6512 MiB, Sys VRAM: 6576/12272 MiB (53.59%)

If I have --medvram enabled (mentioned above) to be able to render a bit larger and keep less in RAM, then it gets a little slower, by almost two seconds:

Total progress: 100%|███████████████████████████| 20/20 [00:06<00:00,  3.26it/s]

blue ocean
Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 2054059139, Size: 512x512, Model hash: 7460a6fa

Time taken: 6.73s

Torch active/reserved: 4579/5136 MiB, Sys VRAM: 5264/12272 MiB (42.89%)

(After seeing this, I might drop using medvram? You have more VRAM in your card, so you probably don't really need it, unless you just want to render larger without upscaling. But even then, it probably isn't so much larger.)

If I have previews turned on, it does slow it down by a second or two (depending on how often I have it render previews while it's computing an image).


I actually didn't use the AMD guide you linked above. I had set up other Stable Diffusion installs before using "anaconda" (the Python environment manager, not to be confused with the Fedora/CentOS/RHEL installer of the same name, which is also written in Python) and used it for this too. I don't think that's the difference, however.

I'm not sure about the speed difference... You might want to manually install the ROCm version of Torch with pip, instead of passing a variable with the URL to the launch command? (Just guessing, though.)
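
It might also be worth checking whether the GPU is doing any work at all during a render. Assuming the ROCm utilities are installed, watching rocm-smi in a second terminal shows GPU utilisation and VRAM use:

watch -n 1 rocm-smi

If GPU use stays near 0% while an image is being generated, the computation is falling back to the CPU.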

I did run the commands at https://gist.github.com/geerlingguy/ff3c3cbcf4416be2c0c1e0f836a8183d#file-stable-diffusion-ubuntu-2004-amd-txt-L37-L38, but then I dropped the one with pinned versions and only used the pip install from https://gist.github.com/geerlingguy/ff3c3cbcf4416be2c0c1e0f836a8183d#file-stable-diffusion-ubuntu-2004-amd-txt-L37 (run with python3.10 instead: python3.10 -m pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/rocm5.1.1).

And I found the HSA_OVERRIDE_GFX_VERSION=10.3.0 fix somewhere... Reddit, I think? Those two things (the ROCm build of Torch and the override variable) were the magic that got it working for me, aside from the general instructions.

@el-ef
Author

el-ef commented Sep 26, 2022

Thanks for all the info!

The performance stays that bad, even after warm-up. I did some further investigation and found that Stable Diffusion is actually computing on my CPU at the moment, which is why it is so slow. It looks like ROCm initialization fails because of some missing dependencies that apt-get refused to install. Ubuntu appears to be a mess with AMD drivers at the moment: https://community.amd.com/t5/drivers-software/ubuntu-22-04-amp-driver-amdgpu-install-22-10-2-50102-1-all-deb/m-p/536762
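
A quick check from inside the venv shows whether Torch can see the GPU at all (generic sketch; if is_available() comes back False, everything runs on the CPU):

python -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'no GPU visible to torch')"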

I will look further into this tomorrow.

Thanks!

@el-ef
Author

el-ef commented Sep 28, 2022

I just managed to install the proprietary AMD 22.2 driver on my Ubuntu 20.04-based (Mint 21) install. Stable Diffusion works fine now; it takes only a few seconds to generate an image.

How to install AMD driver on Ubuntu 20.04:

Add the custom PPA repository from https://launchpad.net/~oibaf/+archive/ubuntu/graphics-drivers
"deb http://ppa.launchpad.net/oibaf/graphics-drivers/ubuntu jammy main"

Download and install the amdgpu-install package for your model from: https://amd.com/en/support
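
The downloaded package is a regular .deb, so one way to install it (a sketch; the exact filename depends on the version you grabbed) is:

sudo apt-get install ./amdgpu-install_*.deb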

Make sure all AMD repositories and PPAs are ticked in the package sources menu of your system. Then prepare the install by adding some dependencies:

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install wget gnupg2
sudo apt-get install rocminfo
sudo apt-get install rocm-device-libs
sudo apt-get install libncurses5
sudo apt-get install libnuma-dev
sudo apt-get install libhsa-runtime64-1
sudo apt-get install libhsakmt1
sudo apt-get install amdgpu-core

And add the current user to the video group:
groups
sudo usermod -a -G video $LOGNAME

You probably also have to install these dummy packages before running the installer:
ROCm/ROCm#1713 (comment)
https://github.com/jacodt/rocm_dummy_packages

Finally, reboot and do:
sudo amdgpu-install
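
To confirm the runtime can actually see the card after the reboot, rocminfo (installed above; it may live under /opt/rocm/bin if it isn't on the PATH) should list a gfx10xx agent for the GPU, e.g.:

rocminfo | grep -i gfx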

That did the trick for me. :)

@el-ef el-ef closed this as completed Sep 29, 2022