
[Bug]: "Memory access fault by GPU node-1" error with RX 6600 on Linux #8139

Open
1 task done
Yumae opened this issue Feb 26, 2023 · 14 comments
Labels
bug-report Report of a bug, yet to be confirmed

Comments

@Yumae

Yumae commented Feb 26, 2023

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What happened?

When trying to generate pictures above a certain resolution, I get this error in the console window. I can consistently reproduce it by generating a picture larger than 768x1024/1024x768. I'm sure the card could go higher given its VRAM, since the KDE resource monitor shows VRAM usage never reaching 7 GB. In the screenshot you can see the generation process reach 100%, but when it tries to output the image it prints this error instead.
Screenshot_20230226_130106

Steps to reproduce the problem

Generate a picture with a resolution higher than 1024x768, for example 1280x768.

What should have happened?

It should output the picture and it should let me generate at higher resolutions as well.

Commit where the problem happens

3715ece

What platforms do you use to access the UI ?

Linux

What browsers do you use to access the UI ?

Mozilla Firefox

Command Line Arguments

export COMMANDLINE_ARGS="--medvram --listen"

List of extensions

wildcards
openpose-editor
stable-diffusion-webui-dataset-tag-editor
stable-diffusion-webui-images-browser
stable-diffusion-webui-pixelization

Console logs

################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye)
################################################################

################################################################
Running on anon user
################################################################

################################################################
Repo already cloned, using it as install directory
################################################################

################################################################
Create and activate python venv
################################################################

################################################################
Launching launch.py...
################################################################
Python 3.10.9 (main, Dec 19 2022, 17:35:49) [GCC 12.2.0]
Commit hash: 3715ece0adce7bf7c5e9c5ab3710b2fdc3848f39
Installing requirements for Web UI

Launching Web UI with arguments: --medvram --listen
No module 'xformers'. Proceeding without it.
Loading weights [c353313f5d] from /home/anon/stable-diffusion-webui/models/Stable-diffusion/AOM2-nutmegmixGav2+ElysV2.safetensors
Creating model from config: /home/anon/stable-diffusion-webui/configs/v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Loading VAE weights specified in settings: /home/anon/stable-diffusion-webui/models/VAE/sd15.vae.pt
Applying cross attention optimization (Doggettx).
Textual inversion embeddings loaded(25): chr-atagorq, chr-ayanami, chr-bremertonsummer, chr-honolulu, chr-shun, chr-shylily, chr-sirius, chr-stlouislux, chr-taihou, chr-yamashiro, chr-yukikazepan, ero-lactation, ero-doggystyle, ero-deepmissionary, spe-centaur, chr-nahida, spe-mothgirl, chr-okayu, spe-lamia, chr-senko, chr-i19, chr-lumine, chr-kashino, chr-yuudachi, EasyNegative
Model loaded in 18.8s (load weights from disk: 7.2s, create model: 1.1s, apply weights to model: 8.2s, apply half(): 0.6s, load VAE: 1.6s).
[tag-editor] Settings has been read from config.json
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
  0%|                                                                                  | 0/20 [00:00<?, ?it/s]MIOpen(HIP): Warning [SQLiteBase] Missing system database file: gfx1030_14.kdb Performance may degrade. Please follow instructions to install: https://github.com/ROCmSoftwarePlatform/MIOpen#installing-miopen-kernels-package
100%|█████████████████████████████████████████████████████████████████████████| 20/20 [00:07<00:00,  2.66it/s]
100%|█████████████████████████████████████████████████████████████████████████| 20/20 [00:44<00:00,  2.23s/it]
Memory access fault by GPU node-1 (Agent handle: 0x559e97c5f090) on address 0x7f91800e1000. Reason: Page not present or supervisor privilege.


Warning: Program '/home/anon/stable-diffusion-webui/webui.sh' crashed.

Additional information

Distro: EndeavourOS (ArchLinux)
DE: KDE on X11
CPU: Ryzen 1600
GPU: RX 6600 (8GB VRAM)
RAM: 16GB

WebUI installed with the default script. I didn't mess with ROCm versions or any of that since it took care of that automatically.
Can generate pictures at or below 1024x768 with no problems. I get the same error both with and without highres fix enabled.
Screenshot_20230226_132136

Yumae added the bug-report label on Feb 26, 2023
@raff766

raff766 commented Mar 3, 2023

Can confirm, I'm having the same exact issue with my RX 6800 XT (16GB VRAM)

@Parzival1608vonKatze

Same here, exactly the same issue. (RX 6700XT 12GB)

@ishawn944

You can try the following commands:
sudo usermod -a -G video $USER
sudo usermod -a -G render $USER

Then set this environment variable for SD:
PYTORCH_HIP_ALLOC_CONF=garbage_collection_threshold:0.6,max_split_size_mb:128
This makes the allocator garbage-collect GPU memory once it reaches 60% capacity and caps memory splits at 128 MB, which helps reduce fragmentation.

You may also need to add the --medvram flag.
These worked for my RX 6750 XT.
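
As a concrete sketch, the settings suggested above could be made persistent by putting them in webui-user.sh (the values are the ones from this comment; the file location assumes a default install):

```shell
# Sketch of a webui-user.sh excerpt (assumes the default webui install layout).
# Ask the HIP caching allocator to garbage-collect once usage passes 60%
# and cap allocation splits at 128 MB to reduce fragmentation.
export PYTORCH_HIP_ALLOC_CONF="garbage_collection_threshold:0.6,max_split_size_mb:128"

# Keep VRAM usage down during generation.
export COMMANDLINE_ARGS="--medvram"
```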

@Ridien

Ridien commented Mar 11, 2023

Running the WebUI using --no-half and --lowvram solved it for me.

@popemkt

popemkt commented Mar 19, 2023

Can confirm the same with RX 6800S 8GB

@mlrey7

mlrey7 commented Mar 30, 2023

Upgrading to PyTorch 2.0 and ROCm 5.4.2 fixed this for me. Using --opt-sub-quad-attention also really helps, alongside --medvram and PYTORCH_HIP_ALLOC_CONF=garbage_collection_threshold:0.6,max_split_size_mb:128. All of these together let me hires-fix 512x768 at 1.85x (944x1420) on the RX 6600 (8GB VRAM).
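
For reference, an upgrade along these lines might look like the following sketch; the rocm5.4.2 index URL matches what other commenters in this thread use, but wheel availability for your Python version should be checked first:

```shell
# Sketch: install the ROCm 5.4.2 builds of torch/torchvision inside the
# webui's virtualenv. Run from the stable-diffusion-webui directory.
source venv/bin/activate
pip install --upgrade torch torchvision \
    --index-url https://download.pytorch.org/whl/rocm5.4.2
```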

@DGdev91
Contributor

DGdev91 commented Apr 6, 2023

I have the same problem on a 5700 XT using ROCm 5.4.2 and PyTorch 2.0.
Strangely, it works fine using PyTorch 1.13.1.

Same issue with both --medvram and --lowvram.

With PyTorch 2 I also tried --opt-sdp-attention, with no effect.

I also use --precision full and --no-half.

I finally tried export PYTORCH_HIP_ALLOC_CONF=garbage_collection_threshold:0.6,max_split_size_mb:128; it did not help.

@egolfbr

egolfbr commented Apr 27, 2023


I have a very similar setup on an Ubuntu machine. I downgraded to PyTorch 1.13.1 and everything appears to be fine, except for a warning about a missing database file:

MIOpen(HIP): Warning [SQLiteBase] Missing system database file: gfx1030_20.kdb Performance may degrade. Please follow instructions to install: https://github.com/ROCmSoftwarePlatform/MIOpen#installing-miopen-kernels-package

@DianaNites

@egolfbr That's a harmless warning from AMD; see https://github.com/ROCmSoftwarePlatform/MIOpen/blob/develop/doc/src/cache.md

TL;DR: AMD ROCm will compile and cache some GPU kernels in the background, but it also ships pre-compiled kernels for some cards. The build used with PyTorch 1.x does not seem to bundle a copy for your card, but the only effect should be that the first image you generate may be slow.

@skerit

skerit commented May 12, 2023

Had the same error while using a LoRA model, but I was still on torch 1.12.
Upgrading to 1.13.1 fixed it for me.

@torgeir

torgeir commented May 27, 2023

The following diff fixed an issue similar to the OP's:

index 49a426ff..03b57253 100644
--- a/webui-user.sh
+++ b/webui-user.sh
@@ -10,7 +10,7 @@
 #clone_dir="stable-diffusion-webui"
 
 # Commandline arguments for webui.py, for example: export COMMANDLINE_ARGS="--medvram --opt-split-attention"
-#export COMMANDLINE_ARGS=""
+#export COMMANDLINE_ARGS="--reinstall-torch"
 
 # python3 executable
 #python_cmd="python3"
@@ -27,6 +27,9 @@
 # install command for torch
 #export TORCH_COMMAND="pip install torch==1.12.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113"
 
+# https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/8139
+export PYTORCH_HIP_ALLOC_CONF=garbage_collection_threshold:0.8,max_split_size_mb:128
+
 # Requirements file to use for stable-diffusion-webui
 #export REQS_FILE="requirements_versions.txt"
--- a/webui.sh
+++ b/webui.sh
@@ -119,7 +119,7 @@ esac
 if echo "$gpu_info" | grep -q "AMD" && [[ -z "${TORCH_COMMAND}" ]]
 then
     # AMD users will still use torch 1.13 because 2.0 does not seem to work.
-    export TORCH_COMMAND="pip install torch==1.13.1+rocm5.2 torchvision==0.14.1+rocm5.2 --index-url https://download.pytorch.org/whl/rocm5.2"
+    export TORCH_COMMAND="pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm5.4.2"
 fi  
 
 for preq in "${GIT}" "${python_cmd}"

Arch, RX6800XT

@Essoje

Essoje commented May 30, 2023

Confirming the above solved the 'Memory access fault by GPU node-1' problem on my machine.
However, while the above works without a problem on a clean installation, I had to additionally pass the --ignore-installed flag to the pip install command, as follows.

TORCH_COMMAND="pip install --ignore-installed torch torchvision --index-url https://download.pytorch.org/whl/rocm5.4.2"

Manjaro, RX6900XT

@lufixSch

lufixSch commented Dec 2, 2023

Just wanted to add, for anyone finding this.

sudo usermod -a -G video $USER
sudo usermod -a -G render $USER

For some reason I got this error after adding my user to the video and render groups.
When I removed the groups, everything worked again:

sudo usermod -r -G video $USER
sudo usermod -r -G render $USER
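
To check which state your user is actually in, you can list its supplementary groups (a standard command, not specific to this workaround); look for video and render in the output:

```shell
# Print the current user's supplementary group names, one line of
# space-separated names; 'video' and 'render' will appear here if the
# user has been added to those groups.
id -nG
```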

@juipeltje

I'm having the same problem with Fooocus running on Void Linux with a 6950 XT. I've tried pretty much every solution in this thread to no avail, but what seems to work as a workaround for me is running it with --always-no-vram and --always-offload-from-vram. I'm not sure if A1111 has similar flags available, but it may be worth a shot. It's a little slower than using VRAM, but it still easily beats running on the CPU, and at least now I can leave it generating a batch of images without it crashing every other image. If you have the extra system RAM available, it might be a good bandaid solution.
