[Bug]: torch.cuda.OutOfMemoryError: HIP out of memory. When training embeddings #6460
Comments
I think this is because your GPU memory is too low.
What is the minimum I need, even with optimizations enabled?
Try this: For AMD For Nvidia

In my experience --opt-sub-quad-attention is the best VRAM optimizer for AMD cards and --xformers is the best for NVIDIA, so don't try using --medvram or --lowvram unless neither of those works for you. Don't combine them (like '--opt-sub-quad-attention --medvram' or '--xformers --lowvram'), because in my testing that increased VRAM usage and made image generation slower; only use one VRAM optimizer at a time.

I'm also getting the 'RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!' error, but it won't affect training in any way. It just means you won't be able to see the preview images being generated in the webui; you can still view them by going to /stable-diffusion-webui/textual_inversion/
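A minimal sketch of the "one optimizer at a time" advice above, assuming the flags are passed via `COMMANDLINE_ARGS` in `webui-user.sh` (the exact file and variable name may differ in your setup):

```shell
# Pick exactly ONE VRAM optimizer; do not combine them.

# AMD (ROCm/HIP) cards:
export COMMANDLINE_ARGS="--opt-sub-quad-attention"

# NVIDIA cards (comment out the line above and use this instead):
# export COMMANDLINE_ARGS="--xformers"
```

Only fall back to `--medvram` or `--lowvram` if neither flag works on its own.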
I'm having the same issue. Where would I set this?
For Windows For Linux

Though I recommend switching to a Docker container. I started using one (using podman instead of docker) a little less than a week ago and I no longer have the issue when training. Also, I forgot to mention you'll want to check 'enable cross attention optimizations when training' in the settings; this will reduce your VRAM usage while training by a lot.
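To answer "where would I set this": the webui reads launch flags from a user script in the repository root. A sketch, assuming a stock checkout (file names are the webui's defaults, the flag shown is the AMD one from above):

```shell
# Linux: edit webui-user.sh in the stable-diffusion-webui directory
export COMMANDLINE_ARGS="--opt-sub-quad-attention"

# Windows: edit webui-user.bat instead, which uses `set` syntax:
# set COMMANDLINE_ARGS=--opt-sub-quad-attention
```

Then start the webui through that script rather than calling launch.py directly, so the variable is picked up.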
Thanks! I managed to add them manually, directly to the webui.bat. I think (extreme emphasis on the think) adding it there sets the PyTorch environment variable for the venv during its activation, and although I'm sure xformers is now doing its job and I'm able to train, I'm not sure setting the PyTorch variable the way I did actually works. Also, because I'm on Windows and nvidia-smi won't actually show me VRAM usage for my 3080, I only know how well it's running when it dies and throws errors my way, which is not great. I'd try Docker to avoid issues, but I fought with it in the past over virtualization problems and such. Thanks again!
Hi, I'm adding this just for future reference: I'm using a 6750 XT GPU, and this solved my HIP out of memory problem when generating large images (1024x1536 from hires. fix; I added --opt-sub-quad-attention to the terminal commands). However, I'd like to add that since this GPU is not really "supported", HSA_OVERRIDE_GFX_VERSION=10.3.0 should be set in order to avoid the Segmentation fault (core dumped) error (just in case someone else gets the same error; I'm using Linux Mint). Taken from a rentry troubleshooting page:

Segmentation fault (core dumped) "${python_cmd}" launch.py
You tried to force an incompatible binary with your gpu via the HSA_OVERRIDE_GFX_VERSION environment variable. Unset it via set -e HSA_OVERRIDE_GFX_VERSION and retry the command.
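The override described above can be sketched as follows. This is an assumption-laden example: it presumes a ROCm PyTorch build without kernels for the 6750 XT's reported architecture, so the card is told to identify as gfx1030 (the officially supported RDNA2 target):

```shell
# Tell ROCm to treat this RDNA2 card as gfx1030 (value 10.3.0)
export HSA_OVERRIDE_GFX_VERSION=10.3.0
python3 launch.py --opt-sub-quad-attention

# If this override itself causes "Segmentation fault (core dumped)",
# unset it and retry (the `set -e VAR` form quoted above is fish-shell
# syntax; in bash use `unset HSA_OVERRIDE_GFX_VERSION`).
```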
Looking at your crash log you have 10GB vram so I'm guessing it's a RX 6700? Try using the new
In Settings -> Training enable "Move VAE and CLIP to RAM when training if possible" and "Use cross attention optimizations while training". If using a SD 2.x model enable Settings -> Stable Diffusion -> "Upcast cross attention layer to float32". With the above setup I'm able to train embeddings on a RX 5500XT 8GB (for 1.5 models anyway, haven't tried any 2.x training). |
I need help doing this, can we do screenshare? |
That results in an unstable system for me; adding --opt-sub-quad-attention to the launch args fixes the problem on its own. Thank you.
Just wanted to say thank you so much! I was not able to run SDXL in A1111 on my AMD 6700 XT at all, but after your suggestion it's running fantastically: no out-of-memory errors, and it's faster than before, running at 3.74 s/it now. A game changer, at least for me.
Is there an existing issue for this?
What happened?
I'm trying to train an embedding but I'm getting this error.
Running webui with these settings:
python3 launch.py --precision full --no-half --opt-split-attention
100%|█████████████████████████████████████████| 616/616 [01:20<00:00, 7.67it/s]
0%| | 0/3000 [00:00<?, ?it/s]Traceback (most recent call last):
File "/home/user/stable-diffusion-webui/modules/textual_inversion/textual_inversion.py", line 395, in train_embedding
scaler.scale(loss).backward()
File "/home/user/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_tensor.py", line 488, in backward
torch.autograd.backward(
File "/home/user/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/autograd/__init__.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/home/user/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/autograd/function.py", line 267, in apply
return user_fn(self, *args)
File "/home/user/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 157, in backward
torch.autograd.backward(outputs_with_grad, args_with_grad)
File "/home/user/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/autograd/__init__.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate 512.00 MiB (GPU 0; 9.98 GiB total capacity; 8.51 GiB already allocated; 742.00 MiB free; 9.13 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_HIP_ALLOC_CONF
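The error message itself suggests a mitigation when reserved memory greatly exceeds allocated memory: limiting the allocator's split size. A sketch, assuming a ROCm build of PyTorch (which reads PYTORCH_HIP_ALLOC_CONF; CUDA builds use PYTORCH_CUDA_ALLOC_CONF); the 512 value is illustrative, not a recommendation:

```shell
# Reduce fragmentation in PyTorch's caching allocator on ROCm/HIP
export PYTORCH_HIP_ALLOC_CONF="max_split_size_mb:512"
python3 launch.py --precision full --no-half --opt-split-attention
```

Smaller values fight fragmentation harder at some throughput cost; this only helps when the reported reserved total is well above the allocated total, as in the traceback above.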
Steps to reproduce the problem
0%| | 0/3000 [00:00<?, ?it/s]Traceback (most recent call last):
File "/home/akairax/stable-diffusion-webui/modules/textual_inversion/textual_inversion.py", line 395, in train_embedding
scaler.scale(loss).backward()
File "/home/akairax/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_tensor.py", line 488, in backward
torch.autograd.backward(
File "/home/akairax/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/autograd/__init__.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument weight in method wrapper__native_layer_norm_backward)
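This second error typically means one tensor in the backward graph (here, a layer-norm weight) stayed on the CPU while the rest moved to the GPU. A minimal sketch of the general pattern that avoids it; the specific module choice is an assumption for illustration, not the webui's actual code path:

```python
import torch

# Pick one device and move BOTH the module (its weights) and the
# inputs there; moving only one of the two reproduces the
# "Expected all tensors to be on the same device" error.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

ln = torch.nn.LayerNorm(8).to(device)                    # weights on `device`
x = torch.randn(4, 8, device=device, requires_grad=True)  # inputs on `device`

loss = ln(x).sum()
loss.backward()  # everything on one device, so backward succeeds
```

As noted earlier in the thread, in the webui this particular error only breaks the in-browser training previews; the embedding still trains.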
What should have happened?
Training should just run.
Commit where the problem happens
874b975
What platforms do you use to access UI ?
Linux
What browsers do you use to access the UI ?
Mozilla Firefox
Command Line Arguments
Additional information, context and logs
Running ubuntu 22.04