LORA on 8GB Graphics Cards is broken #755
Comments
Logs for the "Exception training model: too many values to unpack (expected 2)" bug:
Initializing dreambooth training...
Injecting trainable lora...
Memory output: {}
Returning result: Exception training model: too many values to unpack (expected 2)
It doesn't look like it will be possible on 8 GB for quite a while now.
Downgrade: https://www.reddit.com/r/StableDiffusion/comments/1062b6s/comment/j3gj1yu/?context=3
I was using that extension, but it doesn't seem to work properly now; the results are completely wrong.
@ovladuk you were using this exact version? How wrong? If it's just noise and nothing coherent, then it's a version mismatch (probably xformers). Just make a clean webui install for dreambooth in another folder.
Yeah, the prompts weren't giving me results that matched what I trained. What version of auto 11 should I use for a clean install?
@ovladuk latest worked for me today. |
Well, I have the latest auto 11 with the extensions you linked, because I already found that previous version a few days ago, but it recently started going wrong, so I don't know. I don't see any point in installing a fresh copy of auto 11 when it's the version I'm already using.
@ovladuk unless you experience something like what I get in my prompts #763 (like this), it's most likely an incorrect configuration or a low-quality dataset.
I really like that extension, but d8ahazard's results are natively incompatible. A workaround is to merge to a .ckpt first, then take the difference between that and the original base model using extract_lora_from_models.py. The main benefits are, of course, the dynamic loading capability, the low file size, and the ability to easily apply/test LoRA weights with various larger ckpts. An interesting use of the last part is applying a LoRA to derivative models.
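For anyone curious what "taking the difference" means here: per layer, the script approximates the weight delta between the tuned and base model with a low-rank factorization. The sketch below is my own illustration of that idea with a truncated SVD, not the script's actual code; the function name and rank are assumptions.

```python
import torch

def extract_lora_delta(w_base: torch.Tensor, w_tuned: torch.Tensor, rank: int = 8):
    """Illustrative sketch (not the script's API): approximate the fine-tune
    weight difference with a rank-`rank` factorization, as LoRA extraction does
    per layer."""
    delta = w_tuned - w_base  # difference between merged ckpt and base model
    u, s, vh = torch.linalg.svd(delta, full_matrices=False)
    up = u[:, :rank] * s[:rank]  # "up" factor, shape (out_dim, rank)
    down = vh[:rank, :]          # "down" factor, shape (rank, in_dim)
    return up, down

# Toy demo: a genuinely low-rank update is recovered almost exactly.
torch.manual_seed(0)
base = torch.randn(64, 64)
true_up, true_down = torch.randn(64, 4), torch.randn(4, 64)
tuned = base + true_up @ true_down
up, down = extract_lora_delta(base, tuned, rank=4)
err = torch.norm((tuned - base) - up @ down) / torch.norm(tuned - base)
print(f"relative reconstruction error: {err:.2e}")
```

The real fine-tune delta is not exactly low rank, so the extraction is lossy; that's the trade-off for the small file size.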
Kindly read the entire form below and fill it out with the requested information.
Please find the following lines in the console and paste them below. If you do not provide this information, your issue will be automatically closed.
```
Python revision: 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Dreambooth revision: 5588089
SD-WebUI revision: 9cfd10cdefc7b2966b8e42fbb0e05735967cf87b
Checking Dreambooth requirements...
[+] bitsandbytes version 0.35.0 installed.
[+] diffusers version 0.10.2 installed.
[+] transformers version 4.25.1 installed.
[+] xformers version 0.0.14.dev0 installed.
[+] torch version 1.12.1+cu113 installed.
[+] torchvision version 0.13.1+cu113 installed.
```
Have you read the Readme?
Yes, a couple of times while trying to figure out filewords (which I finally got working after using "[filewords]" instead of "[Filewords]", but that's a tangent).
Have you completely restarted the stable-diffusion-webUI, not just reloaded the UI?
Yes
Have you updated Dreambooth to the latest revision?
Yes
Have you updated the Stable-Diffusion-WebUI to the latest version?
Yes
No, really. Please save us both some trouble and update the SD-WebUI and Extension and restart before posting this.
Reply 'OK' Below to acknowledge that you did this.
I have "git pull https://github.com/AUTOMATIC1111/stable-diffusion-webui" in my "webui-user.bat" file.
Describe the bug
Some other issues report this same problem, but their solution of reverting to an old version of Dreambooth doesn't work for me.
I have an 8 GB 3070 graphics card, and a bit over a week ago I was able to use LORA to train a model on it. I followed This Video on how to set up the settings for running Dreambooth on less than 8 GB of VRAM (using the "LORA DB - Low VRAM" settings at 4:10).
The settings worked, and I was able to train models without it saying "CUDA out of memory". I didn't update my WebUI or Dreambooth for a while, until yesterday, when I updated to the latest versions of both. I tried training a new model using the same settings and got the error "Exception training model: 'No executable batch size found, reached zero'". I fiddled around with the settings a bit, and it still wouldn't work.
Then I checked here and saw some people were having the same issue. People were recommending going back to older versions of the extension, so I did: I went back to the version I was using when it was working fine, eb47a0b, ran it again with the same settings, and got a different error, "Exception training model: too many values to unpack (expected 2)".
Then I tried going back to the older version of the webui I was using when it worked, 4af3ca5, and tried again: same error. Then I tried commit fab41d8 and got the same "too many values to unpack" error.
Finally, I made a brand-new, fresh install of the webui in a different folder using the version that was working, manually installed the old version of Dreambooth that was working, made a new model, and manually entered the settings following the video, just as I did over a week ago when it worked. Even then, I still get the same "too many values to unpack" error.
At this point I don't know if it's something wrong with the extension, the webui, or something else like Python or torch. I have no knowledge of code at all, so I can't deduce the issue any further than this.
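For context, "too many values to unpack (expected 2)" is the error Python raises when a function starts returning more values than the call site unpacks, which is typical of a version mismatch between a library and its caller. A hypothetical sketch (the function names here are made up, not the extension's real code):

```python
# Old API: a hypothetical injection helper returns 2 values.
def inject_lora_v1(unet):
    return unet, ["lora_down", "lora_up"]

# New API: the same helper now returns 3 values.
def inject_lora_v2(unet):
    return unet, ["lora_down", "lora_up"], {"rank": 4}

unet = object()
model, params = inject_lora_v1(unet)      # fine: 2 values, 2 targets
try:
    model, params = inject_lora_v2(unet)  # caller still expects 2 values
except ValueError as exc:
    msg = str(exc)
    print(msg)  # too many values to unpack (expected 2)
```

Which would fit the theory in the comments that mixed old/new versions of the webui, extension, or their dependencies are the cause.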
I'll provide the logs for the "Exception training model: too many values to unpack (expected 2)" bug in the comments.
These logs are for the "Exception training model: 'No executable batch size found, reached zero'" bug.
Provide logs
Steps: 0%| | 0/9200 [00:00<?, ?it/s]OOM Detected, reducing batch/grad size to 0/1.
Traceback (most recent call last):
File "E:\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\memory.py", line 116, in decorator
return function(batch_size, grad_size, prof, *args, **kwargs)
File "E:\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 861, in inner_loop
accelerator.backward(loss)
File "E:\stable-diffusion-webui\venv\lib\site-packages\accelerate\accelerator.py", line 1314, in backward
self.scaler.scale(loss).backward(**kwargs)
File "E:\stable-diffusion-webui\venv\lib\site-packages\torch\_tensor.py", line 396, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "E:\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\__init__.py", line 173, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "E:\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\function.py", line 253, in apply
return user_fn(self, *args)
File "E:\stable-diffusion-webui\venv\lib\site-packages\torch\utils\checkpoint.py", line 146, in backward
torch.autograd.backward(outputs_with_grad, args_with_grad)
File "E:\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\__init__.py", line 173, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 8.00 GiB total capacity; 7.18 GiB already allocated; 0 bytes free; 7.28 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Steps: 0%| | 0/9200 [00:03<?, ?it/s]
Traceback (most recent call last):
File "E:\stable-diffusion-webui\extensions\sd_dreambooth_extension\scripts\dreambooth.py", line 561, in start_training
result = main(config, use_txt2img=use_txt2img)
File "E:\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 973, in main
return inner_loop()
File "E:\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\memory.py", line 114, in decorator
raise RuntimeError("No executable batch size found, reached zero.")
RuntimeError: No executable batch size found, reached zero.
Training completed, reloading SD Model.
Restored system models.
Returning result: Exception training model: 'No executable batch size found, reached zero.'.
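The OOM traceback above suggests setting max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF. A minimal sketch of how that could be added to webui-user.bat on Windows (the 512 value is my assumption to tune, not a recommendation from this thread):

```shell
rem webui-user.bat fragment: cap the size at which PyTorch's caching allocator
rem splits blocks, to reduce memory fragmentation on an 8 GB card.
rem The 512 MB threshold is a guess; lower it if OOM persists.
set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
```

This only mitigates fragmentation; it won't fix the "too many values to unpack" error, which looks like a version mismatch rather than a memory problem.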
Environment
Windows 10 Home
Version: 21H2
If Windows - WSL or native?
Native
What GPU are you using?
Asus DUAL GeForce RTX 3070 8 GB
Screenshots/Config
db_config.txt