
Memory Management and PYTORCH_CUDA_ALLOC_CONF #124

Closed
charliesdad opened this issue Nov 11, 2022 · 4 comments

Comments

@charliesdad

Have you read the Readme?

Have you completely restarted the stable-diffusion-webUI, not just reloaded the UI?

Have you updated Dreambooth to the latest revision?

Please find the following lines in the console (After "Installing Requirements for WebUI") and paste them below. If you don't see these lines in the console, then update Dreambooth

Dreambooth revision is 3b3a8002da3a780c276934e8bcf308aa822f1e22 Diffusers version is 0.8.0.dev0. Torch version is 1.12.1+cu116. Torch vision version is 0.13.1+cu116.

Describe the bug
A clear and concise description of what the bug is.

Provide logs
If a crash has occurred, please provide the entire stack trace from the log, including the last few log messages before the crash occurred.

Environment
What OS?
If Windows - WSL or native?
What GPU are you using?

@charliesdad
Author

Dreambooth revision is 4684f0f
[!] Not using xformers memory efficient attention.
Diffusers version is 0.8.0.dev0.
Torch version is 1.12.1+cu116.
Torch vision version is 0.13.1+cu116.
Copying 8Bit Adam files for Windows.

I am getting the following. It seems to be telling me to manage the memory, since reserved is greater than allocated, and to refer to the memory-management documentation, but so far I can't find that?

CUDA SETUP: Loading binary C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cuda116.dll...
Ignoring non-image file: D:\AI\photosofDIESELELKINS\20070419_0030.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC03035 - Copy.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC03666black.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08694.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08696.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08697.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08698.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08701.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08702 - Copy.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08702.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08704.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08706 - Copy.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08706.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08709.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08710 (2).JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08710 (2)picasa.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08712.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08713.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08715.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08716.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08717.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08718.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08725.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08727.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08734.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08735.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08740.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08741.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08759.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08768.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08770.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08771.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08773.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08777.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08782.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08799.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08811.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08814.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08816.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08820.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08822.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08834.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08845.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08848.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08850.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08858.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08893.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08896.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08897.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08906.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08907.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08908.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08912.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08913.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08915.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08916 - Copy.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08918.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08934.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08937.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08945.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08956.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08971.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08980.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC08996.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC09001.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC09004.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC09059.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC09070.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC09074.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC09087.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC09088.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC09096.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC09099.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC09100.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC09118.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC09120.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC09151.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC09200 - Copy.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC09200.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC09227.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC09228.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC09231.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC09274.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC09282.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC09283.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC09321.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC09323.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC09382.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC09546.JPG
Ignoring non-image file: D:\AI\photosofDIESELELKINS\DSC09549.JPG
Scheduler Loaded
Allocated: 0.2GB
Reserved: 0.2GB

Steps: 0%| | 0/1200 [00:00<?, ?it/s] Exception while training: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 12.00 GiB total capacity; 9.67 GiB already allocated; 0 bytes free; 10.37 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Allocated: 8.1GB
Reserved: 10.4GB

Traceback (most recent call last):
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 923, in main
accelerator.backward(loss)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\accelerate\accelerator.py", line 882, in backward
self.scaler.scale(loss).backward(**kwargs)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\_tensor.py", line 396, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\__init__.py", line 173, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\function.py", line 253, in apply
return user_fn(self, *args)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\utils\checkpoint.py", line 146, in backward
torch.autograd.backward(outputs_with_grad, args_with_grad)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\__init__.py", line 173, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 12.00 GiB total capacity; 9.67 GiB already allocated; 0 bytes free; 10.37 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
CLEANUP:
Allocated: 7.1GB
Reserved: 10.4GB

Cleanup Complete.
Allocated: 6.9GB
Reserved: 7.7GB

Steps: 0%| | 0/1200 [00:16<?, ?it/s]
Training completed, reloading SD Model.
Allocated: 0.0GB
Reserved: 7.7GB

Memory output: {'VRAM cleared.': '0.0/0.0GB', 'Training completed, reloading SD Model.': '0.0/7.7GB'}
Re-applying optimizations...
Returning result: Training finished. Total lifetime steps: 3

Windows 10 pro
WSL
palit rtx 3060 12gb

@LePrau

LePrau commented Nov 11, 2022

The error message is "RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 12.00 GiB total capacity; 9.67 GiB already allocated; 0 bytes free; 10.37 GiB reserved in total by PyTorch)"

Basically, the process tried to acquire more VRAM than is available. Since 10.37 + 1 GiB would still be less than 12, it seems you could reduce the base VRAM usage on your system somehow (closing all programs except the browser).

Maybe some of the settings I used to run with a 10Gb 3080 work for you: #84 (comment)
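The OOM message itself suggests trying `max_split_size_mb` to reduce fragmentation, set via the `PYTORCH_CUDA_ALLOC_CONF` environment variable before launch. A minimal sketch (the value 512 is an assumption to experiment with, not a setting anyone in this thread confirmed):

```shell
# Linux/WSL: export before launching the webui. On native Windows, put
#   set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
# in webui-user.bat instead.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
echo "$PYTORCH_CUDA_ALLOC_CONF"
```

PyTorch reads this variable at startup, so it has to be set before the process that imports torch begins.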

@charliesdad
Author

charliesdad commented Nov 13, 2022

The error message is "RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 12.00 GiB total capacity; 9.67 GiB already allocated; 0 bytes free; 10.37 GiB reserved in total by PyTorch)"

Basically, the process tried to acquire more VRAM than is available. Since 10.37 + 1 GiB would still be less than 12, it seems you could reduce the base VRAM usage on your system somehow (closing all programs except the browser).

Maybe some of the settings I used to run with a 10Gb 3080 work for you: #84 (comment)

It seemed to work for one short session, but the results were terrible;
since then it's back to the same error.
From every tutorial I've watched, I thought a 12 GB card would be okay?

From what I've read, am I right that PyTorch reserving the memory is what's causing the error?

From today:

C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui>git pull
Already up to date.

C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui>pause
Press any key to continue . . .
'#set' is not recognized as an internal or external command,
operable program or batch file.
venv "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.10.7 (tags/v3.10.7:6cc6b13, Sep 5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)]
Commit hash: 98947d173e3f1667eba29c904f681047dea9de90
Installing requirements for Web UI
Error running install.py for extension extensions\sd_dreambooth_extension.
Command: "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\Scripts\python.exe" "extensions\sd_dreambooth_extension\install.py"
Error code: 1
stdout: loading Dreambooth reqs from C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\extensions\sd_dreambooth_extension\requirements.txt
Checking Dreambooth requirements.
Checking torch and torchvision versions

stderr: Traceback (most recent call last):
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\extensions\sd_dreambooth_extension\install.py", line 21, in <module>
run(f'"{sys.executable}" -m {torch_cmd}', "Checking torch and torchvision versions", "Couldn't install torch")
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\launch.py", line 34, in run
raise RuntimeError(message)
RuntimeError: Couldn't install torch.
Command: "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\Scripts\python.exe" -m "pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116"#
Error code: 1
stdout:
stderr: C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\Scripts\python.exe: Error while finding module specification for 'pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116#' (ModuleNotFoundError: No module named 'pip install torch==1')
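The stderr above shows why the install step failed: the entire pip invocation (including a stray trailing `#`) was passed to `python -m` as one quoted module name, so Python looked for a module literally named `pip install torch==1...`. Run by hand from inside the venv, the intended command would be unquoted after `-m pip`:

```shell
# Run from inside the webui venv. Only "pip" follows -m; the package
# specs and index URL are separate arguments (stray '#' removed).
python -m pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 \
    --extra-index-url https://download.pytorch.org/whl/cu116
```

Versions and the index URL are copied from the failing command in the log, not independently chosen.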

Launching Web UI with arguments: --disable-safe-unpickle
Preloading Dreambooth!
[!] Not using xformers memory efficient attention.
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Loading weights [81761151] from C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\models\Stable-diffusion\v1-5-pruned-emaonly.ckpt
Global Step: 840000
Applying cross attention optimization (Doggettx).
Model loaded.
Loaded a total of 0 textual inversion embeddings.
Embeddings:
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
Starting Dreambooth training...
VRAM cleared.
Allocated: 0.0GB
Reserved: 0.0GB

Loaded model.
Allocated: 0.0GB
Reserved: 0.0GB

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link

CUDA SETUP: Loading binary C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cuda116.dll...
Scheduler Loaded
Allocated: 0.6GB
Reserved: 0.7GB

Steps: 0%| | 0/4300 [00:00<?, ?it/s] Exception while training: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 12.00 GiB total capacity; 6.83 GiB already allocated; 0 bytes free; 9.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Allocated: 5.5GB
Reserved: 9.7GB

Traceback (most recent call last):
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 967, in main
noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\accelerate\utils\operations.py", line 507, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\amp\autocast_mode.py", line 12, in decorate_autocast
return func(*args, **kwargs)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\unet_2d_condition.py", line 333, in forward
sample = upsample_block(
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\unet_2d_blocks.py", line 1185, in forward
hidden_states = torch.utils.checkpoint.checkpoint(
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\utils\checkpoint.py", line 235, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\utils\checkpoint.py", line 96, in forward
outputs = run_function(*args)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\unet_2d_blocks.py", line 1178, in custom_forward
return module(*inputs, return_dict=return_dict)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\attention.py", line 204, in forward
hidden_states = block(hidden_states, context=encoder_hidden_states, timestep=timestep)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\attention.py", line 406, in forward
hidden_states = self.attn1(norm_hidden_states) + hidden_states
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\attention.py", line 497, in forward
hidden_states = self._attention(query, key, value)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\attention.py", line 515, in _attention
attention_probs = attention_scores.softmax(dim=-1)
RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 12.00 GiB total capacity; 6.83 GiB already allocated; 0 bytes free; 9.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Exception while training: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 12.00 GiB total capacity; 6.79 GiB already allocated; 0 bytes free; 9.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Allocated: 5.5GB
Reserved: 9.7GB

Traceback (most recent call last):
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 967, in main
noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\accelerate\utils\operations.py", line 507, in call
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\amp\autocast_mode.py", line 12, in decorate_autocast
return func(*args, **kwargs)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\unet_2d_condition.py", line 333, in forward
sample = upsample_block(
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\unet_2d_blocks.py", line 1185, in forward
hidden_states = torch.utils.checkpoint.checkpoint(
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\utils\checkpoint.py", line 235, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\utils\checkpoint.py", line 96, in forward
outputs = run_function(*args)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\unet_2d_blocks.py", line 1178, in custom_forward
return module(*inputs, return_dict=return_dict)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\attention.py", line 204, in forward
hidden_states = block(hidden_states, context=encoder_hidden_states, timestep=timestep)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\attention.py", line 406, in forward
hidden_states = self.attn1(norm_hidden_states) + hidden_states
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\attention.py", line 497, in forward
hidden_states = self._attention(query, key, value)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\attention.py", line 515, in _attention
attention_probs = attention_scores.softmax(dim=-1)
RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 12.00 GiB total capacity; 6.79 GiB already allocated; 0 bytes free; 9.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Exception while training: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 12.00 GiB total capacity; 6.79 GiB already allocated; 0 bytes free; 9.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Allocated: 5.5GB
Reserved: 9.7GB

Traceback (most recent call last):
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 967, in main
noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\accelerate\utils\operations.py", line 507, in call
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\amp\autocast_mode.py", line 12, in decorate_autocast
return func(*args, **kwargs)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\unet_2d_condition.py", line 333, in forward
sample = upsample_block(
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\unet_2d_blocks.py", line 1185, in forward
hidden_states = torch.utils.checkpoint.checkpoint(
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\utils\checkpoint.py", line 235, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\utils\checkpoint.py", line 96, in forward
outputs = run_function(*args)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\unet_2d_blocks.py", line 1178, in custom_forward
return module(*inputs, return_dict=return_dict)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\attention.py", line 204, in forward
hidden_states = block(hidden_states, context=encoder_hidden_states, timestep=timestep)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\attention.py", line 406, in forward
hidden_states = self.attn1(norm_hidden_states) + hidden_states
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\attention.py", line 497, in forward
hidden_states = self._attention(query, key, value)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\attention.py", line 515, in _attention
attention_probs = attention_scores.softmax(dim=-1)
RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 12.00 GiB total capacity; 6.79 GiB already allocated; 0 bytes free; 9.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Exception while training: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 12.00 GiB total capacity; 6.79 GiB already allocated; 0 bytes free; 9.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Allocated: 5.5GB
Reserved: 9.7GB

Traceback (most recent call last):
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 967, in main
noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\accelerate\utils\operations.py", line 507, in call
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\amp\autocast_mode.py", line 12, in decorate_autocast
return func(*args, **kwargs)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\unet_2d_condition.py", line 333, in forward
sample = upsample_block(
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\unet_2d_blocks.py", line 1185, in forward
hidden_states = torch.utils.checkpoint.checkpoint(
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\utils\checkpoint.py", line 235, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\utils\checkpoint.py", line 96, in forward
outputs = run_function(*args)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\unet_2d_blocks.py", line 1178, in custom_forward
return module(*inputs, return_dict=return_dict)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\attention.py", line 204, in forward
hidden_states = block(hidden_states, context=encoder_hidden_states, timestep=timestep)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\attention.py", line 406, in forward
hidden_states = self.attn1(norm_hidden_states) + hidden_states
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\attention.py", line 497, in forward
hidden_states = self._attention(query, key, value)
File "C:\Users\DieselMEDIA\SuperStableDiffusion\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\attention.py", line 515, in _attention
attention_probs = attention_scores.softmax(dim=-1)
RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 12.00 GiB total capacity; 6.79 GiB already allocated; 0 bytes free; 9.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Exception while training: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 12.00 GiB total capacity; 6.79 GiB already allocated; 0 bytes free; 9.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Allocated: 5.5GB
Reserved: 9.7GB

CLEANUP:
Allocated: 3.9GB
Reserved: 9.7GB

Cleanup Complete.
Allocated: 3.2GB
Reserved: 3.8GB

Steps: 0%| | 0/4300 [00:15<?, ?it/s]
Training completed, reloading SD Model.
Allocated: 0.0GB
Reserved: 3.8GB

Memory output: {'VRAM cleared.': '0.0/0.0GB', 'Training completed, reloading SD Model.': '0.0/3.8GB'}
Re-applying optimizations...
Returning result: Training finished. Total lifetime steps: 3
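For what it's worth, the "See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF" hint in the error refers to PyTorch's caching-allocator options. A minimal sketch of trying the suggested `max_split_size_mb` setting (the value 512 here is an arbitrary starting point, not something the error recommends) — it has to be in the environment before the process first initializes CUDA, i.e. before `import torch` runs:

```python
import os

# Must be set before the first CUDA context/allocation is created, i.e.
# before `import torch` (or before launching the webui process itself).
# max_split_size_mb caps the size of cached blocks the allocator will
# split, which can reduce fragmentation when reserved >> allocated,
# as in the "6.79 GiB allocated / 9.74 GiB reserved" numbers above.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

On Windows the same effect comes from running `set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512` in the console before starting the webui.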

@d8ahazard
Owner

Sorry for the delay in response. So, it looks like you're using a 12 GB GPU, which should work on Windows if you enable all of the memory optimizations.

I've added a wizard button to the "advanced" settings that will attempt to set the proper params for your machine so you can train. Give it a go, see if that helps.

I've linked this to the main Optimization thread, happy to continue the discussion there to help get you training.
