DreamBooth Stable Diffusion training now possible in 10 GB VRAM, and it runs about 2 times faster. #35

ShivamShrirao · 2022-09-26T23:15:52Z

Hey, so I managed to run Stable Diffusion DreamBooth training in just 17.7 GB of GPU memory by replacing the attention with the memory-efficient flash attention from xformers. Along with using far less memory, it also runs about 2 times faster, so it's now possible to train SD on 24 GB GPUs. Tested on an Nvidia A10G; training took 15-20 minutes. I hope it's helpful.

Code in my fork: https://github.com/ShivamShrirao/diffusers/blob/main/examples/dreambooth/

It can even train with a batch size of 2.

With some more tweaks it might be possible to train even on 16 GB GPUs.

And it works. Outputs: Me in Fortnite

huggingface/diffusers#554 (comment)
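
For readers wondering why swapping the attention implementation saves so much memory: standard attention materializes every query-key score before the softmax, while memory-efficient variants walk over the keys in chunks and keep only running statistics. The sketch below is not the xformers kernel; it is a pure-Python illustration of the chunked "online softmax" idea for a single query vector, and the names `naive_attention` and `chunked_attention` are made up for this example.

```python
import math


def naive_attention(q, keys, values):
    """Reference version: computes all scores at once, then the softmax."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


def chunked_attention(q, keys, values, chunk_size=2):
    """Same result, but only `chunk_size` scores exist at any time."""
    scale = 1.0 / math.sqrt(len(q))
    dim = len(values[0])
    running_max = float("-inf")  # largest score seen so far
    running_sum = 0.0            # softmax denominator, relative to running_max
    acc = [0.0] * dim            # weighted sum of values, relative to running_max
    for start in range(0, len(keys), chunk_size):
        k_chunk = keys[start:start + chunk_size]
        v_chunk = values[start:start + chunk_size]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_chunk]
        new_max = max(running_max, max(scores))
        # Rescale everything accumulated under the old maximum.
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_chunk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]
```

Both functions return the same vector up to floating-point error; the chunked one simply never holds the full score list, which is where the memory saving comes from at real sequence lengths.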

TemporalLabsLLC-SOL · 2022-09-27T07:20:03Z

Very cool. Doing what I can for 16 GB too.

TemporalLabsLLC-SOL · 2022-09-27T10:28:18Z

I think I'm running into issues with it finding the GPUs. 4xA10G. I'll post code tomorrow.

ShivamShrirao · 2022-09-27T13:32:43Z

Wow, using the 8-bit Adam optimizer from bitsandbytes along with xformers reduces the memory usage to 12.5 GB.
Colab: https://colab.research.google.com/github/ShivamShrirao/diffusers/blob/main/examples/dreambooth/DreamBooth_Stable_Diffusion.ipynb

Code: https://github.com/ShivamShrirao/diffusers/blob/main/examples/dreambooth/
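
A rough back-of-the-envelope check on why 8-bit Adam helps this much: Adam keeps two state values per trainable parameter, so shrinking each from 4 bytes to 1 byte saves about 6 bytes per parameter. The sketch below assumes only the UNet is trained and that it has roughly 860M parameters (the figure given on the Stable Diffusion v1 model card); it also ignores the small quantization metadata that an 8-bit optimizer stores, so treat the numbers as an estimate rather than a measurement.

```python
def optimizer_state_gb(num_params, bytes_per_state, num_states=2):
    """Size of the optimizer state in GB (1e9 bytes) for Adam-style optimizers."""
    return num_params * bytes_per_state * num_states / 1e9


# Assumption: ~860M trainable UNet parameters (SD v1 model card figure).
UNET_PARAMS = 860_000_000

fp32_state = optimizer_state_gb(UNET_PARAMS, bytes_per_state=4)
int8_state = optimizer_state_gb(UNET_PARAMS, bytes_per_state=1)
saved = fp32_state - int8_state

print(f"32-bit Adam state: {fp32_state:.2f} GB")
print(f" 8-bit Adam state: {int8_state:.2f} GB")
print(f"estimated saving:  {saved:.2f} GB")
```

The estimated saving of about 5.2 GB is in line with the drop from 17.7 GB to 12.5 GB reported in this thread.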

Daniel-Kelvich · 2022-09-27T13:34:03Z

There is no such file.
404 Client Error: Entry Not Found for url: https://huggingface.co/CompVis/stable-diffusion-v1-4/resolve/main/config.json

Edit: Issue resolved.

Mistborn-First-Era · 2022-09-27T15:37:18Z

Do you have a donation link? I don't have much, but you are doing great work.

ShivamShrirao · 2022-09-27T18:11:41Z

> Do you have a donation link? I don't have much, but you are doing great work.

Hey, thanks. No donation link, haha. Good to hear you liked it. It has been quite fun for me to do.

pdjohntony · 2022-09-27T18:19:19Z

@ShivamShrirao I've been trying to run your notebook on Runpod with PyTorch and an A5000, but I'm getting an error during pip install: "Building wheel for xformers (setup.py) ... error".
Training starts with a bitsandbytes bug report but runs, and eventually crashes after 20 minutes of training.

I'd also love to donate if I can get this working.

pdjohntony · 2022-09-27T20:30:38Z

> There is no such file. 404 Client Error: Entry Not Found for url: https://huggingface.co/CompVis/stable-diffusion-v1-4/resolve/main/config.json
>
> Edit: Issue resolved.

@Daniel-Kelvich How did you fix this?

ShivamShrirao · 2022-09-27T20:31:49Z

@pdjohntony What error are you facing? If it's the 404, it may be due to not being authenticated with the Hugging Face CLI.

pdjohntony · 2022-09-27T20:43:58Z

@ShivamShrirao I managed to get your DreamBooth example working, but it's been running for 2 hours now on an A5000.

Since that's taking so long, I spun up another instance on vast with 2 A5000s, but now I'm getting the 404. It shouldn't be an auth issue with Hugging Face, as I logged in on the CLI and it appeared to download the model for a while before hitting this 404 error.

The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_cpu_threads_per_process` was set to `24` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
Traceback (most recent call last):
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/configuration_utils.py", line 596, in _get_config_dict
    resolved_config_file = cached_path(
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/utils/hub.py", line 282, in cached_path
    output_path = get_from_cache(
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/utils/hub.py", line 486, in get_from_cache
    _raise_for_status(r)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/utils/hub.py", line 409, in _raise_for_status
    raise EntryNotFoundError(f"404 Client Error: Entry Not Found for url: {request.url}")
transformers.utils.hub.EntryNotFoundError: 404 Client Error: Entry Not Found for url: https://huggingface.co/CompVis/stable-diffusion-v1-4/resolve/main/config.json

roar-emaus · 2022-09-27T21:06:49Z

Great work! I managed to run it in a Google Colab. I was just wondering, how do I get checkpoint files that I can use later on from the model files that are stored?

I could only find the `feature_extractor`, `logs`, `model_index.json`, `safety_checker`, `scheduler`, `text_encoder`, `tokenizer`, `unet`, and `vae` folders/files that were stored in `--output_dir=$OUTPUT_DIR` after training finished.

ShivamShrirao · 2022-09-27T21:08:22Z

@roar-emaus These are the diffusers version of the weights. I have added an inference example in the Colab showing how to use them with diffusers. For other repos you will need to convert them.
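
Since the output directory itself is the model in this format, a quick sanity check before copying it elsewhere can save a failed load later. The sketch below only inspects the folder layout listed earlier in the thread; which entries count as required is an assumption made for illustration, and `missing_model_parts` is a hypothetical helper, not part of diffusers.

```python
from pathlib import Path

# Assumption: these sub-folders (taken from the listing in this thread) are
# the core parts of a saved pipeline; feature_extractor, safety_checker and
# logs are treated as optional here.
EXPECTED_DIRS = ("unet", "vae", "text_encoder", "tokenizer", "scheduler")


def missing_model_parts(model_dir):
    """Return the names of expected files/folders that are absent."""
    root = Path(model_dir)
    missing = []
    if not (root / "model_index.json").is_file():
        missing.append("model_index.json")
    missing += [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

If the returned list is empty, the same folder path is what gets passed to `StableDiffusionPipeline.from_pretrained` in diffusers for inference.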

roar-emaus · 2022-09-27T21:24:38Z

> @roar-emaus These are the diffuser version of weights. I have added an inference example in colab on how to use them in diffusers. For others you will need to convert them.

Thank you! will test it tomorrow :)

Ai-Artsca · 2022-09-27T22:14:15Z

Finally got it to work. How can we reuse the model in a stable colab, @ShivamShrirao? I have used the inference cell, but how do I save my model? I haven't even been able to find what folder it's in, lol. Any info on how to convert it into a ckpt? Great work!!

ShivamShrirao · 2022-09-27T22:19:30Z

> finally got it to work, how can we use the model to reuse in a stable colab @ShivamShrirao ? I have used the inference but how do i save my model, i havent even been able to find what folder its in lol, any info on how to convert it into a ckpt?? great work !!

I haven't figured out yet how to convert to a single ckpt for use in other repos. Currently the whole folder is your model, so you can save the whole folder until someone figures it out. This script needs to be reversed: https://github.com/huggingface/diffusers/blob/main/scripts/convert_original_stable_diffusion_to_diffusers.py

hopibel · 2022-09-27T22:36:04Z

@ShivamShrirao If I'm reading things right, 8bit AdamW should be a drop in replacement and the modified CrossAttention class seems like it should just be able to replace the one in ldm/modules/attention.py in this repository. Sadly can't test it myself because bitsandbytes has a C extension that uses CUDA and I'm on AMD
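
Because the 8-bit optimizer is meant as a drop-in replacement, a training script can fall back to the regular optimizer when bitsandbytes is not installed. A minimal sketch of that selection logic follows; `pick_optimizer_path` is a hypothetical helper, and it only checks whether the package can be found, not whether its CUDA extension actually loads on the current hardware.

```python
import importlib.util


def pick_optimizer_path(use_8bit_adam):
    """Return the dotted path of the optimizer class to use.

    Falls back to the standard PyTorch AdamW when 8-bit Adam is not
    requested or bitsandbytes is not installed.
    """
    if use_8bit_adam and importlib.util.find_spec("bitsandbytes") is not None:
        return "bitsandbytes.optim.AdamW8bit"
    return "torch.optim.AdamW"


print(pick_optimizer_path(use_8bit_adam=False))
```

On an AMD setup without bitsandbytes, this would pick `torch.optim.AdamW` even when 8-bit Adam is requested, at the cost of the larger optimizer state.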

Ai-Artsca · 2022-09-28T02:01:09Z

I successfully trained one model, but on my second training run I'm getting an error, @ShivamShrirao:

Steps: 2% 18/1000 [00:56<45:45, 2.80s/it, loss=0.536, lr=5e-6]
Traceback (most recent call last):
  File "train_dreambooth.py", line 606, in <module>
    main()
  File "train_dreambooth.py", line 527, in main
    for step, batch in enumerate(train_dataloader):
  File "/usr/local/lib/python3.7/dist-packages/accelerate/data_loader.py", line 357, in __iter__
    next_batch = next(dataloader_iter)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 721, in _next_data
    data = self._dataset_fetcher.fetch(index) # may raise StopIteration
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "train_dreambooth.py", line 268, in __getitem__
    instance_image = Image.open(self.instance_images_path[index % self.num_instance_images])
  File "/usr/local/lib/python3.7/dist-packages/PIL/Image.py", line 2843, in open
    fp = builtins.open(filename, "rb")
IsADirectoryError: [Errno 21] Is a directory: '/content/data/sks/.ipynb_checkpoints'
Steps: 2% 18/1000 [00:56<51:30, 3.15s/it, loss=0.536, lr=5e-6]
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--use_auth_token', '--instance_data_dir=/content/data/sks', '--class_data_dir=/content/data/gfx', '--output_dir=/content/models/sks', '--with_prior_preservation', '--instance_prompt=photo of sks gfx', '--class_prompt=photo of a gfx', '--resolution=512', '--use_8bit_adam', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=200', '--max_train_steps=1000']' returned non-zero exit status 1.
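
The `IsADirectoryError` above comes from `Image.open` being handed `.ipynb_checkpoints`, a hidden folder that Jupyter/Colab can create inside a data directory after files there are opened in the notebook UI. Deleting that folder is one way out; another is to filter the directory listing so only regular image files are used. The sketch below shows such a filter under the assumption that the dataset lists every entry of the instance directory; `list_instance_images` and the suffix set are illustrative, not taken from `train_dreambooth.py`.

```python
from pathlib import Path

# Assumption: the file types worth training on; adjust as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def list_instance_images(data_dir):
    """Return sorted image paths, skipping sub-folders and hidden entries."""
    return sorted(
        path
        for path in Path(data_dir).iterdir()
        if path.is_file()
        and not path.name.startswith(".")
        and path.suffix.lower() in IMAGE_SUFFIXES
    )
```

With this kind of filter, a stray `.ipynb_checkpoints` directory or a text file in the instance folder is ignored instead of crashing the data loader partway through training.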

TemporalLabsLLC-SOL · 2022-09-28T02:34:47Z

Very nice progress! Digging in more now

Daniel-Kelvich · 2022-09-28T06:30:51Z

@pdjohntony try updating the transformers library: `pip install -U transformers`

ClashSAN · 2022-09-28T06:40:02Z

@ShivamShrirao I'm assuming you mean only the items in the imv folder make up the ckpt file, I deleted my colab and only saved those items to the google drive

binarymind · 2022-09-28T08:03:37Z

@ShivamShrirao

In the Colab,

  --instance_prompt="photo of imv{CLASS_NAME}" \
  --class_prompt="photo of a {CLASS_NAME}" \

are not f-strings. Shouldn't they be?

cheers

ShivamShrirao · 2022-09-28T08:05:32Z

@binarymind Not required here, because it executes as a shell command.
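
For context: in a notebook cell that starts with `!`, IPython substitutes `{CLASS_NAME}` from the Python namespace before the line is handed to the shell, so no `f` prefix is involved. Outside a notebook the same arguments would have to be built explicitly. The sketch below shows one way to do that in plain Python; `build_prompt_args` is a hypothetical helper, and the `sks` identifier is simply the one used in other commands in this thread.

```python
import shlex


def build_prompt_args(class_name, identifier="sks"):
    """Build the two prompt flags the way the notebook cell expands them."""
    return [
        f"--instance_prompt=photo of {identifier} {class_name}",
        f"--class_prompt=photo of a {class_name}",
    ]


args = build_prompt_args("dog")
# shlex.quote makes each argument safe to paste into a shell command line.
print(" ".join(shlex.quote(arg) for arg in args))
```

Passing the list directly to `subprocess.run([...] + args)` avoids shell quoting altogether, which is usually the less error-prone option when prompts contain spaces.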

binarymind · 2022-09-28T08:43:15Z

OK, thanks!

While running this cell I got the following output:

The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_processes` was set to a value of `1`
	`--num_machines` was set to a value of `1`
	`--mixed_precision` was set to a value of `'no'`
	`--num_cpu_threads_per_process` was set to `32` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
/opt/conda/lib/python3.7/site-packages/accelerate/accelerator.py:179: UserWarning: `log_with=tensorboard` was passed but no supported trackers are currently installed.
  warnings.warn(f"`log_with={log_with}` was passed but no supported trackers are currently installed.")
Fetching 16 files: 100%|█████████████████████| 16/16 [00:00<00:00, 13678.94it/s]
Generating class images:   0%|                           | 0/25 [00:00<?, ?it/s]FATAL: this function is for sm80, but was built for sm750
FATAL: this function is for sm80, but was built for sm750

My `nvidia-smi` output is the following:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.94       Driver Version: 470.94       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A6000    On   | 00000000:0F:00.0 Off |                  Off |
| 30%   27C    P8    26W / 300W |      1MiB / 48685MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I also tried running

%pip install git+https://github.com/facebookresearch/xformers@1d31a3a#egg=xformers

a few cells above, as it was not working. Currently stuck there.

binarymind · 2022-09-28T08:53:56Z

Lol, I fixed my problem by removing the f-strings I added... sorry.

Edit: ah, nope, it was not that. I launched the notebook again on a new repo and the problem appeared again. Looking into it.

TheChapster · 2022-09-28T11:35:57Z

I'm hoping for a (fingers crossed, not too distant) future version of this that can run within the limits of a 3080. That will put it within reach of many more people, including myself. Keep up the great work!!

JoeMcGuire · 2022-09-28T11:47:20Z

I'm not having any success. I'm trying to use a V100 on Colab.

Generating class images:   0% 0/50 [00:06<?, ?it/s]
Traceback (most recent call last):
  File "train_dreambooth.py", line 606, in <module>
    main()
  File "train_dreambooth.py", line 362, in main
    images = pipeline(example["prompt"]).images
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 259, in __call__
    noise_pred = self.unet(latent_model_input, t, encoder_hidden_states=text_embeddings).sample
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_2d_condition.py", line 254, in forward
    encoder_hidden_states=encoder_hidden_states,
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_blocks.py", line 565, in forward
    hidden_states = attn(hidden_states, context=encoder_hidden_states)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 155, in forward
    hidden_states = block(hidden_states, context=context)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 204, in forward
    hidden_states = self.attn1(self.norm1(hidden_states)) + hidden_states
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 288, in forward
    hidden_states = xformers.ops.memory_efficient_attention(query, key, value)
  File "/usr/local/lib/python3.7/dist-packages/xformers/ops.py", line 575, in memory_efficient_attention
    query=query, key=key, value=value, attn_bias=attn_bias, p=p
  File "/usr/local/lib/python3.7/dist-packages/xformers/ops.py", line 196, in forward_no_grad
    causal=isinstance(attn_bias, LowerTriangularMask),
  File "/usr/local/lib/python3.7/dist-packages/torch/_ops.py", line 143, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--use_auth_token', '--instance_data_dir=/content/data/sks', '--class_data_dir=/content/data/dog', '--output_dir=/content/models/sks', '--with_prior_preservation', '--instance_prompt=photo of sks dog', '--class_prompt=photo of a dog', '--resolution=512', '--use_8bit_adam', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=200', '--max_train_steps=600']' returned non-zero exit status 1

ShivamShrirao · 2022-09-28T11:50:14Z

@JoeMcGuire you will need to compile xformers yourself; the current wheels only support T4 GPUs.
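
The "no kernel image is available" error is what CUDA reports when a binary contains no machine code for the GPU's compute capability. A compiled kernel (cubin) built for capability X.y runs only on devices of capability X.z with z >= y, so a wheel built for the T4 (7.5) does not run on a V100 (7.0) or on Ampere cards (8.x). The sketch below encodes that rule for a few GPUs; the table and the function name are illustrative, and it deliberately ignores PTX, which can be JIT-compiled for newer architectures when a wheel ships it.

```python
# Compute capabilities of some GPUs that come up in this thread.
COMPUTE_CAPABILITY = {
    "P100": (6, 0),
    "V100": (7, 0),
    "T4": (7, 5),
    "A100": (8, 0),
    "A10G": (8, 6),
    "RTX A6000": (8, 6),
}


def cubin_runs_on(built_for, device):
    """True if machine code built for `built_for` can execute on `device`.

    Both arguments are (major, minor) tuples. Same major version and a
    device minor version at least as high as the build target are required.
    """
    return device[0] == built_for[0] and device[1] >= built_for[1]


t4 = COMPUTE_CAPABILITY["T4"]
for name, capability in COMPUTE_CAPABILITY.items():
    verdict = "ok" if cubin_runs_on(t4, capability) else "needs a rebuild"
    print(f"T4-only wheel on {name}: {verdict}")
```

On a live machine, `torch.cuda.get_device_capability()` returns the `(major, minor)` tuple for the current GPU, which can be compared against the architectures a wheel was built for.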

1blackbar · 2022-09-28T13:02:54Z

There are precompiled xformers wheels for the P100 in this Colab. How can those be incorporated into DreamBooth? That would cover Colab Pro.
https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/main/fast_stable_diffusion_AUTOMATIC1111.ipynb#scrollTo=a---cT2rwUQj

They are under "installing xformers".
Also, how about an optional Google Drive cell to upload the trained model, plus a prune cell to get it down to 2 GB?
If any of you compile a whl for the P100, please download it and store it on Google Drive to share.

1blackbar · 2022-09-28T15:14:06Z

Yeah, right now it's kind of unusable in web UIs, and most people are on web UIs; Hugging Face love their bins. Also, the default of 600 steps is pretty bad, and I'm not sure why it's the default. It should be more like at least 2000.

Blucknote · 2022-09-28T20:09:47Z

Any chance of running this on a 12 GB RTX 3060?
I'm getting a `Tried to allocate 4.00 GiB (GPU 0; 12.00 GiB total capacity; 4.81 GiB already allocated; 890.00 MiB free; 8.81 GiB reserved in total by PyTorch)` error even with the `--use_8bit_adam` flag.

ShivamShrirao · 2022-09-28T20:17:38Z

@Blucknote hopefully pretty soon. I have gotten the GPU usage to 11.187 GB, but there are a few bugs due to which the model output quality isn't good right now even for higher precision. Will update once quality gets better.

TemporalLabsLLC-SOL · 2022-09-28T20:27:48Z

Can we get a link to the json or description on that?

TemporalLabsLLC-SOL · 2022-09-29T01:44:27Z

The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_cpu_threads_per_process` was set to `4` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
Traceback (most recent call last):
  File "train_dreambooth.py", line 608, in <module>
    main()
  File "train_dreambooth.py", line 394, in main
    tokenizer = CLIPTokenizer.from_pretrained(
  File "c:\users\urban\anaconda3\envs\ldm\lib\site-packages\transformers\tokenization_utils_base.py", line 1764, in from_pretrained
    raise EnvironmentError(
OSError: Can't load tokenizer for '/CompVis/stable-diffusion-v1-4'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure '/CompVis/stable-diffusion-v1-4' is the correct path to a directory containing all relevant files for a CLIPTokenizer tokenizer.
Traceback (most recent call last):
  File "c:\users\urban\anaconda3\envs\ldm\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\urban\anaconda3\envs\ldm\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\Urban\anaconda3\envs\ldm\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "c:\users\urban\anaconda3\envs\ldm\lib\site-packages\accelerate\commands\accelerate_cli.py", line 43, in main
    args.func(args)
  File "c:\users\urban\anaconda3\envs\ldm\lib\site-packages\accelerate\commands\launch.py", line 837, in launch_command
    simple_launcher(args)
  File "c:\users\urban\anaconda3\envs\ldm\lib\site-packages\accelerate\commands\launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)

Solved: `pip install --upgrade transformers`

TemporalLabsLLC-SOL · 2022-09-29T01:45:30Z

I've tried both having a matching local directory and making sure no local directory matches. So close. I would appreciate any help anybody has to offer.

konimaki2022 · 2022-09-29T20:18:34Z

Hello, I have trained on an RTX 2060 with a stable consumption of 10.8 GB of VRAM and at an amazing speed, between 5 and 10 minutes!

These are the details of my configuration:

- torch and torchvision compiled with support for CUDA 11.6
- accelerate configured to use `--mixed_precision` with bf16
- reduced size of training images with `--resolution=256`
- 3-5 instance images and 12-20 class images, with 1000 training steps

I obtain very good results.

guumaster · 2022-09-29T20:34:59Z

@konimaki2022 can you share your notebook?

konimaki2022 · 2022-09-29T21:11:35Z

@guumaster Sorry, I haven't created a notebook in Google Colab yet. I run it on my local computer with Ubuntu 20.04, no cloud.

TemporalLabsLLC-SOL · 2022-09-30T03:54:21Z

> @guumaster sorry I haven't created a notebook in Google Colab yet, I run it on my local computer with Ubuntu 20.04, no cloud.

I think Ubuntu is the key. Because we have to redirect the CUDA drivers to invoke Adam correctly on Windows, it has caused two straight days of work. Close, hopefully.

TemporalLabsLLC-SOL · 2022-10-01T05:18:13Z

I've learned a lot, and I think a more stable and universal local Windows solution is close.

ShivamShrirao · 2022-10-02T13:58:37Z

https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth

New version now trains in 10 GB.

TheChapster · 2022-10-02T15:15:35Z

> https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth
>
> New version now trains in 10 GB.

Awesome!! I assume this still won't work with a 10 GB GPU, due to other apps using it. If anyone knows of a way to get it working on one, such as utilising shared memory (not worrying about a decrease in performance), that would be fantastic!! If not, I look forward to future progress!

ShivamShrirao · 2022-10-02T15:23:54Z

@TheChapster It might work on Linux, where you can have no other application running on the GPU, or it might need just a few modifications. I don't have a 10 GB GPU to test it on, so I can't confirm.

hopibel · 2022-10-02T16:15:00Z

> https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth
>
> New version now trains in 10 GB.

Can we get a row or two in the table with all optimizations on except for `use_8bit_adam`? The bitsandbytes library relies on a C extension to wrap some CUDA functions, so it can't be used on AMD.

ShivamShrirao · 2022-10-02T16:17:59Z

@hopibel Check the last row.

hopibel · 2022-10-02T16:20:48Z

Ah, missed it somehow. Dang, it looks too close to 16 GB to fit.

AmericanPresidentJimmyCarter · 2022-10-02T17:11:52Z

With xformers and triton in my fork at FP16 it trains with slightly less than 14 GB... I haven't pushed the branch but it seems fine.

This is using stable-diffusion and EMA weights, not diffusers at all.

ShivamShrirao · 2022-10-03T01:31:29Z

Now you can convert diffusers weights to ckpt, thanks to https://gist.github.com/jachiam/8a5c0b607e38fcc585168b90c686eb05

I have updated it in my colab.

andreae293 · 2022-10-06T22:31:02Z

> With xformers and triton in this my fork at FP16 it trains with slightly less than 14 GB... I haven't pushed the branch but it seems fine.
>
> This is using stable-diffusion and EMA weights, not diffusers at all.

Can you push it? Thanks.

Jarfeh · 2022-10-24T04:34:09Z

> With xformers and triton in this my fork at FP16 it trains with slightly less than 14 GB... I haven't pushed the branch but it seems fine.
>
> This is using stable-diffusion and EMA weights, not diffusers at all.

Like andreae293, I too would like to see you push this to be available :)

feffy380 · 2022-10-24T04:59:25Z

@Jarfeh This repo seems abandoned. Use ShivamShrirao's diffusers fork instead. It includes all the optimizations discussed here and some new ones.

titusfx · 2022-11-09T11:10:02Z

@Jarfeh I agree with @feffy380

https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth

Hbhatt-merexgenAI · 2023-07-04T13:01:20Z

I tried to run the Google Colab notebook on my RTX 3060 12 GB, but it doesn't work:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.25 GiB (GPU 0; 11.75 GiB total capacity; 8.06 GiB already allocated; 1.95 GiB free; 8.12 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 7772) of binary: /home/merexai-dev/miniconda3/envs/tf/bin/python
Traceback (most recent call last):
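
The numbers in a PyTorch out-of-memory message are worth reading before reaching for `max_split_size_mb`. The sketch below parses messages in the format quoted in this thread (newer PyTorch versions may word them differently, and `parse_oom` is a hypothetical helper). For the message above, the request (2.25 GiB) exceeds the free memory (1.95 GiB), and reserved minus allocated is only about 0.06 GiB, so fragmentation is unlikely to be the cause; the run most likely just needs more memory than the card has available.

```python
import re

_FIELD = r"([\d.]+) (MiB|GiB)"
_PATTERN = re.compile(
    rf"Tried to allocate {_FIELD} \(GPU \d+; {_FIELD} total capacity; "
    rf"{_FIELD} already allocated; {_FIELD} free; {_FIELD} reserved"
)
_KEYS = ("requested", "total", "allocated", "free", "reserved")


def parse_oom(message):
    """Extract the memory figures from an OOM message, all in GiB.

    Returns None if the message does not match the expected format.
    """
    match = _PATTERN.search(message)
    if match is None:
        return None
    groups = match.groups()
    values = []
    for i in range(0, len(groups), 2):
        number, unit = float(groups[i]), groups[i + 1]
        values.append(number / 1024 if unit == "MiB" else number)
    return dict(zip(_KEYS, values))


message = (
    "CUDA out of memory. Tried to allocate 2.25 GiB (GPU 0; 11.75 GiB total "
    "capacity; 8.06 GiB already allocated; 1.95 GiB free; 8.12 GiB reserved "
    "in total by PyTorch)"
)
info = parse_oom(message)
slack = info["reserved"] - info["allocated"]  # cached but unused by tensors
print(f"short by {info['requested'] - info['free']:.2f} GiB, "
      f"reserved-but-unallocated {slack:.2f} GiB")
```

A large gap between reserved and allocated memory is the case where the allocator settings in `PYTORCH_CUDA_ALLOC_CONF` are more likely to matter.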

ShivamShrirao changed the title ~~DreamBooth Stable Diffusion training now possible in 24GB GPUs, and it runs about 2 times faster.~~ DreamBooth Stable Diffusion training now possible in 18GB VRAM, and it runs about 2 times faster. Sep 27, 2022

ShivamShrirao mentioned this issue Sep 27, 2022

This is gamechanging. Wow. #4

Open

ShivamShrirao changed the title ~~DreamBooth Stable Diffusion training now possible in 18GB VRAM, and it runs about 2 times faster.~~ DreamBooth Stable Diffusion training now possible in 12.5GB VRAM, and it runs about 2 times faster. Sep 27, 2022

ShivamShrirao changed the title ~~DreamBooth Stable Diffusion training now possible in 12.5GB VRAM, and it runs about 2 times faster.~~ DreamBooth Stable Diffusion training now possible in 10 GB VRAM, and it runs about 2 times faster. Oct 2, 2022
