DreamBooth Stable Diffusion training now possible in 10 GB VRAM, and it runs about 2 times faster. #35

Open
ShivamShrirao opened this issue Sep 26, 2022 · 52 comments

Comments

@ShivamShrirao

ShivamShrirao commented Sep 26, 2022

Hey, so I managed to run Stable Diffusion DreamBooth training in just 17.7 GB of GPU memory by replacing the attention with the memory-efficient flash attention from xformers. Along with using far less memory, it also runs about 2 times faster, so it's now possible to train SD on 24 GB GPUs. Tested on an Nvidia A10G; training took 15-20 minutes. I hope it's helpful.
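
A rough back-of-the-envelope sketch of why swapping the attention implementation can save so much memory: standard attention stores a tokens × tokens score matrix for every head, while a memory-efficient kernel computes the same result in tiles without keeping that matrix. The numbers below assume 512×512 training images, the 8× VAE downsampling and 8 attention heads of Stable Diffusion v1, and fp32 scores. These values and the helper name are assumptions for illustration, not measurements from this fork.

```python
def attention_scores_bytes(resolution, vae_downsample=8, heads=8, bytes_per_el=4, batch=1):
    """Bytes needed to store the full self-attention score matrix at the
    highest-resolution UNet level (all defaults are assumed values)."""
    side = resolution // vae_downsample   # latent side length, e.g. 512 // 8 = 64
    tokens = side * side                  # one token per latent position
    return batch * heads * tokens * tokens * bytes_per_el


for res in (256, 512):
    mib = attention_scores_bytes(res) / 2**20
    print(f"resolution {res}: {mib:.0f} MiB per self-attention layer")
```

During training such a buffer is typically kept for the backward pass in every attention layer, so avoiding it adds up quickly.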

Code in my fork: https://github.com/ShivamShrirao/diffusers/blob/main/examples/dreambooth/

Screenshot_20220927_042425

It can even train with a batch size of 2.

Screenshot_20220927_042648

With some more tweaks it might be possible to train even on 16 GB GPUs.

And it works. Outputs: [image: Me in Fortnite]

huggingface/diffusers#554 (comment)

@TemporalLabsLLC-SOL

Very cool. Doing what I can for 16gb too.

@ShivamShrirao ShivamShrirao changed the title from "DreamBooth Stable Diffusion training now possible in 24GB GPUs, and it runs about 2 times faster." to "DreamBooth Stable Diffusion training now possible in 18GB VRAM, and it runs about 2 times faster." on Sep 27, 2022
@TemporalLabsLLC-SOL

I think I'm running into issues with it finding the GPUs (4x A10G). I'll post code tomorrow.

@ShivamShrirao
Author

ShivamShrirao commented Sep 27, 2022

Wow, using the 8-bit Adam optimizer from bitsandbytes along with xformers reduces the memory usage to 12.5 GB.
Colab: https://colab.research.google.com/github/ShivamShrirao/diffusers/blob/main/examples/dreambooth/DreamBooth_Stable_Diffusion.ipynb
Screenshot_20220927_213651
Code: https://github.com/ShivamShrirao/diffusers/blob/main/examples/dreambooth/
Screenshot_20220927_185927
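
A rough sketch of where the savings from 8-bit Adam come from: Adam keeps two state tensors (first and second moment) per trainable parameter, so storing them in 8 bits instead of 32 cuts that memory to about a quarter. The figure of roughly 860 million parameters for the Stable Diffusion v1 UNet is an assumption not stated in this thread, and the estimate ignores the small per-block quantization constants that bitsandbytes also stores.

```python
def adam_state_bytes(num_params, bits_per_state):
    """Memory for Adam's two moment buffers (first and second moment)."""
    return num_params * 2 * bits_per_state // 8


UNET_PARAMS = 860_000_000  # assumed approximate size of the SD v1 UNet

full = adam_state_bytes(UNET_PARAMS, 32)
small = adam_state_bytes(UNET_PARAMS, 8)
print(f"32-bit Adam states: {full / 1e9:.2f} GB")
print(f" 8-bit Adam states: {small / 1e9:.2f} GB")
print(f"estimated saving:   {(full - small) / 1e9:.2f} GB")
```

Under these assumptions the estimated saving is a little over 5 GB, which is in the same range as the drop from 17.7 GB to 12.5 GB reported here.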

@Daniel-Kelvich

Daniel-Kelvich commented Sep 27, 2022

There is no such file.
404 Client Error: Entry Not Found for url: https://huggingface.co/CompVis/stable-diffusion-v1-4/resolve/main/config.json

Edit: Issue resolved.

@ShivamShrirao ShivamShrirao changed the title from "DreamBooth Stable Diffusion training now possible in 18GB VRAM, and it runs about 2 times faster." to "DreamBooth Stable Diffusion training now possible in 12.5GB VRAM, and it runs about 2 times faster." on Sep 27, 2022
@Mistborn-First-Era

Do you have a donation link? I don't have much, but you are doing great work.

@ShivamShrirao
Author

> Do you have a donation link? I don't have much, but you are doing great work.

Hey, Thanks. No donation link haha. Good to hear you liked it. It has been quite fun to do for me.

@pdjohntony

pdjohntony commented Sep 27, 2022

@ShivamShrirao I've been trying to run your notebook on Runpod with PyTorch and an A5000, but I'm getting an error during pip install: "Building wheel for xformers (setup.py) ... error".
Training starts with a bitsandbytes bug report but runs, and eventually crashes after 20 minutes of training.

I'd also love to donate if I can get this working.

@pdjohntony

> There is no such file. 404 Client Error: Entry Not Found for url: https://huggingface.co/CompVis/stable-diffusion-v1-4/resolve/main/config.json
>
> Edit: Issue resolved.

@Daniel-Kelvich How did you fix this?

@ShivamShrirao
Author

@pdjohntony What error are you facing? If it's the 404, it may be because you are not authenticated with the Hugging Face CLI.

@pdjohntony

@ShivamShrirao I managed to get your DreamBooth example working, but it's been running for 2 hours now on an A5000.

Since that's taking so long, I spun up another instance on vast with 2 A5000s, but now I'm getting the 404. It shouldn't be an auth issue with Hugging Face, as I logged in on the CLI and it appeared to download the model for a while before hitting this 404 error.

The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_cpu_threads_per_process` was set to `24` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
Traceback (most recent call last):
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/configuration_utils.py", line 596, in _get_config_dict
    resolved_config_file = cached_path(
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/utils/hub.py", line 282, in cached_path
    output_path = get_from_cache(
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/utils/hub.py", line 486, in get_from_cache
    _raise_for_status(r)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/utils/hub.py", line 409, in _raise_for_status
    raise EntryNotFoundError(f"404 Client Error: Entry Not Found for url: {request.url}")
transformers.utils.hub.EntryNotFoundError: 404 Client Error: Entry Not Found for url: https://huggingface.co/CompVis/stable-diffusion-v1-4/resolve/main/config.json

@roar-emaus

Great work! I managed to run it in a Google Colab. I was just wondering: how do I get checkpoint files that I can use later on from the model files that are stored?

I could only find the feature_extractor, logs, model_index.json, safety_checker, scheduler, text_encoder, tokenizer, unet, and vae folders/files that were stored in the --output_dir=$OUTPUT_DIR after training finished.

@ShivamShrirao
Author

ShivamShrirao commented Sep 27, 2022

@roar-emaus These are the weights in diffusers format. I have added an inference example to the Colab showing how to use them with diffusers. For other repos you will need to convert them.

@roar-emaus

> @roar-emaus These are the diffuser version of weights. I have added an inference example in colab on how to use them in diffusers. For others you will need to convert them.

Thank you! will test it tomorrow :)

@Ai-Artsca

Finally got it to work. How can we reuse the model in a Stable Diffusion Colab, @ShivamShrirao? I have run the inference example, but how do I save my model? I haven't even been able to find what folder it's in, lol. Any info on how to convert it into a ckpt? Great work!!

@ShivamShrirao
Author

> finally got it to work, how can we use the model to reuse in a stable colab @ShivamShrirao ? I have used the inference but how do i save my model, i havent even been able to find what folder its in lol, any info on how to convert it into a ckpt?? great work !!

I haven't yet figured out how to convert it to a single ckpt for use in other repos. Currently the whole folder is your model, so you can save the whole folder until someone figures it out. This script needs to be reversed: https://github.com/huggingface/diffusers/blob/main/scripts/convert_original_stable_diffusion_to_diffusers.py
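
Since the whole output folder is the model, one way to keep it is to check that the expected pieces are present and then zip the directory. A minimal sketch, assuming the component names reported for the output directory in this thread; the function name is hypothetical and not part of the training script.

```python
import shutil
from pathlib import Path

# Entries reported in the --output_dir after training. logs, safety_checker and
# feature_extractor are deliberately left out of this check.
EXPECTED = ["model_index.json", "unet", "vae", "text_encoder", "tokenizer", "scheduler"]


def archive_model_dir(model_dir, archive_base):
    """Zip a diffusers-format model folder; raise if a component is missing."""
    model_dir = Path(model_dir)
    missing = [name for name in EXPECTED if not (model_dir / name).exists()]
    if missing:
        raise FileNotFoundError(f"{model_dir} is missing: {', '.join(missing)}")
    # make_archive appends ".zip" to archive_base and returns the final path.
    return shutil.make_archive(str(archive_base), "zip", root_dir=str(model_dir))
```

For example, `archive_model_dir("/content/models/sks", "/content/sks_model")` would write `/content/sks_model.zip`; the destination path here is only an illustration.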

@hopibel

hopibel commented Sep 27, 2022

@ShivamShrirao If I'm reading things right, 8-bit AdamW should be a drop-in replacement, and the modified CrossAttention class looks like it could simply replace the one in ldm/modules/attention.py in this repository. Sadly I can't test it myself, because bitsandbytes has a C extension that uses CUDA and I'm on AMD.

@Ai-Artsca

I successfully trained one model, but on my second training run I'm getting an error, @ShivamShrirao:

Steps: 2% 18/1000 [00:56<45:45, 2.80s/it, loss=0.536, lr=5e-6]
Traceback (most recent call last):
  File "train_dreambooth.py", line 606, in <module>
    main()
  File "train_dreambooth.py", line 527, in main
    for step, batch in enumerate(train_dataloader):
  File "/usr/local/lib/python3.7/dist-packages/accelerate/data_loader.py", line 357, in __iter__
    next_batch = next(dataloader_iter)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 721, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "train_dreambooth.py", line 268, in __getitem__
    instance_image = Image.open(self.instance_images_path[index % self.num_instance_images])
  File "/usr/local/lib/python3.7/dist-packages/PIL/Image.py", line 2843, in open
    fp = builtins.open(filename, "rb")
IsADirectoryError: [Errno 21] Is a directory: '/content/data/sks/.ipynb_checkpoints'
Steps: 2% 18/1000 [00:56<51:30, 3.15s/it, loss=0.536, lr=5e-6]
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--use_auth_token', '--instance_data_dir=/content/data/sks', '--class_data_dir=/content/data/gfx', '--output_dir=/content/models/sks', '--with_prior_preservation', '--instance_prompt=photo of sks gfx', '--class_prompt=photo of a gfx', '--resolution=512', '--use_8bit_adam', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=200', '--max_train_steps=1000']' returned non-zero exit status 1.
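
The `IsADirectoryError` above comes from Jupyter's hidden `.ipynb_checkpoints` folder being picked up as if it were a training image. A minimal sketch of a filter that keeps only regular image files; the helper name and the extension list are assumptions, not code from `train_dreambooth.py`.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}  # assumed set


def list_instance_images(data_dir):
    """Return sorted image files in data_dir, skipping folders and hidden entries."""
    return sorted(
        p for p in Path(data_dir).iterdir()
        if p.is_file()
        and not p.name.startswith(".")
        and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

Deleting the `.ipynb_checkpoints` folder from the instance data directory before training should also avoid the error without touching the script.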

@TemporalLabsLLC-SOL

Very nice progress! Digging in more now

@Daniel-Kelvich

Daniel-Kelvich commented Sep 28, 2022

@pdjohntony Try updating the transformers library: pip install -U transformers
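
Since an outdated `transformers` install is the suggested cause of the 404 on `config.json`, it may help to check installed versions before and after upgrading, or to include them when reporting the error. A small sketch using only the standard library (Python 3.8 or newer); the package list is an assumption about what is relevant here.

```python
from importlib import metadata


def installed_versions(packages=("transformers", "diffusers", "accelerate", "torch")):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```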

@ClashSAN

ClashSAN commented Sep 28, 2022

@ShivamShrirao I'm assuming you mean only the items in the imv folder make up the ckpt file, I deleted my colab and only saved those items to the google drive

@binarymind

@ShivamShrirao

In the Colab,

  --instance_prompt="photo of imv{CLASS_NAME}" \
  --class_prompt="photo of a {CLASS_NAME}" \

are not f-strings. They should be, right?

cheers

@ShivamShrirao
Author

@binarymind Not required here, because it executes as a shell command.
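
In a Colab or Jupyter cell, a line starting with `!` is handed to the shell after IPython substitutes `{name}` expressions from the notebook's Python namespace, which is why no `f` prefix is needed there. Outside a notebook the same substitution has to be done explicitly. A hedged sketch that builds the two prompt flags seen in this thread; `build_prompt_args` and its `token` parameter are hypothetical, not part of the notebook.

```python
def build_prompt_args(class_name, token="sks"):
    """Return the two prompt flags with class_name filled in explicitly."""
    return [
        f"--instance_prompt=photo of {token} {class_name}",
        f"--class_prompt=photo of a {class_name}",
    ]


print(build_prompt_args("dog"))
```

The resulting list could be appended to an `accelerate launch train_dreambooth.py` argument list passed to `subprocess.run`.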

@binarymind

binarymind commented Sep 28, 2022

OK, thanks!

While running this cell I got the following output:

The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_processes` was set to a value of `1`
	`--num_machines` was set to a value of `1`
	`--mixed_precision` was set to a value of `'no'`
	`--num_cpu_threads_per_process` was set to `32` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
/opt/conda/lib/python3.7/site-packages/accelerate/accelerator.py:179: UserWarning: `log_with=tensorboard` was passed but no supported trackers are currently installed.
  warnings.warn(f"`log_with={log_with}` was passed but no supported trackers are currently installed.")
Fetching 16 files: 100%|█████████████████████| 16/16 [00:00<00:00, 13678.94it/s]
Generating class images:   0%|                           | 0/25 [00:00<?, ?it/s]FATAL: this function is for sm80, but was built for sm750
FATAL: this function is for sm80, but was built for sm750

My nvidia-smi output is the following:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.94       Driver Version: 470.94       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A6000    On   | 00000000:0F:00.0 Off |                  Off |
| 30%   27C    P8    26W / 300W |      1MiB / 48685MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I also tried running

%pip install git+https://github.com/facebookresearch/xformers@1d31a3a#egg=xformers

a few cells above, as it was not working.
Currently stuck there.

@binarymind

binarymind commented Sep 28, 2022

Lol, I fixed my problem by removing the f-strings I added... sorry.

Edit: ah, nope, it was not that. I launched the notebook again on a new repo and the problem appeared again. Looking into it.

@TheChapster

TheChapster commented Sep 28, 2022

I'm hoping for a (fingers crossed not too distant) future version of this that can run on requirements of a 3080. Will put it into reach of many more people including myself. Keep up the great work!!

@JoeMcGuire

I'm not having any success. Trying to use V100 on colab.

Generating class images:   0% 0/50 [00:06<?, ?it/s]
Traceback (most recent call last):
  File "train_dreambooth.py", line 606, in <module>
    main()
  File "train_dreambooth.py", line 362, in main
    images = pipeline(example["prompt"]).images
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 259, in __call__
    noise_pred = self.unet(latent_model_input, t, encoder_hidden_states=text_embeddings).sample
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_2d_condition.py", line 254, in forward
    encoder_hidden_states=encoder_hidden_states,
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_blocks.py", line 565, in forward
    hidden_states = attn(hidden_states, context=encoder_hidden_states)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 155, in forward
    hidden_states = block(hidden_states, context=context)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 204, in forward
    hidden_states = self.attn1(self.norm1(hidden_states)) + hidden_states
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 288, in forward
    hidden_states = xformers.ops.memory_efficient_attention(query, key, value)
  File "/usr/local/lib/python3.7/dist-packages/xformers/ops.py", line 575, in memory_efficient_attention
    query=query, key=key, value=value, attn_bias=attn_bias, p=p
  File "/usr/local/lib/python3.7/dist-packages/xformers/ops.py", line 196, in forward_no_grad
    causal=isinstance(attn_bias, LowerTriangularMask),
  File "/usr/local/lib/python3.7/dist-packages/torch/_ops.py", line 143, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--use_auth_token', '--instance_data_dir=/content/data/sks', '--class_data_dir=/content/data/dog', '--output_dir=/content/models/sks', '--with_prior_preservation', '--instance_prompt=photo of sks dog', '--class_prompt=photo of a dog', '--resolution=512', '--use_8bit_adam', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=200', '--max_train_steps=600']' returned non-zero exit status 1

@ShivamShrirao
Author

@JoeMcGuire You will need to compile xformers yourself; the current wheels only support the T4 GPU.
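
Both the `no kernel image is available` error on the V100 and the `this function is for sm80, but was built for sm750` message on the RTX A6000 point to xformers binaries compiled for a different GPU architecture than the one in use. A minimal sketch of the check, using a hand-written table of CUDA compute capabilities for cards mentioned in this thread. The table and helper are illustrative assumptions and ignore any PTX fallback; on a real machine `torch.cuda.get_device_capability()` reports the value directly.

```python
# CUDA compute capability (major, minor) of GPUs mentioned in this thread.
COMPUTE_CAPABILITY = {
    "P100": (6, 0),
    "V100": (7, 0),
    "T4": (7, 5),
    "RTX 2060": (7, 5),
    "A10G": (8, 6),
    "RTX 3060": (8, 6),
    "RTX A5000": (8, 6),
    "RTX A6000": (8, 6),
}


def binary_runs_on(gpu_name, built_for=(7, 5)):
    """True if device code compiled for `built_for` can run on this GPU.

    Compiled kernels run only on the same major architecture with an equal
    or higher minor version.
    """
    major, minor = COMPUTE_CAPABILITY[gpu_name]
    return major == built_for[0] and minor >= built_for[1]


for gpu in ("T4", "V100", "RTX A6000"):
    status = "ok" if binary_runs_on(gpu) else "needs xformers built from source"
    print(f"{gpu}: {status}")
```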

@1blackbar

1blackbar commented Sep 28, 2022

There are precompiled xformers wheels for the P100 in this Colab. How can those be incorporated into DreamBooth? That would cover Colab Pro.
https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/main/fast_stable_diffusion_AUTOMATIC1111.ipynb#scrollTo=a---cT2rwUQj

Look under "installing xformers".
Also, how about an optional Google Drive cell to upload the trained model, plus a prune cell to get it down to 2 GB?
If any of you compile a whl for the P100, please store it in Google Drive and share it.

@1blackbar

1blackbar commented Sep 28, 2022

Yeah, right now it's not really usable in web UIs, and most people are on web UIs; Hugging Face love their bins. Also, the default of 600 steps is pretty bad. Not sure why that's the default; it should be at least 2000.

@Blucknote

Any chance of running this on a 12 GB RTX 3060?
I'm getting a "Tried to allocate 4.00 GiB (GPU 0; 12.00 GiB total capacity; 4.81 GiB already allocated; 890.00 MiB free; 8.81 GiB reserved in total by PyTorch)" error even with the --use_8bit_adam flag.
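
To read an out-of-memory message like this one, it can help to pull the numbers out and compare the requested size with what is free. A small sketch with a hypothetical `parse_cuda_oom` helper; it assumes the message format quoted in this thread, which may differ between PyTorch versions.

```python
import re

_TO_GIB = {"MiB": 1 / 1024, "GiB": 1.0}


def parse_cuda_oom(message):
    """Extract the sizes (in GiB) that PyTorch reports in a CUDA OOM message."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    sizes = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:
            sizes[key] = float(match.group(1)) * _TO_GIB[match.group(2)]
    return sizes


msg = ("Tried to allocate 4.00 GiB (GPU 0; 12.00 GiB total capacity; "
       "4.81 GiB already allocated; 890.00 MiB free; 8.81 GiB reserved in total by PyTorch)")
sizes = parse_cuda_oom(msg)
print(f"short by {sizes['requested'] - sizes['free']:.2f} GiB")
```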

@ShivamShrirao
Author

@Blucknote hopefully pretty soon. I have gotten the GPU usage to 11.187 GB, but there are a few bugs due to which the model output quality isn't good right now even for higher precision. Will update once quality gets better.
Screenshot_20220929_014712

@TemporalLabsLLC-SOL

Can we get a link to the json or description on that?

@TemporalLabsLLC-SOL

TemporalLabsLLC-SOL commented Sep 29, 2022

The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_cpu_threads_per_process` was set to `4` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
Traceback (most recent call last):
  File "train_dreambooth.py", line 608, in <module>
    main()
  File "train_dreambooth.py", line 394, in main
    tokenizer = CLIPTokenizer.from_pretrained(
  File "c:\users\urban\anaconda3\envs\ldm\lib\site-packages\transformers\tokenization_utils_base.py", line 1764, in from_pretrained
    raise EnvironmentError(
OSError: Can't load tokenizer for '/CompVis/stable-diffusion-v1-4'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure '/CompVis/stable-diffusion-v1-4' is the correct path to a directory containing all relevant files for a CLIPTokenizer tokenizer.
Traceback (most recent call last):
  File "c:\users\urban\anaconda3\envs\ldm\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\urban\anaconda3\envs\ldm\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\Urban\anaconda3\envs\ldm\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "c:\users\urban\anaconda3\envs\ldm\lib\site-packages\accelerate\commands\accelerate_cli.py", line 43, in main
    args.func(args)
  File "c:\users\urban\anaconda3\envs\ldm\lib\site-packages\accelerate\commands\launch.py", line 837, in launch_command
    simple_launcher(args)
  File "c:\users\urban\anaconda3\envs\ldm\lib\site-packages\accelerate\commands\launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)

Solved with: pip install --upgrade transformers

@TemporalLabsLLC-SOL

I've tried both local directories matching and making sure there are zero that match. So close. Would appreciate any help anybody has to offer.

@konimaki2022

Hello, I have trained on an RTX 2060 with a steady 10.8 GB of VRAM usage and at an amazing speed: between 5 and 10 minutes!

These are the details of my configuration:

  • torch and torchvision compiled with support for CUDA 11.6
  • accelerate configured to use --mixed_precision with bf16
  • reduced size of training images with --resolution=256
  • 3-5 instance images, 12-20 class images, and 1000 training steps

I obtain very good results.

@guumaster

@konimaki2022 can you share your notebook?

@konimaki2022

@guumaster Sorry, I haven't created a notebook in Google Colab yet. I run it on my local computer with Ubuntu 20.04, no cloud.

@TemporalLabsLLC-SOL

> @guumaster sorry I haven't created a notebook in Google Colab yet, I run it on my local computer with Ubuntu 20.04, no cloud.

I think Ubuntu is the key. Because we have to redirect the CUDA drivers to invoke Adam correctly on Windows, it has caused two straight days of work. Hopefully close.

@TemporalLabsLLC-SOL

I've learned a lot, and I think a more stable and universal local solution for Windows is close.

@ShivamShrirao ShivamShrirao changed the title from "DreamBooth Stable Diffusion training now possible in 12.5GB VRAM, and it runs about 2 times faster." to "DreamBooth Stable Diffusion training now possible in 10 GB VRAM, and it runs about 2 times faster." on Oct 2, 2022
@ShivamShrirao
Author

https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth

New version now trains in 10 GB.
Screenshot_20221002_032336

@TheChapster

TheChapster commented Oct 2, 2022

> https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth
>
> New version now trains in 10 GB.

Awesome!! I assume this still won't work on a 10 GB GPU, because other apps are also using its memory. If anyone knows of a way to get it working there, such as utilising shared memory (a decrease in performance would be fine), that would be fantastic!! If not, I look forward to future progress!

@ShivamShrirao
Author

@TheChapster It might work on Linux, where you can have no other application running on the GPU, or it might need just a few modifications. I don't have a 10 GB GPU to test it on, so I can't confirm.

@hopibel

hopibel commented Oct 2, 2022

> https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth
>
> New version now trains in 10 GB.

Can we get a row or two in the table with all optimizations on except for use_8bit_adam? The bitsandbytes library relies on a C extension to wrap some CUDA functions, so it can't be used on AMD

@ShivamShrirao
Author

@hopibel Check the last row.

@hopibel

hopibel commented Oct 2, 2022

Ah, missed it somehow. Dang, looks too close to 16GB to fit

@AmericanPresidentJimmyCarter

AmericanPresidentJimmyCarter commented Oct 2, 2022

With xformers and triton in my fork at FP16, it trains with slightly less than 14 GB. I haven't pushed the branch, but it seems fine.

This is using stable-diffusion and EMA weights, not diffusers at all.

@ShivamShrirao
Author

Now you can convert diffusers weights to ckpt, thanks to https://gist.github.com/jachiam/8a5c0b607e38fcc585168b90c686eb05

I have updated it in my colab.

@andreae293

> With xformers and triton in this my fork at FP16 it trains with slightly less than 14 GB... I haven't pushed the branch but it seems fine.
>
> This is using stable-diffusion and EMA weights, not diffusers at all.

can you push it? thanks

@Jarfeh

Jarfeh commented Oct 24, 2022

> With xformers and triton in this my fork at FP16 it trains with slightly less than 14 GB... I haven't pushed the branch but it seems fine.
>
> This is using stable-diffusion and EMA weights, not diffusers at all.

Like andreae293, I too would like to see you push this so it's available :)

@feffy380

@Jarfeh This repo seems abandoned. Use ShivamShrirao's diffusers fork instead. It includes all the optimizations discussed here and some new ones.

@titusfx

titusfx commented Nov 9, 2022

@Hbhatt-merexgenAI

I tried to run the Google Colab notebook. I have an RTX 3060 12 GB, but it doesn't work:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.25 GiB (GPU 0; 11.75 GiB total capacity; 8.06 GiB already allocated; 1.95 GiB free; 8.12 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 7772) of binary: /home/merexai-dev/miniconda3/envs/tf/bin/python
Traceback (most recent call last):
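
The error text itself suggests `max_split_size_mb`. A hedged sketch of how that setting is usually applied, through the `PYTORCH_CUDA_ALLOC_CONF` environment variable before PyTorch initializes CUDA; the value 128 is an arbitrary assumption. In the message above, reserved memory (8.12 GiB) is barely larger than allocated memory (8.06 GiB), so fragmentation may not be the main problem and this setting may not help.

```python
import os

# Must be set before the first CUDA allocation, ideally before importing torch.
# setdefault keeps any value that is already configured in the environment.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```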
