
Dreambooth #2002

Closed
wants to merge 58 commits into from

Conversation

d8ahazard
Collaborator

Add basic UI implementation and stuff to unpack a selected checkpoint and then use it with Dreambooth.

There's also code to re-merge the output with said selected checkpoint, but I can't currently test with my potato because I don't know how to incorporate the necessary "accelerate launch" command to make it only run on GPU.

@AUTOMATIC1111 - Need help with this bit. It's useless to me if I can't get the accelerate launch stuff to work so I can force it just to my GPU, unless you know some other magick to make it work with 8GB.
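For reference, one way to pin a launch to a single GPU from a Python process is to mask the other devices with CUDA_VISIBLE_DEVICES before spawning "accelerate launch". A minimal sketch; the script name and flags are illustrative placeholders, not this PR's actual call:

import os
import subprocess

# Sketch only: expose just GPU 0 to the spawned trainer, then run a single
# process through "accelerate launch". Script name and flags are illustrative.
env = dict(os.environ, CUDA_VISIBLE_DEVICES="0")
subprocess.run(
    ["accelerate", "launch", "--num_processes=1", "train_dreambooth.py"],
    env=env,
    check=True,
)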

@d8ahazard
Collaborator Author

Also, @AUTOMATIC1111, if you could check your reddit, I sent you a PM.

@bmaltais

bmaltais commented Oct 9, 2022

Naive question… but what does this PR allow users to do? Have you found a way to separate the Dreambooth “changes” and apply them on top of other CKPT ?

or is this to create dreambooth models via webui?

@d8ahazard
Collaborator Author

Naive question… but what does this PR allow users to do? Have you found a way to separate the Dreambooth “changes” and apply them on top of other CKPT ?

or is this to create dreambooth models via webui?

It should do all the things. First, you point it at an existing checkpoint, even a custom one.

Then it'll extract the diffusion models from that checkpoint and set up a working directory for training.

Once set up, you tell it where your training images are, your input prompt, and your "classification" prompt. Set the number of training steps, and let it rip.

I don't have the progress bar, "intermediary images", or "save a checkpoint every N steps" bits added yet, but in theory, it should work to train. I can get it to throw an OOM error, which is what I'd expect since I'm not forcing it to run on my CPU yet.

BUT, once done, it should then take the Dreambooth-generated files and merge them into the selected checkpoint, saving it alongside the others.

Since I'm getting OOM errors and can't use it yet, I can't verify I have the "build a new checkpoint" parts right, but if there is a bug/mistake there, it should be fairly trivial to fix.
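For context, a rough sketch of what the extraction step looks like for an SD 1.x checkpoint: the single state dict can be split by key prefix into UNet, VAE, and text-encoder parts. Illustrative only, not the PR's actual conversion code:

import torch

# Illustrative only: an SD 1.x .ckpt stores the UNet, VAE and text encoder
# in one state dict, distinguishable by key prefix.
state_dict = torch.load("model.ckpt", map_location="cpu")["state_dict"]

prefixes = {
    "unet": "model.diffusion_model.",
    "vae": "first_stage_model.",
    "text_encoder": "cond_stage_model.transformer.",
}
parts = {
    name: {k[len(p):]: v for k, v in state_dict.items() if k.startswith(p)}
    for name, p in prefixes.items()
}
print({name: len(sd) for name, sd in parts.items()})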

@bmaltais

bmaltais commented Oct 9, 2022

Does this support 12GB VRAM GPUs, or is it restricted to a 3090 and better? I have a 12GB GPU... this is why I am asking.

UPDATE:

I answered my own question... A 3060 with 12GB won't cut it:

[screenshot]

But this looks like a nice PR for those with a 3090.

@mcd1992

mcd1992 commented Oct 9, 2022

I'll try to see if I can get it working on a 3090 and get some of the missing features in. Will edit this comment just in case I don't get anywhere before Tuesday.

Notes for myself:

  • save_data_every can be 0 (disabled)
  • wrap_gradio_call func can return None (at least it is for me, will need to play in an ipython embed a bit)
  File "/home/unknown/Development/stable-diffusion-webui/modules/dreambooth/dreambooth.py", line 386, in train
    if not global_step % self.save_data_every:
ZeroDivisionError: integer division or modulo by zero

Traceback (most recent call last):
  File "/home/unknown/Development/stable-diffusion-webui/modules/ui.py", line 188, in f
    res = list(func(*args, **kwargs))
TypeError: 'NoneType' object is not iterable
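If anyone wants to patch the first error locally, the fix is presumably just treating save_data_every == 0 as "disabled" before taking the modulo. A sketch, not the PR's exact code:

# Sketch of a guard for the ZeroDivisionError above: a zero interval means
# "never save", so skip the modulo entirely in that case.
save_data_every = 0   # 0 = periodic saving disabled
global_step = 386

if save_data_every > 0 and global_step % save_data_every == 0:
    print("would save intermediate data here")
else:
    print("periodic save skipped")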

@Thomas-MMJ

Thomas-MMJ commented Oct 9, 2022

To make this work on a 12GB card you need to use deepspeed.

accelerate launch --use_deepspeed --zero_stage=2 --gradient_accumulation_steps=1 --offload_param_device=cpu --offload_optimizer_device=cpu train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME --use_auth_token \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --gradient_checkpointing \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800 \
  --sample_batch_size=2 \
  --mixed_precision=fp16

This is from pinkred's comment on the diffusers patch - huggingface/diffusers#735

Note that TTL also had to do explicit casts, rather than relying on auto, to ensure that everything stayed 16-bit.
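For anyone wondering what the explicit casts look like: rather than relying on mixed precision to keep things in half precision, the frozen components are converted to fp16 up front. A self-contained sketch with stand-in modules, not TTL's actual code:

import torch
from torch import nn

# Sketch only: cast frozen components to fp16 explicitly instead of relying
# on autocast; the trained UNet typically stays in fp32 for stable updates.
device = "cuda" if torch.cuda.is_available() else "cpu"
weight_dtype = torch.float16

vae = nn.Linear(4, 4)           # stand-in for the frozen VAE
text_encoder = nn.Linear(4, 4)  # stand-in for the frozen text encoder

vae.to(device, dtype=weight_dtype)
text_encoder.to(device, dtype=weight_dtype)
print(next(vae.parameters()).dtype)  # torch.float16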

@mcd1992

mcd1992 commented Oct 9, 2022

In hindsight it might be better to just have diffusers as an optional dependency in repositories/ like xformers is, instead of redistributing two .py files from it in the repo.

@d8ahazard
Collaborator Author

In hindsight it might be better to just have diffusers as an optional dependency in repositories/ like xformers is, instead of redistributing two .py files from it in the repo.

I'm only using one file from the HF repo, and it's pretty heavily modified, so not really re-distributed...

@devilismyfriend

devilismyfriend commented Oct 10, 2022

Yeah, sorry, but this doesn't work for a bunch of people; exactly why is uncertain, but it OOMs on my 3080 10GB with 64GB of RAM. (The TTL implementation is supposed to run in 8GB, per his account.)

@bmaltais

bmaltais commented Oct 10, 2022 via email

@d8ahazard
Collaborator Author

Yeah, sorry, but this doesn't work for a bunch of people; exactly why is uncertain, but it OOMs on my 3080 10GB with 64GB of RAM. (The TTL implementation is supposed to run in 8GB, per his account.)

Weird, it's almost like I mention in my initial commit that I currently can't get this version to run due to OOM errors, which is specifically why I'm asking for help with the accelerate launch commands needed to make it run under 8GB. :P

@d8ahazard
Collaborator Author

Perhaps you could integrate those changes… they apparently allow running on 8GB: https://www.reddit.com/r/StableDiffusion/comments/xzbc2h/guide_for_dreambooth_with_8gb_vram_under_windows/?utm_source=share&utm_medium=ios_app&utm_name=iossmf


"accelerate config" is literally how I have the stand-alone version running, on windows, on 8GB right now. It's why I chose the base diffusers repo, and it's what I'm asking @AUTOMATIC1111 or anybody else for a bit of help with. ;)

[screenshot]

@bmaltais

bmaltais commented Oct 10, 2022

I will try the manual method today and then poke at things to see if I can figure something out once I get things running manually. I have close to zero Python experience, so not much hope, but who knows.

@bmaltais

OK... I see what you are talking about... the issue is that the activation can't be done from the Python script, and this is what is causing the problem. Just for a test... what if activation was done before starting webui? Would that solve the issue?

@d8ahazard
Collaborator Author

OK... I see what you are talking about... the issue is that the activation can't be done from the Python script, and this is what is causing the problem. Just for a test... what if activation was done before starting webui? Would that solve the issue?

What do you mean by "activation"? It would be up to the user to run "accelerate config" to set the required params (or maybe we do it with a script, launch.py, etc.). The bit I need to understand is how I can run "accelerate launch" from within the UI, versus from the command line as it's documented. I think it's possible, but I haven't tested it yet.

@bmaltais

I see. On my side I am stuck trying to make it work manually... until I can do that even the UI won't work. I have done all the installation and config but when I try to run things I get:

[2022-10-10 09:51:23,004] [INFO] [utils.py:827:see_memory_usage] Before initializing optimizer states
[2022-10-10 09:51:23,005] [INFO] [utils.py:828:see_memory_usage] MA 1.66 GB         Max_MA 1.66 GB         CA 3.27 GB         Max_CA 3 GB
[2022-10-10 09:51:23,005] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory:  used = 7.72 GB, percent = 49.5%
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 9298) of binary: /home/bernard/anaconda3/envs/diffusers/bin/python

Add notebook launcher for training start, use the --medvram and --lowvram flags to hijack the launcher's torch_cuda_available method and pass "False" if set to force training only on CPU.
@d8ahazard
Collaborator Author

I see. On my side I am stuck trying to make it work manually... until I can do that even the UI won't work. I have done all the installation and config but when I try to run things I get:

[2022-10-10 09:51:23,004] [INFO] [utils.py:827:see_memory_usage] Before initializing optimizer states
[2022-10-10 09:51:23,005] [INFO] [utils.py:828:see_memory_usage] MA 1.66 GB         Max_MA 1.66 GB         CA 3.27 GB         Max_CA 3 GB
[2022-10-10 09:51:23,005] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory:  used = 7.72 GB, percent = 49.5%
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 9298) of binary: /home/bernard/anaconda3/envs/diffusers/bin/python

Give the latest commit I just made a try. Be sure to set --medvram in the COMMAND_LINE_ARGS of your launch script, or set it however. I wired in the "notebook_launcher" function from Accelerate, and then forced it to run only on CPU if medvram or lowvram is set.

Haven't verified that it trains myself, yet...but my indicator of early success has been how long the "caching latents" portion takes. If it goes fast, it's gonna OOM. If it's running slow (as it is now), then training will run after that call.
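For anyone following along, the rough shape of the notebook_launcher wiring is below; the training function and the CPU-forcing flag are placeholders, not the PR's actual code:

import torch
from accelerate import notebook_launcher

FORCE_CPU = True  # stand-in for "--medvram / --lowvram was passed"

if FORCE_CPU:
    # Hide CUDA from the launcher so training gets scheduled on the CPU.
    torch.cuda.is_available = lambda: False

def train_loop():
    # Placeholder for the real Dreambooth training function.
    print("training on:", "cuda" if torch.cuda.is_available() else "cpu")

# Run a single process; in the PR this is invoked from the web UI.
notebook_launcher(train_loop, args=(), num_processes=1)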

@d8ahazard
Collaborator Author

The good news:

I can make it work on an 8GB GPU now, from the UI.

[screenshot]

The bad news: It's abysmally slow, seemingly more so than when I run it manually. I suspect there are other things that can be done to make it faster...but I'll need to futz with it more.

Also, still no progress bar, no way to interrupt/resume training, and no preview in the UI. But, hey, it will run. Progress!

@bmaltais

The latest version fails as soon as I hit train with:

    return torch.cuda.is_available()
  [Previous line repeated 983 more times]
RecursionError: maximum recursion depth exceeded

@d8ahazard
Collaborator Author

The latest version fails as soon as I hit train with:

    return torch.cuda.is_available()
  [Previous line repeated 983 more times]
RecursionError: maximum recursion depth exceeded

Yeah, my bad. Dumb coding error. Fixed already, do another pull.
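For the record, that RecursionError is the classic symptom of the override calling the patched name again; a guess at the pattern and the fix, as a sketch:

import torch

FORCE_CPU = True

# Buggy pattern (a guess at the cause): the replacement re-reads the patched
# attribute, so it ends up calling itself until the recursion limit.
#   torch.cuda.is_available = lambda: False if FORCE_CPU else torch.cuda.is_available()

# Safe pattern: capture the original function before replacing it.
_original_is_available = torch.cuda.is_available
torch.cuda.is_available = lambda: False if FORCE_CPU else _original_is_available()

print(torch.cuda.is_available())  # False while FORCE_CPU is set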

@bmaltais

Hummm... when using --medvram I get:

NVIDIA GeForce RTX 3060 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3060 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

I guess this is not supposed to be like that...

@bmaltais

Hummm... to pass the --medvram I need to use python launch.py --medvram... and this is what gives the CUDA error.

I usually just run bash webui.sh to launch webui but that one does not pass parameters...

@Baquara

Baquara commented Oct 27, 2022

This is happening on a fresh install of this PR:

[screenshot]


Arguments: ('/home/privateserver/Coding/sd/test/', '/home/privateserver/Coding/sd/test2/', False, False, False) {}
Traceback (most recent call last):
  File "/home/privateserver/sd-automatic111-dreambooth/modules/ui.py", line 219, in f
    res = list(func(*args, **kwargs))
  File "/home/privateserver/sd-automatic111-dreambooth/webui.py", line 63, in f
    res = func(*args, **kwargs)
  File "/home/privateserver/sd-automatic111-dreambooth/modules/textual_inversion/ui.py", line 19, in preprocess
    modules.textual_inversion.preprocess.preprocess(*args)
TypeError: preprocess() missing 3 required positional arguments: 'process_flip', 'process_split', and 'process_caption'

Also this happens when I try to train:

[screenshot]

So apparently the interface fails to get argument inputs


Error completing request
Arguments: ('test', '*', '*', 5e-06, '/home/privateserver/Coding/sd/test/', '', 1000, 500, 500, 0, False, True, True, False, False, False, False, 'no', 'constant', 512, 1, 1, 0.9, 0.999, 0.01, 1e-08, 1, 1, -1, 1, 0) {}
Traceback (most recent call last):
  File "/home/privateserver/sd-automatic111-dreambooth/modules/ui.py", line 219, in f
    res = list(func(*args, **kwargs))
  File "/home/privateserver/sd-automatic111-dreambooth/webui.py", line 63, in f
    res = func(*args, **kwargs)
  File "/home/privateserver/sd-automatic111-dreambooth/modules/dreambooth/dreambooth.py", line 664, in start_training
    out_dir, trained_steps = dream.train()
  File "/home/privateserver/sd-automatic111-dreambooth/modules/dreambooth/dreambooth.py", line 219, in train
    text_encoder = CLIPTextModel.from_pretrained(os.path.join(ex_model_path, "text_encoder"))
  File "/home/privateserver/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2088, in from_pretrained
    loaded_state_dict_keys = [k for k in state_dict.keys()]
AttributeError: 'NoneType' object has no attribute 'keys'
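That last AttributeError usually means from_pretrained found the folder but no usable weight file inside it; a quick sanity check before loading, with an illustrative path:

import os

# Sketch: verify the extracted text_encoder folder actually contains weights
# before handing it to CLIPTextModel.from_pretrained.
text_encoder_dir = "/path/to/working_dir/text_encoder"  # illustrative path

expected = ["config.json", "pytorch_model.bin"]
missing = [f for f in expected
           if not os.path.isfile(os.path.join(text_encoder_dir, f))]
if missing:
    raise FileNotFoundError(
        f"text_encoder folder is incomplete, missing {missing}; the checkpoint "
        "extraction step probably failed or was interrupted."
    )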

@0xItx

0xItx commented Oct 28, 2022

With the last commit, using the default settings & no command-line arguments on an RTX 3090:

Exception training db: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat1 in method wrapper__index_select)
Traceback (most recent call last):
  File "C:\StableDiffusion\stable-diffusion-webui\modules\dreambooth\dreambooth.py", line 436, in train
    image = pipeline(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
  File "C:\StableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\StableDiffusion\stable-diffusion-webui\venv\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_stable_diffusion.py", line 326, in __call__
    noise_pred = self.unet(latent_model_input, t, encoder_hidden_states=text_embeddings).sample
  File "C:\StableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\StableDiffusion\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\unet_2d_condition.py", line 287, in forward
    emb = self.time_embedding(t_emb)
  File "C:\StableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\StableDiffusion\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\embeddings.py", line 75, in forward
    sample = self.linear_1(sample)
  File "C:\StableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\StableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)

When I run with --xformers I get the same exception, except that wrapper__index_select is swapped with wrapper_addmm

I have plenty of free VRAM:

 CPU: False Adam: False, Prec: no, Prior: False, Grad: False, TextTr: True
 Allocated: 4.0GB
 Reserved: 4.1GB
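The traceback above points at the preview pipeline running partly on CPU while its inputs are on CUDA; the likely shape of a fix (hedged, not the actual commit) is to move the whole preview pipeline to one device before sampling. The model id and prompt below are illustrative:

import torch
from diffusers import StableDiffusionPipeline

# Sketch only: keep the preview pipeline and its inputs on the same device.
device = "cuda" if torch.cuda.is_available() else "cpu"

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipeline = pipeline.to(device)  # moves unet, vae and text encoder together

image = pipeline("a photo of sks dog", num_inference_steps=50,
                 guidance_scale=7.5).images[0]
image.save("preview.png")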

@0xItx

0xItx commented Oct 28, 2022

These commits solved it. Thanks!
Now I'm able to generate a ckpt without problems (and quite quickly too, even though I haven't tested 8bit adam or any of the other optimizations yet).

Windows 10 (Native, not WSL) & 3090; training takes around 15-22GB of VRAM depending on settings and how big my training dataset is.

v. 3.8 of Gradio lets us use a dictionary of keys/blocks as an input, versus one big list that has to be constantly updated, meaning we can use **kwargs for functions. :D
Add option to load previous training params after first-time training (resume).
Clean up UI, add tabbed interface, Move stuff around so it's easier to work with.
Add cancellation support for the class image generation phase, better UI messages.
Fix up image generation for UI, hook class generation to UI.
Update/cleanup requirements.
os.makedirs(self.class_data_dir)

self.logging_dir = os.path.join(self.output_dir, "logging")
self.pretrained_model_path = os.path.join(model_dir, "stable-diffusion-v1-5")

Does every model require its own copy of stable-diffusion-v1-5? Can it be downloaded just once to models/dreambooth instead?

Collaborator Author

Technically, no model requires a copy. The files are extracted from the target checkpoint when you create a new dreambooth model. They will reside on disk until training is completed and the data folder for that model is deleted manually.

The only file that actually gets downloaded is the config file needed to load the model.

try:
    print(f"Saving to {self.output_dir}")
    pipeline.save_pretrained(self.output_dir)
    save_checkpoint(self.total_steps + global_step, self.src,

IMO it's a good idea to allow save_checkpoint to receive a pipeline (instead of a path on disk) to save on I/O & RAM

Collaborator Author

Good point. I can implement this, just need to decide how to handle the "save_checkpoint" call that runs in "start_training".

It's really not necessary if I update the logic a bit at the "check for save" part to ensure we're saving on the last iteration of training. Maybe I'm already doing this...
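A hypothetical sketch of the suggestion, with save_checkpoint accepting either an in-memory pipeline or a path; the names and signature are illustrative, not the PR's code:

from typing import Union
from diffusers import StableDiffusionPipeline

# Hypothetical sketch of the refactor discussed above.
def save_checkpoint(steps: int, src_checkpoint: str,
                    pipeline_or_path: Union[StableDiffusionPipeline, str],
                    out_path: str) -> None:
    if isinstance(pipeline_or_path, str):
        # Old behaviour: re-load the diffusers folder from disk.
        pipeline_or_path = StableDiffusionPipeline.from_pretrained(pipeline_or_path)
    # ... convert the pipeline back to .ckpt format and write it to out_path ...
    print(f"would write step {steps} checkpoint from {src_checkpoint} to {out_path}")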

@fredconex

I've installed webui on WSL / Debian. I'm able to use Dreambooth with the Shivam repo, but when I use the webui I keep getting errors. I tried all 3 types of scheduler:

Error completing request
Arguments: ('Fred2', '', '', 5e-06, '/home/fred/github/diffusers/examples/dreambooth/training', '', 1000, 500, 500, 0, False, True, False, False, True, False, 'fp16', 'ddim', 512, 1, 1, 0.9, 0.999, 0.01, 1e-08, 1, 1, 1, -1, 1, 0) {}
Traceback (most recent call last):
  File "/home/fred/stable-diffusion-webui/modules/ui.py", line 221, in f
    res = list(func(*args, **kwargs))
  File "/home/fred/stable-diffusion-webui/webui.py", line 63, in f
    res = func(*args, **kwargs)
  File "/home/fred/stable-diffusion-webui/modules/dreambooth/dreambooth.py", line 639, in start_training
    out_dir, trained_steps = dream.train()
  File "/home/fred/stable-diffusion-webui/modules/dreambooth/dreambooth.py", line 345, in train
    lr_scheduler = get_scheduler(
  File "/home/fred/stable-diffusion-webui/venv/lib/python3.9/site-packages/diffusers/optimization.py", line 259, in get_scheduler
    name = SchedulerType(name)
  File "/home/fred/anaconda3/lib/python3.9/enum.py", line 384, in __call__
    return cls.__new__(cls, value)
  File "/home/fred/anaconda3/lib/python3.9/enum.py", line 702, in __new__
    raise ve_exc
ValueError: 'ddim' is not a valid SchedulerType

@iznanka

iznanka commented Oct 29, 2022

Train -> Advanced -> Scheduler
Select which one you want from the dropdown list.
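For reference, diffusers' get_scheduler expects learning-rate schedule names, not sampler names like "ddim"; the accepted values can be listed straight from the enum:

from diffusers.optimization import SchedulerType

# 'ddim' is a sampling scheduler, not a learning-rate schedule; get_scheduler
# only accepts the SchedulerType values below.
print([s.value for s in SchedulerType])
# e.g. ['linear', 'cosine', 'cosine_with_restarts', 'polynomial',
#       'constant', 'constant_with_warmup']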

Pass a pipeline to "save_checkpoint", versus making one twice.
Update logic for saving preview/checkpoints.
Cleanup extraneous print messages.
Move paths for /logging and main config (may break previous trainings, sorry)
Add UI Updates/status when creating new DB Model.
@d8ahazard
Collaborator Author

Any implement on clip layer skip just like the webui does?

I'm not sure what you mean?

@iznanka

iznanka commented Oct 30, 2022

Hi!
Is it possible to make the stable-diffusion-v1-5 folder common to all models? Storing it in each model takes a lot of space, +5 GB each :(

@Centurion-Rome

Centurion-Rome commented Oct 30, 2022

Possible memory / storage saving through dehydrated models?
#3932

@d8ahazard
Collaborator Author

Hi!
Is it possible to make the stable-diffusion-v1-5 folder common to all models? It takes a lot of space to store it in each model +5 GB :(

I'll have to review the code, but I highly doubt it can be removed or shared.

When you create a "new" DreamBooth model, it's taking data from the checkpoint you selected and extracting it into "diffusers" format. This is what lives in the /Stable-Diffusion_v1.5 folder. It's not always the same checkpoint data, I'm just using the same folder name.

I say I'll have to review the code because it might be possible to delete this after saving the first bit of training data - but I need to review the method used to convert the data back to .ckpt format and ensure it doesn't need the original folder for anything. Which, I think it does at the moment - but I also found a new method that doesn't rely on this folder, so I could potentially do away with it.

On the flip side - once you've trained a model, you can delete the folder in models/dreambooth/MODEL NAME. It's just there in case you want to resume training a model.

@d8ahazard
Collaborator Author

Possible memory / storage saving through dehydrated models? #3932

IDK, ask @bmaltais, he wrote it. :P

Fix saving checkpoint data so ALL the saved checkpoint data is encoded, not just the unet?

Ditch the somewhat hacky conversion script in favor of the script directly from huggingface.

Don't create or use a "stable-diffusion-v-1-5" folder, just extract to /working and work from there.
Config reload from UI was broken because of additional "name" value.
Re-arrange UI (again).
Save VAE/text encoder when saving model.
@d8ahazard
Collaborator Author

Closing, opening new PR to squash commits and make it clean.

@d8ahazard d8ahazard closed this Oct 30, 2022
DrakeRichards pushed a commit to DrakeRichards/stable-diffusion-webui that referenced this pull request Aug 18, 2023
…ocm_installer_for_navi

Improved ROCm installer for Navi 3x and ROCm 5.5+ (and experimental Navi 2x support)