Extreme degradation of existing model when running the Dreambooth script #31
Comments
Hi, I have made some updates. Try out the new colab. |
Why is this issue closed? I tried it out after the updates yesterday, with various different settings, but I barely noticed a difference. Especially the overcooking of Johnny Depp keeps persisting, unless I lower the number of training steps so much that the result for the trained subject no longer looks like me. IMO the issue is far from fixed! |
You may be using old code. I have yet to encounter overcooking in any of my experiments after the update. |
The script I've been using can be found at https://github.com/jslegers/diffusers/blob/main/examples/dreambooth/DreamBooth_Stable_Diffusion.ipynb. It's basically just your Google Colab script with a few changes for my own convenience. I recloned https://github.com/jslegers/diffusers from https://github.com/ShivamShrirao/diffusers yesterday to make sure the code underneath the Colab script was identical to yours, and I added the new flag to my Colab. So everything should be up to date! I tried several different values for the number of training steps (between 800 and 3000), number of class images (between 400 and 3500), prior loss weight (between 0.1 and 1.0), learning rate (between 5e-7 and 5e-5) and Adam weight decay (between 1e-3 and 1e-2), but those only produced worse results. The rest of the parameters in the script, I believe, were the ones that produced the BEST results. Here are some of the results I got from my experiments yesterday for Johnny Depp, after training the model on myself: |
@jslegers Most of your parameters seem different to me, along with different schedulers and such. Even the colab has changed a lot compared to yours. I suggest you try out my colab directly with its default parameters and report the results. Do not change any parameter in this colab. Just upload your 5-10 images and specify the class. |
All parameters are default values except for the ones I listed... |
@jslegers your inference code is also different and uses the wrong scheduler, which causes the overcooking. Just try out the new colab. The number of training steps is also too high. |
What do you mean by "wrong scheduler"? Which parameter are we talking about? |
@jslegers check the inference section of my colab. You should go through all the cells of the colab to update your notebook.

```python
import torch
from torch import autocast
from diffusers import StableDiffusionPipeline, DDIMScheduler
from IPython.display import display

model_path = WEIGHTS_DIR  # If you want to use a previously trained model saved in gdrive, replace this with the full path of the model in gdrive

scheduler = DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", clip_sample=False, set_alpha_to_one=False)
pipe = StableDiffusionPipeline.from_pretrained(model_path, scheduler=scheduler, safety_checker=None, torch_dtype=torch.float16).to("cuda")

g_cuda = None
```
|
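For context, a minimal sketch of how the cell above is typically followed up for generation, continuing from the pipeline defined there (the prompt, seed, and sampling parameters below are assumptions, not taken from the colab):

```python
prompt = "photo of sks person"  # placeholder prompt, not from the colab
g_cuda = torch.Generator(device="cuda").manual_seed(52362)  # fixed seed so runs are comparable

with autocast("cuda"):
    images = pipe(
        prompt,
        num_inference_steps=50,  # DDIM typically uses around 50 steps
        guidance_scale=7.5,
        generator=g_cuda,
    ).images

for img in images:
    display(img)
```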
So you're replacing the scheduler with a different scheduler AFTER the training is completed? Shouldn't this scheduler be part of the model generated by the script? Shouldn't it be integrated in the CKPT? It makes no sense to me to have to load a different scheduler just to get the model to work normally. This should be part of the model... |
@jslegers Schedulers are basically algorithms which specify how to produce the noise for the diffusion process. They aren't part of the network and they aren't trained. The idea is to be able to use any scheduler with any diffusion model. That's why you also see many scheduler/sampler options in AUTOMATIC1111's webui. The ckpt only contains the weights (matrices), not code. The scheduler can't be part of it; it is not weights. |
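To make that interchangeability concrete, a minimal sketch assuming a recent diffusers release (the model path is a placeholder):

```python
from diffusers import StableDiffusionPipeline, DDIMScheduler

# Load a saved pipeline; it is instantiated with whatever scheduler its config names.
pipe = StableDiffusionPipeline.from_pretrained("path/to/diffusers-model")  # placeholder path

# Swap in a different scheduler at inference time; the trained weights are untouched.
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
```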
Every SD model has a file named `scheduler_config.json`. For example, the 1.5 release has a `scheduler_config.json` like this:

```json
{
  "_class_name": "PNDMScheduler",
  "_diffusers_version": "0.6.0",
  "beta_end": 0.012,
  "beta_schedule": "scaled_linear",
  "beta_start": 0.00085,
  "num_train_timesteps": 1000,
  "set_alpha_to_one": false,
  "skip_prk_steps": true,
  "steps_offset": 1,
  "trained_betas": null
}
```

If, to get my new model to behave nicely, I should use a `DDIMScheduler`, then the generated `scheduler_config.json` should instead look something like this:

```json
{
  "_class_name": "DDIMScheduler",
  "_diffusers_version": "0.6.0",
  "beta_start": 0.00085,
  "beta_end": 0.012,
  "beta_schedule": "scaled_linear",
  "clip_sample": false,
  "set_alpha_to_one": false
}
```

After that, you can just load the model without having to load the scheduler separately... |
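A minimal sketch of the loading behaviour being described, assuming the diffusers save format (the model path is a placeholder):

```python
import torch
from diffusers import StableDiffusionPipeline

# If scheduler/scheduler_config.json inside the saved model names DDIMScheduler,
# from_pretrained builds that scheduler automatically; no separate scheduler argument is needed.
pipe = StableDiffusionPipeline.from_pretrained("path/to/my-dreambooth-model", torch_dtype=torch.float16).to("cuda")
print(type(pipe.scheduler).__name__)  # expected: DDIMScheduler
```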
That is part of the diffusers-format weights. diffusers has always had PNDMScheduler as the default. I have already changed the saved scheduler for the diffusers format, so it has been saving with the DDIM scheduler. But when talking about the ckpt, which is then used for inference in other webuis like AUTOMATIC1111's, it can't contain the scheduler. The CKPT is a different format than the diffusers save format you are seeing here. The inference code in the new colab is there so that even if someone is using older weights, it updates them to use the correct scheduler. |
If the scheduler params can't be baked into the CKPT, I prefer not to use the CKPT at all. I never did with my Dreambooth generated models anyway. And this is also one of several reasons why I prefer my own custom variation of your script. I don't want to be forced to use the CKPT file if I don't want to. And I want to choose my own "concept" name (I've been using `johnslegers`). Anyway, this is the scheduler generated by my script:

```json
{
  "_class_name": "PNDMScheduler",
  "_diffusers_version": "0.6.0",
  "beta_end": 0.012,
  "beta_schedule": "scaled_linear",
  "beta_start": 0.00085,
  "num_train_timesteps": 1000,
  "set_alpha_to_one": false,
  "skip_prk_steps": true,
  "steps_offset": 1,
  "trained_betas": null
}
```

Why does it still say `PNDMScheduler`? |
@jslegers Your code must be old. As I have been saying, please just try my new colab first and then compare. And you should use the ckpt with the webuis. They provide so many more options; the webuis offer many different schedulers from which you can choose, along with some good defaults. If you had used the ckpt with the webuis, you likely wouldn't have gotten the overcooking issue either. I suggest you explore more and understand what these mean. Btw your concept name |
I used the code from yesterday. I see it's been updated since, so I'll give it another try today. Anyway, I just replaced the default scheduler with the scheduler you proposed and checked it on the model that provided the best output. Johnny Depp looks even more overcooked with your scheduler:
I see... I guess this would explain the impact it has on Johnny Depp. Hmmmm... Will have to try a few other identifiers... Will come back to this... |
@jslegers I have nothing more to say until you try out the new colab with its defaults. |
I made the following changes to my Colab notebook:
Not sure why, but this seems to fix the issue. My Johnny Depp now looks perfectly crisp after 1100 learning steps. I tried using both The only issue I've had with this is that the output dir for the model is now e.g. |
output_dir contains the number at the end to indicate the number of training steps, as the script now also supports saving models at different step intervals. |
If the user chooses to save at multiple step intervals, why not use sub-directories? So, for example, if you want to save after 1000 and 2000 steps, you have subdirectories `1000` and `2000`. |
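A rough sketch of what that proposed sub-directory layout could look like at load time (the base path and step counts are assumptions):

```python
import os
from diffusers import StableDiffusionPipeline

# Hypothetical layout with one sub-directory per checkpoint interval:
#   /content/stable_diffusion_weights/johnslegers/1000/
#   /content/stable_diffusion_weights/johnslegers/2000/
OUTPUT_DIR = "/content/stable_diffusion_weights/johnslegers"  # assumed base path

# List the available checkpoints and load the one saved after 2000 steps.
steps = sorted(int(d) for d in os.listdir(OUTPUT_DIR) if d.isdigit())
print("available checkpoints:", steps)
pipe = StableDiffusionPipeline.from_pretrained(os.path.join(OUTPUT_DIR, "2000"))
```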
Describe the bug
I did some testing regarding the impact of Dreambooth on different prompts, using the same seed.
Pretty much all of my tests produced results similar to this, when running Dreambooth with class "man" and concept "johnslegers":
Reproduction
Just run Dreambooth once, with "man" as a class and pretty much anything as a concept identifier.
Then compare the output of "man" & a celebrity (e.g. "Johnny Depp") between the original model and the new model. You'll notice rather extreme degradation.
I've tried using different configs, but to no avail. The degradation persists no matter how many input pics I use, how many class pics I use, what value I use for prior preservation, etc.
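A minimal sketch of such a before/after comparison with a fixed seed (model paths, prompt, and seed are illustrative assumptions):

```python
import torch
from diffusers import StableDiffusionPipeline

prompt = "portrait photo of Johnny Depp"  # celebrity prompt unrelated to the trained concept

for name, path in [("original", "runwayml/stable-diffusion-v1-5"),
                   ("dreambooth", "/content/stable_diffusion_weights/johnslegers")]:
    pipe = StableDiffusionPipeline.from_pretrained(path, torch_dtype=torch.float16).to("cuda")
    generator = torch.Generator("cuda").manual_seed(42)  # identical seed for both models
    image = pipe(prompt, generator=generator).images[0]
    image.save(f"johnny_depp_{name}.png")
```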
Logs
No response
System Info
The issue is system-independent.
See also huggingface#712