
Extreme degradation of existing model when running the Dreambooth script #31

Closed · jslegers opened this issue Oct 10, 2022 · 20 comments
Labels: bug (Something isn't working)

Comments

@jslegers commented Oct 10, 2022

Describe the bug

I did some testing regarding the impact of Dreambooth on different prompts, using the same seed.

Pretty much all of my tests produced results similar to this, when running Dreambooth with class "man" and concept "johnslegers":

[image]

Reproduction

Just run Dreambooth once, with "man" as the class and pretty much anything as the concept identifier.

Then compare the output of "man" and a celebrity (e.g. "Johnny Depp") between the original model and the new model. You'll notice rather extreme degradation.

I've tried different configs, but to no avail: the degradation persists no matter how many input pics I use, how many class pics I use, what value I use for prior preservation, etc.
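A minimal sketch of the comparison I'm describing (model paths, prompt and seed here are illustrative, not a prescribed setup):

import torch
from diffusers import StableDiffusionPipeline

prompt = "portrait of Johnny Depp"  # any celebrity prompt works for the comparison
seed = 1234                         # same fixed seed for both models

# First path: the base model; second: the Dreambooth output dir (placeholder).
for path in ["CompVis/stable-diffusion-v1-4", "/content/models/johnslegers"]:
    pipe = StableDiffusionPipeline.from_pretrained(path, torch_dtype=torch.float16).to("cuda")
    g = torch.Generator(device="cuda").manual_seed(seed)
    image = pipe(prompt, generator=g).images[0]
    image.save(path.rstrip("/").split("/")[-1] + ".png")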

Logs

No response

System Info

The issue is system-independent.

See also huggingface#712

@jslegers added the bug label on Oct 10, 2022
@ShivamShrirao (Owner)

Hi, I have made some updates. Try out the new colab.

@jslegers (Author) commented Oct 23, 2022

Why is this issue closed?

I tried it out after the updates yesterday, with various settings, but I barely noticed a difference.

The overcooking of Johnny Depp in particular keeps persisting, unless I lower the number of training steps so far that the result for the trained subject no longer looks like me.

IMO the issue is far from fixed!

@ShivamShrirao (Owner)

You may be using old code; I have yet to encounter overcooking in any of my experiments after the update.
Do share all your training parameters and inference code.

@jslegers (Author) commented Oct 23, 2022

The script I've been using can be found at https://github.com/jslegers/diffusers/blob/main/examples/dreambooth/DreamBooth_Stable_Diffusion.ipynb. It's basically just your Google Colab script with a few changes for my own convenience.

I recloned https://github.com/jslegers/diffusers from https://github.com/ShivamShrirao/diffusers yesterday to make sure the code underneath the Colab script was identical to yours, and I added the new flag to my Colab.

So everything should be up to date!

I tried several different values for the number of training steps (between 800 and 3000), the number of class images (between 400 and 3500), the prior loss weight (between 0.1 and 1.0), the learning rate (between 5e-7 and 5e-5) and the Adam weight decay (between 1e-3 and 1e-2), but those only produced worse results. The rest of the parameters in the script were, I believe, the ones that produced the BEST results.

Here are some of the results I got from my experiments yesterday for Johnny Depp, after training the model on myself:

[5 images]

@ShivamShrirao (Owner)

@jslegers Most of your parameters look different to me, along with different schedulers and such. Even the colab has changed a lot compared to yours. I suggest you try out my colab directly with its default parameters and report the results. Do not change any parameter in this colab. Just upload your 5-10 images and specify the class.

@jslegers (Author) commented Oct 23, 2022

All parameters are default values except for the ones I listed...

@ShivamShrirao (Owner) commented Oct 23, 2022

@jslegers Your inference code is also different and uses the wrong scheduler, which causes the overcooking. Just try out the new colab. Your number of training steps is also too high.

@jslegers (Author)

What do you mean by "wrong scheduler"?

Which parameter are we talking about?

@ShivamShrirao (Owner) commented Oct 23, 2022

@jslegers check the inference section of my colab. You should go through all the cells of the colab to update your notebook.

import torch
from torch import autocast
from diffusers import StableDiffusionPipeline, DDIMScheduler
from IPython.display import display

model_path = WEIGHTS_DIR  # To use a previously trained model saved in gdrive, replace this with the full path of the model in gdrive

# DDIM with SD's training beta schedule; this overrides whatever scheduler was saved with the weights.
scheduler = DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", clip_sample=False, set_alpha_to_one=False)
pipe = StableDiffusionPipeline.from_pretrained(model_path, scheduler=scheduler, safety_checker=None, torch_dtype=torch.float16).to("cuda")

g_cuda = None
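The colab then generates with this pipe; a minimal version of that call (prompt, seed and sampler settings here are illustrative, not the colab's exact defaults):

prompt = "photo of johnslegers"  # illustrative prompt
g_cuda = torch.Generator(device="cuda").manual_seed(52362)  # fixed seed so runs are comparable

with autocast("cuda"), torch.inference_mode():
    images = pipe(prompt, num_inference_steps=50, guidance_scale=7.5, generator=g_cuda).images

for img in images:
    display(img)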

@jslegers (Author)

So you're replacing the scheduler with a different scheduler AFTER the training is completed?

Shouldn't this scheduler be part of the model generated by the script? Shouldn't it be integrated into the CKPT?

It makes no sense to me to have to load a different scheduler just to get the model to work normally. This should be part of the model...

@ShivamShrirao (Owner)

@jslegers Schedulers are basically algorithms that specify how to produce the noise for the diffusion process. They aren't part of the network and they aren't trained. The idea is to be able to use any scheduler with any diffusion model; that's why you also see many scheduler/sampler options in AUTOMATIC1111's webui. The ckpt only contains the weights (matrices), not code. The scheduler can't be part of it, since it isn't weights.
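As an illustration of that decoupling, here is a minimal sketch (the model id is just an example) of swapping schedulers on an already loaded pipeline:

from diffusers import StableDiffusionPipeline, DDIMScheduler

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

# Same weights, different sampling algorithm: only the scheduler object changes.
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)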

@jslegers (Author) commented Oct 23, 2022

Every SD model has a file named scheduler_config.json in the scheduler folder that contains the configuration of the scheduler that will be used for that model.

For example, the 1.5 release has a scheduler_config.json file with the following content:

{
  "_class_name": "PNDMScheduler",
  "_diffusers_version": "0.6.0",
  "beta_end": 0.012,
  "beta_schedule": "scaled_linear",
  "beta_start": 0.00085,
  "num_train_timesteps": 1000,
  "set_alpha_to_one": false,
  "skip_prk_steps": true,
  "steps_offset": 1,
  "trained_betas": null
}

If, to get my new model to behave nicely, I should use a DDIMScheduler with parameters beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", clip_sample=False and set_alpha_to_one=False, then the model's scheduler_config.json should contain the following content:

{
  "_class_name": "DDIMScheduler",
  "_diffusers_version": "0.6.0",
  "beta_start": 0.00085,
  "beta_end": 0.012,
  "beta_schedule": "scaled_linear",
  "clip_sample": false,
  "set_alpha_to_one": false
}

After that, you can just load the model without having to load the scheduler separately...
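If the config is patched that way, loading does pick the scheduler up automatically; a minimal sketch (assuming model_path points at the diffusers-format folder):

import torch
from diffusers import StableDiffusionPipeline

# No explicit scheduler argument: from_pretrained reads scheduler/scheduler_config.json
# and instantiates whatever _class_name it declares (DDIMScheduler here).
pipe = StableDiffusionPipeline.from_pretrained(model_path, torch_dtype=torch.float16).to("cuda")
print(pipe.scheduler.__class__.__name__)  # -> DDIMScheduler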

@ShivamShrirao (Owner) commented Oct 23, 2022

That is part of the diffusers-format weights. diffusers has always had PNDMScheduler as the default. I have already changed the saved scheduler for diffusers, so it has been saving with the DDIM scheduler, but the ckpt, which is then used for inference in other webuis like AUTOMATIC1111's, can't contain the scheduler. The CKPT is a different format than the diffusers save format you are seeing here.

The inference code in the new colab is written so that even if someone is using older weights, it updates them to use the correct scheduler.

@jslegers (Author) commented Oct 24, 2022

If the scheduler params can't be baked into the CKPT, I prefer not to use the CKPT at all. I never did with my Dreambooth-generated models anyway.

And this is also one of several reasons why I prefer my own custom variation of your script. I don't want to be forced to use the CKPT file if I don't want to. And I want to choose my own "concept" name (I've been using johnslegers so far, which should be rare enough and a better option than sks, which is a gun).

Anyway, this is the scheduler generated by my script:

{
  "_class_name": "PNDMScheduler",
  "_diffusers_version": "0.6.0",
  "beta_end": 0.012,
  "beta_schedule": "scaled_linear",
  "beta_start": 0.00085,
  "num_train_timesteps": 1000,
  "set_alpha_to_one": false,
  "skip_prk_steps": true,
  "steps_offset": 1,
  "trained_betas": null
}

Why does it still say PNDMScheduler here rather than DDIMScheduler?

@ShivamShrirao (Owner) commented Oct 24, 2022

@jslegers Your code must be old. As I have been saying, please just try my new colab first and then compare.

And you should use the ckpt with the webuis. They provide so many more options: the webuis let you choose among many different schedulers and also ship good defaults. If you had used the ckpt with a webui, you likely wouldn't have gotten the overcooking issue either. I suggest you explore more and understand what these mean.

Btw your concept name johnslegers isn't rare, because it first gets divided into tokens of common words, so kinda like john-sleg-ers or something. While sks is just a single token, the model sees your name as 3-4 different tokens. You should read about Byte Pair Encoding. And yeah, sks isn't a good rare token either; good rare tokens are usually short, at most 3 characters in a totally random combination that can't be broken down into more common words.
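You can check the split yourself; a minimal sketch with the CLIP tokenizer SD uses (the exact pieces may differ from the john-sleg-ers guess above):

from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

# A name like this gets broken into several BPE subword tokens...
print(tokenizer.tokenize("johnslegers"))
# ...while a short random string like "sks" maps to a single token.
print(tokenizer.tokenize("sks"))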

@jslegers (Author)

Your code must be old.

I used code from yesterday. I see it's been updated since, so I'll give it another try today.

Anyway, I just replaced the default scheduler with the scheduler you proposed and checked it on the model that provided the best output.

Johnny Depp looks even more overcooked with your scheduler :

[3 images]

Btw your concept name johnslegers isn't rare cause it first gets divided into tokens of common words so kinda like john-sleg-ers or something.

I see...

I guess this would explain the impact it has on Johnny Depp.

Hmmmm...

Will have to try a few other identifiers...

Will come back to this...

@ShivamShrirao (Owner)

@jslegers I have nothing more to say until you try out the new colab with its defaults.

@jslegers (Author) commented Oct 24, 2022

I made the following changes to my Colab notebook:

  • I used the latest version of your repo to run the training script from
  • I added --pretrained_vae_name_or_path="stabilityai/sd-vae-ft-mse" to the parameters
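For reference, with that flag added the training invocation looks roughly like this (paths, prompts and the remaining flags are a plausible minimal sketch of the script's usual options, not my exact configuration):

!accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --pretrained_vae_name_or_path="stabilityai/sd-vae-ft-mse" \
  --instance_data_dir="/content/data/johnslegers" \
  --class_data_dir="/content/data/man" \
  --instance_prompt="photo of johnslegers man" \
  --class_prompt="photo of a man" \
  --with_prior_preservation \
  --output_dir="/content/models/johnslegers" \
  --max_train_steps=1100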

Not sure why, but this seems to fix the issue. My Johnny Depp now looks perfectly crisp after 1100 learning steps.

I tried using both js and johnslegers as an identifier, but that didn't make any difference. The identifier johnslegers seems to work just as well as js... No need to switch schedulers either.

The only issue I've had with this is that the output dir for the model is now e.g. johnslegers-1-5-0.16_1100 instead of johnslegers-1-5-0.16 as it should be, which is pretty annoying. But that's something that can easily be fixed...

@ShivamShrirao (Owner)

output_dir contains the number at the end to indicate the number of training steps, as the script now also supports saving models at different step intervals.

@jslegers (Author) commented Oct 24, 2022

output_dir contains the number at the end to indicate the number of training steps, as the script now also supports saving models at different step intervals.

If the user chooses /content/data/sks as the output dir, it doesn't make sense to put the output in e.g. /content/data/sks_1000, especially if they are not interested in saving at different intervals.

Why not use sub-directories? So, for example, if you want to save after 1000 and 2000 steps, you get subdirectories 1000 and 2000 of the directory /content/data/sks. And if you do NOT want to save at different intervals, you could skip subdirectories and just save in the main directory...
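A minimal sketch of what I mean (function and argument names are illustrative, not the script's actual code):

import os

def save_dir_for(output_dir, global_step, save_interval=None):
    # No interval saving: write straight into the user's chosen directory.
    if save_interval is None:
        return output_dir
    # Interval saving: one numbered subdirectory per checkpoint,
    # e.g. /content/data/sks/1000, /content/data/sks/2000, ...
    return os.path.join(output_dir, str(global_step))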
