Why is it so difficult to train a good model with v2.0 compared to v1.5? #442
-
I am getting pretty bad results as well; not sure SD 2.0-based models are actually supported?
-
I'm having some trouble as well, but I'm not sure if it's a "me" thing or the model, because honestly, with all the conflicting info and implementations, I wasn't having the easiest time getting anything good out of 1.5 either. My best run was my first, which produced a decent model of myself in 3000 steps from 10-20 images, but it wasn't particularly versatile. Since then I haven't been able to replicate it with a larger number of images, different repos, settings, etc. With respect to the class images, it might be better to run txt2img yourself and just point sd_dreambooth to the directory you saved them into (rough sketch below). I'm trying out the LoRA approach and I'm getting closer than I was previously, using a folder of 1000 "a man" generations with no manual cleaning.
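Roughly what I mean by generating the class images yourself, as an untested sketch with diffusers (the model id, prompt, output path, and image count are just placeholders for whatever you're actually training against):

```python
# Sketch: pre-generate class/regularization images with plain txt2img,
# then point the trainer's class-image directory at this folder.
from pathlib import Path

import torch
from diffusers import StableDiffusionPipeline

out_dir = Path("class_images/man")          # placeholder output folder
out_dir.mkdir(parents=True, exist_ok=True)

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",       # placeholder: use the base model you train from
    torch_dtype=torch.float16,
).to("cuda")

for i in range(1000):                       # roughly the number of class images you want
    image = pipe("a man", num_inference_steps=30).images[0]
    image.save(out_dir / f"{i:04d}.png")
```

Then you can point the class-image directory in the training settings at that folder instead of letting the trainer generate the images itself.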
-
I've been successfully training v1.5 models with 46 images of my face and the prompt "a photo of XYZ person", at a learning rate of 5e-7 over 16,000 steps, with prior preservation in the class "person" and 1500 class images ("a photo of person"), and the results of that training were amazing! I expected very similar results in v2.0.
But with v2.0 it seems so much more difficult. Firstly, the automatically generated class images for "a photo of person" are almost entirely black and white (why?), and secondly, even with the same settings as above, the trained model of my face only looks vaguely like me after 16,000 steps. Is it really so much harder to train in SD v2?
I tried a second time with a class-image batch generated from the prompt "a color photo of person", and I deleted all of the really odd images, but those training results were as bad, if not worse.
And finally I tried training without prior preservation, and that turned out even worse.
Is there something obvious that I missed that should be done differently when training in v2.0, besides the new size of 768 instead of 512?
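On the black-and-white class images: one thing I want to rule out is that they were sampled from the 768-v checkpoint at the wrong resolution or prediction type. A minimal sanity check with the plain diffusers SD 2 pipeline rather than the extension's own generator (assuming the stabilityai/stable-diffusion-2 weights; the scheduler config there carries the v_prediction setting):

```python
# Sanity check: generate one class image from the 768-v model at its native
# 768x768 resolution; the pipeline's scheduler config carries prediction_type.
import torch
from diffusers import DDIMScheduler, StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2",       # assumed: the 768-v checkpoint
    torch_dtype=torch.float16,
)
# Swap in DDIM while keeping the model's own scheduler config (incl. v_prediction).
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

image = pipe("a color photo of person", height=768, width=768,
             num_inference_steps=30).images[0]
image.save("class_check.png")
```

If that produces normal color photos, the problem is in how the class images are being generated during training rather than in the model itself.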
My settings:
For 2.0 I created a model based on 768-v-ema.ckpt and the DDIM scheduler; for 1.5 I used v1-5-pruned.ckpt.
My parameters in both 1.5 and 2.0: 16,000 steps, 11 epochs, 8-bit Adam, fp16, memory attention: xformers, "don't cache latents" unchecked, "train text encoder" checked, "train EMA" checked, pad tokens 75, the rest at defaults.
My GPU is an RTX 3090.
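For reference, roughly the same setup expressed as flags for the diffusers DreamBooth example script rather than the extension's UI; the flag names are from diffusers' train_dreambooth.py as I understand it, and the paths/model id are placeholders, so treat it as a sketch of the configuration rather than my exact command:

```python
# Sketch: the hyperparameters above mapped onto the diffusers DreamBooth example
# script, launched through accelerate. Paths and the model id are placeholders.
import subprocess

cmd = [
    "accelerate", "launch", "train_dreambooth.py",
    "--pretrained_model_name_or_path", "stabilityai/stable-diffusion-2",  # 768-v based model
    "--instance_data_dir", "./instance_images",      # the 46 face photos
    "--instance_prompt", "a photo of XYZ person",
    "--with_prior_preservation",
    "--prior_loss_weight", "1.0",
    "--class_data_dir", "./class_images",
    "--class_prompt", "a photo of person",
    "--num_class_images", "1500",
    "--resolution", "768",                           # 512 for the v1.5 runs
    "--train_text_encoder",
    "--use_8bit_adam",
    "--mixed_precision", "fp16",
    "--enable_xformers_memory_efficient_attention",
    "--learning_rate", "5e-7",
    "--max_train_steps", "16000",
    "--output_dir", "./dreambooth-sd2-xyz",
]
subprocess.run(cmd, check=True)
```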