Why is it so difficult to train a good model with v2.0 compared to v1.5? #442
-
I am getting pretty bad results as well; not sure SD 2.0-based models are actually supported?
-
I'm having some trouble as well, but I'm not sure if it's a "me" thing or the model, because honestly, with all the conflicting info and implementations, I wasn't having the easiest time getting anything good out of 1.5 either. My best run was my first, which produced a decent model of myself in 3000 steps from 10-20 images, but it wasn't particularly versatile. Since then I haven't been able to replicate it with a larger number of images, different repos, settings, etc. With respect to the class images, it might be better to run txt2img yourself and just point sd_dreambooth to the directory you saved them into (rough sketch below). I'm trying out the LoRA approach and I'm getting closer than I was previously, using a folder of 1000 "a man" generations with no manual cleaning.
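Roughly what I mean by generating the class images yourself, as an untested sketch with diffusers (the model id, prompt, output path, and image count are just placeholders for whatever you're actually training against):

```python
# Sketch: pre-generate class/regularization images with plain txt2img,
# then point the trainer's class-image directory at this folder.
from pathlib import Path

import torch
from diffusers import StableDiffusionPipeline

out_dir = Path("class_images/man")          # placeholder output folder
out_dir.mkdir(parents=True, exist_ok=True)

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",       # placeholder: use the base model you train from
    torch_dtype=torch.float16,
).to("cuda")

for i in range(1000):                       # roughly the number of class images you want
    image = pipe("a man", num_inference_steps=30).images[0]
    image.save(out_dir / f"{i:04d}.png")
```

Then you can point the class-image directory in the training settings at that folder instead of letting the trainer generate the images itself.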
-
I've been successfully training v1.5 models with 46 images of my face and the prompt "a photo of XYZ person", at a learning rate of 5e-7 over 16,000 steps, with prior preservation in the class "person" and 1500 class images ("a photo of person"), and the results of that training were amazing! I expected very similar results in v2.0.
But with v2.0 it seems so much more difficult. Firstly, the automatically generated class images for "a photo of person" are almost entirely black and white (why?), and secondly, even with the same settings as above, the trained model of my face only looks vaguely like me after 16,000 steps. Is it really so much harder to train in SD v2?
I tried a second time with a class-image batch generated from the prompt "a color photo of person", and I deleted all of the really odd images, but those training results were as bad, if not worse.
And finally I tried training without prior preservation, and that turned out even worse.
Is there something obvious that I missed that should be done differently when training in v2.0, besides the new size of 768 instead of 512?
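On the black-and-white class images: one thing I want to rule out is that they were sampled from the 768-v checkpoint at the wrong resolution or prediction type. A minimal sanity check with the plain diffusers SD 2 pipeline rather than the extension's own generator (assuming the stabilityai/stable-diffusion-2 weights; the scheduler config there carries the v_prediction setting):

```python
# Sanity check: generate one class image from the 768-v model at its native
# 768x768 resolution; the pipeline's scheduler config carries prediction_type.
import torch
from diffusers import DDIMScheduler, StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2",       # assumed: the 768-v checkpoint
    torch_dtype=torch.float16,
)
# Swap in DDIM while keeping the model's own scheduler config (incl. v_prediction).
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

image = pipe("a color photo of person", height=768, width=768,
             num_inference_steps=30).images[0]
image.save("class_check.png")
```

If that produces normal color photos, the problem is in how the class images are being generated during training rather than in the model itself.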
My settings:
For 2.0 I created a model based on 768-v-ema.ckpt and the DDIM scheduler; for 1.5 I used v1-5-pruned.ckpt.
My parameters in both 1.5 and 2.0: 16,000 steps, 11 epochs, 8-bit Adam, fp16, memory attention: xformers, "don't cache latents" unchecked, "train text encoder" checked, "train EMA" checked, pad tokens 75, the rest at defaults.
My GPU is an RTX 3090.
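For reference, roughly the same setup expressed as flags for the diffusers DreamBooth example script rather than the extension's UI; the flag names are from diffusers' train_dreambooth.py as I understand it, and the paths/model id are placeholders, so treat it as a sketch of the configuration rather than my exact command:

```python
# Sketch: the hyperparameters above mapped onto the diffusers DreamBooth example
# script, launched through accelerate. Paths and the model id are placeholders.
import subprocess

cmd = [
    "accelerate", "launch", "train_dreambooth.py",
    "--pretrained_model_name_or_path", "stabilityai/stable-diffusion-2",  # 768-v based model
    "--instance_data_dir", "./instance_images",      # the 46 face photos
    "--instance_prompt", "a photo of XYZ person",
    "--with_prior_preservation",
    "--prior_loss_weight", "1.0",
    "--class_data_dir", "./class_images",
    "--class_prompt", "a photo of person",
    "--num_class_images", "1500",
    "--resolution", "768",                           # 512 for the v1.5 runs
    "--train_text_encoder",
    "--use_8bit_adam",
    "--mixed_precision", "fp16",
    "--enable_xformers_memory_efficient_attention",
    "--learning_rate", "5e-7",
    "--max_train_steps", "16000",
    "--output_dir", "./dreambooth-sd2-xyz",
]
subprocess.run(cmd, check=True)
```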