Ideal settings for face PLUS body training on current version #1173

Open
User1231300 opened this issue Dec 27, 2022 · 27 comments

Comments

@User1231300

User1231300 commented Dec 27, 2022

Hello,

First of all, thank you for what you are doing for free. This topic is not meant to be a complaint, and I hope that is clear.

I would like to open a thread and hopefully get good answers from the community and from TheLastBen about the ideal settings for training the model on a single subject with the best possible results:

  • Assuming we have access to unlimited images and unlimited time

  • Wanting to train on both face-only shots and face-plus-body shots of the subject. This can be achieved by including various angles and distances from the subject (including more or less of the body in different images).

Thank you to everyone that will contribute.

@iqddd

iqddd commented Dec 28, 2022

I have the same question. My images fall into three groups: face only, body only, and body and face. In most "body and face" cases, the body is cropped due to the 512px square limit. In rare cases, part of the face is cropped.
In most of the images, the "body" is "standing upright".
But some of the images can be described as "lying on" or "posing".
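
One way to keep the face inside the square before resizing to 512px is to choose the crop box yourself. The sketch below is only an illustration: `square_crop_box` is a hypothetical helper, not part of the notebook, and it assumes you already know roughly where the face is in pixel coordinates. The returned tuple uses the (left, top, right, bottom) order that Pillow's `Image.crop` also expects.

```python
def square_crop_box(width, height, center_x, center_y):
    """Return (left, top, right, bottom) for the largest square crop
    that fits in the image, placed as close as possible to the
    requested center point (for example, the middle of the face)."""
    side = min(width, height)
    # Ideal top-left corner if the square were centered on the point.
    left = center_x - side // 2
    top = center_y - side // 2
    # Clamp so the square stays inside the image bounds.
    left = max(0, min(left, width - side))
    top = max(0, min(top, height - side))
    return (left, top, left + side, top + side)


# A 768x1152 portrait photo with the face near the top of the frame.
print(square_crop_box(768, 1152, 384, 250))
```

For a tall photo this keeps the head and upper body and drops the legs, which matches the "body is cropped" case described above; a face near the edge is pulled back into the frame by the clamping step.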

@TheLastBen
Owner

TheLastBen commented Dec 28, 2022

The ideal setting is to stay below 15 instance images and make sure they are diverse. You can reach results in fewer than 800 steps, which takes less than 15 minutes, so you can comfortably try different settings until you get the desired result.

when you want to resume training, try reducing the learning rate slightly to concentrate on the small details of the picture.
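
As a rough sketch of the "reduce the learning rate slightly when resuming" advice: the comment does not say by how much, so the 0.75 factor below is purely an assumption, and `resumed_learning_rate` is a hypothetical helper rather than anything in the notebook.

```python
def resumed_learning_rate(previous_lr, factor=0.75):
    """Learning rate to try when resuming training.

    `factor` is an assumed value; the thread only says "slightly".
    """
    if not 0 < factor <= 1:
        raise ValueError("factor must be in the range (0, 1]")
    return previous_lr * factor


# Example: the first session used 2e-6, the resumed session uses less.
print(resumed_learning_rate(2e-6))
```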

@tpcdaz

tpcdaz commented Dec 28, 2022

I personally tried the new settings, as I only use DreamBooth for faces, and the only way I get good results is with the previous settings: 3000 steps for around 20+ photos, with a learning rate of 2e-6 for both the UNet and the text encoder. Training with 10, 15, or 20 images at 800 steps or fewer gives me questionable results, and although people say "just keep adding to the training", not many people can do that because Colab has limits. Even though I am a premium user, the 15-minute training ends up taking an hour because of all the minor tweaks needed to get it looking any good. So if I were you, I would use 3000 steps for 20+ images and a 2e-6 learning rate for both the UNet and the text encoder. It takes around 45 minutes on a standard Colab GPU or 20 minutes on a premium Colab GPU, and it will look GREAT the first time with no tweaking.
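
To compare the two approaches on time, the timings quoted above can be scaled to other step counts. This is a back-of-the-envelope sketch that assumes training time grows linearly with steps and covers training only (no setup, download, or export time); `estimate_training_minutes` is a hypothetical helper.

```python
# Reference timings quoted in the comment above: 3000 steps took about
# 45 minutes on a standard Colab GPU and about 20 on a premium one.
REFERENCE_STEPS = 3000
REFERENCE_MINUTES = {"standard": 45, "premium": 20}


def estimate_training_minutes(steps, tier="standard"):
    """Scale the quoted timing linearly with the number of steps."""
    return steps * REFERENCE_MINUTES[tier] / REFERENCE_STEPS


for steps in (800, 3000):
    for tier in ("standard", "premium"):
        minutes = estimate_training_minutes(steps, tier)
        print(f"{steps} steps on {tier}: about {minutes:.1f} min")
```

Under these assumptions, 800 steps on a standard GPU comes out at 12 minutes, which is consistent with the "less than 15 minutes" figure mentioned earlier in the thread.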

@juan9999

Yeah, I had less than ideal results with the latest settings and have had to lower the learning rate.

I have premium Colab. Are you quoting total session time or just training time? I am cheap and am trying to calculate the total time for the job with a premium GPU vs. without one.

@kozka

kozka commented Dec 30, 2022

The fast_DreamBooth-Old-Method always worked for me the first time, and I haven't gotten that quality in my models since. :(
Now I have to try a thousand times to get something similar, and it still doesn't come out.

@TheLastBen
Owner

@kozka set the learning rate to 2e-6 for both unet and text_enc and up the unet steps to 3000, this is exactly like before.

@LIQUIDMIND111

LIQUIDMIND111 commented Dec 31, 2022

Since they removed the OLD method, NONE of my face results are favorable; only styles work.

I have no problems with styles, but for faces I have paid for 2 months of Google PRO and NEVER had a good ckpt file no matter what I do, and I have been using this since October.

ALL WAS GOOD with PRIOR images and the old method. Then, after the renaming of INSTANCE IMAGES was introduced, everything was OK if you followed the instructions. But now that the OLD page has been removed, this NEW page only works for me for styles, with NICE quality.

But for the past 2 weeks, all the models that I make of a person look ugly, and it is very hard to get the settings right.

@TheLastBen
Owner

TheLastBen commented Dec 31, 2022

send me 10 of your instance images and I will train the model for you to prove that it works

@Ekaitza1985

Hello,
@TheLastBen I tried 3000 steps, 2e-6 on both, and 450 text encoder steps, and I get amazing results with my face. But if I write, for example, "a beautiful portrait of will smith", it looks like Will Smith but with my complexion and some of my features. Another problem I found: if I write a prompt asking for an "illustration" or "digital painting", the model ignores it and produces a realistic photograph, and I don't know why. On the same installation, without even restarting Stable Diffusion, I tried the SD 2.1 768px model and got a drawing as I asked for. I can train the model again if you want (I can share it via my Google Drive). It would be nice to find my error. I spent 1 week trying by myself without any results before asking here, and I don't know where I am going wrong.
Thank you in advance for your time and your work here.

@TheLastBen
Owner

Mixing your face with other faces is a common issue with deep learning models called overfitting.

If you want your face to be stylized as a painting, you need to reduce the text encoder steps to 250 and its learning rate to 1e-6.

@Ekaitza1985

@TheLastBen thank you for the tip. I will do it now and report the results back to you.

@Ekaitza1985

Hello again @TheLastBen,
With your suggestions (3000 steps at a 2e-6 learning rate, and 250 text encoder steps at 1e-6) I get better results for "portrait of will smith" and "portrait of my_token", but now,
if I prompt:
Will Smith, d & d, fantasy, intricate, elegant, highly detailed, digital painting, artstation, concept art, matte, sharp focus, illustration, hearthstone, art by artgerm and greg rutkowski and alphonse mucha, 8k.
With negative:
deformed, cripple, ugly, additional arms, additional legs, additional head, two heads, multiple people, group of people
Euler A @ 50 steps, 768px
I get amazing results!!
And if I prompt the same but with my token:
28101985KylarKyray, d & d, fantasy, intricate, elegant, highly detailed, digital painting, artstation, concept art, matte, sharp focus, illustration, hearthstone, art by artgerm and greg rutkowski and alphonse mucha, 8k.
And the same params,
I get awesome results, but of me as a woman.

It seems that the class "person" conflicts with the fact that I am male? Or is something wrong with the prompt? I can't understand why Will Smith is treated as a man by default while I need to describe my token as a man, and so on.

Thank you so much for this help! i am sooo happy to have some light on this ^^!

@LIQUIDMIND111

Did you use the new CAPTION OPTION and the regularization images section too?

@LIQUIDMIND111

Screenshot 2023-01-01 124426

Is this caption section optional too? Are the regularization images like the CLASS images in the old notebook?

@LIQUIDMIND111

Also, how many INSTANCE images did you use for these new results?

@Ekaitza1985

Ekaitza1985 commented Jan 1, 2023

Hello @LIQUIDMIND111
These were my params:
28101985KylarKyray (22 pics, resized on my PC and not in Colab). As a starting point you could try num_photo*100 steps and then increase the training by 300 or 500 steps.

UNet_Training_Steps: 3000
UNet_Learning_Rate: 2e-6
Text_Encoder_Training_Steps: 250
Text_Encoder_Learning_Rate: 1e-6
External Cap OFF
Style Training OFF
RES 768
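
The parameter list and the num_photo*100 rule of thumb above can be written down as a small sketch. The values are copied from this comment, but the dictionary layout, the `Resolution` key, and the `step_schedule` helper are only illustrative assumptions, not the notebook's actual interface.

```python
# Values copied from the parameter list above; the dictionary itself is
# just an illustration, and "Resolution" is a made-up key for "RES 768".
settings = {
    "UNet_Training_Steps": 3000,
    "UNet_Learning_Rate": 2e-6,
    "Text_Encoder_Training_Steps": 250,
    "Text_Encoder_Learning_Rate": 1e-6,
    "Resolution": 768,
}


def step_schedule(num_photos, increment=300, rounds=3):
    """Start at num_photos * 100 steps, then add `increment` per retry."""
    start = num_photos * 100
    return [start + i * increment for i in range(rounds + 1)]


print(step_schedule(22))       # 22 photos, +300 steps per retry
print(step_schedule(22, 500))  # same start, +500 steps per retry
```

With 22 photos the schedule starts at 2200 steps, so the 3000 steps used here sits a few increments above the suggested starting point.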

@LIQUIDMIND111

thanks mate, i will try soon

@kozka

kozka commented Jan 5, 2023

I was testing many models and many configurations,
and I think that with 23 photos, 2300 UNet steps at 1e-5, and 1600 text encoder steps at 1e-6,
It has given me very good results the first time, with

@Ekaitza1985

Thank you @kozka, I will try your params and write here if I get good results too.

@Ekaitza1985

Hello again @kozka, with 1600 text encoder steps I get really bad results.

@kozka

kozka commented Jan 8, 2023

ok I guess the photos I used had something to do with it,

@Ekaitza1985

@kozka I really don't know... I am using the same photos that I used to create an SD 1.5 model successfully, but resized to 768. No params work well and I don't know why :S.

With your working model, could you do a test for me?

A beautiful portrait of Will Smith, award winning photography
negative: blurry, black and white, disfigured, malformed, kitch
Euler a, 30 steps

and tell me whether the generated picture is Will Smith but resembling your trained model, or 100% Will Smith?
That would be great.
Ty

@kozka

kozka commented Jan 8, 2023

I think the best thing is to use the normal 1.5 model to generate an image of Will Smith next to someone, and then put your face on the other person using your personal trained model with inpainting, or something like that. When I tried to train 2 models at the same time, sometimes the images came out well and many other times they didn't.
When you train your model only on images of 1 person and then try to generate Will Smith without having trained on him, it will mix the faces.
I think you could do one of these two things:
1) The easiest: use model 1.5 to generate an image of Will Smith next to someone, and then change that person's face with your trained model.
2) More difficult: train the model with photos of the person you are training and also photos of Will Smith.

*I have tried way 1: take a picture of Will Smith with someone and then use my model in inpainting to change only the face.
It didn't turn out very well, but it's the first thing that came out quickly.

00112-1407387918-genaro face, award winning photography

@Ekaitza1985

Ekaitza1985 commented Jan 8, 2023

I didn't consider that it will mix the faces.
So, @kozka, I'll change my question to: how can I know if my model is consistent? If I try just, for example, "a portrait of TOKEN", I get a photo that looks older than the subject really is, or really, really ugly; and if I add negative prompts or use "a beautiful portrait of TOKEN", I get an image much more beautiful than the subject. So I am confused on this point.

@kozka

kozka commented Jan 8, 2023

I'm not an expert.
When I create a model, the first thing I do is enter "photo token" and see what comes out, so that when I retrain it I can compare and see whether anything improves. If the "photo token" result is very bad and does not look like the trained subject, it is because there was too little training or the selected photos are not the best.
But if you put "a beautiful portrait of TOKEN" and it comes out more handsome, it is because you asked for it to be more beautiful... The important thing is that it resembles the trained subject. It will always make small variations that may not convince you: out of 100 photos, maybe only 50 will be very similar and only 10 will be remarkable, beautiful photos.
If you train a model a lot, you will only get selfies,
and if you overtrain a model, you will only get the same photos you used to train it.

@Ekaitza1985

Ekaitza1985 commented Jan 8, 2023

Hello again @kozka !!!
First of all, thank you for helping me with this last message!
I did some tests with the models that I created over the last few days.
With the LR that @LIQUIDMIND111 commented before, I did:
1500 UNet steps and 150 text encoder steps = I tried "a photo of token" with no negative prompts, and 0 of 10 photos look like me
2000 UNet steps and 150 text encoder steps = I tried "a photo of token" with no negative prompts, and 3, maybe 4, of 10 photos look like me
2200 UNet steps and 1600 text encoder steps = I tried "a photo of token" with no negative prompts, and 3, maybe 4, of 10 photos look like me

So if I am not wrong, I need some more UNet training steps to get a better and more accurate model, right?
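
The three test runs above can be tallied in a small table to make the comparison explicit. This is only a sketch: it assumes "text learning" means text encoder training steps, and it records "3 maybe 4" as 3, the lower bound.

```python
# (unet_steps, text_encoder_steps, images_resembling_subject, images_generated)
# "3 maybe 4" from the comment is recorded as 3, the lower bound.
runs = [
    (1500, 150, 0, 10),
    (2000, 150, 3, 10),
    (2200, 1600, 3, 10),
]


def likeness_rate(similar, total):
    """Fraction of generated images judged to resemble the subject."""
    return similar / total


rates = [likeness_rate(similar, total) for _, _, similar, total in runs]
for (unet, text, similar, total), rate in zip(runs, rates):
    print(f"unet={unet} text={text}: {similar}/{total} = {rate:.0%}")
```

With only 10 images per run the differences are coarse, so the tally mostly shows that 1500 UNet steps was too few; it cannot separate the last two runs.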

@kozka

kozka commented Jan 9, 2023

You are right.
Either add more steps, going 200 by 200 and checking whether it improves,
or change the input photos and crop them better: 3 of the face, 3 of the shoulders, 3 half-length, 1 full-length, more or less. It is not an exact science. But if the photos do not show the face well, or it is too far away, or the background is too chaotic, it may cost you more. The best are neutral backgrounds, such as a blank wall, without many things or people.
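
The suggested 3 / 3 / 3 / 1 mix can be scaled to other dataset sizes. The sketch below assumes the ratio should simply be kept proportional, which the comment does not state (it calls the mix approximate); `shot_plan` is a hypothetical helper.

```python
# Mix suggested in the comment above: 3 face, 3 shoulders,
# 3 half-length and 1 full-length shot per 10 images.
SHOT_WEIGHTS = {"face": 3, "shoulders": 3, "half-length": 3, "full-length": 1}


def shot_plan(total_images):
    """Split total_images across shot types in roughly a 3:3:3:1 ratio."""
    weight_sum = sum(SHOT_WEIGHTS.values())
    # Integer share for each shot type, rounded down.
    plan = {k: total_images * w // weight_sum for k, w in SHOT_WEIGHTS.items()}
    leftover = total_images - sum(plan.values())
    # Hand the remaining images to the largest fractional parts first.
    by_remainder = sorted(
        SHOT_WEIGHTS,
        key=lambda k: (total_images * SHOT_WEIGHTS[k]) % weight_sum,
        reverse=True,
    )
    for k in by_remainder[:leftover]:
        plan[k] += 1
    return plan


print(shot_plan(10))
print(shot_plan(15))
```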
