Add inpainting support to pivotal tuning #152
Conversation
Whoa, that's pretty cool. I'll have a look, thank you for this PR!
Hey @levi, can you show us some example outputs? PR looks good to me, but it would be cool if there were some examples.
Yeah, I’m currently in the process of hooking this up with my inference tool. Will have some examples in a few days.
Hey @levi, have you had success with these models?
I sent you an email a few days ago with some questions. Let me know. I'll have time this weekend to run some tests.
Yes, I checked the email, thanks. I just realized that the following params are better:

```bash
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export INSTANCE_DIR="./data/data_captioned"
export OUTPUT_DIR="./exps/krk_captioned_scale2"

lora_pti \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --train_text_encoder \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=2 \
  --gradient_checkpointing \
  --scale_lr \
  --learning_rate_unet=2e-4 \
  --learning_rate_text=1e-6 \
  --learning_rate_ti=5e-4 \
  --color_jitter \
  --lr_scheduler="linear" \
  --lr_warmup_steps=0 \
  --lr_scheduler_lora="constant" \
  --lr_warmup_steps_lora=100 \
  --placeholder_tokens="<s1>|<s2>" \
  --placeholder_token_at_data="<krk>|<s1><s2>" \
  --save_steps=100 \
  --max_train_steps_ti=700 \
  --max_train_steps_tuning=700 \
  --perform_inversion=True \
  --clip_ti_decay \
  --weight_decay_ti=0.000 \
  --weight_decay_lora=0.000 \
  --device="cuda:0" \
  --lora_rank=8 \
  --use_face_segmentation_condition \
  --lora_dropout_p=0.1 \
  --lora_scale=2.0
```
I added a notebook that follows the same inference prompts as the main inference notebook. The results are pretty bad right now. With LoRA scale 1.0, the output looks like it's undertrained, so I'm increasing the training steps to 3k and will report back with any changes. Are there any parameter adjustments you recommend?
@cloneofsimo OK, I updated the inference notebook to show some examples using the krk training params. The model looks good after 3000 training steps; full training params are in the inpainting_example.sh script. I wasn't able to get my Disney-style model to look good, so I ended up removing it from the PR. However, it's likely a parameter or two away from excellent results.
Thanks @levi! This looks awesome. Just speculation, but I think additionally generating inpainting masks over user-preferred areas might speed things up further, since here we really only want to train on the face regions. I'll test this out myself as well. LGTM!
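To make that idea concrete, here is a purely speculative sketch of biasing the random masks toward a preferred region such as a face segmentation mask. The function name and sampling scheme are invented for illustration; nothing below is code from this PR or repo.

```python
import torch

def region_biased_mask(region, n_rects=3, rect_frac=0.25, generator=None):
    """Sample rectangles centered inside a preferred region.

    region: (H, W) binary tensor marking the preferred area (e.g. a face
    segmentation mask). Returns a (H, W) float mask, 1 = region to inpaint.
    """
    h_img, w_img = region.shape
    mask = torch.zeros_like(region, dtype=torch.float32)
    ys, xs = torch.nonzero(region, as_tuple=True)
    if len(ys) == 0:
        return mask  # no preferred region found: fall back to an empty mask
    for _ in range(n_rects):
        # Pick a random pixel inside the region as the rectangle center.
        i = int(torch.randint(0, len(ys), (1,), generator=generator))
        cy, cx = int(ys[i]), int(xs[i])
        rh, rw = int(h_img * rect_frac), int(w_img * rect_frac)
        top = max(0, cy - rh // 2)
        left = max(0, cx - rw // 2)
        mask[top : top + rh, left : left + rw] = 1.0
    return mask
```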
Still playing around with the results of this, but wanted to put it up to get feedback on the implementation.
This mainly integrates ShivamShrirao/diffusers' approach to training DreamBooth with inpainting: it generates a random mask of cutout rectangles for each image in the training set and then combines it with the noisy latents during training.
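For reference, a minimal sketch of that cutout-rectangle masking step is below. Function names, size ranges, and the exact input layout are assumptions based on the general ShivamShrirao/diffusers inpainting recipe, not code lifted from this PR.

```python
import torch
import torch.nn.functional as F

def random_rectangle_mask(height, width, min_rects=1, max_rects=4, generator=None):
    """Random cutout-rectangle mask: 1 inside the rectangles (region to inpaint)."""
    mask = torch.zeros(1, height, width)
    num_rects = int(torch.randint(min_rects, max_rects + 1, (1,), generator=generator))
    for _ in range(num_rects):
        h = int(torch.randint(height // 8, height // 2, (1,), generator=generator))
        w = int(torch.randint(width // 8, width // 2, (1,), generator=generator))
        top = int(torch.randint(0, height - h, (1,), generator=generator))
        left = int(torch.randint(0, width - w, (1,), generator=generator))
        mask[:, top : top + h, left : left + w] = 1.0
    return mask

# During a training step (512x512 pixels -> 64x64 latents), roughly:
#   mask = random_rectangle_mask(512, 512)                      # (1, 512, 512)
#   masked_image = pixel_values * (1.0 - mask)                  # zero out masked area
#   masked_latents = vae.encode(masked_image).latent_dist.sample() * 0.18215
#   mask_lat = F.interpolate(mask.unsqueeze(0), size=(64, 64))  # (1, 1, 64, 64)
#   model_input = torch.cat([noisy_latents, mask_lat, masked_latents], dim=1)
# which yields the 9-channel input the SD inpainting UNet expects.
```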
I kept the mask example attribute naming distinct to prevent any clashes with masked score estimation. Potentially there is useful overlap here, like sharing a segmentation mask between inpainting training and masked score estimation.
The mask generation function is pretty simple; results are still TBD on my end. The SD2 inpainting model card describes using the random mask generation method defined by LAMA, a strategy that mixes chained polygons (clusters of rectangles) with wide rectangles of various sizes. It would be cool to implement this in a future PR. Source here: https://github.com/saic-mdal/lama/blob/358536640559121052e45f307982ee9969ae269b/saicinpainting/training/data/masks.py#L176
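If someone picks this up, a rough approximation of that LAMA-style strategy might look like the sketch below. All parameter ranges and names are illustrative guesses; the linked saicinpainting code is the authoritative version.

```python
import math

import cv2
import numpy as np

def lama_style_mask(height, width, max_chains=3, max_segments=6, max_rects=2, rng=None):
    """Irregular mask: chained thick line segments plus wide rectangles (1 = inpaint)."""
    rng = rng or np.random.default_rng()
    mask = np.zeros((height, width), dtype=np.float32)
    # Chained "brush stroke" polylines with random direction drift.
    for _ in range(int(rng.integers(1, max_chains + 1))):
        x, y = int(rng.integers(0, width)), int(rng.integers(0, height))
        angle = rng.uniform(0, 2 * math.pi)
        thickness = int(rng.integers(height // 32, height // 8))
        for _ in range(int(rng.integers(1, max_segments + 1))):
            angle += rng.uniform(-0.5, 0.5)  # drift so segments form a chain
            length = float(rng.integers(height // 16, height // 4))
            nx = int(np.clip(x + length * math.cos(angle), 0, width - 1))
            ny = int(np.clip(y + length * math.sin(angle), 0, height - 1))
            cv2.line(mask, (x, y), (nx, ny), 1.0, thickness)
            x, y = nx, ny
    # A few wide rectangles of various sizes.
    for _ in range(int(rng.integers(0, max_rects + 1))):
        h = int(rng.integers(height // 8, height // 3))
        w = int(rng.integers(width // 4, width // 2))
        top = int(rng.integers(0, height - h))
        left = int(rng.integers(0, width - w))
        mask[top : top + h, left : left + w] = 1.0
    return mask
```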