
Add inpainting support to pivotal tuning #152

Merged
merged 3 commits into cloneofsimo:develop from levi/inpaint on Feb 7, 2023

Conversation

@levi (Contributor) commented Jan 29, 2023

Still playing around with the results of this, but wanted to put it up to get feedback on the implementation.

Mainly, this integrates ShivamShrirao/diffusers' approach to training DreamBooth with inpainting: it generates a random mask of cutout rectangles for each image in the training set and then combines it with the noisy latents during training.
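
For anyone skimming, here is a minimal sketch of the idea (illustrative only: the names and exact shapes below are mine, not necessarily the ones in this PR):

```python
import torch
import torch.nn.functional as F

def random_rect_mask(height: int, width: int, num_rects: int = 3) -> torch.Tensor:
    """Return a (1, 1, H, W) mask with 1s inside randomly placed rectangles."""
    mask = torch.zeros(1, 1, height, width)
    for _ in range(num_rects):
        rect_h = int(torch.randint(height // 8, height // 2, (1,)))
        rect_w = int(torch.randint(width // 8, width // 2, (1,)))
        top = int(torch.randint(0, height - rect_h, (1,)))
        left = int(torch.randint(0, width - rect_w, (1,)))
        mask[..., top:top + rect_h, left:left + rect_w] = 1.0
    return mask

# During the training step, the mask is downscaled to latent resolution and
# concatenated with the noisy latents and the masked-image latents, matching
# the 4 + 1 + 4 = 9 input channels of the SD inpainting UNet:
mask = random_rect_mask(512, 512)
mask_latent = F.interpolate(mask, size=(64, 64))  # latent size = pixel size / 8
# model_input = torch.cat([noisy_latents, mask_latent, masked_image_latents], dim=1)
```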

I kept the naming of the mask example attribute distinct to avoid any conflicts with masked score estimation (MSE). There is potentially useful overlap here, such as sharing a segmentation mask between inpainting training and MSE.

The mask generation function is pretty simple; results are TBD on my end. The SD2 inpainting model card describes using the random mask generation strategy defined by LAMA: chained polygons (clusters of rectangles) and wide rectangles of various sizes. It would be cool to implement this in a future PR. Source here: https://github.com/saic-mdal/lama/blob/358536640559121052e45f307982ee9969ae269b/saicinpainting/training/data/masks.py#L176
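
For reference, a rough sketch of what that LAMA-style strategy could look like (parameter names and ranges here are illustrative, not LAMA's actual config):

```python
import numpy as np
import cv2

def lama_style_mask(height, width, max_strokes=4, max_vertices=8,
                    max_thickness=60, max_rects=2):
    """Irregular mask: thick random polylines ("chained polygons") plus wide rectangles."""
    mask = np.zeros((height, width), dtype=np.float32)
    # Chained polygons: a random walk rendered as thick line segments.
    for _ in range(np.random.randint(1, max_strokes + 1)):
        x, y = int(np.random.randint(0, width)), int(np.random.randint(0, height))
        thickness = int(np.random.randint(10, max_thickness))
        for _ in range(np.random.randint(1, max_vertices + 1)):
            angle = np.random.uniform(0, 2 * np.pi)
            length = np.random.randint(20, max(width, height) // 4)
            nx = int(np.clip(x + length * np.cos(angle), 0, width - 1))
            ny = int(np.clip(y + length * np.sin(angle), 0, height - 1))
            cv2.line(mask, (x, y), (nx, ny), color=1.0, thickness=thickness)
            x, y = nx, ny
    # Wide rectangles of various sizes.
    for _ in range(np.random.randint(0, max_rects + 1)):
        rect_w = np.random.randint(width // 4, width // 2)
        rect_h = np.random.randint(height // 8, height // 4)
        top = np.random.randint(0, height - rect_h)
        left = np.random.randint(0, width - rect_w)
        mask[top:top + rect_h, left:left + rect_w] = 1.0
    return mask
```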

@levi force-pushed the levi/inpaint branch 2 times, most recently from 9ce3f86 to 5eb9880 on January 29, 2023 20:14
@cloneofsimo (Owner) commented

Whoa, that's pretty cool. I'll have a look, thank you for this PR!

@cloneofsimo changed the base branch from master to develop on January 30, 2023 02:20
@cloneofsimo (Owner) commented

Hey @levi, can you show us some example outputs? The PR looks good to me, but it would be cool if there were some examples.

@levi (Contributor, Author) commented Jan 31, 2023

Yeah I’m currently in the process of hooking this up with my inference tool. Will have some examples in a few days.

@cloneofsimo (Owner) commented

Hey @levi, have you had success with these models?

@levi (Contributor, Author) commented Feb 4, 2023

> Hey @levi, have you had success with these models?

I sent you an email a few days ago with some questions. Let me know. I'll have time this weekend to run some tests.

@cloneofsimo (Owner) commented Feb 4, 2023

Yes, I checked the email, thanks. I just realized that the following params are better. Note that these params weren't used to make the images in the README; those were made with the default params in the example scripts folder. The ones below are what I've used to make images recently.

```bash
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export INSTANCE_DIR="./data/data_captioned"
export OUTPUT_DIR="./exps/krk_captioned_scale2"

lora_pti \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --train_text_encoder \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=2 \
  --gradient_checkpointing \
  --scale_lr \
  --learning_rate_unet=2e-4 \
  --learning_rate_text=1e-6 \
  --learning_rate_ti=5e-4 \
  --color_jitter \
  --lr_scheduler="linear" \
  --lr_warmup_steps=0 \
  --lr_scheduler_lora="constant" \
  --lr_warmup_steps_lora=100 \
  --placeholder_tokens="<s1>|<s2>" \
  --placeholder_token_at_data="<krk>|<s1><s2>" \
  --save_steps=100 \
  --max_train_steps_ti=700 \
  --max_train_steps_tuning=700 \
  --perform_inversion=True \
  --clip_ti_decay \
  --weight_decay_ti=0.000 \
  --weight_decay_lora=0.000 \
  --device="cuda:0" \
  --lora_rank=8 \
  --use_face_segmentation_condition \
  --lora_dropout_p=0.1 \
  --lora_scale=2.0
```

@levi (Contributor, Author) commented Feb 5, 2023

I added a notebook that follows the same inference prompts as the main inference notebook. The results are pretty bad right now.

lora_pti_inpainting_example

With LoRA scale 1.0, the output looks like it's undertrained, so I'm increasing the training steps to 3k and will report back with any changes. Are there any parameter adjustments you recommend?

[image]

@levi (Contributor, Author) commented Feb 6, 2023

Looking a lot better after 3k steps!

[image]

@levi (Contributor, Author) commented Feb 6, 2023

Wow, I'm surprisingly impressed with a scale of 1.0.

[image]

@levi (Contributor, Author) commented Feb 6, 2023

@cloneofsimo ok, I updated the inference notebook to show some examples using the krk training params. The model looks good after 3000 training steps; full training params are in the inpainting_example.sh script. I wasn't able to get my Disney-style model to look good, so I ended up removing it from the PR. However, it's likely a parameter or two away from excellent results.

@cloneofsimo (Owner) commented

Thanks @levi! This looks awesome. Just speculation, but I think generating inpainting masks biased toward a user-preferred area might speed things up further, since here we really only want the extra training on the face regions. I'll test this out myself as well. LGTM!
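
A hypothetical sketch of that idea (an illustration, not code from this PR): center the random cutouts on pixels sampled from a user-supplied region mask, e.g. the face segmentation mask already produced for --use_face_segmentation_condition:

```python
import torch

def region_biased_rect_mask(region_mask: torch.Tensor, num_rects: int = 3) -> torch.Tensor:
    """region_mask: (H, W) tensor marking the preferred area (e.g. a face) with nonzeros."""
    height, width = region_mask.shape
    ys, xs = torch.nonzero(region_mask > 0, as_tuple=True)
    assert len(ys) > 0, "region mask is empty"
    mask = torch.zeros(1, 1, height, width)
    for _ in range(num_rects):
        # Center each rectangle on a random pixel inside the preferred region.
        idx = int(torch.randint(0, len(ys), (1,)))
        cy, cx = int(ys[idx]), int(xs[idx])
        rect_h = int(torch.randint(height // 8, height // 4, (1,)))
        rect_w = int(torch.randint(width // 8, width // 4, (1,)))
        top = max(0, cy - rect_h // 2)
        left = max(0, cx - rect_w // 2)
        mask[..., top:top + rect_h, left:left + rect_w] = 1.0
    return mask
```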

@cloneofsimo merged commit 6f82996 into cloneofsimo:develop on Feb 7, 2023