
Scale of KD-feature loss for SD inpainting 1.5 #21

Closed
Bikesuffer opened this issue Aug 21, 2023 · 11 comments

@Bikesuffer

Bikesuffer commented Aug 21, 2023

Hi there,

I am trying to distill the UNet in SD-inpainting 1.5 into a smaller UNet using your code (I replaced the pipeline with the inpainting one and changed the input data accordingly).
I have trained for 130K steps with batch size 64.
Right now the kd_feat_loss is around 20.

I am wondering what kd_feat_loss you had when you finished distilling the UNet in your experiments?

Thank you.

@bokyeong1015
Member

bokyeong1015 commented Aug 22, 2023

Hi, thanks for utilizing our work, glad to know that 😊
Although we haven't attempted inpainting experiments, we hope the following information can be helpful.


Here is a loss curve from our code for text-to-image synthesis, with SD-v1.4 and batch size 64 (= gradient accumulation 4 x mini batch size 16), plotted with 500-point moving average:

*[Image: loss_curve_batchsz64_230822 — loss curves for the KD feature loss, KD output loss, and SD task loss]*

  • The scale of KD feature loss ≫ The scale of KD output loss and SD task loss
    • As we described in our paper, we didn’t try hyperparameter tuning for loss weights, but it empirically worked well in our experiments.
  • Losses are not directly correlated with the final generation scores (FID/IS/CLIP score), especially in later iterations. In other words, lower losses did not necessarily result in better generation scores.
  • If you want to verify the learning process, we suggest examining the final metrics and/or visual examples. Nevertheless, the losses should decrease during initial iterations.
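For readers unfamiliar with the three terms discussed above, here is a minimal sketch of how a BK-SDM-style distillation objective combines them. The function name, the loss weights, and the feature-list interface are illustrative assumptions, not the repository's actual API; the key point is that the feature term is summed over several matched intermediate activations, which is why its scale dominates the other two:

```python
import torch
import torch.nn.functional as F

def distillation_loss(noise_pred_s, noise_pred_t, noise_gt,
                      feats_s, feats_t,
                      w_task=1.0, w_out=1.0, w_feat=1.0):
    """Illustrative BK-SDM-style objective (names/weights are assumptions).

    noise_pred_s / noise_pred_t: student / teacher UNet noise predictions
    noise_gt: ground-truth noise added during diffusion training
    feats_s / feats_t: matched intermediate features at the KD anchor points
    """
    # SD task loss: student prediction vs. ground-truth noise
    task = F.mse_loss(noise_pred_s, noise_gt)
    # KD output loss: student prediction vs. teacher prediction
    kd_out = F.mse_loss(noise_pred_s, noise_pred_t)
    # KD feature loss: summed over all matched anchor-point features,
    # so its magnitude naturally exceeds the single-term losses above
    kd_feat = sum(F.mse_loss(fs, ft) for fs, ft in zip(feats_s, feats_t))
    return w_task * task + w_out * kd_out + w_feat * kd_feat
```

Because the feature term aggregates many per-layer MSEs, an absolute value like "around 20" is expected to be much larger than the other terms and is not directly comparable across architectures or tasks.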

@bokyeong1015 changed the title from "Batch Size" to "Scale of KD-feature loss for SD inpainting 1.5" on Aug 22, 2023
@bokyeong1015
Member

Please understand that we've changed the name of this issue, 'Batch Size' -> 'Scale of KD-feature loss for SD inpainting 1.5', to clarify the topic and make it easier for people to find in the future.

@Bikesuffer
Author

Thanks a lot for the information.

@yajieC

yajieC commented Aug 31, 2023

Hello, does this method work for SD-inpainting 1.5?

@bokyeong1015
Member

Hi, @yajieC
We haven't tried it, but we believe our models can be used after finetuning for SD-inpainting.

Our models are compressed from SD-v1.4, and SD-v1.x models share the same architecture (with different training recipes); SD-inpainting is based on the SD-v1 backbone.

@Bikesuffer
Author

Bikesuffer commented Sep 1, 2023

> hello, does this method work for SD inpainting 1.5?

Yes, it worked for me.
I have successfully distilled the UNet in SD-inpainting 1.5 into a smaller UNet.
I would say the SD base model distilled with batch size 256 (I call it IP_Base_256) generates the best results for me.

@bokyeong1015
Member

bokyeong1015 commented Sep 1, 2023

Thanks for sharing the above and this good news! Happy to know you are okay with the inpainting results using our approach :) Could we ask if you have plans to release your models and/or code?


Edit: sorry for the initial misunderstanding; you've clarified that you "distill the unet in sd inpainting 1.5 to a smaller Unet", which means (Teacher, Student) = (SD-inpainting 1.5, BK-SDM modified with additional input channels) <- please let us know if this is incorrect. Thanks again for sharing! @Bikesuffer

@Bikesuffer
Author

Bikesuffer commented Sep 5, 2023

> Thanks for sharing the above and this good news! Happy to know you are okay with the inpainting results using our approach :) Could we ask if you have plans to release your models and/or code?
>
> Edit: sorry for initial misunderstanding, you've clarified that "distill the unet in sd inpainting 1.5 to a smaller Unet", which means (Teacher, Student) = (SD-inpainting 1.5, BK-SDM) <- please let us know if this is incorrect. Thanks again for sharing! @Bikesuffer

Hi, actually the student is a modified version of BK-SDM, since the input of the UNet in the inpainting pipeline has 9 channels. But all the anchor points for calculating the loss are the same as in BK-SDM.
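The 9-channel modification described above (the SD-inpainting UNet concatenates 4 latent, 1 mask, and 4 masked-image-latent channels) can be sketched as widening the UNet's first convolution. This is a hypothetical illustration, not the commenter's actual code; `expand_conv_in` is an assumed helper name, and zero-initializing the extra channels is one common choice so the widened layer initially behaves like the original:

```python
import torch
import torch.nn as nn

def expand_conv_in(conv_in: nn.Conv2d, new_in_channels: int = 9) -> nn.Conv2d:
    """Widen a UNet's input conv from 4 to 9 channels for inpainting.

    Extra input channels are zero-initialized, so with zeroed mask/masked-image
    inputs the new layer reproduces the original layer's output exactly.
    """
    new_conv = nn.Conv2d(new_in_channels, conv_in.out_channels,
                         kernel_size=conv_in.kernel_size,
                         padding=conv_in.padding)
    with torch.no_grad():
        new_conv.weight.zero_()                               # extra channels start at zero
        new_conv.weight[:, :conv_in.in_channels] = conv_in.weight  # copy original weights
        new_conv.bias.copy_(conv_in.bias)                     # keep the original bias
    return new_conv
```

After this swap, the downstream blocks (and hence the KD anchor points) are untouched, which matches the statement that the loss anchors remain the same as in BK-SDM.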

@bokyeong1015
Member

Thanks for the clarification; we've updated the student description above :)

@yajieC

yajieC commented Sep 8, 2023

Hi, I tried this method, but found that the performance was very poor. My experimental configuration was training on laion_11k data for 10k steps, with a bk_tiny UNet. I also replaced the pipeline with the inpainting one and changed the input data. Could you offer any suggestions? Thanks.

@bokyeong1015
Member

@yajieC Thanks for your inquiry. Since this seems to be a different topic, we would like to address it in a separate discussion to make it easier for future readers to find. Please kindly refer to our response at that link.
