This repository contains the implementation of the paper:
Non-confusing Generation of Customized Concepts in Diffusion Models (ICML 2024)
Wang Lin1,*,
Jingyuan Chen1,
Jiaxin Shi4,*,
Yichen Zhu1,
Chen Liang5,
Junzhong Miao6, Tao Jin1, Zhou Zhao1, Fei Wu1, Shuicheng Yan2, Hanwang Zhang2,3 (* Equal Contribution)
1 Zhejiang University
2 Skywork AI, Singapore
3 Nanyang Technological University
4 Huawei Cloud Computing
5 Tsinghua University
6 Harbin Institute of Technology
conda create -n clif python=3.9
conda activate clif
pip install diffusers==0.23.1
We first fine-tune the customized concepts with contrastive learning:
bash run_train_clif.sh
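For intuition, the contrastive fine-tuning step can be illustrated with a generic InfoNCE-style loss over concept embeddings. This is only a minimal, stdlib-only sketch of the general technique, not the exact objective implemented in this repository; the embeddings and temperature value are placeholders.

```python
import math

def info_nce(anchor, positive, negatives, temperature=0.07):
    """Generic InfoNCE contrastive loss over embedding vectors.

    Pulls the anchor toward its positive and pushes it away from
    negatives (e.g. embeddings of other customized concepts).
    Illustrative sketch only, not the paper's exact objective.
    """
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)

    # Positive logit first, then one logit per negative.
    logits = [cos(anchor, positive) / temperature]
    logits += [cos(anchor, n) / temperature for n in negatives]

    # Loss = -log softmax of the positive logit (index 0),
    # computed with the max-shift trick for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]

# The loss is small when the positive is closer than the negatives,
# and large when a negative is closer than the positive.
easy = info_nce([1.0, 0.0], [1.0, 0.1], [[-1.0, 0.0]])
hard = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.1]])
```

Minimizing this loss separates the token embeddings of different concepts, which is what counteracts concept confusion during multi-concept generation.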
We then perform textual inversion on the customized concepts to encode visual details into token embeddings:
bash run_train_ti.sh
Finally, we train the LoRA weights and token embeddings jointly:
bash run_train_lora.sh

The evaluation of our method is based on two metrics, text-alignment and image-alignment, following Custom Diffusion.
The prompts used in our quantitative evaluations can be found in the dataset.
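Following Custom Diffusion, both metrics are typically computed as average cosine similarities in CLIP embedding space: text-alignment between generated images and their prompts, image-alignment between generated images and real concept images. A minimal sketch of the scoring step, assuming the CLIP embeddings have already been extracted upstream (the 2-D vectors below are toy placeholders):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def alignment_score(gen_embs, ref_embs):
    """Mean pairwise cosine similarity between generated-image
    embeddings and reference embeddings (prompt embeddings for
    text-alignment, real concept-image embeddings for
    image-alignment). CLIP feature extraction is assumed done."""
    scores = [cosine(g, r) for g in gen_embs for r in ref_embs]
    return sum(scores) / len(scores)

# Toy example with 2-D "embeddings": one reference, two generations.
score = alignment_score([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]])
```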
This code builds on the diffusers library.
@InProceedings{pmlr-v235-lin24d,
title = {Non-confusing Generation of Customized Concepts in Diffusion Models},
author = {Lin, Wang and Chen, Jingyuan and Shi, Jiaxin and Zhu, Yichen and Liang, Chen and Miao, Junzhong and Jin, Tao and Zhao, Zhou and Wu, Fei and Yan, Shuicheng and Zhang, Hanwang},
booktitle = {Proceedings of the 41st International Conference on Machine Learning},
pages = {29935--29948},
year = {2024},
series = {Proceedings of Machine Learning Research},
month = {21--27 Jul},
publisher = {PMLR},
pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/lin24d/lin24d.pdf},
url = {https://proceedings.mlr.press/v235/lin24d.html},
}