[Project Website] [Replicate.ai Project]
StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators
Rinon Gal, Or Patashnik, Haggai Maron, Amit H. Bermano, Gal Chechik, Daniel Cohen-Or
Abstract:
Can a generative model be trained to produce images from a specific domain, guided by a text prompt only, without seeing any image? In other words: can an image generator be trained blindly? Leveraging the semantic power of large scale Contrastive-Language-Image-Pre-training (CLIP) models, we present a text-driven method that allows shifting a generative model to new domains, without having to collect even a single image from those domains. We show that through natural language prompts and a few minutes of training, our method can adapt a generator across a multitude of domains characterized by diverse styles and shapes. Notably, many of these modifications would be difficult or outright impossible to reach with existing methods. We conduct an extensive set of experiments and comparisons across a wide range of domains. These demonstrate the effectiveness of our approach and show that our shifted models maintain the latent-space properties that make generative models appealing for downstream tasks.
This repo contains the implementation of IDE3D-NADA, a Non-Adversarial Domain Adaptation for IDE-3D. You can find the official code of StyleGAN-NADA here.
The following diagram illustrates the process:
Here is a sample:
The code relies on the official implementation of CLIP, and the Rosinality PyTorch implementation of StyleGAN2.
- Anaconda
- Pretrained StyleGAN2 generator (can be downloaded from here). You can also download a model from here and convert it with the provided script. See the colab notebook for examples.
In addition, run the following commands:
conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=<CUDA_VERSION>
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
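If the installation succeeded, a quick sanity check along these lines (a minimal sketch, not part of the repo) should run without errors and confirm that CLIP can reach your GPU:

import torch
import clip

# Load one of the CLIP models used for training (see --clip_models below).
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Encode a couple of text prompts; ViT-B/32 produces 512-dimensional embeddings.
tokens = clip.tokenize(["Photo", "3D Render in the Style of Pixar"]).to(device)
with torch.no_grad():
    text_features = model.encode_text(tokens)
print(text_features.shape)  # expected: torch.Size([2, 512])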
To convert a generator from one domain to another, use the colab notebook or run the training script in the ZSSGAN directory:
python train.py --size 512 \
                --batch 2 \
                --n_sample 4 \
                --output_dir /path/to/output/dir \
                --lr 0.002 \
                --frozen_gen_ckpt /path/to/pretrained_ide3d.pkl \
                --iter 301 \
                --source_class "Photo" \
                --target_class "3D Render in the Style of Pixar" \
                --auto_layer_k 18 \
                --auto_layer_iters 1 \
                --auto_layer_batch 8 \
                --output_interval 50 \
                --clip_models "ViT-B/32" "ViT-B/16" \
                --clip_model_weights 1.0 1.0 \
                --mixing 0.0 \
                --save_interval 100 \
                --ide3d
Adjust size to match the resolution of the pre-trained model; the source_class and target_class descriptions control the direction of change. For an explanation of each argument (and a few additional options), please consult ZSSGAN/options/train_options.py. For most modifications these default parameters should be good enough. See the colab notebook for more detailed directions.
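For intuition, the adaptation is driven by a directional CLIP loss: the change between the frozen generator's output and the trainable generator's output (in CLIP image space) is pushed to align with the change from source_class to target_class (in CLIP text space). The following is a simplified sketch of that idea, not the repo's exact loss code; directional_clip_loss is a hypothetical helper, and the image batches are assumed to already be resized and normalized for CLIP.

import torch
import torch.nn.functional as F
import clip

def directional_clip_loss(clip_model, frozen_img, trained_img, source_text, target_text, device="cuda"):
    # Text direction: target prompt minus source prompt, in CLIP text space.
    tokens = clip.tokenize([source_text, target_text]).to(device)
    with torch.no_grad():
        text_feat = clip_model.encode_text(tokens).float()
    text_dir = F.normalize(text_feat[1] - text_feat[0], dim=-1)

    # Image direction: trainable-generator output minus frozen-generator output.
    src_feat = clip_model.encode_image(frozen_img).float()
    tgt_feat = clip_model.encode_image(trained_img).float()
    img_dir = F.normalize(tgt_feat - src_feat, dim=-1)

    # Push the image-space change to align with the text-space change.
    return (1.0 - torch.cosine_similarity(img_dir, text_dir.unsqueeze(0), dim=-1)).mean()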
21/08/2021 Instead of using source and target texts, you can now target a style represented by a few images. Simply replace the --source_class and --target_class options with:
--style_img_dir /path/to/img/dir
where the directory should contain a few images (png, jpg or jpeg) with the style you want to mimic. There is no need to normalize or preprocess the images in any form.
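For example, the training command above becomes the following (a usage sketch with placeholder paths; the remaining options are unchanged and omitted for brevity):

python train.py --size 512 \
                --batch 2 \
                --n_sample 4 \
                --output_dir /path/to/output/dir \
                --frozen_gen_ckpt /path/to/pretrained_ide3d.pkl \
                --iter 301 \
                --style_img_dir /path/to/img/dir \
                --clip_models "ViT-B/32" "ViT-B/16" \
                --clip_model_weights 1.0 1.0 \
                --ide3d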
We will add some pretrained models soon.
We now provide a simple dockerized interface for training models. The UI currently supports a subset of the colab options, but does not require repeated setup.
In order to use the docker version, you must have a CUDA compatible GPU and must install nvidia-docker and docker-compose first.
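To confirm that Docker can access the GPU before launching the UI, a check along these lines should print your nvidia-smi output (the CUDA image tag is only an example; use any CUDA base image available to you):

docker run --rm --gpus all nvidia/cuda:11.1.1-base-ubuntu20.04 nvidia-smi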
After cloning the repo, simply run:
cd StyleGAN-nada/
docker-compose up
- Downloading the docker for the first time may take a few minutes.
- While the docker is running, the UI should be available under http://localhost:8888/
- The UI was tested using an RTX3080 GPU with 16GB of RAM. Smaller GPUs may run into memory limits with large models.
If you find the UI useful and want it expanded to allow easier access to saved models, support for real image editing, etc., please let us know.
Since the adapted model shares the same latent space as the original IDE-3D model, it supports 3D-aware face editing. You can replace the original model path with the adapted one and run the UI following the original instructions. Here is a sample:
python gen_images.py --network /path/to/adapted_ide3d.pkl \
                     --seeds 58,96,174,180,179,185 \
                     --trunc 0.7 \
                     --outdir out
python gen_videos.py --network /path/to/adapted_ide3d.pkl \
                     --seeds 58,96,174,180,179,185 \
                     --grid 3x2 \
                     --trunc 0.7 \
                     --outdir out \
                     --image_mode image_seg
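Because the adapted generator keeps the original architecture and latent space, latent codes and edits found for the original model transfer directly. The snippet below is a rough sketch that assumes the adapted checkpoint follows the same dnnlib/legacy network-pickle format (a 'G_ema' entry) used by gen_images.py; consult the IDE-3D scripts for the exact loading and rendering code.

import torch
import dnnlib
import legacy

def load_generator(pkl_path, device="cuda"):
    # Load the exponential-moving-average generator from a network pickle.
    with dnnlib.util.open_url(pkl_path) as f:
        return legacy.load_network_pkl(f)["G_ema"].to(device)

G_orig = load_generator("/path/to/pretrained_ide3d.pkl")
G_nada = load_generator("/path/to/adapted_ide3d.pkl")

# Shared latent space: the same z / w codes (and therefore the same seeds) are valid for both models.
assert G_orig.z_dim == G_nada.z_dim and G_orig.w_dim == G_nada.w_dim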
If you make use of our work, please cite the following papers:
@article{sun2022ide,
title = {IDE-3D: Interactive Disentangled Editing for High-Resolution 3D-aware Portrait Synthesis},
author = {Sun, Jingxiang and Wang, Xuan and Shi, Yichun and Wang, Lizhen and Wang, Jue and Liu, Yebin},
journal = {ACM Transactions on Graphics (TOG)},
volume = {41},
number = {6},
articleno = {270},
pages = {1--10},
year = {2022},
publisher = {ACM New York, NY, USA},
doi={10.1145/3550454.3555506},
}
@inproceedings{sun2021fenerf,
title={FENeRF: Face Editing in Neural Radiance Fields},
author={Sun, Jingxiang and Wang, Xuan and Zhang, Yong and Li, Xiaoyu and Zhang, Qi and Liu, Yebin and Wang, Jue},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2022}
}
@misc{gal2021stylegannada,
title={StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators},
author={Rinon Gal and Or Patashnik and Haggai Maron and Gal Chechik and Daniel Cohen-Or},
year={2021},
eprint={2108.00946},
archivePrefix={arXiv},
primaryClass={cs.CV}
}