PyTorch implementation of our ECCV 2022 paper "Vector Quantized Image-to-Image Translation". You can visit our project website here.
In this paper, we propose a novel unified framework that tackles image-to-image translation, unconditional generation of the input domains, and diverse extension based on an existing image.
Vector Quantized Image-to-Image Translation
Yu-Jie Chen*, Shin-I Cheng*, Wei-Chen Chiu, Hung-Yu Tseng, Hsin-Ying Lee
European Conference on Computer Vision (ECCV), 2022 (* equal contribution)
Please cite our paper if you find it useful for your research.
@inproceedings{chen2022eccv,
  title = {Vector Quantized Image-to-Image Translation},
  author = {Yu-Jie Chen and Shin-I Cheng and Wei-Chen Chiu and Hung-Yu Tseng and Hsin-Ying Lee},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year = {2022}
}
- Prerequisites: Python 3.6 & PyTorch (at least 1.4.0)
- Clone this repo
git clone https://github.com/cyj407/VQ-I2I.git
cd VQ-I2I
- We provide a conda environment script; please run the following command after cloning our repo.
conda env create -f vqi2i_env.yml
- Yosemite (winter, summer) dataset: you can follow the instructions on the CycleGAN website to download the Yosemite (winter, summer) dataset.
- AFHQ (cat, dog, wildlife) dataset: you can follow the instructions on the StarGAN v2 website to download the AFHQ (cat, dog, wildlife) dataset.
- Portrait (portrait, photography) dataset: 6452 photography images from the CelebA dataset and 1811 painting images downloaded and cropped from WikiArt.
- Cityscapes (street scene, semantic labeling) dataset: 3475 street scenes and the corresponding semantic labelings from the Cityscapes dataset.
Please save the dataset images separately, e.g. Yosemite dataset:
- trainA: directory for training summer images.
- trainB: directory for training winter images.
- testA: directory for testing summer images.
- testB: directory for testing winter images.
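For instance, assuming the Yosemite images are gathered under one root folder that is later passed as --root_dir (the folder name datasets/yosemite below is only an illustrative placeholder), the layout would be:

datasets/yosemite/
├── trainA/   # training summer images
├── trainB/   # training winter images
├── testA/    # testing summer images
└── testB/    # testing winter images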
python unpair_train.py --device <gpu_num> --root_dir <dataset_path> \
--dataset <dataset_name>\
--epoch_start <epoch_start> --epoch_end <epoch_end>
- You can also append arguments for hyperparameters, e.g.: --ne <ne> --ed <ed> --z_channel <z_channel>.
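For example, an unpaired training run on Yosemite with the same hyperparameters as the released Yosemite checkpoint might look like this (the GPU index, dataset path, dataset name, and epoch range are placeholder assumptions; adjust them to your setup):

python unpair_train.py --device 0 --root_dir ./datasets/yosemite \
    --dataset yosemite \
    --epoch_start 0 --epoch_end 200 \
    --ne 512 --ed 512 --z_channel 256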
python pair_train.py --device <gpu_num> --root_dir <dataset_path> \
--dataset <dataset_name>\
--epoch_start <epoch_start> --epoch_end <epoch_end>
- Used on the Cityscapes dataset only.
- You can also append arguments for hyperparameters, e.g.: --ne <ne> --ed <ed> --z_channel <z_channel>.
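For example, a paired training run on Cityscapes with the hyperparameters of the released Cityscapes checkpoint could look like this (the GPU index, dataset path, dataset name, and epoch range are placeholder assumptions):

python pair_train.py --device 0 --root_dir ./datasets/cityscapes \
    --dataset cityscapes \
    --epoch_start 0 --epoch_end 200 \
    --ne 64 --ed 256 --z_channel 128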
- Save the translation results.
python save_transfer.py --device <gpu_num> --root_dir <dataset_path> --dataset <dataset_name> \
--checkpoint_dir <checkpoint_dir> --checkpoint_epoch <checkpoint_epoch> \
--save_name <save_dir_name>
- --atob True: transfer from domain A to domain B; otherwise, from B to A.
- --intra_transfer True: enable intra-domain translation.
- You can also modify arguments for hyperparameters, e.g.: --ne <ne> --ed <ed> --z_channel <z_channel>.
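A concrete invocation for Yosemite, translating from domain A to domain B, might be the following (the checkpoint directory and epoch, the save name, and the other values are placeholder assumptions; match them to your own training run):

python save_transfer.py --device 0 --root_dir ./datasets/yosemite --dataset yosemite \
    --checkpoint_dir ./checkpoints/yosemite --checkpoint_epoch 200 \
    --save_name yosemite_a2b \
    --atob True \
    --ne 512 --ed 512 --z_channel 256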
- Download the pre-trained models; here we provide the pre-trained models for the four datasets.
- Yosemite (summer, winter), 256x256: --ed 512, --ne 512, --z_channel 256
- AFHQ (cat, dog), 256x256: --ed 256, --ne 256, --z_channel 256
- Portrait (portrait, photography), 256x256: --ed 256, --ne 256, --z_channel 256
- Cityscapes (street scene, semantic labeling), 256x256: --ed 256, --ne 64, --z_channel 128
python autoregressive_train.py --device <gpu_num> --root_dir <dataset_path> \
--dataset <dataset_name> --first_stage_model <first_stage_model_path> \
--epoch_start <epoch_start> --epoch_end <epoch_end>
- You can also append arguments for hyperparameters, e.g.: --ne <ne> --ed <ed> --z_channel <z_channel>.
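For example, to train the second-stage transformer on Yosemite on top of a trained first-stage model (the first-stage checkpoint path and file name below are assumptions; point them at your own checkpoint):

python autoregressive_train.py --device 0 --root_dir ./datasets/yosemite \
    --dataset yosemite --first_stage_model ./checkpoints/yosemite_first_stage.pt \
    --epoch_start 0 --epoch_end 200 \
    --ne 512 --ed 512 --z_channel 256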
- Download the pre-trained transformer models; here we provide the pre-trained transformer model for the Yosemite dataset.
- Yosemite (summer, winter), 256x256: --ed 512, --ne 512, --z_channel 256
python save_uncondtional.py --device <gpu_num> \
--root_dir <dataset_path> --dataset <dataset_name> \
--first_stage_model <first_stage_model_path> \
--transformer_model <second_stage_model_path> \
--save_name <save_dir_name>
- --sty_domain 'B': generate images in the style of domain B.
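An example invocation for Yosemite that samples domain-B-style images (the checkpoint file names and the save name are placeholder assumptions):

python save_uncondtional.py --device 0 \
    --root_dir ./datasets/yosemite --dataset yosemite \
    --first_stage_model ./checkpoints/yosemite_first_stage.pt \
    --transformer_model ./checkpoints/yosemite_transformer.pt \
    --save_name yosemite_uncond_B \
    --sty_domain 'B'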
python save_extension.py --device <gpu_num> \
--root_dir <dataset_path> --dataset <dataset_name> \
--first_stage_model <first_stage_model_path> \
--transformer_model <second_stage_model_path> \
--save_name <save_dir_name>
- --input_domain B: select domain B images from the testing set as input.
- --sty_domain A: select domain A as the reference style for translation.
- --double_extension True: enable double-sided extension; default False.
- --pure_extension True: only extend the input images without translation; default False.
- --extend_w <extend_pixels>: extend the width by 128 or 192 pixels; default 128.
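For instance, a double-sided extension of 192 pixels on Yosemite, taking domain B test images as input and domain A as the style reference (the checkpoint paths and the save name are placeholder assumptions):

python save_extension.py --device 0 \
    --root_dir ./datasets/yosemite --dataset yosemite \
    --first_stage_model ./checkpoints/yosemite_first_stage.pt \
    --transformer_model ./checkpoints/yosemite_transformer.pt \
    --save_name yosemite_extension \
    --input_domain B --sty_domain A \
    --double_extension True --extend_w 192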
python save_completion.py --device <gpu_num> \
--root_dir <dataset_path> --dataset <dataset_name> \
--first_stage_model <first_stage_model_path> \
--transformer_model <second_stage_model_path> \
--save_name <save_dir_name>
- --input_domain B: select domain B images from the testing set as input.
- --sty_domain A: select domain A as the reference style for translation.
- --pure_completion True: only complete the input images without translation; default True.
- --partial_input top-left: use the top-left corner of the image as input. Two more options are available: left-half (use the left half of the image as input) and top-half (use the top half of the image as input).
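For example, a completion run on Yosemite that takes the left half of domain B test images as input, with domain A as the style reference (the checkpoint paths and the save name are placeholder assumptions):

python save_completion.py --device 0 \
    --root_dir ./datasets/yosemite --dataset yosemite \
    --first_stage_model ./checkpoints/yosemite_first_stage.pt \
    --transformer_model ./checkpoints/yosemite_transformer.pt \
    --save_name yosemite_completion \
    --input_domain B --sty_domain A \
    --partial_input left-half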
- The demonstrations of all applications (including transitional stylization) are in VQ-I2I-Applications.ipynb.
Our code is based on VQGAN. The implementation of the disentanglement architecture is borrowed from MUNIT.