Cusuh Ham*,
Gemma Canet Tarrés*,
Tu Bui,
James Hays,
Zhe Lin,
John Collomosse
* equal contribution
arXiv | BibTeX | Project Page
Create and activate a Conda or Miniconda environment named cogs with all the necessary dependencies using the following commands:
conda env create -f environment.yaml
conda activate cogs
Download the Pseudosketches dataset. Train/val input file lists are provided under data/pseudosketches, where each line contains a tuple of corresponding inputs:
synset_id/sketch.JPEG,synset_id/style.JPEG,synset_id/image.JPEG
The sketch and image are a corresponding pair from the Pseudosketches dataset, while the style image is selected using a pre-trained ALADIN model to find the image most stylistically similar to the original image.
Train a VQGAN on Pseudosketches with:
python main.py --base configs/pseudosketches_vqgan.yaml -t True --gpus 0,
Then, adjust the checkpoint path under model.params.sketch_encoder_config.params.ckpt_path in configs/pseudosketches_cogs_transformer.yaml. Alternatively, you can download our pre-trained Pseudosketches VQGAN checkpoint and place it under checkpoints/, which corresponds to the default checkpoint path in the config file.
For training a VQGAN on ImageNet, refer to the taming-transformers repository. Alternatively, use their pre-trained ImageNet VQGAN checkpoint and place it under checkpoints/ (the default checkpoint path in the config expects the file to be renamed to imagenet_vqgan_last.ckpt). If you trained your own, adjust the checkpoint path under model.params.image_encoder_config.params.ckpt_path in configs/pseudosketches_cogs_transformer.yaml. We use the same ImageNet VQGAN to encode both the style and ground truth images.
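For reference, these two checkpoint paths nest inside configs/pseudosketches_cogs_transformer.yaml roughly as in the sketch below. The nesting is transcribed directly from the dotted key paths above; the Pseudosketches checkpoint filename is a hypothetical placeholder, and the other keys of the real config are omitted.

# Illustrative excerpt only -- verify against the shipped config file.
model:
  params:
    sketch_encoder_config:
      params:
        ckpt_path: checkpoints/pseudosketches_vqgan.ckpt  # your trained or downloaded Pseudosketches VQGAN (placeholder name)
    image_encoder_config:
      params:
        ckpt_path: checkpoints/imagenet_vqgan_last.ckpt   # ImageNet VQGAN from taming-transformers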
First, download the pre-trained ALADIN checkpoints (aladin_vgg.pt and resnet_model.pt) and place them under cogs/modules/style/. These checkpoints are required to compute the style loss during training.
To train the transformer stage of CoGS, run:
python main.py --base configs/pseudosketches_cogs_transformer.yaml -t True --gpus 0,
We also provide a pre-trained CoGS transformer checkpoint.
Adjust the checkpoint paths under model.params.cogs_transformer_config.params.ckpt_path, model.params.cogs_transformer_config.image_encoder_config.params.ckpt_path, and model.params.cogs_transformer_config.sketch_encoder_config.params.ckpt_path in configs/pseudosketches_cogs_vae.yaml.
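The three paths nest inside configs/pseudosketches_cogs_vae.yaml roughly as sketched below; the nesting simply transcribes the dotted key paths above and the checkpoint filenames are hypothetical placeholders, so verify against the actual config file.

# Illustrative excerpt only -- nesting transcribed from the dotted paths above.
model:
  params:
    cogs_transformer_config:
      params:
        ckpt_path: checkpoints/cogs_transformer.ckpt        # trained or downloaded CoGS transformer (placeholder name)
      image_encoder_config:
        params:
          ckpt_path: checkpoints/imagenet_vqgan_last.ckpt   # ImageNet VQGAN
      sketch_encoder_config:
        params:
          ckpt_path: checkpoints/pseudosketches_vqgan.ckpt  # Pseudosketches VQGAN (placeholder name)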
Then, to train the VAE stage of CoGS, run:
python main.py --base configs/pseudosketches_cogs_vae.yaml -t True --gpus 0,
In our experiments, each VAE is trained on a single class. We provide pre-trained VAE checkpoints for multiple classes.
After training a transformer or downloading a pre-trained model, you can sample images by running:
python scripts/sample_cogs_transformer.py --resume <path/to/ckpt/and/config> --out_dir <path/to/output/directory>
To include the optional VAE step at the end of the pipeline, sample images by running:
python main.py --base configs/pseudosketches_cogs_vae.yaml -t False --gpus 0,
Adjust the checkpoint path under model.params.params.ckpt_path to point to the pre-trained VAE, and set the name of the output folder under model.params.params.output_name in configs/pseudosketches_cogs_vae_test.yaml.
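In configs/pseudosketches_cogs_vae_test.yaml, those two keys nest roughly as sketched below; the checkpoint filename and output folder name are hypothetical placeholders.

# Illustrative excerpt only -- verify against the shipped config file.
model:
  params:
    params:
      ckpt_path: checkpoints/cogs_vae.ckpt  # pre-trained VAE checkpoint (placeholder name)
      output_name: cogs_vae_samples         # output folder name (placeholder)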
Our code is developed from the taming-transformers repository.
@inproceedings{ham2022cogs,
title={CoGS: Controllable Generation and Search from Sketch and Style},
author={Ham, Cusuh and Tarres, Gemma Canet and Bui, Tu and Hays, James and Lin, Zhe and Collomosse, John},
booktitle={European Conference on Computer Vision},
year={2022}
}