Dat Viet Thanh Nguyen,
Phong Tran The,
Tan M. Dinh,
Cuong Pham,
Anh Tuan Tran
VinAI Research, Vietnam
Abstract: The introduction of high-quality image generation models, particularly the StyleGAN family, provides a powerful tool to synthesize and manipulate images. However, existing models are built upon high-quality (HQ) data as desired outputs, making them unfit for in-the-wild low-quality (LQ) images, which are common inputs for manipulation. In this work, we bridge this gap by proposing a novel GAN structure that allows for generating images with controllable quality. The network can synthesize various image degradations and restore the sharp image via a quality control code. Our proposed QC-StyleGAN can directly edit LQ images without altering their quality by applying GAN inversion and manipulation techniques. It also provides, for free, an image restoration solution that can handle various degradations, including noise, blur, compression artifacts, and their mixtures. Finally, we demonstrate numerous other applications such as image degradation synthesis, transfer, and interpolation.
Sample images generated by our models on FFHQ (left), AFHQ Cat (middle), and LSUN Church (right). For each sample, we provide a pair of sharp (top) and degraded (bottom) images.
Details of the model architecture and experimental results can be found in the following paper:
@inproceedings{
thanh2022qcstylegan,
title={{QC}-Style{GAN} - Quality Controllable Image Generation and Manipulation},
author={Dat Viet Thanh Nguyen and Phong Tran The and Tan M. Dinh and Cuong Pham and Anh Tuan Tran},
booktitle={Advances in Neural Information Processing Systems},
editor={Alice H. Oh and Alekh Agarwal and Danielle Belgrave and Kyunghyun Cho},
year={2022}
}
Please CITE our paper whenever our model implementation is used to help produce published results or incorporated into other software.
- Linux and Windows are supported, but we recommend Linux for performance and compatibility reasons.
- 1–8 high-end NVIDIA GPUs with at least 12 GB of memory. We have done all testing and development using NVIDIA DGX-A100 with 8 Tesla A100 GPUs.
- 64-bit Python 3.7 and PyTorch 1.7.1. See https://pytorch.org/ for PyTorch install instructions.
- CUDA toolkit 11.0 or later. Use at least version 11.1 if running on RTX 3090.
The code relies heavily on custom PyTorch extensions that are compiled on the fly using NVCC. On Windows, the compilation requires Microsoft Visual Studio. We recommend installing Visual Studio Community Edition and adding it into PATH using `"C:\Program Files (x86)\Microsoft Visual Studio\<VERSION>\Community\VC\Auxiliary\Build\vcvars64.bat"`.
- Clone this repo:
git clone https://github.com/VinAIResearch/QC-StyleGAN.git
cd QC-StyleGAN
- Install dependencies:
conda create -n qcgan python=3.7.3
conda activate qcgan
pip install -r requirements.txt
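As a quick sanity check after installation, you can verify the PyTorch and CUDA setup from Python. This snippet is illustrative only and not part of the repository; the expected values simply mirror the requirements listed above.

```python
# Quick environment sanity check (illustrative; not part of the repository).
import torch

print(torch.__version__)           # expected: 1.7.1 (see requirements above)
print(torch.version.cuda)          # expected: 11.0 or later (11.1+ for RTX 3090)
print(torch.cuda.is_available())   # should print True on a correctly configured GPU machine
print(torch.cuda.device_count())   # number of visible GPUs (1-8 recommended)
```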
Datasets are stored as uncompressed ZIP archives containing uncompressed PNG files.
Legacy TFRecords datasets are not supported — see below for instructions on how to convert them.
FFHQ:
- Step 1: Download the Flickr-Faces-HQ dataset as TFRecords.
- Step 2: Extract images from the TFRecords using `dataset_tool.py` from the TensorFlow version of StyleGAN2-ADA:
# Using dataset_tool.py from TensorFlow version at
# https://github.com/NVlabs/stylegan2-ada/
python ../stylegan2-ada/dataset_tool.py unpack \
--tfrecord_dir=~/ffhq-dataset/tfrecords/ffhq --output_dir=/tmp/ffhq-unpacked
- Step 3: Create a ZIP archive using `dataset_tool.py` from this repository:
# Original 1024x1024 resolution.
python dataset_tool.py --source=/tmp/ffhq-unpacked --dest=~/datasets/ffhq.zip
# Scaled down 256x256 resolution.
python dataset_tool.py --source=/tmp/ffhq-unpacked --dest=~/datasets/ffhq256x256.zip \
--width=256 --height=256
AFHQ: Download the AFHQ dataset and create ZIP archive:
python dataset_tool.py --source=~/downloads/afhq/train/cat --dest=~/datasets/afhqcat.zip
LSUN: Download the desired categories from the LSUN project page and convert to ZIP archive:
python dataset_tool.py --source=~/downloads/lsun/raw/church_lmdb --dest=~/datasets/lsunchurch.zip \
--transform=center-crop --width=256 --height=256
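To spot-check that an archive created by `dataset_tool.py` looks as expected (uncompressed PNGs inside an uncompressed ZIP, as described above), a minimal Python snippet such as the following can be used; the path is only an example.

```python
import zipfile

# Spot-check a dataset archive produced by dataset_tool.py (path is an example).
with zipfile.ZipFile('datasets/ffhq256x256.zip') as z:
    pngs = [i for i in z.infolist() if i.filename.endswith('.png')]
    print(f'{len(pngs)} PNG files found')
    # Image entries should be stored without ZIP compression (ZIP_STORED).
    print('uncompressed:', all(i.compress_type == zipfile.ZIP_STORED for i in pngs))
```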
We provide models pre-trained on the FFHQ, AFHQ Cat, and LSUN Church datasets. You can download them manually from LINK and reference them by filename.
Path | Description |
---|---|
QC-StyleGAN | Main directory |
└ pretrained | Pre-trained models |
├ ffhq_256x256.pkl | QC-StyleGAN for FFHQ dataset at 256×256 |
├ afhqcat_512x512.pkl | QC-StyleGAN for AFHQ Cat dataset at 512×512 |
├ lsunchurch_256x256.pkl | QC-StyleGAN for LSUN Church at 256×256 |
├ G_teacher_FFHQ_256x256.pth.tar | G Teacher FFHQ at 256x256, transfer learning from FFHQ using StyleGAN2-ADA |
├ G_teacher_AFHQ_Cat_512x512.pth.tar | G Teacher AFHQ Cat at 512x512, pre-trained from StyleGAN2-ADA |
├ G_teacher_LSUN_Church_256x256.pth.tar | G Teacher LSUN Church at 256x256, pre-trained from StyleGAN2 |
├ network-pretrained-FFHQ-256x256.pkl | FFHQ at 256x256, transfer learning from FFHQ using StyleGAN2-ADA |
├ network-pretrained-AFHQ-Cat-512x512.pkl | AFHQ Cat at 512x512, transfer learning from AFHQ Cat using StyleGAN2-ADA |
├ network-pretrained-LSUN-Church-256x256.pkl | LSUN Church at 256x256, transfer learning from LSUN Church using StyleGAN2 |
├ afhq_psp.pt | pSp model, trained on AFHQ Cat |
└ ffhq_psp.pt | pSp model, trained on FFHQ |
Pre-trained networks are stored as `*.pkl` files in the QC-StyleGAN Google Drive folder and can be referenced using local filenames:
# Generate FFHQ images
python generate.py --outdir=out --trunc=1 --seeds=85,265,297,849 \
--network=./pretrained/ffhq_256x256.pkl
# Generate AFHQ Cat images
python generate.py --outdir=out --trunc=0.7 --seeds=600-605 \
--network=./pretrained/afhqcat_512x512.pkl
# Generate LSUN Church images
python generate.py --outdir=out --seeds=0-35 --class=1 \
--network=./pretrained/lsunchurch_256x256.pkl
Outputs from the above commands are placed under `out/*.png`, controlled by `--outdir`. Downloaded network pickles are cached under `$HOME/.cache/dnnlib`, which can be overridden by setting the `DNNLIB_CACHE_DIR` environment variable. The default PyTorch extension build directory is `$HOME/.cache/torch_extensions`, which can be overridden by setting `TORCH_EXTENSIONS_DIR`.
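For programmatic use, network pickles can also be loaded directly from Python, following the StyleGAN2-ADA convention this codebase builds on. The sketch below is illustrative only; in particular, how the quality control code is passed to the QC-StyleGAN generator is an assumption here, so please refer to `generate.py` for the exact interface.

```python
import pickle
import torch

# Illustrative sketch (run from the repository root so torch_utils/dnnlib are importable).
with open('pretrained/ffhq_256x256.pkl', 'rb') as f:
    G = pickle.load(f)['G_ema'].cuda()       # 'G_ema' follows the StyleGAN2-ADA pickle layout

z = torch.randn([1, G.z_dim]).cuda()          # latent code
c = None                                      # class labels (unused for FFHQ)
img = G(z, c)                                 # NCHW float32 in [-1, 1]; QC-StyleGAN may also
                                              # expect a quality code q -- see generate.py
img = (img.clamp(-1, 1) + 1) * (255 / 2)      # map to [0, 255] for saving
```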
In its most basic form, training new networks boils down to:
# FFHQ
python3 train.py --cfg=paper256 --outdir=./training-runs --data=~/datasets/ffhq256.zip \
--workers=8 --gpus=8 --batch=64 --q_dim=16 \
--resume=./pretrained/network-pretrained-FFHQ-256x256.pkl \
--teacher_ckpt=./pretrained/G_teacher_FFHQ_256x256.pth.tar
# AFHQ Cat
python3 train.py --cfg=paper512 --outdir=./training-runs --data=~/datasets/afhqcat.zip \
--workers=8 --gpus=8 --batch=64 --q_dim=16 \
--resume=./pretrained/network-pretrained-AFHQ-Cat-512x512.pkl \
--teacher_ckpt=./pretrained/G_teacher_AFHQ_Cat_512x512.pth.tar
# LSUN Church
python3 train.py --cfg=church256 --outdir=./training-runs --data=~/datasets/lsunchurch.zip \
--workers=8 --gpus=8 --batch=64 --q_dim=16 \
--resume=./pretrained/network-pretrained-LSUN-Church-256x256.pkl \
--teacher_ckpt=./pretrained/G_teacher_LSUN_Church_256x256.pth.tar
In this example, the results are saved to a newly created directory `./training-runs/<ID>-<DATASET>-<CFG>`, controlled by `--outdir`. The training exports network pickles (`network-snapshot-<INT>.pkl`) and example images (`fakes<INT>.png`) at regular intervals (controlled by `--snap`). For each pickle, it also evaluates FID (controlled by `--metrics`) and logs the resulting scores in `metric-fid50k_full.jsonl` (as well as TFEvents if TensorBoard is installed).
The name of the output directory reflects the training configuration. For example, `00000-mydataset-auto1` indicates that the base configuration was `auto1`, meaning that the hyperparameters were selected automatically for training on one GPU. The base configuration is controlled by `--cfg`:
Base config | Description |
---|---|
`auto` (default) | Automatically select reasonable defaults based on resolution and GPU count. Serves as a good starting point for new datasets but does not necessarily lead to optimal results. |
`paper256` | Reproduce results for FFHQ at 256x256 using 1, 2, 4, or 8 GPUs. |
`paper512` | Reproduce results for AFHQ Cat at 512x512 using 1, 2, 4, or 8 GPUs. |
`church256` | Reproduce results for LSUN Church at 256x256 using 1, 2, 4, or 8 GPUs. |
By default, `train.py` automatically computes FID for each network pickle exported during training. We recommend inspecting `metric-fid50k_full.jsonl` (or TensorBoard) at regular intervals to monitor the training progress. When desired, the automatic computation can be disabled with `--metrics=none` to speed up the training slightly (3%–9%).
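If you prefer to inspect the FID log outside of TensorBoard, each line of `metric-fid50k_full.jsonl` is a JSON object. The per-line fields used below follow the StyleGAN2-ADA convention, and the run directory name is only an example.

```python
import json

# Print FID per snapshot from a training run's metric log
# (path and field names assumed to follow the StyleGAN2-ADA convention).
with open('training-runs/00000-ffhq256-paper256/metric-fid50k_full.jsonl') as f:
    for line in f:
        entry = json.loads(line)
        print(entry.get('snapshot_pkl'), entry['results']['fid50k_full'])
```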
Additional quality metrics can also be computed after the training:
# Pre-trained network pickle: specify dataset explicitly, print result to stdout.
python calc_metrics.py --metrics=fid50k_full --data=~/datasets/ffhq256.zip --mirror=1 \
--network=./pretrained/ffhq_256x256.pkl
First, run pSp to get the initial latent codes for PTI. To do so, move to the `restoration/pSp` folder and run the following code:
cd restoration/pSp
python scripts/inference.py \
--out_path="INPUT_SAVE_DIR" \
--checkpoint_path=../../pretrained_models/ffhq_psp.pt \
--data_path="INPUT_IMAGE_DIR" \
--stylegan_weights=../../pretrained_models/network-pretrained-FFHQ-256x256.pkl \
--test_batch_size=4 \
--test_workers=4
where `--checkpoint_path` and `--stylegan_weights` are the provided pre-trained pSp and QC-StyleGAN models, respectively (see the model zoo section).
After running the above script, move to the `restoration/PTI` folder and run the following code:
cd restoration/PTI
python inversion.py --network ../../pretrained_models/network-pretrained-FFHQ-256x256.pkl \
--image_dir "INPUT_IMAGE_DIR" \
--save_dir "INPUT_SAVE_DIR" \
--latent_dir "INPUT_LATENT_DIR" \
--gen_degraded
First, follow the image restoration section to generate inverted latent codes and modified model weights (since we use PTI for image inversion). Then run the following script:
cd editing
python edit.py \
-i "INPUT_LATENT_DIR" \
-m "INPUT_MODEL_DIR" \
-b boundaries/smiling_boundary.npy \
-o "INPUT_SAVE_DIR" \
-s W
where `-b` is the path to the editing boundary (see the InterFaceGAN paper for more information), `-i` is the root directory of the latent codes generated in the pSp step, and `-m` is the root directory of the modified QC-StyleGAN weights from the PTI step.
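Conceptually, the boundary file encodes the normal of a separating hyperplane in latent space, and editing amounts to moving the inverted latent code along that normal, as in InterFaceGAN. The sketch below illustrates the idea only; the file names and step sizes are hypothetical, and `edit.py` implements the actual procedure used here.

```python
import numpy as np

# Conceptual InterFaceGAN-style edit (file names and step sizes are hypothetical).
w = np.load('INPUT_LATENT_DIR/example_latent.npy')      # inverted latent code from pSp/PTI
boundary = np.load('boundaries/smiling_boundary.npy')   # normal of the "smiling" hyperplane
for alpha in (-3.0, 0.0, 3.0):                          # edit strength: less / neutral / more
    w_edit = w + alpha * boundary
    # render w_edit with the PTI-tuned QC-StyleGAN generator to obtain the edited image
```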
Our source code is developed based on the codebases of a great series of StyleGAN inversion works, namely pSp and PTI from the Tel Aviv University group, as well as StyleGAN2-ADA.
For auxiliary pre-trained models, we specifically thank the authors of MoCo v2, CurricularFace, and MTCNN. For editing directions, we thank the authors of InterFaceGAN.
We leverage the PyTorch implementation of StyleGAN2-ADA for the StyleGAN model. All pre-trained StyleGAN models are from the official release of StyleGAN2. We convert the original weights exported by the TensorFlow code to be compatible with the PyTorch version of StyleGAN2-ADA using the authors' official script.
Overall, we sincerely thank the authors for their great work and their efforts in releasing source code and pre-trained weights.
If you have any questions, please drop an email to thanhdatnv2712@gmail.com or open an issue in this repository.