
OroJaR — BigGAN Direction Discovery

Home | PyTorch BigGAN Discovery | TensorFlow ProGAN Regularization | PyTorch Simple GAN Experiments


This repo contains a PyTorch implementation of direction discovery for BigGAN using OroJaR. The code is based on the Hessian Penalty codebase; we thank the authors for their excellent work.

Setup

Follow the simple setup instructions here. We trained our models with PyTorch 1.7.1.

Make sure you are using a recent version of PyTorch (>= 1.6.0); otherwise, you may have trouble loading our checkpoint directions.

Our visualization and training scripts automatically download a pre-trained BigGAN checkpoint for you. Alternatively, you can download the BigGAN model from Google Drive and put it in the ./checkpoints directory.

Visualizing Pre-Trained Directions

This repo comes with pre-trained directions from the golden retrievers and churches experiments in our paper; see the checkpoints/directions/orojar directory. To generate videos showcasing each learned direction, run one of the scripts in scripts/visualize/orojar (e.g., scripts/visualize/orojar/vis_goldens_coarse.sh). In each video, every row corresponds to a different direction, and every column applies that direction to a different sampled image from the generator. For comparison, we also include pre-trained BigGAN directions from the GAN Latent Discovery and Hessian Penalty repos; run scripts/visualize/vis_voynov.sh or the scripts in scripts/visualize/hessian to visualize them.
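For intuition, the sketch below shows what a directions checkpoint contains and how a direction edits a latent code: a checkpoint stores a matrix A of shape (ndirs, dim_z), and visualization amounts to moving z along a row of A. The file path and loading code here are illustrative assumptions, not the repo's exact API:

```python
import torch

# Hypothetical checkpoint path -- the real files live under
# checkpoints/directions/orojar; adjust to the checkpoint you want.
A = torch.load('checkpoints/directions/orojar/goldens_coarse.pt')  # (ndirs, dim_z)

dim_z = A.shape[1]
z = torch.randn(1, dim_z)      # one sampled latent code
i, strength = 0, 3.0           # direction index and step size (cf. --path_size)

# Shift z along direction i; rendering G(z) and G(z_edit) side by side
# reveals the factor of variation that direction controls.
z_edit = z + strength * A[i] / A[i].norm()
```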

You can add several options to the visualization command (see utils.py for a full list):

  • --path_size controls how "much" to move in the learned directions

  • --directions_to_vis can be used to visualize just a subset of directions (e.g., --directions_to_vis 0 5 86)

  • --fix_class, if specified, will only sample images from the given ImageNet class (you can find a mapping of class indices to human-readable labels here)

  • --load_A controls which directions checkpoint to load from; you can set it to random to visualize random orthogonal directions, coords to see what each individual z-component does, or set it to your own learned directions to visualize them

  • --val_minibatch_size controls the batching for generating the videos; decrease this if you have limited GPU memory

Note that BigGAN, by default, exhibits quite a bit of innate disentanglement between the latent z vector and the class label. The learned directions therefore tend to generalize well to other classes, so feel free to pass a different --fix_class argument to visualize samples from categories other than the ones you trained on.

Running Direction Discovery (Training)

To start direction discovery, run one of the scripts in scripts/discover/orojar (e.g., discover_coarse_goldens.sh, discover_mid_goldens.sh, etc.). This will launch orojar_discover.py, which learns a matrix of shape (ndirs, dim_z), where ndirs is the number of directions being learned.
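Conceptually, OroJaR asks that the image-space changes induced by the different directions be mutually orthogonal, i.e., that the off-diagonal entries of the Jacobian Gram matrix vanish. Below is a rough finite-difference sketch of such an objective, assuming a generator callable G(z); the repo's actual loss lives in orojar_discover.py and differs in details such as class conditioning, sampling, and efficiency:

```python
import torch

def orojar_finite_diff(G, z, A, eps=0.1):
    """Finite-difference sketch of an OroJaR-style discovery objective.

    Approximates the Jacobian column for direction A[i] as
    (G(z + eps * A[i]) - G(z)) / eps and penalizes the squared
    off-diagonal entries of the Gram matrix J^T J, pushing the
    per-direction image changes to be mutually orthogonal.
    """
    base = G(z).flatten(1)                                   # (B, D) reference output
    cols = [(G(z + eps * a).flatten(1) - base) / eps for a in A]
    J = torch.stack(cols, dim=-1)                            # (B, D, ndirs)
    gram = J.transpose(1, 2) @ J                             # (B, ndirs, ndirs)
    off_diag = gram - torch.diag_embed(torch.diagonal(gram, dim1=1, dim2=2))
    return off_diag.pow(2).mean()
```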

There are several training options you can play with (see utils.py for a full list):

  • --G_path can be set to a pre-trained BigGAN checkpoint to run discovery on (if set to the default value None, we will download a 128x128 model automatically for you)

  • --A_lr controls the learning rate

  • --fix_class, if specified, restricts the class input of the generator to the given ImageNet class index. In our experiments, we used either 207 (golden retrievers) or 497 (churches), but you can also set this argument to None to sample classes randomly during training.

  • --ndirs specifies the number of directions to be learned

  • --no_ortho can be added to learn an unconstrained matrix of directions (by default, the directions are constrained to be orthonormal to prevent degenerate solutions)

  • --search_space by default is set to 'all', which searches for directions in the entirety of z-space (120-dimensional by default). You can instead set --search_space coarse to search only the first 40 z-components, --search_space mid to search the middle 40 z-components, or --search_space fine to search the final 40 z-components (the settings we used for the experiments reported in our paper). This is in a similar spirit to "style mixing" in StyleGAN, where it is often beneficial to exploit the natural disentanglement learned by modern GANs: in vanilla BigGAN, the first 40 z-components mostly control factors of variation related to object pose, while the middle 40 mainly control factors such as lighting and background (see the sketch after this list).

  • --wandb_entity can be specified to enable logging to Weights and Biases (otherwise uses TensorBoard)

  • --vis_during_training can be added to periodically log learned direction GIFs to WandB/TensorBoard

  • --batch_size can be decreased if you run out of GPU memory (in our experiments, we used 2 GPUs with a batch size of 32)
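To make the interplay between --no_ortho and --search_space concrete, here is a minimal sketch of how orthonormal directions can be confined to a slice of z-space. The shapes and the QR-based orthogonalization are assumptions for illustration (torch.linalg.qr requires PyTorch >= 1.8), not the repo's actual parameterization:

```python
import torch

dim_z, sub_dim, ndirs = 120, 40, 40     # BigGAN z-dim, coarse slice, directions

# Learned parameter: raw directions inside the 40-dim coarse slice.
A_sub = torch.randn(sub_dim, ndirs, requires_grad=True)

# Default (without --no_ortho): orthonormalize so no two directions can
# collapse onto each other; QR is one simple way to enforce this.
Q, _ = torch.linalg.qr(A_sub)           # (sub_dim, ndirs), orthonormal columns

# --search_space coarse: embed so directions touch only the first 40
# z-components; the remaining components stay fixed at zero.
A = torch.zeros(ndirs, dim_z)
A[:, :sub_dim] = Q.T                    # rows: orthonormal directions in z-space
```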

Directions from our Paper

Below are the indices of the directions reported in our paper. You can use --directions_to_vis <indices> to visualize selected directions.

Golden Retrievers (class 207):

  • Rotation: 0
  • Zoom: 7
  • Shift: 9
  • Colorization: 3
  • Lighting: 6
  • Object Lighting: 4
  • Red Color Filter: 1
  • Brightness: 5
  • White Color Filter: 13
  • Saturation: 20

Churches (class 497):

  • Rotation: 0
  • Zoom: 7
  • Smoosh: 9
  • Background Removal: 0
  • Scene Lighting: 8
  • Object Lighting: 2
  • Colorize: 21
  • Red Color Filter: 5
  • Brightness: 4
  • Green Color Filter: 34
  • Saturation: 17

Citation

If our code aided your research, please cite our paper:

@InProceedings{Wei_2021_ICCV,
    author    = {Wei, Yuxiang and Shi, Yupeng and Liu, Xiao and Ji, Zhilong and Gao, Yuan and Wu, Zhongqin and Zuo, Wangmeng},
    title     = {Orthogonal Jacobian Regularization for Unsupervised Disentanglement in Image Generation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {6721-6730}
}

Acknowledgments

This repo builds upon Hessian Penalty and Andy Brock's PyTorch BigGAN library. We thank the authors for open-sourcing their code. The original licenses can be found in Hessian LICENSE and BigGAN LICENSE.