This is a PyTorch implementation of the ICLR 2019 paper Variational Autoencoder with Arbitrary Conditioning. The idea of the paper is to extend Conditional Variational Autoencoders to arbitrary conditioning, i.e. a different subset of features is used to generate samples from the distribution `p(x_b | x_{1-b}, b)`, where `x_{1-b}` is a random subset of observed features and `b` is a binary mask indicating whether a feature is observed. The authors postulate that this is especially effective for imputation and inpainting tasks, where only a portion of the image is visible.
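For intuition, here is a minimal sketch (assuming PyTorch; the function and tensor names are mine, not the repository's) of how an image and its binary mask can be combined into a single network input:

```python
import torch

def make_conditioning_input(x, b):
    """Combine an image batch x (B, C, H, W) with a binary mask b of the same
    shape, where 1 marks an unobserved feature (an assumption for this sketch).
    Unobserved pixels are zeroed out and the mask is appended as extra channels."""
    x_observed = x * (1 - b)
    return torch.cat([x_observed, b], dim=1)

# Example: drop a random 90% of the pixels of an MNIST-sized batch.
x = torch.rand(8, 1, 28, 28)
b = (torch.rand_like(x) < 0.9).float()
inp = make_conditioning_input(x, b)  # shape: (8, 2, 28, 28)
```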
I used Python 2.7 to run and test the code. I recommend using a virtualenv environment. Here are the dependencies:
opencv-contrib-python==3.4.2.17
scipy>=0.17.0
numpy==1.14.5
torch==0.4.1
torchvision==0.2.1
scikit-image==0.14.2
You can install all of them with `pip install -r requirements.txt`.
- Download the MNIST dataset here and place it in a directory. Process it with PyTorch's dataloader to get `training.pt` and `test.pt` (see the snippet after this list).
- Download the CelebA dataset from here and place it in a directory. I used the aligned and cropped images of size 64x64. The directory structure should look like this:
  ```
  Datasets
  ├── CelebA
  │   └── img_align_celeba
  │       ├── 000003.png
  │       ├── 000048.png
  │       └── ...
  └── MNIST
      └── processed
          ├── test.pt
          └── training.pt
  ```
- For CelebA, the training and validation split is created automatically.
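For the MNIST step above, one way (not necessarily the one used here) to obtain `training.pt` and `test.pt` is to let torchvision download and process the dataset; with torchvision 0.2.1 the files end up under `<root>/processed/`:

```python
from torchvision import datasets

# Downloads MNIST and writes Datasets/MNIST/processed/training.pt and test.pt
# (the 'Datasets/MNIST' root matches the directory layout shown above).
datasets.MNIST(root='Datasets/MNIST', train=True, download=True)
datasets.MNIST(root='Datasets/MNIST', train=False, download=True)
```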
Training networks from scratch or from pretrained weights is straightforward. Simply run:

```
python main.py --mode train --config /path/to/config/file
```

For inference and visualization, run:

```
python main.py --mode val --config /path/to/config/file
```
The structure of the configuration files is inspired by the Detectron framework. More details about the configuration files can be found in CONFIG.md.
The metric used by the paper is the negative log-likelihood under a Bernoulli distribution (since the digits are binarized). The log-likelihood is not expected to be very low: many plausible images can share the same observed pixels, so a sample may be feasible without matching the ground truth. The metric is still well-behaved, though; a minimal sketch of how it is computed follows the table below.
| Dataset | Negative log-likelihood |
|---|---|
| MNIST | 0.18557 |
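For reference, a minimal sketch of the metric: the mean Bernoulli negative log-likelihood of the binarized targets under the predicted pixel probabilities (variable names are illustrative, not the repository's evaluation code):

```python
import torch.nn.functional as F

def bernoulli_nll(probs, targets, eps=1e-7):
    # probs: predicted pixel probabilities in [0, 1]; targets: binarized digits.
    # binary_cross_entropy with its default mean reduction equals the per-pixel
    # negative log-likelihood under a Bernoulli model; clamping avoids log(0).
    return F.binary_cross_entropy(probs.clamp(eps, 1 - eps), targets)
```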
The results are more informative for the image-imputation tasks, since the space of feasible images is now restricted by the observed image and mask. These are baseline tasks used to sanity-check the code and method; the paper compares PSNR scores between its method and others. Note that the numbers below are just baselines for a sanity check; a generic PSNR definition is sketched after the table.
| PSNR | Random mask (p = 0.9) | Square mask (W = 20) |
|---|---|---|
| CelebA | 22.1513 | 28.7348 |
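The evaluation code in this repo may differ, but for reference this is the generic PSNR definition used for such comparisons (for images scaled to `[0, max_val]`):

```python
import numpy as np

def psnr(prediction, target, max_val=1.0):
    # Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE).
    mse = np.mean((np.asarray(prediction, dtype=np.float64) -
                   np.asarray(target, dtype=np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```

scikit-image 0.14 also provides `skimage.measure.compare_psnr` for the same computation.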
The remaining results are for the specific CelebA inpainting tasks defined in the paper.
Here is the table of PSNR values for inpainting with different masks; higher values are better. I used a bigger architecture and ran for more iterations, since the exact architecture and training schedule are not given in the paper. A sketch of how such masks can be generated follows the table.
| Method/Masks | Center | Pattern | Random | Half |
|---|---|---|---|---|
| Context Encoder* | 21.3 | 19.2 | 20.6 | 15.5 |
| SIIDGM* | 19.4 | 17.4 | 22.8 | 13.7 |
| VAEAC (1 sample)* | 22.1 | 21.4 | 29.3 | 14.9 |
| VAEAC (10 samples)* | 23.7 | 23.3 | 29.3 | 17.4 |
| VAEAC (this repo) | 25.02 | 24.60 | 24.93 | 17.48 |
* = values taken from the paper
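To make the mask columns concrete, here is a small illustration (my own sketch, not the repository's mask code) of how a centre-square mask and a random-dropout mask can be generated; the Pattern and Half masks follow the paper and are not reproduced here:

```python
import torch

def center_square_mask(batch_size, height, width, side=20):
    # 1 marks an unobserved pixel inside a (side x side) square at the centre.
    mask = torch.zeros(batch_size, 1, height, width)
    top, left = (height - side) // 2, (width - side) // 2
    mask[:, :, top:top + side, left:left + side] = 1
    return mask

def random_pixel_mask(batch_size, height, width, drop_prob=0.9):
    # Each pixel is marked unobserved independently with probability drop_prob.
    return (torch.rand(batch_size, 1, height, width) < drop_prob).float()
```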
Here is the table of PSNR values for inpainting with the O1-O6 masks; higher values are better. Again, the PSNR values are better than those reported in the paper, likely because I use a bigger architecture and/or train for longer (the exact architecture and training times are not specified in the paper).
| Method/Masks | O1 | O2 | O3 | O4 | O5 | O6 |
|---|---|---|---|---|---|---|
| Context Encoder* | 18.6 | 18.4 | 17.9 | 19.0 | 19.1 | 19.3 |
| GFC* | 20.0 | 19.8 | 18.8 | 19.7 | 19.5 | 20.2 |
| VAEAC (1 sample)* | 20.8 | 21.0 | 19.5 | 20.3 | 20.3 | 21.0 |
| VAEAC (10 samples)* | 22.0 | 22.2 | 20.8 | 21.7 | 21.8 | 22.2 |
| VAEAC (this repo) | 27.59 | 33.20 | 30.32 | 31.38 | 32.28 | 28.20 |
* = values taken from the paper
Here are some qualitative results on MNIST. The first image is the input to VAEAC, the following images are samples drawn from the network, and the last image is the ground truth. Note that, given the partially observed input, the network learns to produce feasible completions. The conditioning is arbitrary because each instance has a different subset of observed pixels.
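At inference time this amounts to drawing several samples for the same observed pixels and mask. The sketch below assumes a hypothetical `model` callable and reuses `make_conditioning_input` from the sketch near the top of this README; neither is the repository's actual API.

```python
import torch

def draw_imputations(model, x, b, num_samples=10):
    # model: hypothetical trained VAEAC-style network mapping the conditioning
    # input to a completed image; x: image batch; b: mask (1 = unobserved).
    with torch.no_grad():
        samples = [model(make_conditioning_input(x, b)) for _ in range(num_samples)]
    # Keep the observed pixels and only fill in the unobserved ones.
    return [x * (1 - b) + s * b for s in samples]
```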
Here are some results on the CelebA dataset with only 10% of the pixels retained in the input.
Check out RESULTS.md for more results.
I have tried to follow the PEP8 style guide for this project (PyLint score: 9.41/10; check out the config here). Inspiration has been taken from Detectron, this repo, and other open-source repositories. If you want to contribute, please submit a PR.
17 Mar 2019: Set up the directory structure and general flow of the paper after reading and understanding the paper at the implementation level. Added dataloaders for MNIST and CelebA.
19 Mar 2019: Compiled results for MNIST. Qualitative results look good, although not as diverse as those in the paper. Quantitative results are in the table. Also added results for the CelebA dataset with box masks and random pixel dropping.
20 Mar 2019: Starting actual experiments performed in the paper now that sanity check is done.
21 Mar 2019: Ran experiments on three tasks for CelebA - Center, Random, and Half. Both qualitative and quantitative results are on par with the paper.
22 Mar 2019: Ran experiments on all 4 tasks and added qualitative and quantitative results. Also wrote the code for the second set of experiments with O1-O6 masks. Quantitative results look good.
- Upload all pretrained models to Drive.
If you liked this repository and would like to use it in your work, consider citing the original paper:
```
@inproceedings{
ivanov2018variational,
title={Variational Autoencoder with Arbitrary Conditioning},
author={Oleg Ivanov and Michael Figurnov and Dmitry Vetrov},
booktitle={International Conference on Learning Representations},
year={2019},
url={https://openreview.net/forum?id=SyxtJh0qYm},
}
```