
symvae-aistats2024

Implementations of experiments from the paper:

Symmetric Equilibrium Learning of VAEs. Boris Flach, Dmitrij Schlesinger, Alexander Shekhovtsov, Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:3214-3222, 2024.
https://proceedings.mlr.press/v238/flach24a.html

The repository contains three directories: mnist-ladder, mnist-reverse and fmnist-ladder. Here, ...-ladder denotes the direct encoder factorization order as in ladder VAEs, ...-reverse the reverse encoder factorization order as in the Wake-Sleep algorithm, and mnist/fmnist the dataset used.

The code is very similar in all three cases, so we briefly describe the usage here. For a description of the learning methods we refer to the paper; implementation details can be found in the code.

1) Learning the models

In order to learn models, go to the desired directory and run, e.g.,
python main_sym.py --call_prefix 0 --nz0 30 --nz1 100 --stepsize 1e-4 --niterations 200000
for symmetric learning, or, e.g.,
python main_elbo.py --call_prefix 1 --nz0 30 --nz1 100 --stepsize 1e-4 --niterations 200000
for learning by ELBO maximization. In these examples we learn an HVAE with 30 bits for $z_0$ and 100 bits for $z_1$, a learning rate of 1e-4, and 200k gradient update steps (see the remarks below). The option --call_prefix is a tag (an arbitrary string) that identifies the learned model as well as the additional results produced; it can be used, for example, for generation or for continuing the learning from a checkpoint.
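For orientation, the following is a minimal conceptual sketch of one symmetric gradient step. All interfaces are hypothetical stand-ins (encoder(x) and decoder(z) are assumed to return torch.distributions objects, prior is a fixed latent distribution, opt an optimizer over both models' parameters); this is not the actual code from main_sym.py, for which we refer to the paper and the sources.

import torch

def symmetric_step(encoder, decoder, prior, x_batch, opt):
    # One symmetric update: the decoder is fitted on latents inferred from
    # real data, the encoder on "dreamed" data drawn from the generative
    # model, so each conditional model is learned from samples of the
    # other one's joint distribution.
    opt.zero_grad()
    z_q = encoder(x_batch).sample()                    # z ~ q(z|x), x ~ data
    loss_dec = -decoder(z_q).log_prob(x_batch).mean()  # fit p(x|z)
    z_p = prior.sample((x_batch.shape[0],))            # z ~ p(z)
    x_p = decoder(z_p).sample()                        # x ~ p(x|z)
    loss_enc = -encoder(x_p).log_prob(z_p).mean()      # fit q(z|x)
    (loss_dec + loss_enc).backward()
    opt.step()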

The results of a learning run are stored in three sub-directories (created if necessary): images, logs and models. The names are self-explanatory. The course of the learning process (losses, illustrative images, etc.) can be monitored using the corresponding notebooks plot_sym.ipynb and plot_elbo.ipynb.

2) Generating images

After a model has been trained, images can be generated by invoking
python generate.py --call_prefix 0 --nz0 30 --nz1 100
The resulting images are stored in ./images/generated_sh_0.png and ./images/generated_sta_0.png (for the model tagged 0, i.e. symmetric learning in this example), containing images generated from random codes and from the limiting distribution, respectively.
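The difference between the two outputs can be illustrated by the following sketch (same hypothetical interfaces as above): a random-code image decodes a single prior sample, while a limiting-distribution image is obtained by iterating the x -> z -> x Markov chain defined by the decoder and the encoder until it approximately reaches its stationary distribution.

import torch

@torch.no_grad()
def sample_random_code(prior, decoder, n):
    z = prior.sample((n,))         # z ~ p(z)
    return decoder(z).sample()     # x ~ p(x|z)

@torch.no_grad()
def sample_limiting(prior, decoder, encoder, n, steps=100):
    x = sample_random_code(prior, decoder, n)
    for _ in range(steps):         # alternate x -> z -> x
        z = encoder(x).sample()    # z ~ q(z|x)
        x = decoder(z).sample()    # x ~ p(x|z)
    return x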

3) Computing FID scores

In order to compute FID scores, we first need to generate a large number of images and store them as image files. This is done by
python generate_data.py --call_prefix 0 --mode sym --nz0 30 --nz1 100
for symmetric learning, or
python generate_data.py --call_prefix 1 --mode elbo --nz0 30 --nz1 100
for ELBO maximization. This creates a subdirectory called generated_data (if necessary) and fills it with generated images; the option --mode defines the name of the subdirectory of generated_data in which the corresponding images are stored.
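Conceptually, generate_data.py just needs to write every generated sample to its own image file so that the FID tool can read the directories. A minimal sketch of that step (file names and tensor shapes are illustrative, not the script's actual ones):

import os
import torch
from torchvision.utils import save_image

def dump_images(images, out_dir):
    # images: a tensor of shape (N, 1, 28, 28) with values in [0, 1]
    os.makedirs(out_dir, exist_ok=True)
    for i, img in enumerate(images):
        save_image(img, os.path.join(out_dir, f"{i:06d}.png"))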

After the necessary image sets have been generated, FID scores can be computed by, e.g.,
python -m pytorch_fid --device cuda:0 generated_data/elbo/images/ generated_data/elbo/lim_images/
(we used the pytorch-fid package for this, so please install it first, e.g. via pip install pytorch-fid). In the above example we compute the FID between the original dataset and the images generated from the limiting distribution for the model learned by ELBO maximization.
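The same package can also be called from Python if that is more convenient; a sketch, assuming a recent pytorch-fid version (please check the exact signature of your installed version):

from pytorch_fid.fid_score import calculate_fid_given_paths

fid = calculate_fid_given_paths(
    ["generated_data/elbo/images/", "generated_data/elbo/lim_images/"],
    batch_size=50, device="cuda:0", dims=2048)
print(f"FID: {fid:.4f}")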


Finally, after performing all of the steps described above for the ladder VAE on MNIST, we obtain the following FID scores:

            Random code   Limiting
ELBO        3.2227        36.6440
Symmetric   1.6543        4.1015

For MNIST with the reverse encoder factorization:

            Random code   Limiting
ELBO        5.2889        14.2256
Symmetric   1.4746        4.9559

For Fashion MNIST (ladder VAE):

            Random code   Limiting
ELBO        37.8115       119.1699
Symmetric   43.6204       42.3663

Additional remarks

  • For historical reasons we do not split learning into epochs. We simply load the corresponding dataset using standard PyTorch functionality (datasets are downloaded if necessary) and then sample batches randomly; see mnist.py/fmnist.py for details, and the sketch after this list. Hence, one iteration in our notation is a single gradient update step on a randomly sampled data mini-batch.
  • This code is designed to work with a single GPU. If you have a multi-GPU environment, prepend all calls with something like export CUDA_VISIBLE_DEVICES=<num>; ..., where <num> is the index of the GPU to be used, in order to avoid unnecessary GPU memory allocation.
  • There are some differences between the example experiments above and the results given in the paper. The latter were obtained with 1M iterations for MNIST and 280k iterations for FashionMNIST, which takes quite a long time; useful results usually appear much more quickly, e.g. 200k iterations are enough to illustrate the usage. Besides, for FashionMNIST we used 50 bits for $z_0$ and 200 bits for $z_1$ in the paper, whereas here we use 30 bits for $z_0$ and 100 bits for $z_1$, as for MNIST. As a consequence, the numbers above do not exactly match those from the paper; qualitatively, however, the behaviour of the methods is very similar.
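For illustration, the epoch-free sampling from the first remark amounts to something like the following simplified sketch (the actual loading code lives in mnist.py/fmnist.py):

import torch
from torchvision import datasets, transforms

# Load MNIST once (downloaded if necessary) and keep it as a single tensor.
mnist = datasets.MNIST("./data", train=True, download=True,
                       transform=transforms.ToTensor())
x_all = torch.stack([img for img, _ in mnist])   # shape (60000, 1, 28, 28)

def sample_batch(batch_size=64):
    # one iteration = one gradient step on one such random mini-batch
    idx = torch.randint(0, x_all.shape[0], (batch_size,))
    return x_all[idx]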
