Implementations of experiments from the paper:
Symmetric Equilibrium Learning of VAEs. Boris Flach, Dmitrij Schlesinger, Alexander Shekhovtsov, Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:3214-3222, 2024.
https://proceedings.mlr.press/v238/flach24a.html
The repository contains three directories: `mnist-ladder`, `mnist-reverse` and `fmnist-ladder`. Here, `...-ladder` means the direct encoder factorization order as in ladder VAEs, whereas `...-reverse` corresponds to the reverse encoder factorization order as in the wake-sleep algorithm; `mnist`/`fmnist` denote the dataset used.
The code is very similar in all three cases, so we describe the usage once here. For a description of the learning methods we refer to the paper; implementation details can be found in the code.
To learn a model, go to the desired directory and run e.g.
python main_sym.py --call_prefix 0 --nz0 30 --nz1 100 --stepsize 1e-4 --niterations 200000
for symmetric learning and e.g.
python main_elbo.py --call_prefix 1 --nz0 30 --nz1 100 --stepsize 1e-4 --niterations 200000
for learning by ELBO maximization. In these examples we learn an HVAE with 30 bits for $z_0$ and 100 bits for $z_1$. `--call_prefix` is a tag (an arbitrary string) that identifies the learned model as well as some additionally produced results. This tag can be used, for example, for generation, for continuing the learning from a checkpoint, etc.
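For orientation, here is a loose Python sketch contrasting the two update schemes. All names (`encoder`, `decoder`, the method signatures) are placeholders of ours, not the repo's API, and the symmetric update reflects our reading of the paper; the actual objectives are those in the paper and in `main_sym.py`/`main_elbo.py`.

```python
import torch

def elbo_step(decoder, encoder, x, opt):
    # ELBO maximization: a reparameterized gradient step on
    # E_q[ log p(x, z) - log q(z | x) ] (placeholder method names).
    z, log_q = encoder.rsample_with_log_prob(x)
    loss = -(decoder.log_prob(x, z) - log_q).mean()
    opt.zero_grad(); loss.backward(); opt.step()

def symmetric_step(decoder, encoder, x, opt):
    # Symmetric learning (our loose reading): each model is trained on
    # samples drawn with the other, akin to a symmetrized wake-sleep.
    with torch.no_grad():
        z_q = encoder.sample(x)                # z ~ q(z | x), x from data
        x_p, z_p = decoder.sample(x.shape[0])  # (x, z) ~ p(x, z)
    loss = -(decoder.log_prob(x, z_q).mean()        # decoder fits encoder samples
             + encoder.log_prob(x_p, z_p).mean())   # encoder fits decoder samples
    opt.zero_grad(); loss.backward(); opt.step()
```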
The results of a call as above are stored in three sub-directories (created if necessary): `images`, `logs` and `models`; the names are self-explanatory. The course of the learning process (for example, losses, illustrating images, etc.) can be watched using the corresponding notebooks `plot_sym.ipynb` and `plot_elbo.ipynb`.
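If you prefer a plain script to the notebooks, a minimal sketch along these lines should work, assuming the logs are whitespace-separated numeric columns with the iteration first; the file name and column layout below are hypothetical, so check what the training scripts actually write:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical log location and layout; adapt to the actual files in logs/.
log = np.loadtxt("logs/log_sym_0.txt")
plt.plot(log[:, 0], log[:, 1])  # iteration vs. loss
plt.xlabel("iteration")
plt.ylabel("loss")
plt.show()
```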
After a model is trained, images can be generated by invoking
python generate.py --call_prefix 0 --nz0 30 --nz1 100
The resulting images are stored in `./images/generated_sh_0.png` and `./images/generated_sta_0.png` (for the model tagged `0`, i.e. symmetric learning in this example); they contain images generated from random codes and from the limiting distribution, respectively.
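Conceptually, the two kinds of samples can be sketched as follows; the class and method names are placeholders of ours (the actual code is in `generate.py`), and we assume an encoder and decoder that expose sampling:

```python
import torch

@torch.no_grad()
def from_random_codes(decoder, n):
    # "Random code" samples: draw latents from the prior and decode once.
    z = decoder.sample_prior(n)   # z ~ p(z)
    return decoder.sample_x(z)    # x ~ p(x | z)

@torch.no_grad()
def from_limiting_distribution(decoder, encoder, n, steps=100):
    # "Limiting distribution" samples: alternate encoder and decoder
    # sampling and keep the state after a long run of the chain.
    x = from_random_codes(decoder, n)
    for _ in range(steps):
        z = encoder.sample(x)     # z ~ q(z | x)
        x = decoder.sample_x(z)   # x ~ p(x | z)
    return x
```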
To compute FID scores, we first need to generate a large number of images and store them as image files. This can be done with the following command:
python generate_data.py --call_prefix 0 --mode sym --nz0 30 --nz1 100
for symmetric learning or
python generate_data.py --call_prefix 1 --mode elbo --nz0 30 --nz1 100
for ELBO maximization. This creates a further subdirectory called `generated_data` (if necessary) and fills it with generated images; the option `--mode` defines the name of the subdirectory of `generated_data` in which the corresponding images are stored (e.g. `generated_data/elbo/`).
After the necessary image sets are generated, FID scores can be computed by e.g.
python -m pytorch_fid --device cuda:0 generated_data/elbo/images/ generated_data/elbo/lim_images/
(we used the FID implementation from the `pytorch-fid` package, so please install it first, e.g. via `pip install pytorch-fid`). In the above example we compute the FID between the original dataset and the images generated from the limiting distribution for the model learned by ELBO maximization.
Finally, after performing all the steps described above for the ladder VAE on MNIST, we get the following FID scores:
|           | Random code | Limiting |
|-----------|-------------|----------|
| ELBO      | 3.2227      | 36.6440  |
| Symmetric | 1.6543      | 4.1015   |
For MNIST with the reverse encoder factorization:
|           | Random code | Limiting |
|-----------|-------------|----------|
| ELBO      | 5.2889      | 14.2256  |
| Symmetric | 1.4746      | 4.9559   |
For Fashion MNIST (ladder VAE):
|           | Random code | Limiting |
|-----------|-------------|----------|
| ELBO      | 37.8115     | 119.1699 |
| Symmetric | 43.6204     | 42.3663  |
- For historical reasons we do not split learning into epochs. We simply load the corresponding dataset using standard PyTorch functionality (datasets are downloaded if necessary) and then sample batches randomly; see `mnist.py`/`fmnist.py` for details. So one iteration in our notation is a single gradient update step for a randomly chosen data mini-batch (see the sketch after this list).
- This code is designed to work with a single GPU. If you have a multi-GPU environment, prepend all calls with something like `export CUDA_VISIBLE_DEVICES=<num>; ...`, where `<num>` is the index of the GPU to be used, in order to avoid unnecessary GPU memory allocation.
- There are some differences between the example experiments above and the results given in the paper. The latter were obtained with 1M iterations for MNIST and 280k iterations for FashionMNIST, which takes quite a long time; useful results are usually obtained much faster, and e.g. 200k iterations are enough to illustrate the usage. Besides, for FashionMNIST the paper uses 50 bits for $z_0$ and 200 bits for $z_1$, whereas here we use 30 bits for $z_0$ and 100 bits for $z_1$, as for MNIST. As a consequence, the numbers above do not exactly match those from the paper; qualitatively, however, the behaviour of the methods is quite similar.
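To make the iteration convention concrete, here is a minimal self-contained sketch of random mini-batch sampling in this style; the loader details are ours and not copied from `mnist.py`, and the model/optimizer lines are placeholders:

```python
import torch
from torchvision import datasets

# Load MNIST once (downloaded if necessary) and keep it as a single tensor.
data = datasets.MNIST("./data", train=True, download=True)
x_all = data.data.float().div(255.0)  # [60000, 28, 28], values in [0, 1]

batch_size, niterations = 128, 200_000
for it in range(niterations):
    # One "iteration" = one gradient step on a randomly chosen mini-batch;
    # there is no notion of an epoch.
    idx = torch.randint(0, x_all.shape[0], (batch_size,))
    batch = x_all[idx]
    # loss = model.loss(batch)          # placeholder: model not defined here
    # optimizer.zero_grad(); loss.backward(); optimizer.step()
```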