Renyi divergence variational inference applied to variational auto-encoders

Update 2: 14 Sept 2016

There are two ways to implement IWAE/VAE with other alpha settings (except alpha = 1.0, which gives you the vanilla VAE). One is to compute the energy as a scalar and let tensorflow work out the gradients for you; the previous naive implementation did this. The other follows section 4.2 of the paper: compute the gradients of the unnormalized log importance weights individually, and form the final gradient as their weighted average. My internal-use numpy code used this strategy.
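As a rough sketch of the second strategy (plain numpy with toy shapes; `vr_gradient` and `softmax` are my names, not the repo's): the gradient is a weighted average of the per-sample gradients of the log importance weights, with weights proportional to w_k^(1-alpha):

```python
import numpy as np

def softmax(x):
    x = x - np.max(x)                # shift by the max for numerical stability
    e = np.exp(x)
    return e / np.sum(e)

def vr_gradient(log_w, grad_log_w, alpha):
    """Weighted-average gradient in the style of section 4.2.

    log_w:      (K,) unnormalized log importance weights
    grad_log_w: (K, D) per-sample gradients of log_w w.r.t. parameters
    """
    w = softmax((1.0 - alpha) * log_w)   # weights proportional to w_k^(1-alpha)
    return w @ grad_log_w                # weighted average over the K samples

log_w = np.array([0.1, -0.5, 0.3])
grads = np.arange(12.0).reshape(3, 4)    # toy per-sample gradients
g_vae = vr_gradient(log_w, grads, 1.0)   # uniform weights: plain average (VAE)
g_iwae = vr_gradient(log_w, grads, 0.0)  # normalized importance weights (IWAE)
```

Note how both special cases fall out of the same formula: alpha = 1.0 makes the weights uniform, alpha = 0.0 makes them the normalized importance weights.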

So as another quick update I also provide the second strategy implemented in tensorflow -- just a few lines of changes. It has almost the same flavor as the theano version, except that the theano code treats VAE and IWAE as different cases; in contrast, this tensorflow code handles both cases in the same way, as justified by the paper.

If you want to compare the two solutions, use --loss vae for the first and --loss iwae for the second. You can specify --alpha in both cases, and --loss iwae also supports --alpha 1.0 (VAE). Some remarks: first, the runtime of the two is roughly the same (I have only tested them on my laptop); second, the produced results might differ by a few nats, due to numerical issues with logsumexp.
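The few-nats discrepancy traces back to how logsumexp is stabilized: both code paths must shift by the max before exponentiating, and doing the shift in slightly different floating-point order perturbs the result. A minimal 1-D sketch (my own helper, not the repo's code):

```python
import numpy as np

def logsumexp(x):
    # shift by the max so exp() cannot overflow;
    # small float differences in this shift can move the bound by a few nats
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

# a naive np.log(np.sum(np.exp(log_w))) overflows to inf on these values
log_w = np.array([1000.0, 1000.5])
stable = logsumexp(log_w)   # finite, close to 1000.5 + log(1 + exp(-0.5))
```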

Unfortunately, implementing VR-max in a similar style still does not give you a runtime advantage, so I am keeping the dirty tryout for now. Will come back to this -- again, stay tuned!


Update 1: 09 Sept 2016

Recently I found that the previous naive implementation does not give you time savings with the max trick when compared to my internal-use numpy version. This is probably because tensorflow/theano does not automatically recognize that it need not compute gradients for the samples I dropped.

So as a temporary update I provide a dirty solution that collects the max-weight samples and repeats the VAE procedure on them. Yes, I know it's far from optimized, but it already gives you a 2x speed-up on CPUs (and maybe 1.5x to 1.7x on GPUs, depending on your settings).
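The selection step can be sketched as follows (a hypothetical numpy illustration, not the repo's code; `vr_max_select` is my name): keep only the sample with the largest log weight for each datapoint, then rerun the VAE-style update on those samples alone, so gradients are computed for one sample instead of K.

```python
import numpy as np

def vr_max_select(log_w):
    """Pick, per datapoint, the sample with the largest log weight.

    log_w: (K, N) log importance weights, K samples per datapoint.
    Returns (N,) indices of the max-weight sample for each column.
    """
    return np.argmax(log_w, axis=0)

log_w = np.array([[0.1, 2.0],
                  [1.5, -0.3]])
idx = vr_max_select(log_w)                    # best sample per datapoint
best = log_w[idx, np.arange(log_w.shape[1])]  # their log weights
```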

Will come back to this issue -- stay tuned!


I provide a tensorflow implementation of VAE training with Renyi divergence. Math details can be found here:

Yingzhen Li and Richard E. Turner. Renyi divergence variational inference. NIPS 2016.

I only included some small datasets for testing. To add more datasets, download them elsewhere and then add them to the data/ directory.

For example, you can download:

MNIST dataset, and include all the data files in directory data/MNIST/

OMNIGLOT dataset, and include all the data files in directory data/OMNIGLOT/

Frey Face dataset, and include all the data files in directory data/freyface/

To have a quick test, run

python --data [dataset name] --alpha [alpha value] -k [num of samples] --dimZ [dimension of the latents]

See the file for more options. In particular, alpha = 1.0 returns the vanilla VAE, and alpha = 0.0 gives IWAE.
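For reference, the quantity these settings estimate can be sketched in numpy (a hedged sketch under the paper's definition, not the repo's code): the Monte Carlo VR bound is 1/(1-alpha) * log mean_k w_k^(1-alpha), which recovers the ELBO at alpha = 1.0 and the IWAE bound at alpha = 0.0:

```python
import numpy as np

def vr_bound(log_w, alpha):
    """Monte Carlo estimate of the VR bound from K log importance weights."""
    if alpha == 1.0:                 # vanilla VAE: the standard ELBO
        return np.mean(log_w)
    x = (1.0 - alpha) * log_w
    m = np.max(x)                    # logsumexp shift for stability
    return (m + np.log(np.mean(np.exp(x - m)))) / (1.0 - alpha)

log_w = np.array([0.0, 1.0, 2.0])
elbo = vr_bound(log_w, 1.0)          # mean of the log weights
iwae = vr_bound(log_w, 0.0)          # log-mean-exp of the log weights
```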

If you want to see the max trick, add in one more option --backward_pass max.

