Hi @AntixK

Many thanks for this great effort.

Based on my understanding, the original VAE paper does not discuss weighting the KL-divergence loss. Later, beta-VAE and many other papers made the case for weighting the KL divergence (essentially treating it as a hyperparameter).
In your implementations, I see that you consistently use kld_weight = kwargs['M_N'] = batch_size / num_of_images.
Is it the norm to select the weight for the KL-divergence loss as the ratio of batch size to the number of training images? Since no weighting was done in the original VAE paper, is it okay to use it in vanilla_vae.py?
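For concreteness, here is a minimal sketch of the weighting convention I mean (the variable names and numbers are illustrative, not taken from the repo):

```python
batch_size = 144          # M: minibatch size (illustrative value)
num_train_imgs = 162770   # N: total number of training images (illustrative value)

kld_weight = batch_size / num_train_imgs  # M/N, passed to the loss as M_N
```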
Regards
Kapil
It is just a bias-correction term to account for the minibatch. When small batch sizes are used, the KLD value can have a large variance. But it should work without the kld_weight term too.
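For reference, a minimal sketch of how such a weight typically enters the loss, assuming a mean-squared-error reconstruction term and a standard-normal prior (the function name is illustrative):

```python
import torch
import torch.nn.functional as F

def vae_loss(recons, target, mu, log_var, kld_weight):
    # Reconstruction term, averaged over the batch.
    recons_loss = F.mse_loss(recons, target)

    # KL divergence between q(z|x) = N(mu, diag(exp(log_var)))
    # and the standard-normal prior N(0, I), averaged over the batch.
    kld_loss = torch.mean(
        -0.5 * torch.sum(1 + log_var - mu ** 2 - log_var.exp(), dim=1),
        dim=0,
    )

    # kld_weight (here M/N) rescales the KL term for the minibatch.
    return recons_loss + kld_weight * kld_loss
```

With kld_weight = 1.0 this reduces to the plain unweighted sum of the two terms.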