
KLD Weight #56

Open

abyildirim opened this issue Apr 16, 2022 · 5 comments

Comments

@abyildirim

Hi,

In the VAE paper (https://arxiv.org/pdf/1312.6114.pdf), the VAE loss function has no additional weight parameter for the KLD loss:

$$\mathcal{L}(\theta, \phi; x^{(i)}) = -D_{KL}\big(q_\phi(z|x^{(i)}) \,\|\, p_\theta(z)\big) + \mathbb{E}_{q_\phi(z|x^{(i)})}\big[\log p_\theta(x^{(i)}|z)\big]$$

However, in the implementation of the Vanilla VAE model, the loss function is written as below:

```python
loss = recons_loss + kld_weight * kld_loss
```

When I set `kld_weight` to 1 in my model, it could not learn to reconstruct the images. If I understand correctly, `kld_weight` reduces the effect of the KLD loss to balance it against the reconstruction loss. However, as I mentioned, it is not defined in the VAE paper. Could anyone please explain why this parameter is used and why it is set to 0.00025 by default?
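For reference, here is roughly how the loss in question is computed (a paraphrased sketch; the function signature and default value are illustrative, not the repo's exact code):

```python
import torch
import torch.nn.functional as F

def vae_loss(recons, target, mu, log_var, kld_weight=0.00025):
    # Reconstruction term: MSE, averaged over every element of the batch.
    recons_loss = F.mse_loss(recons, target)
    # KLD term: closed-form KL(q(z|x) || N(0, I)), summed over latent
    # dimensions and averaged over the batch.
    kld_loss = torch.mean(
        -0.5 * torch.sum(1 + log_var - mu ** 2 - log_var.exp(), dim=1), dim=0
    )
    # kld_weight scales the KLD term relative to the reconstruction term;
    # its role is exactly what this issue asks about.
    return recons_loss + kld_weight * kld_loss
```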

@wonjunior

It is defined in Equation 8 of the paper.

@abyildirim
Author

In Equation 8, I see that the MSE loss is also scaled by N/M. However, only the KLD loss is scaled in the code. Shouldn't we scale both of them according to the equation, @wonjunior?

Equation 8 of the paper, for reference:

$$\mathcal{L}(\theta, \phi; X) \simeq \tilde{\mathcal{L}}^{M}(\theta, \phi; X^{M}) = \frac{N}{M} \sum_{i=1}^{M} \tilde{\mathcal{L}}(\theta, \phi; x^{(i)})$$
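Writing the estimator out term by term (my expansion, not a verbatim quote of the paper), the $\frac{N}{M}$ factor multiplies the whole sum:

$$\tilde{\mathcal{L}}^{M}(\theta, \phi; X^{M}) = \frac{N}{M} \sum_{i=1}^{M} \Big( -D_{KL}\big(q_\phi(z|x^{(i)}) \,\|\, p_\theta(z)\big) + \mathbb{E}_{q_\phi(z|x^{(i)})}\big[\log p_\theta(x^{(i)}|z)\big] \Big)$$

so both the reconstruction term and the KLD term receive the same $N/M$ scale.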

@dorazhang93

Hi,
I was also confused by the kld_weight here, but I think I found the proper interpretation in the Beta-VAE paper:
$$\beta_{\text{norm}} = \frac{\beta M}{N}$$
Given that the reconstruction loss is averaged over pixels and the KLD loss is averaged over latent dimensions, M here is the dimensionality of z and N is the dimensionality of the input (for images, W*H). In this implementation, the KLD loss is summed over all latent dimensions, so kld_weight is effectively 1/N = 1/4096 ≈ 0.00025.
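Concretely (my arithmetic, assuming the 64×64 inputs used in this repo's configs):

```python
# For 64x64 inputs, N = W * H per the interpretation above.
H, W = 64, 64
N = H * W               # 4096
kld_weight = 1 / N      # 0.000244... ~ 0.00025
print(kld_weight)
```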

@angelusualle

angelusualle commented Sep 7, 2023

> N is the dimensionality of input (for images, W*H)

Couldn't it be W * H * channels? Another part of that doc says:

> over the individual pixels $x_n$

@bkkm78

bkkm78 commented Mar 30, 2024

This weight is needed when you use L2 loss as the reconstruction loss. L2 loss (aka MSE) means that you're assuming a Gaussian $p_{\theta}(x|z)$, for which you need to specify a $\sigma$ for the Gaussian distribution as a hyperparameter. This is where the relative weight between the reconstruction loss and the KL divergence comes from. If you instead assume a Bernoulli distribution and thus apply a (per pixel per channel) binary cross-entropy loss, this relative weight is not necessary.

You can refer to section 2.4.3 of Carl Doersch's tutorial on VAE for more details.
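As a sketch of that point (the function names are mine, not from the repo): with a Gaussian decoder of fixed $\sigma$, the negative log-likelihood is a scaled MSE, so $\sigma$ implicitly sets the weight between reconstruction and KLD; a Bernoulli decoder's BCE has no such free scale.

```python
import torch.nn.functional as F

def gaussian_recon_nll(x_hat, x, sigma=1.0):
    # -log N(x; x_hat, sigma^2 I) = ||x - x_hat||^2 / (2 * sigma^2) + const.
    # A larger sigma down-weights reconstruction relative to the KLD,
    # which is the same dial that kld_weight turns from the other side.
    return F.mse_loss(x_hat, x, reduction="sum") / (2 * sigma ** 2)

def bernoulli_recon_nll(x_hat_logits, x):
    # Per-pixel Bernoulli NLL (binary cross-entropy): no free scale
    # parameter, hence no extra weight against the KLD is needed.
    return F.binary_cross_entropy_with_logits(x_hat_logits, x, reduction="sum")
```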
