# KL Divergence between two Gaussian Distributions
In variational autoencoders, the loss contains a KL divergence term between two distributions over the latent variable. This notebook walks through the calculation, using the commonly used Gaussian distribution as an example.

## Problem Formulation
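Assume we have two univariate Gaussian distributions $P$ and $Q$. Their densities are $P(x)$ and $Q(x)$, and their standard deviations are $\sigma_{1},\sigma_{2}>0$:
* $P=\mathcal{N}(\mu_{1},{\sigma_{1}}^{2})$
* $Q=\mathcal{N}(\mu_{2},{\sigma_{2}}^{2})$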
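
In code, each distribution can be represented with Python's standard-library `statistics.NormalDist`. It is parameterized by the mean and the standard deviation (not the variance), which matches $\sigma$ in the derivation. The parameter values and the helper name `gaussian_pdf` below are illustrative choices, not part of the original derivation. The sketch checks that the explicit density formula agrees with `NormalDist.pdf`.

```python
import math
from statistics import NormalDist

# Illustrative parameters (arbitrary choices); sigma is a standard deviation.
mu1, sigma1 = 0.0, 1.0
mu2, sigma2 = 1.0, 2.0

P = NormalDist(mu=mu1, sigma=sigma1)
Q = NormalDist(mu=mu2, sigma=sigma2)

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2), written out explicitly (hypothetical helper)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# The explicit formula agrees with NormalDist.pdf at a few sample points.
for x in (-1.0, 0.0, 2.5):
    assert math.isclose(P.pdf(x), gaussian_pdf(x, mu1, sigma1))
    assert math.isclose(Q.pdf(x), gaussian_pdf(x, mu2, sigma2))
```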

We want to calculate the KL divergence between $P$ and $Q$, defined as
$$D_{KL}(P,Q)=\int_{-\infty}^{+\infty}P(x)\log\frac{P(x)}{Q(x)} \mathrm{d}x$$

Throughout, $\log$ denotes the natural logarithm (base $e$).
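
As a reference point before any algebra, the definition can be evaluated numerically. The sketch below applies composite Simpson's rule to $P(x)\log\frac{P(x)}{Q(x)}$. It assumes the infinite range can be truncated to $\mu_1 \pm 10\sigma_1$, where $P$ has negligible mass outside, and that $Q$'s density does not underflow inside that window. The name `kl_numeric`, the grid size, and the parameters are illustrative.

```python
import math
from statistics import NormalDist

def kl_numeric(P, Q, n=20_000, width=10.0):
    """Approximate D_KL(P, Q) = ∫ P(x) log(P(x)/Q(x)) dx with composite Simpson's rule.

    The range is truncated to P.mean ± width * P.stdev; n must be even.
    Assumes Q.pdf does not underflow to 0 inside the window.
    """
    a = P.mean - width * P.stdev
    b = P.mean + width * P.stdev
    h = (b - a) / n

    def integrand(x):
        p = P.pdf(x)
        return p * math.log(p / Q.pdf(x))  # natural log, as in the text

    # Simpson weights: 1, 4, 2, 4, ..., 2, 4, 1
    total = integrand(a) + integrand(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(a + i * h)
    return total * h / 3

P = NormalDist(mu=0.0, sigma=1.0)  # illustrative parameters
Q = NormalDist(mu=1.0, sigma=2.0)
print(kl_numeric(P, Q))
```

Brute-force quadrature like this is only a cross-check for the closed form. It is far slower than evaluating a formula.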

Since $\int_{-\infty}^{+\infty}P(x)\log\frac{P(x)}{Q(x)} \mathrm{d}x =\int_{-\infty}^{+\infty}P(x)\log P(x) \mathrm{d}x - \int_{-\infty}^{+\infty}P(x)\log Q(x) \mathrm{d}x$, and the first term is just the second with $Q$ replaced by $P$ (i.e. $\mu_2=\mu_1$, $\sigma_2=\sigma_1$), it suffices to compute the cross term $\int_{-\infty}^{+\infty}P(x)\log Q(x) \mathrm{d}x$ for general parameters.
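
Both integrals are expectations under $P$. The decomposition can therefore also be checked by sampling: average $\log P(X)$ and $\log Q(X)$ over draws $X\sim P$, then take the difference. This is a Monte Carlo sketch using `NormalDist.samples`. The helper name `expected_log_density`, the sample size, the seed, and the parameters are illustrative assumptions.

```python
import math
from statistics import NormalDist, fmean

def expected_log_density(P, R, n=100_000, seed=0):
    """Monte Carlo estimate of ∫ P(x) log R(x) dx = E_{X~P}[log R(X)]."""
    samples = P.samples(n, seed=seed)  # same seed -> same draws on every call
    return fmean(math.log(R.pdf(x)) for x in samples)

P = NormalDist(mu=0.0, sigma=1.0)  # illustrative parameters
Q = NormalDist(mu=1.0, sigma=2.0)

log_p_term = expected_log_density(P, P)  # ∫ P log P: the cross term with Q replaced by P
log_q_term = expected_log_density(P, Q)  # ∫ P log Q: the cross term
kl_estimate = log_p_term - log_q_term    # D_KL(P, Q) = ∫ P log P - ∫ P log Q
print(log_p_term, log_q_term, kl_estimate)
```

Reusing the same seed evaluates both terms on identical draws. Their errors are then correlated, which makes the difference less noisy than two independent estimates would be.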

$$\begin{aligned}
\int_{-\infty}^{+\infty}P(x)\log Q(x) \mathrm{d}x&=\int_{-\infty}^{+\infty}P(x)\log\left[\frac{1}{\sqrt{2\pi{\sigma_{2}}^{2}}}e^{-\frac{(x-\mu_{2})^{2}}{2{\sigma_{2}}^{2}}}\right]\mathrm{d}x\\
&=\int_{-\infty}^{+\infty}P(x)\left[\log\frac{1}{\sqrt{2\pi{\sigma_{2}}^{2}}}-\frac{(x-\mu_{2})^{2}}{2{\sigma_{2}}^{2}}\right]\mathrm{d}x\\
&=-\frac{1}{2}\log 2\pi{\sigma_{2}}^{2}\int_{-\infty}^{+\infty}P(x)\mathrm{d}x - \frac{1}{2{\sigma_{2}}^{2}}\int_{-\infty}^{+\infty}P(x)(x-\mu_{2})^{2}\mathrm{d}x\\
&=-\frac{1}{2}\log 2\pi{\sigma_{2}}^{2} - \frac{1}{2{\sigma_{2}}^{2}}\int_{-\infty}^{+\infty}P(x)\left(x^{2}-2\mu_{2}x +{\mu_{2}}^{2}\right)\mathrm{d}x\\
&=-\frac{1}{2}\log 2\pi{\sigma_{2}}^{2} - \frac{1}{2{\sigma_{2}}^{2}}{\mathbb{E}_{X\sim P(x)}}\left[X^{2}-2\mu_{2}X +{\mu_{2}}^{2}\right]
\end{aligned}$$
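
The second step rewrites $\log Q(x)$ as $\log\frac{1}{\sqrt{2\pi\sigma_2^2}}-\frac{(x-\mu_2)^2}{2\sigma_2^2}$. A quick sketch confirms that this expanded log-density matches the log of `NormalDist.pdf`. The helper name `log_gaussian_pdf` and the parameters are illustrative.

```python
import math
from statistics import NormalDist

def log_gaussian_pdf(x, mu, sigma):
    """log of the N(mu, sigma^2) density, in the expanded form used in the derivation:
    log(1/sqrt(2*pi*sigma^2)) - (x - mu)^2 / (2*sigma^2)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

Q = NormalDist(mu=1.0, sigma=2.0)  # illustrative parameters
for x in (-3.0, 0.0, 1.0, 4.5):
    # The expanded form agrees with taking the log of the density directly.
    assert math.isclose(log_gaussian_pdf(x, Q.mean, Q.stdev), math.log(Q.pdf(x)))
```

The expanded form is also the numerically safer one. Far in the tail, for example at $x=100$ under $\mathcal{N}(0,1)$, the density itself underflows to `0.0`, so its log is undefined in floating point. The expanded expression stays finite.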

For $X\sim P$ we have $\mathbb{E}[X]=\mu_{1}$ and $\mathbb{E}[X^{2}]=\mathrm{Var}(X)+\mathbb{E}^{2}[X]$, so
$${\mathbb{E}_{X\sim P(x)}}\left[X^{2}\right]={\sigma_{1}}^{2}+{\mu_{1}}^{2}$$
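
The second-moment identity follows from expanding the variance. Linearity of expectation then handles the remaining terms:

$$\begin{aligned}
{\sigma_{1}}^{2}=\mathbb{E}\left[(X-\mu_{1})^{2}\right]&=\mathbb{E}[X^{2}]-2\mu_{1}\mathbb{E}[X]+{\mu_{1}}^{2}=\mathbb{E}[X^{2}]-{\mu_{1}}^{2},\\
{\mathbb{E}_{X\sim P(x)}}\left[X^{2}-2\mu_{2}X+{\mu_{2}}^{2}\right]&=\mathbb{E}[X^{2}]-2\mu_{2}\mathbb{E}[X]+{\mu_{2}}^{2}={\sigma_{1}}^{2}+{\mu_{1}}^{2}-2\mu_{1}\mu_{2}+{\mu_{2}}^{2}.
\end{aligned}$$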

Hence
$$\begin{aligned}
\int_{-\infty}^{+\infty}P(x)\log Q(x) \mathrm{d}x &= -\frac{1}{2}\log 2\pi{\sigma_{2}}^{2} - \frac{1}{2{\sigma_{2}}^{2}}\left({\sigma_{1}}^{2}+{\mu_{1}}^{2}-2\mu_{1}\mu_{2}+{\mu_{2}}^{2}\right)\\
&= -\frac{1}{2}\log 2\pi{\sigma_{2}}^{2} - \frac{1}{2{\sigma_{2}}^{2}}\left[{\sigma_{1}}^{2}+(\mu_{1}-\mu_{2})^{2}\right]
\end{aligned}$$
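
The closed form of the cross term can be checked against direct numerical integration of $P(x)\log Q(x)$. This sketch uses composite Simpson's rule over $\mu_1\pm10\sigma_1$. The truncation window, grid size, function names, and parameter sets are all illustrative assumptions.

```python
import math
from statistics import NormalDist

def cross_term(mu1, sigma1, mu2, sigma2):
    """Closed form of ∫ P(x) log Q(x) dx for P = N(mu1, sigma1^2), Q = N(mu2, sigma2^2)."""
    return (-0.5 * math.log(2 * math.pi * sigma2 ** 2)
            - (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2 * sigma2 ** 2))

def cross_term_numeric(mu1, sigma1, mu2, sigma2, n=20_000, width=10.0):
    """Composite Simpson's rule over mu1 ± width*sigma1 (n even), for comparison."""
    P, Q = NormalDist(mu1, sigma1), NormalDist(mu2, sigma2)
    a, b = mu1 - width * sigma1, mu1 + width * sigma1
    h = (b - a) / n

    def f(x):
        return P.pdf(x) * math.log(Q.pdf(x))

    total = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return total * h / 3

for params in [(0.0, 1.0, 1.0, 2.0), (2.0, 0.5, -1.0, 3.0)]:
    print(params, cross_term(*params), cross_term_numeric(*params))
```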

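Setting $\mu_{2}=\mu_{1}$ and $\sigma_{2}=\sigma_{1}$ gives the first term, $\int_{-\infty}^{+\infty}P(x)\log P(x) \mathrm{d}x=-\frac{1}{2}\log 2\pi{\sigma_{1}}^{2}-\frac{1}{2}$. Subtracting the cross term, we finally obtain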
$$D_{KL}(P,Q)=\log \frac{\sigma_{2}}{\sigma_{1}} - \frac{1}{2} + \frac{1}{2{\sigma_{2}}^{2}}\left[{\sigma_{1}}^{2}+(\mu_{1}-\mu_{2})^{2}\right]$$
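
The final formula translates directly into a few lines of code. For the VAE setting mentioned at the start, a common special case is $Q=\mathcal{N}(0,1)$. Substituting $\mu_2=0$, $\sigma_2=1$ gives $-\frac{1}{2}\left(1+\log{\sigma_1}^2-{\mu_1}^2-{\sigma_1}^2\right)$. Passing in $\log\sigma^2$ rather than $\sigma$ mirrors what many VAE implementations do. That parameterization, and the function names below, are illustrative assumptions rather than any particular library's API.

```python
import math

def kl_gaussian(mu1, sigma1, mu2, sigma2):
    """D_KL(P, Q) for P = N(mu1, sigma1^2), Q = N(mu2, sigma2^2); sigmas are standard deviations."""
    return (math.log(sigma2 / sigma1) - 0.5
            + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2 * sigma2 ** 2))

def kl_to_standard_normal(mu, log_var):
    """Special case Q = N(0, 1) with P = N(mu, exp(log_var)).

    Substituting mu2 = 0, sigma2 = 1 into kl_gaussian gives
    -0.5 * (1 + log sigma1^2 - mu1^2 - sigma1^2).
    """
    return -0.5 * (1 + log_var - mu ** 2 - math.exp(log_var))

print(kl_gaussian(0.0, 1.0, 1.0, 2.0))  # D_KL(P, Q)
print(kl_gaussian(1.0, 2.0, 0.0, 1.0))  # D_KL(Q, P): a different value, since KL is not symmetric
print(kl_to_standard_normal(0.3, math.log(0.5 ** 2)), kl_gaussian(0.3, 0.5, 0.0, 1.0))
```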