In [6]:
from IPython.display import Image

- DGM (Deep Generative Modelling): deep generative models
    - https://www.youtube.com/watch?v=JlmOZZnjzOg
    - https://jmtomczak.github.io/blog/4/4_VAE.html
- https://arxiv.org/pdf/1312.6114
    - Kingma: Adam, Anthropic
- https://github.com/lyeoni/pytorch-mnist-VAE
- https://towardsdatascience.com/difference-between-autoencoder-ae-and-variational-autoencoder-vae-ed7be1c038f2

### LVM (latent variable models)

In [7]:
Image(url='https://jmtomczak.github.io/blog/4/lvm_diagram.png', width=400)

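- Latent variable model
    - High-dimensional object of interest: $x\in \mathcal X^D$; for images, $\mathcal X=\{0,1,\cdots,255\}$
        - $p(x)$: data distribution
    - Low-dimensional latent variable: $z\in \mathcal Z^M$ ($\mathcal Z=\mathbb R$); $\mathcal Z^M$ is referred to as a low-dimensional manifold of the high-dimensional space.
- Generative process
    - $z\sim p_\lambda(z)$: the red part, sampling
    - $x\sim p_\theta(x|z)$: the blue part, generating
    - Probabilistic modelling
        - Joint distribution with the latent variable $z$: $p(x,z)=p(z)p(x|z)$
- Training: we only have access to $x$, so the unknown part $z$ is summed out / marginalized out (integrated away)
    - $p(x)=\int p(x,z)dz=\int p(x|z)p(z)dz$
        - $p(z)=\int p(x,z)dx$
        - $p(x|z)=\frac{p(x,z)}{p(z)}$
        - $p(z|x)=\frac{p(x,z)}{p(x)}$
    - The VAE is a method for dealing with this hard (generally intractable) integral.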
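
To make the marginalization $p(x)=\int p(x|z)p(z)dz$ concrete, here is a minimal sketch on a toy model that is not part of the original notes: a 1-D prior $z\sim\mathcal N(0,1)$ and likelihood $x|z\sim\mathcal N(z, s^2)$, for which the marginal is known in closed form, $p(x)=\mathcal N(0, 1+s^2)$. The function names and the value `noise_std=0.5` are assumptions chosen for illustration.

```python
import math
import random

def normal_pdf(x, mean, std):
    """Density of a 1-D Gaussian N(mean, std^2) at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def marginal_mc(x, noise_std=0.5, num_samples=100_000, seed=0):
    """Estimate p(x) = E_{z ~ p(z)}[p(x|z)] by averaging over prior samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        z = rng.gauss(0.0, 1.0)               # z ~ p(z) = N(0, 1)
        total += normal_pdf(x, z, noise_std)  # p(x|z) = N(x; z, noise_std^2)
    return total / num_samples

def marginal_exact(x, noise_std=0.5):
    """Closed form for this toy model only: p(x) = N(x; 0, 1 + noise_std^2)."""
    return normal_pdf(x, 0.0, math.sqrt(1.0 + noise_std ** 2))

estimate = marginal_mc(1.0)
exact = marginal_exact(1.0)
print(f"Monte Carlo estimate: {estimate:.4f}, closed form: {exact:.4f}")
```

In one dimension, sampling from the prior works well. With a neural-network likelihood and a high-dimensional $x$, most prior samples contribute almost nothing to the average, which is one motivation for learning an approximate posterior $q(z|x)$ instead.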

### vae recap

In [8]:
Image(url='./imgs/vae_px.png', width=400)

- VAE: the variational posterior (encoder, $q(z|x)$) and the likelihood function (decoder, $p(x|z)$) are parameterized by NNs;
    - posterior: $q(z|x) \approx p(z|x)$
        - $q(z|x)=\mathcal N(\mu, \sigma^2)$
        - $p(z)=\mathcal N(0, I)$
        - $KL(q(z|x)\|p(z))=KL(\mathcal N(\mu, \sigma^2)\|\mathcal N(0, I))=-\frac12\sum_j(1+\log\sigma_j^2-\mu_j^2-\sigma_j^2)$, summed over the latent dimensions $j$
    - likelihood: $p(x|z)$
        - $\mathbb E_{q(z|x)}[\log p(x|z)]$
    - $\mathcal L(x)=\mathbb E_{q(z|x)}[\log p(x|z)]-KL(q(z|x)\|p(z))$
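- The variational autoencoder addresses the **non-regularized latent space** of the plain autoencoder and provides **generative capability** over the entire space.
    - A plain autoencoder is not a generative model (it is used for compression and reconstruction); a VAE is a generative model.
    - VAE training learns a probability distribution over the data; generation decodes points sampled from that distribution.
        - `mu, log_var = self.encoder(x.view(-1, 784))`
        - The VAE assumes that each input data point corresponds to a normal distribution in the latent space, rather than to a single point.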
    - The encoder in the AE outputs **latent vectors**.
    - Instead of outputting the vectors in the latent space, the encoder of VAE outputs **parameters of a pre-defined distribution** in the latent space for every input.
        - The VAE then imposes a constraint on this latent distribution forcing it to be **a normal distribution**. This constraint makes sure that the latent space is regularized.
- loss function
    - KL: regularization
    - summed over the batch
        - `recon_x.shape`: (bs, 784)
        - `log_var/mu.shape`: (bs, 2)
    ```
    import torch
    import torch.nn.functional as F

    def loss_function(recon_x, x, mu, log_var):
        # reconstruction term (summed over pixels and batch) and closed-form KL to N(0, I)
        BCE = F.binary_cross_entropy(recon_x, x.view(-1, 784), reduction='sum')
        KLD = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
        return BCE, KLD
    ```
- latent space dimension
    - $\mu, \log(\sigma^2)$
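    - the dimension is adjustable
    - $p(z)=\mathcal N(0, I)$
    - a higher latent dimension generally means a lower BCE loss (and possibly a higher KLD loss)
        - 2 dimensions are used only to make visualization and later generation convenient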
- encode & sample & decode
    - `z = mu + std * eps`, where `std = exp(0.5 * log_var)`
        - $\epsilon\sim \mathcal N(0, I)$
    ```
    def forward(self, x):
        mu, log_var = self.encoder(x.view(-1, 784))
        z = self.sampling(mu, log_var)
        return self.decoder(z), mu, log_var
    ```
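
The sampling step and the KL term from the notes above can be checked in isolation. The following is a minimal sketch, not the referenced repository's code: `sampling` is one common way to write the reparameterization `z = mu + std * eps`, `kld_closed_form` is a hypothetical helper name, and the `mu` / `log_var` tensors are made-up stand-ins for encoder outputs with batch size 4 and latent dimension 2.

```python
import torch
from torch.distributions import Normal, kl_divergence

def sampling(mu, log_var):
    """Reparameterization trick: z = mu + std * eps, with eps ~ N(0, I)."""
    std = torch.exp(0.5 * log_var)  # log_var = log(sigma^2), so std = exp(log_var / 2)
    eps = torch.randn_like(std)     # all randomness lives in eps
    return mu + eps * std           # differentiable w.r.t. mu and log_var

def kld_closed_form(mu, log_var):
    """Closed-form KL(N(mu, sigma^2) || N(0, I)), summed over batch and latent dims."""
    return -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())

torch.manual_seed(0)
# Stand-in encoder outputs (assumed shapes: batch of 4, latent dimension 2).
mu = torch.randn(4, 2, requires_grad=True)
log_var = torch.randn(4, 2, requires_grad=True)

z = sampling(mu, log_var)
z.sum().backward()                  # gradients flow back to mu and log_var through z

kld = kld_closed_form(mu, log_var)
# Cross-check the closed form against torch.distributions.
q = Normal(mu, torch.exp(0.5 * log_var))
p = Normal(torch.zeros_like(mu), torch.ones_like(mu))
kld_reference = kl_divergence(q, p).sum()
print(z.shape, kld.item(), kld_reference.item())
```

Because the noise `eps` is drawn independently of the parameters, the sample `z` stays a differentiable function of `mu` and `log_var`, which is what lets the encoder be trained by backpropagation.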


In [10]:
Image(url='./imgs/ae_nn.png', width=400)

In [9]:
Image(url='./imgs/vae_nn.png', width=400)

### latent space

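- Vectors sampled from overlapping distributions generate **morphed** data.
    - The latent space is evenly populated, with no significant gaps between clusters. In fact, clusters of similar-looking inputs usually overlap in some regions.
- Visualizing mu
    - 2D: mu[0], mu[1]
    - higher dimensions: tsne/umap/pca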
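
As a rough illustration of the visualization options above, the sketch below returns 2-D coordinates for a batch of encoder means: the means themselves when the latent space is already 2-D, and otherwise a projection onto the top two principal components (PCA via SVD). The helper `project_mu_2d` and the random `mu_high` array are assumptions for illustration; t-SNE or UMAP would need an extra library and are not shown.

```python
import numpy as np

def project_mu_2d(mu):
    """Return an (N, 2) array for plotting: mu itself if 2-D, else its top-2 PCA projection."""
    mu = np.asarray(mu, dtype=np.float64)
    if mu.shape[1] == 2:
        return mu
    centered = mu - mu.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

rng = np.random.default_rng(0)
mu_high = rng.normal(size=(500, 16))  # stand-in for encoder means with 16 latent dims
points = project_mu_2d(mu_high)
print(points.shape)
```

A PyTorch tensor of means can be converted first with `mu.detach().cpu().numpy()`, and the result can then be passed to a scatter plot colored by digit label.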

In [5]:
Image(url='https://miro.medium.com/v2/resize:fit:1400/format:webp/1*p_xiH7i5QDzATqWdjb4a8w.png', width=600)

### coding & viz

In [20]:
# mnist_vae.py
# https://github.com/lyeoni/pytorch-mnist-VAE
# https://hackernoon.com/how-to-sample-from-latent-space-with-variational-autoencoder