# Variational Autoencoder
A variational autoencoder (VAE) is an extension of the autoencoder. Its structure is similar: it also consists of an encoder and a decoder.

Recall that the plain autoencoder has a limitation: it cannot generate images arbitrarily, because we cannot construct latent vectors ourselves. We have to encode an input image to find out what its latent vector is. The variational autoencoder solves this problem.

The principle is simple: we add a constraint to the encoding process that forces the generated latent vectors to roughly follow a standard normal distribution. This is the biggest difference from the plain autoencoder.

Generating a new image then becomes easy: we sample a random latent vector from a standard normal distribution and pass it through the decoder to get an image, without first having to encode an original picture.

In general, the latent vectors produced by the encoder do not follow a standard normal distribution. To measure how far apart the two distributions are, we use the KL divergence, which gives one term of the loss: the difference between the distribution of the latent vector and the standard normal distribution. The other term is still the mean squared error between the generated image and the original image.

The formula for the KL divergence is as follows:

$$
D_{KL}(P \| Q) = \int_{-\infty}^{\infty} p(x) \log \frac{p(x)}{q(x)} \, dx
$$
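To make the integral concrete, here is a small numerical sketch in plain Python. It assumes, for illustration only, that P is a one-dimensional normal distribution and Q is the standard normal. The helper names `normal_pdf` and `kl_numeric`, the integration range, and the step count are choices made for this example, not part of the original notebook.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a 1-D normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def kl_numeric(mean, std, lo=-12.0, hi=12.0, steps=24000):
    """Approximate D_KL(N(mean, std^2) || N(0, 1)) with a midpoint Riemann sum."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        p = normal_pdf(x, mean, std)
        q = normal_pdf(x, 0.0, 1.0)
        if p > 0.0:  # the term p * log(p / q) tends to 0 as p tends to 0
            total += p * math.log(p / q) * dx
    return total

kl_same = kl_numeric(0.0, 1.0)     # P equals Q
kl_shifted = kl_numeric(1.0, 1.0)  # P has its mean shifted by 1
kl_narrow = kl_numeric(0.0, 0.5)   # P is narrower than Q
print(kl_same, kl_shifted, kl_narrow)
```

The divergence is zero when the two distributions coincide and grows as P moves away from Q, which is what makes it usable as a penalty term in the loss.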

## Reparameterization
To avoid computing the integral in the KL divergence, we use the reparameterization trick: instead of producing a single latent vector, the encoder produces two vectors, one for the mean and one for the standard deviation (in the code below, the network outputs the log variance, from which the standard deviation is recovered). We assume that the encoded latent vector follows a normal distribution, so it can be synthesized by taking a sample from a standard normal distribution, multiplying it by the standard deviation, and adding the mean. The loss then pushes this distribution toward the standard normal distribution, that is, toward mean 0 and variance 1.
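The sampling step described above can be sketched in plain Python, without PyTorch, so that it runs on its own. The values `mu = 2.0` and `sigma = 0.5` are made-up stand-ins for one dimension of the encoder output, and the sample count and seed are arbitrary choices for this illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

mu, sigma = 2.0, 0.5  # hypothetical encoder outputs for one latent dimension

# Reparameterization: draw eps from N(0, 1), then scale by sigma and shift by mu.
eps = [random.gauss(0.0, 1.0) for _ in range(100000)]
z = [mu + sigma * e for e in eps]

z_mean = statistics.mean(z)
z_std = statistics.stdev(z)
print(z_mean, z_std)  # both should be close to mu and sigma
```

Because all the randomness lives in `eps`, the sample `z` is an ordinary differentiable function of the mean and standard deviation, which is why this form is convenient for training with backpropagation.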

The standard variational autoencoder therefore looks like this:

![](https://ws4.sinaimg.cn/large/006tKfTcgy1fn15cq6n7pj30k007t0sv.jpg)

In the end we can define the loss as the following function, which sums the mean squared error and the KL divergence to get the total loss:

```
def loss_function(recon_x, x, mu, logvar):
    """
    recon_x: generated images
    x: original images
    mu: latent mean
    logvar: latent log variance
    """
    # reconstruction error between the generated and original images
    MSE = reconstruction_function(recon_x, x)
    # KL divergence to the standard normal distribution:
    # KLD = -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return MSE + KLD
```
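The KL term in the code above is the closed form of the KL integral for this particular case. As a sketch, assuming the encoder describes a diagonal Gaussian with per-dimension mean $\mu_j$ and variance $\sigma_j^2$ (the index $j$ over latent dimensions is notation added here, not taken from the original), the divergence from the standard normal distribution is:

$$
D_{KL}\big(\mathcal{N}(\mu, \sigma^2) \,\|\, \mathcal{N}(0, 1)\big) = -\frac{1}{2} \sum_{j} \left(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\right)
$$

In the code, `logvar` plays the role of $\log \sigma_j^2$, so `logvar.exp()` is $\sigma_j^2$. The expression is zero exactly when every $\mu_j = 0$ and $\sigma_j^2 = 1$.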

Below we use the MNIST dataset to briefly demonstrate the variational autoencoder.


In [1]:
import os

import torch
from torch.autograd import Variable
import torch.nn.functional as F
from torch import nn
from torch.utils.data import DataLoader

from torchvision.datasets import MNIST
from torchvision import transforms as tfs
from torchvision.utils import save_image

In [2]:
im_tfs = tfs.Compose([
    tfs.ToTensor(),
    tfs.Normalize([0.5], [0.5])  # standardize to [-1, 1]; MNIST images have a single channel
])

train_set = MNIST('./mnist', transform=im_tfs)
train_data = DataLoader(train_set, batch_size=128, shuffle=True)

In [3]:
class VAE(nn.Module):
    def __init__(self):
        super(VAE, self).__init__()

        self.fc1 = nn.Linear(784, 400)
        self.fc21 = nn.Linear(400, 20) # mean
        self.fc22 = nn.Linear(400, 20) # log variance
        self.fc3 = nn.Linear(20, 400)
        self.fc4 = nn.Linear(400, 784)

    def encode(self, x):
        h1 = F.relu(self.fc1(x))
        return self.fc21(h1), self.fc22(h1)

    def reparametrize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)  # standard deviation from the log variance
        eps = torch.randn_like(std)  # noise from N(0, 1), on the same device as std
        return mu + eps * std

    def decode(self, z):
        h3 = F.relu(self.fc3(z))
        return torch.tanh(self.fc4(h3))

    def forward(self, x):
        mu, logvar = self.encode(x)  # encode
        z = self.reparametrize(mu, logvar)  # reparameterize to sample a latent vector
        return self.decode(z), mu, logvar  # decode, and also return the mean and log variance


In [4]:
net = VAE()  # instantiate the network
if torch.cuda.is_available():
    net = net.cuda()

In [5]:
x, _ = train_set[0]
x = x.view(x.shape[0], -1)
if torch.cuda.is_available():
    x = x.cuda()
x = Variable(x)
_, mu, logvar = net(x)

In [8]:
print(mu)

Variable containing:

Columns 0 to 9 
-0.0307 -0.1439 -0.0435  0.3472  0.0368 -0.0339  0.0274 -0.5608  0.0280  0.2742

Columns 10 to 19 
-0.6221 -0.0894 -0.0933  0.4241  0.1611  0.3267  0.5755 -0.0237  0.2714 -0.2806
[torch.cuda.FloatTensor of size 1x20 (GPU 0)]



We can see that for a given input, the network outputs the mean and log variance of the latent variables. At this point the network has not been trained, so these values are not yet meaningful.

Now we start training.


In [6]:
reconstruction_function = nn.MSELoss(reduction='sum')  # sum of squared errors over all pixels

def loss_function(recon_x, x, mu, logvar):
    """
    recon_x: generated images
    x: original images
    mu: latent mean
    logvar: latent log variance
    """
    # reconstruction error between the generated and original images
    MSE = reconstruction_function(recon_x, x)
    # KL divergence to the standard normal distribution:
    # KLD = -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return MSE + KLD

optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

def to_img(x):
    '''
    Convert the network output back into image tensors
    '''
    x = 0.5 * (x + 1.)
    x = x.clamp(0, 1)
    x = x.view(x.shape[0], 1, 28, 28)
    return x

In [7]:
for e in range(100):
    for im, _ in train_data:
        im = im.view(im.shape[0], -1)
        im = Variable(im)
        if torch.cuda.is_available():
            im = im.cuda()
        recon_im, mu, logvar = net(im)
        loss = loss_function(recon_im, im, mu, logvar)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    if (e + 1) % 20 == 0:
        print('epoch: {}, Loss: {:.4f}'.format(e + 1, loss.item()))
        save = to_img(recon_im.cpu().data)
        if not os.path.exists('./vae_img'):
            os.mkdir('./vae_img')
        save_image(save, './vae_img/image_{}.png'.format(e + 1))

epoch: 20, Loss: 61.5803
epoch: 40, Loss: 62.9573
epoch: 60, Loss: 63.4285
epoch: 80, Loss: 64.7138
epoch: 100, Loss: 63.3343


Looking at the results from the variational autoencoder, we can see that they are much better than those of the plain autoencoder.

![](https://ws1.sinaimg.cn/large/006tKfTcgy1fn1ag8832zj306q0a2gmz.jpg)

We can also print the latent mean that the trained network produces for an input:


In [14]:
x, _ = train_set[0]
x = x.view(x.shape[0], -1)
if torch.cuda.is_available():
    x = x.cuda()
x = Variable(x)
_, mu, _ = net(x)

In [15]:
print(mu)

Variable containing:

Columns 0 to 9 
 0.3861  0.5561  1.1995 -1.6773  0.9867  0.1244 -0.3443 -1.6658  1.3332  1.1606

Columns 10 to 19 
 0.6898  0.3042  2.1044 -2.4588  0.0504  0.9743  1.1136  0.7872 -0.0777  1.6101
[torch.cuda.FloatTensor of size 1x20 (GPU 0)]



Although the variational autoencoder works better than the plain autoencoder and constrains the probability distribution of the latent code, its loss is still computed directly as the mean squared error between the generated picture and the original picture, which is not a good approach. In the next chapter, on generative adversarial networks, we discuss the limitations of this way of computing the loss and then introduce a new training method: training the network adversarially instead of directly comparing the mean squared error of each pixel of two images.
