## VAE
- A probabilistic take on the autoencoder (AE)
- An AE is a model that takes high-dimensional input data and compresses it into a smaller representation


- __A VAE maps the input data into the parameters of a probability distribution, such as the mean and variance of a Gaussian__
- This produces a continuous, structured latent space, which is useful for image generation

<img src='img/0505_1.png' width='400'>  

<center> 
    Hands-On Machine Learning, 2nd edition
</center>
    


### 0. Setup

In [None]:
# !pip install -q tensorflow-probability

In [1]:
# # to generate gifs
# !pip install -q imageio
# !pip install -q git+https://github.com/tensorflow/docs



In [1]:
from IPython import display

import glob
import imageio
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow.keras import layers
import time

### 1. Load the MNIST dataset
- Each image is a vector of 784 integers (28*28)
- Each integer is a pixel intensity between 0 and 255


- Model each pixel with a Bernoulli distribution and statically binarize the data
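
Static binarization can be sketched on a tiny array. This is a minimal illustration, assuming a threshold of 0.5 on intensities rescaled to [0, 1]; the 2x3 `raw` array is a made-up stand-in for an MNIST image.

```python
import numpy as np

# Hypothetical 2x3 "image" with raw intensities in [0, 255].
raw = np.array([[0, 64, 127],
                [128, 200, 255]], dtype=np.uint8)

scaled = raw / 255.0                                   # rescale to [0, 1]
binary = np.where(scaled > 0.5, 1.0, 0.0).astype("float32")

print(binary)
```

Every pixel ends up as exactly 0 or 1, which is what lets each pixel be treated as a Bernoulli outcome.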

In [2]:
(train_images, _), (test_images, _) = tf.keras.datasets.mnist.load_data()

In [5]:
def preprocess_image(images):
    images = images.reshape((images.shape[0], 28, 28, 1)) / 255.
    return np.where(images > .5, 1.0, 0.0).astype('float32')

In [6]:
train_images = preprocess_image(train_images)
test_images = preprocess_image(test_images)

In [7]:
train_size = 60000
batch_size = 32
test_size = 10000

### 2. Batch and shuffle the data


In [9]:
train_dataset = (tf.data.Dataset.from_tensor_slices(train_images)
                 .shuffle(train_size).batch(batch_size))
test_data = (tf.data.Dataset.from_tensor_slices(test_images)
            .shuffle(test_size).batch(batch_size))

    - from_tensor_slices : converts a NumPy array or list into a tensor dataset
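
What shuffling and batching do to the data can be sketched in plain NumPy. This is only an analogy, not the `tf.data` implementation: `shuffled_batches` is a hypothetical helper and `toy` stands in for the image array. When the shuffle buffer is as large as the dataset, the shuffle is a full permutation, and by default the last batch is kept even if it is smaller.

```python
import numpy as np

def shuffled_batches(data, batch_size, rng):
    """Yield mini-batches of `data` in random order (hypothetical helper)."""
    order = rng.permutation(len(data))            # full shuffle of the indices
    for start in range(0, len(data), batch_size):
        yield data[order[start:start + batch_size]]

rng = np.random.default_rng(0)
toy = np.arange(10)                               # stand-in for 10 images
batches = list(shuffled_batches(toy, batch_size=4, rng=rng))
sizes = [len(b) for b in batches]                 # the last batch is smaller

print(sizes)
```

By the same arithmetic, 60000 training images with a batch size of 32 give 1875 full batches, while 10000 test images give 312 full batches plus one batch of 16.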

### 3. Define the encoder and decoder networks
- Use two small ConvNets for the Encoder and Decoder networks

> ### Encoder network
> - Defines __the approximate posterior distribution $q(z|x)$__, where $z$ is the latent variable
> 
> 
> - In this example, simply model the distribution as a diagonal Gaussian
> - Outputs the mean and log-variance parameters of a factorized Gaussian (the log-variance is used instead of the variance for numerical stability)

> ### Decoder network
> - Defines __the conditional distribution of the observation $p(x|z)$__
> - Models the latent prior distribution $p(z)$ as a unit Gaussian

> ### Reparameterization trick
> <img src='img/0505_2.png' width='600'>
> 
> <br>
> 
> 
> - To generate a sample $z$ for the decoder, you can sample from the latent distribution defined by the encoder's outputs
> - However, backpropagation cannot flow through a random sampling node  
> => __Reparameterization trick!__
> <img src='img/0505_4.png' width='300'>  
> 
> 
> - Write the sample as $z = \mu + \sigma \odot \epsilon$: $\epsilon$ is random noise, while $\mu$ and $\sigma$ are deterministic outputs of the encoder, so gradients can flow through them
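
The trick can be checked numerically with a small NumPy sketch. The function mirrors the formula above under the assumption that the encoder outputs a mean and a log-variance; the particular values of `mean` and `logvar` are made up for illustration.

```python
import numpy as np

def reparameterize(mean, logvar, rng):
    """Draw z = mean + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mean.shape)      # the only source of randomness
    return mean + np.exp(0.5 * logvar) * eps   # sigma = exp(logvar / 2)

rng = np.random.default_rng(0)
n = 100_000
mean = np.tile([1.0, -2.0], (n, 1))            # hypothetical encoder means
logvar = np.tile(np.log([0.25, 4.0]), (n, 1))  # i.e. sigma = [0.5, 2.0]

z = reparameterize(mean, logvar, rng)
print(z.mean(axis=0), z.std(axis=0))
```

The samples have the requested mean and standard deviation, yet `z` is an ordinary differentiable function of `mean` and `logvar` once `eps` is drawn.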

### Network architecture
- Encoder : 2 Conv. layers followed by a FC layer
- Decoder : a FC layer followed by 3 Conv. transpose layers


- Note: it is common practice to __avoid using BN (batch normalization)__ when training VAEs, since the additional stochasticity from mini-batches may aggravate instability on top of the stochasticity from sampling
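
The spatial sizes that flow through the two networks follow from simple arithmetic. The sketch below assumes the settings used in this notebook: 3x3 kernels, `'valid'` padding (the Keras default) in the encoder, and `padding='same'` in the decoder. The helper names are hypothetical.

```python
def conv_out(size, kernel, stride):
    """Output size of a convolution with 'valid' padding."""
    return (size - kernel) // stride + 1

def conv_transpose_out_same(size, stride):
    """Output size of a transposed convolution with 'same' padding."""
    return size * stride

# Encoder: two stride-2 convolutions applied to a 28x28 input.
enc = [28]
for _ in range(2):
    enc.append(conv_out(enc[-1], kernel=3, stride=2))

# Decoder: transposed convolutions with strides 2, 2, 1 applied to a 7x7 map.
dec = [7]
for stride in (2, 2, 1):
    dec.append(conv_transpose_out_same(dec[-1], stride))

flat = enc[-1] ** 2 * 64   # size of the flattened encoder feature map
print(enc, dec, flat)
```

The decoder therefore ends at 28x28, the same resolution as the input image.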

In [11]:
class CVAE(tf.keras.Model):
    '''Convolutional variational autoencoder.'''
    
    def __init__(self, latent_dim):
        super(CVAE, self).__init__()
        self.latent_dim = latent_dim
        self.encoder = tf.keras.Sequential(
        [
            layers.InputLayer(input_shape=(28, 28, 1)),
            layers.Conv2D(filters=32, kernel_size=3, strides=(2, 2), activation='relu'),
            layers.Conv2D(64, 3, strides=(2, 2), activation='relu'),
            layers.Flatten(),
            layers.Dense(latent_dim + latent_dim)
        ])
        
        # Conv2D options:
        #   filters     : number of output filters
        #   kernel_size : size of each filter
        #   strides     : step size of the filter as it slides over the input
        
        self.decoder = tf.keras.Sequential([
            layers.InputLayer(input_shape=(latent_dim, )),
            layers.Dense(units=7*7*32, activation=tf.nn.relu),
            layers.Reshape(target_shape=(7, 7, 32)),
            layers.Conv2DTranspose(filters=64, kernel_size=3, strides=2, padding='same',
                                  activation='relu'),
            layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu'),
            # No activation
            layers.Conv2DTranspose(1, 3, strides=1, padding='same')
            
        ])
        
    # The @tf.function decorator automatically builds a computational graph.
    @tf.function
    def sample(self, eps=None):
        if eps is None:
            eps = tf.random.normal(shape=(100, self.latent_dim))
        return self.decode(eps, apply_sigmoid=True)

    def encode(self, x):
        # num_or_size_splits=2 splits the encoder output into two equal parts.
        mean, logvar = tf.split(self.encoder(x), num_or_size_splits=2, axis=1)
        return mean, logvar

    def reparameterize(self, mean, logvar):
        eps = tf.random.normal(shape=mean.shape)
        return eps * tf.exp(logvar * .5) + mean

    def decode(self, z, apply_sigmoid=False):
        logits = self.decoder(z)
        if apply_sigmoid:
            probs = tf.sigmoid(logits)
            return probs
        return logits

### 4. Define the loss function and the optimizer
- VAEs train by __maximizing the ELBO__ (evidence lower bound) on the marginal log-likelihood

<img src='img/0505_7.png' width='450'>

- In practice, optimize the single-sample Monte Carlo estimate of this expectation ($z$ is sampled from $q(z|x)$)

<img src='img/0505_6.png' width='250'>


<br>

<br>

- cf) the KL term can be computed as follows

<img src='img/0505_8.png' width='400'>
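
For reference, the quantities involved can be written out in their standard textbook form. This is a sketch that is assumed, not confirmed, to match the images above; $\mu_j$ and $\sigma_j^2$ denote the per-dimension mean and variance produced by the encoder.

$$\log p(x) \ge \mathrm{ELBO} = \mathbb{E}_{q(z|x)}\left[\log \frac{p(x, z)}{q(z|x)}\right] = \mathbb{E}_{q(z|x)}\left[\log p(x|z)\right] - D_{KL}\big(q(z|x)\,\|\,p(z)\big)$$

$$\mathrm{ELBO} \approx \log p(x|z) + \log p(z) - \log q(z|x), \qquad z \sim q(z|x)$$

$$D_{KL}\big(q(z|x)\,\|\,p(z)\big) = -\frac{1}{2} \sum_j \left(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\right)$$

The last line holds when $q(z|x)$ is a diagonal Gaussian and $p(z)$ is a unit Gaussian, which is the setting used in this notebook.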

In [36]:
optimizer = tf.keras.optimizers.Adam(1e-4)

def log_normal_pdf(sample, mean, logvar, raxis=1):
    log2pi = tf.math.log(2. * np.pi)
    # tf.reduce_sum sums over the given axis and removes that dimension
    return tf.reduce_sum(
        -.5 * ((sample - mean) ** 2. * tf.exp(-logvar) + logvar + log2pi),
        axis=raxis)
    
def compute_loss(model, x):
    mean, logvar = model.encode(x)
    z = model.reparameterize(mean, logvar)
    x_logit = model.decode(z)
    cross_ent = tf.nn.sigmoid_cross_entropy_with_logits(logits=x_logit, labels=x)
    logpx_z = -tf.reduce_sum(cross_ent, axis=[1, 2, 3])
    logpz = log_normal_pdf(z, 0., 0.)
    logqz_x = log_normal_pdf(z, mean, logvar)
    return -tf.reduce_mean(logpx_z + logpz - logqz_x)

@tf.function
def train_step(model, x, optimizer):
    '''Executes one training step and returns the loss.
    
    This function computes the loss and gradients, and uses the latter to
    update the model's parameters.
    '''
    # tf.GradientTape records every operation executed inside its context
    with tf.GradientTape() as tape: 
        loss = compute_loss(model, x)
        
    # tape.gradient(loss, var) computes the gradients of loss with respect to var
    gradients = tape.gradient(loss, model.trainable_variables)
    # apply_gradients applies the computed gradients to the variables
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss
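
As a sanity check on the loss, the difference `logqz_x - logpz` is a single-sample estimate of the KL divergence between $q(z|x)$ and the unit Gaussian prior. The NumPy sketch below re-implements the log-density formula outside TensorFlow and compares a Monte Carlo average against the closed-form KL of a diagonal Gaussian; the values of `mean` and `logvar` are made up for illustration.

```python
import numpy as np

def log_normal_pdf(sample, mean, logvar):
    """Log-density of a diagonal Gaussian, summed over the last axis."""
    log2pi = np.log(2.0 * np.pi)
    return np.sum(
        -0.5 * ((sample - mean) ** 2 * np.exp(-logvar) + logvar + log2pi),
        axis=-1)

rng = np.random.default_rng(0)
mean = np.array([0.5, -1.0])      # hypothetical encoder output
logvar = np.array([0.0, -0.5])

# Draw many samples from q(z|x) with the reparameterization trick.
z = mean + np.exp(0.5 * logvar) * rng.standard_normal((200_000, 2))

# Monte Carlo estimate of KL(q || p) = E_q[log q(z|x) - log p(z)].
kl_mc = np.mean(log_normal_pdf(z, mean, logvar) - log_normal_pdf(z, 0.0, 0.0))

# Closed form for a diagonal Gaussian against a unit Gaussian.
kl_exact = -0.5 * np.sum(1.0 + logvar - mean ** 2 - np.exp(logvar))

print(kl_mc, kl_exact)
```

The two numbers agree up to sampling noise, which is why training on the single-sample estimate works in expectation.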

### 5. Training
- Start by iterating over the dataset
- During each iteration, pass the image to the encoder to obtain a set of mean and log-var parameters of the approximate posterior $q(z|x)$
- Then, apply the reparameterization trick to sample from $q(z|x)$
- Finally, pass the reparameterized samples to the decoder to obtain the logits of the generative distribution $p(x|z)$
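
Because each pixel is modeled as a Bernoulli variable, the log-likelihood $\log p(x|z)$ is the negative sigmoid cross-entropy between the decoder logits and the binarized pixels. The NumPy sketch below checks this identity on made-up values; the helper re-implements the numerically stable formula rather than calling TensorFlow.

```python
import numpy as np

def sigmoid_cross_entropy_with_logits(logits, labels):
    """Stable form: max(l, 0) - l * y + log(1 + exp(-|l|))."""
    return (np.maximum(logits, 0) - logits * labels
            + np.log1p(np.exp(-np.abs(logits))))

logits = np.array([2.0, -1.0, 0.5])   # hypothetical decoder outputs
pixels = np.array([1.0, 0.0, 1.0])    # binarized targets

# Bernoulli log-likelihood computed directly from probabilities.
probs = 1.0 / (1.0 + np.exp(-logits))
log_bernoulli = pixels * np.log(probs) + (1 - pixels) * np.log(1 - probs)

# The same quantity obtained from the cross-entropy on logits.
neg_xent = -sigmoid_cross_entropy_with_logits(logits, pixels)

print(np.allclose(log_bernoulli, neg_xent))
```

Working on logits avoids taking the logarithm of probabilities that are very close to 0 or 1.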

#### Generating images
- After training, generate some images
- Start by sampling a set of latent vectors from the unit Gaussian prior distribution $p(z)$
- The decoder (generator) then converts each latent sample $z$ to logits of the observation, giving a distribution $p(x|z)$

In [14]:
epochs = 10
# set the dimensionality of the latent space to a plane (2D) for visualization later
latent_dim = 2
num_examples_to_generate = 10

# keeping the random vector constant for generation (prediction) so
# it will be easier to see the improvement
random_vector_for_generation = tf.random.normal(
    shape=[num_examples_to_generate, latent_dim])
model = CVAE(latent_dim)

In [None]:
from IPython.display import SVG
from tensorflow.python.keras.utils.vis_utils import model_to_dot

In [33]:
model.encoder.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_2 (Conv2D)            (None, 13, 13, 32)        320       
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 6, 6, 64)          18496     
_________________________________________________________________
flatten_1 (Flatten)          (None, 2304)              0         
_________________________________________________________________
dense_2 (Dense)              (None, 4)                 9220      
Total params: 28,036
Trainable params: 28,036
Non-trainable params: 0
_________________________________________________________________


In [35]:
model.decoder.summary()

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_3 (Dense)              (None, 1568)              4704      
_________________________________________________________________
reshape_1 (Reshape)          (None, 7, 7, 32)          0         
_________________________________________________________________
conv2d_transpose_3 (Conv2DTr (None, 14, 14, 64)        18496     
_________________________________________________________________
conv2d_transpose_4 (Conv2DTr (None, 28, 28, 32)        18464     
_________________________________________________________________
conv2d_transpose_5 (Conv2DTr (None, 28, 28, 1)         289       
Total params: 41,953
Trainable params: 41,953
Non-trainable params: 0
_________________________________________________________________
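
The parameter counts in the two summaries above can be reproduced by hand. The sketch below uses the usual formulas (kernel height x kernel width x input channels x output channels, plus one bias per output channel; inputs x outputs plus biases for a dense layer); the helper names are hypothetical.

```python
def conv_params(kernel, in_ch, out_ch):
    """Weights plus biases of a (transposed) 2D convolution layer."""
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(in_units, out_units):
    """Weights plus biases of a fully connected layer."""
    return in_units * out_units + out_units

latent_dim = 2

encoder = [
    conv_params(3, 1, 32),                      # conv2d_2
    conv_params(3, 32, 64),                     # conv2d_3
    dense_params(6 * 6 * 64, 2 * latent_dim),   # dense_2: mean and log-variance
]
decoder = [
    dense_params(latent_dim, 7 * 7 * 32),       # dense_3
    conv_params(3, 32, 64),                     # conv2d_transpose_3
    conv_params(3, 64, 32),                     # conv2d_transpose_4
    conv_params(3, 32, 1),                      # conv2d_transpose_5
]

print(encoder, sum(encoder))
print(decoder, sum(decoder))
```

The totals match the 28,036 and 41,953 trainable parameters reported by `summary()`.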


In [37]:
def generate_and_save_images(model, epoch, test_sample):
    mean, logvar = model.encode(test_sample)
    z = model.reparameterize(mean, logvar)
    predictions = model.sample(z)
    fig = plt.figure(figsize=(4, 4))
    
    for i in range(predictions.shape[0]):
        plt.subplot(4, 4, i+1)
        plt.imshow(predictions[i, :, :, 0], cmap='gray')
        plt.axis('off')
        
    # tight_layout minimizes the overlap between 2 sub-plots
    plt.savefig('img/result/CVAE_image_at_epoch_{:04d}.png'.format(epoch))
    plt.show()

In [None]:
# Pick a sample of the test set for generating output images
assert batch_size >= num_examples_to_generate
