In [1]:
import numpy as np

The following code provides a basic, function-based implementation of a variational autoencoder (VAE). Each function is commented to describe its purpose and its role in the model. In the last cell, the `train_step` function ties everything together, showing how data flows through the encoder, the sampling step, and the decoder, and how the loss is calculated. This is a simplified version for educational purposes; real-world applications typically use more complex models and dedicated libraries.

## Initializing weights

When initializing the weights and biases for the encoder and decoder in a Variational Autoencoder, we typically draw the initial weights from a normal distribution and set the biases to zero. Let $ \mathbf{W}_{\text{enc}} $ and $ \mathbf{W}_{\text{dec}} $ represent the weight matrices for the encoder and decoder, respectively, and $ \mathbf{b}_{\text{enc}} $ and $ \mathbf{b}_{\text{dec}} $ represent the bias vectors. The dimensions of these matrices and vectors are determined by the dimensions of the input data and the latent space:

$$
\mathbf{W}_{\text{enc}} \sim \mathcal{N}(0, 1), \quad \mathbf{W}_{\text{enc}} \in \mathbb{R}^{\text{input dim} \times \text{latent dim}}
$$
$$
\mathbf{b}_{\text{enc}} = \mathbf{0}, \quad \mathbf{b}_{\text{enc}} \in \mathbb{R}^{\text{latent dim}}
$$
$$
\mathbf{W}_{\text{dec}} \sim \mathcal{N}(0, 1), \quad \mathbf{W}_{\text{dec}} \in \mathbb{R}^{\text{latent dim} \times \text{input dim}}
$$
$$
\mathbf{b}_{\text{dec}} = \mathbf{0}, \quad \mathbf{b}_{\text{dec}} \in \mathbb{R}^{\text{input dim}}
$$

These four arrays are the only parameters of this simplified model, and their shapes are fixed entirely by `input_dim` and `latent_dim`. They must be created before any data can pass through the encoder or decoder.


In [2]:
def initialize_weights(input_dim, latent_dim):
    """
    Initialize the weights and biases for the encoder and decoder.

    :param input_dim: Dimensionality of the input data.
    :param latent_dim: Dimensionality of the latent space.
    :return: A tuple of weights and biases (enc_w, enc_b, dec_w, dec_b).
    """
    enc_w = np.random.randn(input_dim, latent_dim)
    enc_b = np.zeros(latent_dim)
    dec_w = np.random.randn(latent_dim, input_dim)
    dec_b = np.zeros(input_dim)
    return enc_w, enc_b, dec_w, dec_b

In [3]:
input_dim = 784  # Example input dimension
latent_dim = 2   # Example latent dimension
enc_w, enc_b, dec_w, dec_b = initialize_weights(input_dim, latent_dim)

In [4]:
enc_w

array([[ 0.80233873,  0.56551687],
       [ 0.05659242,  1.03044558],
       [ 0.03309176,  0.5305171 ],
       ...,
       [ 0.50934032, -0.60992901],
       [-1.59743326, -0.51895062],
       [ 0.7940917 ,  0.9068555 ]])
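
With unit-variance weights and `input_dim = 784`, each encoder output is a sum of 784 products, so on standard-normal inputs its standard deviation is close to $ \sqrt{784} = 28 $. A common remedy, sketched here as an assumption rather than as part of the original code, is to divide the weights by the square root of the fan-in. The helper name `initialize_weights_scaled` and the seeded generator are hypothetical.

```python
import numpy as np

def initialize_weights_scaled(input_dim, latent_dim, rng):
    """Hypothetical variant: weights drawn from N(0, 1/fan_in), zero biases."""
    enc_w = rng.standard_normal((input_dim, latent_dim)) / np.sqrt(input_dim)
    enc_b = np.zeros(latent_dim)
    dec_w = rng.standard_normal((latent_dim, input_dim)) / np.sqrt(latent_dim)
    dec_b = np.zeros(input_dim)
    return enc_w, enc_b, dec_w, dec_b

rng = np.random.default_rng(0)
x_demo = rng.standard_normal((1000, 784))

# Unit-variance encoder weights, as in initialize_weights above.
w_unit = rng.standard_normal((784, 2))
# Fan-in scaled encoder weights from the hypothetical helper.
w_scaled, _, _, _ = initialize_weights_scaled(784, 2, rng)

std_unit = (x_demo @ w_unit).std()
std_scaled = (x_demo @ w_scaled).std()
print(f"unit-variance init: {std_unit:.1f}, scaled init: {std_scaled:.2f}")
```

The scaled variant keeps the encoder output near unit scale, which matters later because the KL term grows with the square of the mean.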

## Encoder

The encoder in a Variational Autoencoder maps the input data $ \mathbf{x} $ to the parameters of a latent distribution. Specifically, it computes the mean $ \boldsymbol{\mu} $ and standard deviation $ \boldsymbol{\sigma} $ of the latent space distribution. The mean is computed as a linear transformation of the input data with a weight matrix $ \mathbf{W}_{\text{enc}} $ and a bias vector $ \mathbf{b}_{\text{enc}} $. For simplicity, the standard deviation is assumed to be fixed:

$$
\boldsymbol{\mu} = \mathbf{x} \mathbf{W}_{\text{enc}} + \mathbf{b}_{\text{enc}}
$$

$$
\boldsymbol{\sigma} = \mathbf{1}
$$

Here, $ \mathbf{1} $ denotes an array of ones with the same shape as $ \boldsymbol{\mu} $.


In [6]:
def encoder(x, enc_w, enc_b):
    """
    The encoder function that maps the input to the latent space.

    :param x: Input data.
    :param enc_w: Encoder weights.
    :param enc_b: Encoder biases.
    :return: Mean (mu) and standard deviation (sigma) of the latent space distribution.
    """
    mu = np.dot(x, enc_w) + enc_b
    sigma = np.ones(mu.shape)  # Assuming a fixed standard deviation
    return mu, sigma

In [7]:
x = np.random.randn(100, input_dim)  # Random input data
mu, sigma = encoder(x, enc_w, enc_b)

In [9]:
mu

array([[-18.45172333,  31.27820171],
       [-29.80332442,  -6.23927152],
       [-23.50797805,  31.01750073],
       [-51.99421183, -35.92739959],
       [-17.99465   , -39.84129533],
       [ -5.76375495,   8.9868113 ],
       [ 13.9234647 ,  -7.64509702],
       [ 21.44715055,  14.98285369],
       [ 13.55582687,  39.58754232],
       [  1.74272721,  -7.32442935],
       [  9.49795085,  -9.36024668],
       [ 34.06248475,   4.60357008],
       [ 33.91405612,  29.53842075],
       [  5.90391893, -22.24461306],
       [-48.22923345,  -3.11899728],
       [ 41.47277708,  19.73699114],
       [ 62.05416333, -46.53936069],
       [ 51.70999476,  -6.37074961],
       [-31.20499674,  20.12755612],
       [  0.71224971, -30.70985329],
       [ 13.12195452,  27.57190414],
       [  7.96524881,  22.99133406],
       [ 23.75119982, -41.96028114],
       [ 60.9280171 , -15.8103524 ],
       [ 27.95969991,   6.2084801 ],
       [-22.20911858,  27.17389891],
       [-22.99638747,  13.01904148],
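
The fixed standard deviation is a simplification. As a hedged sketch of one common alternative, the encoder can also output a log-variance through a second linear map and convert it to a standard deviation. The function `encoder_with_logvar` and the parameters `w_logvar` and `b_logvar` are hypothetical names that do not appear in the original code.

```python
import numpy as np

def encoder_with_logvar(x, w_mu, b_mu, w_logvar, b_logvar):
    """Hypothetical encoder with a second linear head for the log-variance."""
    mu = x @ w_mu + b_mu
    log_var = x @ w_logvar + b_logvar
    # exp(0.5 * log_var) is always positive, so sigma is a valid std deviation.
    sigma = np.exp(0.5 * log_var)
    return mu, sigma

rng = np.random.default_rng(0)
x_demo = rng.standard_normal((4, 784))
w_mu = rng.standard_normal((784, 2)) / np.sqrt(784)
# Zero log-variance weights and biases give log_var = 0, hence sigma = 1,
# which reproduces the fixed-sigma encoder as a special case.
w_logvar = np.zeros((784, 2))
mu_demo, sigma_demo = encoder_with_logvar(
    x_demo, w_mu, np.zeros(2), w_logvar, np.zeros(2)
)
print(mu_demo.shape, sigma_demo.shape)
```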
 

## Sampling

In the VAE, a random sample $ \boldsymbol{\epsilon} $ is drawn from a standard normal distribution. This sample is then scaled element-wise by the standard deviation $ \boldsymbol{\sigma} $ and shifted by the mean $ \boldsymbol{\mu} $ to produce the latent variable $ \mathbf{z} $:

$$
\mathbf{z} = \boldsymbol{\mu} + \boldsymbol{\sigma} \cdot \boldsymbol{\epsilon}
$$


In [11]:
def sampling(mu, sigma):
    """
    Sampling function to sample from the latent space distribution.

    :param mu: Mean of the latent space distribution.
    :param sigma: Standard deviation of the latent space distribution.
    :return: Sampled latent variable.
    """
    eps = np.random.randn(*mu.shape)
    return mu + sigma * eps

In [12]:
sample = sampling(mu, sigma)

In [13]:
sample

array([[-17.82047493,  30.36330822],
       [-27.60577212,  -4.82905404],
       [-24.08659155,  30.77019172],
       [-52.93688359, -34.86998295],
       [-16.24090578, -40.82944641],
       [ -6.39787977,   8.83171184],
       [ 11.90825485,  -5.7541863 ],
       [ 21.50929071,  15.24117752],
       [ 12.40592017,  38.84888615],
       [  1.52560557,  -7.03715691],
       [  9.11640169,  -9.35686068],
       [ 32.93685855,   3.79359377],
       [ 34.60874374,  28.21672448],
       [  5.97748957, -22.20837791],
       [-48.33853286,  -1.77113827],
       [ 41.07321072,  19.45506964],
       [ 62.35250474, -45.5332415 ],
       [ 50.04223778,  -6.37384827],
       [-31.22300008,  21.17241557],
       [  0.97923168, -29.70550656],
       [ 13.37089585,  30.34585583],
       [  7.86834519,  23.45927488],
       [ 23.84546619, -41.02588206],
       [ 63.56334017, -15.33687521],
       [ 28.81934442,   6.594151  ],
       [-23.66440755,  27.18166119],
       [-22.14373362,  13.4307478 ],
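
This shift-and-scale construction is usually called the reparameterization trick. A quick empirical check, sketched below with a hypothetical seeded variant `sampling_seeded` and made-up values for the mean and standard deviation, is that many draws should have a sample mean close to the chosen mean and a sample standard deviation close to the chosen standard deviation.

```python
import numpy as np

def sampling_seeded(mu, sigma, rng):
    """Same computation as sampling(), but with an explicit random generator."""
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

rng = np.random.default_rng(0)
# Illustrative values only: every entry has mean 3.0 and std 0.5.
mu_demo = np.full((100_000, 2), 3.0)
sigma_demo = np.full((100_000, 2), 0.5)

z_demo = sampling_seeded(mu_demo, sigma_demo, rng)
print("sample mean:", z_demo.mean(axis=0))
print("sample std: ", z_demo.std(axis=0))
```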
 

## Decoder

The output of the decoder is calculated as the dot product of the latent variable $ \mathbf{z} $ and the decoder weights $ \mathbf{W}_{\text{dec}} $, plus the decoder bias $ \mathbf{b}_{\text{dec}} $:

$$
\mathbf{o} = \mathbf{z} \mathbf{W}_{\text{dec}} + \mathbf{b}_{\text{dec}}
$$


In [14]:
def decoder(z, dec_w, dec_b):
    """
    The decoder function that maps the latent space back to the input space.

    :param z: Sampled latent variable.
    :param dec_w: Decoder weights.
    :param dec_b: Decoder biases.
    :return: Reconstructed input.
    """
    return np.dot(z, dec_w) + dec_b

In [16]:
recon_x = decoder(sample, dec_w, dec_b)

In [17]:
recon_x

array([[ 14.9086372 ,  27.55151299,  30.95954549, ...,  -2.28886593,
         48.54909035,  26.07550394],
       [ 13.98775391, -13.11149067,  27.07841003, ...,  29.80200605,
         -5.11690448,  -7.04823013],
       [ 18.34758026,  26.19219364,  37.7110792 , ...,   3.50937215,
         49.71537239,  25.85048845],
       ...,
       [  6.79284933,  22.30429762,  14.59889524, ...,  -9.03899027,
         34.80116383,  19.76771945],
       [-27.56005445, -40.70138207, -56.71474558, ...,  -4.15795978,
        -76.4437219 , -39.92852412],
       [ 12.44007216,  18.49536024,  25.60617551, ...,   1.7755081 ,
         34.6658773 ,  18.12288878]])
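
To make the shapes in the decoder equation concrete, the sketch below uses stand-in arrays (the `_demo` names are hypothetical) with the same dimensions as the example. It also illustrates a consequence of a purely linear decoder: with a zero bias, every reconstruction is a combination of the `latent_dim` rows of the decoder weights, so the batch of outputs has rank at most `latent_dim`.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, latent_dim, input_dim = 100, 2, 784

z_demo = rng.standard_normal((batch_size, latent_dim))
dec_w_demo = rng.standard_normal((latent_dim, input_dim))
dec_b_demo = np.zeros(input_dim)

# (100, 2) @ (2, 784) gives (100, 784); the (784,) bias broadcasts over rows.
out = z_demo @ dec_w_demo + dec_b_demo
print(out.shape)

# Each output row lies in the span of the two rows of dec_w_demo.
rank = np.linalg.matrix_rank(out)
print("rank of reconstructions:", rank)
```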

## Loss Function

The loss function for the VAE consists of two parts: the reconstruction loss and the KL divergence. The reconstruction loss is the mean squared error between the original input $ x $ and the reconstructed input $ \hat{x} $, averaged over all $ n $ entries:

$$
\text{recon loss} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{x}_i)^2
$$

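The KL divergence between the latent distribution $ \mathcal{N}(\mu, \sigma^2) $ and a standard normal prior is calculated from the mean $ \mu $ and the standard deviation $ \sigma $. The sum runs over every latent entry, so $ m $ is the batch size times the latent dimension:

$$
\text{KL loss} = -\frac{1}{2} \sum_{j=1}^{m} \left(1 + \log(\sigma_j^2) - \mu_j^2 - \sigma_j^2\right)
$$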


In [18]:
def loss_function(x, recon_x, mu, sigma):
    """
    Loss function for the VAE, combining reconstruction loss and KL divergence.

    :param x: Original input data.
    :param recon_x: Reconstructed input data.
    :param mu: Mean of the latent space distribution.
    :param sigma: Standard deviation of the latent space distribution.
    :return: Total loss value.
    """
    recon_loss = np.mean((x - recon_x) ** 2)
    kl_loss = -0.5 * np.sum(1 + np.log(sigma**2) - mu**2 - sigma**2)
    return recon_loss + kl_loss

In [20]:
loss = loss_function(x, recon_x, mu, sigma)

In [21]:
loss

83040.88028406404
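
Because the encoder fixes $ \sigma = 1 $, the KL formula above simplifies: $ \log(\sigma^2) = 0 $ and $ 1 - \sigma^2 = 0 $, leaving $ \frac{1}{2} \sum \mu^2 $. The sketch below checks this numerically on stand-in values. The scale of 28 is an assumption chosen to mimic the magnitude of the encoder outputs in this example; with 200 latent entries of that scale, the KL term is on the order of $ 0.5 \times 200 \times 784 \approx 78{,}000 $, which suggests the KL term accounts for most of a loss of this size.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in encoder means with a scale similar to the example (assumption).
mu_demo = rng.standard_normal((100, 2)) * 28.0
sigma_demo = np.ones_like(mu_demo)

# General closed form, as used in loss_function.
kl_general = -0.5 * np.sum(
    1 + np.log(sigma_demo**2) - mu_demo**2 - sigma_demo**2
)
# Simplified form that holds only when sigma is exactly 1.
kl_simplified = 0.5 * np.sum(mu_demo**2)

print(f"general: {kl_general:.2f}, simplified: {kl_simplified:.2f}")
```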

## Training step

A training step in a Variational Autoencoder involves several stages. Given an input $ \mathbf{x} $, the encoder generates parameters $ \boldsymbol{\mu} $ and $ \boldsymbol{\sigma} $ for the latent distribution:

$$
\boldsymbol{\mu}, \boldsymbol{\sigma} = \text{encoder}(\mathbf{x}, \mathbf{W}_{\text{enc}}, \mathbf{b}_{\text{enc}})
$$

A latent variable $ \mathbf{z} $ is then sampled from this distribution:

$$
\mathbf{z} = \text{sampling}(\boldsymbol{\mu}, \boldsymbol{\sigma})
$$

This latent variable is passed through the decoder to produce a reconstruction $ \hat{\mathbf{x}} $ of the original input:

$$
\hat{\mathbf{x}} = \text{decoder}(\mathbf{z}, \mathbf{W}_{\text{dec}}, \mathbf{b}_{\text{dec}})
$$

The loss for this training step is computed as a combination of the reconstruction error between $ \mathbf{x} $ and $ \hat{\mathbf{x}} $, and a regularization term from the KL divergence between the latent distribution and the prior:

$$
\mathcal{L} = \text{loss function}(\mathbf{x}, \hat{\mathbf{x}}, \boldsymbol{\mu}, \boldsymbol{\sigma})
$$

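In a complete implementation, this loss $ \mathcal{L} $ is then used to update the model parameters $ \mathbf{W}_{\text{enc}}, \mathbf{b}_{\text{enc}}, \mathbf{W}_{\text{dec}}, \mathbf{b}_{\text{dec}} $ through backpropagation and an optimization algorithm. The `train_step` function in this notebook covers only the forward pass and the loss computation; it does not compute gradients or change the parameters.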


In [22]:
def train_step(x, enc_w, enc_b, dec_w, dec_b):
    """
    A single training step for the VAE.

    :param x: Input data for training.
    :param enc_w: Encoder weights.
    :param enc_b: Encoder biases.
    :param dec_w: Decoder weights.
    :param dec_b: Decoder biases.
    :return: Loss value for the current training step.
    """
    mu, sigma = encoder(x, enc_w, enc_b)
    z = sampling(mu, sigma)
    recon_x = decoder(z, dec_w, dec_b)
    return loss_function(x, recon_x, mu, sigma)

In [23]:
# Example usage
input_dim = 784  # Example input dimension
latent_dim = 2   # Example latent dimension
enc_w, enc_b, dec_w, dec_b = initialize_weights(input_dim, latent_dim)
x = np.random.randn(100, input_dim)  # Random input data
loss = train_step(x, enc_w, enc_b, dec_w, dec_b)
print(f"Loss: {loss}")

Loss: 84751.71519722772
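
Since `train_step` stops at the loss, here is a minimal sketch of what a parameter update could look like for this particular linear model. The gradients are derived by hand under the assumptions that $ \sigma = 1 $ (so the KL term is $ \frac{1}{2} \sum \mu^2 $) and that the noise `eps` is held fixed so the loss is deterministic. The function `loss_and_grads`, the small dimensions, the 0.1 weight scale, and the learning rate are all hypothetical choices for illustration, not part of the original code.

```python
import numpy as np

def loss_and_grads(x, enc_w, enc_b, dec_w, dec_b, eps):
    """Forward pass and hand-derived gradients for the linear VAE, sigma = 1."""
    # Forward pass: z = mu + eps because sigma is fixed to 1.
    mu = x @ enc_w + enc_b
    z = mu + eps
    recon_x = z @ dec_w + dec_b
    loss = np.mean((x - recon_x) ** 2) + 0.5 * np.sum(mu ** 2)

    # Backward pass (chain rule applied layer by layer).
    g_recon = 2.0 * (recon_x - x) / x.size   # d(MSE) / d(recon_x)
    g_dec_w = z.T @ g_recon
    g_dec_b = g_recon.sum(axis=0)
    g_mu = g_recon @ dec_w.T + mu            # reconstruction path + KL path
    g_enc_w = x.T @ g_mu
    g_enc_b = g_mu.sum(axis=0)
    return loss, (g_enc_w, g_enc_b, g_dec_w, g_dec_b)

rng = np.random.default_rng(0)
x_small = rng.standard_normal((8, 5))        # tiny batch for illustration
params = [
    0.1 * rng.standard_normal((5, 2)),       # enc_w
    np.zeros(2),                             # enc_b
    0.1 * rng.standard_normal((2, 5)),       # dec_w
    np.zeros(5),                             # dec_b
]
eps_fixed = rng.standard_normal((8, 2))      # fixed noise (assumption)

learning_rate = 1e-2
losses = []
for _ in range(50):
    loss, grads = loss_and_grads(x_small, *params, eps_fixed)
    losses.append(loss)
    # Plain gradient descent update on every parameter.
    params = [p - learning_rate * g for p, g in zip(params, grads)]

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

In practice, an automatic differentiation library would compute these gradients, and fresh noise would be drawn at every step; the fixed noise here only makes the decrease in the loss easy to verify.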
