In [3]:
import numpy as np

The following code provides a basic, function-based implementation of a VAE. Each function is commented to describe its purpose and how it contributes to the model. In the last cell, the `train_step` function ties everything together, showing how data flows through the encoder, the sampling step, and the decoder, and how the loss is calculated. This is a simplified version for educational purposes; real-world applications typically use deeper networks and a deep learning library.

## Initializing weights

When initializing the weights and biases for the encoder and decoder in a Variational Autoencoder, we typically draw the initial weights from a normal distribution and set the biases to zero. Let $ \mathbf{W}_{\text{enc}} $ and $ \mathbf{W}_{\text{dec}} $ represent the weight matrices for the encoder and decoder, respectively, and $ \mathbf{b}_{\text{enc}} $ and $ \mathbf{b}_{\text{dec}} $ represent the bias vectors. The dimensions of these matrices and vectors are determined by the dimensions of the input data and the latent space:

$$
\mathbf{W}_{\text{enc}} \sim \mathcal{N}(0, 1), \quad \mathbf{W}_{\text{enc}} \in \mathbb{R}^{\text{input dim} \times \text{latent dim}}
$$
$$
\mathbf{b}_{\text{enc}} = \mathbf{0}, \quad \mathbf{b}_{\text{enc}} \in \mathbb{R}^{\text{latent dim}}
$$
$$
\mathbf{W}_{\text{dec}} \sim \mathcal{N}(0, 1), \quad \mathbf{W}_{\text{dec}} \in \mathbb{R}^{\text{latent dim} \times \text{input dim}}
$$
$$
\mathbf{b}_{\text{dec}} = \mathbf{0}, \quad \mathbf{b}_{\text{dec}} \in \mathbb{R}^{\text{input dim}}
$$

Drawing every weight from $ \mathcal{N}(0, 1) $ keeps the example short, but the variance is not scaled by the layer width, so the outputs of each linear layer grow with the number of inputs.


In [4]:
def initialize_weights(input_dim, latent_dim):
    """
    Initialize the weights and biases for the encoder and decoder.

    :param input_dim: Dimensionality of the input data.
    :param latent_dim: Dimensionality of the latent space.
    :return: A tuple of weights and biases (enc_w, enc_b, dec_w, dec_b).
    """
    enc_w = np.random.randn(input_dim, latent_dim)
    enc_b = np.zeros(latent_dim)
    dec_w = np.random.randn(latent_dim, input_dim)
    dec_b = np.zeros(input_dim)
    return enc_w, enc_b, dec_w, dec_b

In [5]:
input_dim = 784  # Example input dimension
latent_dim = 2   # Example latent dimension
enc_w, enc_b, dec_w, dec_b = initialize_weights(input_dim, latent_dim)

In [6]:
enc_w.size

1568
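
The count of 1568 is `input_dim * latent_dim` (784 × 2). The scale of the draw matters as well. The following is a minimal sketch, assuming standard-normal inputs like those used later in this notebook, that compares the spread of the encoder's linear output for unit-variance weights against a variant divided by `sqrt(input_dim)`. The scaled variant and the `*_demo` names are illustrative assumptions, not part of the implementation above.

```python
import numpy as np

rng = np.random.default_rng(0)
in_dim_demo, latent_dim_demo, batch_demo = 784, 2, 1000

x_demo = rng.standard_normal((batch_demo, in_dim_demo))

# Unit-variance weights, the same distribution initialize_weights draws from.
w_unit_demo = rng.standard_normal((in_dim_demo, latent_dim_demo))
# Hypothetical alternative: shrink the same draw by sqrt(input_dim).
w_scaled_demo = w_unit_demo / np.sqrt(in_dim_demo)

std_unit = (x_demo @ w_unit_demo).std()
std_scaled = (x_demo @ w_scaled_demo).std()

print(f"std with N(0, 1) weights:           {std_unit:.2f}")
print(f"std with weights / sqrt(input_dim): {std_scaled:.2f}")
```

With unit-variance weights the spread is close to $ \sqrt{784} = 28 $, while the scaled variant keeps it near 1.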

## Encoder

The encoder in a Variational Autoencoder maps the input data $ \mathbf{x} $ to the parameters of a latent distribution. Specifically, it computes the mean $ \boldsymbol{\mu} $ and standard deviation $ \boldsymbol{\sigma} $ of the latent space distribution. The mean is computed as a linear transformation of the input data with a weight matrix $ \mathbf{W}_{\text{enc}} $ and a bias vector $ \mathbf{b}_{\text{enc}} $. For simplicity, the standard deviation is assumed to be fixed:

$$
\boldsymbol{\mu} = \mathbf{x} \mathbf{W}_{\text{enc}} + \mathbf{b}_{\text{enc}}
$$

$$
\boldsymbol{\sigma} = \mathbf{1}
$$

Here $ \mathbf{1} $ denotes an array of ones with the same shape as $ \boldsymbol{\mu} $.


In [7]:
def encoder(x, enc_w, enc_b):
    """
    The encoder function that maps the input to the latent space.

    :param x: Input data.
    :param enc_w: Encoder weights.
    :param enc_b: Encoder biases.
    :return: Mean (mu) and standard deviation (sigma) of the latent space distribution.
    """
    mu = np.dot(x, enc_w) + enc_b
    sigma = np.ones(mu.shape)  # Assuming a fixed standard deviation
    return mu, sigma
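
For comparison, a common alternative to the fixed standard deviation is to let the encoder predict a log-variance alongside the mean. The sketch below is an assumption for illustration only: the function `encoder_with_logvar` and the second weight matrix `w_logvar` do not exist in this notebook, and the small weight scale is chosen only to keep `exp` in a comfortable range.

```python
import numpy as np

def encoder_with_logvar(x, w_mu, b_mu, w_logvar, b_logvar):
    """Hypothetical encoder variant that predicts mean and log-variance."""
    mu = x @ w_mu + b_mu
    log_var = x @ w_logvar + b_logvar
    sigma = np.exp(0.5 * log_var)  # exp keeps sigma strictly positive
    return mu, sigma

rng = np.random.default_rng(0)
in_dim_demo, latent_dim_demo = 784, 2
x_demo = rng.standard_normal((5, in_dim_demo))

# Small illustrative weights so that exp() stays numerically tame.
w_mu = 0.01 * rng.standard_normal((in_dim_demo, latent_dim_demo))
w_logvar = 0.01 * rng.standard_normal((in_dim_demo, latent_dim_demo))
b_zero = np.zeros(latent_dim_demo)

mu_demo, sigma_demo = encoder_with_logvar(x_demo, w_mu, b_zero, w_logvar, b_zero)
print(mu_demo.shape, sigma_demo.shape)
```

Predicting the log-variance rather than the standard deviation directly means the linear layer can output any real number while `sigma` stays positive.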

In [8]:
x = np.random.randn(100, input_dim)  # Random input data
mu, sigma = encoder(x, enc_w, enc_b)

In [9]:
mu

array([[ 31.84215112,  18.4446171 ],
       [-12.92450276,  64.99140038],
       [  3.03866093,   2.22427563],
       [ -4.11366312,  16.58756981],
       [ -4.39058369, -10.62579385],
       [ 43.97913981,  18.74311296],
       [-12.94964908,  22.09646499],
       [-34.9839025 ,  83.54324054],
       [  8.46465696, -24.45603504],
       [-42.66305087,   4.8495479 ],
       [-17.01702975, -24.45242835],
       [ 44.15808078,  17.68679119],
       [-13.61595207,  18.65836144],
       [ 28.41908738,  70.18911245],
       [-19.44315722,  34.4889411 ],
       [-29.00803889, -53.72001225],
       [-24.81433463, -10.9910458 ],
       [-15.28726885,  -0.10059178],
       [ 11.74361542, -35.46596385],
       [ 41.8495574 ,  48.58975048],
       [-37.5564656 , -20.4067944 ],
       [ 15.57921759,  37.22530866],
       [ 16.72871821,  11.93798056],
       [  4.24876049,  -4.56299685],
       [ 20.41127361,   6.14363229],
       [ 41.89709087,   6.07309175],
       [ 46.38195749, -26.68483984],
 

## Sampling

In the VAE, a random sample $ \boldsymbol{\epsilon} $ is drawn from a standard normal distribution. This sample is then scaled elementwise by the standard deviation $ \boldsymbol{\sigma} $ and shifted by the mean $ \boldsymbol{\mu} $ to produce the latent variable $ \mathbf{z} $ (the reparameterization trick):

$$
\mathbf{z} = \boldsymbol{\mu} + \boldsymbol{\sigma} \odot \boldsymbol{\epsilon}
$$


In [13]:
def sampling(mu, sigma):
    """
    Sampling function to sample from the latent space distribution.

    :param mu: Mean of the latent space distribution.
    :param sigma: Standard deviation of the latent space distribution.
    :return: Sampled latent variable.
    """
    eps = np.random.randn(*mu.shape)
    return mu + sigma * eps
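
As a quick sanity check of the shift-and-scale step, the sketch below draws many samples for one fixed pair of mean and standard deviation and compares the empirical statistics with the targets. The values in `mu_demo` and `sigma_demo` and the sample count are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative parameters for a 2-dimensional latent space.
mu_demo = np.array([3.0, -1.0])
sigma_demo = np.array([1.0, 0.5])

# Same computation as sampling(): z = mu + sigma * eps, broadcast over many draws.
eps_demo = rng.standard_normal((100_000, 2))
z_demo = mu_demo + sigma_demo * eps_demo

print("empirical mean:", z_demo.mean(axis=0))
print("empirical std: ", z_demo.std(axis=0))
```

The empirical mean and standard deviation should land close to `mu_demo` and `sigma_demo`, which is what makes `z` a sample from the intended Gaussian.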

In [14]:
sample = sampling(mu, sigma)

In [15]:
sample

array([[ 32.72153578,  17.23722627],
       [-12.63093904,  63.74864285],
       [  2.45277016,   1.85457519],
       [ -4.67685513,  17.19637801],
       [ -3.52709133,  -8.81530127],
       [ 43.65871159,  19.55955331],
       [-14.2537434 ,  24.05102122],
       [-35.39905019,  82.74305034],
       [  7.54047792, -24.21408225],
       [-42.30888159,   4.23175032],
       [-18.42457295, -25.83420253],
       [ 44.77847584,  14.87898252],
       [-12.53948338,  17.59151684],
       [ 29.26408463,  72.17314733],
       [-19.15289944,  34.20912149],
       [-29.58152732, -53.28987589],
       [-24.40759222, -11.99700683],
       [-16.15816136,   1.10963132],
       [ 12.37632705, -35.77236487],
       [ 40.86027632,  49.89172566],
       [-38.48876469, -20.40126653],
       [ 14.3180555 ,  37.5147953 ],
       [ 14.77285301,  12.20697339],
       [  6.59755417,  -3.38227218],
       [ 21.42864232,   6.22135028],
       [ 40.9090912 ,   6.97801677],
       [ 47.5154429 , -26.13530575],
 

## Decoder

The output of the decoder is calculated as the dot product of the latent variable $ \mathbf{z} $ and the decoder weights $ \mathbf{W}_{\text{dec}} $, plus the decoder bias $ \mathbf{b}_{\text{dec}} $:

$$
\mathbf{o} = \mathbf{z} \mathbf{W}_{\text{dec}} + \mathbf{b}_{\text{dec}}
$$


In [16]:
def decoder(z, dec_w, dec_b):
    """
    The decoder function that maps the latent space back to the input space.

    :param z: Sampled latent variable.
    :param dec_w: Decoder weights.
    :param dec_b: Decoder biases.
    :return: Reconstructed input.
    """
    return np.dot(z, dec_w) + dec_b
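
Because this decoder is a single linear map, every reconstruction is a combination of the `latent_dim` rows of the decoder weights. The sketch below illustrates that with a zero bias, as at initialization; the `*_demo` names are illustrative and do not overwrite the notebook's variables.

```python
import numpy as np

rng = np.random.default_rng(0)
in_dim_demo, latent_dim_demo, batch_demo = 784, 2, 100

z_demo = rng.standard_normal((batch_demo, latent_dim_demo))
w_dec_demo = rng.standard_normal((latent_dim_demo, in_dim_demo))
b_dec_demo = np.zeros(in_dim_demo)

# (100, 2) @ (2, 784) -> (100, 784); the bias broadcasts across the batch.
recon_demo = z_demo @ w_dec_demo + b_dec_demo

print("shape:", recon_demo.shape)
print("rank: ", np.linalg.matrix_rank(recon_demo))
```

The reconstruction has the full input shape, but its rank is limited by the latent dimension, which is the bottleneck the latent space imposes.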

In [17]:
recon_x = decoder(sample, dec_w, dec_b)

In [18]:
recon_x

array([[ -35.28206563,  -82.83484423,   31.90688011, ...,   41.09382012,
          13.83334433,   42.99562771],
       [-163.22330987,  -61.14319342,  106.54160504, ...,  -56.9769415 ,
          14.85357888,   15.896075  ],
       [  -4.05761783,   -6.95319401,    3.34134047, ...,    2.7518588 ,
           1.19827091,    3.48251245],
       ...,
       [  30.82310001,   26.73971919,  -22.05664065, ...,   -0.89659981,
          -5.12322392,  -11.63526597],
       [ -30.43088138,   33.36960826,   14.15481005, ...,  -44.96856069,
          -4.06176446,  -22.47570235],
       [-127.62966993,  -43.0355889 ,   82.69961257, ...,  -48.21488702,
          10.88602279,    9.71674577]])

## Loss Function

The loss function for the VAE consists of two parts: the reconstruction loss and the KL divergence. The reconstruction loss is calculated as the mean squared error between the original input $ \mathbf{x} $ and the reconstructed input $ \hat{\mathbf{x}} $, averaged over all elements:

$$
\text{recon loss} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{x}_i)^2
$$

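The KL divergence loss measures how far the latent distribution is from the standard normal prior. It is calculated using the mean $ \mu $ and the standard deviation $ \sigma $ of the latent space distribution, with the sum running over every latent entry: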

$$
\text{KL loss} = -\frac{1}{2} \sum_{i=1}^{n} \left(1 + \log(\sigma_i^2) - \mu_i^2 - \sigma_i^2\right)
$$
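
With the fixed $ \sigma_i = 1 $ used in this notebook, the KL term simplifies, as a short worked step shows:

$$
\text{KL loss} = -\frac{1}{2} \sum_{i=1}^{n} \left(1 + \log(1) - \mu_i^2 - 1\right) = \frac{1}{2} \sum_{i=1}^{n} \mu_i^2
$$

Assuming standard-normal inputs, a zero encoder bias, and unit-variance encoder weights (the setup at initialization here), each $ \mu_i $ has variance equal to the input dimension. For a batch of 100 inputs and a latent dimension of 2, the expected value is therefore:

$$
\mathbb{E}[\text{KL loss}] = \frac{1}{2} \cdot (100 \cdot 2) \cdot 784 = 78400
$$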


In [19]:
def loss_function(x, recon_x, mu, sigma):
    """
    Loss function for the VAE, combining reconstruction loss and KL divergence.

    :param x: Original input data.
    :param recon_x: Reconstructed input data.
    :param mu: Mean of the latent space distribution.
    :param sigma: Standard deviation of the latent space distribution.
    :return: Total loss value.
    """
    recon_loss = np.mean((x - recon_x) ** 2)
    kl_loss = -0.5 * np.sum(1 + np.log(sigma**2) - mu**2 - sigma**2)
    return recon_loss + kl_loss

In [20]:
loss = loss_function(x, recon_x, mu, sigma)

In [21]:
loss

79382.41793811017
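
To see which term drives a value of this size, the two terms can be computed separately. The sketch below repeats the forward pass on freshly drawn data under the same assumptions as the cells above (standard-normal inputs, unit-variance weights, zero biases, fixed `sigma`). The helper `loss_terms` is an illustrative addition, not part of the notebook's API.

```python
import numpy as np

def loss_terms(x, recon_x, mu, sigma):
    """Same two terms as loss_function, returned separately (illustrative helper)."""
    recon_loss = np.mean((x - recon_x) ** 2)
    kl_loss = -0.5 * np.sum(1 + np.log(sigma**2) - mu**2 - sigma**2)
    return recon_loss, kl_loss

rng = np.random.default_rng(0)
in_dim_demo, latent_dim_demo, batch_demo = 784, 2, 100

x_demo = rng.standard_normal((batch_demo, in_dim_demo))
w_enc_demo = rng.standard_normal((in_dim_demo, latent_dim_demo))
w_dec_demo = rng.standard_normal((latent_dim_demo, in_dim_demo))

# Forward pass with zero biases and sigma fixed at 1.
mu_demo = x_demo @ w_enc_demo
sigma_demo = np.ones_like(mu_demo)
z_demo = mu_demo + sigma_demo * rng.standard_normal(mu_demo.shape)
recon_demo = z_demo @ w_dec_demo

recon_loss, kl_loss = loss_terms(x_demo, recon_demo, mu_demo, sigma_demo)
print(f"reconstruction: {recon_loss:.1f}")
print(f"KL:             {kl_loss:.1f}")
```

In this setup the KL term is a sum over the batch while the reconstruction term is a mean over all elements, so the KL term accounts for most of the total.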

## Training step

A training step in a Variational Autoencoder involves several stages. Given an input $ \mathbf{x} $, the encoder generates parameters $ \boldsymbol{\mu} $ and $ \boldsymbol{\sigma} $ for the latent distribution:

$$
\boldsymbol{\mu}, \boldsymbol{\sigma} = \text{encoder}(\mathbf{x}, \mathbf{W}_{\text{enc}}, \mathbf{b}_{\text{enc}})
$$

A latent variable $ \mathbf{z} $ is then sampled from this distribution:

$$
\mathbf{z} = \text{sampling}(\boldsymbol{\mu}, \boldsymbol{\sigma})
$$

This latent variable is passed through the decoder to produce a reconstruction $ \hat{\mathbf{x}} $ of the original input:

$$
\hat{\mathbf{x}} = \text{decoder}(\mathbf{z}, \mathbf{W}_{\text{dec}}, \mathbf{b}_{\text{dec}})
$$

The loss for this training step is computed as a combination of the reconstruction error between $ \mathbf{x} $ and $ \hat{\mathbf{x}} $, and a regularization term from the KL divergence between the latent distribution and the prior:

$$
\mathcal{L} = \text{loss function}(\mathbf{x}, \hat{\mathbf{x}}, \boldsymbol{\mu}, \boldsymbol{\sigma})
$$

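In a full implementation, this loss $ \mathcal{L} $ is then used to update the model parameters $ \mathbf{W}_{\text{enc}}, \mathbf{b}_{\text{enc}}, \mathbf{W}_{\text{dec}}, \mathbf{b}_{\text{dec}} $ through backpropagation and an optimization algorithm. The `train_step` function below covers only the forward pass and the loss computation; it does not compute gradients or update the parameters.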


In [22]:
def train_step(x, enc_w, enc_b, dec_w, dec_b):
    """
    A single training step for the VAE: forward pass and loss only, no parameter update.

    :param x: Input data for training.
    :param enc_w: Encoder weights.
    :param enc_b: Encoder biases.
    :param dec_w: Decoder weights.
    :param dec_b: Decoder biases.
    :return: Loss value for the current training step.
    """
    mu, sigma = encoder(x, enc_w, enc_b)
    z = sampling(mu, sigma)
    recon_x = decoder(z, dec_w, dec_b)
    return loss_function(x, recon_x, mu, sigma)

In [23]:
# Example usage
input_dim = 784  # Example input dimension
latent_dim = 2   # Example latent dimension
enc_w, enc_b, dec_w, dec_b = initialize_weights(input_dim, latent_dim)
x = np.random.randn(100, input_dim)  # Random input data
loss = train_step(x, enc_w, enc_b, dec_w, dec_b)
print(f"Loss: {loss}")

Loss: 77914.00597357837
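
The step above stops at the loss value. The following is a minimal sketch of what a parameter update could look like for this particular linear model, with gradients derived by hand under the assumption that `sigma` is fixed at 1 (so the KL term is `0.5 * sum(mu**2)`). The function names `forward` and `sgd_step`, the plain gradient-descent rule, and the learning rate of `1e-4` are all illustrative assumptions rather than part of the notebook.

```python
import numpy as np

def forward(x, enc_w, enc_b, dec_w, dec_b, eps):
    """Forward pass with sigma fixed at 1; eps is passed in explicitly."""
    mu = x @ enc_w + enc_b
    z = mu + eps
    recon_x = z @ dec_w + dec_b
    loss = np.mean((x - recon_x) ** 2) + 0.5 * np.sum(mu ** 2)
    return mu, z, recon_x, loss

def sgd_step(x, params, eps, learning_rate=1e-4):
    """One hand-derived gradient-descent update (illustrative sketch)."""
    enc_w, enc_b, dec_w, dec_b = params
    mu, z, recon_x, loss = forward(x, enc_w, enc_b, dec_w, dec_b, eps)

    # Gradient of the mean squared error with respect to the reconstruction.
    d_recon = 2.0 * (recon_x - x) / x.size
    grad_dec_w = z.T @ d_recon
    grad_dec_b = d_recon.sum(axis=0)

    # Gradient reaching mu: through the decoder (z = mu + eps) plus the KL term.
    d_mu = d_recon @ dec_w.T + mu
    grad_enc_w = x.T @ d_mu
    grad_enc_b = d_mu.sum(axis=0)

    new_params = (
        enc_w - learning_rate * grad_enc_w,
        enc_b - learning_rate * grad_enc_b,
        dec_w - learning_rate * grad_dec_w,
        dec_b - learning_rate * grad_dec_b,
    )
    return new_params, loss

rng = np.random.default_rng(0)
in_dim_demo, latent_dim_demo, batch_demo = 784, 2, 100
x_demo = rng.standard_normal((batch_demo, in_dim_demo))
params_demo = (
    rng.standard_normal((in_dim_demo, latent_dim_demo)),
    np.zeros(latent_dim_demo),
    rng.standard_normal((latent_dim_demo, in_dim_demo)),
    np.zeros(in_dim_demo),
)

losses = []
for _ in range(20):
    eps_demo = rng.standard_normal((batch_demo, latent_dim_demo))
    params_demo, step_loss = sgd_step(x_demo, params_demo, eps_demo)
    losses.append(step_loss)

print(f"first loss: {losses[0]:.1f}, last loss: {losses[-1]:.1f}")
```

Each recorded loss is measured before that step's update, so the last value reflects 19 updates. In practice an automatic differentiation library would replace the hand-written gradients.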
