---
title: Neural Encoder Decoder
layout: collection
permalink: /Machine-Learning/Neural-Encoder-Decoder
collection: Machine-Learning
entries_layout: grid
mathjax: true
toc: true
categories:
  - study
tags:
  - mathematics
  - statistics
  - machine-learning 
---

# Non-Linear Latent Variable Models

In a non-linear latent variable model, the likelihood is Gaussian. Its mean vector is a non-linear function $\mathbf{f}$ (typically a neural network) of the latent variable $\mathbf{z}$ and the parameters $\phi$:

$$
\mathbf{\mu} = \mathbf{f} (\mathbf{z} , \phi ) 
$$

The prior and likelihood are then:

$$
\begin{align*}
    \mathbb{P}(\mathbf{z}) &= \mathcal{N}(0, I) \\
    \mathbb{P}(\mathbf{x} | \mathbf{z}, \phi ) &= \mathcal{N}(\mathbf{f}(\mathbf{z}, \phi ), \sigma^2 I )
\end{align*}
$$

Given an observation $\mathbf{x}$, we would like to know which latent variables $\mathbf{z}$ were responsible for generating it. This is given by the posterior:

$$
\mathbb{P}(\mathbf{z} | \mathbf{x}, \phi ) = \frac{\mathbb{P}(\mathbf{x} | \mathbf{z}, \phi )\mathbb{P}(\mathbf{z})}{\mathbb{P}(\mathbf{x} | \phi )}
$$

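The posterior has no closed form. Because the mean is a non-linear function of $\mathbf{z}$, the product $\mathbb{P}(\mathbf{x} | \mathbf{z}, \phi)\mathbb{P}(\mathbf{z})$ is no longer Gaussian in $\mathbf{z}$. For the same reason we cannot evaluate the evidence $\mathbb{P}(\mathbf{x} | \phi)$ in the denominator.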
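
The numerator can always be evaluated pointwise; only the normalizer is hard. For a scalar latent, a grid can stand in for the integral. The sketch below is a minimal illustration under stated assumptions: a one-dimensional $z$, a hypothetical mean function $f(z) = \tanh(3z)$, and $\sigma = 0.3$. The number of grid points grows exponentially with the dimension of $\mathbf{z}$, so this does not scale to realistic models.

```python
import numpy as np

# Hypothetical 1-D model: z ~ N(0, 1), x | z ~ N(f(z), SIGMA^2).
SIGMA = 0.3

def f(z):
    """Hypothetical non-linear mean function standing in for f(z, phi)."""
    return np.tanh(3.0 * z)

def log_normal(x, mean, var):
    """Log density of a univariate Gaussian N(x; mean, var)."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def grid_posterior(x, z_grid):
    """Posterior P(z | x) on a uniform grid: evaluate the numerator, then normalise numerically."""
    log_joint = log_normal(x, f(z_grid), SIGMA**2) + log_normal(z_grid, 0.0, 1.0)
    dz = z_grid[1] - z_grid[0]
    # Riemann-sum approximation of log P(x), computed with log-sum-exp for stability
    log_evidence = np.logaddexp.reduce(log_joint) + np.log(dz)
    return np.exp(log_joint - log_evidence), log_evidence

z_grid = np.linspace(-6.0, 6.0, 4001)
post, log_px = grid_posterior(0.5, z_grid)
```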

Sampling from this model, however, is easy. Draw a latent variable $\mathbf{z}$ from the prior and pass it through the non-linear function to get the mean $\mathbf{f}(\mathbf{z}, \phi)$. Then draw $\mathbf{x}$ from the likelihood with this mean.
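
The ancestral sampling procedure just described takes only a few lines. This is a minimal sketch that assumes a hypothetical one-hidden-layer tanh network with random weights as $\mathbf{f}(\mathbf{z}, \phi)$. The dimensions and the name `phi` for the weight dictionary are illustrative choices, not part of the model definition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical decoder f(z, phi): one tanh hidden layer, random weights play the role of phi.
LATENT_DIM, HIDDEN_DIM, DATA_DIM = 2, 16, 5
phi = {
    "W1": rng.normal(size=(HIDDEN_DIM, LATENT_DIM)),
    "b1": np.zeros(HIDDEN_DIM),
    "W2": rng.normal(size=(DATA_DIM, HIDDEN_DIM)),
    "b2": np.zeros(DATA_DIM),
}

def f(z, phi):
    """Non-linear mean: maps latents of shape (n, LATENT_DIM) to means of shape (n, DATA_DIM)."""
    h = np.tanh(z @ phi["W1"].T + phi["b1"])
    return h @ phi["W2"].T + phi["b2"]

def sample(n, phi, sigma, rng):
    """Ancestral sampling: z ~ N(0, I), then x ~ N(f(z, phi), sigma^2 I)."""
    z = rng.standard_normal((n, LATENT_DIM))
    mu = f(z, phi)
    x = mu + sigma * rng.standard_normal(mu.shape)
    return z, x

z, x = sample(1000, phi, sigma=0.1, rng=rng)
```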

Writing out the evidence, i.e. the marginal likelihood of $\mathbf{x}$, gives

$$
\begin{align*}
    \mathbb{P}(\mathbf{x} | \phi ) 
    &= 
    \int \mathbb{P}(\mathbf{x}, \mathbf{z} | \phi ) d \mathbf{z}  \\
    &=
    \int \mathbb{P}(\mathbf{x} | \mathbf{z}, \phi  ) \mathbb{P}(\mathbf{z} ) d \mathbf{z} \\
    &=
    \int \mathcal{N}(\mathbf{x}; \mathbf{f}(\mathbf{z}, \phi ), \sigma^2 I ) \, \mathcal{N}(\mathbf{z}; 0, I) \, d \mathbf{z}
\end{align*}
$$
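
Since the evidence is an expectation under the prior, the most direct approximation is to average the likelihood over prior samples. The sketch below assumes a generic callable `f`. As a check it uses a linear mean $\mathbf{f}(\mathbf{z}) = W\mathbf{z} + \mathbf{b}$, the one case where the integral is Gaussian, with covariance $WW^\top + \sigma^2 I$. The values of `W`, `b`, `sigma` and `x` are hypothetical.

```python
import numpy as np

def log_gauss_iso(x, mean, sigma):
    """log N(x; mean, sigma^2 I), summed over the last axis."""
    d = x.shape[-1]
    return -0.5 * (d * np.log(2 * np.pi * sigma**2)
                   + np.sum((x - mean) ** 2, axis=-1) / sigma**2)

def mc_log_evidence(x, f, sigma, latent_dim, n_samples, rng):
    """Estimate log P(x | phi) = log E_{z ~ N(0, I)}[N(x; f(z), sigma^2 I)] by simple Monte Carlo."""
    z = rng.standard_normal((n_samples, latent_dim))        # z_s ~ P(z)
    log_lik = log_gauss_iso(x, f(z), sigma)                  # log P(x | z_s, phi), shape (n_samples,)
    return np.logaddexp.reduce(log_lik) - np.log(n_samples)  # log-mean-exp for stability

def exact_log_evidence_linear(x, W, b, sigma):
    """Closed-form log P(x | phi) when f(z) = W z + b, since then x ~ N(b, W W^T + sigma^2 I)."""
    d = x.shape[0]
    cov = W @ W.T + sigma**2 * np.eye(d)
    diff = x - b
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(cov, diff))

# Hypothetical linear test case, chosen only because its evidence is known exactly
rng = np.random.default_rng(0)
W = np.array([[0.5, 0.2], [-0.3, 0.8], [0.4, -0.6]])
b = np.zeros(3)
sigma = 1.0
x = np.array([0.3, -0.2, 0.5])
est = mc_log_evidence(x, lambda z: z @ W.T + b, sigma, latent_dim=2, n_samples=100_000, rng=rng)
exact = exact_log_evidence_linear(x, W, b, sigma)
```

The average inside the logarithm is an unbiased estimate of $\mathbb{P}(\mathbf{x} | \phi)$. Its variance, however, grows quickly when the likelihood is sharply peaked relative to the prior (small $\sigma$, many latent dimensions), because almost no prior samples land where the likelihood is large.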

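Because $\mathbf{f}$ is an arbitrary non-linear function, this integral has no closed form in general. We can, however, bound its logarithm from below using Jensen's inequality. Multiply and divide by any density $q(\mathbf{z})$; since $\log$ is concave, the log of an expectation is at least the expectation of the log.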

$$
\begin{align*}
    \log[\mathbb{P}(\mathbf{x} | \phi ) ] 
    &= 
    \log \left[\int \mathbb{P}(\mathbf{x}, \mathbf{z} | \phi ) d \mathbf{z}  \right] \\
    &=
    \log \left[\int q(\mathbf{z})  \frac{\mathbb{P}(\mathbf{x}, \mathbf{z} | \phi )}{q(\mathbf{z} )}  d \mathbf{z}  \right] \\
    &\geq
    \int q(\mathbf{z})  \log \left[ \frac{\mathbb{P}(\mathbf{x}, \mathbf{z} | \phi )}{q(\mathbf{z} )}\right]  d \mathbf{z}
\end{align*}
$$

This holds for any distribution $q(\mathbf{z})$ over the latent variables. The right-hand side is called the evidence lower bound (ELBO).
We assume that $q$ has parameters $\mathbf{\theta}$.
The ELBO is then

$$
ELBO[\mathbf{\theta}, \phi ] = \int q(\mathbf{z} | \mathbf{\theta}) \log \left[ \frac{\mathbb{P}(\mathbf{x}, \mathbf{z} | \phi  ) }{q(\mathbf{z} | \mathbf{\theta}  )}  \right] d \mathbf{z}
$$
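
The ELBO is an expectation under $q$, so it can be estimated by sampling from $q$. This is a minimal sketch under two assumptions. First, $q(\mathbf{z}|\mathbf{\theta})$ is a diagonal Gaussian $\mathcal{N}(\mathbf{m}, \mathrm{diag}(\mathbf{s}^2))$ with $\mathbf{\theta} = (\mathbf{m}, \log \mathbf{s})$, a common choice that the text above does not fix. Second, the test case uses a hypothetical linear mean $W\mathbf{z} + \mathbf{b}$, because its exact log evidence is available for comparison.

```python
import numpy as np

def log_gauss_iso(x, mean, sigma):
    """log N(x; mean, sigma^2 I), summed over the last axis."""
    d = x.shape[-1]
    return -0.5 * (d * np.log(2 * np.pi * sigma**2)
                   + np.sum((x - mean) ** 2, axis=-1) / sigma**2)

def log_gauss_diag(z, mean, std):
    """log N(z; mean, diag(std^2)), summed over the last axis."""
    return -0.5 * np.sum(np.log(2 * np.pi * std**2) + ((z - mean) / std) ** 2, axis=-1)

def elbo_mc(x, f, sigma, m, log_s, n_samples, rng):
    """Monte Carlo ELBO with q(z | theta) = N(m, diag(exp(log_s)^2)), theta = (m, log_s)."""
    s = np.exp(log_s)
    z = m + s * rng.standard_normal((n_samples, m.shape[0]))               # z ~ q(z | theta)
    log_joint = log_gauss_iso(x, f(z), sigma) + log_gauss_diag(z, 0.0, 1.0)  # log P(x|z,phi) + log P(z)
    log_q = log_gauss_diag(z, m, s)
    return np.mean(log_joint - log_q)

def exact_log_evidence_linear(x, W, b, sigma):
    """Closed-form log P(x | phi) when f(z) = W z + b: x ~ N(b, W W^T + sigma^2 I)."""
    d = x.shape[0]
    cov = W @ W.T + sigma**2 * np.eye(d)
    diff = x - b
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(cov, diff))

# Hypothetical linear test case with q set to the prior N(0, I)
rng = np.random.default_rng(0)
W = np.array([[1.0, 0.5], [-0.3, 2.0], [0.8, -1.2]])
b = np.zeros(3)
sigma = 0.5
x = np.array([1.0, -0.5, 2.0])
exact = exact_log_evidence_linear(x, W, b, sigma)
elbo = elbo_mc(x, lambda z: z @ W.T + b, sigma,
               m=np.zeros(2), log_s=np.zeros(2), n_samples=20_000, rng=rng)
```

With $q$ set to the prior, as here, the ELBO reduces to $\mathbb{E}_{\mathbb{P}(\mathbf{z})}[\log \mathbb{P}(\mathbf{x}|\mathbf{z},\phi)]$. This is Jensen's inequality applied directly to the marginal likelihood integral, and the estimate falls below `exact`.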

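We want the tightest possible lower bound, i.e. one that approximates the log evidence as closely as possible, so we maximize the ELBO as a function of both $\mathbf{\theta}$ and $\phi$. To see what this achieves, factor the joint distribution as $\mathbb{P}(\mathbf{x}, \mathbf{z} | \phi) = \mathbb{P}(\mathbf{z} | \mathbf{x}, \phi)\,\mathbb{P}(\mathbf{x} | \phi)$: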

$$
\begin{align*}
    ELBO[\mathbf{\theta}, \phi]
    &=
    \int q(\mathbf{z} | \mathbf{\theta}) \log \left[ \frac{\mathbb{P}(\mathbf{x}, \mathbf{z} | \phi)}{q(\mathbf{z}| \mathbf{\theta})} \right] d \mathbf{z} \\
    &=
    \int q(\mathbf{z} | \mathbf{\theta}) \log \left[ \frac{\mathbb{P}(\mathbf{z} | \mathbf{x}, \phi) \mathbb{P}(\mathbf{x} | \phi)}{q(\mathbf{z}| \mathbf{\theta})} \right] d \mathbf{z} \\
    &=
    \int q(\mathbf{z} | \mathbf{\theta}) \log [\mathbb{P}(\mathbf{x} | \phi)] d \mathbf{z} + \int q(\mathbf{z} | \mathbf{\theta}) \log \left[ \frac{\mathbb{P}(\mathbf{z} | \mathbf{x}, \phi)}{q(\mathbf{z}|\mathbf{\theta})} \right] d \mathbf{z} \\
    &=
    \log [\mathbb{P}(\mathbf{x} | \phi)] + \int q(\mathbf{z} | \mathbf{\theta}) \log \left[ \frac{\mathbb{P}(\mathbf{z} | \mathbf{x}, \phi)}{q(\mathbf{z}|\mathbf{\theta})} \right] d \mathbf{z} \\
    &=
    \log [\mathbb{P}(\mathbf{x} | \phi)] - \mathbb{KL}[q(\mathbf{z}|\mathbf{\theta}) || \mathbb{P}(\mathbf{z}|\mathbf{x}, \phi)]
\end{align*}
$$
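
For a linear mean $\mathbf{f}(\mathbf{z}) = W\mathbf{z} + \mathbf{b}$, the posterior is Gaussian, with covariance $\Sigma = (I + W^\top W/\sigma^2)^{-1}$ and mean $\Sigma W^\top(\mathbf{x}-\mathbf{b})/\sigma^2$. That makes the decomposition above checkable numerically. The sketch assumes hypothetical values for `W`, `b`, `sigma` and `x`. When $q$ is the exact posterior, $\log \mathbb{P}(\mathbf{x},\mathbf{z}|\phi) - \log q(\mathbf{z})$ equals $\log \mathbb{P}(\mathbf{x}|\phi)$ for every sample. For a mismatched Gaussian $q$, the Monte Carlo ELBO falls short by the KL divergence.

```python
import numpy as np

def log_gauss_full(z, mean, cov):
    """log N(z; mean, cov) for a batch z of shape (n, k)."""
    k = mean.shape[0]
    diff = z - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=-1)
    return -0.5 * (k * np.log(2 * np.pi) + logdet + maha)

def log_joint_linear(x, z, W, b, sigma):
    """log P(x, z | phi) for f(z) = W z + b, likelihood N(f(z), sigma^2 I), prior N(0, I)."""
    d, k = W.shape
    resid = x - (z @ W.T + b)
    log_lik = -0.5 * (d * np.log(2 * np.pi * sigma**2) + np.sum(resid**2, axis=-1) / sigma**2)
    log_prior = -0.5 * (k * np.log(2 * np.pi) + np.sum(z**2, axis=-1))
    return log_lik + log_prior

def exact_posterior_linear(x, W, b, sigma):
    """Gaussian posterior P(z | x, phi) of the linear model: returns (mean, covariance)."""
    k = W.shape[1]
    cov = np.linalg.inv(np.eye(k) + W.T @ W / sigma**2)
    mean = cov @ W.T @ (x - b) / sigma**2
    return mean, cov

def exact_log_evidence_linear(x, W, b, sigma):
    """Closed-form log P(x | phi) of the linear model: x ~ N(b, W W^T + sigma^2 I)."""
    d = x.shape[0]
    cov = W @ W.T + sigma**2 * np.eye(d)
    diff = x - b
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(cov, diff))

def kl_gauss(m0, S0, m1, S1):
    """KL[N(m0, S0) || N(m1, S1)] between two full-covariance Gaussians."""
    k = m0.shape[0]
    S1_inv = np.linalg.inv(S1)
    diff = m1 - m0
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - k + logdet1 - logdet0)

rng = np.random.default_rng(1)
W = np.array([[1.0, 0.5], [-0.3, 2.0], [0.8, -1.2]])
b = np.zeros(3)
sigma = 0.5
x = np.array([1.0, -0.5, 2.0])
log_px = exact_log_evidence_linear(x, W, b, sigma)
post_mean, post_cov = exact_posterior_linear(x, W, b, sigma)

# q equal to the posterior: every sample of log P(x, z) - log q(z) equals log P(x)
z = post_mean + rng.standard_normal((1000, 2)) @ np.linalg.cholesky(post_cov).T
ratio = log_joint_linear(x, z, W, b, sigma) - log_gauss_full(z, post_mean, post_cov)

# A mismatched q: the ELBO falls short of log P(x) by KL[q || posterior]
q_mean, q_cov = post_mean + 0.3, 2.0 * post_cov
zq = q_mean + rng.standard_normal((100_000, 2)) @ np.linalg.cholesky(q_cov).T
elbo_q = np.mean(log_joint_linear(x, zq, W, b, sigma) - log_gauss_full(zq, q_mean, q_cov))
kl_q = kl_gauss(q_mean, q_cov, post_mean, post_cov)
```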

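The KL divergence is non-negative and zero only when its two arguments are equal. So for fixed $\phi$, the ELBO is maximized, and the bound is tight, exactly when $q(\mathbf{z}|\mathbf{\theta}) = \mathbb{P}(\mathbf{z}|\mathbf{x}, \phi)$. Maximizing over $\phi$ changes the model itself, so it can raise the log evidence as well as shrink the KL term.

We can also write the ELBO in a second form by factoring the joint the other way, as $\mathbb{P}(\mathbf{x}, \mathbf{z} | \phi) = \mathbb{P}(\mathbf{x} | \mathbf{z}, \phi)\,\mathbb{P}(\mathbf{z})$: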

$$
\begin{align*}
    ELBO[\mathbf{\theta}, \phi] 
    &= 
    \int q(\mathbf{z} |\mathbf{\theta} ) \log \left[ \frac{\mathbb{P}(\mathbf{x}, \mathbf{z} | \phi)}{q(\mathbf{z}|\mathbf{\theta})} \right] d \mathbf{z} \\
    &=
    \int q(\mathbf{z} |\mathbf{\theta} ) \log \left[ \frac{\mathbb{P}(\mathbf{x} | \mathbf{z}, \phi) \mathbb{P}(\mathbf{z})}{q(\mathbf{z}|\mathbf{\theta})} \right] d \mathbf{z} \\
    &=
    \int q(\mathbf{z} | \mathbf{\theta}) \log [\mathbb{P}(\mathbf{x} | \mathbf{z}, \phi)] d \mathbf{z} + \int q(\mathbf{z}|\mathbf{\theta}) \log \left[ \frac{\mathbb{P}(\mathbf{z})}{q(\mathbf{z}|\mathbf{\theta})} \right] d \mathbf{z} \\
    &=
    \int q(\mathbf{z} | \mathbf{\theta}) \log [\mathbb{P}(\mathbf{x} | \mathbf{z}, \phi)] d \mathbf{z} - \mathbb{KL}[q(\mathbf{z}|\mathbf{\theta}) || \mathbb{P}(\mathbf{z})]
\end{align*}
$$
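
In this form, the first term rewards $q$ for placing mass on latents that reconstruct $\mathbf{x}$ well, and the KL term keeps $q$ close to the prior. Assume $q(\mathbf{z}|\mathbf{\theta}) = \mathcal{N}(\mathbf{m}, \mathrm{diag}(\mathbf{s}^2))$ is a diagonal Gaussian, the usual choice in variational autoencoders. Then the KL term has a closed form, and the reconstruction term can be estimated with samples $\mathbf{z} = \mathbf{m} + \mathbf{s} \odot \boldsymbol{\epsilon}$, $\boldsymbol{\epsilon} \sim \mathcal{N}(0, I)$. This is a minimal sketch; the function names and test values are hypothetical.

```python
import numpy as np

def kl_diag_gauss_to_std_normal(m, log_s):
    """Closed-form KL[N(m, diag(s^2)) || N(0, I)] with s = exp(log_s)."""
    return 0.5 * np.sum(np.exp(2 * log_s) + m**2 - 1.0 - 2 * log_s)

def reconstruction_term(x, f, sigma, m, log_s, n_samples, rng):
    """Monte Carlo estimate of E_q[log P(x | z, phi)] with z = m + s * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal((n_samples, m.shape[0]))
    z = m + np.exp(log_s) * eps  # reparameterised sample from q(z | theta)
    d = x.shape[-1]
    log_lik = -0.5 * (d * np.log(2 * np.pi * sigma**2)
                      + np.sum((x - f(z)) ** 2, axis=-1) / sigma**2)
    return np.mean(log_lik)

def elbo_second_form(x, f, sigma, m, log_s, n_samples, rng):
    """ELBO = E_q[log P(x | z, phi)] - KL[q(z | theta) || P(z)]."""
    return (reconstruction_term(x, f, sigma, m, log_s, n_samples, rng)
            - kl_diag_gauss_to_std_normal(m, log_s))

# Check the closed-form KL against a Monte Carlo estimate for hypothetical m and s
rng = np.random.default_rng(0)
m = np.array([0.5, -1.0])
log_s = np.log(np.array([0.7, 1.3]))
s = np.exp(log_s)
z = m + s * rng.standard_normal((200_000, 2))
log_q = -0.5 * np.sum(np.log(2 * np.pi * s**2) + ((z - m) / s) ** 2, axis=-1)
log_p = -0.5 * np.sum(np.log(2 * np.pi) + z**2, axis=-1)
kl_mc = np.mean(log_q - log_p)
kl_closed = kl_diag_gauss_to_std_normal(m, log_s)

# Full ELBO for a hypothetical linear mean f(z) = W z
W = np.array([[1.0, 0.5], [-0.3, 2.0], [0.8, -1.2]])
elbo = elbo_second_form(np.array([1.0, -0.5, 2.0]), lambda z: z @ W.T, 0.5, m, log_s, 10_000, rng)
```

Writing the sample as a deterministic function of $(\mathbf{m}, \mathbf{s})$ and independent noise is the reparameterization trick. It lets gradients flow through the sampling step when the ELBO is optimized with automatic differentiation.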