# Latent Variable Models


## What is a latent variable model?

**A latent variable model is a statistical model in which the noisy observed variables depend on the values of hidden ("latent") variables.**

### Example
Consider the population of a hospital, where each of the $N$ patients has taken $D$ measurements or tests (e.g. body temperature, white blood cell count, etc.). We can collect the $j$th patient's tests into a $D$-dimensional vector $$\mathbf{y_j} = [y_{j1},y_{j2},...,y_{jD}], $$ and then collect all $N$ patients into the rows of a matrix $$\mathbf{Y} \in \mathbb{R}^{N \times D} \text{ where } \mathbf{Y}_{j,:} = \mathbf{y_{j}} $$

We may hypothesize that each patient's set of results $\mathbf{y_j}$ is the noisy expression of a number of underlying factors, i.e. which diseases they have: flu, chicken pox, typhoid, etc. We could group these into a vector $\mathbf{x_j}$ for each patient, and again stack the vectors into a matrix $\mathbf{X}$.

So: we hypothesize our observations $\mathbf{Y}$ are noisy (or probabilistic) expressions of our latents $\mathbf{X}$, such that the generative process for our data is:
$$\text{1. sample the latents from the prior: } \mathbf{X} \sim p(\mathbf{X})$$
$$\text{2. sample the observations from the conditional: } \mathbf{Y} \sim p(\mathbf{Y}|\mathbf{X})$$
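The two sampling steps can be sketched in NumPy. Everything specific here is an assumption made for illustration, not something the example above fixes: the number of diseases `K`, the independent Bernoulli prior with rates `disease_rates`, and a linear-Gaussian conditional with hypothetical effects `W`, `baseline`, and `noise_std`.

```python
import numpy as np

rng = np.random.default_rng(0)

N, D, K = 500, 4, 3  # patients, tests, latent diseases (K is an illustrative choice)

# Step 1: sample the latents from an assumed prior p(X).
# Each disease is present independently with a small probability.
disease_rates = np.array([0.10, 0.05, 0.01])  # hypothetical prevalences
X = (rng.random((N, K)) < disease_rates).astype(float)  # shape (N, K), entries 0 or 1

# Step 2: sample the observations from an assumed conditional p(Y | X).
# Each disease shifts each test result linearly, on top of a healthy
# baseline, and Gaussian noise models measurement error.
W = rng.normal(size=(K, D))    # hypothetical disease-to-test effects
baseline = rng.normal(size=D)  # hypothetical healthy test values
noise_std = 0.1
Y = baseline + X @ W + noise_std * rng.normal(size=(N, D))  # shape (N, D)
```

Row `j` of `Y` plays the role of $\mathbf{y_j}$ and row `j` of `X` the role of $\mathbf{x_j}$; in a real problem only `Y` would be observed.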

Our aim as statisticians may be to infer the most likely latent state $\mathbf{x_j}$ for each patient's medical records $\mathbf{y_j}$, in this case so we give the correct treatment! In fact, if we wanted to be Bayesian, we might actually want to find the whole posterior distribution $p(\mathbf{x_j}|\mathbf{y_j})$, which gives the range of possible latent states for the observation.
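
For reference, the posterior mentioned above follows from Bayes' rule. Assuming patients are independent of one another (an assumption added here, so that each $\mathbf{x_j}$ can be treated on its own), it could be written as:
$$p(\mathbf{x_j}|\mathbf{y_j}) = \frac{p(\mathbf{y_j}|\mathbf{x_j})\,p(\mathbf{x_j})}{p(\mathbf{y_j})}, \qquad p(\mathbf{y_j}) = \sum_{\mathbf{x}} p(\mathbf{y_j}|\mathbf{x})\,p(\mathbf{x})$$
where the sum becomes an integral when the latents are continuous. The most likely latent state is then $\hat{\mathbf{x}}_j = \arg\max_{\mathbf{x}} p(\mathbf{x}|\mathbf{y_j})$, which does not require computing the normalizer $p(\mathbf{y_j})$.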


There are also a number of related reasons why we may wish to infer the value of these latent variables and their mapping to observations: 
  * it may be easier to perform some tasks  (like classification etc) in this latent space
  * it may help us generate new data artificially
  * it may give us some understanding or intuition about the world we are looking at
  


## PPCA and PCA

A lot of people are familiar with Principal Component Analysis (PCA). This is the method where you take a cloud of data, subtract the mean, and rotate it to find the orthogonal axes that explain the greatest variance. 
These 'principal components' can be considered the most 'informative' directions in the data, and a form of dimensionality reduction is to keep only the leading $k$ axes.
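
As a minimal sketch of that procedure, the snippet below centers the data, finds the principal axes with a singular value decomposition, and keeps the leading `k`. The function name `pca` and the synthetic data are illustrative choices, and the SVD is one of several equivalent ways to compute the components (an eigendecomposition of the covariance matrix is another).

```python
import numpy as np

def pca(Y, k):
    """Project the rows of Y onto their k leading principal components."""
    # Subtract the mean of each column so the data cloud is centered.
    Y_centered = Y - Y.mean(axis=0)
    # Rows of Vt are orthonormal principal axes, ordered by decreasing
    # singular value, i.e. by decreasing explained variance.
    U, S, Vt = np.linalg.svd(Y_centered, full_matrices=False)
    components = Vt[:k]                                 # shape (k, D)
    explained_variance = S[:k] ** 2 / (Y.shape[0] - 1)  # variance along each axis
    Z = Y_centered @ components.T                       # shape (N, k)
    return Z, components, explained_variance

# Synthetic data whose columns have very different scales (illustration only).
rng = np.random.default_rng(0)
Y = rng.normal(size=(200, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])
Z, components, explained_variance = pca(Y, k=2)
```

Here `Z` is the reduced, two-dimensional representation of the data, and `explained_variance` gives the sample variance of the data along each retained axis.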

What people may be less familiar with is that PCA is exactly the noise-free limit of a simple latent variable model, probabilistic PCA (PPCA). 

