# Unsupervised Representation Learning

### What is a good representation and why do we need it?

Skim the first three pages of [Representation Learning: A Review and New Perspectives](https://arxiv.org/abs/1206.5538) and consider how and why we learn a good representation of data.

Even though the datasets we have used so far always had labels, $y$, attached to $x$, this is rarely the case in practice: most high-level abstraction labels are assigned by humans, which is costly and time-consuming. So how can we train a classifier with few or no labels? One option is to learn a good representation in which similar data points move closer together and separate from dissimilar points, so that they can easily be clustered with, for example, k-means, a Gaussian mixture model or unsupervised KNN.

Solution: We can learn a latent representation holding the internal generative structure of the data, $x$, which can be used for unsupervised classification.
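
As a rough illustration of clustering in a representation space, the sketch below runs a plain k-means on two synthetic 2-D blobs. The blobs only stand in for a learned latent representation, and the function names `assign` and `kmeans` are hypothetical helpers written for this example, not part of any library.

```python
import numpy as np

def assign(Z, centroids):
    """Return the index of the nearest centroid for every row of Z."""
    d = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

def kmeans(Z, k, n_iter=50, seed=0):
    """Plain k-means (Lloyd's algorithm) on representations Z of shape (n, d)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct data points.
    centroids = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(Z, centroids)
        # Move each centroid to the mean of its points (keep it if empty).
        new_centroids = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assign(Z, centroids), centroids

# Toy "latent space": two well-separated blobs (an assumption for illustration).
rng = np.random.default_rng(0)
Z = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=10.0, scale=0.5, size=(50, 2)),
])
labels, centroids = kmeans(Z, k=2)
```

The cluster indices are arbitrary, so mapping them to class names still requires a handful of labelled points or manual inspection.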

Common shallow methods decompose the data into new feature maps. This is called blind signal separation.

The most commonly used method is Principal Component Analysis (PCA), where the components of the feature map are ordered by how much of the variance in the features they explain. Transforming the features with the first $k$ principal components (columns) of the loadings matrix, $L$, gives a low-dimensional representation that favours the directions of maximum variance. The diagonal variance matrix, $D$ (eigenvalues), and the loadings matrix, $L$ (eigenvectors), can be obtained through the eigenvalue decomposition of the covariance matrix:

$$ [L, D] = \mathrm{eig}(\mathrm{cov}(X)) $$
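
A minimal NumPy sketch of this decomposition follows. It assumes `X` has shape `(n_samples, n_features)`; the function name `pca_eig` and the synthetic data are illustrative only. Note that `np.linalg.eigh` returns eigenvalues in ascending order, so they are re-sorted to put the direction of largest variance first.

```python
import numpy as np

def pca_eig(X, k):
    """Project X onto its first k principal components.

    Returns the projected data Z, the loadings L (eigenvectors as columns)
    and the variances D (eigenvalues), both sorted by descending variance.
    """
    Xc = X - X.mean(axis=0)                # centre each feature
    C = np.cov(Xc, rowvar=False)           # (n_features, n_features) covariance
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]      # largest variance first
    D = eigvals[order]
    L = eigvecs[:, order]
    Z = Xc @ L[:, :k]                      # low-dimensional representation
    return Z, L, D

# Synthetic data with very different variance per feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.1])
Z, L, D = pca_eig(X, k=2)
explained = D / D.sum()                    # fraction of variance per component
```

The sample variance of each column of `Z` equals the corresponding entry of `D`, which is what "ordered according to the variance they explain" means in practice.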



### Autoencoder (AE)
$$ x \rightarrow z \rightarrow \hat{x} $$
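
The mapping above can be sketched with a tiny autoencoder trained by gradient descent on the reconstruction error. This is only a sketch under stated assumptions: the encoder and decoder are single linear maps without biases (a real autoencoder would usually be nonlinear), the data is synthetic, and names such as `W_enc`, `W_dec` and `reconstruct` are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 5-D data that actually lives on a 2-D subspace.
X = rng.normal(size=(256, 2)) @ rng.normal(size=(2, 5))

d_in, d_z = X.shape[1], 2
W_enc = rng.normal(scale=0.1, size=(d_in, d_z))   # encoder: x -> z
W_dec = rng.normal(scale=0.1, size=(d_z, d_in))   # decoder: z -> x_hat

def reconstruct(X):
    Z = X @ W_enc          # latent code z
    return Z, Z @ W_dec    # reconstruction x_hat

lr = 0.02
losses = []
for step in range(1000):
    Z, X_hat = reconstruct(X)
    err = X_hat - X
    losses.append(np.mean(err ** 2))       # mean squared reconstruction error
    grad_out = 2.0 * err / err.size        # d(loss) / d(X_hat)
    grad_W_dec = Z.T @ grad_out
    grad_W_enc = X.T @ (grad_out @ W_dec.T)
    W_dec -= lr * grad_W_dec
    W_enc -= lr * grad_W_enc

Z, X_hat = reconstruct(X)                  # final codes and reconstructions
```

The bottleneck `d_z < d_in` is what forces `z` to be a compressed representation; with linear maps and squared error the learned subspace is closely related to the one PCA finds.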

### Denoising Autoencoder (DAE)
$$ x \rightarrow \tilde{x} = x + \text{noise} \rightarrow z \rightarrow \hat{x} $$
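
The key difference from a plain autoencoder is that the network sees a corrupted input but is scored against the clean one. The sketch below assumes additive Gaussian noise (other corruptions, such as masking, are also common); `corrupt`, `dae_loss`, `encode` and `decode` are hypothetical names, and the identity functions used at the end are placeholders for a real encoder and decoder.

```python
import numpy as np

def corrupt(x, rng, noise_std=0.1):
    """Return a noisy copy x_tilde of the clean input x."""
    return x + rng.normal(scale=noise_std, size=x.shape)

def dae_loss(x, encode, decode, rng, noise_std=0.1):
    """Denoising reconstruction loss: encode the corrupted input,
    but compare the reconstruction with the clean input x."""
    x_tilde = corrupt(x, rng, noise_std)
    x_hat = decode(encode(x_tilde))
    return np.mean((x_hat - x) ** 2)

# Placeholder "network": identity encoder and decoder, so the loss is
# just the noise power. A trained DAE should do better than this.
rng = np.random.default_rng(0)
x = np.zeros((1000, 10))
baseline = dae_loss(x, encode=lambda v: v, decode=lambda v: v,
                    rng=rng, noise_std=0.5)
```

With the identity placeholders the loss is roughly `noise_std ** 2`, which gives a baseline that a trained denoiser should beat.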


### Variational Autoencoder (VAE)
$$ x \rightarrow z \sim q(z|x) \rightarrow \hat{x} \sim p(x|z) $$
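
The stochastic step in the diagram above is usually implemented with the reparameterisation trick. The sketch below assumes, which the text does not state, that the encoder outputs the mean `mu` and log-variance `log_var` of a diagonal Gaussian over $z$ (often written $q(z|x)$) and that the prior is a standard normal; the function names are illustrative.

```python
import numpy as np

def sample_z(mu, log_var, rng):
    """Reparameterisation trick: z = mu + sigma * eps with eps ~ N(0, I).

    Sampling eps separately keeps z a deterministic, differentiable
    function of mu and log_var.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), one value per sample."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)

rng = np.random.default_rng(0)
mu = np.ones((4, 2))          # pretend encoder outputs for 4 inputs
log_var = np.zeros((4, 2))    # unit variance
z = sample_z(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
```

In training, this KL term is added to the reconstruction error from the decoder, so the codes stay close to the prior while still carrying enough information to rebuild $x$.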

