# Principal Component Analysis vs.
# Denoising Variational Autoencoders

## _Intuition, Formalism and Examples_

jupyter nbconvert PCAvsDVAE.ipynb --to slides --post serve

# An intuitive perspective ...

#### "... realistic, high-dimensional data concentrate near a nonlinear, low-dimensional manifold ..."

![](manifold.png)

#### But how does one learn the manifold and the probability distribution on it?

![](manifold-generic.png)

# Principal Component Analysis (PCA)
* __unsupervised__ learning
* __linear transformation__ 
    * "encode" a set of observations to a new coordinate system in which the values of the first coordinate (component) have the largest possible variance [2]
    * each subsequent coordinate (component) has the largest possible variance while being uncorrelated with the preceding coordinates
* computed in practice via
    * __eigendecomposition of the covariance matrix__
    * __singular value decomposition__ of the observations
* used for __dimensionality reduction__
* __reconstructions of the observations__ ("decoding") from the leading __principal components__ have the __least total squared error__ among all linear encodings of the same dimension
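
As a minimal sketch of the two computational routes listed above, the following NumPy snippet runs both on hypothetical toy data. The data, the sizes `N`, `n`, `m` and all variable names are assumptions for illustration. It checks that the eigendecomposition and the SVD give the same loading vectors up to sign, then encodes and decodes with the `m` leading components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: N observations of size n, stored as columns.
N, n, m = 200, 3, 2
A = rng.normal(size=(n, n))
Y = A @ rng.normal(size=(n, N))

# Center the observations by subtracting the mean observation.
Y0 = Y - Y.mean(axis=1, keepdims=True)

# Route 1: eigendecomposition of the sample covariance matrix.
C = Y0 @ Y0.T / (N - 1)
eigvals, eigvecs = np.linalg.eigh(C)        # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]           # sort by decreasing variance
eigvals, P_eig = eigvals[order], eigvecs[:, order]

# Route 2: singular value decomposition of the centered observations.
U, s, _ = np.linalg.svd(Y0, full_matrices=False)
P_svd = U                                   # left singular vectors

# Both routes agree up to the sign of each loading vector.
same_directions = np.allclose(np.abs(P_eig.T @ P_svd), np.eye(n), atol=1e-6)
same_variances = np.allclose(eigvals, s**2 / (N - 1))

# "Encode" with the m leading loading vectors, then "decode" again.
P_m = P_svd[:, :m]
X0 = P_m.T @ Y0                             # shape (m, N)
Y0_hat = P_m @ X0                           # reconstruction, shape (n, N)
reconstruction_error = np.sum((Y0 - Y0_hat) ** 2)

print(same_directions, same_variances, reconstruction_error)
```

In this sketch the total squared reconstruction error equals the sum of the discarded squared singular values, `np.sum(s[m:] ** 2)`.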

## Basic mathematics of PCA

* Let $\{y_i\}^N_{i=1}$ be a set of $N$ observation vectors, each of size $n$, with $n \leq N$.

* A __linear transformation__ on a finite-dimensional vector can be expressed as a __matrix multiplication__:

$$ \begin{align} x_i = W^T y_i \end{align} $$

where $y_i \in \mathbb{R}^{n}$, $x_i \in \mathbb{R}^{m}$ and $W \in \mathbb{R}^{n \times m}$.

* The $j$-th element of $x_i$ is the __inner product__ between $y_i$ and the $j$-th column of $W$, denoted $w_j$. Let $Y \in \mathbb{R}^{n \times N}$ be the matrix obtained by horizontally concatenating $\{y_i\}^N_{i=1}$:

$$ Y = \begin{bmatrix} | & & | \\ y_1 & \cdots & y_N \\ | & & | \end{bmatrix} $$

* Given the __linear transformation__, it is clear that:

$$ X = W^TY,  X_0 = W^TY_0, $$

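where $X \in \mathbb{R}^{m \times N}$ collects the transformed vectors $x_i$ as columns and $Y_0$ is the matrix of centered observations (i.e. the mean observation is subtracted from each observation).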
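
The matrix form can be checked numerically. The snippet below is a small sketch with hypothetical sizes and random matrices (`n`, `m`, `N`, the seed and the index pair `i, j` are assumptions chosen only for illustration). It shows that one entry of $X$ is an inner product with a column of $W$, and that transforming centered observations gives centered transformed vectors.

```python
import numpy as np

# Hypothetical sizes: n input dimensions, m output dimensions, N observations.
n, m, N = 3, 2, 5
rng = np.random.default_rng(1)
Y = rng.normal(size=(n, N))      # observations y_i as columns
W = rng.normal(size=(n, m))      # arbitrary transformation, columns w_j

# Transform all observations at once: X = W^T Y.
X = W.T @ Y                      # shape (m, N)

# Element j of x_i is the inner product of y_i with column w_j.
i, j = 3, 1
x_ij = W[:, j] @ Y[:, i]

# Centering: subtract the mean observation from every column.
Y0 = Y - Y.mean(axis=1, keepdims=True)
X0 = W.T @ Y0

print(np.isclose(X[j, i], x_ij))
print(np.allclose(X0, X - X.mean(axis=1, keepdims=True)))
```

Because the transformation is linear, centering before or after applying $W^T$ gives the same $X_0$.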

* In particular, when $W^T$ represents the __transformation applying Principal Component Analysis__, we denote $W = P$. Each column of $P$, denoted $\{p_j\}^n_{j=1}$, is a __loading vector__, whereas each transformed vector $\{x_i\}^N_{i=1}$ is a __principal component__.

* The first loading vector is the unit vector with which the inner products of the observations have the __greatest variance__:

$$ \max_{w_1} \; w_1^T Y_0 Y_0^T w_1 \quad \text{subject to} \quad w_1^T w_1 = 1 $$

* The solution of this maximization problem is the eigenvector of $Y_0Y_0^T$ corresponding to the largest eigenvalue. $Y_0Y_0^T$ is the __sample covariance matrix__ up to the factor $1/(N-1)$, which does not change the eigenvectors.
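
A short sketch of why the maximizer is an eigenvector, using a Lagrange multiplier $\lambda$ for the unit-norm constraint (the symbols $\mathcal{L}$ and $\lambda$ are introduced here only for this derivation):

$$ \begin{align} \mathcal{L}(w_1, \lambda) &= w_1^T Y_0 Y_0^T w_1 - \lambda \left( w_1^T w_1 - 1 \right) \\ \frac{\partial \mathcal{L}}{\partial w_1} &= 2 Y_0 Y_0^T w_1 - 2 \lambda w_1 = 0 \quad \Rightarrow \quad Y_0 Y_0^T w_1 = \lambda w_1 \\ w_1^T Y_0 Y_0^T w_1 &= \lambda \, w_1^T w_1 = \lambda \end{align} $$

Every stationary point is therefore an eigenvector of $Y_0 Y_0^T$, and the objective value at that point is its eigenvalue, so the maximum is attained at the eigenvector with the largest eigenvalue.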

http://www.analytik.ethz.ch/vorlesungen/chemometrie/2_PCA_Monitor.pdf

# Autoencoders
* unsupervised neural network
* minimize the error of reconstructions of observations [1]
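
One common way to write this objective is sketched below. The notation is an assumption: an encoder $f_\phi$, a decoder $g_\theta$, and the squared error as the reconstruction error, none of which is fixed by the text above.

$$ \begin{align} \min_{\phi, \theta} \sum_{i=1}^{N} \left\| y_i - g_\theta(f_\phi(y_i)) \right\|^2 \end{align} $$

Here $f_\phi(y_i)$ is the code of observation $y_i$ and $g_\theta(f_\phi(y_i))$ is its reconstruction.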

https://lilianweng.github.io/lil-log/2018/08/12/from-autoencoder-to-beta-vae.html


# PCA vs. Autoencoders
* An autoencoder with a single fully-connected hidden layer, a linear activation function and a squared error cost function is closely related to PCA: its weights span the principal subspace [3]
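
The relation can be illustrated numerically. The following is a minimal sketch, not a reference implementation: it trains a linear autoencoder with plain gradient descent on hypothetical toy data and compares the subspace spanned by the decoder weights with the PCA subspace. The data, the learning rate `lr`, the step count and the names `W_enc` and `W_dec` are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, N = 3, 2, 500

# Hypothetical toy data with clearly separated variances along
# three orthogonal directions (the columns of Q).
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))
scales = np.array([3.0, 2.0, 0.3])
Y = Q @ (scales[:, None] * rng.normal(size=(n, N)))
Y0 = Y - Y.mean(axis=1, keepdims=True)
C = Y0 @ Y0.T / N

# PCA: projector onto the span of the m leading loading vectors.
U, _, _ = np.linalg.svd(Y0, full_matrices=False)
P_m = U[:, :m]
proj_pca = P_m @ P_m.T

# Linear autoencoder: x = W_enc y, y_hat = W_dec x, no biases.
# Mean squared reconstruction error: trace(E C E^T), E = W_dec W_enc - I.
W_enc = 0.1 * rng.normal(size=(m, n))
W_dec = 0.1 * rng.normal(size=(n, m))
lr = 0.01
for _ in range(5000):
    E = W_dec @ W_enc - np.eye(n)
    grad_dec = 2 * E @ C @ W_enc.T      # gradient w.r.t. decoder weights
    grad_enc = 2 * W_dec.T @ E @ C      # gradient w.r.t. encoder weights
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

# Projector onto the column span of the decoder weights.
Q_dec, _ = np.linalg.qr(W_dec)
proj_ae = Q_dec @ Q_dec.T

print(np.allclose(proj_pca, proj_ae, atol=1e-4))
```

The comparison uses projectors because the trained weights span the principal subspace but are in general not the loading vectors themselves.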


# Variational Autoencoders




# Denoising Variational Autoencoders

https://github.com/dojoteef/dvae

https://github.com/block98k/Denoise-VAE

# References and further reading
[1] Ian Goodfellow, Yoshua Bengio and Aaron Courville, Deep Learning, MIT Press, 2016.

[2] Trevor Hastie, Robert Tibshirani and Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2017.

[3] Plaut, E., 2018. From principal subspaces to principal components with linear autoencoders. arXiv preprint arXiv:1804.10253.

[4] Im, D.I.J., Ahn, S., Memisevic, R. and Bengio, Y., 2017, February. Denoising criterion for variational auto-encoding framework. In Thirty-First AAAI Conference on Artificial Intelligence.

[5] Rolinek, M., Zietlow, D. and Martius, G., 2019. Variational Autoencoders Pursue PCA Directions (by Accident). In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 12406-12415).

[6] Lei, N., Luo, Z., Yau, S.T. and Gu, D.X., 2018. Geometric understanding of deep learning. arXiv preprint arXiv:1805.10451.

[7] Kingma, D.P. and Welling, M., 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114.