# 1. Non-linear Dimensionality Reduction

## 1.1. Introduction

* **Applications of Dimensionality Reduction**

>* Modelling data on/near manifolds
>* Visualization of high-dimensional data
>* Simple building blocks for complex models (e.g. FA $\rightarrow$ LGSSMs, GPLVM $\rightarrow$ GPSSMs)

* **Dimensionality Reduction: Conceptual Space**

><img src = 'images/image1_01.png' width=500>

## 1.2. DR via Distance

* **Goal:** find a mapping from the data $y$ to a low-dimensional representation $x$ that preserves pairwise distances

>$$d^{(y)}_{nm} = ||y^{(n)}-y^{(m)}|| \approx d^{(x)}_{nm} = ||x^{(n)}-x^{(m)}||$$

* **PCA:** Linear DR

>* Data

>$$\mathcal{D} = \{ \mathbf{y}_{1}, \cdots , \mathbf{y}_{N} \} \;\;\;,\;\;\; \mathbf{y}_{n} \in \mathbb{R}^D \;\;\;,\;\;\; \text{w.l.o.g. assume} \;\;\; \frac{1}{N} \sum_n \mathbf{y}_{n} = \mathbf{0}$$

>* Linear projection

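>$$\mathbf{x}_n = \mathbf{w}^T \mathbf{y}_n \;\;\;,\;\;\; \mathbf{w} \in \mathbb{R}^D \;\;\;,\;\;\; \mathbf{x}_{n} \in \mathbb{R} \;\; \text{(a scalar for a single direction } \mathbf{w} \text{)}$$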

>* Variance

>$$\text{Var}(x) = \frac{1}{N} \sum_n \mathbf{x}_n \mathbf{x}_n^T = \frac{1}{N} \sum_n \mathbf{w}^T \mathbf{y}_n \mathbf{y}_n^T \mathbf{w} = \mathbf{w}^T \left( \frac{1}{N} \sum_n \mathbf{y}_n \mathbf{y}_n^T \right) \mathbf{w} = \mathbf{w}^T \mathbf{\Sigma}_y \mathbf{w}$$

>* Objective function (maximize the variance subject to the constraint $\mathbf{w}^T \mathbf{w} = 1$, enforced with a Lagrange multiplier $\lambda$)

>$$\mathbf{w}^* = \underset{\mathbf{w}}{\text{argmax}} \; \mathbf{w}^T \mathbf{\Sigma}_y \mathbf{w} - \lambda(\mathbf{w}^T \mathbf{w} - 1)$$

>* Solution

>$$\mathbf{\Sigma}_y \mathbf{w}^* = \lambda \mathbf{w}^*$$

>$$\mathbf{w}^*: \text{ eigenvectors of } \mathbf{\Sigma}_y \text{ in order of decreasing eigenvalue (at most } D \text{; keep the leading ones)}$$
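
The eigenvalue solution above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `pca`, the target dimension `M`, and the toy data are assumptions introduced here, and the covariance uses the same $1/N$ convention as the variance formula.

```python
import numpy as np

def pca(Y, M):
    """Project the rows of Y (N x D) onto the M leading principal directions."""
    Y = Y - Y.mean(axis=0)                     # centre the data (zero-mean assumption)
    Sigma = Y.T @ Y / Y.shape[0]               # D x D covariance, 1/N convention
    eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:M]      # indices of the M largest eigenvalues
    W = eigvecs[:, order]                      # D x M, columns are unit-norm directions
    return Y @ W, W, eigvals[order]

# Hypothetical toy data: 200 points in 3-D with very different scales per axis.
rng = np.random.default_rng(0)
Y = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.1])
X, W, lam = pca(Y, M=2)
```

The variance of each projected coordinate equals the corresponding eigenvalue, which is the statement $\mathbf{w}^T \mathbf{\Sigma}_y \mathbf{w} = \lambda$ for a unit-norm eigenvector.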

* **ISOMAP:** Non-linear DR / geodesic distance via neighbourhood graph

>1. Determine the **neighbors** of each point (e.g. kNN)
>2. Construct a **neighborhood graph** (connect each point to its kNNs / edge length: Euclidean distance)
>3. Compute the **shortest path** between every pair of nodes (e.g. Dijkstra's algorithm, Floyd-Warshall algorithm, ...)
>4. Compute **lower-dimensional embedding** (MDS - multidimensional scaling)
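
Steps 1 to 3 can be sketched as follows. This is an illustrative sketch under stated assumptions: the helper name `geodesic_distances` and the semicircle test data are made up here, the graph is symmetrised so that edges are undirected, Floyd-Warshall is picked for brevity, and the data are assumed to contain no duplicate points (so each point's nearest "neighbor" in the sorted list is itself).

```python
import numpy as np

def geodesic_distances(Y, k):
    """Approximate geodesic distances between the rows of Y via a kNN graph."""
    N = Y.shape[0]
    # Pairwise Euclidean distances (N x N).
    diff = Y[:, None, :] - Y[None, :, :]
    E = np.sqrt((diff ** 2).sum(axis=-1))
    # Steps 1-2: kNN graph; np.inf marks "no edge".
    G = np.full((N, N), np.inf)
    np.fill_diagonal(G, 0.0)
    nn = np.argsort(E, axis=1)[:, 1:k + 1]     # column 0 is the point itself
    rows = np.repeat(np.arange(N), k)
    cols = nn.ravel()
    G[rows, cols] = E[rows, cols]
    G[cols, rows] = E[rows, cols]              # symmetrise: undirected edges
    # Step 3: Floyd-Warshall all-pairs shortest paths, O(N^3).
    for m in range(N):
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    return G

# Hypothetical example: 20 points on a unit semicircle.
t = np.linspace(0.0, np.pi, 20)
Y = np.column_stack([np.cos(t), np.sin(t)])
G = geodesic_distances(Y, k=2)
# G[0, -1] follows the arc (length close to pi), not the straight chord (length 2).
```

If the neighborhood graph is disconnected, some entries of `G` stay infinite, so in practice `k` has to be large enough to connect the graph.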

* **Dijkstra's Algorithm**

>1. Create the **unvisited set** (containing all nodes)
>1. Assign a **tentative distance** to every node ($0$ for initial node, $\infty$ for others) and set the initial node as the **current node**
>1. For every **unvisited neighbour** of the current node,
>  1. calculate the tentative distance through the current node
>  1. update if it is smaller than the current value
>1. **Remove** the current node from the unvisited set
>1. **Stop if:**
>  1. destination node is visited (when planning a route between two specific nodes)
>  1. the unvisited set is empty or its smallest tentative distance is $\infty$ (when planning a complete traversal)
>1. **Otherwise:**
>  1. Current node $\leftarrow$ unvisited node with the smallest tentative distance
>  1. Go back to **Step 3**
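
The procedure above can be sketched in plain Python. Note that this sketch swaps the linear scan for the smallest tentative distance with a binary heap (`heapq`), a common variant; the function name `dijkstra`, the adjacency-dictionary format, and the example graph are assumptions made for illustration.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from source.

    graph maps each node to a dict {neighbour: non-negative edge length}.
    """
    dist = {node: float("inf") for node in graph}   # tentative distances
    dist[source] = 0.0
    visited = set()
    heap = [(0.0, source)]                          # (tentative distance, node)
    while heap:
        d, u = heapq.heappop(heap)                  # closest unvisited node
        if u in visited:
            continue                                # stale heap entry, skip
        visited.add(u)                              # u's distance is now final
        for v, w in graph[u].items():
            if v not in visited and d + w < dist[v]:
                dist[v] = d + w                     # shorter route through u
                heapq.heappush(heap, (dist[v], v))
    return dist

# Hypothetical undirected example graph.
graph = {
    "a": {"b": 7, "c": 9, "f": 14},
    "b": {"a": 7, "c": 10, "d": 15},
    "c": {"a": 9, "b": 10, "d": 11, "f": 2},
    "d": {"b": 15, "c": 11, "e": 6},
    "e": {"d": 6, "f": 9},
    "f": {"a": 14, "c": 2, "e": 9},
}
dist = dijkstra(graph, "a")
```

Nodes that cannot be reached from `source` keep a distance of infinity, which corresponds to the second stopping condition in the list.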

* **MDS** (a.k.a. **PCoA** - Principal Coordinates Analysis)

>1. Set up the **squared proximity matrix**
>$$$$
>$$D^{(2)} = [d^2_{ij}]$$
>$$$$
>2. Apply **double centering** ($n$: no. of objects) 
>$$$$
>$$B = -\frac{1}{2} JD^{(2)}J \;\;\;,\;\;\; J=I-\frac{1}{n} \mathbf{1} \mathbf{1}^T$$
>$$$$
>3. Determine $m$ largest **eigenvalues and corresponding eigenvectors** of $B$ ($m$: desired dimension)
>4. $X=E_m \Lambda_m^{1/2}$ ($E_m$: matrix of eigenvectors / $\Lambda_m$: diagonal matrix of eigenvalues)
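
The four MDS steps translate almost directly into NumPy. The sketch below is illustrative only: the function name `classical_mds` and the random test points are assumptions, and clipping small negative eigenvalues to zero is an added numerical safeguard that the steps above do not mention.

```python
import numpy as np

def classical_mds(D, m):
    """Embed n objects in m dimensions from an n x n distance matrix D."""
    n = D.shape[0]
    D2 = D ** 2                                  # step 1: squared proximities
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ D2 @ J                        # step 2: double centering
    eigvals, eigvecs = np.linalg.eigh(B)         # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:m]        # step 3: m largest eigenvalues
    lam = np.clip(eigvals[order], 0.0, None)     # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(lam)      # step 4: X = E_m Lambda_m^{1/2}

# Hypothetical check: distances between random 2-D points are reproduced.
rng = np.random.default_rng(0)
P = rng.normal(size=(10, 2))
D = np.linalg.norm(P[:, None] - P[None, :], axis=-1)
X = classical_mds(D, m=2)
D_rec = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
```

The embedding `X` matches the original points only up to rotation, reflection, and translation, which is why the check compares distance matrices rather than coordinates. Feeding in graph shortest-path distances instead of Euclidean ones gives the final ISOMAP step.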