# SVD Basics - Optimizations
These optimizations make SVD more robust by:
* Reducing sensitivity to outliers (large, sparse errors)
* Handling missing data without biasing the low-rank approximation
* Using alternative norms or iterative methods to approximate SVD

## Background

### Norms

**Norms**
* Functions that assign a non-negative real number to each element of a vector space and satisfy the following properties:
  * Non-negativity: always $\geq 0$, and $=0$ only for the zero vector.
  * Absolute homogeneity: if an element is multiplied by a scalar $a$, its norm is multiplied by $|a|$.
  * Triangle inequality: the norm of a sum is $\leq$ the sum of the norms of the individual elements.
* Norms give abstract vector spaces a notion of size. They generalize the ideas of "length" and "distance" beyond physical space.

**$L_p$ vector norms**
* $L_2$ norm (Euclidean norm)
  * Size doesn't change if the vector is rotated.
$$
||\overline{x}||_2 = \sqrt{\sum_{i=1}^n x_i^2}
$$

* $L_1$ norm (Manhattan norm)
  * Penalizes large values less heavily than the squared $L_2$ norm, so it is less sensitive to outliers.
$$
||\overline{x}||_1 = \sum_{i=1}^n |x_i|
$$

* $L_\infty$ norm (Infinity/Chebyshev norm)
  * Measures the largest absolute value in the vector.
$$
||\overline{x}||_\infty = \max_{1 \leq i \leq n} |x_i|
$$
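
The three vector norms above can be checked numerically. The snippet below is a minimal sketch using NumPy's `np.linalg.norm`; the example vector and the rotation angle are arbitrary choices made for illustration. It also shows that a rotation leaves the $L_2$ norm unchanged while the $L_1$ norm generally changes.

```python
import numpy as np

x = np.array([3.0, -4.0])  # arbitrary example vector

l2 = np.linalg.norm(x, ord=2)         # sqrt(3^2 + 4^2) = 5.0
l1 = np.linalg.norm(x, ord=1)         # |3| + |-4| = 7.0
linf = np.linalg.norm(x, ord=np.inf)  # max(|3|, |-4|) = 4.0

# Rotate x by 30 degrees (angle chosen arbitrarily).
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
x_rot = R @ x

l2_rot = np.linalg.norm(x_rot, ord=2)  # same as l2, up to rounding
l1_rot = np.linalg.norm(x_rot, ord=1)  # differs from l1

print(l2, l1, linf)
print(l2_rot, l1_rot)
```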

**Matrix norms**
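* Frobenius norm - square root of the sum of squared entries; measures the total energy of the matrix.
* Induced (operator) norm - the largest factor by which the matrix can stretch a vector, measured in an $L_p$ norm; for $p=2$ it equals the largest singular value.
* Nuclear norm - sum of the singular values.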
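
As a quick check of how these matrix norms relate to the singular values, here is a small NumPy sketch. The matrix `A` is an arbitrary example; `np.linalg.norm` accepts `'fro'`, `2`, and `'nuc'` as `ord` values for 2-D inputs.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])  # arbitrary example matrix

# Singular values only (no U, V needed here).
s = np.linalg.svd(A, compute_uv=False)

fro = np.linalg.norm(A, ord='fro')      # sqrt of the sum of squared entries
spectral = np.linalg.norm(A, ord=2)     # induced 2-norm
nuclear = np.linalg.norm(A, ord='nuc')  # nuclear norm

# Each norm can also be written in terms of the singular values.
print(fro, np.sqrt(np.sum(s ** 2)))  # Frobenius = sqrt(sum of s_i^2)
print(spectral, s[0])                # spectral  = largest singular value
print(nuclear, np.sum(s))            # nuclear   = sum of singular values
```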

## Optimizations

**Outlier rejection**
* Outliers are data points that deviate significantly from the expected distribution.
* E.g. corrupted data entries due to transmission errors in 5G networks.
* Three steps:
    1. Detect outliers using statistical methods (e.g., Z-score, IQR).
    2. Threshold and remove outliers. 
    3. Compute SVD on the cleaned data.
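
The three steps above can be sketched as follows. This is an illustrative example, not a definitive recipe: it assumes rows are samples and columns are features, uses a per-column Z-score with a threshold of 3, and drops any row containing a flagged entry. The helper name `remove_outlier_rows` and the synthetic data are hypothetical.

```python
import numpy as np

def remove_outlier_rows(X, z_thresh=3.0):
    """Drop rows in which any entry has a column-wise |Z-score| above z_thresh."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)  # avoid division by zero
    z = np.abs((X - mu) / sigma)
    keep = (z <= z_thresh).all(axis=1)        # True for rows without outliers
    return X[keep], keep

# Synthetic rank-1 data with a single corrupted entry (assumed for illustration).
X = np.outer(np.linspace(1.0, 2.0, 50), [1.0, 2.0, 3.0])
X[7, 1] = 100.0

# Steps 1 and 2: detect and remove outliers.
X_clean, keep = remove_outlier_rows(X)

# Step 3: compute the SVD on the cleaned data.
s_raw = np.linalg.svd(X, compute_uv=False)
s_clean = np.linalg.svd(X_clean, compute_uv=False)

print("singular values with outlier:   ", s_raw)
print("singular values without outlier:", s_clean)
```

With the corrupted row removed, only one singular value is significantly different from zero, which matches the rank-1 structure of the underlying data. Note that mean and standard deviation are themselves affected by outliers, so median-based or IQR-based detection may be preferable when contamination is heavy.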

**Missing data imputation**
* SVD requires a fully observed matrix (no missing entries).
* Expectation-maximization (EM) algorithm:
  * Initialize the missing values with the mean or median.
  * Compute the SVD of the filled matrix.
  * Update the missing values from the low-rank reconstruction.
  * Iterate until convergence.
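
A minimal sketch of this iterative scheme is shown below. It assumes missing entries are marked with `NaN`, initializes them with column means, and uses a fixed target rank; the function name `svd_impute` and its parameters (`rank`, `n_iter`, `tol`) are hypothetical choices for illustration.

```python
import numpy as np

def svd_impute(X, rank, n_iter=100, tol=1e-6):
    """Fill NaN entries of X by iterating a rank-`rank` SVD reconstruction."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Initialization: replace missing values with the column means.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    for _ in range(n_iter):
        # Compute the SVD of the current filled matrix and truncate it.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]

        # Update only the missing entries; observed entries stay fixed.
        new_filled = np.where(missing, low_rank, X)

        # Stop when the imputed values no longer change much.
        change = np.linalg.norm(new_filled - filled)
        filled = new_filled
        if change < tol:
            break
    return filled

# Example: a rank-1 matrix with two entries removed (assumed data).
X_true = np.outer(np.arange(1.0, 7.0), [1.0, 2.0, 3.0])
X_obs = X_true.copy()
X_obs[2, 1] = np.nan
X_obs[4, 0] = np.nan

X_filled = svd_impute(X_obs, rank=1)
print(X_filled[2, 1], X_true[2, 1])
print(X_filled[4, 0], X_true[4, 0])
```

The target rank is a tuning parameter here: too low a rank underfits the observed entries, while too high a rank tends to reproduce the initial guesses.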

## SVD Algorithms

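* Alternating L1 SVD - Replaces the squared $L_2$ (Frobenius) error with the entrywise $L_1$ error in the low-rank approximation problem.
$$
\arg\min_{U, \Sigma, V} ||A - U \Sigma V^T||_1 = \arg\min_{U, \Sigma, V} \sum_{i,j} |A_{ij} - (U \Sigma V^T)_{ij}|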
$$

* Principal Component Pursuit (PCP)
* IRLS - Iterative Re-weighted Least Squares
* Kernel robust PCA
* etc
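
To make the $L_1$ objective concrete, the sketch below combines two ideas from the list: it alternates between the two factors and solves each half-step with iteratively re-weighted least squares. It is one possible approach under stated assumptions, not a reference implementation of any named algorithm: the factors are kept as `U` (with the singular values absorbed) and `V`, the absolute value is smoothed as $\sqrt{r^2 + \epsilon^2}$, and the function name `l1_low_rank` and its parameters are hypothetical.

```python
import numpy as np

def l1_low_rank(A, rank, n_iter=50, eps=1e-6):
    """Approximate A by U @ V.T, reducing a smoothed entrywise L1 error."""
    A = np.asarray(A, dtype=float)

    # Start from the ordinary (L2-optimal) truncated SVD.
    U0, s, Vt0 = np.linalg.svd(A, full_matrices=False)
    U = U0[:, :rank] * s[:rank]   # shape (m, rank)
    V = Vt0[:rank].T.copy()       # shape (n, rank)

    def weights(U, V):
        # IRLS weights: large residuals get small weights.
        return 1.0 / np.sqrt((A - U @ V.T) ** 2 + eps ** 2)

    for _ in range(n_iter):
        # Update each row of U by weighted least squares, V fixed.
        W = weights(U, V)
        for i in range(A.shape[0]):
            sw = np.sqrt(W[i])
            U[i] = np.linalg.lstsq(sw[:, None] * V, sw * A[i], rcond=None)[0]

        # Update each row of V by weighted least squares, U fixed.
        W = weights(U, V)
        for j in range(A.shape[1]):
            sw = np.sqrt(W[:, j])
            V[j] = np.linalg.lstsq(sw[:, None] * U, sw * A[:, j], rcond=None)[0]
    return U, V

# Example: rank-1 data with two large, sparse errors (assumed data).
rng = np.random.default_rng(0)
clean = np.outer(rng.normal(size=20), rng.normal(size=15))
A = clean.copy()
A[2, 3] += 50.0
A[10, 7] -= 50.0

U, V = l1_low_rank(A, rank=1)
approx_l1 = U @ V.T

Us, ss, Vts = np.linalg.svd(A, full_matrices=False)
approx_svd = (Us[:, :1] * ss[:1]) @ Vts[:1]

print("L1 error, truncated SVD:", np.abs(A - approx_svd).sum())
print("L1 error, IRLS sketch:  ", np.abs(A - approx_l1).sum())
```

Each weighted least-squares step cannot increase the smoothed $L_1$ error, so the result is never meaningfully worse in that measure than the truncated SVD it starts from. The problem is non-convex, however, so the sketch may stop at a local minimum.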