# Linear Discriminant Analysis (LDA)

---

## LDA & PCA

LDA is commonly used as a dimensionality reduction technique in the pre-processing step of pattern classification. Like PCA, it projects a dataset onto a lower-dimensional space. Unlike PCA, which looks for the axes of maximum variance, LDA looks for the axes that maximize the separation between classes.
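To make the contrast concrete, here is a minimal NumPy sketch on a made-up toy dataset (the data, the seed, and the variable names are illustrative assumptions, not part of the original notes). The two classes are stretched along the x-axis but separated along the y-axis, so the leading PCA axis and the two-class LDA axis should point in nearly perpendicular directions.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the toy data is reproducible

# Two classes: large spread along x, small spread along y, separated along y.
class_a = rng.normal(loc=[0.0, 0.0], scale=[5.0, 0.5], size=(200, 2))
class_b = rng.normal(loc=[0.0, 3.0], scale=[5.0, 0.5], size=(200, 2))
X = np.vstack([class_a, class_b])

# PCA: leading eigenvector of the covariance matrix (class labels are ignored).
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
pca_axis = eigvecs[:, np.argmax(eigvals)]

# Two-class LDA: direction S_W^{-1} (mean_b - mean_a), which uses the labels.
mean_a, mean_b = class_a.mean(axis=0), class_b.mean(axis=0)
S_W = (class_a - mean_a).T @ (class_a - mean_a) \
    + (class_b - mean_b).T @ (class_b - mean_b)
lda_axis = np.linalg.solve(S_W, mean_b - mean_a)
lda_axis /= np.linalg.norm(lda_axis)

print("PCA axis:", pca_axis)  # dominated by the x component
print("LDA axis:", lda_axis)  # dominated by the y component
```

On data like this, projecting onto the PCA axis would mix the two classes, while projecting onto the LDA axis keeps them apart.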

---

## Goal

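Project a feature space (a dataset of $d$-dimensional samples) onto a smaller subspace of dimension $k$ (where $k \le d-1$) while preserving the class-discriminatory information. For LDA, at most $c-1$ of these axes carry discriminatory information, where $c$ is the number of classes.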
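One common way to state this goal formally is Fisher's criterion. The notation below ($m_i$ for the mean of class $i$, $m$ for the overall mean, $N_i$ for the number of samples in class $i$, $D_i$ for the set of samples in class $i$) is an assumption introduced here for illustration:

$$
S_W = \sum_{i=1}^{c} \sum_{x \in D_i} (x - m_i)(x - m_i)^T,
\qquad
S_B = \sum_{i=1}^{c} N_i \, (m_i - m)(m_i - m)^T
$$

$$
J(W) = \frac{\left| W^T S_B W \right|}{\left| W^T S_W W \right|}
$$

Maximizing $J(W)$ favors projections in which the class means are far apart (large between-class scatter $S_B$) while each class stays compact (small within-class scatter $S_W$). The maximizing columns of $W$ are the eigenvectors of $S_W^{-1} S_B$ with the largest eigenvalues.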

Both PCA and LDA are linear transformation techniques used for dimensionality reduction. PCA is *unsupervised* because it ignores class labels, whereas LDA is *supervised* because it uses the class labels (the dependent variable) to find the projection.

---

## LDA Algorithm

**STEP 1**: Compute the $d$-dimensional mean vectors for the different classes from the dataset.

**STEP 2**: Compute the scatter matrices (the between-class scatter matrix $S_B$ and the within-class scatter matrix $S_W$).

**STEP 3**: Compute the eigenvectors ($e_{1}, e_{2}, \ldots, e_{d}$) and corresponding eigenvalues ($\lambda_{1}, \lambda_{2}, \ldots, \lambda_{d}$) of the matrix $S_W^{-1} S_B$.

**STEP 4**: Sort the eigenvectors by decreasing eigenvalue and choose the $k$ eigenvectors with the largest eigenvalues to form a $d \times k$ matrix **$W$** (where every column represents an eigenvector).

**STEP 5**: Use this $d \times k$ eigenvector matrix to transform the samples onto the new subspace. This can be summarized by the matrix multiplication **$Y = XW$** (where **$X$** is an $n \times d$ matrix representing the **$n$** samples, and **$Y$** is the $n \times k$ matrix of transformed samples in the new subspace).
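The five steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `lda_fit_transform`, the synthetic three-class dataset, and the choice to weight the between-class scatter by class size are all introduced here for the example. It also assumes the within-class scatter matrix is invertible.

```python
import numpy as np

def lda_fit_transform(X, y, k):
    """Project X (n x d) onto k discriminant axes (hypothetical helper)."""
    classes = np.unique(y)
    d = X.shape[1]
    overall_mean = X.mean(axis=0)

    S_W = np.zeros((d, d))  # within-class scatter
    S_B = np.zeros((d, d))  # between-class scatter
    for c in classes:
        X_c = X[y == c]
        mean_c = X_c.mean(axis=0)                 # Step 1: class mean vector
        centered = X_c - mean_c
        S_W += centered.T @ centered              # Step 2: within-class scatter
        diff = (mean_c - overall_mean).reshape(d, 1)
        S_B += X_c.shape[0] * (diff @ diff.T)     # Step 2: between-class scatter

    # Step 3: eigen-decomposition of S_W^{-1} S_B (assumes S_W is invertible).
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    # The matrix is not symmetric, so discard tiny imaginary round-off parts.
    eigvals, eigvecs = eigvals.real, eigvecs.real

    # Step 4: sort by decreasing eigenvalue and keep the top k eigenvectors.
    order = np.argsort(eigvals)[::-1]
    W = eigvecs[:, order[:k]]                     # d x k projection matrix

    # Step 5: Y = X W gives the n x k transformed samples.
    return X @ W, W, eigvals[order]

# Synthetic demo data: three classes in four dimensions (illustrative only).
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0, 0.0, 0.0],
                    [4.0, 0.0, 0.0, 0.0],
                    [0.0, 4.0, 0.0, 0.0]])
X_demo = np.vstack([rng.normal(loc=c, scale=1.0, size=(50, 4)) for c in centers])
y_demo = np.repeat([0, 1, 2], 50)

Y_demo, W_demo, eigvals_demo = lda_fit_transform(X_demo, y_demo, k=2)
print(Y_demo.shape)  # (150, 2)
print(eigvals_demo)  # only the first two eigenvalues are far from zero
```

With three classes, the between-class scatter matrix has rank at most two, so only two eigenvalues are meaningfully non-zero and choosing $k = 2$ keeps all of the discriminatory information in this example.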

---

## Additional Reading

https://sebastianraschka.com/publications/