# Chapter 16: Linear Discriminant Analysis (LDA)

## Learning Objectives

In this chapter, you will learn:
- **Mathematical foundations** of Fisher's linear discriminant
- **Dimensionality reduction** for classification
- **Assumptions** about the data: Gaussian class-conditional distributions and homoscedasticity (a covariance matrix shared by all classes)
- **Implementation** from scratch and with Scikit-learn
- **Comparison** with PCA and other dimensionality reduction techniques

## Introduction

Linear Discriminant Analysis (LDA) is a supervised method that serves as both a dimensionality reduction technique and a classifier. It finds the linear combinations of features that best separate the classes.

**Mathematical Foundation**: LDA maximizes the ratio of between-class variance to within-class variance of the projected data, so the projection directions push class means apart while keeping each class compact.
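To make the two roles concrete, here is a minimal sketch using Scikit-learn's `LinearDiscriminantAnalysis`. The Iris dataset and the choice of `n_components=2` are assumptions made for illustration only, and the accuracy is measured on the training data, so it is not a generalization estimate.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Illustrative data (an assumption): 150 samples, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# With 3 classes, LDA can produce at most 3 - 1 = 2 discriminant directions.
lda = LinearDiscriminantAnalysis(n_components=2)

# Role 1: supervised dimensionality reduction (4 features -> 2 components).
X_lda = lda.fit_transform(X, y)

# Role 2: classification with the same fitted model.
accuracy = lda.score(X, y)  # mean accuracy on the training data

print(X_lda.shape)
print(f"training accuracy: {accuracy:.3f}")
```

Note that, unlike PCA, `fit_transform` needs the labels `y`, because the projection is chosen to separate classes rather than to preserve total variance.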

## Mathematical Theory

### Fisher's Linear Discriminant

LDA seeks to maximize the Fisher criterion:

$$J(w) = \frac{w^T S_B w}{w^T S_W w}$$

Where:
- $S_B$ is the between-class scatter matrix
- $S_W$ is the within-class scatter matrix
- $w$ is the projection vector
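
As a sketch of why this criterion leads to an eigenvalue problem (a standard derivation, assuming $S_W$ is invertible; the symbol $\lambda$ is introduced here for illustration): $J(w)$ does not change when $w$ is rescaled, so only the direction of $w$ matters. Since both scatter matrices are symmetric, setting the gradient to zero gives

$$\nabla_w J(w) = \frac{2\,(w^T S_W w)\, S_B w - 2\,(w^T S_B w)\, S_W w}{(w^T S_W w)^2} = 0$$

$$S_B w = \lambda\, S_W w, \qquad \lambda = \frac{w^T S_B w}{w^T S_W w} = J(w)$$

$$S_W^{-1} S_B\, w = \lambda\, w$$

Every stationary direction is therefore an eigenvector of $S_W^{-1} S_B$, and its eigenvalue equals the value of $J$ it attains. The maximizer of the Fisher criterion is the eigenvector with the largest eigenvalue.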

### Scatter Matrices

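**Between-class scatter**: $S_B = \sum_{i=1}^{c} N_i (\mu_i - \mu)(\mu_i - \mu)^T$

**Within-class scatter**: $S_W = \sum_{i=1}^{c} \sum_{x \in C_i} (x - \mu_i)(x - \mu_i)^T$

Where:
- $c$ is the number of classes
- $C_i$ is the set of samples in class $i$, and $N_i$ is the number of samples in $C_i$
- $\mu_i$ is the mean of the samples in class $i$
- $\mu$ is the overall mean of all samples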
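
The two definitions can be sketched directly in NumPy. The helper names `scatter_matrices` and `lda_directions`, the synthetic three-class data, and the use of a plain eigendecomposition of $S_W^{-1} S_B$ are assumptions made for illustration. The sketch also assumes $S_W$ is invertible, which can fail when there are more features than samples.

```python
import numpy as np

def scatter_matrices(X, y):
    """Return (S_B, S_W) for samples X of shape (n, d) and labels y of shape (n,)."""
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    S_B = np.zeros((d, d))
    S_W = np.zeros((d, d))
    for label in np.unique(y):
        X_c = X[y == label]                      # samples in class C_i
        mean_c = X_c.mean(axis=0)                # class mean mu_i
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_B += X_c.shape[0] * (diff @ diff.T)    # N_i (mu_i - mu)(mu_i - mu)^T
        centered = X_c - mean_c
        S_W += centered.T @ centered             # sum of (x - mu_i)(x - mu_i)^T
    return S_B, S_W

def lda_directions(X, y, n_components):
    """Top eigenvectors of S_W^{-1} S_B and their eigenvalues (assumes S_W invertible)."""
    S_B, S_W = scatter_matrices(X, y)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    # The eigenvalues are real in theory; drop tiny imaginary round-off parts.
    order = np.argsort(eigvals.real)[::-1][:n_components]
    return eigvecs.real[:, order], eigvals.real[order]

# Hypothetical data: three Gaussian classes in 3 dimensions, 50 samples each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0, 0.0], scale=1.0, size=(50, 3)),
    rng.normal(loc=[3.0, 0.0, 1.0], scale=1.0, size=(50, 3)),
    rng.normal(loc=[0.0, 3.0, -1.0], scale=1.0, size=(50, 3)),
])
y = np.repeat([0, 1, 2], 50)

S_B, S_W = scatter_matrices(X, y)
W, fisher_values = lda_directions(X, y, n_components=2)
X_projected = X @ W  # data projected onto the two discriminant directions
```

A useful sanity check is that $S_B + S_W$ equals the total scatter matrix $\sum_x (x - \mu)(x - \mu)^T$, and that each returned eigenvalue equals the Fisher criterion $J(w)$ evaluated at its eigenvector.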

**Citation**: Fisher introduced the linear discriminant in "The Use of Multiple Measurements in Taxonomic Problems" (1936). The method and its extensions are covered in the pattern recognition and multivariate statistics literature.