# Linear Discriminant Analysis

Linear Discriminant Analysis (LDA) is a well-established machine learning technique for predicting categories. Its main advantages over other classification algorithms, such as neural networks and random forests, are that the model is interpretable and that predictions are cheap to compute. LDA is also widely available in popular statistical and machine learning software. This notebook demonstrates an implementation of LDA in Python.

## What is Linear Discriminant Analysis (LDA)?

LDA is a supervised method for classifying observations into groups and for projecting data onto a small number of axes that best separate those groups. It makes some simplifying assumptions about your data:

- Each of your groups follows a multivariate normal (also called multivariate Gaussian) distribution, the generalization of the one-dimensional normal distribution to higher dimensions.
- All groups share the same covariance matrix. A covariance matrix generalizes variance to multiple dimensions.

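Note that with a single feature and two groups, LDA's assumptions (normally distributed groups with a common variance) match those of the pooled-variance two-sample Student's t-test, although LDA uses them to classify observations rather than to test for a difference in means.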
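
With a single feature and two equal-sized classes, the LDA decision boundary lies exactly at the midpoint of the two class means, because the shared variance cancels out when the two class scores are compared. The following minimal sketch checks this with scikit-learn on synthetic Gaussian data; the class means (0 and 3), the sample sizes, and the seed are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Two synthetic one-feature classes of equal size, so the estimated priors are equal
rng = np.random.default_rng(0)
x0 = rng.normal(loc=0.0, scale=1.0, size=100)
x1 = rng.normal(loc=3.0, scale=1.0, size=100)
X = np.concatenate([x0, x1]).reshape(-1, 1)
y = np.array([0] * 100 + [1] * 100)

lda = LinearDiscriminantAnalysis().fit(X, y)

# For two classes the decision function is coef_ * x + intercept_;
# the decision boundary is the x at which it equals zero.
boundary = -lda.intercept_[0] / lda.coef_[0, 0]
midpoint = (x0.mean() + x1.mean()) / 2

print(f"LDA decision boundary:   {boundary:.4f}")
print(f"Midpoint of class means: {midpoint:.4f}")
```

With unequal class sizes, the log-prior term shifts the boundary toward the smaller class, so the midpoint result depends on the equal-size assumption.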

## How does LDA work?

LDA works by projecting the feature space (independent variables) onto a lower-dimensional subspace while preserving the class-discriminatory information. In other words, it looks for a new set of axes for the feature space that:

- Maximize the distance between the class means.
- Minimize the variation (scatter) within each class.

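The two criteria are combined into a single objective: the ratio of between-class scatter to within-class scatter along a direction $w$,

$$J(w) = \frac{w^T S_B w}{w^T S_W w}$$

where $S_B$ and $S_W$ are the between-class and within-class scatter matrices defined below. The axes that maximize this ratio become the axes of the new feature space.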
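
For two classes, the direction that maximizes this ratio has a well-known closed form, $w \propto S_W^{-1}(m_1 - m_0)$, using the scatter matrices $S_W$ and $S_B$ defined below. The sketch below builds both matrices for synthetic two-class Gaussian data (the means, shared covariance, and seed are illustrative assumptions) and checks that none of 1,000 random directions beats the closed-form one:

```python
import numpy as np

rng = np.random.default_rng(42)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])          # shared covariance (illustrative)
X0 = rng.multivariate_normal([0.0, 0.0], cov, size=200)
X1 = rng.multivariate_normal([1.0, 2.0], cov, size=200)

m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
m = np.vstack([X0, X1]).mean(axis=0)               # overall mean

# Scatter matrices following the definitions in this section
S_W = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
S_B = sum(len(Xi) * np.outer(mi - m, mi - m) for Xi, mi in ((X0, m0), (X1, m1)))

def fisher_ratio(w):
    """Fisher's criterion J(w) = (w^T S_B w) / (w^T S_W w)."""
    return (w @ S_B @ w) / (w @ S_W @ w)

# Closed-form maximizer for two classes (defined only up to scale)
w_star = np.linalg.solve(S_W, m1 - m0)

# Compare against many random directions
random_ws = rng.normal(size=(1000, 2))
best_random = max(fisher_ratio(w) for w in random_ws)
print(f"J(w*) = {fisher_ratio(w_star):.4f}, best random J = {best_random:.4f}")
```

Because $J(w)$ does not change when $w$ is rescaled, only the direction of `w_star` matters.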

Mathematically, the new axes are the eigenvectors $v$ that solve the following eigenvalue problem:

$$S_W^{-1}S_Bv = \lambda v$$

where:

- $S_W$ is the within-class scatter matrix.
- $S_B$ is the between-class scatter matrix.
- $v$ is an eigenvector.
- $\lambda$ is the corresponding eigenvalue.
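
Because $S_W^{-1}S_B$ is generally not symmetric, a common approach is to solve the equivalent generalized symmetric problem $S_B v = \lambda S_W v$ instead of forming the inverse explicitly. The sketch below uses `scipy.linalg.eigh`, with random symmetric matrices standing in for $S_B$ (positive semi-definite) and $S_W$ (positive definite), and confirms that each returned pair satisfies the equation above:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
B = rng.normal(size=(4, 4))
S_B = A @ A.T                      # stand-in for S_B: symmetric positive semi-definite
S_W = B @ B.T + 4 * np.eye(4)      # stand-in for S_W: symmetric positive definite

# eigh(a, b) solves a v = lambda b v, i.e. S_B v = lambda S_W v (eigenvalues ascending)
eigvals, eigvecs = eigh(S_B, S_W)

# Check every pair against S_W^{-1} S_B v = lambda v
M = np.linalg.solve(S_W, S_B)      # S_W^{-1} S_B without an explicit inverse
residuals = [np.linalg.norm(M @ v - lam * v) for lam, v in zip(eigvals, eigvecs.T)]
print("eigenvalues:", eigvals)
print("max residual:", max(residuals))
```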

The scatter matrices are defined as follows:

$$S_W = \sum_{i=1}^{c} S_i$$

$$S_i = \sum_{x \in D_i} (x - m_i) (x - m_i)^T$$

$$S_B = \sum_{i=1}^{c} N_i (m_i - m) (m_i - m)^T$$

where:

- $c$ is the number of classes.
- $D_i$ is the set of observations of the $i$-th class.
- $x$ is a single observation.
- $m_i$ is the mean of the observations of the $i$-th class.
- $m$ is the overall mean of the observations, regardless of class.
- $N_i$ is the number of observations of the $i$-th class.
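
These definitions translate directly into NumPy. The sketch below computes $S_W$ and $S_B$ for the Iris data set (the same data used later in this notebook) and, as a sanity check, confirms the standard identity that the total scatter $S_T = \sum_x (x - m)(x - m)^T$ equals $S_W + S_B$:

```python
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
m = X.mean(axis=0)                       # overall mean m
n_features = X.shape[1]

S_W = np.zeros((n_features, n_features))
S_B = np.zeros((n_features, n_features))
for label in np.unique(y):
    X_i = X[y == label]                  # observations D_i of class i
    m_i = X_i.mean(axis=0)               # class mean m_i
    centered = X_i - m_i
    S_W += centered.T @ centered         # adds S_i = sum of (x - m_i)(x - m_i)^T
    diff = (m_i - m).reshape(-1, 1)
    S_B += len(X_i) * (diff @ diff.T)    # adds N_i (m_i - m)(m_i - m)^T

S_T = (X - m).T @ (X - m)                # total scatter around the overall mean
print("S_W + S_B equals S_T:", np.allclose(S_W + S_B, S_T))
```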

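The eigenvectors are ranked by their corresponding eigenvalues: the eigenvector with the highest eigenvalue is the most discriminative axis, the one with the second-highest eigenvalue is the second most discriminative, and so on. Because the weighted deviations $N_i(m_i - m)$ sum to zero, $S_B$ has rank at most $c - 1$, so at most $c - 1$ eigenvalues are nonzero and LDA yields at most $c - 1$ useful axes. These leading eigenvectors form a new set of axes onto which the observations can be projected.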
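
A sketch of the ranking and projection steps for the Iris data follows. It rebuilds the scatter matrices from their definitions, solves the generalized symmetric problem $S_B v = \lambda S_W v$ with `scipy.linalg.eigh` (equivalent to the eigenvalue equation above when $S_W$ is invertible), sorts the eigenpairs from largest to smallest eigenvalue, and keeps two axes, since three classes give $S_B$ a rank of at most two:

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
classes = np.unique(y)
m = X.mean(axis=0)

# Within-class and between-class scatter matrices
S_W = np.zeros((X.shape[1], X.shape[1]))
S_B = np.zeros_like(S_W)
for k in classes:
    X_k = X[y == k]
    m_k = X_k.mean(axis=0)
    S_W += (X_k - m_k).T @ (X_k - m_k)
    S_B += len(X_k) * np.outer(m_k - m, m_k - m)

# eigh returns eigenvalues in ascending order; reverse to rank them
eigvals, eigvecs = eigh(S_B, S_W)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
print("eigenvalue shares:", np.round(eigvals / eigvals.sum(), 4))

# Keep the c - 1 leading axes and project the observations onto them
n_components = len(classes) - 1
W = eigvecs[:, :n_components]
X_projected = X @ W
print("projected shape:", X_projected.shape)
```

The projected coordinates are not centered, and `eigh` normalizes each eigenvector so that $v^T S_W v = 1$, so they may differ from scikit-learn's `transform` output by sign, scale, and a constant offset, even though the directions should agree.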

## What are the assumptions of LDA?

LDA makes several assumptions:

- **Independence**: The observations are sampled independently of one another. The features themselves may be correlated; those correlations are captured by the shared covariance matrix.
- **Normality**: Within each class, the features follow a multivariate normal distribution.
- **Homogeneity of Covariance**: All classes share the same covariance matrix.

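If these assumptions are badly violated, LDA may not be the best method for the data. For example, when the class covariance matrices differ markedly, Quadratic Discriminant Analysis (QDA), which estimates a separate covariance matrix for each class, is a common alternative.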
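
One quick, informal way to eyeball the shared-covariance assumption is to compare each class's sample covariance matrix with the pooled within-class covariance. The sketch below does this for the Iris data by printing each matrix's log-determinant as a rough summary of its overall spread. It is a visual check only, not a formal test such as Box's M test, which this sketch does not implement:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

class_covs = []
for k, name in enumerate(iris.target_names):
    cov_k = np.cov(X[y == k], rowvar=False)   # unbiased sample covariance of class k
    class_covs.append(cov_k)
    _, logdet = np.linalg.slogdet(cov_k)
    print(f"{name:>10}: log det = {logdet:.2f}")

# Pooled within-class covariance, weighting each class by its degrees of freedom
counts = np.bincount(y)
pooled = sum((n - 1) * c for n, c in zip(counts, class_covs)) / (len(y) - len(counts))
print(f"{'pooled':>10}: log det = {np.linalg.slogdet(pooled)[1]:.2f}")
```

Large gaps between the per-class log-determinants hint that the classes do not share one covariance matrix; comparing individual entries of `class_covs` gives a more detailed picture.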

## What are the applications of LDA?

LDA has been successfully applied to many real-world problems:

- **Face Recognition**: LDA has been used to recognize faces by projecting face images into a lower-dimensional space and comparing the distances between them there.
- **Marketing**: LDA can assign customers to predefined segments, helping target marketing campaigns to specific customer groups.
- **Medical Diagnosis**: LDA can be used to classify patients into different disease categories.

## What are the limitations of LDA?

LDA has several limitations:

- **Sensitive to Outliers**: Outliers can significantly shift the estimated class means and covariance matrix, and with them the discriminant axes.
- **Assumptions**: The assumptions of LDA may not hold for every data set.
- **Multicollinearity**: If the independent variables are highly correlated, $S_W$ becomes nearly singular, its inverse is numerically unstable, and the performance of LDA may degrade.
- **High-Dimensional Data**: When the number of independent variables is large, computing the scatter matrices and their eigendecomposition becomes expensive. When features outnumber observations, $S_W$ is singular and cannot be inverted at all.
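
Both multicollinearity and high dimensionality show up as an ill-conditioned or singular within-class covariance estimate. One common remedy is shrinkage, which pulls that estimate toward a diagonal target. scikit-learn exposes it as `LinearDiscriminantAnalysis(solver='lsqr', shrinkage='auto')`, where `'auto'` chooses the amount using the Ledoit-Wolf lemma. The sketch below uses synthetic two-class data with more features than training observations; the dimensions, the class shift of 0.5, and the seed are illustrative assumptions:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_per_class, n_features = 40, 100
X0 = rng.normal(0.0, 1.0, size=(n_per_class, n_features))
X1 = rng.normal(0.5, 1.0, size=(n_per_class, n_features))   # every feature shifted by 0.5
X = np.vstack([X0, X1])
y = np.array([0] * n_per_class + [1] * n_per_class)

# 40 training observations but 100 features: the covariance estimate is singular
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

plain = LinearDiscriminantAnalysis(solver='lsqr').fit(X_train, y_train)
shrunk = LinearDiscriminantAnalysis(solver='lsqr', shrinkage='auto').fit(X_train, y_train)

print("test accuracy, no shrinkage:  ", plain.score(X_test, y_test))
print("test accuracy, auto shrinkage:", shrunk.score(X_test, y_test))
```

Shrinkage is only available with the `'lsqr'` and `'eigen'` solvers, not with the default `'svd'` solver.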

## How does LDA compare to other methods?

LDA is similar to Principal Component Analysis (PCA), another popular dimensionality reduction technique: both project the feature space onto a lower-dimensional space. PCA finds the axes of maximum variance, while LDA finds the axes of best class separability. LDA is supervised and needs class labels, so it applies to classification problems; PCA is unsupervised and can be applied to unlabeled data.
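
The difference shows up directly in scikit-learn's API: `PCA.fit_transform` takes only the features, while `LinearDiscriminantAnalysis.fit_transform` also needs the labels. The sketch below projects the Iris data to two dimensions both ways and prints a simple class-separation score (the hypothetical helper `fisher_score`, a ratio of between-class to within-class scatter in the projected space). On data like Iris, you would typically expect the LDA score to be the higher of the two:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# PCA is unsupervised: it never sees the labels y
X_pca = PCA(n_components=2).fit_transform(X)

# LDA is supervised: it needs y and allows at most (n_classes - 1) components
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

def fisher_score(Z, labels):
    """Hypothetical helper: total between-class over total within-class scatter of Z."""
    overall = Z.mean(axis=0)
    between = within = 0.0
    for k in np.unique(labels):
        Z_k = Z[labels == k]
        between += len(Z_k) * np.sum((Z_k.mean(axis=0) - overall) ** 2)
        within += np.sum((Z_k - Z_k.mean(axis=0)) ** 2)
    return between / within

print(f"class separation, PCA: {fisher_score(X_pca, y):.2f}")
print(f"class separation, LDA: {fisher_score(X_lda, y):.2f}")
```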

In the next section, we apply LDA to a real-world data set, the Iris data, using Python.

```python
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Iris dataset: 150 flowers, 4 features, 3 species
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and test sets (80/20)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

# Create an LDA object with the default settings (solver='svd')
lda = LDA()

# Fit the model on the training data
lda.fit(X_train, y_train)

# Predict on the test data
y_pred = lda.predict(X_test)

# Compute the accuracy
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy: {:.2f}'.format(accuracy))
```
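
A single 80/20 split evaluates the model on only 30 test flowers, so the accuracy above may shift noticeably with a different `random_state`. A sketch of a steadier estimate using 5-fold cross-validation (the fold count is an illustrative choice):

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# With an integer cv and a classifier, scikit-learn uses stratified k-fold splits
scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5)
print("Fold accuracies:", scores.round(3))
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```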

## Interpreting the Results

LDA provides coefficients that define linear combinations of the features used to separate the classes. In scikit-learn, for a problem with more than two classes, `coef_` has one row per class containing the weights of that class's linear decision function. Let's examine the coefficients:

```python
# One row of decision-function coefficients per class, one column per feature
coeff = lda.coef_
print('Coefficients:\n', coeff)
```

A coefficient with a larger absolute value means the corresponding feature has more influence on that class's decision function. Such comparisons are only meaningful when the features are on similar scales (the four Iris measurements are all in centimeters); otherwise, standardize the features before fitting.
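
In scikit-learn, the per-class coefficients and the projection axes are stored separately: `coef_` holds the decision-function weights, while `scalings_` holds the discriminant directions that `transform` uses to produce LD1, LD2, and so on. The sketch below fits on the full Iris data (rather than the training split, for simplicity) and labels both arrays with the feature names:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = load_iris()
lda = LinearDiscriminantAnalysis().fit(iris.data, iris.target)

# coef_: one row per class, one column per feature
print("coef_ shape:", lda.coef_.shape)
for name, row in zip(iris.target_names, lda.coef_):
    weights = ", ".join(f"{f}: {w:+.2f}" for f, w in zip(iris.feature_names, row))
    print(f"  {name}: {weights}")

# scalings_: one row per feature, one column per discriminant axis used by transform()
print("scalings_ shape:", lda.scalings_.shape)
for f, row in zip(iris.feature_names, lda.scalings_):
    print(f"  {f}: {np.round(row, 2)}")
```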

## Visualizing the Results

To visualize the separation of the classes, we can project the data onto the new axes and plot the projected data:

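```python
# Project the data onto the discriminant axes (LD1, LD2)
X_lda = lda.transform(X)

# Plot the projected data, one color per class
for k, name in enumerate(iris.target_names):
    plt.scatter(X_lda[y == k, 0], X_lda[y == k, 1], label=name)
plt.xlabel('LD1')
plt.ylabel('LD2')
plt.title('LDA Projection')
plt.legend()
plt.show()
```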

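In the above plot, each color represents one class. The classes appear well separated, which suggests that LDA can classify this data effectively. Note that the plot includes all 150 observations, including the training data the model was fitted on, so the test-set accuracy is the fairer measure of performance.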

## Conclusion

In this notebook, we introduced Linear Discriminant Analysis (LDA) and demonstrated how to perform LDA in Python with the scikit-learn library. We also showed how to interpret the results of LDA and visualized the separation of the classes. LDA is a versatile tool for classification and dimensionality reduction that can be used in a wide variety of fields.