# Linear Discriminant Analysis (LDA)

## Introduction to LDA

**Linear Discriminant Analysis (LDA)** is a statistical method used for dimensionality reduction and classification in machine learning. At its core, LDA seeks to find the linear combinations of features that best separate different classes in a dataset. This allows for efficient representation of data while maximizing class separability.

### Historical Context

Introduced by Ronald A. Fisher in the 1930s, LDA has since become a fundamental tool in the fields of statistics and machine learning. Fisher formulated LDA as a means to find the linear combination of features that maximizes the ratio of between-class variance to within-class variance.

## Theoretical Foundations of LDA

### Mathematical Formulation

#### Objective Function

The primary goal of LDA is to maximize the between-class scatter while minimizing the within-class scatter. For a projection matrix $W$, the objective function can be expressed as:
$$
J(W) = \frac{\det(W^T S_B W)}{\det(W^T S_W W)}
$$

where $W$ is the projection (weight) matrix, $S_B$ is the between-class scatter matrix, and $S_W$ is the within-class scatter matrix.

#### Eigenvalue Decomposition

To maximize the objective function, LDA performs an eigenvalue decomposition of $S_W^{-1}S_B$. The eigenvectors corresponding to the largest eigenvalues form the basis of the discriminant subspace:

$$ S_W^{-1}S_B v_i = \lambda_i v_i $$

where $v_i$ is the $i$-th eigenvector and $\lambda_i$ is the corresponding eigenvalue.
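
To make the ratio of between-class to within-class scatter concrete, the following minimal sketch evaluates it for a single projection direction on a two-class toy problem. The data values and the names `X0`, `X1`, `scatter`, and `fisher_criterion` are illustrative assumptions, not part of any library.

```python
import numpy as np

# Hypothetical toy data: two classes in 2-D (values chosen for illustration only).
X0 = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0]])
X1 = np.array([[6.0, 5.0], [7.0, 8.0], [8.0, 7.0]])


def scatter(X):
    """Scatter matrix of one class: sum of outer products of centered samples."""
    centered = X - X.mean(axis=0)
    return centered.T @ centered


mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
mu = np.vstack([X0, X1]).mean(axis=0)

# Within-class and between-class scatter matrices.
S_W = scatter(X0) + scatter(X1)
S_B = (len(X0) * np.outer(mu0 - mu, mu0 - mu)
       + len(X1) * np.outer(mu1 - mu, mu1 - mu))


def fisher_criterion(w):
    """Ratio (w^T S_B w) / (w^T S_W w) for a single direction w."""
    return float(w @ S_B @ w) / float(w @ S_W @ w)


# For two classes, the maximizing direction is proportional to S_W^{-1} (mu1 - mu0).
w_opt = np.linalg.solve(S_W, mu1 - mu0)

J_opt = fisher_criterion(w_opt)
J_x = fisher_criterion(np.array([1.0, 0.0]))  # project onto the first feature
J_y = fisher_criterion(np.array([0.0, 1.0]))  # project onto the second feature
print(f"optimal: {J_opt:.3f}, x-axis: {J_x:.3f}, y-axis: {J_y:.3f}")
```

In this sketch the optimal direction scores at least as high as either coordinate axis, and its score matches the largest eigenvalue of $S_W^{-1}S_B$.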


### Assumptions and Limitations

#### Assumptions
1. **Multivariate Normality:** LDA assumes that the features in each class follow a multivariate normal distribution.
2. **Equality of Covariance Matrices:** It assumes that the covariance matrices of different classes are equal.

#### Limitations
1. **Sensitive to Outliers:** LDA can be sensitive to outliers, impacting its performance.
2. **Assumption Violations:** If the assumptions are violated, the effectiveness of LDA may be compromised.

```{warning}
Considering these limitations is essential for using LDA effectively and for anticipating problems that may arise when applying the method to real-world datasets.
```
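
One informal way to probe the equal-covariance assumption is to compare the sample covariance matrices of the classes. The sketch below does this on synthetic data; the helper `covariance_gap` and the data are hypothetical, and the comparison is a rough heuristic rather than a formal statistical test.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: classes a and b share a covariance matrix,
# while class c is drawn with a much larger one.
cov_shared = np.array([[1.0, 0.3], [0.3, 1.0]])
X_a = rng.multivariate_normal([0.0, 0.0], cov_shared, size=500)
X_b = rng.multivariate_normal([3.0, 3.0], cov_shared, size=500)
X_c = rng.multivariate_normal([0.0, 3.0], 9.0 * cov_shared, size=500)


def covariance_gap(X1, X2):
    """Relative Frobenius-norm difference between two sample covariances."""
    c1 = np.cov(X1, rowvar=False)
    c2 = np.cov(X2, rowvar=False)
    return np.linalg.norm(c1 - c2) / np.linalg.norm(c1)


gap_similar = covariance_gap(X_a, X_b)    # small gap: assumption looks plausible
gap_different = covariance_gap(X_a, X_c)  # large gap: assumption looks doubtful
print(f"a vs b: {gap_similar:.2f}, a vs c: {gap_different:.2f}")
```

A large gap suggests the shared-covariance assumption may not hold for that pair of classes.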

## Purpose of LDA

LDA serves a dual purpose: it reduces the dimensionality of the data while retaining information relevant for classification, making it a versatile technique in machine learning applications.

## Dimensionality Reduction with LDA

In high-dimensional spaces, data points become sparse and the distances between them grow, which makes algorithms prone to overfitting. Working with lower-dimensional data can lead to more computationally efficient models, and it also makes the data and its inherent patterns easier to visualize.

LDA is often used for dimensionality reduction. The primary goal is to transform the original feature space into a lower-dimensional space while preserving the discriminatory information between classes. In other words, LDA looks for a projection of the data that maximizes the separation between different classes.

### Objectives of Dimensionality Reduction

1. LDA aims to **maximize the distance between the means** of different classes, enhancing class separability.

2. Simultaneously, LDA seeks to **minimize the scatter within each class**, ensuring compact clusters.

```{admonition} Note
:class: note
LDA accomplishes these objectives by finding a subspace that best captures the essential information for classification.
```
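
As a minimal sketch of LDA used purely for dimensionality reduction, the example below projects a four-feature dataset onto two discriminant directions with scikit-learn. The choice of the bundled Iris dataset is an assumption made only so the snippet is self-contained.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Stand-in dataset: 150 samples, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# With 3 classes, LDA can produce at most 2 discriminant directions.
lda = LinearDiscriminantAnalysis(n_components=2)
X_2d = lda.fit_transform(X, y)

print(X.shape, "->", X_2d.shape)
# Share of between-class variance captured by each discriminant direction.
print(lda.explained_variance_ratio_)
```

The projected array `X_2d` can be plotted directly, with points colored by `y`, to inspect how well the classes separate.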

## Classification using LDA

Once LDA has been applied for dimensionality reduction, it can be employed for classification tasks. The process involves training the model on a labeled dataset, where the class labels are known, and learning the discriminative patterns in the reduced-dimensional space.

### Working Principle of LDA in Classification

1. LDA aims to find decision boundaries that best separate different classes in the reduced space.

2. LDA can be interpreted as a probabilistic model. It models the distribution of features for each class and uses Bayes' theorem to calculate the probability of a data point belonging to a particular class.

3. Given a set of features for an unseen data point, LDA predicts the most likely class based on the discriminative patterns learned during training.
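
The probabilistic view can be illustrated with scikit-learn, whose `LinearDiscriminantAnalysis` estimator exposes both class predictions and class probabilities. This is a sketch on the bundled Iris dataset, chosen here as an assumption for self-containment.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit LDA directly as a classifier.
clf = LinearDiscriminantAnalysis()
clf.fit(X_train, y_train)

proba = clf.predict_proba(X_test)  # one row per sample, one column per class
pred = clf.predict(X_test)         # most probable class for each sample
test_accuracy = clf.score(X_test, y_test)

print(np.round(proba[:3], 3))
print(pred[:3], f"accuracy: {test_accuracy:.2f}")
```

Each row of `proba` sums to one, and the predicted class is the column with the highest probability.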

### Applications of LDA in Classification

- **Face Recognition:**
  LDA is commonly used for face recognition tasks where reducing the dimensionality of facial features enhances the accuracy of recognition.

- **Medical Diagnosis:**
  In medical diagnostics, LDA can help classify patients into different diagnostic categories based on relevant features.

- **Speech Recognition:**
  LDA can be applied to features extracted from speech signals to classify spoken words or phrases.

## Practical Applications of LDA

### Real-world Examples

#### Facial Recognition Systems:
In computer vision, LDA is applied to extract discriminative features for facial recognition. By capturing the essential characteristics that differentiate faces, LDA enhances the accuracy of recognition systems.


```{admonition} Industry Use Cases
:class: tip, dropdown 

Fraud Detection in Banking

Linear Discriminant Analysis (LDA) proves to be a valuable tool in the banking industry, particularly in the realm of fraud detection. Here's how LDA is applied to address the challenges associated with identifying fraudulent transactions:

* Banks face the constant challenge of distinguishing between legitimate and fraudulent transactions within vast datasets. The goal is to detect unusual patterns or anomalies that may indicate fraudulent activities.

* LDA helps model the characteristics of legitimate and fraudulent transactions by analyzing the underlying patterns in the data. It aims to maximize the separation between different classes, making it effective in distinguishing between normal and potentially fraudulent behavior.

* Let's denote the features of the transactions as $X$ and the corresponding class labels as $y$. The primary objective is to find the linear combination of features that maximizes the distance between the means of different classes while minimizing the spread within each class. This is achieved through the optimization of the LDA objective function.

* In a practical scenario, the transaction data would be preprocessed, ensuring proper handling of missing values and scaling of features. Next, you'd apply LDA to compute the discriminant functions, leveraging the covariance matrices to identify the most discriminative directions in the data.

* The application of LDA in fraud detection enables the creation of a model that can identify potentially fraudulent transactions based on their deviation from the learned patterns. This contributes to enhancing the security and integrity of banking systems.

By leveraging LDA, banks can significantly improve their ability to detect and prevent fraudulent activities, ultimately safeguarding the interests of both the financial institution and its customers.
```

# Step-by-Step Guide to Implementing LDA

## Data Preprocessing

```{warning}
Before applying LDA, ensure proper data preprocessing.
```

1. **Handling Missing Values**
   - Address any missing values in the dataset. Fill in missing values or remove instances with missing data to ensure a complete dataset.

2. **Scaling Features**
   - Standardize or normalize features. This step ensures that all features contribute equally to the analysis, preventing any particular feature from dominating the model due to its scale.

3. **Checking Assumptions**
   - Verify that your data meets LDA assumptions. LDA assumes that the features in each class follow a multivariate normal distribution, and the covariance matrices of different classes are equal.
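
The first two preprocessing steps can be chained in front of LDA with a scikit-learn pipeline. The sketch below is one possible arrangement, assuming mean imputation and standardization are appropriate for the data; the toy array is hypothetical.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data with one missing value (np.nan) and very different scales.
X = np.array([
    [1.0, 200.0],
    [2.0, np.nan],
    [1.5, 220.0],
    [8.0, 900.0],
    [9.0, 950.0],
    [8.5, 920.0],
])
y = np.array([0, 0, 0, 1, 1, 1])

pipeline = make_pipeline(
    SimpleImputer(strategy="mean"),  # step 1: fill missing values with the column mean
    StandardScaler(),                # step 2: zero mean and unit variance per feature
    LinearDiscriminantAnalysis(),    # fit LDA on the cleaned, scaled features
)
pipeline.fit(X, y)
predictions = pipeline.predict(X)
print(predictions)
```

Wrapping the steps in a pipeline means the imputation and scaling parameters are learned from the training data only and then reused unchanged on new data.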


## Model Training

```{admonition} Where the fun begins
:class: note 
Now, let's delve into the details of training your LDA model.
```

#### 1. Compute Scatter Matrices:

The within-class scatter matrix $S_W$ and the between-class scatter matrix $S_B$ are calculated as follows. Both are unnormalized sums; dividing them by the sample count would turn them into covariance estimates without changing the discriminant directions.

- Within-class scatter matrix $S_W$:

  $$S_W = \sum_{i=1}^{c} \sum_{j=1}^{n_i} (x_{ij} - \mu_i)(x_{ij} - \mu_i)^T$$
 
  where $c$ is the number of classes, $n_i$ is the number of samples in class $i$, $x_{ij}$ is the $j$-th sample from class $i$, and $\mu_i$ is the mean vector of class $i$.

- Between-class scatter matrix $S_B$:
  
  $$S_B = \sum_{i=1}^{c} n_i (\mu_i - \mu)(\mu_i - \mu)^T$$
  
  where $c$ is the number of classes, $n_i$ is the number of samples in class $i$, $\mu_i$ is the mean vector of class $i$, and $\mu$ is the overall mean vector.
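
The two formulas above translate almost line for line into NumPy. The function name `scatter_matrices` and the toy data below are illustrative assumptions.

```python
import numpy as np


def scatter_matrices(X, y):
    """Return (S_W, S_B) for features X of shape (n_samples, n_features) and labels y."""
    n_features = X.shape[1]
    overall_mean = X.mean(axis=0)
    S_W = np.zeros((n_features, n_features))
    S_B = np.zeros((n_features, n_features))
    for label in np.unique(y):
        X_c = X[y == label]               # samples of this class
        mean_c = X_c.mean(axis=0)         # class mean vector
        centered = X_c - mean_c
        S_W += centered.T @ centered      # sum of (x - mu_i)(x - mu_i)^T
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_B += X_c.shape[0] * (diff @ diff.T)  # n_i (mu_i - mu)(mu_i - mu)^T
    return S_W, S_B


# Hypothetical toy data: two classes, two features.
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],
              [6.0, 5.0], [7.0, 8.0], [8.0, 7.0]])
y = np.array([0, 0, 0, 1, 1, 1])

S_W, S_B = scatter_matrices(X, y)
print(S_W)
print(S_B)
```

A useful sanity check is that $S_W + S_B$ equals the total scatter of the data around the overall mean.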

#### 2. Solve Eigenvalue Problem:

Next, solve the eigenvalue problem to obtain the eigenvalues $\lambda$ and corresponding eigenvectors $v$ of the matrix $S_W^{-1}S_B$. This involves solving the following equation:
  
  $$S_W^{-1}S_Bv = \lambda v$$
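
A minimal NumPy sketch of this step follows. The two matrices are hypothetical values for a two-feature, two-class problem; in practice they would come from the previous step.

```python
import numpy as np

# Hypothetical scatter matrices (two features, two classes).
S_W = np.array([[4.0, 3.0], [3.0, 16.0 / 3.0]])
S_B = np.array([[37.5, 30.0], [30.0, 24.0]])

# np.linalg.solve(S_W, S_B) computes S_W^{-1} S_B without forming the inverse explicitly.
eigenvalues, eigenvectors = np.linalg.eig(np.linalg.solve(S_W, S_B))

# The product is not symmetric, so eig may return a complex dtype; keep the real parts.
eigenvalues, eigenvectors = eigenvalues.real, eigenvectors.real

# Sort from largest to smallest eigenvalue; eigenvectors are stored as columns.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

print(eigenvalues)
print(eigenvectors[:, 0])  # leading discriminant direction
```

With two classes, $S_B$ has rank one, so only the first eigenvalue is meaningfully different from zero.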

#### 3. Select Discriminant Functions:

Choose the top $k$ eigenvectors corresponding to the $k$ largest eigenvalues to form the matrix $W$. Because $S_B$ has rank at most $c - 1$, at most $c - 1$ eigenvalues are nonzero, so $k \le c - 1$. The discriminant functions can be defined as:

  $$Y = XW$$ 

  where $Y$ is the transformed feature matrix, $X$ is the original feature matrix, and $W$ contains the selected eigenvectors as columns.

These discriminant functions will serve as the basis for making predictions and defining decision boundaries in your LDA model.
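
Putting the three steps together, the sketch below builds $W$ from scratch and projects the data with $Y = XW$. The synthetic data (three classes, four features) and all variable names are assumptions made for illustration; a library implementation may differ in normalization and numerical details.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: 3 classes, 4 features, 50 samples per class.
means = np.array([[0.0, 0.0, 0.0, 0.0],
                  [4.0, 0.0, 1.0, 0.0],
                  [0.0, 4.0, 0.0, 1.0]])
X = np.vstack([rng.normal(m, 1.0, size=(50, 4)) for m in means])
y = np.repeat([0, 1, 2], 50)

# Steps 1 and 2: scatter matrices, then eigendecomposition of S_W^{-1} S_B.
mu = X.mean(axis=0)
S_W = np.zeros((4, 4))
S_B = np.zeros((4, 4))
for label in np.unique(y):
    X_c = X[y == label]
    mu_c = X_c.mean(axis=0)
    S_W += (X_c - mu_c).T @ (X_c - mu_c)
    S_B += len(X_c) * np.outer(mu_c - mu, mu_c - mu)
eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
eigvals, eigvecs = eigvals.real, eigvecs.real

# Step 3: keep the k eigenvectors with the largest eigenvalues as columns of W.
k = 2  # three classes give at most two informative directions
top = np.argsort(eigvals)[::-1][:k]
W = eigvecs[:, top]

# Project the original features onto the discriminant subspace.
Y = X @ W
print(X.shape, "->", Y.shape)
```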

## Evaluation Metrics

After training your LDA model, it's crucial to assess its performance using various evaluation metrics. These metrics provide insights into how well the model generalizes to new, unseen data. Let's explore some commonly used evaluation metrics.

#### 1. Accuracy

Accuracy is the most straightforward metric, representing the ratio of correctly predicted instances to the total number of instances. It is calculated as follows:
  
  $$\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}}$$


#### 2. Precision

Precision focuses on the accuracy of positive predictions. It is the ratio of correctly predicted positive instances to the total predicted positives, and it is calculated as follows:
  
  $$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives + False Positives}}$$ 


#### 3. Recall (Sensitivity)

Recall, also known as sensitivity or true positive rate, measures the ability of the model to identify all relevant instances. It is calculated as the ratio of true positives to the total actual positives:
  
  $$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives + False Negatives}}$$


#### 4. F1 Score

The F1 score is the harmonic mean of precision and recall, providing a balanced measure between precision and recall. It is calculated as follows:
  
  $$\text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision + Recall}}$$
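
As a worked example of the four formulas, the sketch below counts true and false positives and negatives by hand for a small binary problem. The label lists are hypothetical.

```python
# Hypothetical true and predicted labels for a binary problem (1 = positive class).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

Here the counts are TP=3, FP=1, FN=1, TN=5, giving an accuracy of 0.80 and precision, recall, and F1 of 0.75 each.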


```{admonition} Remember
:class: warning
Evaluating your model using these metrics is crucial to understanding its strengths and weaknesses, guiding potential improvements, and ensuring its suitability for real-world applications.
```

## Code for LDA Implementation

This example assumes preprocessed features $X$ and labels $y$. It splits the dataset into training and testing sets, applies Linear Discriminant Analysis **(LDA)** to project the features, and trains a simple Logistic Regression classifier on the projected data. The evaluation metrics include accuracy, precision, recall, and F1 score, providing a comprehensive assessment of the model's performance.

```{admonition} Try it
:class: tip
Feel free to adapt the code to your specific dataset and requirements.
```

```python
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split

# Your data preprocessing and loading here.
# Stand-in so the example runs as-is: a binary dataset bundled with scikit-learn.
# Replace this line with your own preprocessed X (features) and y (labels).
X, y = load_breast_cancer(return_X_y=True)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Apply LDA: fit on the training set only, then project both sets
lda = LinearDiscriminantAnalysis()
X_train_lda = lda.fit_transform(X_train, y_train)
X_test_lda = lda.transform(X_test)

# For simplicity, train a basic classifier (Logistic Regression)
# on the LDA-projected features
model = LogisticRegression()
model.fit(X_train_lda, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test_lda)

# Evaluate the model.
# The default average="binary" works for two classes; for multiclass labels,
# pass e.g. average="macro" to precision_score, recall_score, and f1_score.
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

# Print the evaluation metrics
print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1 Score: {f1:.2f}")
```