Name = Goutam Kumar Sah

Roll Number = 2312res271

Experiment No = 9

Title = Linear Discriminant Analysis (LDA)

Aim = Implementation of Linear Discriminant Analysis (LDA) Algorithm

# Theory

*  Linear Discriminant Analysis (LDA) is a supervised dimensionality reduction technique used for
classification and feature extraction. It finds the linear combinations of features that best
separate the classes in a dataset. LDA is often used as a preprocessing step before applying
classification algorithms to improve their performance.

* It projects the features from a higher-dimensional space into a
lower-dimensional space.
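To illustrate the preprocessing use mentioned above, here is a minimal sketch that chains LDA with a classifier. The choice of the Iris dataset, a k-nearest-neighbours classifier, and 5-fold cross-validation are all assumptions made for this example, not part of the experiment itself.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Assumed example data: Iris (4 features, 3 classes)
X, y = load_iris(return_X_y=True)

# LDA reduces the 4 features to 2 discriminants, then KNN classifies in that space
pipeline = make_pipeline(
    LinearDiscriminantAnalysis(n_components=2),
    KNeighborsClassifier(n_neighbors=5),
)

# Each fold fits LDA on its own training part, so no test labels leak into the projection
scores = cross_val_score(pipeline, X, y, cv=5)
print("Mean cross-validated accuracy:", scores.mean())
```

Putting LDA inside the pipeline, instead of transforming the whole dataset first, keeps the supervised projection from seeing the held-out labels.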

**Key Concepts:**

1. Supervised Nature: LDA uses class labels to find a linear combination of features that best separates the data points from different classes.

2. Maximizing Separability: The goal is to maximize the distance between the means of different classes (between-class scatter) while minimizing the spread within each class (within-class scatter).

3. Feature Projection: LDA projects the data from a higher-dimensional space into a lower-dimensional space while ensuring that the classes remain as separable as possible.
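The separability goal in point 2 is commonly written as Fisher's criterion. As a sketch, let $S_B$ and $S_W$ denote the between-class and within-class scatter matrices and $w$ a projection direction (these symbols are introduced here for illustration):

$$
J(w) = \frac{w^{T} S_B \, w}{w^{T} S_W \, w}
$$

Maximizing $J(w)$ leads to the eigenvalue problem $S_W^{-1} S_B \, w = \lambda w$, so the most discriminative directions are the eigenvectors with the largest eigenvalues.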

**Applications**: LDA is widely used for reducing dimensionality in high-dimensional datasets while retaining most class-related information. It’s often applied in areas such as:

 * Face Recognition: Distinguishing between images of different individuals.

 * Speech Recognition: Differentiating between different speakers or spoken words.

 * Bioinformatics: Classifying types of genes or protein expressions.

**LDA Algorithm Steps:**

1. Compute the mean vector of the features for each class.

2. Compute the scatter (covariance) matrix of each class.

3. Compute the within-class scatter matrix by summing the per-class scatter matrices (S1 + S2 for two classes).

4. Compute the between-class scatter matrix.

5. Compute the eigenvalues and eigenvectors of Sw^(-1) * Sb, where Sw and Sb are the within-class and between-class scatter matrices.

6. Sort the eigenvalues in decreasing order and select the top k.

7. Take the eigenvectors that correspond to the top k eigenvalues.

8. Obtain the LDA projection by taking the dot product of the original data and the selected eigenvectors.
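The steps can be traced by hand on a small two-class case. The sketch below uses hypothetical toy data (ten made-up 2-D points) and keeps a single discriminant, since two classes give at most one informative direction.

```python
import numpy as np

# Hypothetical toy data: two classes with five 2-D points each
X1 = np.array([[4, 2], [2, 4], [2, 3], [3, 6], [4, 4]], dtype=float)
X2 = np.array([[9, 10], [6, 8], [9, 5], [8, 7], [10, 8]], dtype=float)

# Step 1: class means
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)

# Steps 2-3: per-class scatter matrices and their sum (within-class scatter)
S1 = (X1 - mu1).T @ (X1 - mu1)
S2 = (X2 - mu2).T @ (X2 - mu2)
Sw = S1 + S2

# Step 4: between-class scatter, weighted by class size
mu = np.vstack([X1, X2]).mean(axis=0)
Sb = len(X1) * np.outer(mu1 - mu, mu1 - mu) + len(X2) * np.outer(mu2 - mu, mu2 - mu)

# Steps 5-7: eigen-decomposition of Sw^(-1) Sb, keep the leading eigenvector
eig_vals, eig_vecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
w = eig_vecs[:, np.argmax(np.abs(eig_vals))].real

# Step 8: project both classes onto the discriminant direction
proj1, proj2 = X1 @ w, X2 @ w
print("Sw =\n", Sw)
print("w =", w)
print("class 1 projections:", proj1)
print("class 2 projections:", proj2)
```

For this toy data the two sets of projected values do not overlap, which is the separation the algorithm aims for. The sign of `w` is arbitrary, so the projections may come out negated.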

In [2]:
import numpy as np
from numpy.linalg import eig
import pandas as pd
from sklearn.datasets import load_iris

class LDA:
    def __init__(self, n_components=None):
        self.n_components = n_components
        self.linear_discriminants = None

    def fit(self, X, y):
        X = np.array(X).astype(float)  # Ensure X is numerical
        y = np.asarray(y)  # Plain array so boolean masks index X by position
        n_features = X.shape[1]
        class_labels = np.unique(y)

        # Step 1: Compute the mean vectors for each class
        mean_vectors = []
        for label in class_labels:
            mean_vectors.append(np.mean(X[y == label], axis=0))

        # Step 2: Compute within-class scatter matrix (Sw)
        Sw = np.zeros((n_features, n_features))
        for label, mean_vec in zip(class_labels, mean_vectors):
            class_scatter = np.zeros((n_features, n_features))
            for row in X[y == label]:
                row, mean_vec = row.reshape(n_features, 1), mean_vec.reshape(n_features, 1)
                class_scatter += (row - mean_vec).dot((row - mean_vec).T)
            Sw += class_scatter

        # Step 3: Compute between-class scatter matrix (Sb)
        overall_mean = np.mean(X, axis=0).reshape(n_features, 1)
        Sb = np.zeros((n_features, n_features))
        for label, mean_vec in zip(class_labels, mean_vectors):
            n = X[y == label].shape[0]
            mean_vec = mean_vec.reshape(n_features, 1)
            Sb += n * (mean_vec - overall_mean).dot((mean_vec - overall_mean).T)

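        # Step 4: Solve the eigenvalue problem for Sw^(-1) * Sb
        # Sw^(-1) * Sb is not symmetric, so rounding error can give the near-zero
        # eigenvalues tiny imaginary parts; eig then returns complex arrays.
        eig_vals, eig_vecs = eig(np.linalg.inv(Sw).dot(Sb))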

        # Step 5: Sort eigenvalues and eigenvectors in decreasing order
        eig_pairs = [(np.abs(eig_vals[i]), eig_vecs[:, i]) for i in range(len(eig_vals))]
        eig_pairs = sorted(eig_pairs, key=lambda x: x[0], reverse=True)

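        # Step 6: Select top k eigenvectors
        # If n_components is None, fall back to n_classes - 1, the maximum
        # number of discriminants that carry class information.
        k = self.n_components if self.n_components is not None else len(class_labels) - 1
        self.linear_discriminants = np.array([eig_pairs[i][1] for i in range(k)]).T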

    def transform(self, X):
        X = np.array(X).astype(float)  # Ensure X is numerical
        return np.dot(X, self.linear_discriminants)

# Load Iris dataset
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target, name='target')

# Apply LDA
lda = LDA(n_components=2)
lda.fit(X, y)
X_lda = lda.transform(X)

# Print the transformed data
print("Transformed Data:\n", X_lda)


Transformed Data:
 [[-1.49920971+0.j -1.88675441+0.j]
 [-1.2643595 +0.j -1.59214275+0.j]
 [-1.35525305+0.j -1.73341462+0.j]
 [-1.18495616+0.j -1.62358806+0.j]
 [-1.5169559 +0.j -1.94476227+0.j]
 [-1.40864014+0.j -2.20148038+0.j]
 [-1.28548339+0.j -1.90177269+0.j]
 [-1.38431399+0.j -1.80218401+0.j]
 [-1.12136823+0.j -1.53021571+0.j]
 [-1.31831374+0.j -1.54860234+0.j]
 [-1.58367182+0.j -1.98077996+0.j]
 [-1.28716445+0.j -1.77562146+0.j]
 [-1.31422036+0.j -1.51454424+0.j]
 [-1.37605297+0.j -1.58704672+0.j]
 [-1.94923317+0.j -2.23514437+0.j]
 [-1.77516687+0.j -2.54725756+0.j]
 [-1.63024483+0.j -2.302505  +0.j]
 [-1.42847467+0.j -1.96369972+0.j]
 [-1.50337736+0.j -2.06783361+0.j]
 [-1.48893461+0.j -2.11442674+0.j]
 [-1.35700838+0.j -1.75428449+0.j]
 [-1.3795792 +0.j -2.13271099+0.j]
 [-1.65506386+0.j -2.0431741 +0.j]
 [-1.04356034+0.j -1.92449977+0.j]
 [-1.12096094+0.j -1.699853  +0.j]
 [-1.17443134+0.j -1.54228363+0.j]
 [-1.18744274+0.j -1.93081847+0.j]
 [-1.46468272+0.j -1.86215146+0.j]
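
The `+0.j` suffix in the output above shows that `eig` returned complex arrays. This can happen for the non-symmetric product Sw^(-1) * Sb when rounding error gives the near-zero eigenvalues a tiny imaginary part. One possible way to avoid it, sketched here as an alternative and not the method used above, is to solve the symmetric generalized problem Sb w = lambda Sw w with `scipy.linalg.eigh`, which returns real results. The function name `lda_directions` is hypothetical.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.datasets import load_iris


def lda_directions(X, y, k):
    """Return the top-k eigenvalues and real discriminant directions (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    n_features = X.shape[1]
    Sw = np.zeros((n_features, n_features))
    Sb = np.zeros((n_features, n_features))
    for label in np.unique(y):
        Xc = X[y == label]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        Sw += centered.T @ centered              # within-class scatter
        diff = (mean_c - overall_mean).reshape(-1, 1)
        Sb += Xc.shape[0] * (diff @ diff.T)      # between-class scatter
    # eigh solves Sb w = lambda * Sw w for symmetric Sb and positive-definite Sw;
    # eigenvalues are real and returned in ascending order.
    eig_vals, eig_vecs = eigh(Sb, Sw)
    order = np.argsort(eig_vals)[::-1][:k]
    return eig_vals[order], eig_vecs[:, order]


iris = load_iris()
eig_vals, W = lda_directions(iris.data, iris.target, k=2)
X_proj = iris.data @ W
print(X_proj.dtype, X_proj.shape)
```

`eigh` normalizes its eigenvectors differently from `eig`, so each projected column may differ from the output above by a constant factor and possibly a sign. The directions themselves should be the same.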
 

# Using the scikit-learn Library

In [3]:
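# Note: this import rebinds the name LDA, shadowing the custom class defined above.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
lda = LDA(n_components=2)
# The model is fitted on the full Iris dataset (no train/test split is made here).
X_train_lda = lda.fit_transform(X, y)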

In [4]:
X_train_lda

array([[ 8.06179978e+00, -3.00420621e-01],
       [ 7.12868772e+00,  7.86660426e-01],
       [ 7.48982797e+00,  2.65384488e-01],
       [ 6.81320057e+00,  6.70631068e-01],
       [ 8.13230933e+00, -5.14462530e-01],
       [ 7.70194674e+00, -1.46172097e+00],
       [ 7.21261762e+00, -3.55836209e-01],
       [ 7.60529355e+00,  1.16338380e-02],
       [ 6.56055159e+00,  1.01516362e+00],
       [ 7.34305989e+00,  9.47319209e-01],
       [ 8.39738652e+00, -6.47363392e-01],
       [ 7.21929685e+00,  1.09646389e-01],
       [ 7.32679599e+00,  1.07298943e+00],
       [ 7.57247066e+00,  8.05464137e-01],
       [ 9.84984300e+00, -1.58593698e+00],
       [ 9.15823890e+00, -2.73759647e+00],
       [ 8.58243141e+00, -1.83448945e+00],
       [ 7.78075375e+00, -5.84339407e-01],
       [ 8.07835876e+00, -9.68580703e-01],
       [ 8.02097451e+00, -1.14050366e+00],
       [ 7.49680227e+00,  1.88377220e-01],
       [ 7.58648117e+00, -1.20797032e+00],
       [ 8.68104293e+00, -8.77590154e-01],
       [ 6.
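
The scikit-learn values differ in scale from those of the custom class. A likely reason is that scikit-learn centers the data and normalizes the discriminant vectors differently, while the directions stay the same. As a rough check under that assumption, the first discriminant from each approach should be perfectly correlated up to sign. The helper name `first_discriminant` is hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)


def first_discriminant(X, y):
    """Project X onto the leading eigenvector of Sw^(-1) Sb (hypothetical helper)."""
    overall_mean = X.mean(axis=0)
    n_features = X.shape[1]
    Sw = np.zeros((n_features, n_features))
    Sb = np.zeros((n_features, n_features))
    for label in np.unique(y):
        Xc = X[y == label]
        mean_c = Xc.mean(axis=0)
        Sw += (Xc - mean_c).T @ (Xc - mean_c)
        diff = (mean_c - overall_mean).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)
    eig_vals, eig_vecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
    top = np.argmax(np.abs(eig_vals))
    # Keep the real part in case eig returned a complex dtype
    return X @ eig_vecs[:, top].real


ld1_custom = first_discriminant(X, y)
ld1_sklearn = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)[:, 0]

# Correlation ignores shifts and positive rescaling; abs() ignores the sign flip
corr = np.corrcoef(ld1_custom, ld1_sklearn)[0, 1]
print(f"|correlation| between the two first discriminants: {abs(corr):.4f}")
```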

In [5]:
from sklearn.metrics import accuracy_score
# Accuracy on the same data used for fitting (training accuracy), not a held-out estimate
accuracy_score(y, lda.predict(X))

0.98
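
The score above is measured on the same samples the model was fitted on. As a sketch of a less optimistic estimate, the example below holds out part of the data. The 70/30 stratified split and `random_state=0` are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Stratify so each class keeps the same proportion in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Fit on the training part only, then score on unseen samples
clf = LinearDiscriminantAnalysis().fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, clf.predict(X_test))
print("Held-out accuracy:", test_accuracy)
```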