# LDA (Linear Discriminant Analysis)

LDA is a supervised learning technique used for classification and dimensionality reduction. It generalizes Fisher's linear discriminant and finds a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier or, more commonly, for dimensionality reduction before later classification.

LDA is closely related to principal component analysis (PCA) and factor analysis in that all three look for linear combinations of variables that best explain the data. LDA explicitly attempts to model the difference between the classes of data. PCA, on the other hand, does not take class differences into account, and factor analysis builds the feature combinations based on differences rather than similarities.

LDA and PCA are both linear transformation techniques. PCA is unsupervised because it ignores the class labels, whereas LDA is supervised because it uses the class labels (the dependent variable) to find its projection.

## How does LDA work?

LDA works by first calculating the mean and variance of each class (the class is the dependent variable, the thing you are trying to predict). Then, it calculates the separability between classes by taking the ratio of the variance between classes to the variance within classes. The higher the ratio, the more separable the classes are.
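The ratio described above can be illustrated for a single feature. The following is a minimal sketch, not a reference implementation: the helper name `fisher_ratio` and the toy arrays are made up for illustration, and the "variances" are computed as unnormalized sums of squares (scatter).

```python
import numpy as np

def fisher_ratio(x, y):
    """Between-class scatter divided by within-class scatter for one feature."""
    overall_mean = x.mean()
    between, within = 0.0, 0.0
    for c in np.unique(y):
        x_c = x[y == c]
        # Spread of the class mean around the overall mean, weighted by class size
        between += len(x_c) * (x_c.mean() - overall_mean) ** 2
        # Spread of the samples around their own class mean
        within += ((x_c - x_c.mean()) ** 2).sum()
    return between / within

# Hypothetical toy data: two classes with three samples each
y = np.array([0, 0, 0, 1, 1, 1])
well_separated = np.array([1.0, 1.1, 0.9, 5.0, 5.1, 4.9])
overlapping = np.array([1.0, 3.0, 5.0, 2.0, 4.0, 6.0])

ratio_separated = fisher_ratio(well_separated, y)
ratio_overlapping = fisher_ratio(overlapping, y)
print(ratio_separated, ratio_overlapping)
```

The feature whose classes sit far apart relative to their internal spread gets the much larger ratio, which is the property LDA tries to maximize.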

The steps are as follows:

- Compute the d-dimensional mean vectors for the different classes from the dataset.
- Compute the scatter matrices (the between-class scatter matrix S_B and the within-class scatter matrix S_W).
- Compute the eigenvectors (e1, e2, ..., ed) and corresponding eigenvalues (λ1, λ2, ..., λd) of the matrix S_W⁻¹S_B.
- Sort the eigenvectors by decreasing eigenvalues and choose the k eigenvectors with the largest eigenvalues to form a d×k matrix W (where every column represents an eigenvector). For C classes, at most C − 1 eigenvalues are non-zero.
- Use this d×k eigenvector matrix to transform the samples onto the new subspace. This can be summarized by the matrix multiplication Y = X×W (where X is an n×d matrix representing the n samples, and Y is the n×k matrix of transformed samples in the new subspace).
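The steps above can be sketched directly in NumPy. This is an illustrative sketch under stated assumptions, not how scikit-learn implements LDA: the function name `lda_project` and the synthetic three-class data are hypothetical, and the within-class scatter matrix is assumed to be invertible.

```python
import numpy as np

def lda_project(X, y, k):
    """Project X (n x d) onto the k directions with the largest eigenvalues."""
    d = X.shape[1]
    overall_mean = X.mean(axis=0)
    S_W = np.zeros((d, d))  # within-class scatter
    S_B = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        X_c = X[y == c]
        mean_c = X_c.mean(axis=0)            # step 1: class mean vector
        centered = X_c - mean_c
        S_W += centered.T @ centered         # step 2: scatter matrices
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_B += X_c.shape[0] * (diff @ diff.T)

    # Step 3: eigen-decomposition of S_W^{-1} S_B (solve avoids an explicit inverse)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    eigvals, eigvecs = eigvals.real, eigvecs.real

    # Step 4: keep the k eigenvectors with the largest eigenvalues
    order = np.argsort(eigvals)[::-1]
    W = eigvecs[:, order[:k]]

    # Step 5: Y = X W
    return X @ W, W, eigvals[order]

# Hypothetical synthetic data: 3 classes, 4 features, 50 samples per class
rng = np.random.default_rng(0)
means = np.array([[0, 0, 0, 0], [3, 3, 0, 0], [0, 3, 3, 0]])
X = np.vstack([rng.normal(m, 1.0, size=(50, 4)) for m in means])
y = np.repeat([0, 1, 2], 50)

Y, W, eigvals = lda_project(X, y, k=2)
print(Y.shape)   # (150, 2)
print(eigvals)   # only the first two are meaningfully above zero
```

With three classes only two eigenvalues are non-zero (up to floating-point noise), so choosing k larger than 2 would add no discriminative information here.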


## How is LDA used for dimensionality reduction?

LDA reduces dimensionality by projecting the original data onto a lower-dimensional space with good class separability, which helps avoid overfitting (the “curse of dimensionality”) and reduces computational cost.
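As a hedged illustration of this use, the sketch below reduces the data with LDA before handing it to a separate classifier. The wine dataset, the k-nearest-neighbors classifier, and the 5-fold cross-validation are assumptions chosen for the example; none of them come from this notebook.

```python
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# 178 samples, 13 features, 3 classes
X, y = load_wine(return_X_y=True)

# Project 13 features down to 2 discriminant components, then classify
pipeline = make_pipeline(
    LinearDiscriminantAnalysis(n_components=2),
    KNeighborsClassifier(n_neighbors=5),
)

# The projection is re-fitted inside each fold, so test folds never leak into it
scores = cross_val_score(pipeline, X, y, cv=5)
print(scores.mean())
```

Putting the LDA step inside the pipeline matters: fitting the projection on the full dataset before cross-validating would let label information from the test folds influence the features.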

## LDA vs PCA

- PCA is an unsupervised learning technique that is used for dimensionality reduction. LDA is a supervised learning technique that is used for classification and dimensionality reduction.
- PCA finds the axes of maximum variance in high-dimensional data and projects the data onto a new subspace with equal or fewer dimensions than the original one. LDA tries to find a feature subspace that maximizes class separability.
- Both PCA and LDA are linear transformation techniques. The quadratic counterpart of LDA is a separate method, quadratic discriminant analysis (QDA).
- Both PCA and LDA are feature extraction techniques: they build new features as combinations of the original ones rather than selecting a subset of them.
- LDA assumes that the features within each class are normally distributed with a covariance matrix shared by all classes. PCA makes no such distributional assumption.
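The supervised/unsupervised distinction in the list above can be checked empirically. The sketch below is an illustration on the iris dataset (an assumption, not part of this notebook): it shuffles the labels and compares the projections before and after.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
# Same labels in a random order, so they no longer match the samples
y_shuffled = np.random.default_rng(0).permutation(y)

# PCA accepts y for API consistency but ignores it
pca_a = PCA(n_components=2).fit_transform(X, y)
pca_b = PCA(n_components=2).fit_transform(X, y_shuffled)

# LDA uses y to choose its projection directions
lda_a = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)
lda_b = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y_shuffled)

pca_unchanged = np.allclose(pca_a, pca_b)
lda_unchanged = np.allclose(lda_a, lda_b)
print(pca_unchanged, lda_unchanged)
```

The PCA projection is identical in both runs, while the LDA projection changes once the labels change.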

In [3]:
from sklearn import datasets
import pandas as pd

# as_frame=True returns a Bunch whose .frame attribute is a DataFrame (features plus target)
data = datasets.load_breast_cancer(as_frame=True)
df = data.frame
df

| | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17.99 | 10.38 | 122.80 | 1001.0 | 0.11840 | 0.27760 | 0.30010 | 0.14710 | 0.2419 | 0.07871 | ... | 17.33 | 184.60 | 2019.0 | 0.16220 | 0.66560 | 0.7119 | 0.2654 | 0.4601 | 0.11890 | 0 |
| 1 | 20.57 | 17.77 | 132.90 | 1326.0 | 0.08474 | 0.07864 | 0.08690 | 0.07017 | 0.1812 | 0.05667 | ... | 23.41 | 158.80 | 1956.0 | 0.12380 | 0.18660 | 0.2416 | 0.1860 | 0.2750 | 0.08902 | 0 |
| 2 | 19.69 | 21.25 | 130.00 | 1203.0 | 0.10960 | 0.15990 | 0.19740 | 0.12790 | 0.2069 | 0.05999 | ... | 25.53 | 152.50 | 1709.0 | 0.14440 | 0.42450 | 0.4504 | 0.2430 | 0.3613 | 0.08758 | 0 |
| 3 | 11.42 | 20.38 | 77.58 | 386.1 | 0.14250 | 0.28390 | 0.24140 | 0.10520 | 0.2597 | 0.09744 | ... | 26.50 | 98.87 | 567.7 | 0.20980 | 0.86630 | 0.6869 | 0.2575 | 0.6638 | 0.17300 | 0 |
| 4 | 20.29 | 14.34 | 135.10 | 1297.0 | 0.10030 | 0.13280 | 0.19800 | 0.10430 | 0.1809 | 0.05883 | ... | 16.67 | 152.20 | 1575.0 | 0.13740 | 0.20500 | 0.4000 | 0.1625 | 0.2364 | 0.07678 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 564 | 21.56 | 22.39 | 142.00 | 1479.0 | 0.11100 | 0.11590 | 0.24390 | 0.13890 | 0.1726 | 0.05623 | ... | 26.40 | 166.10 | 2027.0 | 0.14100 | 0.21130 | 0.4107 | 0.2216 | 0.2060 | 0.07115 | 0 |
| 565 | 20.13 | 28.25 | 131.20 | 1261.0 | 0.09780 | 0.10340 | 0.14400 | 0.09791 | 0.1752 | 0.05533 | ... | 38.25 | 155.00 | 1731.0 | 0.11660 | 0.19220 | 0.3215 | 0.1628 | 0.2572 | 0.06637 | 0 |
| 566 | 16.60 | 28.08 | 108.30 | 858.1 | 0.08455 | 0.10230 | 0.09251 | 0.05302 | 0.1590 | 0.05648 | ... | 34.12 | 126.70 | 1124.0 | 0.11390 | 0.30940 | 0.3403 | 0.1418 | 0.2218 | 0.07820 | 0 |
| 567 | 20.60 | 29.33 | 140.10 | 1265.0 | 0.11780 | 0.27700 | 0.35140 | 0.15200 | 0.2397 | 0.07016 | ... | 39.42 | 184.60 | 1821.0 | 0.16500 | 0.86810 | 0.9387 | 0.2650 | 0.4087 | 0.12400 | 0 |


In [4]:
X = df.drop(columns='target')
y = df.target

print(X.shape)
print(y.shape)

(569, 30)
(569,)


In [5]:
from sklearn.model_selection import train_test_split

# No random_state is set, so the split (and every result below) changes from run to run
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

(455, 30)
(114, 30)
(455,)
(114,)


LDA in scikit-learn: https://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html

In [15]:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

model = LDA()
model.fit(X_train, y_train)

LinearDiscriminantAnalysis()

In [16]:
new_data = model.transform(X_train)
new_data

array([[ 2.17130346],
       [-2.69493047],
       [ 0.80546702],
       [ 0.84224892],
       [-2.46265796],
       [-3.33939958],
       [ 0.95094871],
       [-5.94458702],
       [ 0.30795551],
       [-0.98193311],
       [ 0.40372448],
       [-0.16895255],
       [-1.36167438],
       [ 2.12993938],
       [-3.75745303],
       [ 4.02456055],
       [-0.93804966],
       [ 1.12995695],
       [ 2.23000018],
       [ 2.64066038],
       [ 1.69234866],
       [ 1.31425575],
       [ 1.54326884],
       [ 2.46763551],
       [ 0.77092018],
       [ 1.77121259],
       [ 0.73071085],
       [-3.79898521],
       [ 2.2362253 ],
       [ 0.80602537],
       [ 0.15269552],
       [-2.84406839],
       [-2.08629504],
       [ 0.34081535],
       [-3.11273   ],
       [-0.87472652],
       [ 2.30536523],
       [ 0.22666782],
       [ 0.40505877],
       [ 1.7702969 ],
       [-3.28602861],
       [ 0.44813563],
       [-2.39435894],
       [ 0.2071809 ],
       [ 0.81512613],
       ...
(output truncated)

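Since the data has only 2 classes, LDA can produce at most 1 discriminant component: the number of components is bounded by min(number of classes − 1, number of features), so each sample is projected onto a single dimension.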

For more details, see: https://stackoverflow.com/questions/39083308/python-scikit-learn-lda-collapsing-to-single-dimension
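The link between the number of classes and the number of output dimensions can be seen by fitting LDA with its default settings on datasets with different class counts. This is a small sketch; the iris dataset is brought in only for comparison and is not used elsewhere in this notebook.

```python
from sklearn.datasets import load_breast_cancer, load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

shapes = {}
for name, loader in [("breast_cancer", load_breast_cancer), ("iris", load_iris)]:
    X, y = loader(return_X_y=True)
    # With n_components left at its default, LDA keeps
    # min(n_classes - 1, n_features) components
    projected = LinearDiscriminantAnalysis().fit_transform(X, y)
    shapes[name] = projected.shape

print(shapes)  # breast_cancer: (569, 1), iris: (150, 2)
```

The two-class breast cancer data collapses to one column, while the three-class iris data keeps two.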

In [17]:
model.predict(X_test)

array([1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0,
       1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
       0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
       0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1,
       0, 1, 0, 1])

In [18]:
model.score(X_test, y_test)  # mean accuracy on the test set

0.956140350877193
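To make explicit what `score` reports, the sketch below compares it with `accuracy_score` applied to the model's own predictions. It reloads the data so that it runs on its own, and it fixes `random_state=0`, which is an assumption added for reproducibility; the split above does not set one, so the number printed here need not match the output shown in this notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
# random_state=0 is an illustrative choice, not taken from the cells above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearDiscriminantAnalysis().fit(X_train, y_train)

score = model.score(X_test, y_test)
manual = accuracy_score(y_test, model.predict(X_test))  # fraction of correct predictions
print(score, manual)
```

Both values are the fraction of test samples whose predicted class matches the true class.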