# 1. What is Discriminant Analysis?

Discriminant analysis is a statistical technique used to __classify observations into non-overlapping groups__, based on scores on one or more quantitative predictor variables.

__For example,__ a doctor could perform a discriminant analysis to identify patients at high or low risk for stroke. The analysis might classify patients into high- or low-risk groups, based on personal attributes (e.g., cholesterol level, body mass) and/or lifestyle behaviors (e.g., minutes of exercise per week, packs of cigarettes per day).

# 2. Dimensionality Reduction.

A dimensionality reduction technique reduces the number of dimensions (i.e., variables) in a dataset while __retaining as much information as possible.__


The main goal of dimensionality reduction techniques is to remove redundant and dependent features by transforming the data from a higher-dimensional space to a space with fewer dimensions.
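As a minimal sketch of this idea, the snippet below builds a synthetic 2-feature dataset in which the second feature is almost a multiple of the first, then projects it onto a single direction found with an SVD. The data, the variable names, and the choice of SVD are assumptions made for illustration; they are not part of the notebook's iris workflow.

```python
import numpy as np

# Synthetic data (an assumption for illustration): the second feature is
# almost an exact multiple of the first, so it adds very little information.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 2.0 * x1 + 0.05 * rng.normal(size=200)
X = np.column_stack([x1, x2])            # shape (200, 2)

# Center the data and find its principal directions with an SVD.
X_centered = X - X.mean(axis=0)
_, singular_values, directions = np.linalg.svd(X_centered, full_matrices=False)

# Project onto the first direction: 2 features become 1.
X_reduced = X_centered @ directions[0]   # shape (200,)

# Fraction of the total variance kept by the single new feature.
retained = singular_values[0] ** 2 / np.sum(singular_values ** 2)
print(X.shape, "->", X_reduced.shape, f"variance retained: {retained:.4f}")
```

Because the two original features are nearly dependent, the single new feature keeps almost all of the variance.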


# 3. What is Linear Discriminant Analysis?

### Linear Discriminant Analysis / Normal Discriminant Analysis / Discriminant Function Analysis 

LDA is one of the most commonly used dimensionality reduction techniques in supervised learning.

It is often used as a preprocessing step for pattern classification and machine learning applications.

The goal of Linear Discriminant Analysis is to project the features from a higher-dimensional space onto a lower-dimensional space while keeping the classes as separable as possible.

# 4. How does LDA work?

The general LDA approach is very similar to Principal Component Analysis (PCA). PCA finds the component axes that maximize the variance of the data; LDA is additionally interested in the axes that maximize the separation between multiple classes.
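The difference can be seen on a small synthetic dataset. In the sketch below (the data and variable names are assumptions chosen for illustration), both classes are widely spread along x but differ only along y, so the direction of largest variance and the direction of best class separation are not the same.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n = 200

# Both classes are wide along x and narrow along y; they differ only in y.
class_0 = rng.normal(loc=[0.0, 0.0], scale=[5.0, 0.5], size=(n, 2))
class_1 = rng.normal(loc=[0.0, 2.0], scale=[5.0, 0.5], size=(n, 2))
X = np.vstack([class_0, class_1])
y = np.array([0] * n + [1] * n)

# PCA ignores the labels and picks the direction of largest variance.
pca_axis = PCA(n_components=1).fit(X).components_[0]

# LDA uses the labels and picks the direction that best separates the classes.
lda = LinearDiscriminantAnalysis(n_components=1).fit(X, y)
lda_axis = lda.scalings_[:, 0]
lda_axis = lda_axis / np.linalg.norm(lda_axis)

print("PCA axis:", np.round(pca_axis, 2))
print("LDA axis:", np.round(lda_axis, 2))
```

On this data the PCA axis lies almost along x, where the spread is largest, while the LDA axis lies almost along y, where the classes actually differ.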

__Example__

Let's say we have to reduce a 2D plot into a 1D plot.

Don't worry about what gene X and gene Y mean in the figure; just treat them as the x- and y-axes.

![Reducing2Dto1D](Reducing2Dto1D.png 'Reducing2Dto1D')

If we project the points onto a single axis, such as the x-axis, some points from the two groups overlap, so the groups are not cleanly separated, as shown in the figure below.

![Reduction_on_x_axis](Reduction_on_x_axis.png 'Reduction_on_x_axis')

The same happens if we project the points onto the y-axis only.

This is where LDA comes in: it helps us separate the groups properly.

LDA uses both axes (x and y) to create a new axis and projects the data onto it in a way that maximizes the separation of the two categories, thereby reducing the 2D graph to a 1D graph.

![Reducing_usingLDA](Reducing_usingLDA.png)

Two criteria are used by LDA to create a new axis:

1. Maximize the distance between the means of the two classes.

2. Minimize the variation within each class.

![LDA_method](LDA_method.png)

In the above graph, it can be seen that a new axis is generated and plotted in the 2D graph such that it maximizes the distance between the means of the two classes and minimizes the variation within each class.
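The two criteria are commonly combined into a single objective known as Fisher's criterion. The notation here is introduced for illustration and does not appear in the figures: $\mathbf{w}$ is the new axis, $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$ are the class means, $C_k$ is the set of points in class $k$, and $\mathbf{S}_W$ is the within-class scatter matrix.

```latex
J(\mathbf{w}) =
  \frac{\left(\mathbf{w}^{\top}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)\right)^{2}}
       {\mathbf{w}^{\top}\mathbf{S}_W\,\mathbf{w}},
\qquad
\mathbf{S}_W = \sum_{k=1}^{2}\sum_{i \in C_k}
  (\mathbf{x}_i - \boldsymbol{\mu}_k)(\mathbf{x}_i - \boldsymbol{\mu}_k)^{\top}
```

The numerator grows with the distance between the projected means (criterion 1), and the denominator is the variation within the classes after projection (criterion 2). When $\mathbf{S}_W$ is invertible, the direction that maximizes $J$ is proportional to $\mathbf{S}_W^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)$.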

After generating this new axis using the above-mentioned criteria, all the data points of the classes are plotted on this new axis and are shown in the figure given below.

![Reduced_data_points](Reduced_data_points.png)
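The steps above can be sketched in plain NumPy for the two-class case. This is a minimal illustration on synthetic data, not the data behind the figures; the names `class_a`, `class_b`, and `fisher_ratio` are hypothetical. It uses the standard closed form in which the new axis is proportional to the inverse within-class scatter matrix times the difference of the class means.

```python
import numpy as np

rng = np.random.default_rng(42)

# Two synthetic, correlated 2D classes (illustrative data only).
cov = [[1.0, 0.8], [0.8, 1.0]]
class_a = rng.multivariate_normal([0.0, 0.0], cov, size=100)
class_b = rng.multivariate_normal([2.0, 0.0], cov, size=100)

mean_a = class_a.mean(axis=0)
mean_b = class_b.mean(axis=0)

# Within-class scatter: summed spread of each class around its own mean.
scatter_within = (
    (class_a - mean_a).T @ (class_a - mean_a)
    + (class_b - mean_b).T @ (class_b - mean_b)
)

# New axis: proportional to inverse(scatter_within) @ (mean_b - mean_a).
axis = np.linalg.solve(scatter_within, mean_b - mean_a)
axis /= np.linalg.norm(axis)

# Project every 2D point onto the new axis to get 1D coordinates.
proj_a = class_a @ axis
proj_b = class_b @ axis


def fisher_ratio(direction):
    """Squared distance between projected means over projected within-class scatter."""
    between = (direction @ (mean_b - mean_a)) ** 2
    within = direction @ scatter_within @ direction
    return between / within


print("LDA axis   :", fisher_ratio(axis))
print("x-axis only:", fisher_ratio(np.array([1.0, 0.0])))
print("y-axis only:", fisher_ratio(np.array([0.0, 1.0])))
```

The ratio for the LDA axis is at least as large as the ratio for either original axis, which is the numerical counterpart of the figures: projecting onto x or y alone separates the groups less well than projecting onto the new axis.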

__Note:__ Linear Discriminant Analysis fails when the class distributions share the same mean, because it then becomes impossible for LDA to find a new axis that makes the classes linearly separable. In such cases, we use non-linear discriminant analysis.
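A small experiment illustrates this failure mode. In the sketch below (synthetic data, chosen as an assumption for illustration), both classes are centered at the origin and differ only in spread. Quadratic Discriminant Analysis from scikit-learn stands in for the non-linear alternative.

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

rng = np.random.default_rng(0)
n = 500

# Both classes share the same mean (the origin); only their spread differs.
tight = rng.normal(loc=0.0, scale=0.5, size=(n, 2))
wide = rng.normal(loc=0.0, scale=3.0, size=(n, 2))
X = np.vstack([tight, wide])
y = np.array([0] * n + [1] * n)

# Accuracy on the training data, enough to show the qualitative difference.
lda_acc = LinearDiscriminantAnalysis().fit(X, y).score(X, y)
qda_acc = QuadraticDiscriminantAnalysis().fit(X, y).score(X, y)
print(f"LDA accuracy: {lda_acc:.2f}, QDA accuracy: {qda_acc:.2f}")
```

With shared means, no straight boundary can do much better than chance, so LDA stays close to 50% accuracy, whereas QDA can exploit the difference in spread.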

# 5. Implementing LDA with sklearn.

In [1]:
#Import the libraries.
import numpy as np
import pandas as pd
import seaborn as sns

In [2]:
irisdata = sns.load_dataset('iris')
irisdata.head()

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


In [3]:
irisdata.shape

(150, 5)

In [4]:
irisdata.describe()

| | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| count | 150.0 | 150.0 | 150.0 | 150.0 |
| mean | 5.843333 | 3.057333 | 3.758 | 1.199333 |
| std | 0.828066 | 0.435866 | 1.765298 | 0.762238 |
| min | 4.3 | 2.0 | 1.0 | 0.1 |
| 25% | 5.1 | 2.8 | 1.6 | 0.3 |
| 50% | 5.8 | 3.0 | 4.35 | 1.3 |
| 75% | 6.4 | 3.3 | 5.1 | 1.8 |
| max | 7.9 | 4.4 | 6.9 | 2.5 |


In [5]:
irisdata.sample(5)

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 133 | 6.3 | 2.8 | 5.1 | 1.5 | virginica |
| 66 | 5.6 | 3.0 | 4.5 | 1.5 | versicolor |
| 86 | 6.7 | 3.1 | 4.7 | 1.5 | versicolor |
| 54 | 6.5 | 2.8 | 4.6 | 1.5 | versicolor |
| 144 | 6.7 | 3.3 | 5.7 | 2.5 | virginica |


In [6]:
X=irisdata.iloc[:,:-1]
y = irisdata['species']

In [7]:
X.head()

| | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |


In [8]:
y.head()

0    setosa
1    setosa
2    setosa
3    setosa
4    setosa
Name: species, dtype: object

In [9]:
# Train Test Split -> use train_test_split()
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size = 0.20)


from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

In [10]:
# Performing LDA

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

lda = LDA()
X_train = lda.fit_transform(X_train, y_train)
X_test = lda.transform(X_test)


In [11]:
explained_variance = lda.explained_variance_ratio_  # fraction of variance explained by each component
explained_variance

array([0.9927599, 0.0072401])

__Note:__ Why do we get ratios for only 2 components when the dataset has 4 features?

LDA can produce at most min(number of classes - 1, number of features) components. The iris dataset has 3 classes and 4 features, so only 2 components exist, and the variance ratio has exactly 2 entries.

The first component accounts for about 99.28% of the variance between the classes, and the second for about 0.72%. Together, these two components capture the classification information contained in the feature set.
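The limit on the number of components can be checked directly. The sketch below loads iris with `sklearn.datasets.load_iris` instead of seaborn (a substitution made here so the example does not need a download) and fits LDA on the full, unscaled dataset, so the exact ratios will differ slightly from the ones printed above.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)        # 150 samples, 4 features, 3 classes

lda = LinearDiscriminantAnalysis().fit(X, y)

n_features = X.shape[1]
n_classes = len(lda.classes_)
max_components = min(n_classes - 1, n_features)

ratios = lda.explained_variance_ratio_
X_lda = lda.transform(X)

print("maximum number of components:", max_components)
print("explained variance ratios   :", ratios)
print("shape after transform       :", X_lda.shape)
```

With 3 classes the transformed data has 2 columns, however many input features there are.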

In [12]:
#making predictions using Random Forest Classifier.

from sklearn.ensemble import RandomForestClassifier

classifier = RandomForestClassifier(max_depth=2,random_state=0)

classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)



In [13]:
#Evaluating the performance.

from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score

cm = confusion_matrix(y_test, y_pred)
print(cm)
print('Accuracy ' + str(accuracy_score(y_test, y_pred)))

[[14  0  0]
 [ 0 12  0]
 [ 0  0  4]]
Accuracy 1.0


__Observation:__ 

On this train/test split, LDA gave an accuracy of 100%, which is higher than the 93.33% obtained with PCA.

All 30 test observations were predicted correctly.
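Because the test set has only 30 observations and the split is random, the accuracy can change from run to run. A cross-validated estimate is one way to get a steadier number. The sketch below is an assumption-laden variant of the cells above, not a replacement for them: it loads iris through `sklearn.datasets.load_iris` rather than seaborn, and it chains the same three steps (scaling, LDA, random forest) in a pipeline so that each fold fits the scaler and LDA on its own training part only.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Same steps as in the notebook, chained so they are re-fitted inside each fold.
model = make_pipeline(
    StandardScaler(),
    LinearDiscriminantAnalysis(),
    RandomForestClassifier(max_depth=2, random_state=0),
)

# 5-fold cross-validation; for classifiers the folds are stratified by class.
scores = cross_val_score(model, X, y, cv=5)
print("fold accuracies:", scores)
print(f"mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```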

__Note that:__ Here LDA gave an accuracy of 100% with default parameters. The __n_components__ parameter sets how many components LDA keeps, up to min(number of classes - 1, number of features). On larger datasets with many classes, it is worth trying different values to see which gives the best accuracy.
    
For more on LDA and its parameters, refer here:

https://scikit-learn.org/0.15/modules/generated/sklearn.lda.LDA.html#:~:text=A%20classifier%20with%20a%20linear,share%20the%20same%20covariance%20matrix.

# 6. Extensions to LDA.

•	Quadratic Discriminant Analysis (QDA): Each class uses its own estimate of variance (or covariance when there are multiple input variables).

•	Flexible Discriminant Analysis (FDA): Uses non-linear combinations of inputs, such as splines.

•	Regularized Discriminant Analysis (RDA): Introduces regularization into the estimate of the variance (actually covariance), moderating the influence of different variables on LDA.
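Some of these variants can be tried in scikit-learn. The sketch below makes several assumptions: it uses `load_iris` for data, it uses the `shrinkage` option of `LinearDiscriminantAnalysis` and the `reg_param` option of `QuadraticDiscriminantAnalysis` as stand-ins for regularization (these regularize the covariance estimate but are not necessarily the exact RDA formulation), and the value `reg_param=0.1` is arbitrary. FDA is left out because scikit-learn does not provide an estimator for it.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    # Shrinkage requires the 'lsqr' or 'eigen' solver.
    "LDA + shrinkage": LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto"),
    "QDA": QuadraticDiscriminantAnalysis(),
    # reg_param=0.1 is an arbitrary illustrative value.
    "QDA (reg_param=0.1)": QuadraticDiscriminantAnalysis(reg_param=0.1),
}

# Mean 5-fold cross-validated accuracy for each variant.
results = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}
for name, accuracy in results.items():
    print(f"{name}: {accuracy:.3f}")
```

On a small, well-separated dataset such as iris the variants tend to score similarly; the differences matter more when the class covariances clearly differ or when there are many features relative to the number of samples.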


# 7. Applications of LDA.

•	__For recognizing customers' buying patterns:__ LDA helps to identify and choose the parameters that describe a group of customers who are highly likely to buy similar products.
 
•	__For facial recognition:__ This is the most famous application in the field of computer vision. Every face is represented by a large number of pixel values, so LDA first reduces the number of features to a more manageable number before the classification task is carried out. A template is created from the newly produced dimensions, which are linear combinations of pixel values.
 
•	__In medicine:__ LDA is used to classify the state of a patient's disease as mild, moderate, or severe, based on various parameters and the medical treatment the patient is going through, in order to decrease the movement of treatment.
 
•	__For predictions:__ LDA is widely used for prediction and hence in decision making. For example, the question "will you read a book?" yields a prediction in one of two possible classes: reading the book or not.
 
•	__In learning:__ Nowadays, robots are trained to learn and talk so that they can work like human beings, and this can be treated as a classification problem. LDA forms similar groups based on various parameters such as frequencies, pitches, sounds, tunes, etc.


# 8. More about LDA.

Watch this video: https://www.youtube.com/watch?v=azXCzI57Yfc

__End of the Notebook.__