# Overview 

**Linear Discriminant Analysis** (LDA) is a strong alternative to logistic regression.

In general, it is a less direct approach to predicting the target: it models the distribution of the predictors within each class, $P(X|Y)$, and then uses Bayes' theorem to compute $P(Y|X)$. When we assume that each $P(X|Y)$ is a normal distribution, the resulting model is very similar to logistic regression.
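To make the "model $P(X|Y)$, then apply Bayes' theorem" idea concrete, here is a minimal NumPy sketch. It assumes Gaussian class densities that all share one covariance matrix (the standard LDA assumption, not stated explicitly above). The helpers `fit_lda` and `lda_posterior` are hypothetical names for illustration, not part of scikit-learn.

```python
import numpy as np

def fit_lda(X, y):
    """Estimate class priors, class means and a pooled (shared) covariance matrix."""
    classes = np.unique(y)
    n, K = len(y), len(classes)
    priors = np.array([np.mean(y == k) for k in classes])
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    # Pooled within-class covariance: LDA assumes every class shares it
    scatter = sum((X[y == k] - m).T @ (X[y == k] - m) for k, m in zip(classes, means))
    cov = scatter / (n - K)
    return classes, priors, means, cov

def lda_posterior(X, priors, means, cov):
    """Posterior P(Y = k | X = x) from Bayes' theorem with Gaussian class densities."""
    cov_inv = np.linalg.inv(cov)
    # Linear discriminant delta_k(x); terms identical for every class (the quadratic
    # term in x and the Gaussian normalising constant) cancel in the normalisation below
    scores = (X @ cov_inv @ means.T
              - 0.5 * np.sum((means @ cov_inv) * means, axis=1)
              + np.log(priors))
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(scores)
    return probs / probs.sum(axis=1, keepdims=True)  # divide by the sum over classes

# Usage on hypothetical synthetic data: two Gaussian classes with identity covariance
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(100, 2)),
               rng.normal([3.0, 3.0], 1.0, size=(100, 2))])
y = np.repeat([0, 1], 100)

classes, priors, means, cov = fit_lda(X, y)
post = lda_posterior(X, priors, means, cov)
pred = classes[post.argmax(axis=1)]
print("training accuracy:", (pred == y).mean())
```

Because the covariance is shared, the discriminant is linear in $x$, which is where the "linear" in LDA comes from.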

So when do we prefer LDA over logistic regression?

* When the classes are well separated, the parameter estimates of logistic regression tend to be quite unstable; LDA does not suffer from this problem.
* When the number of observations is small and the distribution of X within each class is approximately normal, LDA tends to be more stable.
* LDA is often preferred when there are more than two classes.

Bayes' theorem states:

$$P(Y = k |X = x) = \dfrac{P(X = x|Y = k)P(Y = k)}{P(X = x)} $$
$$P(Y = k |X = x) = \dfrac{P(X = x|Y = k)P(Y = k)}{\sum_{l = 1}^{K} P(X = x | Y = l)P(Y = l)} $$

Where:
* $K$ is the number of classes.
* $P(Y = k)$ is the prior probability that a randomly chosen observation comes from class $k$.
* $P(X = x|Y = k)$ is the density function of $X$ for an observation that comes from the $k^{th}$ class.
* $P(X = x)$ is the marginal density of $X$; the second line expands it as a sum over all $K$ classes.
* $P(Y = k |X = x)$ is the posterior probability that an observation $x$ belongs to class $k$.
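A small numeric illustration of the formula, with one predictor and two classes. The priors (0.7 and 0.3) and the Gaussian densities $N(0, 1)$ and $N(2, 1)$ are hypothetical values chosen only for the example.

```python
import math

def normal_pdf(x, mean, sd):
    """Univariate Gaussian density, playing the role of P(X = x | Y = k)."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def posterior(x, priors, means, sd):
    """Bayes' theorem: prior times density, divided by the sum over all classes."""
    numerators = [p * normal_pdf(x, m, sd) for p, m in zip(priors, means)]
    evidence = sum(numerators)  # the denominator P(X = x)
    return [num / evidence for num in numerators]

# Hypothetical setup: priors 0.7 / 0.3, class densities N(0, 1) and N(2, 1)
priors, means = [0.7, 0.3], [0.0, 2.0]
post_at_1 = posterior(1.0, priors, means, sd=1.0)
post_at_2 = posterior(2.0, priors, means, sd=1.0)
print(post_at_1)
print(post_at_2)
```

At $x = 1$ the two densities are equal, so the posterior simply equals the priors (0.7, 0.3). At $x = 2$ the second density dominates, and the posterior of the second class rises to about 0.76 despite its smaller prior.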

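In the end, LDA approximates the Bayes classifier, which assigns each observation to the class with the largest posterior $P(Y = k | X = x)$. LDA does this by plugging estimates of the class priors, class means, and a shared covariance matrix into the formula above.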
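One way to see the approximation: simulate data from known Gaussian classes, where the exact Bayes classifier can be computed, and check how often a fitted LDA model makes the same decision. The true means and covariance below are hypothetical values chosen for illustration; equal priors and a shared covariance are assumed so that LDA's assumptions hold exactly.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)

# Hypothetical ground truth: two Gaussian classes, equal priors, shared covariance
mu = np.array([[0.0, 0.0], [2.0, 1.0]])
cov = np.array([[1.0, 0.5], [0.5, 1.0]])

def sample(n):
    """Draw n labelled points from the true model."""
    y = rng.integers(0, 2, size=n)
    X = mu[y] + rng.multivariate_normal(np.zeros(2), cov, size=n)
    return X, y

def bayes_classifier(X):
    """Exact Bayes rule: pick the class with the largest true posterior."""
    cov_inv = np.linalg.inv(cov)
    # With equal priors, log(prior) is the same for both classes and drops out
    scores = X @ cov_inv @ mu.T - 0.5 * np.sum((mu @ cov_inv) * mu, axis=1)
    return scores.argmax(axis=1)

X_train, y_train = sample(2000)
X_test, y_test = sample(2000)
lda = LinearDiscriminantAnalysis().fit(X_train, y_train)

agreement = np.mean(lda.predict(X_test) == bayes_classifier(X_test))
print(f"LDA matches the Bayes classifier on {agreement:.1%} of test points")
print(f"LDA accuracy: {lda.score(X_test, y_test):.3f}, "
      f"Bayes accuracy: {np.mean(bayes_classifier(X_test) == y_test):.3f}")
```

The two rules can only disagree on points very close to the decision boundary, and that band shrinks as the training sample grows.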

The math gets quite involved, so I will not go through it here. If you are interested in the derivation (it is well worth reading), see the two-class case [here](https://en.wikipedia.org/wiki/Linear_discriminant_analysis#LDA_for_two_classes) and the multi-class case [here](https://en.wikipedia.org/wiki/Linear_discriminant_analysis#Multiclass_LDA).


```python
from sklearn.datasets import load_wine
import pandas as pd
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

# Wine dataset: 178 samples, 13 numeric features, 3 classes
X, y = load_wine(return_X_y=True)

# Hold out 20% of the samples as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit LDA and report its accuracy on the test set
clf_linear = LinearDiscriminantAnalysis().fit(X_train, y_train)
clf_linear.score(X_test, y_test)
```

1.0
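The fitted model also exposes its estimate of the posterior $P(Y = k | X = x)$ from Bayes' theorem through `predict_proba`, and the estimated priors $P(Y = k)$ through the `priors_` attribute. The snippet below refits the same model so that it runs on its own.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf_linear = LinearDiscriminantAnalysis().fit(X_train, y_train)

# Estimated posteriors: one row per test sample, one column per class
proba = clf_linear.predict_proba(X_test)
print(proba[:3].round(3))

# Priors P(Y = k), estimated from the class frequencies in the training set
print(clf_linear.priors_)
```

`predict` then returns, for each sample, the class with the largest posterior.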

# Other Discriminant Analysis methods 

## Quadratic Discriminant Analysis

QDA drops one of LDA's assumptions: instead of a single covariance matrix shared by all classes, it estimates a separate covariance matrix for each class. This makes the decision boundaries quadratic rather than linear, so we use it when the classes are not linearly separable.

For the details of how it differs from LDA, see [here](https://en.wikipedia.org/wiki/Quadratic_classifier#Quadratic_discriminant_analysis).
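A minimal sketch of when the per-class covariance matters, on hypothetical synthetic data: both classes are centred at the origin but have very different spreads, so they differ in covariance rather than in mean. No straight line separates them, while a circular (quadratic) boundary does.

```python
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
# Hypothetical data: same centre, different spreads
X = np.vstack([rng.normal(0.0, 0.5, size=(n, 2)),   # tight class 0
               rng.normal(0.0, 3.0, size=(n, 2))])  # wide class 1
y = np.repeat([0, 1], n)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0, stratify=y)

lda_acc = LinearDiscriminantAnalysis().fit(X_tr, y_tr).score(X_te, y_te)
qda_acc = QuadraticDiscriminantAnalysis().fit(X_tr, y_tr).score(X_te, y_te)
print(f"LDA accuracy: {lda_acc:.2f}")  # a single line does little better than chance here
print(f"QDA accuracy: {qda_acc:.2f}")  # per-class covariances recover the circular boundary
```

The price of this flexibility is more parameters: QDA estimates one covariance matrix per class instead of one in total, which is why LDA can still be the more stable choice on small datasets.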

```python
# QDA estimates a separate covariance matrix for each class
clf_quad = QuadraticDiscriminantAnalysis().fit(X_train, y_train)
clf_quad.score(X_test, y_test)
```

0.97222222222222221

## Naive Bayes Classifier

The Naive Bayes method is quite a simplistic approach, because it assumes that all the predictors are independent within each class. It is useful when the number of predictors is very large, a setting in which QDA and even LDA break down. It is widely used in sentiment classification tasks.

Despite this strong assumption, it tends to perform surprisingly well in practice.

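The formula for the Naive Bayes classifier is:

$$ P(Y | x_1, x_2, \dots, x_n) \propto P(Y) \prod_{i=1}^{n} P(x_{i}|Y)$$

The proportionality holds because the denominator $P(x_1, \dots, x_n)$ is the same for every class given the input, so it does not change which class has the largest posterior.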
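The formula above can be evaluated directly. This sketch assumes Gaussian per-feature densities $P(x_i|Y)$, as `GaussianNB` does, and works in log space so the product becomes a sum. The small variance floor is meant to mirror `GaussianNB`'s default `var_smoothing=1e-9`, applied as a fraction of the largest feature variance; treat the exact match to scikit-learn's internals as an assumption.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

classes = np.unique(y_train)
priors = np.array([np.mean(y_train == k) for k in classes])
means = np.array([X_train[y_train == k].mean(axis=0) for k in classes])
# One variance per class and feature: independence means no covariance terms at all
variances = np.array([X_train[y_train == k].var(axis=0) for k in classes])
# Variance floor, assumed to mirror GaussianNB's default var_smoothing=1e-9
variances += 1e-9 * X_train.var(axis=0).max()

# log P(Y = k) + sum_i log P(x_i | Y = k), for every test row and every class
log_post = np.log(priors) + np.stack([
    -0.5 * np.sum(np.log(2 * np.pi * variances[j])
                  + (X_test - means[j]) ** 2 / variances[j], axis=1)
    for j in range(len(classes))
], axis=1)
pred_manual = classes[log_post.argmax(axis=1)]

pred_sklearn = GaussianNB().fit(X_train, y_train).predict(X_test)
print("agreement with GaussianNB:", (pred_manual == pred_sklearn).mean())
```

Since only the argmax matters, the unnormalised log posterior is enough to classify; the dropped denominator would only be needed for calibrated probabilities.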

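```python
# Call fit() on an instance, GaussianNB(), not on the class itself:
# GaussianNB.fit(X_train, y_train) raises
# "TypeError: fit() missing 1 required positional argument: 'y'"
clf_nb = GaussianNB().fit(X_train, y_train)
clf_nb.score(X_test, y_test)
```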