# Naive Bayes Classifier

## 1.  Algorithm
Naive Bayes is a family of classifiers based on Bayes' theorem. It predicts a membership probability for each class, that is, the probability that a given record or data point belongs to that class. The class with the highest probability is taken as the most likely class.

__Bayes Theorem__

Bayes' theorem is named after Rev. Thomas Bayes. It is built on conditional probability: the probability that something will happen, given that something else has already occurred. Using conditional probability, we can calculate the probability of an event from prior knowledge of conditions related to it.

Below is the formula for the conditional probability of a hypothesis H given evidence E.

$P(H|E)=\frac{P(E|H) * P(H)}{P(E)}$

where
* P(H) is the probability of hypothesis H being true. This is known as the prior probability.
* P(E) is the probability of the evidence (regardless of the hypothesis).
* P(E|H) is the probability of the evidence given that the hypothesis is true.
* P(H|E) is the probability of the hypothesis given the evidence. This is known as the posterior probability.
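
As a rough illustration of the formula, the sketch below computes P(H|E) for a binary hypothesis. The function name `posterior` and all the numbers are made up for the example, and P(E) is obtained from the law of total probability over H and not-H.

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Return P(H|E) via Bayes' theorem for a binary hypothesis H."""
    # P(E) = P(E|H) * P(H) + P(E|not H) * P(not H)
    evidence = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
    return p_e_given_h * prior_h / evidence


# Hypothetical numbers: 20% of emails are spam (H); the word "offer" (E)
# appears in 60% of spam emails and in 5% of non-spam emails.
p_spam_given_offer = posterior(0.2, 0.6, 0.05)
print(round(p_spam_given_offer, 3))
```

With these assumed inputs, seeing the word raises the probability of spam from the prior of 0.2 to a posterior of 0.75.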

__Maximum A Posteriori (MAP)__

The MAP estimate is the hypothesis with the highest posterior probability:

$MAP(H) = \arg\max_H P(H|E) = \arg\max_H \frac{P(E|H) * P(H)}{P(E)} = \arg\max_H P(E|H) * P(H)$

P(E) is the probability of the evidence, and it only normalizes the result. It is the same for every hypothesis, so dropping it does not change which hypothesis attains the maximum.
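
The following minimal sketch checks this on made-up numbers. The priors and likelihoods are hypothetical, and the three hypotheses are assumed to be mutually exclusive and exhaustive, so that P(E) is the sum of the unnormalized scores.

```python
# Hypothetical priors P(H) and likelihoods P(E|H) for three candidate hypotheses.
priors = {"A": 0.5, "B": 0.3, "C": 0.2}
likelihoods = {"A": 0.10, "B": 0.40, "C": 0.30}

# Unnormalized scores P(E|H) * P(H)
scores = {h: likelihoods[h] * priors[h] for h in priors}

# P(E) by the law of total probability, then the normalized posteriors P(H|E)
evidence = sum(scores.values())
posteriors = {h: s / evidence for h, s in scores.items()}

# The winning hypothesis is the same with or without dividing by P(E).
map_from_scores = max(scores, key=scores.get)
map_from_posteriors = max(posteriors, key=posteriors.get)
print(map_from_scores, map_from_posteriors)
```

Dividing every score by the same positive constant rescales the values but leaves their ranking untouched, which is why the normalization can be skipped when only the predicted class is needed.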

## 2. Assumption
The Naive Bayes classifier assumes that, given the class, all the features are independent of each other: within a class, the presence or absence of a feature does not influence the presence or absence of any other feature. This assumption is called class conditional independence.
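
In the notation of Bayes' theorem above, the assumption can be written as follows. Splitting the evidence into individual feature values $x_1, \dots, x_n$ is notation introduced here for illustration, with the hypothesis H playing the role of the class:

$P(E|H) = P(x_1, \dots, x_n|H) = \prod_{i=1}^{n} P(x_i|H)$

so the classifier only needs one simple per-feature estimate $P(x_i|H)$ for each class, and the MAP rule becomes

$\arg\max_H P(H) * \prod_{i=1}^{n} P(x_i|H)$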

## 3. Pros and Cons 
__Pros:__

* It is easy and fast to predict the class of a test data set. It also performs well in multi-class prediction.
* When the independence assumption holds, a Naive Bayes classifier can perform better than other models such as logistic regression, and it needs less training data.
* It performs well with categorical input variables compared to numerical variables. For numerical variables, a normal distribution (bell curve) is assumed, which is a strong assumption.

__Cons:__

* If a categorical variable has a category in the test data set that was not observed in the training data set, the model assigns it a probability of 0 (zero) and cannot make a sensible prediction. This is often known as the “Zero Frequency” problem. To solve it, we can use a smoothing technique. One of the simplest smoothing techniques is called Laplace estimation.
* Naive Bayes is also known to be a poor probability estimator, so the probability outputs from `predict_proba` should not be taken too seriously.
* Another limitation of Naive Bayes is the assumption of independent predictors. In real life, it is almost impossible to get a set of predictors that are completely independent.
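
To make the “Zero Frequency” problem and its fix concrete, here is a small sketch of additive (Laplace) smoothing for one categorical feature within one class. The helper `smoothed_prob`, the colour data and the parameter name `alpha` are illustrative assumptions, not part of any library.

```python
from collections import Counter


def smoothed_prob(value, observed_values, categories, alpha=1.0):
    """Estimate P(value | class) with additive smoothing.

    observed_values: feature values seen in training for one class.
    categories: all possible values of the feature.
    alpha: smoothing strength; 1.0 is Laplace estimation, 0.0 disables smoothing.
    """
    counts = Counter(observed_values)  # unseen values count as 0
    return (counts[value] + alpha) / (len(observed_values) + alpha * len(categories))


# Hypothetical training data for one class: "green" was never observed.
train_colours = ["red", "red", "blue", "red"]
all_colours = ["red", "blue", "green"]

unsmoothed = smoothed_prob("green", train_colours, all_colours, alpha=0.0)
laplace = smoothed_prob("green", train_colours, all_colours, alpha=1.0)
print(unsmoothed, laplace)
```

Without smoothing the unseen category gets probability 0, which wipes out the whole product of feature probabilities for that class. With `alpha=1.0` it gets 1/7 here, and the estimates over all categories still sum to 1.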

## 4. Naive Bayes Models in the Scikit Learn Library
__Gaussian:__ It is used for classification with continuous features and assumes that the features follow a normal distribution.

__Multinomial:__ It is used for discrete counts. For example, in a text classification problem, it goes one step further than Bernoulli trials: instead of “word occurs in the document”, the feature is “how often the word occurs in the document”. You can think of it as “the number of times outcome x_i is observed over n trials”.
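
A minimal sketch of the multinomial model using scikit-learn's `MultinomialNB` follows. The tiny word-count matrix, the vocabulary and the class labels are invented for illustration.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical word counts: one row per document, one column per word in
# the vocabulary ["goal", "match", "election", "vote"].
X = np.array([
    [3, 2, 0, 0],
    [2, 4, 0, 1],
    [0, 0, 3, 2],
    [0, 1, 2, 4],
])
y = np.array(["sports", "sports", "politics", "politics"])

# The default alpha=1.0 applies Laplace smoothing to the word counts.
clf = MultinomialNB()
clf.fit(X, y)

# A new document that mentions "goal" once and "match" three times.
new_doc = np.array([[1, 3, 0, 0]])
prediction = clf.predict(new_doc)
print(prediction)
```

Because the features are counts, a word that occurs several times contributes more to the class score than a word that occurs once.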

__Bernoulli:__ The Bernoulli model is useful if your feature vectors are binary (i.e. zeros and ones). One application is text classification with a ‘bag of words’ model, where the 1s and 0s mean “word occurs in the document” and “word does not occur in the document” respectively.
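
The binary case can be sketched with scikit-learn's `BernoulliNB`. As before, the data, vocabulary and labels are hypothetical.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical binary bag of words: 1 = word occurs in the document, 0 = it does not.
# Columns correspond to the vocabulary ["goal", "match", "election", "vote"].
X = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
])
y = np.array(["sports", "sports", "politics", "politics"])

clf = BernoulliNB()  # default alpha=1.0 smooths the per-word occurrence rates
clf.fit(X, y)

# A new document that contains only the word "goal".
new_doc = np.array([[1, 0, 0, 0]])
prediction = clf.predict(new_doc)
print(prediction)
```

Unlike the multinomial model, the Bernoulli model also uses the absence of a word as evidence, so the zeros in `new_doc` contribute to the score of each class.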

[Scikit Learn Documentation](http://scikit-learn.org/stable/modules/naive_bayes.html)

```python
# Gaussian Naive Bayes on the iris data set
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
gnb = GaussianNB()

# Fit on the full data set and predict on the same data (training error)
y_pred = gnb.fit(iris.data, iris.target).predict(iris.data)
print("Number of mislabeled points out of a total %d points : %d"
      % (iris.data.shape[0], (iris.target != y_pred).sum()))
```

Output:

```
Number of mislabeled points out of a total 150 points : 6
```
