# Naive Bayes Classification

![Bayes theorem](datasets_n_images/images/Bayestheorem_formulae.png 'Bayestheorem_formulae')

![Naive Bayes formulae](datasets_n_images/images/naive_bayes_formulae.png 'naive_bayes_formulae')
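
In text form, the standard statements of Bayes' theorem and of the naive Bayes decision rule can be written as below. The symbols ($y$ for the class, $x_1, \dots, x_n$ for the features) are notation chosen here for illustration and may differ from the notation used in the figures.

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

$$P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y), \qquad \hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)$$

The product over features is where the "naive" assumption enters: each feature is treated as conditionally independent of the others given the class.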

# How does the Naive Bayes algorithm work?

!['Example Tables'](datasets_n_images/images/NaiveBayes_example_tables.png 'NaiveBayes_example_tables')

# A basic model using Naive Bayes in Python

In [1]:
# scikit-learn (a Python library) provides ready-made Naive Bayes models.
# Three commonly used variants in scikit-learn are:

# 1> Gaussian    : for continuous features; assumes that each feature
#                  follows a normal distribution within each class.
# 2> Multinomial : for non-negative count features, e.g. word counts
#                  (http://mathworld.wolfram.com/MultinomialDistribution.html)
# 3> Bernoulli   : for binary (0/1) features
#                  (http://mathworld.wolfram.com/BernoulliDistribution.html)

# Below is an example of the Gaussian model.

# Import the Gaussian Naive Bayes model
from sklearn.naive_bayes import GaussianNB
import numpy as np

# assigning predictor and target variables
x = np.array([[-3,7],[1,5], [1,2], [-2,0], [2,3], 
             [-4,0], [-1,1], [1,1], [-2,2], [2,7], [-4,1], [-2,7]])

Y = np.array([3, 3, 3, 3, 4, 3, 3, 4, 3, 4, 4, 4])

In [2]:
# Type your code here; the comments should guide you.
# Create a Gaussian Naive Bayes classifier
model = GaussianNB()

# Train the model using the training sets 
model.fit(x,Y)

# Predict Output 
predicted = model.predict([[1,2],[3,4]])
print(predicted)

[3 4]
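
To make the arithmetic behind this result visible, here is a minimal from-scratch sketch of Gaussian Naive Bayes in plain Python on the same data. The helper names (`fit_gaussian_nb`, `log_posterior`, `predict`) are made up for this illustration and are not part of scikit-learn. The sketch also leaves out the small variance-smoothing term that `GaussianNB` adds, which is assumed not to matter for this toy data.

```python
import math

# Same toy data as in the cell above.
X = [[-3, 7], [1, 5], [1, 2], [-2, 0], [2, 3], [-4, 0],
     [-1, 1], [1, 1], [-2, 2], [2, 7], [-4, 1], [-2, 7]]
y = [3, 3, 3, 3, 4, 3, 3, 4, 3, 4, 4, 4]


def fit_gaussian_nb(X, y):
    """Return {label: (log_prior, means, variances)} estimated from the data."""
    params = {}
    for label in sorted(set(y)):
        rows = [row for row, target in zip(X, y) if target == label]
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Maximum-likelihood (population) variance of each feature.
        variances = [sum((v - m) ** 2 for v in col) / n
                     for col, m in zip(columns, means)]
        params[label] = (math.log(n / len(y)), means, variances)
    return params


def log_posterior(point, log_prior, means, variances):
    """Unnormalised log posterior: log prior + sum of Gaussian log densities."""
    total = log_prior
    for v, m, var in zip(point, means, variances):
        total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
    return total


def predict(params, points):
    """Pick, for each point, the label with the highest log posterior."""
    return [max(params, key=lambda label: log_posterior(p, *params[label]))
            for p in points]


params = fit_gaussian_nb(X, y)
predictions = predict(params, [[1, 2], [3, 4]])
print(predictions)  # [3, 4]
```

Working in log space avoids multiplying many small densities together, which is the usual reason for summing log probabilities instead.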


In [3]:

from sklearn.naive_bayes import BernoulliNB
import numpy as np

# assigning predictor and target variables
x = np.array([[-3,7],[1,5], [1,2], [-2,0], [2,3], 
             [-4,0], [-1,1], [1,1], [-2,2], [2,7], [-4,1], [-2,7]])

Y = np.array([3, 3, 3, 3, 4, 3, 3, 4, 3, 4, 4, 4])

# Type your code here; the comments should guide you.
# Create a Bernoulli Naive Bayes classifier
model = BernoulliNB()

# Train the model using the training sets 
model.fit(x,Y)

# Predict Output 
predicted = model.predict([[1,2],[3,4]])
print(predicted)

[4 4]
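
Both test points get the same label because `BernoulliNB` first binarizes every feature (its `binarize` parameter defaults to `0.0`), so `[1, 2]` and `[3, 4]` both become `[1, 1]`. The plain-Python sketch below reproduces that step and the smoothed Bernoulli scores. The helper names are hypothetical, and `alpha=1.0` is used to mirror the scikit-learn default.

```python
import math

# Same toy data as in the cell above.
X = [[-3, 7], [1, 5], [1, 2], [-2, 0], [2, 3], [-4, 0],
     [-1, 1], [1, 1], [-2, 2], [2, 7], [-4, 1], [-2, 7]]
y = [3, 3, 3, 3, 4, 3, 3, 4, 3, 4, 4, 4]


def binarize(row, threshold=0.0):
    """Map each value to 1 if it is strictly above the threshold, else 0."""
    return [1 if v > threshold else 0 for v in row]


def fit_bernoulli_nb(X, y, alpha=1.0):
    """Return {label: (prior, [P(feature = 1 | label), ...])} with smoothing."""
    params = {}
    for label in sorted(set(y)):
        rows = [binarize(r) for r, target in zip(X, y) if target == label]
        n = len(rows)
        # Laplace-smoothed frequency of each feature being "on" in this class.
        probs = [(sum(col) + alpha) / (n + 2 * alpha) for col in zip(*rows)]
        params[label] = (n / len(y), probs)
    return params


def predict_one(params, point):
    """Return the label with the highest log posterior for one point."""
    bits = binarize(point)

    def score(label):
        prior, probs = params[label]
        total = math.log(prior)
        for bit, p in zip(bits, probs):
            total += math.log(p if bit else 1 - p)
        return total

    return max(params, key=score)


params = fit_bernoulli_nb(X, y)
print(binarize([1, 2]), binarize([3, 4]))  # [1, 1] [1, 1]
predictions = [predict_one(params, p) for p in [[1, 2], [3, 4]]]
print(predictions)  # [4, 4]
```

Because the magnitudes are discarded, Bernoulli Naive Bayes suits features that are genuinely binary better than this continuous toy data.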


In [4]:

from sklearn.naive_bayes import MultinomialNB
import numpy as np

# assigning predictor and target variables
x = np.array([[-3,7],[1,5], [1,2], [-2,0], [2,3], 
             [-4,0], [-1,1], [1,1], [-2,2], [2,7], [-4,1], [-2,7]])

Y = np.array([3, 3, 3, 3, 4, 3, 3, 4, 3, 4, 4, 4])

# Type your code here; the comments should guide you.
# Create a Multinomial Naive Bayes classifier
model = MultinomialNB()

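# Train the model using the training sets.
# Note: this call raises the ValueError shown below, because MultinomialNB
# expects non-negative features (e.g. counts) and x contains negative values.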
model.fit(x,Y)

# Predict Output 
predicted = model.predict([[1,2],[3,4]])
print(predicted)

ValueError: Negative values in data passed to MultinomialNB (input X)
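
The error appears because `MultinomialNB` models non-negative counts, while `x` contains negative numbers. One way to get the cell to run is to shift each column so that its smallest training value becomes zero, as sketched below. This is only an illustrative workaround, not a modelling recommendation: the toy features are not counts, so `GaussianNB` remains the more natural choice for them. The names `offset`, `x_shifted`, `new_points` and `new_shifted` are introduced here for illustration.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Same toy data as in the cell above.
x = np.array([[-3, 7], [1, 5], [1, 2], [-2, 0], [2, 3],
              [-4, 0], [-1, 1], [1, 1], [-2, 2], [2, 7], [-4, 1], [-2, 7]])
Y = np.array([3, 3, 3, 3, 4, 3, 3, 4, 3, 4, 4, 4])

# Shift every column so that its minimum training value becomes 0.
offset = x.min(axis=0)
x_shifted = x - offset

model = MultinomialNB()
model.fit(x_shifted, Y)  # no ValueError: all inputs are now non-negative

# Apply the same shift to new points; clip in case a new value falls
# below the training minimum and would become negative again.
new_points = np.array([[1, 2], [3, 4]])
new_shifted = np.clip(new_points - offset, 0, None)
predicted = model.predict(new_shifted)
print(predicted)
```

With real count data, such as word frequencies in text classification, no shifting is needed.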

# Where is the Naive Bayes classifier used?

1> Real-time prediction

2> Text classification / spam filtering

3> Recommendation systems

# What are the Pros and Cons of Naive Bayes?

Pros:
-------
1> Predicting the class of a test data set is easy and fast. It also performs well in multi-class prediction.

2> When the independence assumption holds, a Naive Bayes classifier can perform better than other models such as logistic regression, and it needs less training data.

3> It performs well with categorical input variables compared to numerical variables. For numerical variables, a normal distribution is assumed (a bell curve, which is a strong assumption).

Cons:
--------
1> If a categorical variable has a category in the test data set that was not observed in the training data set, the model assigns it a probability of 0 (zero) and cannot make a prediction. This is often known as the “zero frequency” problem. It can be solved with a smoothing technique; one of the simplest is Laplace estimation.

2> Naive Bayes is also known to be a poor probability estimator, so the probability outputs from predict_proba should not be taken too seriously.

3> Another limitation of Naive Bayes is the assumption of independent predictors. In real life, it is almost impossible to get a set of predictors that are completely independent.
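
The zero-frequency problem and its Laplace fix can be sketched in a few lines of plain Python. The category names and counts below are hypothetical and chosen only for illustration. `smoothed_probability` is a made-up helper, not a scikit-learn function.

```python
def smoothed_probability(count, class_total, n_categories, alpha=1.0):
    """Laplace (add-alpha) estimate of P(category | class).

    Adds alpha pseudo-observations to every category, so an unseen
    category gets a small non-zero probability instead of exactly 0.
    """
    return (count + alpha) / (class_total + alpha * n_categories)


# Hypothetical counts of one categorical feature within a single class.
counts = {"sunny": 3, "rainy": 2, "overcast": 0}
class_total = sum(counts.values())
n_categories = len(counts)

# Without smoothing (alpha=0), "overcast" gets probability 0, which would
# zero out the whole product of likelihoods for this class.
unsmoothed = {c: smoothed_probability(k, class_total, n_categories, alpha=0.0)
              for c, k in counts.items()}

# With alpha=1, every category keeps a non-zero probability and the
# estimates still sum to 1.
smoothed = {c: smoothed_probability(k, class_total, n_categories)
            for c, k in counts.items()}

print(unsmoothed["overcast"])  # 0.0
print(smoothed["overcast"])    # 0.125
```

In scikit-learn, the `alpha` parameter of `MultinomialNB`, `BernoulliNB` and `CategoricalNB` plays this role and defaults to `1.0`.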