# Chapter 18: Naive Bayes

Bayes' theorem is a method for updating the probability of an event, given some new information and a prior belief about the probability of that event.

In machine learning, Bayes' theorem is most often applied to classification, in the form of the naive Bayes classifier.

Naive Bayes classifiers combine a number of desirable qualities in practical machine learning into a single classifier, including:
1. An intuitive approach
2. The ability to work with small data
3. Low computation costs for training and prediction
4. Often solid results in a variety of settings

In a naive Bayes classifier:
1. For each feature in the data, we have to assume a statistical distribution for the likelihood. The common choices are the normal (Gaussian), multinomial, and Bernoulli distributions. The distribution chosen is usually determined by the nature of the features (continuous, binary, etc.).
2. Naive Bayes gets its name because we assume that each feature, and its resulting likelihood, is independent of the others given the class.
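
The two points above combine into the usual naive Bayes decision rule. As a sketch in our own notation (not taken from the source), let $y$ be a class and $x_1, \dots, x_j$ the feature values of one observation. Bayes' theorem gives the posterior, and the independence assumption lets the joint likelihood factor into one term per feature:

```latex
P(y \mid x_1, \dots, x_j)
  = \frac{P(x_1, \dots, x_j \mid y)\, P(y)}{P(x_1, \dots, x_j)}
  \;\propto\; P(y) \prod_{i=1}^{j} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{j} P(x_i \mid y)
```

The denominator is the same for every class, so it can be dropped when we only want the most probable class $\hat{y}$.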

## 18.1 Training a Classifier for Continuous Features

When we have only continuous features and want to train a naive Bayes classifier, we can use a Gaussian naive Bayes classifier:

In [1]:
# loading libraries

from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

In [3]:
# loading the data

iris = datasets.load_iris()
features = iris.data
target = iris.target

In [4]:
# creating Gaussian Naive Bayes object

classifier = GaussianNB()

In [5]:
# Training the model

model = classifier.fit(features, target)

In [6]:
# creating a new observation

new_observation = [[4, 4, 4, 0.4]]

In [7]:
# predicting class

model.predict(new_observation)

array([1])
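
To make the prediction step less of a black box, the following is a minimal from-scratch sketch of the Gaussian naive Bayes rule, compared against GaussianNB on the training data. The helper name "log_posterior" and the variable names are ours, and the sketch ignores the small variance smoothing that scikit-learn applies, so treat it as an illustration under those assumptions rather than the library's implementation.

```python
import numpy as np
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
X, y = iris.data, iris.target
classes = np.unique(y)

# Per-class prior, feature means, and feature variances estimated from the data
priors = np.array([np.mean(y == c) for c in classes])
means = np.array([X[y == c].mean(axis=0) for c in classes])
variances = np.array([X[y == c].var(axis=0) for c in classes])


def log_posterior(samples):
    """Unnormalized log posterior, one row per sample and one column per class."""
    samples = np.asarray(samples, dtype=float)
    # diff has shape (n_samples, n_classes, n_features)
    diff = samples[:, None, :] - means[None, :, :]
    # Log of the Gaussian density for every feature under every class
    log_likelihood = -0.5 * (
        np.log(2 * np.pi * variances)[None, :, :] + diff**2 / variances[None, :, :]
    )
    # Independence assumption: per-feature log likelihoods simply add up
    return np.log(priors)[None, :] + log_likelihood.sum(axis=2)


manual_pred = classes[np.argmax(log_posterior(X), axis=1)]
sklearn_pred = GaussianNB().fit(X, y).predict(X)

# Fraction of training observations where both approaches pick the same class
agreement = np.mean(manual_pred == sklearn_pred)
```

Working in log space avoids multiplying many small probabilities together, which is why the sketch sums log likelihoods instead of taking a product.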

In [8]:
# Creating Gaussian Naive Bayes object with prior probabilities of each class

clf = GaussianNB(priors = [0.25, 0.25, 0.5])

In [9]:
# train the model that uses the specified priors

model = clf.fit(features, target)
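
One way to check which priors a fitted model actually uses is to inspect its "class_prior_" attribute. The sketch below is self-contained and uses our own variable names; it compares a model with default settings against one with the priors given above.

```python
import numpy as np
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()

# Without "priors", the class priors are estimated from class frequencies
default_model = GaussianNB().fit(iris.data, iris.target)

# With "priors", the supplied values are used as-is (they must sum to 1)
custom_model = GaussianNB(priors=[0.25, 0.25, 0.5]).fit(iris.data, iris.target)

default_priors = default_model.class_prior_
custom_priors = custom_model.class_prior_
```

Iris has 50 observations per class, so the learned priors come out uniform, while the custom model keeps the values it was given.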

## 18.2 Training a Classifier for Discrete and Count Features

Using a multinomial naive Bayes classifier:

In [10]:
# loading the libraries

import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

In [12]:
# Create text

text_data = np.array(['I love Brazil. Brazil!',
                      'Brazil is best',
                      'Germany beats both'])

In [13]:
# creating a bag of words

count = CountVectorizer()
bag_of_words = count.fit_transform(text_data)

In [15]:
# Create feature matrix

features = bag_of_words.toarray()

In [16]:
# Create target vector

target = np.array([0, 0, 1])

In [17]:
# create multinomial naive Bayes object with prior probabilities of each class

classifier = MultinomialNB(class_prior = [0.25, 0.5])

In [18]:
# train model

model = classifier.fit(features, target)

The most common use of multinomial naive Bayes is text classification using bag-of-words or tf-idf approaches.
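
For the tf-idf variant, a minimal sketch is to swap CountVectorizer for TfidfVectorizer and chain it with the classifier. The pipeline and the names "text_classifier" and "tfidf_prediction" are our own choices for illustration, not part of the recipe above.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

text_data = np.array(['I love Brazil. Brazil!',
                      'Brazil is best',
                      'Germany beats both'])
target = np.array([0, 0, 1])

# Chain tf-idf weighting and the classifier so raw text can be passed in directly
text_classifier = make_pipeline(TfidfVectorizer(), MultinomialNB())
text_classifier.fit(text_data, target)

# Predict the class of a raw text string
tfidf_prediction = text_classifier.predict(['Brazil is best'])
```

Using a pipeline means the same vocabulary and weighting learned during training are applied automatically at prediction time.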

In [19]:
# Creating a new observation

new_observation = [[0, 0, 0, 1, 0, 1, 0]]

In [20]:
# predict new observation's class

model.predict(new_observation)

array([0])

If "class_prior" is not specified, the prior probabilities are learned from the data. If we want a uniform distribution to be used as the prior instead, we can set "fit_prior=False".

MultinomialNB has an additive smoothing hyperparameter, "alpha", that should be tuned. Its default value is 1.0, and a value of 0.0 means no smoothing takes place.
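
The effect of these settings can be seen on a tiny example. The sketch below uses a made-up count matrix (the values are hypothetical) and inspects the fitted "class_log_prior_" and "feature_log_prob_" attributes.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical count features: two observations of class 0, one of class 1
X = np.array([[2, 0, 1],
              [1, 0, 0],
              [0, 3, 1]])
y = np.array([0, 0, 1])

# Priors learned from the data follow the class frequencies in y
learned_priors = np.exp(MultinomialNB().fit(X, y).class_log_prior_)

# With fit_prior=False every class gets the same prior
uniform_priors = np.exp(MultinomialNB(fit_prior=False).fit(X, y).class_log_prior_)

# A larger alpha pulls the per-feature likelihoods toward a uniform distribution
light = np.exp(MultinomialNB(alpha=0.01).fit(X, y).feature_log_prob_[0])
heavy = np.exp(MultinomialNB(alpha=10.0).fit(X, y).feature_log_prob_[0])
light_spread = light.max() - light.min()
heavy_spread = heavy.max() - heavy.min()
```

In practice "alpha" would be chosen by cross-validation rather than by eye. The two values here are only meant to show the direction of the effect.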

## 18.3 Training a Naive Bayes Classifier for Binary Features

Using a Bernoulli naive Bayes classifier:

In [21]:
# loading the libraries

import numpy as np
from sklearn.naive_bayes import BernoulliNB

In [22]:
# creating three binary features

features = np.random.randint(2, size = (100, 3))

In [23]:
# creating a binary target vector

target = np.random.randint(2, size = (100, 1)).ravel()

In [24]:
# create Bernoulli Naive Bayes object with prior probabilities of each class

classifier = BernoulliNB(class_prior = [0.25, 0.5])

In [25]:
# train the model

model = classifier.fit(features, target)

The Bernoulli naive Bayes classifier assumes that all our features are binary.

Like MultinomialNB, BernoulliNB is often used in text classification, and it also has a smoothing hyperparameter, "alpha".
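
Because of the binary assumption, BernoulliNB thresholds its input through the "binarize" parameter, which defaults to 0.0, so any value above zero is treated as 1. The sketch below uses hypothetical count features and our own variable names to show that fitting on counts with the default threshold matches fitting on features we binarized ourselves.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(0)

# Hypothetical count features with values in {0, 1, 2, 3} and a binary target
counts = rng.integers(0, 4, size=(100, 3))
target = rng.integers(0, 2, size=100)

# Default behavior: values greater than 0.0 are mapped to 1 before fitting
on_counts = BernoulliNB(binarize=0.0).fit(counts, target)

# binarize=None tells the classifier the input is already binary
on_binary = BernoulliNB(binarize=None).fit((counts > 0).astype(int), target)

same_fit = np.allclose(on_counts.feature_log_prob_, on_binary.feature_log_prob_)
```

This matters for text data: with a bag-of-words matrix, BernoulliNB only uses whether a word occurs, not how often.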

## 18.4 Calibrating Predicted Probabilities

To calibrate the predicted probabilities from a naive Bayes classifier so that they are interpretable, use CalibratedClassifierCV:

In [27]:
# loading libraries

from sklearn import datasets
from sklearn.naive_bayes import GaussianNB
from sklearn.calibration import CalibratedClassifierCV

In [28]:
# loading data

iris = datasets.load_iris()
features = iris.data
target = iris.target

In [29]:
# create Gaussian Naive Bayes object

classifier = GaussianNB()

In [30]:
# creating calibrated cross-validation with sigmoid calibration

classifier_sigmoid = CalibratedClassifierCV(classifier, cv = 2, method = 'sigmoid')

In [31]:
# Calibrate probabilities

classifier_sigmoid.fit(features, target)

In [34]:
# creating a new observation

new_observation = [[2.6, 2.6, 2.6, 0.4]]

In [35]:
# view calibrated probabilities

classifier_sigmoid.predict_proba(new_observation)

array([[0.31859969, 0.63663466, 0.04476565]])

In [36]:
# train a Gaussian naive Bayes model, then predict class probabilities

classifier.fit(features, target).predict_proba(new_observation)

array([[2.31548432e-04, 9.99768128e-01, 3.23532277e-07]])

CalibratedClassifierCV offers two calibration methods:
1. Platt's sigmoid model
2. Isotonic regression - it is nonparametric and tends to overfit when sample sizes are very small.
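
Switching to the second method only requires changing the "method" argument. A minimal sketch, reusing the iris data and the same observation as above (the names "classifier_isotonic" and "isotonic_proba" are ours):

```python
from sklearn import datasets
from sklearn.calibration import CalibratedClassifierCV
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()

# Same setup as the sigmoid example, but with isotonic regression as the calibrator
classifier_isotonic = CalibratedClassifierCV(GaussianNB(), cv=2, method="isotonic")
classifier_isotonic.fit(iris.data, iris.target)

# Calibrated class probabilities for one new observation
isotonic_proba = classifier_isotonic.predict_proba([[2.6, 2.6, 2.6, 0.4]])
```

With only 150 observations, iris is small enough that the overfitting caveat applies, so the sigmoid method is probably the safer choice for this dataset.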