# Naive Bayes

The **naive Bayes** method is a classification method based on Bayes' theorem and the assumption that the features are conditionally independent given the class. Given a training data set, it first learns the joint probability distribution of the input and output under this conditional independence assumption; then, based on this model, it uses Bayes' theorem to find the output $y$ with the largest posterior probability for a given input $x$.

1. The naive Bayes method is a typical generative learning method. It learns the joint probability distribution $P(X, Y)$ from the training data and then derives the posterior probability distribution $P(Y|X)$. Specifically, it uses the training data to estimate $P(X|Y)$ and $P(Y)$, which together give the joint probability distribution:

$$P(X, Y) = P(Y)P(X|Y)$$

The probabilities can be estimated by maximum likelihood estimation or by Bayesian estimation.
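To make the two estimation methods concrete, here is a minimal sketch on a made-up toy data set with a single categorical feature. The function name `estimate` and the smoothing parameter `lam` are assumptions introduced for illustration: `lam=0` corresponds to maximum likelihood estimation, and `lam=1` to a Bayesian estimate with Laplace smoothing.

```python
from collections import Counter

# Toy training set (made up for illustration): one categorical feature and a label.
X = ["S", "M", "M", "S", "L", "L"]
y = [-1, -1, 1, 1, 1, 1]

def estimate(X, y, lam=0.0):
    """Estimate P(Y) and P(X|Y) from counts.

    lam=0 gives the maximum likelihood estimate; lam>0 gives a Bayesian
    estimate with additive smoothing (lam=1 is Laplace smoothing).
    """
    n = len(y)
    classes = sorted(set(y))
    values = sorted(set(X))
    class_count = Counter(y)
    pair_count = Counter(zip(X, y))
    prior = {c: (class_count[c] + lam) / (n + lam * len(classes)) for c in classes}
    cond = {
        (v, c): (pair_count[(v, c)] + lam) / (class_count[c] + lam * len(values))
        for c in classes
        for v in values
    }
    return prior, cond

prior_mle, cond_mle = estimate(X, y)
prior_bayes, cond_bayes = estimate(X, y, lam=1.0)

# The value "L" never occurs with label -1, so the MLE is exactly zero,
# while the smoothed estimate stays strictly positive.
print(cond_mle[("L", -1)], cond_bayes[("L", -1)])
```

The zero estimate under maximum likelihood would force the joint probability $P(X, Y)$ to zero for that combination, which is the usual motivation for the smoothed estimate.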


2. The basic assumption of the naive Bayes method is conditional independence:

$$\begin{matrix}
P(X=x|Y=c_k) & = P(X^{(1)} = x^{(1)}, ..., X^{(n)} = x^{(n)} | Y=c_k)  \\ 
 & =\prod_{j=1}^{n}P(X^{(j)} = x^{(j)}|Y = c_k)
\end{matrix}$$

This is a strong assumption. It greatly reduces the number of conditional probabilities in the model, which in turn greatly simplifies learning and prediction, so the naive Bayes method is efficient and easy to implement. The disadvantage is that the classification performance is not necessarily high.
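As a rough illustration of how much the assumption shrinks the model, the sketch below counts conditional probabilities with and without it. It assumes $K$ classes and that feature $j$ takes $S_j$ distinct values (symbols not used in the text above), and it ignores the sum-to-one constraints. The function names are hypothetical.

```python
from math import prod

def n_params_full(n_classes, values_per_feature):
    """Without independence: one probability per (class, full feature combination)."""
    return n_classes * prod(values_per_feature)

def n_params_naive(n_classes, values_per_feature):
    """With conditional independence: one probability per (class, feature, value)."""
    return n_classes * sum(values_per_feature)

K = 2          # number of classes
S = [3] * 10   # ten features, three possible values each

full = n_params_full(K, S)
naive = n_params_naive(K, S)
print(full, naive)  # 118098 60
```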

3. The naive Bayes method uses Bayes' theorem and the learned joint probability model to make classification predictions:

$$P(Y|X) = \frac{P(X,Y)}{P(X)} = \frac{P(Y)P(X|Y)}{\sum_{Y}P(Y)P(X|Y)}$$

4. The input $x$ is assigned to the class $y$ with the largest posterior probability:

$$\begin{matrix}
y & = \arg\max_{c_k} P(Y = c_k|X=x) \\
 & = \arg\max_{c_k} P(Y = c_k)\prod_{j=1}^{n}P(X^{(j)} = x^{(j)}|Y=c_k)
\end{matrix}$$
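The posterior computation and the decision rule can be sketched as follows. The parameter values, the class names `"a"` and `"b"`, and the helpers `posterior` and `classify` are all hypothetical. The classifier works with sums of logarithms, which give the same arg max as the product but are less prone to numerical underflow.

```python
import math

# Hypothetical learned parameters for two classes and two binary features.
prior = {"a": 0.6, "b": 0.4}
cond = {  # cond[c][j][v] = P(X^(j) = v | Y = c)
    "a": [{0: 0.8, 1: 0.2}, {0: 0.5, 1: 0.5}],
    "b": [{0: 0.3, 1: 0.7}, {0: 0.1, 1: 0.9}],
}

def posterior(x):
    """Bayes' theorem: normalize P(Y)P(X|Y) over all classes."""
    joint = {
        c: prior[c] * math.prod(cond[c][j][v] for j, v in enumerate(x))
        for c in prior
    }
    evidence = sum(joint.values())  # P(X = x)
    return {c: p / evidence for c, p in joint.items()}

def classify(x):
    """Arg max of the log joint probability; the evidence P(X = x) is the
    same for every class, so it can be dropped."""
    scores = {
        c: math.log(prior[c]) + sum(math.log(cond[c][j][v]) for j, v in enumerate(x))
        for c in prior
    }
    return max(scores, key=scores.get)

post = posterior([1, 1])
label = classify([1, 1])
print(post, label)
```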


5. Maximizing the posterior probability is equivalent to minimizing the expected risk when the loss function is the 0-1 loss.
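The equivalence in item 5 can be sketched as follows, assuming the 0-1 loss $L(c_k, y)$ equals 1 when $y \neq c_k$ and 0 otherwise, and writing $K$ for the number of classes (a symbol introduced here). Minimizing the conditional expected loss for each input $x$ gives

$$\begin{matrix}
f(x) & = \arg\min_{y} \sum_{k=1}^{K} L(c_k, y) P(Y = c_k|X=x) \\
 & = \arg\min_{y} \sum_{c_k \neq y} P(Y = c_k|X=x) \\
 & = \arg\min_{y} \left(1 - P(Y = y|X=x)\right) \\
 & = \arg\max_{y} P(Y = y|X=x)
\end{matrix}$$

The third line uses the fact that the posterior probabilities sum to 1 over all classes, so the last line is the maximum posterior rule from item 4.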

Common naive Bayes models:

- Gaussian model

- Multinomial model

- Bernoulli model

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

from collections import Counter
import math

In [14]:
def create_data():
    iris = load_iris()
    df = pd.DataFrame(iris.data, columns = iris.feature_names)
    df['label'] = iris.target
    df.columns = ['sepal length','sepal width','petal length', 'petal width', 'label']
    data = np.array(df.iloc[:100, :])
    return data[:, :-1], data[:, -1]

In [16]:
X, y = create_data()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3)

In [17]:
X_test[0], y_test[0]

(array([6.3, 3.3, 4.7, 1.6]), 1.0)

# Gaussian Naive Bayes

The likelihood of each feature is assumed to be Gaussian.

Probability density function:

$$P(x_i| y_k) = \frac{1}{\sqrt{2\pi\sigma^{2}_{y_k}}}\exp \left(-\frac{(x_i - \mu_{y_k})^2}{2\sigma^{2}_{y_k}}\right)$$

Mathematical expectation (mean): $\mu$


Variance: $\sigma ^2 = \frac{\sum (X - \mu)^2}{N}$

In [51]:
class NaiveBayes:
    def __init__(self):
        self.model = None
    
    # Mathematical expectation (mean)
    @staticmethod
    def mean(X):
        return sum(X)/float(len(X))
    
    # Standard deviation (square root of the population variance)
    def stdev(self, X):
        avg = self.mean(X)
        return math.sqrt(sum([pow(x - avg, 2) for x in X]) / float(len(X)))
    
    #Probability density function
    def gaussian_probability(self, x, mean, stdev):
        exponent = math.exp(-(math.pow(x - mean, 2) / 
                             (2 * math.pow(stdev, 2))))
        return (1 / (math.sqrt(2*math.pi)*stdev)) * exponent
    
    # Compute (mean, stdev) for each feature column of the training data
    def summarize(self, train_data):
        summarize = [(self.mean(i), self.stdev(i)) for i in zip(*train_data)]
        return summarize
    
    #Find mathematical expectations and standard deviations by category
    def fit(self, X, y):
        labels = list(set(y))
        data = {label: [] for label in labels}
        for f, label in zip(X, y):
            data[label].append(f)
        self.model = {
            label: self.summarize(value)
            for label, value in data.items()
        }
        return 'gaussian Naive Bayes train done!'
    
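    # Calculate the product of the per-feature likelihoods for each class.
    # Note: the class prior P(Y) is not multiplied in, which amounts to
    # assuming equal priors for all classes.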
    def calculate_probabilities(self, input_data):
        # summaries:{0.0: [(5.0, 0.37),(3.42, 0.40)], 1.0: [(5.8, 0.449),(2.7, 0.27)]}
        # input_data:[1.1, 2.2]
        probabilities = {}
        for label, value in self.model.items():
            probabilities[label] = 1
            for i in range(len(value)):
                mean, stdev = value[i]
                probabilities[label] *= self.gaussian_probability(input_data[i], mean, stdev)
        return probabilities
    
    # Predict the class with the largest probability
    def predict(self, X_test):
        label = sorted(
            self.calculate_probabilities(X_test).items(),
            key = lambda x: x[-1])[-1][0]
        return label
    
    def score(self, X_test, y_test):
        right = 0
        for X, y in zip(X_test, y_test):
            label = self.predict(X)
            if label == y:
                right += 1
        return right / float(len(X_test))

In [52]:
model = NaiveBayes()

In [53]:
model.fit(X_train, y_train)

'gaussian Naive Bayes train done!'

In [54]:
model.model

{0.0: [(5.0, 0.3498917581542075),
  (3.442424242424243, 0.40529099387395123),
  (1.4545454545454548, 0.17423583650260666),
  (0.2515151515151515, 0.11578771620935031)],
 1.0: [(5.948648648648648, 0.5375729348990811),
  (2.7594594594594595, 0.29081382496177344),
  (4.25135135135135, 0.4830206876111376),
  (1.3189189189189188, 0.19289800951411323)]}

In [55]:
print(model.predict([4.4,  3.2,  1.3,  0.2]))

0.0


In [56]:
model.score(X_test, y_test)

1.0
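The class above multiplies the feature likelihoods only, without the class prior $P(Y)$. The sketch below shows one way the prior could be included and the product replaced by a sum of logarithms. The helpers `log_gaussian` and `predict_with_priors` are hypothetical additions, not part of the class above; the summaries are rounded from the fitted model shown above, and the equal priors are an assumption made for illustration.

```python
import math

def log_gaussian(x, mean, stdev):
    """Log of the Gaussian density; summing logs avoids the underflow that
    multiplying many small densities can cause."""
    return -0.5 * math.log(2 * math.pi * stdev ** 2) - (x - mean) ** 2 / (2 * stdev ** 2)

def predict_with_priors(summaries_by_label, priors, x):
    """summaries_by_label: {label: [(mean, stdev), ...]}, the same layout as
    the `model` attribute built by fit(); priors: {label: P(Y = label)}."""
    scores = {}
    for label, summaries in summaries_by_label.items():
        score = math.log(priors[label])  # log P(Y = label)
        for value, (mean, stdev) in zip(x, summaries):
            score += log_gaussian(value, mean, stdev)  # + log P(x_i | label)
        scores[label] = score
    return max(scores, key=scores.get)

# Per-class (mean, stdev) pairs, rounded from the fitted model shown above.
summaries_example = {
    0.0: [(5.0, 0.35), (3.44, 0.41), (1.45, 0.17), (0.25, 0.12)],
    1.0: [(5.95, 0.54), (2.76, 0.29), (4.25, 0.48), (1.32, 0.19)],
}
# Hypothetical equal priors; in practice these could be class frequencies.
priors_example = {0.0: 0.5, 1.0: 0.5}

print(predict_with_priors(summaries_example, priors_example, [4.4, 3.2, 1.3, 0.2]))
```

With equal priors the result matches the class above; the priors only change the decision when the classes are imbalanced and the likelihoods are close.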

### scikit-learn example

**Gaussian Naive Bayes**

In [57]:
from sklearn.naive_bayes import GaussianNB

In [58]:
clf = GaussianNB()
clf.fit(X_train, y_train)

GaussianNB(priors=None, var_smoothing=1e-09)

In [59]:
clf.score(X_test, y_test)

1.0

In [61]:
clf.predict([[4.4,  3.2,  1.3,  0.2]])

array([0.])

**Bernoulli Naive Bayes**

In [63]:
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

In [64]:
clf = BernoulliNB()
clf.fit(X_train, y_train)

BernoulliNB(alpha=1.0, binarize=0.0, class_prior=None, fit_prior=True)

In [65]:
clf.score(X_test, y_test)

0.43333333333333335

In [66]:
clf.predict([[4.4,  3.2,  1.3,  0.2]])

array([1.])
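A likely reason for the low score is the default `binarize=0.0` shown in the output above: `BernoulliNB` maps every feature value greater than the threshold to 1, and all iris measurements are positive, so every sample becomes the same all-ones vector. The pure-Python sketch below mimics that thresholding on two samples that appear earlier in this notebook. The helper `binarize` and the threshold 2.5 are illustrative assumptions, not a recommended setting.

```python
def binarize(row, threshold=0.0):
    """Map each feature to 1 if it exceeds the threshold, else 0."""
    return [1 if value > threshold else 0 for value in row]

samples = [
    [4.4, 3.2, 1.3, 0.2],  # sample predicted as class 0 by the Gaussian model
    [6.3, 3.3, 4.7, 1.6],  # X_test[0], class 1
]

# Default threshold: both samples collapse to the same vector,
# so the features carry no information about the class.
default_bits = [binarize(row) for row in samples]
print(default_bits)  # [[1, 1, 1, 1], [1, 1, 1, 1]]

# A threshold inside the range of the data keeps the samples distinguishable.
custom_bits = [binarize(row, threshold=2.5) for row in samples]
print(custom_bits)  # [[1, 1, 0, 0], [1, 1, 1, 0]]
```

Once the features are identical for all samples, the classifier can only fall back on the class priors, which is consistent with it predicting class 1 for a sample that the other models assign to class 0.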

**Multinomial Naive Bayes**

In [68]:
clf = MultinomialNB()
clf.fit(X_train, y_train)

MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)

In [69]:
clf.score(X_test, y_test)

1.0

In [70]:
clf.predict([[4.4,  3.2,  1.3,  0.2]])

array([0.])
