# Naive Bayes

## Bayes Theorem
$P(A|B) = \frac{P(B|A)\cdot P(A)}{P(B)}$
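A small numeric sketch of the formula, using made-up probabilities (A = "has a condition", B = "test is positive"). Here $P(B)$ is assumed to be unknown directly, so it is expanded with the law of total probability:

```python
# Hypothetical numbers, chosen only to illustrate Bayes theorem
p_b_given_a = 0.9        # P(B|A)
p_a = 0.01               # P(A), the prior
p_b_given_not_a = 0.05   # P(B|not A)

# P(B) via the law of total probability
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 4))  # 0.1538
```

Even with a fairly reliable test, the small prior keeps the posterior low.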

For classification, $A$ becomes the class label $y$ and $B$ becomes the observed features:

$P(y|X) = \frac{P(X|y)\cdot P(y)}{P(X)}$

where $X$ is the feature vector

$X = (x_1, x_2, x_3, \dots, x_n)$

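Assuming that all features are mutually independent given the class $y$ (the "naive" assumption), the likelihood factorizes as $P(X|y) = P(x_1|y)\cdot P(x_2|y)\cdot \dots \cdot P(x_n|y)$, so

$P(y|X) = \frac{P(x_1|y)\cdot P(x_2|y)\cdot \dots \cdot P(x_n|y)\cdot P(y)}{P(X)}$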
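A toy illustration of the numerator under the naive assumption, for a single sample with three features. The likelihood and prior values are hypothetical; in practice they come from a fitted model:

```python
import numpy as np

# Hypothetical per-feature likelihoods P(x_i|y) for one sample x = (x_1, x_2, x_3)
likelihoods = {
    0: np.array([0.2, 0.5, 0.1]),
    1: np.array([0.4, 0.3, 0.3]),
}
priors = {0: 0.6, 1: 0.4}  # hypothetical P(y)

# Numerator P(x_1|y) * ... * P(x_n|y) * P(y) for each class
numerators = {c: float(np.prod(likelihoods[c]) * priors[c]) for c in likelihoods}
for c, value in numerators.items():
    print(c, round(value, 4))  # "0 0.006", then "1 0.0144"
```

Because $P(X)$ is the same for both classes, comparing these numerators is enough to rank them; here class 1 scores higher even though its prior is smaller.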

## Select class with highest probability

$\hat{y} = \arg\max_y P(y|X) = \arg\max_y \frac{P(x_1|y)\cdot P(x_2|y)\cdot \dots \cdot P(x_n|y)\cdot P(y)}{P(X)}$

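The denominator $P(X)$ does not depend on $y$, so it does not change which class attains the maximum and can be dropped. Since $\log$ is strictly increasing, taking logarithms also preserves the argmax and turns the product into a sum, which avoids numerical underflow when many small probabilities are multiplied:

$\begin{aligned}&\hat{y} = \arg\max_y P(x_1|y)\cdot P(x_2|y)\cdot \dots \cdot P(x_n|y)\cdot P(y)\\
&\hat{y} = \arg\max_y \log P(x_1|y) + \log P(x_2|y) + \dots + \log P(x_n|y) + \log P(y)
\end{aligned}$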

where the prior probability $P(y)$ is usually estimated as the frequency of class $y$ in the training set.
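The sketch below ties the last two steps together, under made-up data: the priors are estimated as class frequencies with `np.unique(..., return_counts=True)`, and a sample with 1000 hypothetical per-feature likelihoods shows why the log form is used. The plain product underflows to `0.0` for every class, while the sum of logs stays finite and still picks a winner:

```python
import numpy as np

# Prior P(y): class frequency in a hypothetical label vector
y_train = np.array([0, 1, 1, 0, 1, 1])
classes, counts = np.unique(y_train, return_counts=True)
priors = counts / len(y_train)  # [1/3, 2/3]

# Hypothetical per-feature likelihoods P(x_i|y) for one sample with 1000 features
n_features = 1000
lik = np.vstack([
    np.full(n_features, 2e-3),  # class 0: each feature slightly more likely
    np.full(n_features, 1e-3),  # class 1
])

# Direct product underflows to 0.0 for both classes, so argmax cannot separate them
direct = priors * np.prod(lik, axis=1)

# Sum of logs stays finite and preserves the ordering of the classes
log_scores = np.log(priors) + np.log(lik).sum(axis=1)
print(direct, classes[np.argmax(log_scores)])  # [0. 0.] 0
```

Class 0 wins despite its smaller prior, because the accumulated likelihood term dominates.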

## Class-conditional probability $P(x_i|y)$: Gaussian distribution
$P(x_i|y) = \frac{1}{\sqrt{2\pi\sigma_{y,i}^2}}\cdot\exp\left(-\frac{(x_i - \mu_{y,i})^2}{2\sigma_{y,i}^2}\right)$

where $\mu_{y,i}$ and $\sigma_{y,i}^2$ are the mean and variance of feature $i$ over the training samples of class $y$.
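Since the classifier sums log-probabilities, the density can be evaluated directly in log space, $\log P(x_i|y) = -\tfrac{1}{2}\log(2\pi\sigma_{y,i}^2) - \frac{(x_i-\mu_{y,i})^2}{2\sigma_{y,i}^2}$. This sketch (with a hypothetical helper `gaussian_log_pdf`) shows the difference for a point 40 standard deviations from the class mean: the density itself underflows to `0.0` in float64, so taking its log afterwards gives `-inf`, while the direct formula stays finite:

```python
import numpy as np

def gaussian_log_pdf(x, mean, var):
    # log N(x; mean, var) = -0.5 * log(2*pi*var) - (x - mean)**2 / (2*var)
    return -0.5 * np.log(2 * np.pi * var) - (x - mean) ** 2 / (2 * var)

x, mean, var = 40.0, 0.0, 1.0  # hypothetical values, far out in the tail

pdf = np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
with np.errstate(divide="ignore"):
    log_of_pdf = np.log(pdf)  # exp(-800) underflows to 0.0, so this is -inf

print(pdf, log_of_pdf)                  # 0.0 -inf
print(gaussian_log_pdf(x, mean, var))   # about -800.92
```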

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import datasets
import matplotlib.pyplot as plt
```

```python
class NaiveBayes:
    # No __init__ needed: all model state is created in fit()
    def fit(self, X, y):
        n_samples, n_features = X.shape
        self._classes = np.unique(y)    # unique labels of y, used as the classes
        n_classes = len(self._classes)

        # init mean, var, priors (one row per class)
        self._mean = np.zeros((n_classes, n_features), dtype=np.float64)
        self._var = np.zeros((n_classes, n_features), dtype=np.float64)
        self._priors = np.zeros(n_classes, dtype=np.float64)

        # index rows by position idx, not by label c, so labels need not be 0..n_classes-1
        for idx, c in enumerate(self._classes):
            X_c = X[y == c]
            self._mean[idx, :] = X_c.mean(axis=0)    # per-feature mean within class c
            self._var[idx, :] = X_c.var(axis=0)      # per-feature variance within class c
            self._priors[idx] = X_c.shape[0] / float(n_samples)   # frequency of class c in the training set

    def predict(self, X):
        y_pred = [self._predict(x) for x in X]
        return np.array(y_pred)

    def _predict(self, x):    # one sample
        posteriors = []
        for idx, c in enumerate(self._classes):
            prior = np.log(self._priors[idx])    # log P(y), since the class-conditional term is in log space
            class_conditional = np.sum(self._log_pdf(idx, x))    # sum_i log P(x_i|y)
            posterior = prior + class_conditional
            posteriors.append(posterior)
        return self._classes[np.argmax(posteriors)]

    def _log_pdf(self, class_idx, x):    # log of the Gaussian probability density function
        mean = self._mean[class_idx]
        var = self._var[class_idx]
        # evaluated directly in log space, so log(exp(...)) cannot underflow to -inf
        return -0.5 * np.log(2 * np.pi * var) - (x - mean) ** 2 / (2 * var)
```
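The `_predict` loop scores one sample and one class at a time. As a sketch, the same log-space score $\log P(y) + \sum_i \log P(x_i|y)$ can be computed for all samples at once with NumPy broadcasting. `joint_log_likelihood` is a hypothetical helper; it takes arrays shaped like the per-class means, variances, and (raw, non-log) priors that `fit` stores, and the parameter values below are made up:

```python
import numpy as np

def joint_log_likelihood(X, means, variances, priors):
    """Return an (n_samples, n_classes) array of log P(y) + sum_i log P(x_i|y).

    means, variances: shape (n_classes, n_features); priors: shape (n_classes,).
    """
    X = np.asarray(X, dtype=np.float64)
    # X[:, None, :] - means[None, :, :] broadcasts to (n_samples, n_classes, n_features)
    diff = X[:, None, :] - means[None, :, :]
    log_pdf = -0.5 * np.log(2 * np.pi * variances)[None, :, :] - diff ** 2 / (2 * variances[None, :, :])
    # sum over features, then add the log prior of each class
    return np.log(priors)[None, :] + log_pdf.sum(axis=2)

# Hypothetical fitted parameters: two classes, two features
means = np.array([[0.0, 0.0], [3.0, 3.0]])
variances = np.ones((2, 2))
priors = np.array([0.5, 0.5])
X = np.array([[0.1, -0.2], [2.9, 3.2]])

scores = joint_log_likelihood(X, means, variances, priors)
print(scores.argmax(axis=1))  # [0 1]
```

Mapping `scores.argmax(axis=1)` through the array of class labels gives the predictions without a Python loop, at the cost of an intermediate array of size samples × classes × features.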

```python
def accuracy(y_true, y_pred):
    # fraction of predictions that match the true labels
    return np.sum(y_true == y_pred) / len(y_true)
```

```python
X, y = datasets.make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=142)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=142)

nb = NaiveBayes()
nb.fit(X_train, y_train)
predictions = nb.predict(X_test)
print("Naive Bayes classification accuracy", accuracy(y_test, predictions))
```

Naive Bayes classification accuracy 0.97


Note (from Wikipedia):

In Bayesian statistics, the posterior probability of a random event or an uncertain proposition is the conditional probability given the relevant evidence or background.
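The classifier above drops $P(X)$ because only the argmax matters. If actual posterior probabilities $P(y|X)$ are needed, $P(X) = \sum_y P(X|y)\,P(y)$ can be recovered by normalizing the per-class scores. A minimal sketch, assuming the input is the vector of unnormalized log scores $\log P(y) + \sum_i \log P(x_i|y)$; `normalized_posteriors` is a hypothetical name, and subtracting the maximum before exponentiating is the standard log-sum-exp trick:

```python
import numpy as np

def normalized_posteriors(log_joint):
    """Turn unnormalized log scores into posteriors P(y|X) that sum to 1."""
    log_joint = np.asarray(log_joint, dtype=np.float64)
    m = log_joint.max(axis=-1, keepdims=True)
    # log P(X) = m + log(sum_y exp(score_y - m)); the shift keeps exp() from underflowing
    log_evidence = m + np.log(np.exp(log_joint - m).sum(axis=-1, keepdims=True))
    return np.exp(log_joint - log_evidence)

probs = normalized_posteriors([-1000.0, -1001.0])
print(probs)  # approximately [0.731 0.269]
```

Exponentiating `-1000.0` directly would give `0.0` for both classes; after the shift, only the one-unit gap between the scores matters.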