# Naive Bayes
## Bayes Theorem
$P(A|B) = \frac{P(B|A) P(A)}{P(B)} $

In our case, given features $X = (x_1, \dots, x_n)$, the probability of class $y$ is:

$P(y|X) = \frac{P(X|y) P(y)}{P(X)}$

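We assume that all features are **conditionally independent** given the class $y$ (this is the "naive" assumption), so $P(X|y)$ factorizes into a product over the features: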

$P(y|X) = \frac{P(x_1|y) \cdot P(x_2|y) \cdot P(x_3|y) \cdots P(x_n|y) \cdot P(y)} {P(X)}$

Note that $P(y|X)$ is called the posterior probability, $P(x_i|y)$ the class conditional probability, and $P(y)$ the prior probability of $y$.

## Select class with highest probability 

$y = \arg\max_y P(y|X) = \arg\max_y \frac{P(x_1|y) \cdot P(x_2|y) \cdot P(x_3|y) \cdots P(x_n|y) \cdot P(y)} {P(X)}$

Since $P(X)$ does not depend on $y$, it is the same for every class and can be dropped:

$y = \arg\max_y P(x_1|y) \cdot P(x_2|y) \cdot P(x_3|y) \cdots P(x_n|y) \cdot P(y)$
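
As a small illustration of why $P(X)$ can be dropped, the sketch below uses made-up likelihoods and priors for two hypothetical classes `"a"` and `"b"` (the numbers and the names `likelihoods`, `priors`, `scores` are assumptions for illustration, not taken from any dataset). It checks that the normalized posterior and the unnormalized score pick the same class.

```python
import math

# Hypothetical per-feature likelihoods P(x_i | y) and priors P(y) for two classes.
likelihoods = {"a": [0.5, 0.2, 0.1], "b": [0.3, 0.4, 0.3]}
priors = {"a": 0.6, "b": 0.4}

# Unnormalized score: product of the class conditionals times the prior.
scores = {c: math.prod(likelihoods[c]) * priors[c] for c in priors}

# P(X) is the sum of the unnormalized scores over all classes.
evidence = sum(scores.values())
posteriors = {c: s / evidence for c, s in scores.items()}

# Dividing every score by the same positive constant cannot change the winner.
best_unnormalized = max(scores, key=scores.get)
best_normalized = max(posteriors, key=posteriors.get)
print(best_unnormalized, best_normalized)
```

Both selections agree because the division by $P(X)$ rescales every class score by the same positive factor.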

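To avoid numerical underflow when multiplying many small probabilities, we take the logarithm, which is monotonic and therefore does not change the argmax: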

$y = \arg\max_y \left( \log P(x_1|y) + \log P(x_2|y) + \log P(x_3|y) + \cdots + \log P(x_n|y) + \log P(y) \right)$
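
The effect of the log trick can be seen with a toy example. The sketch below multiplies 1000 hypothetical probabilities of 0.01 each (an arbitrary choice for illustration) and compares the direct product with the sum of logs.

```python
import math

# 1000 hypothetical class conditional probabilities, each equal to 0.01.
probs = [0.01] * 1000

# The exact product is 1e-2000, far below the smallest positive float64,
# so the floating-point result underflows to 0.0.
product = math.prod(probs)

# The sum of logs stays an ordinary, finite number.
log_sum = sum(math.log(p) for p in probs)

print(product)
print(log_sum)
```

Once the product has collapsed to zero, every class would receive the same score, whereas the log-sum still distinguishes between classes.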

## Model the class conditional probability $P(x_i|y)$ with a Gaussian

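$P(x_i|y) = \frac{1}{\sqrt{2\pi \sigma_y^2}} \cdot e^{-\frac{(x_i - \mu_y)^2}{2 \sigma_y^2}}$

Here $\mu_y$ and $\sigma_y^2$ are the mean and variance of feature $x_i$ among the training samples of class $y$.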
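
Because the decision rule only needs $\log P(x_i|y)$, one option is to evaluate the log of the Gaussian directly, $\log P(x_i|y) = -\frac{1}{2}\log(2\pi\sigma_y^2) - \frac{(x_i - \mu_y)^2}{2\sigma_y^2}$, instead of computing the density and then taking its log. The sketch below is a minimal illustration of that idea; the helper name `gaussian_log_pdf` and the sample values are assumptions, not part of the implementation in this notebook.

```python
import numpy as np

def gaussian_log_pdf(x, mean, var):
    """Log of the Gaussian density, evaluated element-wise."""
    return -0.5 * np.log(2.0 * np.pi * var) - (x - mean) ** 2 / (2.0 * var)

# Three hypothetical feature values; the last one lies far from the mean.
x = np.array([0.0, 1.0, 50.0])
mean = np.array([0.0, 0.0, 0.0])
var = np.array([1.0, 1.0, 1.0])

log_p = gaussian_log_pdf(x, mean, var)

# Density computed the direct way, for comparison.
p = np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

print(log_p)
print(p)
```

For values close to the mean the two approaches agree, but for the far-away value the direct density underflows to zero (so its log would be `-inf`), while the log form remains finite.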


In [1]:
import numpy as np

class NaiveBayes:
    def fit(self, X, y):
        n_samples, n_features = X.shape
        self._classes = np.unique(y)
        n_classes = len(self._classes)
        
        self._mean = np.zeros((n_classes, n_features), dtype=np.float64)
        self._var = np.zeros((n_classes, n_features), dtype=np.float64)
        self._priors = np.zeros(n_classes, dtype=np.float64)
        
        for idx, c in enumerate(self._classes):
            X_c = X[y==c] 
            self._mean[idx, :] = X_c.mean(axis=0)
            self._var[idx, :] = X_c.var(axis=0)
            
            # Prior P(y): the fraction of training samples that belong to class c
            self._priors[idx] = X_c.shape[0] / float(n_samples)
        print(self._classes)
        print(self._mean, self._var, self._priors)
        

    def predict(self, X):
        '''Predict a class label for every row of X.'''
        y_pred = [self._predict(x) for x in X]
        return np.array(y_pred)
    
    def _predict(self, x):
        '''Make prediction on a single instance.'''
        posteriors = []
        for idx, c in enumerate(self._classes):
            prior = np.log(self._priors[idx])
            class_conditional = np.sum(np.log(self._probability_dense_function(idx, x)))
            _posterior = prior + class_conditional
            posteriors.append(_posterior)
        
        return self._classes[np.argmax(posteriors)]
            
    def _probability_dense_function(self, class_idx, x):
        '''Gaussian density of each feature of x under the given class.'''
        mean = self._mean[class_idx]
        var = self._var[class_idx]
        numerator = np.exp(-(x - mean) ** 2 / (2 * var))
        denominator = np.sqrt(2 * np.pi * var)
        return numerator / denominator
        
        

In [2]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

data = datasets.load_iris()
X, y = data.data, data.target
# Relabel class 0 as -1; the labels are arbitrary identifiers, so this does not affect training.
y[y == 0] = -1

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

nb = NaiveBayes()
# nb = MultinomialNB()
nb.fit(X_train, y_train)
y_pred = nb.predict(X_val)
print("Accuracy score ", accuracy_score(y_val, y_pred))

[-1  1  2]
[[5.02051282 3.4025641  1.46153846 0.24102564]
 [5.88648649 2.76216216 4.21621622 1.32432432]
 [6.63863636 2.98863636 5.56590909 2.03181818]] [[0.12932281 0.1417883  0.02031558 0.01113741]
 [0.26387144 0.1039737  0.2300073  0.04075968]
 [0.38918905 0.10782541 0.29451963 0.06444215]] [0.325      0.30833333 0.36666667]
Accuracy score  0.9666666666666667
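
One way to sanity-check this result is to fit scikit-learn's `GaussianNB` on the same split and compare accuracies. The sketch below is only a cross-check under stated assumptions: it uses the original Iris labels (0, 1, 2) rather than the relabeled ones, and `GaussianNB` adds a small variance-smoothing term by default (`var_smoothing`), so its fitted variances can differ slightly from the ones printed above. The names `reference` and `reference_accuracy` are illustrative.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Same dataset and the same split parameters as in the cell above.
X, y = datasets.load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# Reference Gaussian Naive Bayes classifier from scikit-learn.
reference = GaussianNB()
reference.fit(X_train, y_train)
reference_accuracy = accuracy_score(y_val, reference.predict(X_val))
print("GaussianNB accuracy", reference_accuracy)
```

If the two accuracies are close, that suggests the from-scratch class implements the same model.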
