# Naive Bayes Classifier
### Probabilistic Model
- From data $D$ we can **infer** the parameters $\theta$ of model $M$ 

>$\displaystyle p(\theta \lvert D) = \frac{p(\theta)\,p(D \lvert \theta)}{p(D)}$ 
>

### Likelihood Function
- Writing the prior as $\pi(\theta)$ and the likelihood function as ${\cal{}L}_D(\theta) = p(D \lvert \theta)$, the same rule reads

>$\displaystyle p(\theta \lvert D) = \frac{\pi(\theta)\,{\cal{}L}_D(\theta)}{Z}$ 
>
> where the normalization constant is
>
>$\displaystyle Z = \int \pi(\theta)\,{\cal{}L}_D(\theta)\ d\theta $ 

- The **posterior** is proportional to the **prior** times the **likelihood function** 
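As a minimal sketch of this proportionality, the posterior can be evaluated on a grid and normalized numerically. The coin-toss data, the flat prior, and the grid resolution below are all assumptions made for illustration; they are not part of the original notes.

```python
import numpy as np

# Hypothetical data: 7 heads in 10 tosses of a coin with unknown bias theta.
heads, tosses = 7, 10

theta = np.linspace(0.0, 1.0, 1001)             # grid over the parameter
prior = np.ones_like(theta)                     # flat prior pi(theta) (assumed)
likelihood = theta**heads * (1.0 - theta)**(tosses - heads)  # L_D(theta)

unnormalized = prior * likelihood               # prior times likelihood
dtheta = theta[1] - theta[0]
Z = unnormalized.sum() * dtheta                 # Riemann-sum approximation of Z
posterior = unnormalized / Z                    # p(theta | D) on the grid

theta_map = theta[np.argmax(posterior)]         # location of the posterior maximum
```

With a flat prior the posterior maximum coincides with the maximum of the likelihood, which for this toy data is the observed fraction of heads.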

### Naive Bayes Classifier

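- In general, we can use Bayes' rule (and the law of total probability) to infer the discrete class $C_k$ for a given set of features $\boldsymbol{x}$

>$\displaystyle P(C_k \lvert\,\boldsymbol{x}) = \frac{\pi(C_k)\,{\cal{}L}_{\boldsymbol{x}}(C_k)}{Z} $ 
>
> where the normalization sums over all classes
>
>$\displaystyle Z = \sum_{j} \pi(C_j)\,{\cal{}L}_{\boldsymbol{x}}(C_j)$ 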


- Naively assuming the $d$ features are independent given the class, the likelihood factorizes into one term per feature

>$\displaystyle {\cal{}L}_{\boldsymbol{x}}(C_k) = \prod_{\alpha=1}^d p(x_{\alpha} \lvert C_k)$ 
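One practical consequence of the product form is that it can underflow when there are many features, so it is usually evaluated as a sum of logarithms. The sketch below illustrates this with made-up per-feature probabilities; the number of features and the value range are assumptions chosen only to trigger the effect.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-feature conditionals p(x_alpha | C_k) for d = 2000 features,
# each drawn between 0.1 and 0.9 purely for illustration.
d = 2000
p_feature = rng.uniform(0.1, 0.9, size=d)

naive_product = np.prod(p_feature)         # underflows to 0.0 in float64
log_likelihood = np.log(p_feature).sum()   # stays finite, still comparable across classes
```

Since the logarithm is monotonic, comparing `log_likelihood` values between classes ranks them the same way as comparing the products would.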

### Naive Bayes: Estimation

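- Model each feature within class $k$ as a Gaussian $G$ with mean $\mu_{k,\alpha}$ and variance $\sigma^2_{k,\alpha}$ estimated from the data, and look for the maximum of the posterior; $Z$ does not depend on $k$, so it can be dropped

>$\displaystyle \hat{k} =  \mathrm{arg}\max_k \left[ \pi_k \prod_{\alpha=1}^d G(x_{\alpha};\mu_{k,\alpha}, \sigma^2_{k,\alpha})\right]$ 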
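
- Taking $G$ to be the normal density and using the fact that the logarithm is monotonic, the same maximum can be written as a sum; this is only a restatement of the expression above, with $2\pi$ denoting the numerical constant rather than the prior

>$\displaystyle \hat{k} = \mathrm{arg}\max_k \left[ \ln\pi_k - \frac{1}{2}\sum_{\alpha=1}^d \left( \ln\!\left(2\pi\,\sigma^2_{k,\alpha}\right) + \frac{(x_{\alpha}-\mu_{k,\alpha})^2}{\sigma^2_{k,\alpha}} \right) \right]$ 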

In [6]:
import numpy as np
from sklearn import datasets
from math import pi

In [2]:
# Load in Iris Data
iris = datasets.load_iris()
X = iris.data[:,:]
y = iris.target

In [3]:
# calculate feature means and variances for each class
classes = np.unique(y)
param = dict()  # class label -> (count, prior, means, variances)
for k in classes:
    members = (y == k)                # boolean mask of samples in class k
    num = members.sum()               # True counts as 1, False as 0
    prior = num / float(y.size)       # class frequency used as the prior
    Xk = X[members, :]                # rows of class k (a copy; X stays intact)
    mu = Xk.mean(axis=0)              # per-feature mean
    var = Xk.var(axis=0, ddof=1)      # unbiased per-feature variance
    param[k] = (num, prior, mu, var)  # save results
    print(k, mu, var)

0 [5.006 3.428 1.462 0.246] [0.12424898 0.1436898  0.03015918 0.01110612]
1 [5.936 2.77  4.26  1.326] [0.26643265 0.09846939 0.22081633 0.03910612]
2 [6.588 2.974 5.552 2.026] [0.40434286 0.10400408 0.30458776 0.07543265]
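
The variance printed above is the unbiased sample variance, with $n-1$ in the denominator. As a small check on synthetic data (the random sample below is a stand-in, not the Iris measurements), centering by hand and calling NumPy's `var` with `ddof=1` give the same estimate:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic stand-in for the samples of one class: 50 rows, 2 features.
X_demo = rng.normal(loc=[5.0, 3.0], scale=[0.5, 0.3], size=(50, 2))

mu_demo = X_demo.mean(axis=0)
centered = X_demo - mu_demo                        # new array; X_demo is not modified
var_manual = (centered**2).sum(axis=0) / (X_demo.shape[0] - 1)
var_numpy = X_demo.var(axis=0, ddof=1)             # same unbiased estimator
```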


In [8]:
# init predicted values
k_pred = -1 * np.ones(iris.target.size)

# evaluate posterior for each point and find maximum
for i in range(iris.target.size):
    pmax, kmax = -1, None   # initialize to nonsense values
    for k in classes:
        num, prior, mu, var = param[k]
        diff = iris.data[i,:] - mu
        d2 = diff*diff / (2*var) 
        p = prior * np.exp(-d2.sum()) / np.sqrt(np.prod(2*pi*var))
        if p > pmax:
            pmax = p
            kmax = k
    k_pred[i] = kmax

print("Number of mislabeled points out of a total {:d} points : {:d}".format(iris.target.size, sum(iris.target!=k_pred)))

Number of mislabeled points out of a total 150 points : 6
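
The double loop above can also be written in vectorized form and in log space, which avoids multiplying many small densities. The following is a sketch under stated assumptions: `predict_gaussian_nb` is a hypothetical helper name, and the two-class parameters are invented for illustration rather than taken from the Iris estimates.

```python
import numpy as np

def predict_gaussian_nb(X, priors, means, variances):
    """Return, for each row of X, the index of the class with the largest log-posterior.

    X: (n, d) features; priors: (K,); means and variances: (K, d).
    """
    diff = X[:, None, :] - means[None, :, :]                  # shape (n, K, d)
    per_feature = np.log(2.0 * np.pi * variances)[None, :, :] + diff**2 / variances[None, :, :]
    log_lik = -0.5 * per_feature.sum(axis=2)                  # shape (n, K)
    log_post = np.log(priors)[None, :] + log_lik              # unnormalized log-posterior
    return np.argmax(log_post, axis=1)

# Synthetic two-class example (hypothetical numbers).
priors = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [4.0, 4.0]])
variances = np.array([[1.0, 1.0], [1.0, 1.0]])
X_new = np.array([[0.2, -0.1], [3.8, 4.3], [1.0, 1.0]])

labels = predict_gaussian_nb(X_new, priors, means, variances)
```

The function returns positions along the class axis; when the class labels are 0, 1, 2, ... as in the Iris data, these positions coincide with the labels themselves.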


In [11]:
# run sklearn's version; unlike the code above, GaussianNB uses the biased
# (ddof=0) variance estimate and adds a small var_smoothing term to it
from sklearn.naive_bayes import GaussianNB

gnb = GaussianNB()
y_pred = gnb.fit(iris.data, iris.target).predict(iris.data)

print("Number of mislabeled points out of a total {:d} points : {:d}".format(iris.target.size, sum(iris.target!=y_pred)))

Number of mislabeled points out of a total 150 points : 6
