# Naive Bayes Classification


### Bayes' Theorem

- Bayes' theorem gives the probability of a hypothesis given observed data and prior knowledge.

$P(h|d) = \frac{P(d|h) \times P(h)}{P(d)}$

- $P(h|d) = $ *probability of hypothesis $h$ given data $d$*, **posterior probability**.
- $P(d|h) = $ *probability of data $d$ given hypothesis $h$*, **likelihood**.
- $P(h) = $ *probability of hypothesis $h$ being true regardless of the data*, **prior probability**.
- $P(d) = $ *probability of data $d$ regardless of the hypothesis*, **evidence**.
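
The denominator $P(d)$ is rarely given directly. Assuming the candidate hypotheses $h_1, \dots, h_n$ are mutually exclusive and cover every case (an assumption that holds in the factory example that follows), it can be expanded with the law of total probability:

$P(d) = \sum_{i=1}^{n} P(d|h_i) \times P(h_i)$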

#### Example

- $D$ indicates an item is defective.
- $A, B, C$ indicate an item was produced at factories A, B, C.
- $P(A) = 0.35, P(B) = 0.35, P(C) = 0.3$
- $P(D | A) = 0.015, P(D | B) = 0.01, P(D | C) = 0.02$
- Find $P(C|D)$ - *probability a defective item was produced at factory C*

Applying Bayes' theorem with hypothesis $C$ and data $D$:

$P(C|D) = \frac{P(D|C) \times P(C)}{P(D)} = \frac{0.02 \times 0.3}{P(D)}$

$P(D) = P(D | A) \times P(A) + P(D | B) \times P(B) + P(D | C) \times P(C) = 0.015 \times 0.35 + 0.01 \times 0.35 + 0.02 \times 0.3 = 0.01475$

$P(C|D) = \frac{0.02 \times 0.3}{0.01475} \approx 0.407$
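
The arithmetic above can be checked with a few lines of Python. This is only a sketch of the worked example. The dictionary names are chosen here for illustration, and the numbers are the ones listed in the example.

```python
# Worked check of the factory example using Bayes' theorem.
priors = {"A": 0.35, "B": 0.35, "C": 0.30}          # P(factory)
defect_rates = {"A": 0.015, "B": 0.01, "C": 0.02}   # P(D | factory)

# Law of total probability: P(D) = sum over factories of P(D | f) * P(f).
p_defective = sum(defect_rates[f] * priors[f] for f in priors)

# Posterior P(f | D) for every factory.
posteriors = {f: defect_rates[f] * priors[f] / p_defective for f in priors}

print(round(p_defective, 5))      # 0.01475
print(round(posteriors["C"], 3))  # 0.407
```

Computing all three posteriors at once also confirms that they sum to 1, which is a quick sanity check on $P(D)$.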

### Applied to Classification

- Features are assumed to be conditionally independent given the class, so the likelihood factorizes into one term per feature and you don't have to estimate the probability of every combination of feature values given a hypothesis, which would be intractable. This assumption is what makes it *naive*, because features typically interact in some way.
- For categorical features, the model is defined by class probabilities and conditional probabilities. The class probabilities are the relative frequencies of each class in the training dataset. The conditional probabilities are the probabilities of each feature value given each class value.
- Predictions are made by computing the most probable hypothesis, the **maximum a posteriori** (MAP) hypothesis (as opposed to a priori).

$h_{MAP} = \arg\max_h P(h|d) = \arg\max_h P(d|h) \times P(h)$
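
As a rough illustration of the categorical case, the sketch below estimates class and conditional probabilities by counting and then picks the MAP class. The toy weather data, the `map_class` helper, and the nested-dictionary layout are all hypothetical and are not part of the implementation later in this notebook. No smoothing is applied, so an unseen feature value zeroes out a class.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (outlook, windy, play). Not related to Iris.
rows = [
    ("sunny", "no", "yes"),
    ("sunny", "yes", "no"),
    ("rainy", "yes", "no"),
    ("rainy", "no", "yes"),
    ("sunny", "no", "yes"),
    ("rainy", "yes", "no"),
]

# Class probabilities P(c): relative frequency of each label.
class_counts = Counter(label for *_, label in rows)
class_probs = {c: n / len(rows) for c, n in class_counts.items()}

# Conditional counts, indexed as cond_counts[feature_index][class][value].
cond_counts = defaultdict(lambda: defaultdict(Counter))
for *features, label in rows:
    for j, value in enumerate(features):
        cond_counts[j][label][value] += 1


def map_class(features):
    """Return the class maximizing P(c) * prod_j P(x_j | c), plus all scores."""
    scores = {}
    for c, prior in class_probs.items():
        score = prior
        for j, value in enumerate(features):
            # P(x_j = value | c), estimated by counting (no smoothing).
            score *= cond_counts[j][c][value] / class_counts[c]
        scores[c] = score
    return max(scores, key=scores.get), scores


label, scores = map_class(("sunny", "no"))
print(label, scores)
```

The scores are unnormalized posteriors: dividing each by their sum would recover $P(h|d)$, but the ranking, and therefore the prediction, is unchanged.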

- For real-valued features, Gaussian Naive Bayes is used instead. The mean and standard deviation of each feature are stored for each class during training, then predictions are made by evaluating the Gaussian probability density function at each feature's value.
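
A minimal sketch of the Gaussian step for a single feature, using only the standard library. The sample values and class names are made up for illustration. A full classifier would multiply these densities across all features and by the class prior.

```python
from math import exp, pi, sqrt
from statistics import mean, pstdev


def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x (a density, not a probability)."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sqrt(2 * pi) * sigma)


# Hypothetical measurements of one feature for two classes.
samples = {
    "short": [1.0, 1.5, 2.0],
    "long": [4.0, 5.0, 6.0],
}

# "Training": store the mean and population standard deviation per class.
params = {c: (mean(v), pstdev(v)) for c, v in samples.items()}

# "Prediction": evaluate each class's density at the new value.
x = 1.8
likelihoods = {c: gaussian_pdf(x, mu, sigma) for c, (mu, sigma) in params.items()}
print(max(likelihoods, key=likelihoods.get))  # short
```

`pstdev` is the population standard deviation, which is also what `np.std` computes by default.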

### Dataset

- [UCI Iris](https://archive.ics.uci.edu/ml/datasets/Iris)
- Four real-valued attributes - so Gaussian Naive Bayes will be used.

# Baseline with Scikit-Learn

In [149]:
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score as ACC
import pandas as pd
import numpy as np

names = ['sep_len', 'sep_wid','pet_len', 'pet_wid', 'class']
df = pd.read_csv('./data/iris.data', names=names)

# Encode the class labels as integers in order of first appearance.
class_ids = {name: i for i, name in enumerate(df['class'].unique())}
df['class'] = df['class'].map(class_ids)

X = df[['sep_len','sep_wid','pet_len','pet_wid']]
y = df['class']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=44)

model = GaussianNB()
model.fit(X_train, y_train)
pred = model.predict(X_test)
print('Scikit-learn Iris Accuracy = %f' % ACC(y_test, pred))

Scikit-learn Iris Accuracy = 0.960000


# My Implementation

In [151]:
from math import pi, exp, sqrt


class MyGaussianNB:
    
    def __init__(self):
        
        # Categorical implementation.
        self._condt_probs = dict()  # Dict stores conditional probabilities.
        
        # Real-valued implementation.
        self._feat_means = dict()   # Dict stores the means.
        self._feat_stdvs = dict()   # Dict stores the standard deviations.
        
        # Common.
        self._classes = []          # List of the possible classes.
        self._class_probs = dict()  # Dict stores class probabilities.
        self._feat_type = dict()    # Dict stores feature types, 'cat' or 'con'.
        
    def fit(self, X, y):
    
        # Unique classes.
        self._classes = list(y.unique())
    
        # Class probabilities.
        for c in self._classes:
            self._class_probs[c] = float(np.mean(y.values == c))
                
        # Determine feature types and store their corresponding information.
        for feat in X.columns:
            
            # Treat any numeric dtype as continuous.
            iscon = pd.api.types.is_numeric_dtype(X[feat])
            if iscon:
                self._feat_type[feat] = 'con'
                
                # Compute the mean and stdv of this feature given each class.
                self._feat_means[feat] = {}
                self._feat_stdvs[feat] = {}
                for c in self._classes:
                    # Select this feature's values for the rows labeled c.
                    matching = X[feat].values[y.values == c]
                    self._feat_means[feat][c] = np.mean(matching)
                    # np.std defaults to the population standard deviation (ddof=0).
                    self._feat_stdvs[feat][c] = np.std(matching)
                
            else:
                self._feat_type[feat] = 'cat'
                # TODO: Compute the conditional probability.
    
        return
    
    def predict(self, X):
        feats = list(X.columns)
        preds = [self.predict_single(feats, x) for i,x in X.iterrows()]
        return preds
    
    def predict_single(self, feats, x):
        
        # Gaussian probability density function.
        # Takes the value, the mean, and the standard deviation.
        # Returns the density at that value (a likelihood, not a probability).
        def pdf(x, m, s):
            return (1/(sqrt(2*pi)*s))*exp(-((x-m)**2)/(2*s**2))
        
        # Compute the posterior score for each class
        # to find the most likely classification given these features.
        # P(c | x) = (P(x | c) * P(c)) / P(x).
        # P(x) is the same for every class, so it can be dropped from the comparison.
        # So P(c | x) is proportional to P(x | c) * P(c).
        # P(x | c) = P(x_1 | c) * ... * P(x_n | c) under the independence assumption.
        # P(c) and P(x_i | c) are either stored (categorical)
        # or easily computed (real-valued).
        
        max_class = None
        max_prob = -1.0  # Below any valid score, so a class is always selected.
        
        for c in self._classes:
            
            prob = self._class_probs[c]
            
            # Compute P(x | c).
            # Categorical features use the stored conditional probabilities.
            # Continuous features compute their likelihood via the Gaussian PDF.
            for (feat, x_i) in zip(feats, x):
            
                if self._feat_type[feat] == 'cat':
                    # Note: fit() does not yet populate _condt_probs (see the TODO there).
                    prob *= self._condt_probs[x_i][c]
                
                elif self._feat_type[feat] == 'con':
                    m = self._feat_means[feat][c]
                    s = self._feat_stdvs[feat][c]
                    prob *= pdf(x_i, m, s)

            # Track max probability and its corresponding class.
            if prob > max_prob:
                max_prob = prob
                max_class = c
            
        return max_class
    
model = MyGaussianNB()
model.fit(X_train, y_train)
pred = model.predict(X_test)
print('My Implementation Iris Accuracy = %f' % ACC(y_test, pred))

My Implementation Iris Accuracy = 0.960000
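
One possible refinement, not used in the implementation above: multiplying many small densities can underflow to `0.0`, which makes classes indistinguishable. A common workaround is to sum log-densities instead. The sketch below assumes a simplified, hypothetical parameter layout (per-class lists of means and standard deviations) rather than the dictionaries of `MyGaussianNB`, and the parameter values are illustrative, not fitted to Iris.

```python
from math import log, pi


def log_gaussian_pdf(x, mu, sigma):
    """Log of the N(mu, sigma^2) density at x."""
    return -0.5 * log(2 * pi * sigma ** 2) - ((x - mu) ** 2) / (2 * sigma ** 2)


def predict_log_space(x, class_probs, means, stdvs):
    """Return the class with the highest log-posterior score.

    means[c] and stdvs[c] are per-feature lists for class c
    (a hypothetical layout chosen for this sketch).
    """
    best_class, best_score = None, float("-inf")
    for c, prior in class_probs.items():
        # log P(c) + sum_i log P(x_i | c)
        score = log(prior)
        for x_i, mu, sigma in zip(x, means[c], stdvs[c]):
            score += log_gaussian_pdf(x_i, mu, sigma)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Illustrative parameters only.
class_probs = {0: 0.5, 1: 0.5}
means = {0: [1.5, 0.2], 1: [4.5, 1.4]}
stdvs = {0: [0.2, 0.1], 1: [0.5, 0.2]}

near = predict_log_space([1.4, 0.3], class_probs, means, stdvs)
far = predict_log_space([60.0, 30.0], class_probs, means, stdvs)
print(near, far)
```

The second query lies so far from both classes that the plain product of densities would underflow to zero for each, while the log-space scores remain finite and comparable.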


# Results

- My custom implementation matches the accuracy (96%) of the scikit-learn implementation on the held-out Iris test split, using only continuous features. The categorical branch is not implemented yet.