Naive Bayes is based on Bayes' theorem: <br> <center>
    <b> p(a|b) = (p(b|a)*p(a))/p(b) </b>
<br> </center> <br>
In our case, a is the target label and b is the feature vector. We assume all features are mutually independent given the class. <br>In our dataset [Source: https://www.kaggle.com/msjaiclub/2classclassification?select=ex2data1.csv], whether a person is healthy is the target, and the features are person-go-for-walk and person-eat-healthy-food; the two features are treated as independent of each other, but both contribute to the target.
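
As a quick arithmetic check of Bayes' theorem, here is a minimal sketch for a single binary feature. All probabilities below are made-up illustrative numbers, not values estimated from the dataset.

```python
# Made-up numbers for illustration only.
p_a = 0.3              # prior p(a): probability of the class
p_b_given_a = 0.8      # likelihood p(b|a): feature observed when the class holds
p_b_given_not_a = 0.2  # likelihood of the feature when the class does not hold

# Evidence p(b) via the law of total probability.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: p(a|b) = p(b|a) * p(a) / p(b)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)
```

Observing the feature raises the probability of the class from the prior 0.3 to roughly 0.63 in this toy setting.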

Using the independence assumption, we can factor the likelihood into one term per feature: <br> <center>
    <b>p(a|b) = { (p(b1|a)*p(b2|a)*p(b3|a)....*p(bn|a)) * p(a) } / p(b) </b>
    <br> </center>
<br>
where p(a|b) is the posterior probability of class a given the features <br>
p(bi|a) is the class-conditional probability (likelihood) of feature bi <br>
p(a) = prior probability of a <br>
p(b) = prior probability of b (the evidence) <br>
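
The factorized formula can be sketched for two classes and two features as follows. The class names and all probabilities are hypothetical values chosen for illustration; they are not taken from the dataset.

```python
# Hypothetical priors p(a) and per-feature likelihoods p(bi|a).
prior = {"healthy": 0.6, "unhealthy": 0.4}
likelihood = {
    "healthy": [0.7, 0.8],    # p(b1|healthy), p(b2|healthy)
    "unhealthy": [0.3, 0.2],  # p(b1|unhealthy), p(b2|unhealthy)
}

# Numerator of the formula: p(b1|a) * p(b2|a) * p(a) for each class a.
joint = {}
for a in prior:
    score = prior[a]
    for p in likelihood[a]:
        score *= p
    joint[a] = score

# p(b) is the sum of the numerators over all classes.
evidence = sum(joint.values())
posterior = {a: joint[a] / evidence for a in joint}
print(posterior)
```

Because p(b) is the same for every class, dividing by it only rescales the scores so that they sum to 1; it does not change which class scores highest.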

Now we select the class with the highest posterior probability, so we apply <br> <center> 
    <b>argmax a {p(a|b)} <br></b> or <b><br> argmax a { (p(b1|a)*p(b2|a)*p(b3|a)....*p(bn|a)) * p(a) } / p(b) </b> <br> </center> <br>
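Since p(b) does not depend on a, it can be dropped from the argmax. We then take the log of the product: multiplying many small terms can cause numerical underflow, and the log turns the product into a sum without changing which class is the maximum.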
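
The numerical issue can be seen in a small sketch. The list of 400 identical probabilities is an artificial assumption, used only to make the effect obvious.

```python
import math

# 400 artificial per-feature probabilities, each equal to 0.01.
probs = [0.01] * 400

# The direct product is far below the smallest positive float and collapses to 0.0.
product = math.prod(probs)

# The sum of logs stays a perfectly ordinary finite number.
log_sum = sum(math.log(p) for p in probs)

print(product)
print(log_sum)
```

Once every class score has collapsed to 0.0 the argmax is meaningless, whereas the log scores remain comparable.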

So the final formula becomes <b> <br> <center> argmax a {log(p(b1|a)) + .... + log(p(bn|a)) + log(p(a)) } </center> <br>  </b>
where the prior probability p(a) is just the relative frequency of class a in the training data <br>
and the likelihood p(bi|a) for each feature i = 1...n is modeled as a Gaussian distribution.
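
A minimal sketch of both ingredients on a tiny made-up sample with one feature: priors as class frequencies, and a Gaussian density whose mean and variance are estimated per class. The helper name `gaussian_pdf` and the sample values are assumptions for illustration.

```python
import math
from collections import Counter

def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Made-up training sample: one feature value and one label per row.
labels = [0, 0, 1, 1, 1]
feature = [1.0, 3.0, 6.0, 8.0, 10.0]

# Prior p(a): relative frequency of each class.
counts = Counter(labels)
priors = {c: n / len(labels) for c, n in counts.items()}

# Per-class mean and population variance (the same convention as numpy's default var()).
stats = {}
for c in counts:
    values = [v for v, l in zip(feature, labels) if l == c]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    stats[c] = (mean, var)

# Likelihood p(b|a) of a new observation x = 4.0 under each class.
likelihoods = {c: gaussian_pdf(4.0, *stats[c]) for c in stats}
print(priors, stats, likelihoods)
```

Note that these likelihoods are density values rather than probabilities, so they are not restricted to the range 0 to 1.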




In [63]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt


class NaiveBayesClassifier:

    def fittodata(self, data, target):
        samples, features = data.shape       
        #print(samples, features)
        
        # distinct labels in the target array
        self._classes = np.unique(target)
        classes = len(self._classes)

        # calculate mean, var, and prior for each class.
        self._mean = np.zeros((classes, features), dtype=np.float64)  
        self._var = np.zeros((classes, features), dtype=np.float64)
        self._priors =  np.zeros(classes, dtype=np.float64)    

        #doing for every unique class
        for i, c in enumerate(self._classes):  
            # we select all data with one class, like all data with target 0
            data_c = data[target==c]  
            
            # we calculate mean and var of each feature column in data_c ie
            # the restricted set when a particular class c is selected.
            # We want to determine p(b==x|a==c) = p(b0==x0|a==c)*...*p(bn==xn|a==c)
            
            # column-mean for different features (and particular class c) stored in array  
            self._mean[i, :] = data_c.mean(axis=0)  
            # variance array for different features (and particular class c) stored in array 
            self._var[i, :] = data_c.var(axis=0)
            
            # the two arrays above parameterize the per-feature class-conditional distributions,
            # i.e. [p(b0|a==c), p(b1|a==c)] for this particular 2-feature dataset.
            
            # prior probability of observing class c with class index i ie p(a==c)
            self._priors[i] = data_c.shape[0] / float(samples) 
        
        # Side note: below signifies mean for the 2nd-feature for target class index 0: ie self._mean[0][1]        
        
    def predictdata(self, data):   
        #PREDICT FOR WHOLE SAMPLE
        target_pred = [self._predictdata(x) for x in data]
        return np.array(target_pred)      
        
    def _predictdata(self, x):  
        posteriors = []
        # for each data sample x, compute the log-score of x belonging to each class in self._classes
        # and take the class that maximizes p(a|b), where a is a possible class assigned to x.
        # because we work with logs, we add log-probabilities rather than multiplying probabilities
        
        for i, c in enumerate(self._classes): 
            # we assume that x belongs to class c out of possible self._classes with class index i 
            # then log  (priori-probability of x belonging to class index i) or p(a==c) is:
            prior = np.log(self._priors[i])
            # since x belongs to class c with class index i, the probability of belonging to class c
            # depends on x0 and x1 coming from joint distribution of b0 and b1 calculated as
            # gaussian distribution from data particular to class c.
            
            # _pdfdata returns array as the probabilities of x0 belonging to b0's distribution
            # and x1 belonging to b1's distribution ie [p(b0==x0|a==c), p(b1==x1|a==c)]
            
            # rather than multiplying as we do in joint probabilities we do sum as logs were taken.
            posterior = np.sum(np.log(self._pdfdata(i, x)))
            
            # below forms the log version of the equation:
            # (p(b1|a)*p(b2|a)*p(b3|a)....*p(bn|a)) * p(a)
            posterior = posterior + prior
            posteriors.append(posterior)
            
        # return the class c with the highest log-posterior log(p(a==c|b)), up to the constant log(p(b))
        return self._classes[np.argmax(posteriors)]
            

    def _pdfdata(self, class_i, x): 
        mean = self._mean[class_i]
        var = self._var[class_i]
        
        # evaluate the Gaussian density of each feature (b0 and b1), given that x belongs to class_i.
        num = np.exp(- (x-mean)**2 / (2 * var))
        deno = np.sqrt(2 * np.pi * var)
        # below is an array holding, for each feature, the density of xi under the distribution of bi
        # given that x belongs to class c with class index class_i.
        return num / deno   
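
One possible refinement, shown here only as a sketch and not as part of the class above: the Gaussian log-density can be computed directly instead of taking `np.log` of the density. When a sample lies far from a class mean, the density underflows to 0 and its log becomes `-inf`; the direct form stays finite. The function name `gaussian_log_pdf` and the example values are assumptions for illustration.

```python
import numpy as np

def gaussian_log_pdf(x, mean, var):
    """Log of the Gaussian density, evaluated element-wise per feature."""
    return -0.5 * np.log(2.0 * np.pi * var) - (x - mean) ** 2 / (2.0 * var)

# Made-up class statistics and a sample whose second feature is far from the mean.
mean = np.array([0.0, 0.0])
var = np.array([1.0, 1.0])
x_far = np.array([0.5, 60.0])

# Same computation as _pdfdata followed by np.log; warnings silenced for the demo.
with np.errstate(divide="ignore"):
    naive = np.log(np.exp(-(x_far - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var))

direct = gaussian_log_pdf(x_far, mean, var)
print(naive)   # the second entry is -inf because the density underflowed to 0
print(direct)  # both entries are finite
```

Both versions agree wherever the density does not underflow, so swapping one for the other would not change predictions in ordinary cases.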


In [64]:
df = pd.read_csv('ex2data1.csv')  # dataset taken from Kaggle [https://www.kaggle.com/msjaiclub/2classclassification?select=ex2data1.csv]

In [65]:
df.head() # dataset head

Unnamed: 0,x,y,label
0,34.62366,78.024693,0
1,30.286711,43.894998,0
2,35.847409,72.902198,0
3,60.182599,86.308552,1
4,79.032736,75.344376,1


In [66]:
data = df.drop('label',axis = 1) # in data we take all column except target

In [67]:
data = np.asarray(data)  # converting that to numpy array

In [68]:
data 

array([[34.62365962, 78.02469282],
       [30.28671077, 43.89499752],
       [35.84740877, 72.90219803],
       [60.18259939, 86.3085521 ],
       [79.03273605, 75.34437644],
       [45.08327748, 56.31637178],
       [61.10666454, 96.51142588],
       [75.02474557, 46.55401354],
       [76.0987867 , 87.42056972],
       [84.43281996, 43.53339331],
       [95.86155507, 38.22527806],
       [75.01365839, 30.60326323],
       [82.30705337, 76.4819633 ],
       [69.36458876, 97.71869196],
       [39.53833914, 76.03681085],
       [53.97105215, 89.20735014],
       [69.07014406, 52.74046973],
       [67.94685548, 46.67857411],
       [70.66150955, 92.92713789],
       [76.97878373, 47.57596365],
       [67.37202755, 42.83843832],
       [89.67677575, 65.79936593],
       [50.53478829, 48.85581153],
       [34.21206098, 44.2095286 ],
       [77.92409145, 68.97235999],
       [62.27101367, 69.95445795],
       [80.19018075, 44.82162893],
       [93.1143888 , 38.80067034],
       [61.83020602,

In [69]:
target = df['label'] 

In [70]:
target.head()

0    0
1    0
2    0
3    1
4    1
Name: label, dtype: int64

In [71]:
data_train, data_test, target_train, target_test = train_test_split(data,target,test_size = 0.2 ,random_state = 42)  # 80/20 train/test split

In [72]:
def accuracy(target_true, target_pred): 
    accuracy = np.sum(target_true == target_pred) / len(target_true)
    return accuracy
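
Accuracy alone does not show which class is being misclassified. A minimal sketch of a per-class breakdown follows; the helper `confusion_counts` and the label arrays are hypothetical and are not taken from the dataset or from the notebook's results.

```python
import numpy as np

def confusion_counts(target_true, target_pred, classes):
    """Entry [i, j] counts samples of true class classes[i] predicted as classes[j]."""
    target_true = np.asarray(target_true)
    target_pred = np.asarray(target_pred)
    counts = np.zeros((len(classes), len(classes)), dtype=int)
    for i, true_c in enumerate(classes):
        for j, pred_c in enumerate(classes):
            counts[i, j] = np.sum((target_true == true_c) & (target_pred == pred_c))
    return counts

# Made-up labels for illustration only.
y_true = np.array([0, 0, 1, 1, 1])
y_pred = np.array([0, 1, 1, 1, 0])

cm = confusion_counts(y_true, y_pred, classes=[0, 1])
# Correct predictions lie on the diagonal, so accuracy is trace / total.
acc = np.trace(cm) / cm.sum()
print(cm)
print(acc)
```

The diagonal-over-total ratio equals the value returned by `accuracy` for the same arrays.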

In [73]:
nb = NaiveBayesClassifier() 
nb.fittodata(data_train, target_train)

In [74]:
predictions = nb.predictdata(data_test)

In [75]:
print("Naive Bayes classification accuracy", accuracy(target_test, predictions))

Naive Bayes classification accuracy 0.8
