# Naive Bayes Classifier

This notebook trains a naïve Bayes classifier on binary-feature training data and uses it to predict the class labels of test data. Laplace smoothing with α = 1 is applied to the class-conditional probabilities. The learned classifier is run on test instances whose class labels are unknown, and the predicted class labels are printed as output.
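For reference, the decision rule and the smoothed estimate used in the classifier cell further down can be written as follows. The symbols $N$, $N_c$, $N_{c,i,v}$, $n$ and $K$ are notation introduced here for illustration and are not names from the notebook.

```latex
\hat{c} = \arg\max_{c \in \{0,1\}} \; P(c) \prod_{i=1}^{n} P(x_i \mid c),
\qquad
P(c) = \frac{N_c}{N},
\qquad
P(x_i = v \mid c) = \frac{N_{c,i,v} + \alpha}{N_c + \alpha K}
```

Here $N$ is the number of training instances, $N_c$ the number in class $c$, $N_{c,i,v}$ the number of those whose feature $i$ equals $v$, $n = 8$ the number of features, $K = 2$ the number of values a binary feature can take, and $\alpha = 1$. For example, no class-0 training row has a 0 in the first feature, so with $N_0 = 7$ the smoothed estimate is $(0 + 1)/(7 + 2) = 1/9$ rather than 0.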

## Importing Libraries

In [0]:
#Importing required libraries
import numpy as np

## Loading data

In [0]:
#Reading the data (the with-blocks close the files once they have been read)
with open("data3.csv") as f:
    fl=f.readlines()
with open("test3.csv") as l:
    ll=l.readlines()
data3=np.array(fl)
test3=np.array(ll)
#Formatting the data (np.int was removed in NumPy 1.24; the built-in int is equivalent)
data = np.array([x.split(',') for x in data3], dtype=int)
test = np.array([x.split(',') for x in test3], dtype=int)

In [0]:
data

array([[1, 1, 1, 1, 1, 1, 0, 1, 1],
       [1, 1, 1, 1, 1, 1, 0, 0, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 0],
       [1, 1, 1, 1, 1, 0, 0, 1, 1],
       [1, 1, 1, 1, 1, 0, 0, 0, 1],
       [1, 1, 1, 0, 1, 1, 0, 1, 1],
       [1, 1, 0, 1, 1, 1, 0, 1, 0],
       [1, 1, 1, 0, 1, 1, 0, 0, 1],
       [1, 1, 1, 0, 1, 0, 0, 1, 1],
       [1, 1, 1, 0, 1, 0, 0, 0, 1],
       [0, 1, 1, 1, 1, 1, 0, 1, 1],
       [0, 1, 1, 1, 1, 1, 0, 0, 1],
       [1, 0, 1, 1, 1, 1, 0, 1, 0],
       [0, 1, 1, 1, 1, 0, 0, 1, 1],
       [1, 1, 0, 1, 0, 1, 0, 1, 0],
       [1, 0, 0, 1, 1, 1, 0, 1, 0],
       [1, 0, 0, 1, 0, 1, 1, 1, 0],
       [0, 1, 1, 1, 1, 0, 0, 0, 1],
       [1, 0, 1, 1, 1, 1, 1, 1, 0],
       [0, 1, 1, 0, 1, 1, 0, 1, 1]])

In [0]:
test

array([[0, 1, 1, 1, 1, 1, 1, 1],
       [1, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 1, 0, 1, 0, 0, 0],
       [0, 1, 1, 1, 1, 0, 0, 0]])
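As an alternative to splitting each line by hand, NumPy's `np.loadtxt` can parse comma-separated integers directly. The sketch below is a minimal illustration, not the notebook's own loader. It reads from an in-memory string (`csv_text`, a stand-in holding the first three training rows shown above) so that it runs without the CSV files. Passing the file name `"data3.csv"` instead should behave the same way, assuming the file contains only comma-separated integers.

```python
import io
import numpy as np

# Stand-in for data3.csv: the first three training rows shown above
csv_text = "1,1,1,1,1,1,0,1,1\n1,1,1,1,1,1,0,0,1\n1,1,1,1,1,1,1,1,0\n"

# np.loadtxt accepts a file name or any file-like object
sample = np.loadtxt(io.StringIO(csv_text), delimiter=",", dtype=int)

X = sample[:, :-1]  # feature columns
y = sample[:, -1]   # class label stored in the last column

print(sample.shape)
print(y)
```

Separating `X` and `y` at load time avoids carrying the label column through later indexing.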

### Data Summary

In [0]:
#Data Summary
Number_Features=len(data[0])-1

#Use every row instead of a hard-coded 0:20 slice; tolist() yields plain Python ints
Classes=np.unique(data[:,-1]).tolist()
    
Number_Classes=len(Classes)

print("Number of Training instances:",len(data))
print("Number of Test instances:",len(test))

print("Number of Features : ",Number_Features,"Number of Classes: ",Number_Classes)

print("Classes:",Classes)

Number of Training instances: 20
Number of Test instances: 4
Number of Features :  8 Number of Classes:  2
Classes: [0, 1]
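The summary can also report how many training instances fall in each class, which is what the prior probabilities are built from. The sketch below assumes the label column is copied into a standalone array (`labels`, a name introduced here) and uses `np.unique` with `return_counts=True`.

```python
import numpy as np

# Label column (last column) of the 20 training rows shown above
labels = np.array([1, 1, 0, 1, 1, 1, 0, 1, 1, 1,
                   1, 1, 0, 1, 0, 0, 0, 1, 0, 1])

# Distinct labels and how often each one occurs
classes, counts = np.unique(labels, return_counts=True)

# Relative class frequencies, i.e. the unsmoothed priors
priors = counts / counts.sum()

for c, n, p in zip(classes, counts, priors):
    print("Class", c, "count", n, "prior", p)
```

With this data the counts are 7 and 13, which gives the priors 0.35 and 0.65 printed by the classifier cell.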


## Data preparation

In [0]:
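#Regrouping data by labels (the label is the last column)
indices1=np.where((data[:,Number_Features]==0))   #rows labelled 0
class1=(data[indices1])
indices2=np.where((data[:,Number_Features]==1))   #rows labelled 1
class2=(data[indices2])
#The two groups have different lengths, so recent NumPy versions need dtype=object here
grouped_data=np.array((class1,class2),dtype=object)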

In [0]:
grouped_data

array([array([[1, 1, 1, 1, 1, 1, 1, 1, 0],
       [1, 1, 0, 1, 1, 1, 0, 1, 0],
       [1, 0, 1, 1, 1, 1, 0, 1, 0],
       [1, 1, 0, 1, 0, 1, 0, 1, 0],
       [1, 0, 0, 1, 1, 1, 0, 1, 0],
       [1, 0, 0, 1, 0, 1, 1, 1, 0],
       [1, 0, 1, 1, 1, 1, 1, 1, 0]]),
       array([[1, 1, 1, 1, 1, 1, 0, 1, 1],
       [1, 1, 1, 1, 1, 1, 0, 0, 1],
       [1, 1, 1, 1, 1, 0, 0, 1, 1],
       [1, 1, 1, 1, 1, 0, 0, 0, 1],
       [1, 1, 1, 0, 1, 1, 0, 1, 1],
       [1, 1, 1, 0, 1, 1, 0, 0, 1],
       [1, 1, 1, 0, 1, 0, 0, 1, 1],
       [1, 1, 1, 0, 1, 0, 0, 0, 1],
       [0, 1, 1, 1, 1, 1, 0, 1, 1],
       [0, 1, 1, 1, 1, 1, 0, 0, 1],
       [0, 1, 1, 1, 1, 0, 0, 1, 1],
       [0, 1, 1, 1, 1, 0, 0, 0, 1],
       [0, 1, 1, 0, 1, 1, 0, 1, 1]])], dtype=object)
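The grouping above is written for exactly two classes. A dictionary keyed by label is one way to handle any number of classes without a ragged object array. This is a sketch on a hypothetical toy array (`toy`, with three classes), not the notebook's data.

```python
import numpy as np

# Hypothetical toy array: two features plus a label in the last column
toy = np.array([[1, 0, 0],
                [0, 1, 1],
                [1, 1, 1],
                [0, 0, 2]])

# Map each label to the rows that carry it, selected with a boolean mask
groups = {int(c): toy[toy[:, -1] == c] for c in np.unique(toy[:, -1])}

for label, rows in groups.items():
    print(label, rows.shape)
```

Each value in `groups` is an ordinary two-dimensional integer array, so column slicing such as `groups[1][:, 0]` works as it does on `grouped_data[1]`.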

## Naive Bayes Classifier algorithm

In [0]:
#Naive Bayes Classifier

Total_instances=(len(grouped_data[0])+len(grouped_data[1]))
Prior_Probability_Class0=(len(grouped_data[0])/Total_instances)
Prior_Probability_Class1=(len(grouped_data[1])/Total_instances)

print("P(C0)=",Prior_Probability_Class0)
print("P(C1)=",Prior_Probability_Class1)

Alpha=1            #Laplace smoothing constant
Number_Values=2    #Each feature is binary, so it can take 2 values
Pta=np.ones(Number_Features)   #P(x_i|C0) for the current test instance
Ptb=np.ones(Number_Features)   #P(x_i|C1) for the current test instance
Predicted_Classes=np.zeros(len(test),dtype=int)

for j in range(0,len(test)):

    print("Test Case",j+1)

    #Smoothed likelihoods for class 0: (matching rows + Alpha)/(class size + Alpha*Number_Values)
    Class0_total=len(grouped_data[0])
    for i in range(0,Number_Features):
        Pta[i]=((np.count_nonzero(grouped_data[0][:,i]==test[j][i])+Alpha)/(Class0_total+Alpha*Number_Values))
        #print("P(x%d|C0)="%(i),Pta[i])

    #Smoothed likelihoods for class 1, computed the same way
    Class1_total=len(grouped_data[1])
    for i in range(0,Number_Features):
        Ptb[i]=((np.count_nonzero(grouped_data[1][:,i]==test[j][i])+Alpha)/(Class1_total+Alpha*Number_Values))
        #print("P(x%d|C1)="%(i),Ptb[i])

    Class0_Conditional=np.prod(Pta)
    Class1_Conditional=np.prod(Ptb)

    print("Class Conditionals P(A|C0)=",round(Class0_Conditional,7),",P(A|C1)=",round(Class1_Conditional,7))

    Posteriori_Class0=Class0_Conditional*Prior_Probability_Class0
    Posteriori_Class1=Class1_Conditional*Prior_Probability_Class1
    
    print("Posteriori Probability P(A|C0)P(C0)=",round(Posteriori_Class0,7),",P(A|C1)P(C1)=",round(Posteriori_Class1,7))

    
    if Posteriori_Class0>Posteriori_Class1:
        print("Class 0")
        Predicted_Classes[j]=0
    else:
        print("Class 1")
        Predicted_Classes[j]=1
    
    print("\n")
    
print("Predicted Classes for Test : ", Predicted_Classes)

P(C0)= 0.35
P(C1)= 0.65
Test Case 1
Class Conditionals P(A|C0)= 0.0045673 ,P(A|C1)= 0.0037002
Posteriori Probability P(A|C0)P(C0)= 0.0015986 ,P(A|C1)P(C1)= 0.0024051
Class 1


Test Case 2
Class Conditionals P(A|C0)= 6.97e-05 ,P(A|C1)= 1.45e-05
Posteriori Probability P(A|C0)P(C0)= 2.44e-05 ,P(A|C1)P(C1)= 9.4e-06
Class 0


Test Case 3
Class Conditionals P(A|C0)= 1.12e-05 ,P(A|C1)= 0.0264412
Posteriori Probability P(A|C0)P(C0)= 3.9e-06 ,P(A|C1)P(C1)= 0.0171868
Class 1


Test Case 4
Class Conditionals P(A|C0)= 8.92e-05 ,P(A|C1)= 0.0396618
Posteriori Probability P(A|C0)P(C0)= 3.12e-05 ,P(A|C1)P(C1)= 0.0257801
Class 1


Predicted Classes for Test :  [1 0 1 1]
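Multiplying many small probabilities can underflow when the number of features grows, so a common variant sums logarithms instead. The following is a minimal sketch of that variant, assuming binary 0/1 features and the same Laplace smoothing. The function name `predict_log_nb` and its parameters are introduced here for illustration. The training and test arrays are copied from the outputs above so that the block runs on its own.

```python
import numpy as np

train = np.array([
    [1,1,1,1,1,1,0,1,1], [1,1,1,1,1,1,0,0,1], [1,1,1,1,1,1,1,1,0],
    [1,1,1,1,1,0,0,1,1], [1,1,1,1,1,0,0,0,1], [1,1,1,0,1,1,0,1,1],
    [1,1,0,1,1,1,0,1,0], [1,1,1,0,1,1,0,0,1], [1,1,1,0,1,0,0,1,1],
    [1,1,1,0,1,0,0,0,1], [0,1,1,1,1,1,0,1,1], [0,1,1,1,1,1,0,0,1],
    [1,0,1,1,1,1,0,1,0], [0,1,1,1,1,0,0,1,1], [1,1,0,1,0,1,0,1,0],
    [1,0,0,1,1,1,0,1,0], [1,0,0,1,0,1,1,1,0], [0,1,1,1,1,0,0,0,1],
    [1,0,1,1,1,1,1,1,0], [0,1,1,0,1,1,0,1,1]])
test = np.array([[0,1,1,1,1,1,1,1], [1,0,0,0,0,0,0,0],
                 [0,1,1,0,1,0,0,0], [0,1,1,1,1,0,0,0]])

def predict_log_nb(train, test, alpha=1.0):
    """Naive Bayes for binary features, scored in log space."""
    X, y = train[:, :-1], train[:, -1]
    classes = np.unique(y)
    log_scores = np.zeros((len(test), len(classes)))
    for k, c in enumerate(classes):
        Xc = X[y == c]
        # Smoothed P(x_i = 1 | c) for all features at once (2 values per feature)
        p1 = (Xc.sum(axis=0) + alpha) / (len(Xc) + 2 * alpha)
        # Use log P(x_i = 1 | c) where the test value is 1, else log P(x_i = 0 | c)
        log_lik = np.where(test == 1, np.log(p1), np.log(1 - p1)).sum(axis=1)
        # Add the log prior to get the unnormalized log posterior
        log_scores[:, k] = np.log(len(Xc) / len(X)) + log_lik
    return classes[np.argmax(log_scores, axis=1)], log_scores

preds, log_scores = predict_log_nb(train, test)
print("Predicted Classes for Test : ", preds)
```

On this data the sketch reproduces the predictions `[1 0 1 1]`, and `np.exp(log_scores)` recovers the products printed above. One difference to be aware of: `np.argmax` picks the lower class index on an exact tie, whereas the loop above picks class 1.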
