## Naive Bayes Classification

Naive Bayes is built on Bayes' theorem, which gives the probability of Event A happening given Condition B:

<img src="img/NaiveBayes1.png" width=200 height=10 />

In a classifier, this can be rewritten as the probability of Class Y happening given Input X:

<img src="img/NaiveBayesXY.png" width=200 height=10 />

When there are multiple inputs X1, X2, ... Xn, the formula becomes:

<img src="img/NaiveBayesXnY.png" width=420 height=10 /> 

When there are multiple classes of Y, the formula above is evaluated once for each class, i.e. Y0, Y1, ..., Yk.

When using Naive Bayes for prediction, we are trying to find out which class of Y has the highest probability of happening.

We therefore calculate P(y|x) for each class of Y and pick the class whose P(y|x) is the largest.

Since the denominator P(x1)P(x2)...P(xn) does not depend on y, it is the same for every class and can be dropped from the comparison. As a result, the value computed for each class DOES NOT show the actual predicted probability of that class. It is only a score proportional to that probability, which is enough to tell which class is the most probable.

Therefore, the predicted class for inputs X1, X2, ... Xn is the Class Y that maximizes the following expression:

<img src="img/NaiveBayesXYargmax.png" width=250 height=10 />
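
For readers without the rendered figures, the decision rule can be written in LaTeX as below. This is a transcription that assumes the figure shows the standard naive Bayes rule; it matches the products computed later in the notebook (for example `pyx0 = p_xlight_y0 * p_xtemp_y0 * p_xhumid_y0 * py0`).

```latex
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```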

P(x|y), the likelihood of input X given class Y, is estimated from the Gaussian probability density function of X, using the mean and standard deviation of X within each class of Y.

P(y), the prior probability of Class Y, is simply the number of training samples in Class Y divided by the total number of training samples.

All formulas in this notebook are written from scratch as far as possible. Only basic Python functions such as mean, standard deviation, square root, exponential and pi are used.

#### Read in the data

In [1]:
import pandas as pd
import numpy as np
from math import sqrt
from math import exp
from math import pi
import random

In [2]:
df = pd.read_csv('SensorData.csv')

In [210]:
df = pd.read_csv('GeneratedData.csv')

In [3]:
df.dtypes

light_intensity      int64
temperature        float64
humidity           float64
label                int64
dtype: object

In [4]:
df = df.sample(frac=1).reset_index(drop=True)  # reset index and shuffle the df

In [5]:
df

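| | light_intensity | temperature | humidity | label |
|---|---|---|---|---|
| 0 | 99 | 23.92 | 73.8 | 2 |
| 1 | 50 | 23.52 | 65.8 | 2 |
| 2 | 767 | 25.87 | 90.4 | 0 |
| 3 | 92 | 23.92 | 79.8 | 2 |
| 4 | 272 | 28.26 | 83.1 | 1 |
| ... | ... | ... | ... | ... |
| 867 | 766 | 24.12 | 95.0 | 0 |
| 868 | 142 | 23.32 | 77.8 | 2 |
| 869 | 257 | 28.00 | 83.2 | 1 |
| 870 | 78 | 22.66 | 76.5 | 2 |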


#### Train Test Split

In [6]:
dftrain = df.sample(frac=0.8, random_state=42) #random state is a seed value
dftest = df.drop(dftrain.index)

In [7]:
dftrain = dftrain.reset_index(drop=True)

In [8]:
dftest = dftest.reset_index(drop=True)

In [11]:
len(dftrain), len(dftest)

(698, 174)

#### Split training dataset by class

In [12]:
df0 = dftrain.loc[dftrain['label'] == 0]

In [13]:
df1 = dftrain.loc[dftrain['label'] == 1] 

In [14]:
df2 = dftrain.loc[dftrain['label'] == 2] 

In [15]:
len(df0), len(df1), len(df2)

(234, 232, 232)

#### Calculate P(y) for each class in the training dataset

In [16]:
py0 = float(len(df0))/float(len(dftrain))

In [17]:
py1 = float(len(df1))/float(len(dftrain))

In [18]:
py2 = float(len(df2))/float(len(dftrain))

In [19]:
py0, py1, py2

(0.335243553008596, 0.332378223495702, 0.332378223495702)
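
The three priors above can also be computed in one pass for any number of classes. The following is a minimal sketch using only the standard library. `class_priors` is a hypothetical helper name, and the `labels` list is rebuilt from the class counts printed above (234, 232, 232) as a stand-in for `dftrain['label']`.

```python
from collections import Counter

def class_priors(labels):
    """Return {class: fraction of samples belonging to that class}."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}

# Stand-in for dftrain['label'], rebuilt from the class counts printed above
labels = [0] * 234 + [1] * 232 + [2] * 232
priors = class_priors(labels)
print(priors)
```

The priors sum to 1 by construction, which is a quick check that no class was missed.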

#### Calculate mean and standard deviation for each class in the training dataset

In [20]:
mean0 = df0.mean(axis = 0, skipna = True)
stdev0 = df0.std(axis = 0, skipna = True)

In [21]:
# remove the mean and standard deviation for the 'label' column since we don't need them
mean0 = mean0.drop(mean0.index[-1])
stdev0 = stdev0.drop(stdev0.index[-1])

In [22]:
mean1 = df1.mean(axis = 0, skipna = True)
stdev1 = df1.std(axis = 0, skipna = True)

In [23]:
mean1 = mean1.drop(mean1.index[-1])
stdev1 = stdev1.drop(stdev1.index[-1])

In [24]:
mean2 = df2.mean(axis = 0, skipna = True)
stdev2= df2.std(axis = 0, skipna = True)

In [25]:
mean2 = mean2.drop(mean2.index[-1])
stdev2 = stdev2.drop(stdev2.index[-1])

In [26]:
print("\ndf0 Class 0 Mean \n{} \n\ndf0 Class 0 SD \n{}".format(mean0, stdev0))
print("\ndf1 Class 1 Mean \n{} \n\ndf1 Class 1 SD \n{}".format(mean1, stdev1))
print("\ndf2 Class 2 Mean \n{} \n\ndf2 Class 2 SD \n{}".format(mean2, stdev2))


df0 Class 0 Mean 
light_intensity    729.752137
temperature         24.759786
humidity            93.861111
dtype: float64 

df0 Class 0 SD 
light_intensity    39.064689
temperature         0.801466
humidity            2.285311
dtype: float64

df1 Class 1 Mean 
light_intensity    282.103448
temperature         29.140172
humidity            85.312069
dtype: float64 

df1 Class 1 SD 
light_intensity    47.620229
temperature         0.556113
humidity            2.296248
dtype: float64

df2 Class 2 Mean 
light_intensity    88.310345
temperature        23.064871
humidity           73.033621
dtype: float64 

df2 Class 2 SD 
light_intensity    40.239309
temperature         0.360026
humidity            6.072426
dtype: float64
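
For comparison, the same per-class statistics can be sketched without pandas. Note that pandas' `std()` computes the sample standard deviation (dividing by n - 1) by default, which is also what `statistics.stdev` does. `summarize` is a hypothetical helper name and the readings below are made-up values for illustration only.

```python
from statistics import mean, stdev

def summarize(rows):
    """rows: list of equal-length feature tuples for one class.
    Returns a list of (mean, sample stdev) pairs, one per feature."""
    columns = zip(*rows)
    return [(mean(col), stdev(col)) for col in columns]

# Hypothetical readings: (light_intensity, temperature, humidity)
class_rows = [(700, 24.0, 92.0), (740, 25.0, 94.0), (760, 26.0, 96.0)]
stats = summarize(class_rows)
for name, (m, s) in zip(['light_intensity', 'temperature', 'humidity'], stats):
    print(name, m, s)
```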


#### Calculate P(x|y) using Gaussian Probability Distribution 

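P(x|y), or the likelihood of observing x given y, can be estimated by assuming that x follows a normal (Gaussian) distribution within each class. Because this is a density rather than a probability, its values are not limited to the range 0 to 1. The formula for the Gaussian probability density function is: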

<img src="img/Gaussian_pdf.png" width=260 height=60 />
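
In LaTeX, the density computed by the notebook's `pdf_prob` function is the following, where the symbols μ and σ (not named in the original text) stand for the class mean and class standard deviation passed in as `mean` and `stdev`:

```latex
f(x \mid \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left( -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^{2} \right)
```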

In [27]:
def pdf_prob(x, mean, stdev):
    exponent = exp(-0.5 * (((x-mean)/stdev)**2) )
    prob = exponent / (stdev * sqrt(2 * pi) )
    return prob

In [28]:
pdf_prob(1, 1, 1) # testing

0.3989422804014327
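
As an extra sanity check, the hand-written density can be compared against `statistics.NormalDist` from the standard library (Python 3.8+). This is only a sketch: `pdf_prob` is repeated here so the block runs on its own, and the test points reuse the Class 0 means and standard deviations printed earlier (rounded).

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def pdf_prob(x, mean, stdev):
    # same formula as the notebook's pdf_prob
    exponent = exp(-0.5 * (((x - mean) / stdev) ** 2))
    return exponent / (stdev * sqrt(2 * pi))

# (x, mean, stdev) test points
cases = [(1, 1, 1), (771, 729.752137, 39.064689), (27, 24.759786, 0.801466)]
diffs = []
for x, mu, sigma in cases:
    ours = pdf_prob(x, mu, sigma)
    reference = NormalDist(mu, sigma).pdf(x)
    diffs.append(abs(ours - reference))
    print(x, ours, reference)
```

The two columns should agree up to floating-point rounding.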

#### Calculate prediction accuracy of testing dataset (using mean and stdev from training dataset)

In [244]:
# dftest = pd.read_csv('CleanedData.csv')
# Test with unseen labeled data

In [30]:
# initialise actual/predicted counters for confusion matrix
c00 = c10 = c20 = c01 = c11 = c21 = c02 = c12 = c22 = 0

In [31]:
for i in range(len(dftest)):

#     p_xlight = pdf_prob(dftest.loc[i, 'light_intensity'], mean['light_intensity'], stdev['light_intensity'])
#     p_xtemp = pdf_prob(dftest.loc[i, 'temperature'], mean['temperature'], stdev['temperature'])
#     p_xhumid = pdf_prob(dftest.loc[i, 'humidity'], mean['humidity'], stdev['humidity'])
    
    p_xlight_y0 = pdf_prob(dftest.loc[i, 'light_intensity'], mean0['light_intensity'], stdev0['light_intensity'])
    p_xtemp_y0 = pdf_prob(dftest.loc[i, 'temperature'], mean0['temperature'], stdev0['temperature'])
    p_xhumid_y0 = pdf_prob(dftest.loc[i, 'humidity'], mean0['humidity'], stdev0['humidity'])
    
    p_xlight_y1 = pdf_prob(dftest.loc[i, 'light_intensity'], mean1['light_intensity'], stdev1['light_intensity'])
    p_xtemp_y1 = pdf_prob(dftest.loc[i, 'temperature'], mean1['temperature'], stdev1['temperature'])
    p_xhumid_y1 = pdf_prob(dftest.loc[i, 'humidity'], mean1['humidity'], stdev1['humidity'])

    p_xlight_y2 = pdf_prob(dftest.loc[i, 'light_intensity'], mean2['light_intensity'], stdev2['light_intensity'])
    p_xtemp_y2 = pdf_prob(dftest.loc[i, 'temperature'], mean2['temperature'], stdev2['temperature'])
    p_xhumid_y2 = pdf_prob(dftest.loc[i, 'humidity'], mean2['humidity'], stdev2['humidity'])

    pyx0 = p_xlight_y0 * p_xtemp_y0 * p_xhumid_y0 * py0
    pyx1 = p_xlight_y1 * p_xtemp_y1 * p_xhumid_y1 * py1
    pyx2 = p_xlight_y2 * p_xtemp_y2 * p_xhumid_y2 * py2
    
#     px = p_xlight * p_xtemp * p_xhumid    
#     pyx0 = p_xlight_y0 * p_xtemp_y0 * p_xhumid_y0 * py0 / px
#     pyx1 = p_xlight_y1 * p_xtemp_y1 * p_xhumid_y1 * py1 / px
#     pyx2 = p_xlight_y2 * p_xtemp_y2 * p_xhumid_y2 * py2 / px
    
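    # pick the class with the largest unnormalized score; indexing a list avoids
    # the key collision a score-keyed dict would have if two scores were equal
    scores = [pyx0, pyx1, pyx2]
    best_score = max(scores)
    dftest.loc[i, 'predicted_class'] = scores.index(best_score)
    dftest.loc[i, 'probability'] = best_score  # unnormalized score, not a true probability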
#     df.loc[i, 'probability'] = round(max(pyx0, pyx1, pyx2), 2)
    
    if dftest.loc[i, 'predicted_class'] == 0:
        if dftest.loc[i, 'label'] == 0:
            c00 = c00 + 1
        elif dftest.loc[i, 'label'] == 1:
            c10 = c10 + 1
        elif dftest.loc[i, 'label'] == 2:
            c20 = c20 + 1
            
    elif dftest.loc[i, 'predicted_class'] == 1:
        if dftest.loc[i, 'label'] == 0:
            c01 = c01 + 1
        elif dftest.loc[i, 'label'] == 1:
            c11 = c11 + 1
        elif dftest.loc[i, 'label'] == 2:
            c21 = c21 + 1
            
    elif dftest.loc[i, 'predicted_class'] == 2:
        if dftest.loc[i, 'label'] == 0:
            c02 = c02 + 1
        elif dftest.loc[i, 'label'] == 1:
            c12 = c12 + 1
        elif dftest.loc[i, 'label'] == 2:
            c22 = c22 + 1

In [32]:
dftest = dftest.astype({"predicted_class": int})

In [33]:
dftest.dtypes

light_intensity      int64
temperature        float64
humidity           float64
label                int64
predicted_class      int32
probability        float64
dtype: object

In [34]:
dftest

| | light_intensity | temperature | humidity | label | predicted_class | probability |
|---|---|---|---|---|---|---|
| 0 | 492 | 31.62 | 85.5 | 1 | 1 | 1.005341e-12 |
| 1 | 569 | 31.53 | 85.5 | 1 | 1 | 4.436465e-16 |
| 2 | 415 | 31.70 | 85.6 | 1 | 1 | 1.757052e-10 |
| 3 | 331 | 31.54 | 85.5 | 1 | 1 | 1.845872e-08 |
| 4 | 403 | 31.57 | 85.5 | 1 | 1 | 9.858849e-10 |
| ... | ... | ... | ... | ... | ... | ... |
| 121 | 754 | 27.74 | 99.6 | 0 | 0 | 1.042253e-08 |
| 122 | 759 | 27.62 | 99.9 | 0 | 0 | 1.174703e-08 |
| 123 | 764 | 27.75 | 99.9 | 0 | 0 | 5.856594e-09 |
| 124 | 768 | 27.50 | 99.4 | 0 | 0 | 2.827285e-08 |

In [35]:
max(dftest['probability'])

0.0002136095540132183

In [36]:
min(dftest['probability'])

4.4364646409081524e-16

In [37]:
c00, c10, c20, c01, c11, c21, c02, c12, c22

(91, 0, 0, 0, 14, 0, 0, 0, 21)

In [38]:
print ('\033[1m' + '\n ****** CONFUSION MATRIX ******' + '\033[0m' )
print ('\n                 Predicted' )
print ('\n               0      1     2' )
print ('\n         0     {}    {}     {}'.format(c00, c01, c02) )
print ('\n Actual  1     {}    {}     {}'.format(c10, c11, c12) )
print ('\n         2     {}    {}     {}'.format(c20, c21, c22) )
print ('\n')


 ****** CONFUSION MATRIX ******

                 Predicted

               0      1     2

         0     91    0     0

 Actual  1     0    14     0

         2     0    0     21
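
The nine separate counters work for three classes, but the same bookkeeping can be sketched with a nested list that scales to any number of classes. In this sketch `matrix[a][p]` plays the role of the notebook's counter `c{a}{p}` (actual class first, predicted class second). The function name and the label lists are hypothetical and for illustration only.

```python
def confusion_matrix(actual, predicted, n_classes):
    """matrix[a][p] counts samples of actual class a predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

# Hypothetical labels, for illustration only
actual = [0, 0, 1, 1, 2, 2, 2]
predicted = [0, 1, 1, 1, 2, 0, 2]
cm = confusion_matrix(actual, predicted, 3)
for row in cm:
    print(row)
```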




In [39]:
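# accuracy: share of test samples on the diagonal of the confusion matrix
accuracy = float(c00+c11+c22)/len(dftest)*100
accuracy = round(accuracy, 2)
# 'falsealarm' here is the overall misclassification rate (all off-diagonal counts)
falsealarm = float(c10 + c20 + c01 + c21 + c02 + c12)/len(dftest)*100
falsealarm = round(falsealarm, 2)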

In [40]:
print ('Accuracy =', accuracy,'%')
print ('False Alarm Rate =', falsealarm,'%')

Accuracy = 100.0 %
False Alarm Rate = 0.0 %
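
The "False Alarm Rate" printed above is the share of all test samples that were misclassified. When classes are imbalanced, as in this test set (91, 14 and 21 samples), per-class precision and recall can give a finer picture. The following is a minimal sketch that reuses the confusion matrix printed above; `per_class_metrics` is a hypothetical helper name.

```python
def per_class_metrics(matrix):
    """Return {class: (precision, recall)} from a matrix[actual][predicted]."""
    n = len(matrix)
    results = {}
    for k in range(n):
        true_pos = matrix[k][k]
        actual_k = sum(matrix[k])                        # row total
        predicted_k = sum(matrix[r][k] for r in range(n))  # column total
        recall = true_pos / actual_k if actual_k else 0.0
        precision = true_pos / predicted_k if predicted_k else 0.0
        results[k] = (precision, recall)
    return results

# Confusion matrix as printed above (rows: actual, columns: predicted)
cm = [[91, 0, 0], [0, 14, 0], [0, 0, 21]]
total = sum(sum(row) for row in cm)
accuracy = sum(cm[k][k] for k in range(len(cm))) / total
metrics = per_class_metrics(cm)
print(accuracy, metrics)
```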


In [41]:
c00 + c10 + c20 + c01 + c11 + c21 + c02 + c12 + c22 == len(dftest)
# sanity check

True

#### Infer class for new data

In [42]:
# Class 0 Optimal [light, temp, humidity]
sensorReadings = [771, 27, 97]

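In [322]:
# Alternative input, not used for the outputs below (they correspond to [771, 27, 97])
# Class 0 Optimal [light, temp, humidity]
# sensorReadings = [360, 27, 97]

In [71]:
# Alternative input, not used for the outputs below
# Class 1 Sub Optimal [light, temp, humidity]
# sensorReadings = [290, 30, 90]

In [83]:
# Alternative input, not used for the outputs below
# Class 2 Bad [light, temp, humidity]
# sensorReadings = [200, 25, 85]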

#### Calculate P(x|y) for new data (using mean and stdev from training dataset)

In [43]:
p_xlight_y0 = pdf_prob(sensorReadings[0], mean0['light_intensity'], stdev0['light_intensity'])
p_xtemp_y0 = pdf_prob(sensorReadings[1], mean0['temperature'], stdev0['temperature'])
p_xhumid_y0 = pdf_prob(sensorReadings[2], mean0['humidity'], stdev0['humidity'])

In [44]:
p_xlight_y1 = pdf_prob(sensorReadings[0], mean1['light_intensity'], stdev1['light_intensity'])
p_xtemp_y1 = pdf_prob(sensorReadings[1], mean1['temperature'], stdev1['temperature'])
p_xhumid_y1 = pdf_prob(sensorReadings[2], mean1['humidity'], stdev1['humidity'])

In [45]:
p_xlight_y2 = pdf_prob(sensorReadings[0], mean2['light_intensity'], stdev2['light_intensity'])
p_xtemp_y2 = pdf_prob(sensorReadings[1], mean2['temperature'], stdev2['temperature'])
p_xhumid_y2 = pdf_prob(sensorReadings[2], mean2['humidity'], stdev2['humidity'])

In [46]:
p_xlight_y0, p_xtemp_y0, p_xhumid_y0

(0.005848294155191346, 0.010011241518495705, 0.06796920729126)

In [47]:
p_xlight_y1, p_xtemp_y1, p_xhumid_y1

(1.0845574620055595e-25, 0.00043618724097817523, 4.1114067488576334e-07)

In [48]:
p_xlight_y2, p_xtemp_y2, p_xhumid_y2

(3.1145331808128705e-65, 1.2663953143333386e-26, 2.723155072361637e-05)
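
Some of the likelihoods above are as small as 1e-65, and multiplying several such values can underflow to 0.0 once more features are added. A common remedy is to add logarithms instead of multiplying densities; because the logarithm is monotonic, the class with the largest log score is still the class with the largest product. The following is a sketch, not part of the original notebook: `log_gaussian` and `log_score` are hypothetical helper names, and the Class 2 statistics are the rounded values printed earlier.

```python
from math import log, pi

def log_gaussian(x, mean, stdev):
    """Natural log of the Gaussian density, computed without calling exp()."""
    z = (x - mean) / stdev
    return -0.5 * z * z - log(stdev) - 0.5 * log(2 * pi)

def log_score(readings, means, stdevs, prior):
    """log P(y) + sum over features of log P(x_i | y)."""
    total = log(prior)
    for x, m, s in zip(readings, means, stdevs):
        total += log_gaussian(x, m, s)
    return total

# Class 2 statistics and prior as printed earlier in this notebook (rounded)
readings = [771, 27, 97]
means2 = [88.310345, 23.064871, 73.033621]
stdevs2 = [40.239309, 0.360026, 6.072426]
score2 = log_score(readings, means2, stdevs2, 0.332378223495702)
print(score2)  # a moderate negative number instead of a value near 1e-96
```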

#### Find P(y|x) with the highest probability

In [49]:
pyx0 = p_xlight_y0 * p_xtemp_y0 * p_xhumid_y0 * py0
pyx1 = p_xlight_y1 * p_xtemp_y1 * p_xhumid_y1 * py1
pyx2 = p_xlight_y2 * p_xtemp_y2 * p_xhumid_y2 * py2

In [50]:
infer_dict = {pyx0:0, pyx1:1, pyx2:2}

In [51]:
infer_dict

{1.334104308950124e-06: 0, 6.464702312458971e-36: 1, 3.5699915861222233e-96: 2}
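
These three values are unnormalized scores, not probabilities. If actual posterior probabilities are wanted, each score can be divided by the sum of the scores over all classes. A minimal sketch using the values printed above; `normalize` is a hypothetical helper name.

```python
def normalize(scores):
    """Rescale unnormalized class scores P(x|y)P(y) so they sum to 1."""
    total = sum(scores)
    if total == 0:
        raise ValueError("all scores are zero; cannot normalize")
    return [s / total for s in scores]

# Scores for classes 0, 1, 2 as printed above
scores = [1.334104308950124e-06, 6.464702312458971e-36, 3.5699915861222233e-96]
posteriors = normalize(scores)
print(posteriors)
```

Here the Class 0 score is many orders of magnitude larger than the others, so its normalized posterior is essentially 1.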

In [54]:
inferred_class = infer_dict[max(pyx0, pyx1, pyx2)]

In [55]:
inferred_class

0

In [57]:
label_dict = {0:'Perfect!', 1:"Sub optimal", 2:"Bad"}

In [58]:
label = label_dict[inferred_class]

In [59]:
label

'Perfect!'
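
The inference steps above can be collected into a single function so that new readings are classified in one call. This is a sketch under stated assumptions rather than the notebook's own code: `MODEL` and `predict` are hypothetical names, and the priors, means and standard deviations are the rounded values printed earlier in this notebook.

```python
from math import exp, pi, sqrt

def pdf_prob(x, mean, stdev):
    # same formula as the notebook's pdf_prob
    exponent = exp(-0.5 * (((x - mean) / stdev) ** 2))
    return exponent / (stdev * sqrt(2 * pi))

# class: (prior, means, stdevs) for [light_intensity, temperature, humidity]
MODEL = {
    0: (0.335243553008596, [729.752137, 24.759786, 93.861111], [39.064689, 0.801466, 2.285311]),
    1: (0.332378223495702, [282.103448, 29.140172, 85.312069], [47.620229, 0.556113, 2.296248]),
    2: (0.332378223495702, [88.310345, 23.064871, 73.033621], [40.239309, 0.360026, 6.072426]),
}
LABELS = {0: 'Perfect!', 1: 'Sub optimal', 2: 'Bad'}

def predict(readings, model=MODEL):
    """Return (best class, {class: unnormalized score}) for one set of readings."""
    scores = {}
    for cls, (prior, means, stdevs) in model.items():
        score = prior
        for x, m, s in zip(readings, means, stdevs):
            score *= pdf_prob(x, m, s)
        scores[cls] = score
    best = max(scores, key=scores.get)
    return best, scores

cls, scores = predict([771, 27, 97])
print(cls, LABELS[cls])
```

Taking the maximum over dictionary keys by score, rather than keying a dictionary by the scores themselves, keeps every class present even if two scores happen to be equal.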