# <span style="color:#0b486b">SIT 112 - Data Science Concepts</span>

---
Lecturer: Truyen Tran | truyen.tran@deakin.edu.au<br />

School of Information Technology, <br />
Deakin University, VIC 3216, Australia.

---
## <span style="color:#0b486b">Practical Session 10: Naive Bayes Classifier</span> 

## <span style="color:#0b486b">Naive Bayes Classifier</span> 


Naive Bayes is one of the most practical machine learning algorithms for classification. It is:

* fast
* accurate on many practical tasks
* simple yet very effective
* robust to irrelevant features

So why is it called naive?

Because it ignores dependencies between features and assumes that all features are independent of each other given the class, which is rarely the case in reality. This is a naive assumption, hence the name.

Despite this naive assumption, the accuracy is often very good. A famous example of NB usage is spam filtering.

---
### Example 1

Suppose we have collected the data below over the past 5 days. Based on this data, can we predict whether our subject will play under conditions like these:

    outlook  = overcast
    temp     = hot
    humidity = normal
    windy    = no

<img src="nb_data.png" width="800">
<br />

First we have to find a representation for our data. We can construct a dictionary to convert strings into numbers and then save the encoded values in a dataframe.

    outlook: sunny=0, overcast=1, rainy=2
    temp: hot=0, mild=1, cool=2
    humidity: normal=0, high=1
    wind: no=0, yes=1
    play: no=0, yes=1
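As a minimal sketch of this encoding step, the mapping above can be stored in nested dictionaries and applied column by column with pandas. The string rows below are not copied from the figure; they are obtained by decoding the numeric lists used in the next cell with the same mapping, and the names `encoders`, `raw` and `encoded` are illustrative only.

```python
import pandas as pd

# One dictionary per column, following the mapping listed above
encoders = {
    'outlook': {'sunny': 0, 'overcast': 1, 'rainy': 2},
    'temp'   : {'hot': 0, 'mild': 1, 'cool': 2},
    'humid'  : {'normal': 0, 'high': 1},
    'wind'   : {'no': 0, 'yes': 1},
    'play'   : {'no': 0, 'yes': 1},
}

# String version of the five days (decoded from the numeric table, an assumption)
raw = pd.DataFrame({
    'outlook': ['sunny', 'overcast', 'rainy', 'sunny', 'overcast'],
    'temp'   : ['hot', 'mild', 'cool', 'mild', 'hot'],
    'humid'  : ['normal', 'normal', 'high', 'normal', 'high'],
    'wind'   : ['no', 'no', 'yes', 'yes', 'no'],
    'play'   : ['yes', 'yes', 'no', 'no', 'no'],
})

# Replace every string by its numeric code, one column at a time
encoded = raw.apply(lambda col: col.map(encoders[col.name]))
print(encoded)
```

Keeping the dictionaries around also makes it easy to encode a new query with the same codes.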

In [1]:
import numpy as np
import pandas as pd

In [2]:
data = {
    'outlook': [0, 1, 2, 0, 1],
    'temp'   : [0, 1, 2, 1, 0],
    'humid'  : [0, 0, 1, 0, 1],
    'wind'   : [0, 0, 1, 1, 0],
    'play'   : [1, 1, 0, 0, 0]
}

df = pd.DataFrame(data)

Now we use Bayes' rule to construct a Naive Bayes classifier. Writing $p$ for play, $o$ for outlook, $t$ for temp, $h$ for humidity and $w$ for wind, we have:

$$Pr\left(p|o,t,h,w\right)\propto Pr\left(p\right)Pr(o|p)Pr(t|p)Pr(h|p)Pr(w|p)$$
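The proportionality sign hides a normalising constant that is the same for every class. Under the naive independence assumption, and with $p'$ ranging over the possible values of play (a symbol introduced here only for illustration), the posterior can be written out in full as:

$$Pr\left(p|o,t,h,w\right)=\frac{Pr(p)Pr(o|p)Pr(t|p)Pr(h|p)Pr(w|p)}{\sum_{p'}Pr(p')Pr(o|p')Pr(t|p')Pr(h|p')Pr(w|p')}$$

So it is enough to compute the product of the prior and the four conditional probabilities for each class, and then divide each product by the sum of the products.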

To calculate $Pr(p)$ we use the marginal probability of the class.

In [3]:
def marginal_prob(df, column):
    '''
    Compute the marginal probability of each value in a column
    '''
    # a list of (value, count) pairs
    vals_counts = [(val, (df[column] == val).sum()) for val in set(df[column])]
    total_count = sum([count for val, count in vals_counts])
    
    # a list of (value, probability) pairs
    vals_probs = [(val, count/total_count) for val, count in vals_counts]
    # a dictionary whose keys are the values and whose values are the corresponding probabilities
    return dict(vals_probs)
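As a quick sanity check, the same marginal probabilities can be sketched with the built-in pandas method `value_counts(normalize=True)`. This is only an alternative illustration, not the function used in the rest of this session; `df_play` and `play_probs` are names made up for the example. For the `play` column we expect $Pr(p=0)=3/5$ and $Pr(p=1)=2/5$, because 3 of the 5 days are "no" and 2 are "yes".

```python
import pandas as pd

# Only the class column of Example 1 is needed here
df_play = pd.DataFrame({'play': [1, 1, 0, 0, 0]})

# Relative frequency of each value, converted to a {value: probability} dictionary
play_probs = df_play['play'].value_counts(normalize=True).to_dict()
print(play_probs)
```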

To calculate the probability of a feature value given the class (play) we use conditional probability.

In [4]:
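def conditional_prob(df, feature, class_col, class_val):
    '''
    Compute Pr(feature = v | class_col = class_val) for every value v of the feature
    '''
    # keep only the rows that belong to the given class (0 or 1)
    df2 = df[df[class_col] == class_val][feature]
    # count each feature value; the small constant avoids probabilities of exactly zero
    vals_counts = [[v, (df2 == v).sum() + 1e-8] for v in set(df[feature])]
    total_count = sum([count for v, count in vals_counts])
    
    vals_probs = [(v, count/total_count) for v, count in vals_counts]
    return dict(vals_probs)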
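To see what these conditional probabilities look like without the `1e-8` term, the sketch below uses `pd.crosstab` with `normalize='columns'`, which divides each count by its column total. It is an illustrative cross-check only; `df_toy` and `table` are names assumed for the example. Each column of the result is the distribution of outlook for one class, and the zero entry for rainy days with play = 1 shows why a small constant is added to the counts in the function above: a single zero would wipe out the whole product in Bayes' rule.

```python
import pandas as pd

# Outlook and class columns of Example 1
df_toy = pd.DataFrame({
    'outlook': [0, 1, 2, 0, 1],
    'play'   : [1, 1, 0, 0, 0],
})

# Rows: outlook values, columns: play values, entries: Pr(outlook | play)
table = pd.crosstab(df_toy['outlook'], df_toy['play'], normalize='columns')
print(table)
```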

Now we can apply Bayes' rule to the query from the beginning of this example:

In [5]:
o = 1  # outlook  = overcast
t = 0  # temp     = hot
h = 0  # humidity = normal
w = 0  # windy    = no

c = 0
p0 = marginal_prob(df, 'play')[c] * conditional_prob(df, 'outlook', 'play', c)[o] * conditional_prob(df, 'temp', 'play', c)[t] \
* conditional_prob(df, 'humid', 'play', c)[h] * conditional_prob(df, 'wind', 'play', c)[w]

c = 1
p1 = marginal_prob(df, 'play')[c] * conditional_prob(df, 'outlook', 'play', c)[o] * conditional_prob(df, 'temp', 'play', c)[t] \
* conditional_prob(df, 'humid', 'play', c)[h] * conditional_prob(df, 'wind', 'play', c)[w]

# normalizing
p_sum = p0 + p1
p0 /= p_sum
p1 /= p_sum

print("probability of not playing: {}".format(p0))
print("probability of playing    : {}".format(p1))

probability of not playing: 0.0689655189536266
probability of playing    : 0.9310344810463733
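These numbers can be checked by hand. Ignoring the tiny `1e-8` term, the counts in the table give the following unnormalised scores for the query (overcast, hot, normal, no):

$$Pr(p=0)Pr(o|p=0)Pr(t|p=0)Pr(h|p=0)Pr(w|p=0)=\frac{3}{5}\cdot\frac{1}{3}\cdot\frac{1}{3}\cdot\frac{1}{3}\cdot\frac{1}{3}=\frac{1}{135}$$

$$Pr(p=1)Pr(o|p=1)Pr(t|p=1)Pr(h|p=1)Pr(w|p=1)=\frac{2}{5}\cdot\frac{1}{2}\cdot\frac{1}{2}\cdot 1\cdot 1=\frac{1}{10}$$

Dividing each score by their sum gives

$$Pr(p=0|o,t,h,w)=\frac{1/135}{1/135+1/10}=\frac{2}{29}\approx 0.069,\qquad Pr(p=1|o,t,h,w)=\frac{27}{29}\approx 0.931$$

which agrees with the printed output up to the effect of the small constant.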


---
### Example 2

Suppose we have the documents below as our training set.

    d1: Chinese Beijing Chinese , class = C
    d2: Chinese Chinese Shanghai, class = C
    d3: Chinese Macao           , class = C
    d4: Tokyo Japan Chinese     , class = J


**Exercise:** Train a NB classifier and predict if `d5` belongs to class C or J.

    d5: Chinese Chinese Chinese Tokyo Japan, class = ?
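One way to turn the training documents into a table is to count how often each vocabulary word occurs in each document. The sketch below does this with `collections.Counter`; the names `docs`, `vocab`, `class_ids` and `df_counts` are illustrative assumptions, and the class codes (C = 0, J = 1) follow the convention used in the next cell.

```python
from collections import Counter

import pandas as pd

# Training documents d1..d4 with their class labels
docs = [
    ("Chinese Beijing Chinese", "C"),
    ("Chinese Chinese Shanghai", "C"),
    ("Chinese Macao", "C"),
    ("Tokyo Japan Chinese", "J"),
]
vocab = ['Chinese', 'Beijing', 'Shanghai', 'Macao', 'Tokyo', 'Japan']
class_ids = {'C': 0, 'J': 1}

rows = []
for text, label in docs:
    counts = Counter(text.split())                # word -> number of occurrences
    row = {word: counts[word] for word in vocab}  # a missing word counts as 0
    row['class'] = class_ids[label]
    rows.append(row)

df_counts = pd.DataFrame(rows)
print(df_counts)
```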

In [6]:
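# Create data frame: one row per document (d1..d4), one column per word
# Each entry is the number of times the word occurs in the document
# class 0 is C and class 1 is J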
data = {
    'Chinese' : [2, 2, 1, 1],
    'Beijing' : [1, 0, 0, 0],
    'Shanghai': [0, 1, 0, 0],
    'Macao'   : [0, 0, 1, 0],
    'Tokyo'   : [0, 0, 0, 1],
    'Japan'   : [0, 0, 0, 1],
    'class'   : [0, 0, 0, 1]
}

df = pd.DataFrame(data)

In [7]:
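# New data point d5
# Note: d5 contains "Chinese" three times, but the value 2 is used here; a count of 3
# never occurs in the training data, so looking it up would raise a KeyError.
# Can you think of any strategy to make the code shorter and more efficient?
chinese  = 2
beijing  = 0
shanghai = 0
macao    = 0
tokyo    = 1
japan    = 1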

c = 0
print(conditional_prob(df, 'Tokyo', 'class', c))
p0 = (marginal_prob(df, 'class')[c]
    * conditional_prob(df, 'Chinese', 'class', c)[chinese]
    * conditional_prob(df, 'Beijing', 'class', c)[beijing]
    * conditional_prob(df, 'Shanghai', 'class', c)[shanghai]
    * conditional_prob(df, 'Macao', 'class', c)[macao]
    * conditional_prob(df, 'Tokyo', 'class', c)[tokyo]
    * conditional_prob(df, 'Japan', 'class', c)[japan])

c = 1
p1 = (marginal_prob(df, 'class')[c]
    * conditional_prob(df, 'Chinese', 'class', c)[chinese]
    * conditional_prob(df, 'Beijing', 'class', c)[beijing]
    * conditional_prob(df, 'Shanghai', 'class', c)[shanghai]
    * conditional_prob(df, 'Macao', 'class', c)[macao]
    * conditional_prob(df, 'Tokyo', 'class', c)[tokyo]
    * conditional_prob(df, 'Japan', 'class', c)[japan])
    
# normalizing
p_sum = p0 + p1
p0 /= p_sum
p1 /= p_sum

print("probability of C: {}".format(p0))
print("probability of J: {}".format(p1))

{0: 0.99999999666666672, 1: 3.3333333111111114e-09}
probability of C: 6.584362464800417e-10
probability of J: 0.9999999993415637
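One possible answer to the question in the cell above is to loop over the classes and features instead of writing out every factor by hand. The sketch below is a self-contained illustration under that assumption: `naive_bayes_posterior`, the `query` dictionary and the `eps` parameter are hypothetical names, and the two helper functions are compact re-implementations of the ones defined earlier in this session (same counting rule, same `1e-8` constant).

```python
import pandas as pd

def marginal_prob(df, column):
    # Relative frequency of each value in the column
    return df[column].value_counts(normalize=True).to_dict()

def conditional_prob(df, feature, class_col, class_val, eps=1e-8):
    # Pr(feature = v | class_col = class_val) for every value v seen in the data
    subset = df.loc[df[class_col] == class_val, feature]
    counts = {v: (subset == v).sum() + eps for v in set(df[feature])}
    total = sum(counts.values())
    return {v: n / total for v, n in counts.items()}

def naive_bayes_posterior(df, class_col, query):
    # query maps each feature name to its observed value (hypothetical helper)
    scores = {}
    for class_val, prior in marginal_prob(df, class_col).items():
        score = prior
        for feature, value in query.items():
            score *= conditional_prob(df, feature, class_col, class_val)[value]
        scores[class_val] = score
    total = sum(scores.values())  # normalise so the posteriors sum to 1
    return {class_val: s / total for class_val, s in scores.items()}

# Training data of Example 2 (class 0 is C and class 1 is J)
df = pd.DataFrame({
    'Chinese' : [2, 2, 1, 1],
    'Beijing' : [1, 0, 0, 0],
    'Shanghai': [0, 1, 0, 0],
    'Macao'   : [0, 0, 1, 0],
    'Tokyo'   : [0, 0, 0, 1],
    'Japan'   : [0, 0, 0, 1],
    'class'   : [0, 0, 0, 1],
})

# Same feature values for d5 as in the cell above
d5 = {'Chinese': 2, 'Beijing': 0, 'Shanghai': 0, 'Macao': 0, 'Tokyo': 1, 'Japan': 1}
posterior = naive_bayes_posterior(df, 'class', d5)
print("probability of C: {}".format(posterior[0]))
print("probability of J: {}".format(posterior[1]))
```

The same function can be reused for Example 1 by passing the weather dataframe, `'play'` as the class column and a query such as `{'outlook': 1, 'temp': 0, 'humid': 0, 'wind': 0}`.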
