In [20]:
!pip install tabulate scikit-learn pandas scipy



# Part B: Naive Bayes Classifier

This notebook will introduce you to the basics of the Naive Bayes Algorithm for classification tasks. It includes the following content:

- A brief overview of the Naive Bayes (NB) Classifier
- An example exercise of performing inference with NB


## What is a classifier?

A classifier is a machine learning model that distinguishes between different kinds of objects based on their features. Given sample data $X$, a classifier predicts the class $y$ that the sample belongs to.

## What is a Naive Bayes Classifier?

A Naive Bayes classifier is a probabilistic machine learning model for solving classification tasks. It is based on Bayes' theorem and makes the strong assumption that the features are independent of one another given the class.

## Bayes' Theorem

$$ P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)} $$

Bayes' theorem gives the probability of event $A$ given that event $B$ has occurred. Event $B$ is the evidence and event $A$ is the hypothesis. Naive Bayes additionally assumes that the features are independent of one another given the class, i.e. the value of one feature does not affect the others. This assumption rarely holds exactly in practice, which is why the method is called naive.

In a classification task, the classifier predicts the class $y$ given the observation $X$. Replacing $A$ and $B$ with $y$ and $X$, Bayes' theorem becomes

$$ P(y \mid X) = \frac{P(X \mid y) \, P(y)}{P(X)} $$

The formula consists of four components:

- $P(y \mid X)$: The posterior probability, which is the probability of class $y$ given the observation $X$.

- $P(y)$: The prior probability, which is the initial belief in class $y$ before observing $X$.

- $P(X \mid y)$: The likelihood, which is the probability of observation $X$ given class $y$.

- $P(X)$: The evidence, which is the probability of observation $X$.
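To make the four components concrete, here is a minimal sketch that plugs numbers into Bayes' theorem. The probabilities are hypothetical values chosen only for illustration, and the helper name `posterior` is likewise an assumption, not part of the notebook.

```python
def posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(y | X) = P(X | y) * P(y) / P(X)."""
    return likelihood * prior / evidence


# Hypothetical numbers, chosen only for illustration.
p_y = 0.3              # prior P(y)
p_x_given_y = 0.8      # likelihood P(X | y)
p_x_given_not_y = 0.1  # likelihood P(X | not y)

# Evidence P(X) via the law of total probability over the two classes.
p_x = p_x_given_y * p_y + p_x_given_not_y * (1 - p_y)

p_y_given_x = posterior(p_y, p_x_given_y, p_x)
print(f"P(X) = {p_x:.2f}, P(y | X) = {p_y_given_x:.3f}")
```

Observing $X$ raises the belief in $y$ from the prior of 0.3 because $X$ is much more likely under $y$ than under its alternative.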

In classification tasks, the variable $y$ is the class label. The variable $X$ represents the features and usually has multiple dimensions:

$$ X = (x_1, x_2, x_3, ..., x_n) $$

where $x_1, x_2, ..., x_n$ are the features. NB assumes they are conditionally independent given the class, i.e. $(x_i \perp x_j \mid y)$ for all $i \neq j$ with $i, j \in \{1, 2, ..., n\}$. Applying Bayes' theorem and then this independence assumption to the likelihood gives:

$$ P(y \mid x_1, x_2, ..., x_n) = \frac{P(x_1, x_2, ..., x_n \mid y) \, P(y)}{P(X)} = \frac{P(x_1 \mid y) P(x_2 \mid y) P(x_3 \mid y) \cdots P(x_n \mid y) \, P(y)}{P(X)} $$

The denominator $P(X)$ is the same for every class, so it only normalizes the result and can be dropped when performing inference. (The features are independent only given $y$, so $P(X)$ does not in general factor into $P(x_1) P(x_2) \cdots P(x_n)$; if needed, it is obtained by summing the numerator over all classes.) With the independence assumption and without the denominator, the NB formula becomes:

$$ P(y \mid x_1, x_2, ..., x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y) $$

The predicted class is the one that maximizes this quantity, i.e. $\hat{y} = \arg\max_y P(y) \prod_{i=1}^{n} P(x_i \mid y)$. In binary classification, $y$ has only two possible outcomes, so this amounts to comparing two products.
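The decision rule can be sketched in a few lines of Python. The function name `nb_predict`, the dictionary layout, and all probabilities below are assumptions made for this illustration; they do not come from the exercise dataset.

```python
import math


def nb_predict(priors, likelihoods, x):
    """Return (predicted class, normalized posteriors) for observation x.

    priors:      {class: P(class)}
    likelihoods: {class: [{value: P(x_i = value | class)} for each feature i]}
    """
    scores = {}
    for y, prior in priors.items():
        # Unnormalized posterior: P(y) * prod_i P(x_i | y)
        scores[y] = prior * math.prod(
            likelihoods[y][i][value] for i, value in enumerate(x)
        )
    evidence = sum(scores.values())  # P(X), identical for every class
    posteriors = {y: s / evidence for y, s in scores.items()}
    # Dividing by the evidence does not change which class has the largest score.
    return max(scores, key=scores.get), posteriors


# Hypothetical two-class problem with two binary features.
priors = {"a": 0.5, "b": 0.5}
likelihoods = {
    "a": [{0: 0.9, 1: 0.1}, {0: 0.2, 1: 0.8}],
    "b": [{0: 0.4, 1: 0.6}, {0: 0.7, 1: 0.3}],
}
label, post = nb_predict(priors, likelihoods, (0, 1))
print(label, post)
```

Here class `"a"` scores $0.5 \cdot 0.9 \cdot 0.8 = 0.36$ and class `"b"` scores $0.5 \cdot 0.4 \cdot 0.3 = 0.06$, so `"a"` is predicted whether or not the scores are normalized.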

## An example exercise of performing inference with NB

We will use the following example to strengthen our understanding of NB. The example toy dataset is for classifying whether a person owns a pet. Observations $X$ contain three features, two categorical ("Gender" and "Education") and one numerical ("Income"), and class label $y$ (i.e. "Has_pet") corresponds to whether this person owns a pet.

In [21]:
from IPython.display import HTML, display
import tabulate
import pandas as pd

tab_cat = [["Female", "University", 103000,   "Yes"],
          ["Female", "HighSchool", 90500,   "No"],
          ["Female", "HighSchool", 114000,   "No"],
          ["Male",   "University", 102000,   "No"],
          ["Male",   "University", 75000,   "Yes"],
          ["Male",   "HighSchool", 90000,   "No"],
          ["Male",   "HighSchool", 85000,   "Yes"],
          ["Male",   "University", 86000,   "No"]]

display(HTML(tabulate.tabulate(tab_cat, tablefmt='html')))

0,1,2,3
Female,University,103000,Yes
Female,HighSchool,90500,No
Female,HighSchool,114000,No
Male,University,102000,No
Male,University,75000,Yes
Male,HighSchool,90000,No
Male,HighSchool,85000,Yes
Male,University,86000,No


<div class='alert alert-block alert-success' style="font-weight:bolder">

### Task 2a - Compute the likelihood table of having a pet for each categorical feature, as well as the marginal probabilities.

- $P(Gender|Has\_pet)$: $P(Male|Yes)$, $P(Female|Yes)$, $P(Male|No)$, $P(Female|No)$
    
- $P(Education|Has\_pet)$: $P(University|Yes)$, $P(HighSchool|Yes)$, $P(University|No)$, $P(HighSchool|No)$
    
</div>

In [22]:
df = pd.DataFrame(tab_cat, columns=["Gender", "Education", "Income", "Has_pet"])


def probability_gender(df, has_pet, gender):
    """Likelihood P(Gender = gender | Has_pet = has_pet), rounded to 2 decimals."""
    # Rows that belong to the given class (pet owners or non-owners)
    df_class = df[df.Has_pet == has_pet]

    # Fraction of those rows that have the given gender
    return round(len(df_class[df_class.Gender == gender]) / len(df_class), 2)


marginal_gender = round(df['Gender'].value_counts() / df.shape[0], 3)

marginal_has_pets = round(df['Has_pet'].value_counts() / df.shape[0], 3)

tab_likelihood_gender = [
    ["likelihood", "-", "Has_pet", "-", "-"],
    ["-", "-", "Yes", "No", "P(Gender)"],
    ["Gender", "Male", probability_gender(df, "Yes", "Male"), probability_gender(df, "No", "Male"), marginal_gender["Male"]],
    ["-", "Female", probability_gender(df, "Yes", "Female"), probability_gender(df, "No", "Female"), marginal_gender["Female"]],
    ["-", "P(Has_pet)", marginal_has_pets["Yes"], marginal_has_pets["No"], ""]
]
display(HTML(tabulate.tabulate(tab_likelihood_gender, tablefmt='html')))


def probability_education(df, has_pet, education):
    """Likelihood P(Education = education | Has_pet = has_pet), rounded to 2 decimals."""
    # Rows that belong to the given class (pet owners or non-owners)
    df_class = df[df.Has_pet == has_pet]

    # Fraction of those rows that have the given education level
    return round(len(df_class[df_class.Education == education]) / len(df_class), 2)


marginal_education = round(df['Education'].value_counts() / df.shape[0], 3)

tab_likelihood_education = [
    ["likelihood", "-", "Has_pet", "-", "-"],
    ["-", "-", "Yes", "No", "P(Education)"],
    ["Education", "University", probability_education(df, "Yes", "University"),
     probability_education(df, "No", "University"), marginal_education["University"]],
    ["-", "HighSchool", probability_education(df, "Yes", "HighSchool"), probability_education(df, "No", "HighSchool"),
     marginal_education["HighSchool"]],
    ["-", "P(Has_pet)", marginal_has_pets["Yes"], marginal_has_pets["No"], ""]
]
display(HTML(tabulate.tabulate(tab_likelihood_education, tablefmt='html')))

0,1,2,3,4
likelihood,-,Has_pet,-,-
-,-,Yes,No,P(Gender)
Gender,Male,0.67,0.6,0.625
-,Female,0.33,0.4,0.375
-,P(Has_pet),0.375,0.625,


0,1,2,3,4
likelihood,-,Has_pet,-,-
-,-,Yes,No,P(Education)
Education,University,0.67,0.4,0.5
-,HighSchool,0.33,0.6,0.5
-,P(Has_pet),0.375,0.625,
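As a cross-check on the tables above, the likelihoods can also be counted directly with the standard library. This is a sketch rather than the notebook's method: `rows` repeats the eight records of `tab_cat`, and the helper `likelihood` is a hypothetical name introduced here. Exact fractions avoid rounding.

```python
from collections import Counter
from fractions import Fraction

# Same eight records as tab_cat: (Gender, Education, Income, Has_pet).
rows = [
    ("Female", "University", 103000, "Yes"),
    ("Female", "HighSchool", 90500, "No"),
    ("Female", "HighSchool", 114000, "No"),
    ("Male", "University", 102000, "No"),
    ("Male", "University", 75000, "Yes"),
    ("Male", "HighSchool", 90000, "No"),
    ("Male", "HighSchool", 85000, "Yes"),
    ("Male", "University", 86000, "No"),
]

# Number of records per class: 3 "Yes" and 5 "No".
class_counts = Counter(r[3] for r in rows)


def likelihood(col, value, has_pet):
    """P(feature in column `col` == value | Has_pet == has_pet) as an exact fraction."""
    joint = sum(1 for r in rows if r[col] == value and r[3] == has_pet)
    return Fraction(joint, class_counts[has_pet])


for has_pet in ("Yes", "No"):
    print(has_pet,
          "| P(Male | .) =", likelihood(0, "Male", has_pet),
          "| P(University | .) =", likelihood(1, "University", has_pet))
```

The fractions 2/3, 3/5 and 2/5 correspond to the rounded entries 0.67, 0.6 and 0.4 in the tables.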


<div class='alert alert-block alert-success' style="font-weight:bolder">

### Task 2b - Compute the posterior probability

- $P(\text{No}|\text{Male})$, $P(\text{Yes}|\text{Female})$
    
- $P(\text{Yes}|\text{University})$, $P(\text{No}|\text{HighSchool})$

</div>


In [23]:
def posterior_probability(df, is_gender, feature, has_pets):
    if is_gender:
        nb_pet_owners_who_are_gender = len(df[(df['Gender'] == feature) & (df['Has_pet'] == has_pets)]) / len(df[df.Gender == feature])
        return round(nb_pet_owners_who_are_gender, 2)
    else:
        nb_pet_owners_who_are_education = len(df[(df['Education'] == feature) & (df['Has_pet'] == has_pets)]) / len(df[df.Education == feature])
        return round(nb_pet_owners_who_are_education, 2)


# P(No | Male)
print("Posterior probability for P(No|Male):", posterior_probability(df, True, "Male", "No"))
# P(Yes | Female)
print("Posterior probability for P(Yes|Female):", posterior_probability(df, True, "Female", "Yes"))
# P(No | HighSchool)
print("Posterior probability for P(No|HighSchool):", posterior_probability(df, False, "HighSchool", "No"))
# P(Yes | University)
print("Posterior probability for P(Yes|University):", posterior_probability(df, False, "University", "Yes"))

Posterior probability for P(No|Male): 0.6
Posterior probability for P(Yes|Female): 0.33
Posterior probability for P(No|HighSchool): 0.75
Posterior probability for P(Yes|University): 0.5
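The same posteriors follow from Bayes' theorem applied to the likelihoods and marginals of Task 2a. The sketch below hard-codes those values as exact fractions read off the dataset; the helper name `bayes` and the variable names are assumptions for illustration.

```python
from fractions import Fraction as F


def bayes(likelihood, prior, evidence):
    """P(y | x) = P(x | y) * P(y) / P(x)."""
    return likelihood * prior / evidence


# Marginals from the toy dataset (8 records).
p_yes, p_no = F(3, 8), F(5, 8)
p_male, p_female = F(5, 8), F(3, 8)
p_university, p_highschool = F(1, 2), F(1, 2)

# Likelihoods from the Task 2a tables, as exact fractions.
p_no_given_male = bayes(F(3, 5), p_no, p_male)                    # P(Male | No) = 3/5
p_yes_given_female = bayes(F(1, 3), p_yes, p_female)              # P(Female | Yes) = 1/3
p_yes_given_university = bayes(F(2, 3), p_yes, p_university)      # P(University | Yes) = 2/3
p_no_given_highschool = bayes(F(3, 5), p_no, p_highschool)        # P(HighSchool | No) = 3/5

print(p_no_given_male, p_yes_given_female, p_yes_given_university, p_no_given_highschool)
```

The results 3/5, 1/3, 1/2 and 3/4 agree with the values 0.6, 0.33, 0.5 and 0.75 obtained by direct counting.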


<div class='alert alert-block alert-success' style="font-weight:bolder">

### Task 2c - Compute the Likelihood of having pets using mean, standard deviation, and normal distribution function:

- Mean: $ \mu = \frac{1}{n} \sum^{n}_{i=1}{x_i} $
    
- Standard Deviation $ \sigma = \left[ \frac{1}{n-1} \sum^{n}_{i=1}{(x_i-\mu)^2} \right]^\frac{1}{2}  $
    
- Normal Distribution $f(x)=\dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-\dfrac{(x-\mu)^2}{2\sigma{}^2}}$
    
Compute $L( \text{Income}=90000 \mid \text{Yes})$, $L( \text{Income}=90000 \mid \text{No})$

</div>
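The three formulas can be written out with the standard library alone, which may help when checking library results by hand. This is a minimal sketch: the function names and the four-element `sample` are hypothetical, not part of the exercise data.

```python
import math
import statistics


def sample_mean(xs):
    """mu = (1/n) * sum(x_i)"""
    return sum(xs) / len(xs)


def sample_std(xs):
    """sigma = sqrt( (1/(n-1)) * sum((x_i - mu)^2) )"""
    mu = sample_mean(xs)
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / (len(xs) - 1))


def gaussian_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


# Hypothetical sample, used only to exercise the formulas.
sample = [2.0, 4.0, 4.0, 6.0]
mu = sample_mean(sample)
sigma = sample_std(sample)

# statistics.stdev uses the same 1/(n-1) estimator, so the two should agree.
print(mu, sigma, statistics.stdev(sample))
print(gaussian_pdf(5.0, mu, sigma))
```

Note that the value returned by `gaussian_pdf` is a density, not a probability, so it can be very small (or exceed 1) depending on the scale of $\sigma$.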

In [30]:
from scipy.stats import norm

income = 90000

likelihood_income = {}
for has_pet in ["Yes", "No"]:
    # Incomes of the people in this class
    class_income = df.loc[df.Has_pet == has_pet, "Income"]

    # Mean and sample standard deviation (pandas uses the 1/(n-1) estimator by default)
    mu = class_income.mean()
    sigma = class_income.std()

    # Gaussian density of the query income under this class
    likelihood_income[has_pet] = norm(mu, sigma).pdf(income)

    print(f"Has_pet = {has_pet:<3} | Mean: {mu:>9.1f} ; Standard deviation: {sigma:>9.1f}")

print()
print(f"L(Income = 90000 | Yes): {likelihood_income['Yes']:.2e}")
print(f"L(Income = 90000 | No): {likelihood_income['No']:.2e}")



Has_pet = Yes | Mean:   87666.7 ; Standard deviation:   14189.2
Has_pet = No  | Mean:   96500.0 ; Standard deviation:   11456.4

L(Income = 90000 | Yes): 2.77e-05
L(Income = 90000 | No): 2.96e-05


<div class='alert alert-block alert-success' style="font-weight:bolder">

### Task 2d - Perform inference for the given examples

- $X=(Education=University, Gender=Female, Income=100000)$
    
- $X=(Education=HighSchool, Gender=Male, Income=92000)$

</div>



In [31]:
from scipy.stats import norm


def categorical_likelihood(df, column, value, has_pet):
    # P(column = value | Has_pet = has_pet), estimated from counts (no rounding)
    df_class = df[df.Has_pet == has_pet]
    return len(df_class[df_class[column] == value]) / len(df_class)


def probability_income(df, income, has_pet):
    # Gaussian likelihood of the income, fitted on the incomes of the given class
    class_income = df.loc[df.Has_pet == has_pet, "Income"]
    return norm(class_income.mean(), class_income.std()).pdf(income)


def inference(df, gender, education, income):
    # Unnormalized posterior per class: P(y) * P(Gender|y) * P(Education|y) * L(Income|y)
    scores = {}
    for has_pet in ["Yes", "No"]:
        prior = len(df[df.Has_pet == has_pet]) / len(df)
        scores[has_pet] = (prior
                           * categorical_likelihood(df, "Gender", gender, has_pet)
                           * categorical_likelihood(df, "Education", education, has_pet)
                           * probability_income(df, income, has_pet))

    # Normalize by the evidence P(X), i.e. the sum of the class scores
    evidence = sum(scores.values())
    return {has_pet: round(float(score / evidence), 3) for has_pet, score in scores.items()}


P_1 = inference(df, "Female", "University", 100000)
P_2 = inference(df, "Male", "HighSchool", 92000)

tab_Has_pet = [
    ["","",  "X1", "X2"],
    ["","",  "University", "HighSchool"],
    ["","",  "Female",  "Male"],
    ["","",  "100,000", "92,000"],
    ["","",  "_______________", "_____________"],
    ["Has_Pet", "Yes", P_1["Yes"], P_2["Yes"]],
    ["", "No", P_1["No"], P_2["No"]]
]
display(HTML(tabulate.tabulate(tab_Has_pet, tablefmt='html')))


0,1,2,3
,,X1,X2
,,University,HighSchool
,,Female,Male
,,100000,92000
,,_______________,_____________
Has_Pet,Yes,0.326,0.236
,No,0.674,0.764


<div class='alert alert-block alert-success' style="font-weight:bolder">

### Task 2e (Extra Credit) - Implement a Naive Bayes classifier and use it to classify the Iris dataset. Note that the Iris dataset only contains numerical features.

</div>




In [None]:
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris["data"], iris["target"]
print("data", X)
print("class/label", y)
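One possible starting point for this task is a Gaussian Naive Bayes model, which fits one normal distribution per feature and class, as in Task 2c. The sketch below is an illustration under that assumption, not a reference solution: the class name `GaussianNaiveBayes`, its attributes, and the toy data are all hypothetical. It sums log-probabilities instead of multiplying densities to avoid numerical underflow, and it needs at least two samples per class to estimate a standard deviation.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes for numerical features (illustrative sketch)."""

    def fit(self, X, y):
        # Group the training rows by class label
        rows_by_class = defaultdict(list)
        for row, label in zip(X, y):
            rows_by_class[label].append([float(v) for v in row])

        n_samples = sum(len(rows) for rows in rows_by_class.values())
        self.log_priors_ = {}
        self.params_ = {}
        for label, rows in rows_by_class.items():
            # log P(y), estimated from class frequencies
            self.log_priors_[label] = math.log(len(rows) / n_samples)
            # One (mean, sample std) pair per feature; a small floor avoids sigma == 0
            self.params_[label] = [
                (mean(col), max(stdev(col), 1e-9)) for col in zip(*rows)
            ]
        return self

    @staticmethod
    def _log_pdf(x, mu, sigma):
        # Logarithm of the normal density N(mu, sigma^2) at x
        return -math.log(sigma * math.sqrt(2.0 * math.pi)) - (x - mu) ** 2 / (2.0 * sigma ** 2)

    def predict(self, X):
        predictions = []
        for row in X:
            # log P(y) + sum_i log P(x_i | y) for every class
            scores = {
                label: self.log_priors_[label] + sum(
                    self._log_pdf(float(v), mu, sigma)
                    for v, (mu, sigma) in zip(row, self.params_[label])
                )
                for label in self.params_
            }
            predictions.append(max(scores, key=scores.get))
        return predictions


# Hypothetical, well-separated toy data with two features and two classes.
X_toy = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y_toy = [0, 0, 0, 1, 1, 1]

model = GaussianNaiveBayes().fit(X_toy, y_toy)
print(model.predict([[1.1, 2.1], [5.1, 5.9]]))
```

Because the class only iterates over rows and values, it should also accept the `X` and `y` arrays loaded above, e.g. `GaussianNaiveBayes().fit(X, y).predict(X)`, although that call has not been run here. A proper evaluation would hold out part of the data for testing rather than predicting on the training set.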