In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

#### Naive Bayes Classifier
Based on `Bayes' Theorem`, this algorithm estimates how likely an outcome $ y $ is given that $ x $ has been observed.

`Bayes' Theorem`: It describes the probability of A happening given that B has already occurred. In mathematical notation:

$$ P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} $$

Where:
- $ P(A|B) $ is the posterior probability of A given B (in simple words, the probability of A happening given that B happened).
- $ P(B|A) $ is the likelihood of B given A (in simple words, the probability of B happening given that A happened).
- $ P(A) $ is the prior probability of A (in simple words, the probability of A happening without any conditions).
- $ P(B) $ is the prior probability of B (in simple words, the probability of B happening without any conditions).
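As a quick numeric illustration of the formula, here is a minimal sketch that plugs in counts from the 14-row weather table used later in this notebook. The helper name `bayes_posterior` is hypothetical, and exact fractions are used only so the arithmetic is easy to follow.

```python
from fractions import Fraction


def bayes_posterior(likelihood, prior, evidence):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence


# A = "Play is no", B = "Outlook is Rainy" (counts read off the weather table).
p_b_given_a = Fraction(3, 5)   # 3 of the 5 "no" rows are Rainy
p_a = Fraction(5, 14)          # 5 of the 14 rows are "no"
p_b = Fraction(5, 14)          # 5 of the 14 rows are Rainy

posterior = bayes_posterior(p_b_given_a, p_a, p_b)
print(posterior)  # 3/5
```

The result agrees with counting directly: of the 5 Rainy rows, 3 have Play equal to "no".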

In our case, we want the probability of an outcome $ Y $ given that $ x_1 , x_2 , \cdots, x_n $ have been observed.
So, the formula becomes:

$$ P(Y|x_1 , x_2 , \cdots, x_n) = \frac{P(x_1 , x_2 , \cdots, x_n|Y) \cdot P(Y)}{P(x_1 , x_2 , \cdots, x_n)} $$

Assuming the features are conditionally independent given $ Y $ (this is the "naive" assumption), the likelihood in the numerator factorizes:
$$ P(Y|x_1 , x_2 , \cdots, x_n) = \frac{P(x_1|Y) \cdot P(x_2|Y) \cdots P(x_n|Y) \cdot P(Y)}{P(x_1 , x_2 , \cdots, x_n)} $$

When $ Y $ has several possible outcomes, say true or false, we simply substitute each outcome in place of $ Y $.
$$ P(T|x_1 , x_2 , \cdots, x_n) = \frac{P(x_1|T) \cdot P(x_2|T) \cdots P(x_n|T) \cdot P(T)}{P(x_1 , x_2 , \cdots, x_n)} $$

$$ P(F|x_1 , x_2 , \cdots, x_n) = \frac{P(x_1|F) \cdot P(x_2|F) \cdots P(x_n|F) \cdot P(F)}{P(x_1 , x_2 , \cdots, x_n)} $$

In the above equations, the denominator is the same for every outcome $ (Y) $, so we can ignore it when comparing outcomes. Each posterior is then proportional to its numerator:

$$ P(T|x_1 , x_2 , \cdots, x_n) \propto P(x_1|T) \cdot P(x_2|T) \cdots P(x_n|T) \cdot P(T) $$

$$ P(F|x_1 , x_2 , \cdots, x_n) \propto P(x_1|F) \cdot P(x_2|F) \cdots P(x_n|F) \cdot P(F) $$
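The comparison between outcomes can be sketched in a few lines. The function name `unnormalized_posterior` and all the numbers below are hypothetical, chosen only to illustrate the product of a prior with per-feature likelihoods; they do not come from the dataset.

```python
from math import prod


def unnormalized_posterior(prior, feature_likelihoods):
    """Return P(Y) * P(x_1|Y) * ... * P(x_n|Y), the score compared across outcomes."""
    return prior * prod(feature_likelihoods)


# Hypothetical values for two features, for illustration only.
score_true = unnormalized_posterior(0.5, [0.8, 0.5])
score_false = unnormalized_posterior(0.5, [0.2, 0.5])

# The outcome with the larger score is the prediction.
prediction = "T" if score_true > score_false else "F"
print(prediction)  # T
```

Because both scores would be divided by the same denominator, skipping that division does not change which outcome wins.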

#### Let's implement this with an example
| Outlook   | Temp | Humidity | Windy | Play |
|-----------|------|----------|-------|------|
| Rainy     | Hot  | High     | f     | no   |
| Rainy     | Hot  | High     | t     | no   |
| Overcast  | Hot  | High     | f     | yes  |
| Sunny     | Mild | High     | f     | yes  |
| Sunny     | Cool | Normal   | f     | yes  |
| Sunny     | Cool | Normal   | t     | no   |
| Overcast  | Cool | Normal   | t     | yes  |
| Rainy     | Mild | High     | f     | no   |
| Rainy     | Cool | Normal   | f     | yes  |
| Sunny     | Mild | Normal   | f     | yes  |
| Rainy     | Mild | Normal   | t     | yes  |
| Overcast  | Mild | High     | t     | yes  |
| Overcast  | Hot  | Normal   | f     | yes  |
| Sunny     | Mild | High     | t     | no   |

As we saw above, to predict the outcome for a data point we need to calculate all the likelihoods and the prior probabilities (how to do so by hand is covered in `Pen and Paper`).
Let's implement the functions to do so.

In [2]:
data = [
    ['Rainy', 'Hot', 'High', 'f', 'no'],
    ['Rainy', 'Hot', 'High', 't', 'no'],
    ['Overcast', 'Hot', 'High', 'f', 'yes'],
    ['Sunny', 'Mild', 'High', 'f', 'yes'],
    ['Sunny', 'Cool', 'Normal', 'f', 'yes'],
    ['Sunny', 'Cool', 'Normal', 't', 'no'],
    ['Overcast', 'Cool', 'Normal', 't', 'yes'],
    ['Rainy', 'Mild', 'High', 'f', 'no'],
    ['Rainy', 'Cool', 'Normal', 'f', 'yes'],
    ['Sunny', 'Mild', 'Normal', 'f', 'yes'],
    ['Rainy', 'Mild', 'Normal', 't', 'yes'],
    ['Overcast', 'Mild', 'High', 't', 'yes'],
    ['Overcast', 'Hot', 'Normal', 'f', 'yes'],
    ['Sunny', 'Mild', 'High', 't', 'no']
]

dataset = pd.DataFrame(data)
dataset.columns = ['Outlook','Temp','Humidity','Windy','Play']

X = dataset.drop('Play',axis = 1)
y = dataset['Play']

print("Input values:\n",X)
print("Output values:\n",y)

Input values:
      Outlook  Temp Humidity Windy
0      Rainy   Hot     High     f
1      Rainy   Hot     High     t
2   Overcast   Hot     High     f
3      Sunny  Mild     High     f
4      Sunny  Cool   Normal     f
5      Sunny  Cool   Normal     t
6   Overcast  Cool   Normal     t
7      Rainy  Mild     High     f
8      Rainy  Cool   Normal     f
9      Sunny  Mild   Normal     f
10     Rainy  Mild   Normal     t
11  Overcast  Mild     High     t
12  Overcast   Hot   Normal     f
13     Sunny  Mild     High     t
Output values:
 0      no
1      no
2     yes
3     yes
4     yes
5      no
6     yes
7      no
8     yes
9     yes
10    yes
11    yes
12    yes
13     no
Name: Play, dtype: object


### Calculating Prior Probabilities

1. Count the occurrences of each outcome ("yes" and "no").

- Count of "yes" = 9
- Count of "no" = 5
- Total outcomes = 14

2. Calculate the prior probabilities (count of outcome / total outcomes):

$$ P(\text{Yes}) = \frac{9}{14} \approx 0.643 $$

$$ P(\text{No}) = \frac{5}{14} \approx 0.357 $$
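The two steps above can be checked quickly with the standard library. This is only a sketch using `collections.Counter` on the Play column copied from the table; the variable names are assumptions for illustration.

```python
from collections import Counter

# The Play column, copied row by row from the table above.
play = ['no', 'no', 'yes', 'yes', 'yes', 'no', 'yes',
        'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no']

counts = Counter(play)    # step 1: occurrences of each outcome
total = len(play)

# Step 2: prior = count of outcome / total outcomes.
priors = {outcome: count / total for outcome, count in counts.items()}

print(counts['yes'], counts['no'], total)
print(round(priors['yes'], 3), round(priors['no'], 3))
```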

Let's implement this in Python.

In [3]:
#Calculating the prior probabilities
def _calculate_prior_probabilities(Y):
    """
    This function calculates the prior probabilities of the outcomes.

    Args:
        Y (ndarray): Array of outcomes.

    Returns:
        prior (dict): Dictionary with computed prior probabilities for each outcome.
        total_outcomes (list): List of tuples, each containing an outcome and its count.
    """
    
    prior = {}
    total_samples = len(Y)
    total_outcomes = []
    for outcome in np.unique(Y):
        outcome_count = np.sum(Y == outcome)
        prior[outcome] = outcome_count / total_samples  # This will store the probability of each outcome.
        total_outcomes.append((outcome, outcome_count))
    return prior, total_outcomes



### Calculating Likelihoods

1. Calculate the likelihoods for each feature value given each outcome.

For example, for the "Outlook" feature:

- $ P(\text{Outlook} = \text{Rainy} | \text{Play} = \text{No}) $
- $ P(\text{Outlook} = \text{Rainy} | \text{Play} = \text{Yes}) $

2. Count the occurrences of each feature value for each outcome.

- Count of "Rainy" given "No" = 3
- Count of "Rainy" given "Yes" = 2

3. Calculate the likelihoods:
We simply divide each count by the total number of occurrences of the respective outcome.

$$ P(\text{Rainy} | \text{No}) = \frac{3}{5} = 0.6 $$
$$ P(\text{Rainy} | \text{Yes}) = \frac{2}{9} \approx 0.222 $$

Similarly, we can calculate this for others.
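As a cross-check of the hand counts for the "Outlook" feature, here is a small standard-library sketch. It keys each likelihood by a `(value, outcome)` tuple, which is an illustrative choice and not the key format used by the function in this notebook.

```python
from collections import Counter
from fractions import Fraction

# Outlook and Play columns, copied row by row from the table above.
outlook = ['Rainy', 'Rainy', 'Overcast', 'Sunny', 'Sunny', 'Sunny', 'Overcast',
           'Rainy', 'Rainy', 'Sunny', 'Rainy', 'Overcast', 'Overcast', 'Sunny']
play = ['no', 'no', 'yes', 'yes', 'yes', 'no', 'yes',
        'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no']

pair_counts = Counter(zip(outlook, play))   # how often each (value, outcome) pair occurs
outcome_counts = Counter(play)              # how often each outcome occurs

# Likelihood = pair count / count of the respective outcome.
likelihoods = {
    (value, outcome): Fraction(count, outcome_counts[outcome])
    for (value, outcome), count in pair_counts.items()
}

print(likelihoods[('Rainy', 'no')])    # 3/5
print(likelihoods[('Rainy', 'yes')])   # 2/9
```

Pairs that never occur in the data, such as ("Overcast", "no"), are simply absent from this dictionary; their likelihood is 0.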

Let's implement this in Python.

In [4]:
def _calculate_likelihoods(X, Y, total_outcome):
    """
    This function calculates the likelihood of each feature value occurring given a particular outcome.
    
    Args:
        X (DataFrame): Input features.
        Y (Series): Outcomes.
        total_outcome (list): List of tuples, each containing an outcome and its count.
    
    Returns:
        likelihoods (dict): Calculated likelihoods, keyed as "<value>_<outcome>".
    """
    likelihoods = {}
    rows, columns = X.shape

    # Start every value-outcome pair at zero so that unseen pairs get a likelihood of 0.
    # Keys use the value only, so this assumes no two features share a value.
    for value in np.unique(X):
        for outcome in np.unique(Y):
            likelihoods[f"{value}_{outcome}"] = 0
        
    # Count how often each feature value appears together with each outcome.
    for i in range(rows):
        outcome = Y.iloc[i]  # positional lookup, independent of the index labels
        for j in range(columns):
            likelihoods[f"{X.iloc[i, j]}_{outcome}"] += 1

    # Turn the counts into conditional probabilities P(value | outcome).
    for outcome, count in total_outcome:
        for key in likelihoods:
            if key.endswith(f'_{outcome}'):
                likelihoods[key] /= count

    return likelihoods


In [5]:
# Compute the prior probabilities and likelihoods that the prediction step needs
prior,total_out = _calculate_prior_probabilities(y)
print("Prior: ", prior)
print("Total outcome: ", total_out)


likelihood = _calculate_likelihoods(X,y,total_out)
print("Likelihood", likelihood)

Prior:  {'no': 0.35714285714285715, 'yes': 0.6428571428571429}
Total outcome:  [('no', 5), ('yes', 9)]
Likelihood {'Cool_no': 0.2, 'Cool_yes': 0.3333333333333333, 'High_no': 0.8, 'High_yes': 0.3333333333333333, 'Hot_no': 0.4, 'Hot_yes': 0.2222222222222222, 'Mild_no': 0.4, 'Mild_yes': 0.4444444444444444, 'Normal_no': 0.2, 'Normal_yes': 0.6666666666666666, 'Overcast_no': 0.0, 'Overcast_yes': 0.4444444444444444, 'Rainy_no': 0.6, 'Rainy_yes': 0.2222222222222222, 'Sunny_no': 0.4, 'Sunny_yes': 0.3333333333333333, 'f_no': 0.4, 'f_yes': 0.6666666666666666, 't_no': 0.6, 't_yes': 0.3333333333333333}


#### Now that we have calculated the likelihoods, let's write the function that predicts the outcome for a set of inputs

### Posterior Probability Calculation

Given a set of input features (the same data point used in the prediction cell further down):
$$ \text{['Sunny', 'Hot', 'High', 't']} $$

We calculate the posterior probabilities for each outcome, yes or no. Since the denominator is dropped, each posterior is proportional to the product on the right.

#### For the outcome "Yes":
$$ P(\text{Yes}|\text{['Sunny', 'Hot', 'High', 't']}) \propto P(\text{Yes}) \times P(\text{Sunny}|\text{Yes}) \times P(\text{Hot}|\text{Yes}) \times P(\text{High}|\text{Yes}) \times P(\text{t}|\text{Yes}) $$

#### For the outcome "No":
$$ P(\text{No}|\text{['Sunny', 'Hot', 'High', 't']}) \propto P(\text{No}) \times P(\text{Sunny}|\text{No}) \times P(\text{Hot}|\text{No}) \times P(\text{High}|\text{No}) \times P(\text{t}|\text{No}) $$

We compare these probabilities to determine the prediction. The outcome with the higher probability is our prediction.

To further make sense of the numbers we get after this calculation, we normalize them with respect to all the outcomes and multiply by 100 to convert them to percentages.

### Normalizing the Final Probabilities

To normalize the final probabilities, we use the following formula:
$$ \text{Normalized Probability} = \frac{\text{Posterior Probability}}{\sum \text{All Posterior Probabilities}} \times 100 $$

This will give us the percentage probability of each outcome.
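To make the normalization concrete, here is a worked sketch for the data point `['Sunny', 'Hot', 'High', 't']`, which is the input used in the prediction cell further down. The likelihoods are written as exact fractions matching the values printed earlier (for example, 0.333 is 3/9); the variable names are assumptions for illustration.

```python
from fractions import Fraction
from math import prod

priors = {'yes': Fraction(9, 14), 'no': Fraction(5, 14)}

# P(Sunny|Y), P(Hot|Y), P(High|Y), P(t|Y) for each outcome Y.
likelihoods = {
    'yes': [Fraction(3, 9), Fraction(2, 9), Fraction(3, 9), Fraction(3, 9)],
    'no': [Fraction(2, 5), Fraction(2, 5), Fraction(4, 5), Fraction(3, 5)],
}

# Unnormalized posterior for each outcome: prior times the product of likelihoods.
scores = {outcome: priors[outcome] * prod(likelihoods[outcome]) for outcome in priors}

# Normalize so the values sum to 100 percent.
total = sum(scores.values())
percentages = {outcome: float(score / total * 100) for outcome, score in scores.items()}

print(scores)                    # yes: 1/189, no: 24/875
print(round(percentages['no'], 2), round(percentages['yes'], 2))  # 83.83 16.17
```

The "no" outcome comes out at roughly 83.83%, which agrees with the prediction printed at the end of this notebook.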

Let's implement this function.

In [6]:
def predict(X, Y, prior_dict, likelihood_dict):
    """
    This function takes in the input given by the user and, with the help of Bayes' theorem,
    predicts the output.

    Args:
        X (list): Feature values of the data point to classify.
        Y (Series): Outcomes seen in training; only the unique values are used.
        prior_dict (dict): Calculated prior probabilities.
        likelihood_dict (dict): Calculated likelihoods.

    Returns:
        max_key (str): Predicted outcome.
        max_val (float): Percentage chance of that outcome occurring.
    """
    prob_outcome = {}

    for outcome in np.unique(Y):
        prob_outcome[outcome] = 1.0
    
    for outcome in np.unique(Y):
        for value in np.unique(X):
            prob_outcome[outcome] *= likelihood_dict[f"{value}_{outcome}"]
        prob_outcome[outcome] *= prior_dict[outcome]
    
    
    # Normalize the probabilities
    total_prob = sum(prob_outcome.values())
    for outcome in prob_outcome:
        prob_outcome[outcome] /= total_prob
        prob_outcome[outcome] *= 100  # To turn into percentage probability

    max_key = max(prob_outcome, key=prob_outcome.get)
    max_val = prob_outcome[max_key]

    return max_key, max_val


In [7]:
key,value = predict(['Sunny', 'Hot', 'High', 't'],y,prior,likelihood)

print(f"Prediction: {key}  percentage Chance: {value}%")

Prediction: no  percentage Chance: 83.82923673997412%


#### There are ways to analyze how our model is performing (e.g., through accuracy metrics, confusion matrices, and ROC curves). I will delve into these methods next and implement one of them here.