# Naive Bayes Classifier Implementation

This notebook provides a step-by-step implementation of the Naive Bayes classifier using the weather dataset. We'll cover:

1. Understanding the dataset
2. Calculating prior probabilities P(y)
3. Computing likelihood probabilities P(x|y)
4. Making predictions using the Naive Bayes formula and measuring training accuracy
5. Testing the classifier on new examples


## 1. Data Loading and Exploration

First, let's load and examine our weather dataset.

In [2]:
import numpy as np
import pandas as pd

def pre_processing(df):
    """Partition the data into features (all but the last column) and target (the last column)."""
    X = df.drop(columns=df.columns[-1])
    y = df[df.columns[-1]]
    return X, y



# Load the weather dataset (columns are separated by whitespace)
df = pd.read_csv('weather.csv', sep=r'\s+')
print('Dataset Preview:')
print(df)

# Split into features and target

X, y = pre_processing(df)

Dataset Preview:
     Outlook  Temp Humidity Windy Play
0      Rainy   Hot     High     f   no
1      Rainy   Hot     High     t   no
2   Overcast   Hot     High     f  yes
3      Sunny  Mild     High     f  yes
4      Sunny  Cool   Normal     f  yes
5      Sunny  Cool   Normal     t   no
6   Overcast  Cool   Normal     t  yes
7      Rainy  Mild     High     f   no
8      Rainy  Cool   Normal     f  yes
9      Sunny  Mild   Normal     f  yes
10     Rainy  Mild   Normal     t  yes
11  Overcast  Mild     High     t  yes
12  Overcast   Hot   Normal     f  yes
13     Sunny  Mild     High     t   no



In [3]:
print("Features:\n" , X)

Features:
      Outlook  Temp Humidity Windy
0      Rainy   Hot     High     f
1      Rainy   Hot     High     t
2   Overcast   Hot     High     f
3      Sunny  Mild     High     f
4      Sunny  Cool   Normal     f
5      Sunny  Cool   Normal     t
6   Overcast  Cool   Normal     t
7      Rainy  Mild     High     f
8      Rainy  Cool   Normal     f
9      Sunny  Mild   Normal     f
10     Rainy  Mild   Normal     t
11  Overcast  Mild     High     t
12  Overcast   Hot   Normal     f
13     Sunny  Mild     High     t


In [4]:
print("Labels \n", y)

Labels 
 0      no
1      no
2     yes
3     yes
4     yes
5      no
6     yes
7      no
8     yes
9     yes
10    yes
11    yes
12    yes
13     no
Name: Play, dtype: object


## 2. Prior Probabilities

The prior probability P(y) represents the probability of each class occurring in the dataset. It's calculated as:

P(y) = Count of class y / Total number of samples

In [5]:
def calculate_prior_probabilities(y):
    """Return {class: P(class)}, estimated as the relative frequency of each class in y."""
    counts = y.value_counts().to_dict()
    total_samples = len(y)
    for cls in counts:
        counts[cls] /= total_samples
    return counts

# Calculate prior probabilities
priors = calculate_prior_probabilities(y)
print('Prior Probabilities P(y):')

for cls, prob in priors.items():
    print(f'P(Play={cls}) = {prob}')

Prior Probabilities P(y):
P(Play=yes) = 0.6428571428571429
P(Play=no) = 0.35714285714285715
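
As a cross-check, pandas can produce the same relative frequencies directly. The sketch below rebuilds the `Play` column by hand (an assumption made so the snippet runs without `weather.csv`) and relies on `value_counts(normalize=True)`:

```python
import pandas as pd

# The 14 labels from the dataset preview, typed in by hand for a self-contained check
y = pd.Series(
    ["no", "no", "yes", "yes", "yes", "no", "yes",
     "no", "yes", "yes", "yes", "yes", "yes", "no"],
    name="Play",
)

# normalize=True divides each class count by the total number of samples
priors = y.value_counts(normalize=True).to_dict()

for cls, prob in priors.items():
    print(f"P(Play={cls}) = {prob:.4f}")
```

With 9 `yes` and 5 `no` labels, the priors are 9/14 and 5/14, which agrees with the output above.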


## 3. Likelihood Probabilities

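The likelihood P(x|y) represents the probability of feature value x given class y. The functions below estimate it as a relative frequency, without smoothing:

P(x|y) = Count of x in class y / Count of class y

A value that never occurs with a class therefore gets probability 0 (for example, P(Outlook=Overcast|no) in the table below), which forces the whole product for that class to 0. Laplace smoothing avoids this by using:

P(x|y) = (Count of x in class y + 1) / (Count of class y + number of unique values of the feature)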

In [6]:
# Count how often each feature value occurs together with each class
def calculate_class_counts(X, y):
    class_counts = {}
    for feature in X.columns:
        class_counts[feature] = {}
        for value in X[feature].unique():
            class_counts[feature][value] = {}
            for cls in y.unique():
                # Number of rows where this feature takes `value` and the label is `cls`
                count = int(((X[feature] == value) & (y == cls)).sum())
                class_counts[feature][value][cls] = count
    return class_counts

In [7]:
# Calculate likelihood probabilities P(value | class) as relative frequencies (no smoothing)
def calculate_likelihoods(X, y):
    class_counts = calculate_class_counts(X, y)
    class_totals = {cls: sum(y == cls) for cls in y.unique()}
    
    likelihood_table = {}
    for feature in X.columns:
        likelihood_table[feature] = {}
        for value in X[feature].unique():
            likelihood_table[feature][value] = {}
            for cls in y.unique():
                likelihood_table[feature][value][cls] = (
                    class_counts[feature][value][cls] / class_totals[cls]
                )
    return likelihood_table

In [9]:
# Calculate and print likelihood tables
print('Likelihood Table')
likelihood_table = calculate_likelihoods(X, y)
likelihood_table

Likelihood Table


{'Outlook': {'Rainy': {'no': 0.6, 'yes': 0.2222222222222222},
  'Overcast': {'no': 0.0, 'yes': 0.4444444444444444},
  'Sunny': {'no': 0.4, 'yes': 0.3333333333333333}},
 'Temp': {'Hot': {'no': 0.4, 'yes': 0.2222222222222222},
  'Mild': {'no': 0.4, 'yes': 0.4444444444444444},
  'Cool': {'no': 0.2, 'yes': 0.3333333333333333}},
 'Humidity': {'High': {'no': 0.8, 'yes': 0.3333333333333333},
  'Normal': {'no': 0.2, 'yes': 0.6666666666666666}},
 'Windy': {'f': {'no': 0.4, 'yes': 0.6666666666666666},
  't': {'no': 0.6, 'yes': 0.3333333333333333}}}
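
The table above contains a zero: `Overcast` never occurs with `Play=no`, so any query with `Outlook=Overcast` gets a score of exactly 0 for `no`. A minimal sketch of add-one (Laplace) smoothing follows. The function name `calculate_smoothed_likelihoods` and the `alpha` parameter are illustrative choices rather than part of the notebook, and the dataset is rebuilt inline so the snippet is self-contained.

```python
import pandas as pd

# The weather dataset from the preview, typed in by hand (assumption: no CSV needed)
rows = [
    ("Rainy", "Hot", "High", "f", "no"),
    ("Rainy", "Hot", "High", "t", "no"),
    ("Overcast", "Hot", "High", "f", "yes"),
    ("Sunny", "Mild", "High", "f", "yes"),
    ("Sunny", "Cool", "Normal", "f", "yes"),
    ("Sunny", "Cool", "Normal", "t", "no"),
    ("Overcast", "Cool", "Normal", "t", "yes"),
    ("Rainy", "Mild", "High", "f", "no"),
    ("Rainy", "Cool", "Normal", "f", "yes"),
    ("Sunny", "Mild", "Normal", "f", "yes"),
    ("Rainy", "Mild", "Normal", "t", "yes"),
    ("Overcast", "Mild", "High", "t", "yes"),
    ("Overcast", "Hot", "Normal", "f", "yes"),
    ("Sunny", "Mild", "High", "t", "no"),
]
df = pd.DataFrame(rows, columns=["Outlook", "Temp", "Humidity", "Windy", "Play"])
X, y = df.drop(columns="Play"), df["Play"]


def calculate_smoothed_likelihoods(X, y, alpha=1.0):
    """Hypothetical helper: P(value | class) with additive smoothing (alpha=1 is Laplace)."""
    class_totals = y.value_counts()
    table = {}
    for feature in X.columns:
        values = X[feature].unique()
        table[feature] = {}
        for value in values:
            table[feature][value] = {}
            for cls in class_totals.index:
                count = ((X[feature] == value) & (y == cls)).sum()
                # Add alpha to the count and alpha * (number of distinct values) to the total
                table[feature][value][cls] = float(
                    (count + alpha) / (class_totals[cls] + alpha * len(values))
                )
    return table


smoothed = calculate_smoothed_likelihoods(X, y)
print(smoothed["Outlook"]["Overcast"])
```

With `alpha=1`, P(Outlook=Overcast|no) becomes (0 + 1) / (5 + 3) = 0.125 instead of 0, and the smoothed probabilities for each feature still sum to 1 within each class.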

## 4. Making Predictions

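To make predictions, we start from Bayes' theorem:

P(y|x) = P(y) * P(x|y) / P(x)

Naive Bayes assumes the features are conditionally independent given the class, so P(x|y) factors into a product over the individual features:

P(y|x) = P(y) * P(x₁|y) * P(x₂|y) * ... * P(xₙ|y) / P(x)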

Where:
- P(y|x) is the posterior probability of class y given features x
- P(y) is the prior probability of class y
- P(xᵢ|y) is the likelihood of feature i given class y
- P(x) is the evidence (normalizing constant)

Since P(x) is constant for all classes, we can simply compare:

P(y|x) ∝ P(y) * P(x₁|y) * P(x₂|y) * ... * P(xₙ|y)
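
As a worked example of this comparison, the sketch below scores the query (Sunny, Hot, High, t) by hand. The fractions are the counts behind the likelihood table in Section 3, hard-coded here so the snippet runs on its own. The variable names are illustrative.

```python
from math import prod

# Priors from Section 2: 9 "yes" and 5 "no" out of 14 samples
priors = {"yes": 9 / 14, "no": 5 / 14}

# P(Sunny|y), P(Hot|y), P(High|y), P(t|y) as counts over class totals
likelihoods = {
    "yes": [3 / 9, 2 / 9, 3 / 9, 3 / 9],
    "no": [2 / 5, 2 / 5, 4 / 5, 3 / 5],
}

# Unnormalized score: P(y) * product of P(x_i | y)
scores = {cls: priors[cls] * prod(likelihoods[cls]) for cls in priors}

# Dividing by the sum of the scores plays the role of P(x) and yields values that sum to 1
evidence = sum(scores.values())
posteriors = {cls: score / evidence for cls, score in scores.items()}

prediction = max(scores, key=scores.get)
print(scores)
print(posteriors)
print(prediction)
```

The score for `no` (about 0.0274) is larger than the score for `yes` (about 0.0053), so the predicted class is `no`. Normalizing changes the numbers but not which class has the largest value.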

In [21]:
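def predict_naive_bayes(X_new, priors, likelihood_table):
    """Return the class with the largest unnormalized posterior for each row of X_new."""
    predictions = []

    for _, sample in X_new.iterrows():
        posteriors = {}

        for cls in priors:
            # Start from the prior P(y), then multiply in each P(x_i | y)
            posterior = priors[cls]

            for feature, value in sample.items():
                # Feature values not seen in training are skipped (treated as a factor of 1)
                if value in likelihood_table[feature]:
                    posterior *= likelihood_table[feature][value].get(cls, 1.0)

            posteriors[cls] = posterior

        # Pick the class with the highest score
        predictions.append(max(posteriors, key=posteriors.get))

    return predictions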

# Make predictions on training data
predictions = predict_naive_bayes(X, priors, likelihood_table)

# Calculate accuracy on the training data (align predictions with y by index before comparing)
accuracy = (pd.Series(predictions, index=y.index) == y).mean()
print(f'Training Accuracy: {accuracy:.2f}')

Training Accuracy: 0.93
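
Training accuracy is measured on the same rows used to estimate the probabilities, so it tends to be optimistic. One common alternative for a dataset this small is leave-one-out evaluation: hold out each row in turn, fit on the remaining 13, and predict the held-out row. The sketch below is an illustration under stated assumptions rather than part of the original notebook. `fit` and `predict_one` are hypothetical helpers that mirror the unsmoothed estimates and the prediction rule used above, and the data is rebuilt inline.

```python
import pandas as pd

rows = [
    ("Rainy", "Hot", "High", "f", "no"),
    ("Rainy", "Hot", "High", "t", "no"),
    ("Overcast", "Hot", "High", "f", "yes"),
    ("Sunny", "Mild", "High", "f", "yes"),
    ("Sunny", "Cool", "Normal", "f", "yes"),
    ("Sunny", "Cool", "Normal", "t", "no"),
    ("Overcast", "Cool", "Normal", "t", "yes"),
    ("Rainy", "Mild", "High", "f", "no"),
    ("Rainy", "Cool", "Normal", "f", "yes"),
    ("Sunny", "Mild", "Normal", "f", "yes"),
    ("Rainy", "Mild", "Normal", "t", "yes"),
    ("Overcast", "Mild", "High", "t", "yes"),
    ("Overcast", "Hot", "Normal", "f", "yes"),
    ("Sunny", "Mild", "High", "t", "no"),
]
df = pd.DataFrame(rows, columns=["Outlook", "Temp", "Humidity", "Windy", "Play"])
X, y = df.drop(columns="Play"), df["Play"]


def fit(X, y):
    """Hypothetical helper: priors and unsmoothed P(value | class) as nested dicts."""
    priors = y.value_counts(normalize=True).to_dict()
    likelihoods = {}
    for feature in X.columns:
        likelihoods[feature] = {}
        for value in X[feature].unique():
            likelihoods[feature][value] = {
                cls: float(((X[feature] == value) & (y == cls)).sum() / (y == cls).sum())
                for cls in priors
            }
    return priors, likelihoods


def predict_one(sample, priors, likelihoods):
    """Hypothetical helper: class with the largest prior-times-likelihood product."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for feature, value in sample.items():
            # Values never seen in the training fold are skipped
            if value in likelihoods[feature]:
                score *= likelihoods[feature][value][cls]
        scores[cls] = score
    return max(scores, key=scores.get)


correct = 0
for i in df.index:
    train = df.index != i  # boolean mask selecting every row except row i
    priors, likelihoods = fit(X[train], y[train])
    correct += predict_one(X.loc[i], priors, likelihoods) == y.loc[i]

loo_accuracy = correct / len(df)
print(f"Leave-one-out accuracy: {loo_accuracy:.2f}")
```

Because each prediction is made for a row the model has not seen, this estimate is usually lower than the training accuracy and gives a more cautious picture of how the classifier generalizes.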


## 5. Testing with New Examples

Let's test our classifier with some new weather conditions.

In [24]:
# Test some example queries
test_queries = pd.DataFrame([
    ['Rainy', 'Mild', 'Normal', 't'],
    ['Overcast', 'Cool', 'Normal', 't'],
    ['Sunny', 'Hot', 'High', 't']
], columns=X.columns)

predictions = predict_naive_bayes(test_queries, priors, likelihood_table)

print('Test Predictions:')
for query, pred in zip(test_queries.values, predictions):
    print(f'Query: {query} → Prediction: {pred}')


Test Predictions:
Query: ['Rainy' 'Mild' 'Normal' 't'] → Prediction: yes
Query: ['Overcast' 'Cool' 'Normal' 't'] → Prediction: yes
Query: ['Sunny' 'Hot' 'High' 't'] → Prediction: no
