# Naive Bayes Classifier from Scratch

## Overview 📊
In this project, I’m implementing a **Naive Bayes** classifier from scratch using Python, demonstrating how this probabilistic model works under the hood. The model uses **Bayes' Theorem** to predict the class of a given sample based on conditional probabilities.

### Key Concepts:
- **Naive Bayes Classifier**: A classification algorithm based on applying Bayes' Theorem with strong (naive) independence assumptions between the features.
  
- **Bayes' Theorem**: A principle that relates current evidence to prior beliefs, providing a way to calculate posterior probabilities.
  
- **Conditional Probability**: The probability of an event occurring given that another event has occurred.
  
- **Maximum Likelihood Estimation (MLE)**: A method for estimating the parameters of a statistical model that maximizes the likelihood of the observed data.

---

## Objective 🎯
The goal of this project is to:
1. Implement a custom Naive Bayes classifier.
   
2. Calculate prior probabilities and likelihoods for each class and feature.
   
3. Make predictions by computing posterior probabilities and selecting the class with the highest probability.

---

## Naive Bayes Explanation 🧠

### Bayes' Theorem
Naive Bayes classifiers rely on **Bayes' Theorem** to calculate the posterior probability of a class given the features:

$$
P(C | X) = \frac{P(X | C) P(C)}{P(X)}
$$

Where:
- \( P(C | X) \): The probability of class \(C\) given features \(X\) (posterior probability).
- \( P(X | C) \): The likelihood of observing the features \(X\) given class \(C\).
- \( P(C) \): The prior probability of class \(C\).
- \( P(X) \): The probability of the features \(X\), also called the evidence. It is the same for every class, so it can be ignored when comparing classes.
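
As a quick numeric check of the formula, the sketch below applies Bayes' Theorem to a two-class problem. The class names and all probabilities are made-up values chosen for illustration; they are not taken from this project's data.

```python
# Hypothetical priors P(C) and likelihoods P(X | C) for one observed feature X.
priors = {"spam": 0.2, "ham": 0.8}
likelihoods = {"spam": 0.6, "ham": 0.05}

# Numerator of Bayes' Theorem for each class: P(X | C) * P(C)
joint = {c: likelihoods[c] * priors[c] for c in priors}

# Evidence P(X): the sum of the numerators over all classes
evidence = sum(joint.values())

# Posterior P(C | X) for each class
posteriors = {c: joint[c] / evidence for c in joint}

for c, p in posteriors.items():
    print(f"P({c} | X) = {p:.2f}")
```

Dividing by the evidence rescales the scores so they sum to 1, but it does not change which class scores highest.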

### Naive Assumption
In Naive Bayes, we assume that the features \(X = (x_1, x_2, ..., x_n)\) are **conditionally independent** given the class \(C\). This simplifies the likelihood calculation:

$$
P(X | C) = P(x_1 | C) \cdot P(x_2 | C) \cdot ... \cdot P(x_n | C)
$$

Thus, the posterior probability becomes:

$$
P(C | X) = \frac{P(C) \prod_{i=1}^{n} P(x_i | C)}{P(X)}
$$

Where:
- \( P(C) \) is the prior probability of the class.
- \( P(x_i | C) \) is the likelihood of each feature \( x_i \) given the class \( C \).
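
To make the factorization concrete, here is a small sketch that scores one sample (Middle-aged, Low income, Master) by multiplying the prior with one likelihood per feature. The counts are read off the ten-row toy dataset defined later in this document; the dictionary layout and variable names are only illustrative.

```python
from fractions import Fraction

# Class sizes and per-class counts of the sample's feature values,
# counted by hand from the toy dataset (6 "Buy" rows, 4 "Don't Buy" rows).
class_counts = {"Buy": 6, "Don't Buy": 4}
value_counts = {
    "Buy": {"Age=Middle-aged": 2, "Income=Low": 2, "Education=Master": 2},
    "Don't Buy": {"Age=Middle-aged": 1, "Income=Low": 3, "Education=Master": 2},
}
total = sum(class_counts.values())

scores = {}
for c, n_c in class_counts.items():
    score = Fraction(n_c, total)           # prior P(C)
    for count in value_counts[c].values():
        score *= Fraction(count, n_c)      # one factor P(x_i | C) per feature
    scores[c] = score

for c, s in scores.items():
    print(f"{c}: {s} = {float(s):.4f}")
```

The unnormalized score is 1/45 for "Buy" and 3/80 for "Don't Buy", so the sample is assigned to "Don't Buy".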

### Calculating the Prior and Likelihood

1. **Prior Probability \( P(C) \)**: The prior probability of a class is the proportion of samples in that class in the dataset.

   $$ P(C) = \frac{\text{count of class C}}{\text{total number of samples}} $$

2. **Likelihood \( P(x_i | C) \)**: The likelihood of each feature given the class is calculated using the frequency of feature values in the dataset.

   $$ P(x_i | C) = \frac{\text{count of feature value } x_i \text{ in class C}}{\text{count of class C}} $$

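> Note: A feature value that never occurs with a class in the training data gets a likelihood of zero, which wipes out the whole product. **Laplace smoothing** (adding a small count to every feature value) is the usual remedy; the implementation below uses a simpler fallback and substitutes a small constant (`1e-6`) for unseen values.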
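
For reference, a minimal sketch of add-one (Laplace) smoothing for a single categorical feature could look like this. The function name `laplace_likelihood`, the `alpha` parameter, and the counts are assumptions made for illustration.

```python
def laplace_likelihood(value_count, class_count, n_values, alpha=1.0):
    """Smoothed P(x_i | C) = (count + alpha) / (class_count + alpha * n_values)."""
    return (value_count + alpha) / (class_count + alpha * n_values)

# Hypothetical counts of one feature's values within a single class of 6 samples.
# "Old" was never observed in this class, so its raw likelihood would be 0.
counts = {"Young": 4, "Middle-aged": 2, "Old": 0}
class_count = sum(counts.values())

smoothed = {
    value: laplace_likelihood(n, class_count, len(counts))
    for value, n in counts.items()
}
print(smoothed)
```

With `alpha=1`, the unseen value gets probability 1/9 instead of 0, and the smoothed probabilities still sum to 1.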

---

## Implementation 🛠️

Below is the code for implementing the Naive Bayes classifier from scratch. The workflow has three parts:
1. **Fit the model**: The `fit` method of the `NaiveBayes` class learns prior probabilities and likelihoods from the training data.
   
2. **Predict**: The `predict` method classifies new data using Bayes' Theorem.
   
3. **Calculate Accuracy**: The script compares the predictions with the true labels. This step runs outside the class, which has no accuracy method of its own.

---

### Model Fitting and Prediction

1. **Training the Model**: The `fit` method learns the prior probabilities \(P(C)\) and likelihoods \(P(x_i | C)\) for each feature \(x_i\) in the dataset. These probabilities are stored for later use in predictions.

2. **Making Predictions**: For a given test sample, the `predict` method computes a score proportional to the posterior probability for each class and chooses the class with the highest score. The evidence \(P(X)\) is the same for every class, so it is left out:

   $$ \hat{y} = \arg\max_C P(C) \prod_{i=1}^{n} P(x_i | C) $$
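
In practice the product is evaluated as a sum of logarithms, because multiplying many small probabilities underflows to zero in floating point. The sketch below illustrates this with two hypothetical classes `A` and `B` and made-up likelihood values.

```python
import math

# Hypothetical per-feature likelihoods for two classes over 100 features.
likelihoods = {"A": [1e-5] * 100, "B": [2e-5] * 100}
priors = {"A": 0.5, "B": 0.5}

# Direct products underflow to 0.0 for both classes, so they cannot be compared.
products = {c: priors[c] * math.prod(likelihoods[c]) for c in priors}
print(products)

# Log scores stay finite: log P(C) + sum_i log P(x_i | C)
log_scores = {
    c: math.log(priors[c]) + sum(math.log(p) for p in likelihoods[c])
    for c in priors
}
best = max(log_scores, key=log_scores.get)
print(log_scores, best)
```

Because the logarithm is monotonic, the class with the highest log score is also the class with the highest posterior.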

---

## Let's Code Naive Bayes from Scratch

```python
import pandas as pd

# Build a small hand-written dataset (no random data is involved)
# Features: Age (Young, Middle-aged, Old), Income (Low, High), Education (High School, Bachelor, Master)
data = {
    'Age': ['Young', 'Middle-aged', 'Old', 'Young', 'Middle-aged', 'Old', 'Young', 'Middle-aged', 'Old', 'Young'],
    'Income': ['Low', 'High', 'High', 'Low', 'High', 'Low', 'High', 'Low', 'Low', 'High'],
    'Education': ['High School', 'Bachelor', 'Master', 'Bachelor', 'Master', 'High School', 'High School', 'Master', 'Bachelor', 'Master'],
    'Class': ['Buy', 'Buy', "Don't Buy", 'Buy', 'Buy', "Don't Buy", 'Buy', "Don't Buy", "Don't Buy", 'Buy']
}

df = pd.DataFrame(data)

# Show the first five rows of the dataset
df.head()
```

|   | Age | Income | Education | Class |
|---|---|---|---|---|
| 0 | Young | Low | High School | Buy |
| 1 | Middle-aged | High | Bachelor | Buy |
| 2 | Old | High | Master | Don't Buy |
| 3 | Young | Low | Bachelor | Buy |
| 4 | Middle-aged | High | Master | Buy |


```python
import numpy as np

class NaiveBayes:
    def __init__(self):
        self.prior_probs = {}  # P(C): prior probability of each class
        self.likelihoods = {}  # P(x_i | C): likelihood of each feature value given a class

    def fit(self, X, y):
        """
        Train the Naive Bayes model by calculating prior probabilities and likelihoods.

        X: DataFrame of input features (independent variables)
        y: Series of target labels (dependent variable)
        """
        total_samples = len(y)

        # P(C): frequency of each class divided by the total number of samples
        self.prior_probs = y.value_counts() / total_samples

        self.likelihoods = {}

        for c in y.unique():
            # Subset of rows whose label equals the current class
            class_data = X[y == c]
            self.likelihoods[c] = {}

            for feature in X.columns:
                # P(x_i | C): relative frequency of each feature value within this class
                feature_values = class_data[feature].value_counts() / len(class_data)
                self.likelihoods[c][feature] = feature_values

    def predict(self, X):
        """
        Make predictions for the given input features based on the trained model.

        X: DataFrame of input features (independent variables) to classify
        """
        predictions = []

        for _, row in X.iterrows():
            posteriors = {}  # Log of the unnormalized posterior for each class

            for c in self.prior_probs.index:
                # Start from log P(C); working in log space avoids floating-point underflow
                posterior = np.log(self.prior_probs[c])

                for feature in X.columns:
                    feature_value = row[feature]
                    # Look up P(x_i | C). Values never seen with this class fall back to a
                    # small constant (1e-6) so the log is defined. This is a fixed floor,
                    # not true Laplace smoothing.
                    likelihood = self.likelihoods[c].get(feature, {}).get(feature_value, 1e-6)
                    posterior += np.log(likelihood)

                posteriors[c] = posterior

            # Choose the class with the highest log score
            predictions.append(max(posteriors, key=posteriors.get))

        return predictions

# Training the model and making predictions
X = df[['Age', 'Income', 'Education']]  # Input features
y = df['Class']  # Target variable (class labels)

# Encode each categorical feature as integer category codes.
# The model only counts values, so it would work on the raw strings as well.
X = X.apply(lambda col: col.astype('category').cat.codes)

nb = NaiveBayes()
nb.fit(X, y)

# Predict on the same rows the model was trained on
predictions = nb.predict(X)

# Accuracy: fraction of correct predictions (measured on the training data here)
accuracy = np.mean(np.array(predictions) == y.to_numpy())
print(f"Accuracy: {accuracy:.4f}")
```

Accuracy: 1.0000


## When to Use Naive Bayes 📈

- **Text Classification**: Great for tasks like spam detection and sentiment analysis, where features (e.g., words) are abundant.

- **Binary and Multi-Class Classification**: Works well for both yes/no and multi-category problems.
  
- **Independent Features**: Best when features are independent or weakly dependent.
  
- **Large Datasets**: Efficient for big datasets with categorical features.
  
- **Real-Time Prediction**: Ideal for fast predictions in systems like recommendation engines.

---

## Pros of Naive Bayes ✅

- **Simple and Fast**: Easy to implement and quick to train and predict.
  
- **Works with Large Datasets**: Training is a single counting pass over the data, so cost grows linearly with the number of samples and features.
  
- **Good with Categorical Features**: Perfect for tasks like text classification.
  
- **Handles Missing Data**: Because each feature contributes its own factor, a missing feature value can simply be left out of the product at prediction time.
  
- **Effective for Text Data**: Great for high-dimensional data like words in documents.

---

## Cons of Naive Bayes ❌

- **Independence Assumption**: Assumes all features are conditionally independent given the class, so it performs poorly and tends to produce overconfident probabilities when features are strongly correlated.
  
- **Limited Complexity**: Simple model, so it may miss complex patterns.
  
- **Issues with Continuous Data**: Count-based likelihoods do not apply directly to continuous features; these must be discretized or modeled with a distribution (e.g., a Gaussian per class).
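
For the continuous-feature case, one common option is to model each feature with a per-class normal distribution (Gaussian Naive Bayes). The sketch below is not part of the classifier above; the helper names `fit_gaussian` and `gaussian_log_likelihood` and the age values are assumptions made for illustration.

```python
import math

def fit_gaussian(values):
    """Return the mean and (population) variance of a list of numbers."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var

def gaussian_log_likelihood(x, mean, var):
    """Log of the normal density N(x; mean, var), used in place of a count-based P(x_i | C)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

# Hypothetical numeric ages observed for each class
ages_by_class = {"Buy": [22, 25, 31, 28], "Don't Buy": [55, 61, 48, 66]}
params = {c: fit_gaussian(values) for c, values in ages_by_class.items()}

# Log-likelihood of a new age under each class's fitted Gaussian
x = 30
log_liks = {c: gaussian_log_likelihood(x, *params[c]) for c in params}
print(log_liks)
```

These log-likelihoods can be added to the log prior in the same way as the categorical terms, so an age of 30 contributes far more evidence for "Buy" than for "Don't Buy" in this made-up example.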

---

## Conclusion 🎯

Naive Bayes is fast, simple, and efficient for tasks like text classification, large datasets with categorical data, and real-time predictions. However, its assumption of feature independence can limit its performance when features are correlated.