# Naïve Bayes: Capstone Project Report

**Student Name:** Zafir Shaikh  
**Course:** Machine Learning AI  
**Institution:** Humber IGS  
**Course Code:** BINF_5507  
**Date:** July 21, 2025

## 1. Introduction

Naïve Bayes is a family of simple yet powerful probabilistic classifiers based on applying Bayes’ Theorem with the “naïve” assumption of feature independence. It is widely used in text classification tasks such as spam filtering, sentiment analysis, and document categorization due to its speed, simplicity, and surprisingly strong performance.

Despite its simplicity, Naïve Bayes often performs competitively with more complex algorithms, particularly when the independence assumption holds at least approximately. Because it estimates each feature's distribution separately, it needs relatively few parameters, which makes it effective on high-dimensional feature spaces and small datasets.

## 2. How It Works

Naïve Bayes applies **Bayes’ Theorem** as follows:

$$
P(C|X) = \frac{P(X|C) \cdot P(C)}{P(X)}
$$

Where:
- $P(C|X)$: Posterior probability of class $C$ given predictor $X$  
- $P(X|C)$: Likelihood of predictor $X$ given class $C$  
- $P(C)$: Prior probability of class $C$  
- $P(X)$: Evidence, i.e. the marginal probability of predictor $X$, which acts as a normalizing constant across classes
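
With several features $X = (x_1, \dots, x_n)$, the "naïve" independence assumption lets the likelihood factor as $P(X|C) = \prod_{i} P(x_i|C)$. The sketch below works through the formula by hand for a toy spam filter. The priors, the word likelihoods, and the `posterior` helper are hypothetical values and names chosen only for illustration; they are not taken from any dataset.

```python
from math import prod

# Hypothetical toy numbers, chosen only for illustration.
priors = {"spam": 0.4, "ham": 0.6}  # P(C)

# P(word appears | class), assumed independent across words (the naive assumption).
likelihoods = {
    "spam": {"free": 0.5, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.5},
}


def posterior(observed_words, priors, likelihoods):
    """Return P(C | X) for every class C via Bayes' Theorem."""
    # Numerator of Bayes' Theorem: P(C) * product of P(x_i | C).
    joint = {
        c: priors[c] * prod(likelihoods[c][w] for w in observed_words)
        for c in priors
    }
    # Denominator P(X): sum of the numerators over all classes.
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}


post = posterior(["free"], priors, likelihoods)
print(post)
```

For a message containing only the word "free", the numerators are $0.4 \times 0.5 = 0.20$ for spam and $0.6 \times 0.1 = 0.06$ for ham, so the spam posterior is $0.20 / 0.26 \approx 0.77$. Since $P(X)$ is the same for every class, it can be skipped when only the most likely class is needed.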

```python
# Example: Applying Naive Bayes on sample data
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train Naive Bayes model
model = GaussianNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
```
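
To make the connection between the formula and the classifier more concrete, here is a minimal from-scratch sketch of a Gaussian Naïve Bayes model using only the standard library. It is a simplified illustration, not scikit-learn's implementation: the function names `fit_gaussian_nb` and `predict_one`, the toy data, and the small constant added to each variance are all assumptions of this sketch.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    params = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A tiny constant keeps variances strictly positive (assumption of this sketch).
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(zip(*rows), means)
        ]
        params[label] = (n / len(X), means, variances)
    return params


def predict_one(params, x):
    """Return the class with the highest log P(C) + sum_i log P(x_i | C)."""
    def log_score(prior, means, variances):
        # Log of the Gaussian density, summed over features (naive independence).
        log_lik = sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
            for xi, m, var in zip(x, means, variances)
        )
        return math.log(prior) + log_lik

    return max(params, key=lambda c: log_score(*params[c]))


# Hypothetical toy data: two features, two well-separated classes.
X_toy = [[1.0, 2.1], [1.2, 1.9], [0.9, 2.0], [3.0, 4.1], [3.2, 3.9], [2.9, 4.0]]
y_toy = ["a", "a", "a", "b", "b", "b"]

params = fit_gaussian_nb(X_toy, y_toy)
print(predict_one(params, [1.1, 2.0]))
print(predict_one(params, [3.1, 4.0]))
```

Working in log space avoids numerical underflow when many small likelihoods are multiplied together, and the evidence $P(X)$ is omitted because it does not change which class scores highest.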