# Naive Bayes From Scratch

Naive Bayes is a probabilistic classifier based on Bayes' theorem with the "naive" assumption of conditional independence between features.

## Key Concepts:
- **Bayes' Theorem**: P(y|X) = P(X|y) * P(y) / P(X)
- **Naive Assumption**: Features are conditionally independent given the class
- **Three Variants**: Gaussian, Multinomial, Bernoulli
- **Fast Training**: Fitting reduces to counting and computing per-class means and variances
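
To make Bayes' theorem concrete, here is a small worked example. The two-class spam filter and all of its probabilities are hypothetical numbers chosen for illustration, not values from any dataset.

```python
# Hypothetical numbers for a two-class spam filter (illustration only)
prior = {"spam": 0.3, "ham": 0.7}        # P(y)
likelihood = {"spam": 0.8, "ham": 0.1}   # P(X|y): chance the word "offer" appears

# Evidence P(X) = sum over classes of P(X|y) * P(y)
evidence = sum(likelihood[c] * prior[c] for c in prior)

# Posterior P(y|X) from Bayes' theorem
posterior = {c: likelihood[c] * prior[c] / evidence for c in prior}

for c, p in posterior.items():
    print(f"P({c} | 'offer') = {p:.3f}")
```

Although spam is the less likely class a priori, seeing the word shifts most of the posterior mass onto it.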

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris, fetch_20newsgroups
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer

## 1. Mathematical Foundation

### Bayes' Theorem:
$$P(y|X) = \frac{P(X|y) \cdot P(y)}{P(X)}$$

### Naive Bayes Assumption:
$$P(X|y) = P(x_1|y) \cdot P(x_2|y) \cdots P(x_n|y) = \prod_{i=1}^{n} P(x_i|y)$$

### Classification Rule:
$$\hat{y} = \arg\max_y P(y) \prod_{i=1}^{n} P(x_i|y)$$
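
In practice the product in the classification rule is usually evaluated as a sum of logarithms. The logarithm is monotonic, so the arg max is unchanged, but the sum does not underflow. The sketch below illustrates this with 1,000 randomly generated likelihood values; the values are hypothetical and serve only to show the numerical effect.

```python
import numpy as np

rng = np.random.default_rng(0)
# 1,000 hypothetical per-feature likelihoods, each between 0.01 and 0.1
likelihoods = rng.uniform(0.01, 0.1, size=1000)

direct_product = np.prod(likelihoods)   # underflows to 0.0 in float64
log_sum = np.sum(np.log(likelihoods))   # stays finite

print(direct_product)
print(log_sum)
```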

## 2. Gaussian Naive Bayes

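For continuous features, each feature $x_i$ is assumed to follow a Gaussian (normal) distribution within each class $y$, with its own mean $\mu_{y,i}$ and variance $\sigma_{y,i}^2$:
$$P(x_i|y) = \frac{1}{\sqrt{2\pi\sigma_{y,i}^2}} \exp\left(-\frac{(x_i - \mu_{y,i})^2}{2\sigma_{y,i}^2}\right)$$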
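
For points far from the mean, the density itself can underflow to zero, so it is often safer to evaluate its logarithm directly. The helper `gaussian_log_pdf` below is a hypothetical name used for this sketch, and the sample point is chosen only to trigger the underflow.

```python
import numpy as np

def gaussian_log_pdf(x, mean, var):
    """Log of the Gaussian density, evaluated without forming the density itself."""
    return -0.5 * np.log(2 * np.pi * var) - (x - mean) ** 2 / (2 * var)

x, mean, var = 50.0, 0.0, 1.0   # a point 50 standard deviations from the mean

log_pdf = gaussian_log_pdf(x, mean, var)   # finite, roughly -1250.9
pdf = np.exp(log_pdf)                      # underflows to 0.0

print(pdf, log_pdf)
```

Taking `np.log` of the underflowed density would lose all information about how far the point is from the mean, while the log-density keeps it.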

In [None]:
class GaussianNB:
    def __init__(self):
        self.classes = None
        self.class_priors = {}
        self.means = {}
        self.variances = {}
    
    def fit(self, X, y):
        """
        Fit Gaussian Naive Bayes
        
        Parameters:
        -----------
        X : array-like, shape (n_samples, n_features)
        y : array-like, shape (n_samples,)
        """
        self.classes = np.unique(y)
        n_samples = len(y)
        
        # Calculate prior probabilities and parameters for each class
        for c in self.classes:
            X_c = X[y == c]
            
            # Prior probability: P(y=c)
            self.class_priors[c] = len(X_c) / n_samples
            
            # Mean and variance for each feature
            self.means[c] = np.mean(X_c, axis=0)
            self.variances[c] = np.var(X_c, axis=0) + 1e-9  # Add small value to avoid division by zero
        
        return self
    
    def _gaussian_pdf(self, x, mean, var):
        """
        Calculate Gaussian probability density function
        """
        return (1 / np.sqrt(2 * np.pi * var)) * np.exp(-((x - mean) ** 2) / (2 * var))
    
    def _predict_single(self, x):
        """
        Predict class for a single sample
        """
        posteriors = []
        
        for c in self.classes:
            # Start with prior probability (in log space to avoid underflow)
            posterior = np.log(self.class_priors[c])
            
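            # Add the Gaussian log-likelihood of every feature, evaluated directly
            # in log space: forming the density first can underflow to 0, and the
            # 1e-10 guard would then give all distant classes the same score
            mean, var = self.means[c], self.variances[c]
            posterior += np.sum(-0.5 * np.log(2 * np.pi * var) - (x - mean) ** 2 / (2 * var))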
            
            posteriors.append(posterior)
        
        # Return class with highest posterior probability
        return self.classes[np.argmax(posteriors)]
    
    def predict(self, X):
        """
        Predict classes for multiple samples
        """
        return np.array([self._predict_single(x) for x in X])
    
    def score(self, X, y):
        """
        Calculate accuracy
        """
        return np.mean(self.predict(X) == y)
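
The class above returns only the arg max of the per-class log scores. If normalized posterior probabilities are wanted, the scores can be converted with the log-sum-exp trick. This is a minimal sketch: `log_joint_to_proba` is a hypothetical helper, not part of the class, and the scores are made-up values.

```python
import numpy as np

def log_joint_to_proba(log_joint):
    """Convert per-class scores log P(y) + sum_i log P(x_i|y) into P(y|X)."""
    log_joint = np.asarray(log_joint, dtype=float)
    # Subtract the maximum so the largest term becomes exp(0) = 1;
    # the shift cancels in the ratio below
    shifted = log_joint - np.max(log_joint)
    weights = np.exp(shifted)
    return weights / np.sum(weights)

# Hypothetical log joint scores for three classes
scores = [-1050.0, -1052.0, -1060.0]
proba = log_joint_to_proba(scores)
print(proba)
```

Exponentiating the raw scores would give `0.0` for every class here, so the shift is what makes the normalization possible.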

## 3. Testing Gaussian NB on Iris Dataset

In [None]:
# Load Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train Gaussian NB
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Evaluate
train_acc = gnb.score(X_train, y_train)
test_acc = gnb.score(X_test, y_test)

print("Gaussian Naive Bayes on Iris Dataset")
print(f"Train Accuracy: {train_acc:.4f}")
print(f"Test Accuracy: {test_acc:.4f}")

## 4. Multinomial Naive Bayes

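For discrete count features (e.g., word counts in text), the smoothed probability of feature $i$ in class $y$ is estimated as:
$$\hat{\theta}_{yi} = \frac{N_{yi} + \alpha}{N_y + \alpha n}$$

where $N_{yi}$ is the total count of feature $i$ over the training samples of class $y$, $N_y = \sum_i N_{yi}$ is the total count of all features in class $y$, $n$ is the number of features, and $\alpha$ is the smoothing parameter. The likelihood of a sample with counts $x_i$ is then proportional to $\prod_i \hat{\theta}_{yi}^{x_i}$.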
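
A small worked example of the smoothing formula, using hypothetical counts for a single class over a four-word vocabulary:

```python
import numpy as np

# Hypothetical word counts for one class over a 4-word vocabulary
feature_counts = np.array([3, 0, 1, 0])   # N_yi
alpha = 1.0                               # smoothing parameter
n_features = feature_counts.size          # n
total_count = feature_counts.sum()        # N_y

unsmoothed = feature_counts / total_count
smoothed = (feature_counts + alpha) / (total_count + alpha * n_features)

print(unsmoothed)   # contains zeros for words never seen in this class
print(smoothed)     # every entry is positive and the entries sum to 1
```

Without smoothing, a single unseen word would drive the whole product of likelihoods to zero for this class.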

In [None]:
class MultinomialNB:
    def __init__(self, alpha=1.0):
        """
        Initialize Multinomial Naive Bayes
        
        Parameters:
        -----------
        alpha : float
            Additive (Laplace/Lidstone) smoothing parameter (default=1.0)
        """
        self.alpha = alpha
        self.classes = None
        self.class_priors = {}
        self.feature_probs = {}
    
    def fit(self, X, y):
        """
        Fit Multinomial Naive Bayes
        """
        self.classes = np.unique(y)
        n_samples, n_features = X.shape
        
        for c in self.classes:
            X_c = X[y == c]
            
            # Prior probability
            self.class_priors[c] = len(X_c) / n_samples
            
            # Feature probabilities with Laplace smoothing
            feature_counts = np.sum(X_c, axis=0)
            total_count = np.sum(feature_counts)
            self.feature_probs[c] = (feature_counts + self.alpha) / (total_count + self.alpha * n_features)
        
        return self
    
    def _predict_single(self, x):
        """
        Predict class for a single sample
        """
        posteriors = []
        
        for c in self.classes:
            # Log prior
            posterior = np.log(self.class_priors[c])
            
            # Log likelihood
            posterior += np.sum(x * np.log(self.feature_probs[c] + 1e-10))
            
            posteriors.append(posterior)
        
        return self.classes[np.argmax(posteriors)]
    
    def predict(self, X):
        return np.array([self._predict_single(x) for x in X])
    
    def score(self, X, y):
        return np.mean(self.predict(X) == y)
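
The per-sample loop in `predict` can also be written as a single matrix product, because the multinomial log-likelihood is linear in the counts. The sketch below is one possible vectorized variant under that observation, not a replacement for the class above; the function names and the toy corpus are hypothetical.

```python
import numpy as np

def fit_multinomial(X, y, alpha=1.0):
    """Return classes, log priors, and smoothed log feature probabilities."""
    classes = np.unique(y)
    log_prior = np.log(np.array([(y == c).mean() for c in classes]))
    counts = np.array([X[y == c].sum(axis=0) for c in classes])   # (n_classes, n_features)
    theta = (counts + alpha) / (counts.sum(axis=1, keepdims=True) + alpha * X.shape[1])
    return classes, log_prior, np.log(theta)

def predict_multinomial(X, classes, log_prior, log_theta):
    """Score all samples against all classes with one matrix product."""
    log_joint = X @ log_theta.T + log_prior   # (n_samples, n_classes)
    return classes[np.argmax(log_joint, axis=1)]

# Hypothetical toy corpus: columns are counts of ["goal", "match", "vote", "party"]
X_toy = np.array([[3, 1, 0, 0],
                  [2, 2, 0, 1],
                  [0, 0, 2, 3],
                  [0, 1, 3, 1]])
y_toy = np.array([0, 0, 1, 1])   # 0 = sports, 1 = politics

model = fit_multinomial(X_toy, y_toy)
preds = predict_multinomial(np.array([[1, 2, 0, 0], [0, 0, 1, 2]]), *model)
print(preds)
```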

## 5. Bernoulli Naive Bayes

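For binary features, each feature contributes $P(i|y)$ when it is present ($x_i = 1$) and $1 - P(i|y)$ when it is absent ($x_i = 0$), where $P(i|y)$ is the probability that feature $i$ equals 1 in class $y$:
$$P(x_i|y) = P(i|y)\,x_i + (1 - P(i|y))(1 - x_i)$$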
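
The formula can be checked numerically. In the sketch below the probabilities `p` and the sample `x` are hypothetical; note that absent features (zeros) contribute to the likelihood too, which is the main difference from the multinomial model.

```python
import numpy as np

p = np.array([0.9, 0.2, 0.5])   # hypothetical P(feature_i = 1 | y) for one class
x = np.array([1, 0, 0])         # binary sample

# Per-feature likelihood: p where x_i = 1, (1 - p) where x_i = 0
per_feature = p * x + (1 - p) * (1 - x)

# Equivalent log-likelihood, vectorized over features
log_likelihood = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

print(per_feature)
print(log_likelihood)
```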

In [None]:
class BernoulliNB:
    def __init__(self, alpha=1.0):
        """
        Initialize Bernoulli Naive Bayes
        
        Parameters:
        -----------
        alpha : float
            Additive smoothing parameter (default=1.0)
        """
        self.alpha = alpha
        self.classes = None
        self.class_priors = {}
        self.feature_probs = {}
    
    def fit(self, X, y):
        """
        Fit Bernoulli Naive Bayes
        """
        self.classes = np.unique(y)
        n_samples, n_features = X.shape
        
        for c in self.classes:
            X_c = X[y == c]
            n_c = len(X_c)
            
            # Prior probability
            self.class_priors[c] = n_c / n_samples
            
            # Feature probabilities with smoothing
            # P(feature=1|class)
            self.feature_probs[c] = (np.sum(X_c, axis=0) + self.alpha) / (n_c + 2 * self.alpha)
        
        return self
    
    def _predict_single(self, x):
        """
        Predict class for a single sample
        """
        posteriors = []
        
        for c in self.classes:
            # Log prior
            posterior = np.log(self.class_priors[c])
            
            # Log likelihood
            for i in range(len(x)):
                p = self.feature_probs[c][i]
                if x[i] == 1:
                    posterior += np.log(p + 1e-10)
                else:
                    posterior += np.log(1 - p + 1e-10)
            
            posteriors.append(posterior)
        
        return self.classes[np.argmax(posteriors)]
    
    def predict(self, X):
        return np.array([self._predict_single(x) for x in X])
    
    def score(self, X, y):
        return np.mean(self.predict(X) == y)

## 6. Comparison with Scikit-learn

In [None]:
from sklearn.naive_bayes import GaussianNB as SklearnGaussianNB

# Train sklearn Gaussian NB
sklearn_gnb = SklearnGaussianNB()
sklearn_gnb.fit(X_train, y_train)

sklearn_train_acc = sklearn_gnb.score(X_train, y_train)
sklearn_test_acc = sklearn_gnb.score(X_test, y_test)

print("\nGaussian NB Comparison:")
print(f"{'Method':<20} {'Train Acc':<12} {'Test Acc':<12}")
print("-" * 44)
print(f"{'Our Gaussian NB':<20} {train_acc:<12.4f} {test_acc:<12.4f}")
print(f"{'Sklearn Gaussian NB':<20} {sklearn_train_acc:<12.4f} {sklearn_test_acc:<12.4f}")

## 7. Testing on Binary Data (Bernoulli NB)

In [None]:
# Create binary dataset
np.random.seed(42)
X_binary = np.random.randint(0, 2, size=(200, 10))
y_binary = (np.sum(X_binary, axis=1) > 5).astype(int)

X_train_bin, X_test_bin, y_train_bin, y_test_bin = train_test_split(
    X_binary, y_binary, test_size=0.3, random_state=42
)

# Train Bernoulli NB
bnb = BernoulliNB()
bnb.fit(X_train_bin, y_train_bin)

bnb_train_acc = bnb.score(X_train_bin, y_train_bin)
bnb_test_acc = bnb.score(X_test_bin, y_test_bin)

print("\nBernoulli Naive Bayes on Binary Data")
print(f"Train Accuracy: {bnb_train_acc:.4f}")
print(f"Test Accuracy: {bnb_test_acc:.4f}")

## 8. Visualization: Feature Distributions

In [None]:
# Visualize learned Gaussian distributions for first feature
feature_idx = 0
x_range = np.linspace(X[:, feature_idx].min(), X[:, feature_idx].max(), 100)

plt.figure(figsize=(12, 4))

for i, c in enumerate(gnb.classes):
    plt.subplot(1, 3, i+1)
    
    # Plot histogram of actual data
    plt.hist(X[y == c, feature_idx], bins=15, density=True, alpha=0.5, label='Actual')
    
    # Plot learned Gaussian distribution
    mean = gnb.means[c][feature_idx]
    var = gnb.variances[c][feature_idx]
    pdf = gnb._gaussian_pdf(x_range, mean, var)
    plt.plot(x_range, pdf, 'r-', linewidth=2, label='Learned Gaussian')
    
    plt.xlabel(iris.feature_names[feature_idx], fontsize=11)
    plt.ylabel('Density', fontsize=11)
    plt.title(f'Class: {iris.target_names[c]}', fontsize=12)
    plt.legend(fontsize=10)
    plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 9. Key Takeaways

### Advantages:
- ✅ Fast training and prediction
- ✅ Needs relatively little training data, since it estimates only a few parameters per feature and class
- ✅ Handles high-dimensional data well
- ✅ Naturally handles multi-class problems
- ✅ Produces class probabilities (though they are often poorly calibrated)
- ✅ Simple and interpretable

### Disadvantages:
- ❌ The conditional independence assumption rarely holds in practice
- ❌ Can be outperformed by more complex models
- ❌ Sensitive to correlated (redundant) features, whose evidence is counted more than once
- ❌ Zero-frequency problem (mitigated by smoothing)
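
The cost of the independence assumption can be seen in a small experiment: copying the same measurement several times makes the classifier more confident, although no new information was added. The setup below (two equally likely classes, unit-variance Gaussian features centred at -1 and +1) is a hypothetical toy model chosen for this sketch.

```python
import numpy as np

def posterior_class1(x_features):
    """P(y=1 | x) for two equally likely classes with unit-variance Gaussian
    features centred at -1 (class 0) and +1 (class 1), assuming independence."""
    x_features = np.asarray(x_features, dtype=float)
    # Normalizing constants are identical for both classes and cancel out
    log_lik0 = np.sum(-0.5 * (x_features + 1) ** 2)
    log_lik1 = np.sum(-0.5 * (x_features - 1) ** 2)
    return 1 / (1 + np.exp(log_lik0 - log_lik1))

single = posterior_class1([0.5])                 # one measurement
duplicated = posterior_class1([0.5, 0.5, 0.5])   # the same measurement copied three times

print(f"{single:.3f} {duplicated:.3f}")
```

The duplicated version reports a noticeably higher posterior for class 1, which is why strongly correlated features tend to make Naive Bayes overconfident.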

### When to Use Each Variant:

**Gaussian NB:**
- Continuous features
- Features are approximately normally distributed within each class
- Examples: Iris classification, sensor data

**Multinomial NB:**
- Discrete count features
- Text classification (word counts)
- Examples: Document classification, spam detection

**Bernoulli NB:**
- Binary features
- Text classification (word presence/absence)
- Examples: Sentiment analysis, binary feature data

### Best Practices:
- Use smoothing to handle zero probabilities
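- Feature scaling is not required for Gaussian NB, which estimates a separate mean and variance for each feature (unlike distance-based or gradient-based algorithms)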
- Works well as a baseline model
- Excellent for text classification tasks
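
The point about feature scaling can be checked empirically for the Gaussian variant. The sketch below uses a compact, hypothetical `gaussian_nb_predict` function on synthetic data and compares predictions before and after a change of units. It assumes no variance floor is added: a fixed constant such as `1e-9` is not scale-invariant and could change results for features with very small variance.

```python
import numpy as np

def gaussian_nb_predict(X_train, y_train, X_test):
    """Fit per-class means/variances and return the arg-max class for X_test."""
    classes = np.unique(y_train)
    log_joint = []
    for c in classes:
        X_c = X_train[y_train == c]
        mean, var = X_c.mean(axis=0), X_c.var(axis=0)
        log_lik = -0.5 * np.log(2 * np.pi * var) - (X_test - mean) ** 2 / (2 * var)
        log_joint.append(np.log(len(X_c) / len(X_train)) + log_lik.sum(axis=1))
    return classes[np.argmax(np.array(log_joint), axis=0)]

# Synthetic two-class data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(50, 2)),
                    rng.normal([2.0, 2.0], 1.0, size=(50, 2))])
y_demo = np.array([0] * 50 + [1] * 50)

scale = np.array([1000.0, 0.001])   # hypothetical change of units per feature
pred_raw = gaussian_nb_predict(X_demo, y_demo, X_demo)
pred_scaled = gaussian_nb_predict(X_demo * scale, y_demo, X_demo * scale)

print(np.array_equal(pred_raw, pred_scaled))
```

Rescaling a feature shifts every class's log-likelihood by the same constant, so the arg max is unaffected.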