# Naive Bayes

Naive Bayes is a probabilistic classifier that applies Bayes' theorem under the strong ("naive") assumption that the features are conditionally independent given the class.

## Bayes' Theorem
The theorem is expressed as:

$$
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
$$
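
As a quick numeric illustration of the theorem, the sketch below computes \( P(A|B) \) for a two-outcome case. The function name `bayes_posterior` and all probabilities are made up for illustration; \( P(B) \) is obtained from the law of total probability.

```python
def bayes_posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """Return P(A|B) from P(A), P(B|A), and P(B|not A)."""
    # Evidence P(B) via the law of total probability.
    evidence = (likelihood_b_given_a * prior_a
                + likelihood_b_given_not_a * (1.0 - prior_a))
    return likelihood_b_given_a * prior_a / evidence


# Hypothetical numbers: A = "message is spam", B = "message contains a given word".
posterior = bayes_posterior(
    prior_a=0.2, likelihood_b_given_a=0.6, likelihood_b_given_not_a=0.05
)
print(round(posterior, 4))
```

With these assumed inputs the posterior comes out at roughly 0.75: observing the word raises the probability of spam from 0.2 to about 0.75.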

### In the context of classification:
For a feature vector \( X = (x_1, x_2, \ldots, x_n) \), the probability of class \( y \) given \( X \) is:

$$
P(y|X) = \frac{P(X|y) \cdot P(y)}{P(X)}
$$

## Independence Assumption
Assuming the features are conditionally independent given the class, the likelihood \( P(X|y) \) factorizes into a product of per-feature terms, and the equation becomes:

$$
P(y|X) = \frac{P(x_1|y) \cdot P(x_2|y) \cdot \ldots \cdot P(x_n|y) \cdot P(y)}{P(X)}
$$
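
The factorized formula can be evaluated directly for a small case. The sketch below uses hypothetical priors and per-feature likelihoods for a single observation with three features; \( P(X) \) is computed as the sum of the numerators over all classes.

```python
import math

# Hypothetical priors P(y) and per-feature likelihoods P(x_i | y) for one observed X.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": [0.8, 0.5, 0.1],
    "ham": [0.2, 0.5, 0.3],
}

# Numerator of the formula: P(y) * prod_i P(x_i | y).
joint = {c: priors[c] * math.prod(likelihoods[c]) for c in priors}

# P(X) is the sum of the numerators over all classes.
evidence = sum(joint.values())
posterior = {c: joint[c] / evidence for c in joint}
print(posterior)
```

With these assumed values the posteriors are about 0.47 for `spam` and 0.53 for `ham`, and they sum to 1 because of the division by \( P(X) \).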

## Decision Rule
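Because \( P(X) \) is the same for every class, it can be dropped from the maximization. Because the logarithm is monotonic, the product can also be replaced by a sum of logs, which avoids numerical underflow. The predicted class \( y \) is the one that maximizes the resulting log-posterior score: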

$$
y = \arg\max_y \left( \log(P(x_1|y)) + \log(P(x_2|y)) + \ldots + \log(P(x_n|y)) + \log(P(y)) \right)
$$
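
The following sketch shows why the rule is written with logarithms. It multiplies 2000 hypothetical per-feature likelihoods of 0.01 each (values chosen only for illustration): the raw product underflows in double precision, while the sum of logs stays finite and can still be compared across classes.

```python
import math

# 2000 hypothetical per-feature likelihoods, each equal to 0.01.
probs = [0.01] * 2000

# The true product is 1e-4000, far below the smallest positive float64.
product = math.prod(probs)

# The equivalent log-space score remains representable.
log_sum = sum(math.log(p) for p in probs)

print(product)   # underflows to 0.0
print(log_sum)   # about -9210.34
```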

## Key Components:
- **Prior Probability (\( P(y) \))**: The probability of each class, estimated from its relative frequency in the training data.
- **Class-Conditional Probability (\( P(x_i|y) \))**: The probability of a feature given a class, often modeled with a Gaussian distribution.
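
For continuous features, the Gaussian model of the class-conditional probability takes the following form. The symbols \( \mu_{y,i} \) and \( \sigma_{y,i}^2 \) are introduced here for the mean and variance of feature \( i \) within class \( y \):

$$
P(x_i|y) = \frac{1}{\sqrt{2\pi\sigma_{y,i}^2}} \exp\left( -\frac{(x_i - \mu_{y,i})^2}{2\sigma_{y,i}^2} \right)
$$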

## Training Process
1. Calculate the mean and variance for each feature within each class (for Gaussian models).
2. Compute the prior probability for each class based on its frequency in the dataset.
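
The two training steps can be sketched on a toy dataset small enough to check by hand. The data values and the `stats` dictionary are made up for illustration; the variance is the population variance (NumPy's default).

```python
import numpy as np

# Toy data (made-up values): 4 samples, 2 features, 2 classes.
X = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [14.0, 24.0]])
y = np.array([0, 0, 1, 1])

stats = {}
for c in np.unique(y):
    X_c = X[y == c]  # rows belonging to class c
    stats[int(c)] = {
        "mean": X_c.mean(axis=0),            # step 1: per-feature mean
        "var": X_c.var(axis=0),              # step 1: per-feature population variance
        "prior": X_c.shape[0] / X.shape[0],  # step 2: class frequency
    }

for c, s in stats.items():
    print(c, s["mean"], s["var"], s["prior"])
```

For this toy data, class 0 has mean (2, 3) and variance (1, 1), class 1 has mean (12, 22) and variance (4, 4), and both priors are 0.5.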

Naive Bayes is particularly useful for high-dimensional data and is commonly used in text classification and spam filtering. Its simplicity and efficiency make it a powerful baseline classifier.


In [1]:
import numpy as np

In [8]:
class NaiveBayes:

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self._classes = np.unique(y)
        n_classes = len(self._classes)

        # Calculate mean, var, and prior for each class
        self._mean = np.zeros((n_classes, n_features), dtype=np.float64) 
        self._var = np.zeros((n_classes, n_features), dtype=np.float64)
        self._priors = np.zeros(n_classes, dtype=np.float64)

        for idx, c in enumerate(self._classes):
            X_c = X[y == c]  # samples that belong to class c
            self._mean[idx, :] = X_c.mean(axis=0)
            self._var[idx, :] = X_c.var(axis=0)  # population variance per feature
            self._priors[idx] = X_c.shape[0] / float(n_samples)  # class frequency

    def predict(self, X):
        y_pred = [self._predict(x) for x in X]
        return np.array(y_pred)

    # Helper function: predict the class of a single sample
    def _predict(self, x):
        posteriors = []

        # Calculate posterior probability for each class
        for idxs, c in enumerate(self._classes):
            prior = np.log(self._priors[idxs])
            posterior = np.sum(np.log(self._pdf(idxs, x)))
            posterior = posterior + prior
            posteriors.append(posterior)

        # Return the class with the highest log-posterior score
        return self._classes[np.argmax(posteriors)]
    
    def _pdf(self, class_idx, x):  # Gaussian probability density function (PDF)
        mean = self._mean[class_idx]
        var = self._var[class_idx]
        numerator = np.exp(-((x-mean)**2) / (2*var))
        denominator = np.sqrt(2 * np.pi * var)
        
        return numerator / denominator
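
One caveat with `_pdf`: for a sample far from a class mean the density can underflow to 0, so its logarithm becomes `-inf`, and a feature with zero variance within a class causes a division by zero. A common workaround, sketched here as an assumption rather than as part of the class above, is to compute the log-density directly and add a small smoothing term to the variance. The function name `gaussian_log_pdf` and the parameter `eps` are hypothetical.

```python
import numpy as np


def gaussian_log_pdf(x, mean, var, eps=1e-9):
    """Log of the Gaussian density, evaluated independently per feature.

    `eps` is a hypothetical smoothing term that keeps the variance
    strictly positive.
    """
    var = var + eps
    return -0.5 * np.log(2.0 * np.pi * var) - (x - mean) ** 2 / (2.0 * var)


# The second feature lies 100 standard deviations from the mean, where the
# plain density exp(-5000) / sqrt(2 * pi) underflows to 0.0.
x = np.array([0.0, 100.0])
mean = np.array([0.0, 0.0])
var = np.array([1.0, 1.0])

log_density = gaussian_log_pdf(x, mean, var)
print(log_density)  # both entries are finite
```

Summing these per-feature values and adding the log-prior gives the same kind of score that `_predict` computes, without the intermediate exponentiation.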

In [6]:
# Test

from sklearn.model_selection import train_test_split
from sklearn import datasets

def accuracy(y_true, y_pred):
    acc = np.sum(y_true == y_pred) / len(y_true)
    return acc


X, y = datasets.make_classification(
    n_samples=1000, n_features=10, n_classes=2, random_state=1234
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1234
)

In [15]:
clf = NaiveBayes()
clf.fit(X_train, y_train)

predictions = clf.predict(X_test)
acc = accuracy(y_test, predictions)
print(acc)

0.93
