# Naive Bayes

Naive Bayes is a probabilistic classifier that applies Bayes' theorem under the strong ("naive") assumption that the features are conditionally independent given the class.

$$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$$

$$P(y|X) = \frac{P(X|y) \cdot P(y)}{P(X)}$$
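
As a quick numeric check of Bayes' theorem, the sketch below uses made-up probabilities for a hypothetical spam filter, where $y$ is "the message is spam" and $X$ is "the message contains a given word". All numbers and variable names are illustrative assumptions, not part of the dataset used later.

```python
# Hypothetical probabilities, chosen only for illustration.
p_spam = 0.2               # P(y): prior probability of spam
p_word_given_spam = 0.6    # P(X|y): the word appears in a spam message
p_word_given_ham = 0.05    # P(X|not y): the word appears in a non-spam message

# P(X) via the law of total probability.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1.0 - p_spam)

# Bayes' theorem: P(y|X) = P(X|y) * P(y) / P(X).
p_spam_given_word = p_word_given_spam * p_spam / p_word

print(round(p_spam_given_word, 4))  # approximately 0.75
```

Even though only 20% of messages are spam in this made-up setting, seeing the word raises the posterior to about 75%.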

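Assuming the features $x_{1}, \dots, x_{n}$ are conditionally independent given the class $y$, the likelihood $P(X|y)$ factorizes into a product of per-feature terms:

$$P(y|X) = \frac{P(x_{1}|y) \cdot P(x_{2}|y) \cdots P(x_{n}|y) \cdot P(y)}{P(X)}$$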

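The predicted class is the one with the highest posterior probability:

$$y = \arg\max_{y} P(y|X)$$

$P(X)$ is the same for every class, so it can be dropped. Because the logarithm is monotonic, taking logs does not change the argmax, and it turns the product into a sum, which avoids numerical underflow when many small probabilities are multiplied:

$$y = \arg\max_{y} \log(P(x_{1}|y)) + \log(P(x_{2}|y)) + \dots + \log(P(x_{n}|y)) + \log(P(y))$$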
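
To illustrate why the sum of logs is preferred over the raw product, here is a minimal sketch with hypothetical per-feature likelihoods (500 features, each with likelihood `1e-3`). The values are assumptions chosen only to trigger underflow.

```python
import numpy as np

# Hypothetical per-feature likelihoods P(x_i | y) for a single class.
likelihoods = np.full(500, 1e-3)

# The direct product is 1e-1500, far below the smallest positive float64,
# so it underflows to exactly 0.0 and all classes would tie.
direct_product = np.prod(likelihoods)

# The sum of logs stays finite, so classes can still be ranked.
log_sum = np.sum(np.log(likelihoods))

print(direct_product)
print(log_sum)
```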

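$P(y)$ is the prior probability, estimated as the relative frequency of each class in the training data.

$P(x_{i}|y)$ is the class-conditional probability, modeled here with a Gaussian distribution, where $\mu_{y}$ and $\sigma_{y}^{2}$ are the mean and variance of feature $x_{i}$ among the training samples of class $y$:

$$P(x_{i}|y) = \frac{1}{\sqrt{2\pi\sigma_{y}^{2}}} \cdot \exp\left(-\frac{(x_{i} - \mu_{y})^{2}}{2\sigma_{y}^{2}}\right)$$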
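
The Gaussian density above can be sanity-checked numerically. The sketch below defines a standalone helper (`gaussian_pdf` is a name introduced here for illustration) and verifies two known properties: the peak of a standard normal is $1/\sqrt{2\pi}$, and the density integrates to roughly 1. The mean and variance values are arbitrary assumptions.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Gaussian density with the given mean and variance, evaluated at x."""
    return np.exp(-((x - mean) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

# Peak of the standard normal: 1 / sqrt(2 * pi).
peak = gaussian_pdf(0.0, mean=0.0, var=1.0)

# Approximate the integral with a simple Riemann sum on a fine grid.
xs = np.linspace(-10.0, 10.0, 20001)
dx = xs[1] - xs[0]
area = np.sum(gaussian_pdf(xs, mean=1.5, var=0.25)) * dx

print(peak, area)
```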

Training:
* Calculate the mean, variance, and prior for each class

Predictions:
* Calculate the log-posterior score for each class with $y = \arg\max_{y} \log(P(x_{1}|y)) + \log(P(x_{2}|y)) + \dots + \log(P(x_{n}|y)) + \log(P(y))$ and the Gaussian formula
* Choose the class with the highest posterior probability
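
The training and prediction steps above can be traced on a tiny example. This is a minimal sketch on hypothetical one-feature data. The arrays, the `stats` dictionary, and the `log_posterior` helper are illustrative names, and the Gaussian log-density is written out directly rather than taking the log of the density.

```python
import numpy as np

# Hypothetical toy data: one feature, two well-separated classes.
X = np.array([[1.0], [1.2], [0.8], [3.0], [3.2], [2.8]])
y = np.array([0, 0, 0, 1, 1, 1])

# Training: per-class mean, variance, and prior (relative class frequency).
stats = {}
for c in np.unique(y):
    X_c = X[y == c]
    stats[int(c)] = (X_c.mean(axis=0), X_c.var(axis=0), X_c.shape[0] / X.shape[0])

def log_posterior(x, mean, var, prior):
    """Unnormalized log-posterior: sum_i log P(x_i | y) + log P(y)."""
    log_likelihood = -0.5 * np.log(2.0 * np.pi * var) - (x - mean) ** 2 / (2.0 * var)
    return float(np.sum(log_likelihood) + np.log(prior))

# Prediction: score every class and keep the best one.
x_new = np.array([1.1])
scores = {c: log_posterior(x_new, *params) for c, params in stats.items()}
predicted = max(scores, key=scores.get)

print(scores, predicted)
```

The point `1.1` lies close to the class-0 mean of `1.0`, so class 0 receives the higher score.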


In [1]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import datasets

In [2]:
class NaiveBayes:
    def fit(self, X, y):
        """
        Fit the Naive Bayes classifier to the training data.

        Parameters:
        - X: The input features of the training data.
        - y: The target values of the training data.
        """
        n_samples, n_features = X.shape
        self.classes = np.unique(y)
        n_classes = len(self.classes)

        self.mean = np.zeros((n_classes, n_features), dtype=np.float64)
        self.var = np.zeros((n_classes, n_features), dtype=np.float64)
        # One prior per class (not per feature).
        self.priors = np.zeros(n_classes, dtype=np.float64)

        for idx, c in enumerate(self.classes):
            X_c = X[y == c]
            self.mean[idx, :] = X_c.mean(axis=0)
            self.var[idx, :] = X_c.var(axis=0)
            # Prior = relative frequency of class c in the training data.
            self.priors[idx] = X_c.shape[0] / float(n_samples)

    def predict(self, X):
        """
        Make predictions for the given input data.

        Parameters:
        - X: The input features of the data points to predict.

        Returns:
        - y_pred: The predicted target values.
        """
        y_pred = [self.predictions(x) for x in X]
        return np.array(y_pred)

    def predictions(self, x):
        """
        Predict the class label for a single input sample.

        Parameters:
        - x: The input features of a single data point.

        Returns:
        - predicted_class: The predicted class label for the input sample.
        """
        posteriors = []
        for idx, c in enumerate(self.classes):
            # Log-posterior (up to a constant): log P(y) + sum_i log P(x_i | y).
            prior = np.log(self.priors[idx])
            log_likelihood = np.sum(np.log(self.pdf(idx, x)))
            posteriors.append(prior + log_likelihood)
        predicted_class = self.classes[np.argmax(posteriors)]
        return predicted_class
    
    def pdf(self, class_idx, x):
        """
        Compute the probability density function for a feature given a class.

        Parameters:
        - class_idx: The index of the class.
        - x: The input feature value.

        Returns:
        - probability: The computed probability density.
        """
        mean = self.mean[class_idx]
        var = self.var[class_idx]
        numerator = np.exp(-((x - mean) ** 2) / (2 * var))
        denominator = np.sqrt(2 * np.pi * var)
        return numerator / denominator
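
The `predict` method above scores one sample at a time in a Python loop. As an alternative, the same log-posterior scores can be computed for all samples at once with NumPy broadcasting. The following is a sketch under stated assumptions: `gaussian_log_joint` is a hypothetical standalone helper, and the fitted parameters are made-up values rather than ones learned from the dataset below.

```python
import numpy as np

def gaussian_log_joint(X, mean, var, priors):
    """Return an (n_samples, n_classes) array of log P(y) + sum_i log P(x_i | y)."""
    # Broadcast X (n_samples, 1, n_features) against mean/var (1, n_classes, n_features).
    diff = X[:, np.newaxis, :] - mean[np.newaxis, :, :]
    log_likelihood = -0.5 * (
        np.log(2.0 * np.pi * var)[np.newaxis, :, :] + diff ** 2 / var[np.newaxis, :, :]
    )
    # Sum over features, then add the log prior of each class.
    return log_likelihood.sum(axis=2) + np.log(priors)[np.newaxis, :]

# Hypothetical fitted parameters: 2 classes, 2 features.
mean = np.array([[0.0, 0.0], [2.0, 2.0]])
var = np.array([[1.0, 1.0], [1.0, 1.0]])
priors = np.array([0.5, 0.5])

X = np.array([[0.1, -0.2], [1.9, 2.3]])
scores = gaussian_log_joint(X, mean, var, priors)
predicted_idx = scores.argmax(axis=1)  # indices into the array of class labels

print(predicted_idx)
```

Working with the log-density directly also avoids computing `exp` followed by `log`, which can produce `-inf` when a density underflows to zero.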

In [3]:
def accuracy(y_true, y_pred):
    accuracy = np.sum(y_true == y_pred) / len(y_true)
    return accuracy

In [4]:
X, y = datasets.make_classification(
    n_samples=1000, n_features=10, n_classes=2, random_state=123
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)

nb = NaiveBayes()
nb.fit(X_train, y_train)
predictions = nb.predict(X_test)

print("Naive Bayes classification accuracy", accuracy(y_test, predictions))

Naive Bayes classification accuracy 0.965
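
As a sanity check, the same split can be scored with scikit-learn's `GaussianNB`, which implements the same Gaussian model. This is a sketch for comparison only. `GaussianNB` adds a small smoothing term to the variances (its `var_smoothing` parameter), so its predictions may differ slightly from the class above, and no expected accuracy is claimed here.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Recreate the same synthetic dataset and train/test split as above.
X, y = datasets.make_classification(
    n_samples=1000, n_features=10, n_classes=2, random_state=123
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)

# Fit the reference implementation and measure its test accuracy.
reference = GaussianNB()
reference.fit(X_train, y_train)
reference_predictions = reference.predict(X_test)
reference_accuracy = np.mean(reference_predictions == y_test)

print("GaussianNB classification accuracy", reference_accuracy)
```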
