## Naive Bayes From Scratch

* The `Naive Bayes` algorithm is based on `Bayes'` theorem, which states that the probability of A given B equals the probability of B given A multiplied by the probability of A, divided by the probability of B, i.e.

$$
p(A | B) = \frac{p(B | A) \cdot p(A)}{p(B)}
$$

* Applying Bayes' Theorem to ML, we have:

$$
p(y | X) = \frac{p(X | y) \cdot p(y)}{p(X)}
$$

where:
  * $p(y | X)$: Posterior probability
  * $p(X | y)$: Class-conditional probability
  * $p(y)$: Prior probability of y
  * $p(X)$: Evidence (marginal probability of X)
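
As a quick sanity check of the formula, here is a minimal worked example. The spam/word scenario and every probability in it are made-up assumptions chosen for illustration; none of them come from this notebook's data.

```python
# Hypothetical numbers, chosen only to illustrate Bayes' theorem.
p_spam = 0.2               # p(y): prior probability that an email is spam
p_word_given_spam = 0.6    # p(X | y): probability the word appears in spam
p_word_given_ham = 0.05    # probability the word appears in non-spam

# p(X): the evidence, obtained with the law of total probability.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# p(y | X): posterior probability of spam given that the word appears.
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))  # 0.75
```

Even though only 20% of the emails are spam in this toy setup, observing the word raises the probability of spam to 75%.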

### Note:
  
```python
Posterior_probability = Class_conditional_probability * Prior_probability_y / Evidence
```

* It's a `naive` algorithm because it assumes that the features are conditionally independent given the class (which is often not true in practice).
* Expanding Bayes' theorem yields:

$$
p(y | X) = \frac{p(x_{1} | y) \cdot p(x_{2} | y) \cdots p(x_{n} | y) \cdot p(y)}{p(X)}
$$

* Since p(X) does NOT depend on `y`, we can drop it.
* To determine `y`, we need to find the class that maximises the posterior (the argmax), i.e.
  
$$
\hat{y} = \arg\max_{y} \left( p(x_{1} | y) \cdot p(x_{2} | y) \cdots p(x_{n} | y) \cdot p(y) \right)
$$

* Since the product of many probabilities yields a very small value (very close to 0), we take the `log` of the posterior so that we avoid floating-point underflow. The `log` is monotonic, so the argmax is unchanged.

$$
\hat{y} = \arg\max_{y} \left( \log p(x_{1} | y) + \log p(x_{2} | y) + \cdots + \log p(x_{n} | y) + \log p(y) \right)
$$
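
The effect of working in log space can be seen with a small sketch. The 1,000 likelihood values of 0.01 below are an arbitrary assumption picked to force the problem; the code only uses the standard library.

```python
import math

# 1,000 hypothetical per-feature likelihoods, each equal to 0.01.
probs = [0.01] * 1000

# The true product is 1e-2000, far below the smallest positive float,
# so the running product collapses to exactly 0.0.
product = math.prod(probs)

# The sum of logs stays comfortably within floating-point range.
log_sum = sum(math.log(p) for p in probs)

print(product)  # 0.0
print(log_sum)  # roughly -4605.17
```

Once the product has collapsed to zero for every class, the argmax can no longer tell the classes apart, whereas the log sums remain comparable.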

* The class-conditional probability of each feature can be modelled using a Gaussian `Probability Density Function`.

$$
p(x_{i} | y) = \frac{\exp\left(-\frac{(x_{i} - \mu_{y,i})^2}{2\sigma_{y,i}^2}\right)}{\sqrt{2\pi\sigma_{y,i}^2}}
$$

where:
  * $\mu_{y,i}$ is the mean of feature $i$ given a class, i.e. when class=0 or 1.
  * $\sigma_{y,i}^2$ is the variance of feature $i$ given a class, i.e. when class=0 or 1.

* Therefore, the predicted class $\hat{y}$ is:

$$
\hat{y} = \arg\max_{y} \left( \sum_{i=1}^{n} \log\left(\frac{\exp\left(-\frac{(x_{i} - \mu_{y,i})^2}{2\sigma_{y,i}^2}\right)}{\sqrt{2\pi\sigma_{y,i}^2}}\right) + \log p(y) \right)
$$

* For each input, the predicted value of `y` is the index of the class with the highest log posterior (argmax). Since we have a binary class, this index is 0 or 1.
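
The log of the Gaussian density can also be expanded analytically, which avoids computing `exp` and then immediately taking its `log`. The sketch below is one possible way to do that; `gaussian_log_pdf` is a hypothetical helper name (not part of this notebook or of NumPy), and it assumes every variance is strictly positive.

```python
import numpy as np


def gaussian_log_pdf(x: np.ndarray, mean: np.ndarray, variance: np.ndarray) -> np.ndarray:
    """Element-wise log of the Gaussian density (hypothetical helper).

    Assumes `variance` is strictly positive.
    """
    return -0.5 * np.log(2.0 * np.pi * variance) - np.square(x - mean) / (2.0 * variance)


# Illustrative values: two features, standard normal parameters.
x = np.array([0.0, 1.0])
mean = np.array([0.0, 0.0])
variance = np.array([1.0, 1.0])

log_p = gaussian_log_pdf(x, mean, variance)

# Same quantity computed by evaluating the density first, then taking the log.
direct = np.log(
    np.exp(-np.square(x - mean) / (2.0 * variance)) / np.sqrt(2.0 * np.pi * variance)
)
print(np.allclose(log_p, direct))  # True
```

Both routes agree here, but the expanded form stays finite for points far from the mean, where the density itself would underflow to 0 and its log would become `-inf`.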

In [1]:
import numpy as np
from sklearn.model_selection import train_test_split

from run_algos import utils


# Black code formatter (Optional)
%load_ext lab_black

### Steps

> The following steps are used to build the Naive Bayes classifier from scratch.

#### Training
Given the entire dataset:
1. Initialize the variables. Since we're modelling the class conditional probabilities using a **`probability density function`**, we need to calculate values for:
   * Priors of the class labels.
   * Means of the features given the class label.
   * Variances of the features given the class label.

#### Making Predictions
Given a data point:

2. Make a prediction by finding the **`argmax`** of the `log posterior`, which is modelled using the Gaussian `pdf`.
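
The training step can be sketched on a tiny dataset. The four samples below are invented for illustration, and the variable names are assumptions rather than the attributes of the class defined later.

```python
import numpy as np

# Tiny made-up dataset: 4 samples, 2 features, binary labels.
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 10.0]])
y = np.array([0, 0, 1, 1])

classes = np.unique(y)

# Prior of each class: the fraction of samples carrying that label.
priors = np.array([np.mean(y == k) for k in classes])

# Per-class, per-feature mean and (population) variance.
means = np.array([X[y == k].mean(axis=0) for k in classes])
variances = np.array([X[y == k].var(axis=0) for k in classes])

print(priors)     # 0.5 for each class
print(means)      # class 0: [2, 3], class 1: [6, 8]
print(variances)  # class 0: [1, 1], class 1: [1, 4]
```

Each parameter array has one row per class, so the row index doubles as the class label when the labels are 0 and 1.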

<hr>

In [2]:
class NaiveBayes:
    def __init__(self) -> None:
        self.means = None
        self.variances = None
        self.priors = None
        self.K = None
        self.n_K = None

    def __repr__(self) -> str:
        return (
            f"{self.__class__.__name__}(n_classes={self.n_K!r}, prior={self.priors!r})"
        )

    def fit(self, X: np.ndarray, y: np.ndarray) -> "NaiveBayes":
        """This is used for training the model. It returns the fitted instance."""
        # Init the parameters
        n_samples, n_features = X.shape
        self.K = np.unique(y)
        self.n_K = len(self.K)

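        # Init params for the classes, e.g. if n_K is 2 then k=0 or 1.
        # NOTE: the labels are used as row indices below, so they are
        # assumed to be the integers 0..n_K-1.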
        self.means = np.zeros((self.n_K, n_features))  # Matrix
        self.variances = np.zeros((self.n_K, n_features))  # Matrix
        self.priors = np.zeros((self.n_K)).reshape(-1, 1)  # Column vector

        # Compute the parameters for each class.
        # Calculate the mean, variance and priors given each class.
        for k in self.K:
            X_k = X[k == y]
            self.means[k, :] = np.mean(X_k, axis=0)
            self.variances[k, :] = np.var(X_k, axis=0)
            self.priors[k] = X_k.shape[0] / float(n_samples)
        return self

    def _predict(self, x: np.ndarray) -> int:
        """This is used for making a prediction for a single example."""
        self.posteriors = []
        # Shape of x: (n_features,)
        for k in self.K:
            log_prior = np.log(self.priors[k])
            posterior = np.sum(np.log(self._prob_density_func(x, k))) + log_prior
            self.posteriors.append(posterior)

        # `posteriors` holds one log posterior per class,
        # i.e. [log_posterior_cl_0, log_posterior_cl_1] for binary labels.
        # np.argmax returns the index of the largest value (0 or 1 here).
        return int(np.argmax(self.posteriors))

    def _prob_density_func(self, x: np.ndarray, k: int) -> np.ndarray:
        """This is used to calculate the Gaussian Probability Density Function
        of each feature of a single example, given the class, i.e. for class=0 or 1."""
        # Shape of x, mean and variance: (n_features,)
        mean, variance = self.means[k], self.variances[k]
        numerator = np.exp(-np.square(x - mean) / (2 * variance))
        denominator = np.sqrt(2 * np.pi * variance)
        return numerator / denominator

    def predict(self, X: np.ndarray) -> np.ndarray:
        """This is used for making predictions for ALL the examples in X."""
        y_pred = [self._predict(x) for x in X]
        return np.array(y_pred)
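
The class above predicts one row at a time in a Python loop. As an alternative, the log posterior of every row and every class can be computed at once with NumPy broadcasting. The sketch below is not the notebook's implementation: `predict_vectorized` is a hypothetical standalone function, it expects 1-D priors, and it assumes strictly positive variances.

```python
import numpy as np


def predict_vectorized(
    X: np.ndarray, means: np.ndarray, variances: np.ndarray, priors: np.ndarray
) -> np.ndarray:
    """Hypothetical helper: argmax of the log posterior for every row of X.

    X: (n_samples, n_features)
    means, variances: (n_classes, n_features)
    priors: (n_classes,)
    """
    # Broadcast to shape (n_samples, n_classes, n_features).
    diff = X[:, None, :] - means[None, :, :]
    log_likelihood = -0.5 * np.log(2.0 * np.pi * variances) - np.square(diff) / (
        2.0 * variances
    )
    # Sum over the features, then add the log prior of each class.
    log_posterior = log_likelihood.sum(axis=2) + np.log(priors)
    return np.argmax(log_posterior, axis=1)


# Illustrative parameters: two well-separated classes with unit variance.
means = np.array([[0.0, 0.0], [4.0, 4.0]])
variances = np.ones((2, 2))
priors = np.array([0.5, 0.5])
X_new = np.array([[0.1, -0.2], [3.9, 4.3]])

print(predict_vectorized(X_new, means, variances, priors))  # [0 1]
```

With a fitted `NaiveBayes` instance, the stored column-vector priors would need flattening first, for example `predict_vectorized(X_test, nb.means, nb.variances, nb.priors.ravel())`.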

In [3]:
X, y = utils.generate_mock_data(type_="classification")

# split the data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=utils.TEST_SIZE, random_state=utils.RANDOM_STATE
)

X_train.shape, X_test.shape

((1800, 11), (200, 11))

In [4]:
nb = NaiveBayes()
nb.fit(X_train, y_train)

NaiveBayes(n_classes=2, prior=array([[0.49944444],
       [0.50055556]]))

In [5]:
y_pred = nb.predict(X=X_test)

np.mean(y_pred == y_test)

0.94