## Naive Bayes From Scratch

* The `Naive Bayes` algorithm is based on `Bayes'` theorem, which states that the probability of A given B equals the probability of B given A multiplied by the probability of A, divided by the probability of B, i.e.:

$$
p(A | B) = \frac{p(B | A) \cdot p(A)}{p(B)}
$$
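As a quick numeric check of the formula, here is a small sketch with made-up probabilities for a hypothetical diagnostic test (A = "has the condition", B = "test is positive"). The helper `bayes_posterior` and all numbers are illustrative assumptions, not part of the original notebook; $p(B)$ is obtained from the law of total probability.

```python
def bayes_posterior(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Return p(A | B) = p(B | A) * p(A) / p(B)."""
    return p_b_given_a * p_a / p_b


# Hypothetical numbers: p(B | A) = 0.9, p(A) = 0.01, p(B | not A) = 0.05
p_b_given_a, p_a, p_b_given_not_a = 0.9, 0.01, 0.05

# Law of total probability: p(B) = p(B | A) p(A) + p(B | not A) p(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

posterior = bayes_posterior(p_b_given_a, p_a, p_b)
print(round(posterior, 4))  # 0.1538
```

Even with a 90% true-positive rate, the low prior $p(A)$ keeps the posterior around 15%.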

* Applying Bayes' theorem to classification, where $X$ is the feature vector and $y$ is the class label, we have:

$$
p(y | X) = \frac{p(X | y) \cdot p(y)}{p(X)}
$$

where:
  * $p(y | X)$: Posterior probability
  * $p(X | y)$: Class-conditional probability (likelihood)
  * $p(y)$: Prior probability of y
  * $p(X)$: Evidence (marginal probability of X)

### Note:

```python
# Proportional, not equal: the constant 1 / p(X) is omitted.
Posterior_probability = Class_conditional_probability * Prior_probability_y
```
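To see why the denominator only rescales, the sketch below uses made-up class-conditional values and priors for one input and two classes (all numbers are assumptions). $p(X)$ is the sum of $p(X | y) \cdot p(y)$ over the classes, and dividing by it turns the products into probabilities that sum to 1 without changing which class is largest.

```python
import numpy as np

# Hypothetical values for one input X and two classes y = 0, 1
class_conditional = np.array([0.2, 0.05])  # p(X | y)
prior = np.array([0.4, 0.6])  # p(y)

joint = class_conditional * prior  # p(X | y) p(y), approximately [0.08, 0.03]
evidence = joint.sum()  # p(X) = sum over y of p(X | y) p(y), approximately 0.11
posterior = joint / evidence  # p(y | X), approximately [0.727, 0.273]
```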

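* It's called `naive` because it assumes that the features are conditionally independent given the class (which is often not true in practice).
* Under this assumption, $p(X | y)$ factorizes into a product of per-feature terms, so Bayes' theorem becomes: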

$$
p(y | X) = \frac{p(x_{1} | y) \cdot p(x_{2} | y) \cdots p(x_{n} | y) \cdot p(y)}{p(X)}
$$
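Under the independence assumption, the class-conditional probability is just a product of per-feature terms. The sketch below assumes hypothetical values of $p(x_{i} | y)$, already evaluated at one observed input with $n = 3$ features, and computes the numerator of the expanded formula for each class.

```python
import numpy as np

# Hypothetical values of p(x_i | y) at one observed X with n = 3 features
# (rows: classes y = 0, 1; columns: features x_1, x_2, x_3)
per_feature = np.array(
    [
        [0.8, 0.9, 0.5],  # y = 0
        [0.3, 0.3, 0.6],  # y = 1
    ]
)
prior = np.array([0.5, 0.5])  # p(y)

# Naive independence: p(X | y) = p(x_1 | y) * p(x_2 | y) * p(x_3 | y)
class_conditional = per_feature.prod(axis=1)  # approximately [0.36, 0.054]
numerator = class_conditional * prior  # approximately [0.18, 0.027], not yet divided by p(X)
```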

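* Since $p(X)$ does not depend on `y`, it is the same for every class and can be dropped: it rescales all posteriors equally and does not change which class is largest.
* To determine `y`, we pick the class that maximizes the (unnormalized) posterior, i.e.:

$$
\hat{y} = \arg\max_{y} \left( p(x_{1} | y) \cdot p(x_{2} | y) \cdots p(x_{n} | y) \cdot p(y) \right)
$$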
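The claim that dropping $p(X)$ is safe can be checked directly: dividing every class score by the same positive constant leaves the argmax untouched. The scores below are hypothetical unnormalized values for three classes.

```python
import numpy as np

# Hypothetical unnormalized scores p(x_1 | y) ... p(x_n | y) p(y) for classes y = 0, 1, 2
scores = np.array([0.018, 0.027, 0.009])
evidence = scores.sum()  # p(X), identical for every class

# Dividing every class by the same positive constant cannot change which one is largest
assert np.argmax(scores) == np.argmax(scores / evidence)

y_hat = int(np.argmax(scores))  # predicted class index: 1
```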

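* Since the product of many probabilities is a very small value (very close to 0) that can underflow to exactly 0 in floating point, we take the `log` of the posterior instead. Because `log` is monotonically increasing, the argmax is unchanged, and the product becomes a sum:

$$
\hat{y} = \arg\max_{y} \left( \log p(x_{1} | y) + \log p(x_{2} | y) + \cdots + \log p(x_{n} | y) + \log p(y) \right)
$$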
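A small sketch of the underflow problem, assuming a hypothetical model with 500 features that each contribute $p(x_{i} | y) = 10^{-3}$: the true product $10^{-1500}$ is far below the smallest positive float64, so it becomes exactly 0, while the sum of logs stays an ordinary finite number.

```python
import numpy as np

# Hypothetical: 500 features, each with p(x_i | y) = 1e-3
probs = np.full(500, 1e-3)

product = np.prod(probs)  # true value 1e-1500 is below the float64 range, so this is 0.0
log_sum = np.sum(np.log(probs))  # 500 * log(1e-3), approximately -3453.88

print(product, log_sum)
```

Once every class's product has collapsed to 0, the argmax can no longer distinguish the classes; the log-space scores still can.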

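* For continuous features, each class-conditional probability $p(x_{i} | y)$ can be modelled with a Gaussian (normal) `Probability Density Function` (this variant is known as Gaussian Naive Bayes):

$$
p(x_{i} | y) = \frac{\exp\left(-\frac{(x_{i} - \mu_{y})^2}{2\sigma_{y}^2}\right)}{\sqrt{2\pi\sigma_{y}^2}}
$$

where:
  * $\mu_{y}$: the mean of feature $x_{i}$ over the training samples of class $y$ (e.g. class 0 or 1).
  * $\sigma_{y}^2$: the variance of feature $x_{i}$ over the training samples of class $y$.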
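The density above translates directly into NumPy. The sketch below defines two hypothetical helpers: `gaussian_pdf` follows the formula as written, and `gaussian_log_pdf` computes its logarithm analytically, $-\frac{(x_{i} - \mu_{y})^2}{2\sigma_{y}^2} - \frac{1}{2}\log(2\pi\sigma_{y}^2)$, so the `exp` never has a chance to underflow to 0 before the `log` is taken.

```python
import numpy as np


def gaussian_pdf(x, mean, variance):
    """p(x_i | y) under a normal distribution with the class mean and variance."""
    return np.exp(-np.square(x - mean) / (2 * variance)) / np.sqrt(2 * np.pi * variance)


def gaussian_log_pdf(x, mean, variance):
    """log p(x_i | y), computed directly in log space (no exp, so no underflow to 0)."""
    return -np.square(x - mean) / (2 * variance) - 0.5 * np.log(2 * np.pi * variance)


# At x = mean with unit variance the density is 1 / sqrt(2 * pi), approximately 0.3989
peak = gaussian_pdf(0.0, 0.0, 1.0)

# Both routes agree where the PDF does not underflow
direct = np.log(gaussian_pdf(1.5, 0.0, 2.0))
via_log = gaussian_log_pdf(1.5, 0.0, 2.0)

# Far from the mean with a tiny variance, the PDF underflows to 0, but the log stays finite
far_pdf = gaussian_pdf(100.0, 0.0, 1e-9)
far_log = gaussian_log_pdf(100.0, 0.0, 1e-9)
```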

* Therefore, the predicted class is:

$$
\hat{y} = \arg\max_{y} \left( \sum_{i=1}^{n} \log \left( \frac{\exp\left(-\frac{(x_{i} - \mu_{y})^2}{2\sigma_{y}^2}\right)}{\sqrt{2\pi\sigma_{y}^2}} \right) + \log p(y) \right)
$$

* For each input, this score is computed for every class, and the index of the class with the highest score (the argmax) gives the predicted `y`. With binary labels 0 and 1, the index and the label coincide.
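The decision rule for a single input can be sketched in a few lines. The per-class means, variances and priors below are hypothetical values for two features; the score for each class is the sum of Gaussian log densities plus the log prior, and `argmax` returns an index that is mapped back to a label.

```python
import numpy as np

# Hypothetical per-class parameters for 2 features (rows: classes 0 and 1)
means = np.array([[1.0, 5.0], [4.0, 2.0]])
variances = np.array([[1.0, 2.0], [1.5, 0.5]])
priors = np.array([0.5, 0.5])
labels = np.array([0, 1])

x = np.array([3.5, 2.5])  # one input sample

# sum_i log N(x_i; mu_y, sigma_y^2) + log p(y), one score per class
log_pdf = -np.square(x - means) / (2 * variances) - 0.5 * np.log(2 * np.pi * variances)
scores = log_pdf.sum(axis=1) + np.log(priors)

# argmax gives an index; map it back to a label (identical here because labels are 0 and 1)
y_hat = labels[np.argmax(scores)]
```

Here `x` lies much closer to the class 1 means, so class 1 gets the higher score.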

In [1]:
```python
import numpy as np

# Black code formatter (Optional)
%load_ext lab_black
```

In [2]:
```python
class NaiveBayes:
    """Gaussian Naive Bayes classifier."""

    def __init__(self, var_smoothing: float = 1e-9) -> None:
        # Small constant added to every variance (an assumption, similar in spirit to
        # scikit-learn's `var_smoothing`) so a zero variance does not divide by zero.
        self.var_smoothing = var_smoothing
        self.means = None
        self.variances = None
        self.priors = None
        self.K = None
        self.n_K = None

    def __repr__(self) -> str:
        return f"{self.__class__.__name__}(n_classes={self.n_K!r})"

    def fit(self, X: np.ndarray, y: np.ndarray) -> "NaiveBayes":
        """This is used for training the model."""
        n_samples, n_features = X.shape
        self.K = np.unique(y)  # Sorted unique class labels
        self.n_K = len(self.K)

        # One row per class, one column per feature
        self.means = np.zeros((self.n_K, n_features))
        self.variances = np.zeros((self.n_K, n_features))
        self.priors = np.zeros(self.n_K)

        # Calculate the mean, variance and prior for each class.
        # `idx` is the row index; `k` is the class label (labels need not be 0..K-1).
        for idx, k in enumerate(self.K):
            X_k = X[y == k]
            self.means[idx, :] = np.mean(X_k, axis=0)
            self.variances[idx, :] = np.var(X_k, axis=0) + self.var_smoothing
            self.priors[idx] = X_k.shape[0] / float(n_samples)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        """This is used for making predictions. X has shape (n_samples, n_features)."""
        # Log posterior up to the constant log p(X), one column per class:
        # sum_i log p(x_i | y) + log p(y). Shape: (n_samples, n_K)
        log_posteriors = np.column_stack(
            [
                self.__log_prob_density_func(X, idx).sum(axis=1)
                + np.log(self.priors[idx])
                for idx in range(self.n_K)
            ]
        )
        # argmax returns a column index; map it back to the original class label
        return self.K[np.argmax(log_posteriors, axis=1)]

    def __log_prob_density_func(self, x: np.ndarray, idx: int) -> np.ndarray:
        """Log of the Gaussian Probability Density Function, computed in log space
        so that tiny densities do not underflow to 0 before the log is taken."""
        # Shape of mean and variance: (n_features,); broadcasts against x
        mean, variance = self.means[idx], self.variances[idx]
        return -np.square(x - mean) / (2 * variance) - 0.5 * np.log(2 * np.pi * variance)
```

In [3]:
```python
A = [[2, 3, 12, 9, 4, 7, 3], [1, 5, 2, 3, 11, 8, 0], [1, 4, 3, 1, 6, 4, 5]]
y = [1, 0, 1]

A_arr = np.array(A)
y_arr = np.array(y)

A_arr, y_arr
```

```
(array([[ 2,  3, 12,  9,  4,  7,  3],
        [ 1,  5,  2,  3, 11,  8,  0],
        [ 1,  4,  3,  1,  6,  4,  5]]),
 array([1, 0, 1]))
```

In [4]:
```python
A_arr[0]
```

```
array([ 2,  3, 12,  9,  4,  7,  3])
```

In [5]:
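```python
# Boolean mask selecting the rows of class 1 (not displayed: only a cell's last value is shown)
A_arr[1 == y_arr]

# Element-wise sqrt(2 * pi * v), the form of the PDF denominator, evaluated on the first row
np.sqrt(2 * np.pi * A_arr[0])
```

```
array([3.5449077 , 4.34160753, 8.68321505, 7.51988482, 5.01325655,
       6.63191504, 4.34160753])
```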

In [6]:
```python
[1 == y_arr]
```

```
[array([ True, False,  True])]
```
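The boolean masks explored above are exactly what a Gaussian Naive Bayes fit needs. This self-contained sketch repeats the toy data and computes the per-class mean, variance and prior. Note that class 0 has a single sample, so all of its variances are 0 and the Gaussian PDF would divide by zero; this is why implementations such as scikit-learn's `GaussianNB` add a small `var_smoothing` term to the variances.

```python
import numpy as np

A_arr = np.array([[2, 3, 12, 9, 4, 7, 3], [1, 5, 2, 3, 11, 8, 0], [1, 4, 3, 1, 6, 4, 5]])
y_arr = np.array([1, 0, 1])

classes = np.unique(y_arr)  # array([0, 1])

# The boolean mask y_arr == k keeps only the rows of class k
means = np.array([A_arr[y_arr == k].mean(axis=0) for k in classes])
variances = np.array([A_arr[y_arr == k].var(axis=0) for k in classes])
priors = np.array([np.mean(y_arr == k) for k in classes])  # fraction of samples per class
```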