# Naive Bayes - Gaussian

In [None]:
import numpy as np
import matplotlib.pyplot as plt

<img style="background-color: white" src="../figures/normal.png" width="500">

$$
P(y|x) = \frac{P(x|y)P(y)}{P(x)}
$$

## Let's understand each part

### 1. Prior $P(y)$
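
The prior is the probability of a class before any features are seen. A common choice, assumed here rather than taken from the original notebook, is to estimate it from class frequencies in the training set. The symbols $m$ (number of training samples) and $m_k$ (number of training samples with label $k$) are introduced only for this illustration:

$$
P(y = k) = \frac{m_k}{m}
$$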

### 2. Likelihood / Conditional Probability $P(x|y)$

$$ P(x^i_j \in \text{test} \mid y=1 ; \mu_{1j}, \sigma_{1j}^{2}) = \frac{1}{\sqrt{2\pi\sigma_{1j}^{2}}}e ^{-\frac{(x^i_j-\mu_{1j})^{2}}{2\sigma_{1j}^{2}}}$$
$$ P(x^i_j \in \text{test} \mid y=0 ; \mu_{0j}, \sigma_{0j}^{2}) = \frac{1}{\sqrt{2\pi\sigma_{0j}^{2}}}e ^{-\frac{(x^i_j-\mu_{0j})^{2}}{2\sigma_{0j}^{2}}}$$

$$P(x \mid y) = \prod_{j=1}^n P( x_j \mid y )$$

### 3. $P(y)P(x|y)$

### 4. Predict
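
Because $P(x)$ is the same for every class, it can be dropped when comparing classes. A sketch of the resulting decision rule, with $k$ as an assumed index over classes, is to pick the class with the largest value of $P(y)P(x \mid y)$. Taking logarithms does not change the winner and turns the product into a sum:

$$
\hat{y} = \arg\max_{k} \; P(y=k) \prod_{j=1}^n P(x_j \mid y=k)
        = \arg\max_{k} \left[ \log P(y=k) + \sum_{j=1}^n \log P(x_j \mid y=k) \right]
$$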

## Let's implement

## 1. Prepare your data

In [None]:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# generate a fairly noisy dataset:
# only 4 of the 10 features are informative,
# 2 are redundant (linear combinations of the 4 informative features),
# and the remaining 4 are pure noise
# Also, widen each class's spread by using n_clusters_per_class=2
X, y = make_classification(n_samples=500, n_features=10, n_redundant=2, n_informative=4,
                             n_clusters_per_class=2, random_state=14)
plt.scatter(X[:, 0], X[:, 1], marker='o', c=y,
            s=25, edgecolor='k')

# look at the data (only the first two features are plotted): it is likely not linearly separable!

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

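# feature scaling helps gradient-based models reach convergence faster;
# Gaussian Naive Bayes has no iterative optimization, so here it only standardizes the features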
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

## 2. Calculate the mean and std for each feature for each class
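
One possible way to do this step is sketched below. The helper name `class_stats` and the toy arrays are hypothetical, chosen only for illustration; in the notebook the same function would presumably be called on `X_train` and `y_train`. The sketch uses the population standard deviation (NumPy's default, `ddof=0`).

```python
import numpy as np

def class_stats(X, y):
    """Return the class labels plus per-class feature means and stds.

    mean[k, j] and std[k, j] describe feature j among samples of classes[k].
    """
    classes = np.unique(y)
    mean = np.array([X[y == k].mean(axis=0) for k in classes])
    std = np.array([X[y == k].std(axis=0) for k in classes])
    return classes, mean, std

# Hypothetical toy data: 4 samples, 2 features, 2 classes
X_toy = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 0.0], [14.0, 2.0]])
y_toy = np.array([0, 0, 1, 1])

classes, mean, std = class_stats(X_toy, y_toy)
print(mean)  # rows are classes, columns are features
print(std)
```

If a feature were constant within a class, its standard deviation would be zero and the density in the next step would divide by zero; adding a small constant to the variance is one common workaround.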

## 3. Define the probability density function so we can later calculate $p(x_j \mid y)$
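
A minimal sketch of the density from the likelihood formula above. The function name `gaussian_pdf` is an assumption, and it takes the standard deviation (not the variance) as its third argument. Because it relies on NumPy broadcasting, it works on scalars and on arrays alike.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std**2) evaluated at x."""
    var = std ** 2
    return np.exp(-((x - mean) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)

# The standard normal density peaks at x = 0 with value 1 / sqrt(2 * pi)
peak = gaussian_pdf(0.0, 0.0, 1.0)

# Element-wise evaluation on arrays (hypothetical values)
values = gaussian_pdf(np.array([1.0, 2.0]), np.array([1.0, 0.0]), np.array([1.0, 2.0]))
```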

## 4. Calculate the likelihood by calculating the probability density $p(x_j \mid y)$
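
The sketch below evaluates the density of every feature of every sample under every class in one broadcasted call. The parameter arrays and `X_new` are hypothetical stand-ins for the fitted means and standard deviations and for `X_test`; `gaussian_pdf` is repeated so the block runs on its own.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std**2) evaluated at x."""
    var = std ** 2
    return np.exp(-((x - mean) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)

# Hypothetical fitted parameters: 2 classes (rows), 3 features (columns)
mean = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
std = np.array([[1.0, 1.0, 1.0], [1.0, 2.0, 0.5]])

# Hypothetical samples to score: 4 samples, 3 features
X_new = np.array([[0.0, 0.0, 0.0],
                  [1.0, 1.0, 1.0],
                  [0.5, -0.5, 2.0],
                  [2.0, 0.0, 1.0]])

# densities[k, i, j] = p(x_j of sample i | y = class k)
densities = gaussian_pdf(X_new[None, :, :], mean[:, None, :], std[:, None, :])
print(densities.shape)  # (n_classes, n_samples, n_features)
```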

### 4.1 Calculate the total likelihood by calculating $p(x \mid y)$
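
Under the naive independence assumption, the total likelihood is the product of the per-feature densities. Multiplying many small numbers can underflow, so this sketch sums log-densities instead, which is a swapped-in but equivalent formulation. The name `log_gaussian_pdf` and all array values are hypothetical.

```python
import numpy as np

def log_gaussian_pdf(x, mean, std):
    """Logarithm of the normal density N(mean, std**2) evaluated at x."""
    var = std ** 2
    return -0.5 * np.log(2 * np.pi * var) - ((x - mean) ** 2) / (2 * var)

# Hypothetical fitted parameters: 2 classes (rows), 3 features (columns)
mean = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
std = np.array([[1.0, 1.0, 1.0], [1.0, 2.0, 0.5]])
X_new = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.5, -0.5, 2.0]])

# log_densities[k, i, j] = log p(x_j of sample i | y = class k)
log_densities = log_gaussian_pdf(X_new[None, :, :], mean[:, None, :], std[:, None, :])

# Sum over features: log p(x | y = k) for each class k and sample i
log_likelihood = log_densities.sum(axis=2)

# Exponentiating recovers the plain product of densities
likelihood = np.exp(log_likelihood)
```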

### 4.2 Calculate the prior $p(y)$
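
Assuming the prior is estimated from class frequencies, a short sketch could look like this. The helper name `class_priors` and the toy labels are hypothetical; in the notebook the labels would presumably be `y_train`.

```python
import numpy as np

def class_priors(y):
    """Return class labels and the fraction of samples in each class."""
    classes, counts = np.unique(y, return_counts=True)
    return classes, counts / counts.sum()

# Hypothetical labels: three samples of class 0, one of class 1
y_toy = np.array([0, 0, 0, 1])
classes, priors = class_priors(y_toy)
print(priors)  # fractions sum to 1
```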

## 5. Calculate the unnormalized posterior $p(x \mid y)p(y)$ for each class
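
The pieces can be combined as in the sketch below: add the log-prior to the summed log-densities and take the class with the highest score. The function name `predict` and the parameter values are hypothetical, and working in log-space is a choice made here for numerical stability rather than something the notebook specifies.

```python
import numpy as np

def predict(X, classes, mean, std, priors):
    """Predict a class label for each row of X with Gaussian Naive Bayes."""
    var = std ** 2
    # log p(x_j | y = k) for every class k, sample i, feature j
    log_densities = (-0.5 * np.log(2 * np.pi * var[:, None, :])
                     - (X[None, :, :] - mean[:, None, :]) ** 2 / (2 * var[:, None, :]))
    # log p(x | y = k) + log p(y = k), shape (n_classes, n_samples)
    log_posterior = log_densities.sum(axis=2) + np.log(priors)[:, None]
    return classes[np.argmax(log_posterior, axis=0)]

# Hypothetical parameters: two well-separated classes, 2 features
classes = np.array([0, 1])
mean = np.array([[0.0, 0.0], [4.0, 4.0]])
std = np.ones((2, 2))
priors = np.array([0.5, 0.5])

X_new = np.array([[0.2, -0.1], [3.8, 4.5]])
y_hat = predict(X_new, classes, mean, std, priors)
```

The scores are not normalized by $p(x)$, so they are not probabilities, but the ranking of classes is unaffected.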

## 6. Calculate accuracy
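
Accuracy is the fraction of predictions that match the true labels. A minimal sketch follows; the function name `accuracy` and the label arrays are hypothetical, and in the notebook the inputs would presumably be `y_test` and the predictions for `X_test`.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

# Hypothetical labels: three of four predictions are correct
acc = accuracy([0, 1, 1, 0], [0, 1, 0, 0])
print(acc)
```

As a sanity check, the result could be compared against `sklearn.metrics.accuracy_score`, or against the predictions of `sklearn.naive_bayes.GaussianNB` fitted on the same training data.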