### Naive Bayes Classifier

#### Bayes' Theorem

$P(A|B)=\frac{P(B|A)P(A)}{P(B)}$
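As a quick numeric check of the formula, the sketch below plugs in made-up probabilities and expands $P(B)$ with the law of total probability, $P(B)=P(B|A)P(A)+P(B|\neg A)P(\neg A)$. The numbers and the helper name `bayes_posterior` are illustrative assumptions, not part of the original notebook.

```python
def bayes_posterior(p_b_given_a, p_a, p_b_given_not_a):
    """P(A|B) = P(B|A) P(A) / P(B), with P(B) from the law of total probability."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b

# Hypothetical numbers: B is observed 90% of the time when A holds,
# 5% of the time when A does not hold, and A has a 1% prior.
posterior = bayes_posterior(p_b_given_a=0.9, p_a=0.01, p_b_given_not_a=0.05)
print(round(posterior, 4))  # 0.1538
```

Even with a fairly reliable signal, the small prior $P(A)$ keeps the posterior low, which is exactly the effect of the $P(A)$ term in the numerator.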


#### Applying Bayes' Theorem to a dataset

$P(y|X) = \frac{P(X|y)P(y)}{P(X)}$, where $y$ is the class variable and $X=(x_1, x_2, \dots, x_n)$ is the feature vector.


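#### Naive assumption: all features $x_i$ are conditionally independent given $y$

$P(y|x_1,\dots,x_n) = \frac{P(x_1|y)\cdots P(x_n|y)P(y)}{P(x_1,\dots,x_n)} = \frac{P(y)\prod^n_{i=1}P(x_i|y)}{P(x_1,\dots,x_n)} \propto P(y)\prod^n_{i=1}P(x_i|y)$

The denominator $P(x_1,\dots,x_n)$ is the same for every class, so it can be dropped when comparing classes, and the predicted class is $\hat{y} = \arg\max_y P(y)\prod^n_{i=1}P(x_i|y)$.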
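To make the product and the arg max concrete, here is a minimal counting-based sketch for categorical features. The toy weather data and the helper names `fit_naive_bayes` / `predict_naive_bayes` are hypothetical, and it deliberately uses no smoothing, so any feature value never seen with a class gets probability 0 for that class.

```python
from collections import Counter

def fit_naive_bayes(X, y):
    """Estimate P(y) and the counts needed for P(x_i | y) by frequency counting."""
    class_counts = Counter(y)
    priors = {c: n_c / len(y) for c, n_c in class_counts.items()}
    # value_counts[(c, i, v)] = number of class-c rows whose feature i equals v
    value_counts = Counter(
        (c, i, v) for row, c in zip(X, y) for i, v in enumerate(row)
    )
    return priors, class_counts, value_counts

def predict_naive_bayes(x, priors, class_counts, value_counts):
    """Return the arg-max class and the normalized posteriors P(y | x)."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for i, v in enumerate(x):
            # P(x_i = v | y = c); unseen values give 0 because there is no smoothing
            score *= value_counts[(c, i, v)] / class_counts[c]
        scores[c] = score
    # Summing P(y) * prod_i P(x_i | y) over classes gives P(x_1, ..., x_n) under the model
    evidence = sum(scores.values())
    posteriors = {c: s / evidence for c, s in scores.items()}
    return max(scores, key=scores.get), posteriors

# Hypothetical toy data: (outlook, temperature) -> play?
X = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
     ("rainy", "cool"), ("sunny", "cool"), ("rainy", "hot")]
y = ["no", "no", "yes", "yes", "yes", "no"]

model = fit_naive_bayes(X, y)
label, posteriors = predict_naive_bayes(("rainy", "mild"), *model)
print(label, round(posteriors["yes"], 3))  # yes 0.667
```

For the query, "yes" scores $0.5 \cdot \frac{2}{3} \cdot \frac{1}{3} = \frac{1}{9}$ and "no" scores $0.5 \cdot \frac{1}{3} \cdot \frac{1}{3} = \frac{1}{18}$; normalizing by their sum gives the posterior $\frac{2}{3}$ for "yes".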

### Gaussian Naive Bayes Classifier

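#### Apply a Gaussian (normal) distribution to each feature

$P(x_i|y) = \frac{1}{\sqrt{2\pi\sigma^2_y}}\exp\left(-\frac{(x_i-\mu_y)^2}{2\sigma^2_y}\right)$

Here $\mu_y$ and $\sigma^2_y$ are the mean and variance of feature $x_i$, estimated from the training samples of class $y$.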
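The sketch below implements this from scratch with NumPy and compares it against scikit-learn's `GaussianNB` on the same iris split used in the cells that follow. It sums log-densities instead of multiplying densities, a common choice to avoid numerical underflow. The helper names are hypothetical, and the small variance `eps` is modeled on scikit-learn's `var_smoothing` as we understand it (a fraction of the largest feature variance), so treat the match as an assumption to verify rather than a guarantee.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

def gaussian_log_pdf(x, mu, var):
    """log P(x_i | y) for a normal distribution with mean mu and variance var."""
    return -0.5 * np.log(2 * np.pi * var) - (x - mu) ** 2 / (2 * var)

def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])      # P(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])  # mu_y per feature
    # Population variance (ddof=0) per class and feature, plus a small epsilon
    # (assumed to mirror scikit-learn's var_smoothing) so no variance is zero.
    eps = var_smoothing * X.var(axis=0).max()
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + eps
    return classes, priors, means, variances

def predict_gaussian_nb(X, classes, priors, means, variances):
    # scores[s, k] = log P(y_k) + sum_i log P(x_i | y_k) for sample s
    log_likelihoods = np.stack(
        [gaussian_log_pdf(X, means[k], variances[k]).sum(axis=1)
         for k in range(len(classes))],
        axis=1,
    )
    scores = np.log(priors) + log_likelihoods
    return classes[np.argmax(scores, axis=1)]

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

params = fit_gaussian_nb(X_train, y_train)
y_pred_scratch = predict_gaussian_nb(X_test, *params)
y_pred_sklearn = GaussianNB().fit(X_train, y_train).predict(X_test)
agreement = np.mean(y_pred_scratch == y_pred_sklearn)
print("Agreement with GaussianNB:", agreement)
```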

```python
# Load the iris dataset
from sklearn.datasets import load_iris
iris = load_iris()

# Store feature matrix (X) and response vector (y)
X = iris.data
y = iris.target
```

```python
# Split X and y into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)
```

```python
# Train the model on the training set
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
gnb.fit(X_train, y_train)
```

```
GaussianNB(priors=None, var_smoothing=1e-09)
```

```python
# Make predictions on the testing set
y_pred = gnb.predict(X_test)

# Compare actual vs. predicted response values (y_test vs. y_pred)
from sklearn import metrics
print("Gaussian Naive Bayes model accuracy(in %):", metrics.accuracy_score(y_test, y_pred)*100)
```

```
Gaussian Naive Bayes model accuracy(in %): 95.0
```
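
Accuracy only scores the final labels. To connect the predictions back to $P(y|X)$, a sketch like the following uses `predict_proba`, which returns the normalized posterior for each class, and checks that `predict` picks the class with the largest posterior, i.e. the arg max from the naive Bayes formula. It recreates the same split so it runs on its own.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)
gnb = GaussianNB().fit(X_train, y_train)

# Posterior P(y | X) for each test sample: one column per class, rows sum to 1
proba = gnb.predict_proba(X_test)

# predict() returns the class with the largest posterior
y_pred = gnb.predict(X_test)
assert np.array_equal(y_pred, gnb.classes_[np.argmax(proba, axis=1)])

# Posteriors for the first three test samples, rounded for readability
print(np.round(proba[:3], 3))
```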
