# Naive Bayes Classifiers
**On the Iris Flower Data Set**

Presented By: **Viraat Chandra**
<br>*12-H, Vector, DPS Gurgaon*

### Required Libraries
- sklearn for the iris dataset, GaussianNB model, and helper functions
- numpy for array operations
- pandas for data analysis

In [68]:
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import numpy as np
import pandas as pd

### Import the dataset
`features` is a 2-D array with one row of four measurements per flower in the iris data set. `labels` is a 1-D array holding the class index of each flower.

**Feature**: a measurable property that describes an object <br>
**Label**: the category that an object with those features belongs to

In [40]:
iris = load_iris()
features = iris.data
labels = iris.target
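
To make the feature/label distinction concrete, the following minimal sketch prints the four measurements and the class name of the first flower. The names `first_flower` and `first_label` are introduced here for illustration only.

```python
from sklearn.datasets import load_iris

iris = load_iris()
features, labels = iris.data, iris.target

# One row of `features` holds the four measurements of a single flower.
first_flower = dict(zip(iris.feature_names, features[0].tolist()))

# The matching entry of `labels` is an integer index into `target_names`.
first_label = str(iris.target_names[labels[0]])

print(first_flower)
print(first_label)
```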

### Analyse the dataset

In [41]:
pd.DataFrame(iris.data, columns=iris.feature_names).describe()

| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| count | 150.0 | 150.0 | 150.0 | 150.0 |
| mean | 5.843333 | 3.054 | 3.758667 | 1.198667 |
| std | 0.828066 | 0.433594 | 1.76442 | 0.763161 |
| min | 4.3 | 2.0 | 1.0 | 0.1 |
| 25% | 5.1 | 2.8 | 1.6 | 0.3 |
| 50% | 5.8 | 3.0 | 4.35 | 1.3 |
| 75% | 6.4 | 3.3 | 5.1 | 1.8 |
| max | 7.9 | 4.4 | 6.9 | 2.5 |


In [42]:
pd.DataFrame(iris.target).describe()

| | 0 |
|---|---|
| count | 150.0 |
| mean | 1.0 |
| std | 0.819232 |
| min | 0.0 |
| 25% | 0.0 |
| 50% | 1.0 |
| 75% | 2.0 |
| max | 2.0 |
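
Because the labels are class indices rather than measurements, a per-class count is often easier to read than summary statistics. The sketch below is one possible way to get it; `class_counts` is a name introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# np.bincount counts how often each integer label (0, 1, 2) occurs.
counts = np.bincount(iris.target)

# Pair each count with the species name it belongs to.
class_counts = {str(name): int(n) for name, n in zip(iris.target_names, counts)}
print(class_counts)
```

The three classes are equally represented, which is consistent with the mean of 1.0 in the summary above.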


## Naive Bayes
In machine learning, Naive Bayes classifiers are a family of simple probabilistic classifiers that apply Bayes' theorem with a strong (naive) assumption that the features are independent of one another given the class.

[Naive Bayes Classifiers WIKI](https://en.wikipedia.org/wiki/Naive_Bayes_classifier) <br>
[Naive Bayes Classifiers SKLEARN](http://scikit-learn.org/stable/modules/naive_bayes.html)
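
Written out, the naive assumption lets the probability of a class factor into one term per feature. The symbols below ($y$ for the class, $x_i$ for the $i$-th feature, $\mu_{y,i}$ and $\sigma_{y,i}^2$ for the per-class mean and variance of that feature) are notation introduced here for illustration. The Gaussian form of the per-feature term is the one described in the scikit-learn documentation linked above.

$$
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$

$$
P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_{y,i}^2}} \exp\!\left(-\frac{(x_i - \mu_{y,i})^2}{2\sigma_{y,i}^2}\right)
$$

The predicted class is the $y$ that makes the first expression largest.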

In [43]:
gnb = GaussianNB()

### Bayes Theorem
Bayes' theorem gives the probability of an event given that another event has already occurred. It is stated mathematically as the following equation: <br>
![](https://www.geeksforgeeks.org/wp-content/ql-cache/quicklatex.com-7777aa719ea14857115695676adc0914_l3.svg)

Where A and B are events.

- We want the probability of event A given that event B is true. Event B is also called the evidence.
- P(A) is the prior probability of A, i.e. the probability of the event before the evidence is seen. The evidence is an attribute value of an unknown instance (here, event B).
- P(B|A) is the likelihood, i.e. the probability of seeing the evidence B when A is true.
- P(A|B) is the posterior probability of A, i.e. the probability of the event after the evidence is seen.
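
As a worked example, the sketch below applies the standard form of the theorem, P(A|B) = P(B|A) · P(A) / P(B), to the iris data and checks the result against a direct count. The choice of events is an assumption made for illustration: A is "the flower is virginica" and B is "the petal is longer than 5 cm".

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
petal_length = iris.data[:, 2]           # column 2 is "petal length (cm)"
is_virginica = iris.target == 2          # event A: the flower is virginica
is_long = petal_length > 5.0             # event B (evidence): petal longer than 5 cm

p_a = is_virginica.mean()                    # prior P(A)
p_b = is_long.mean()                         # evidence P(B)
p_b_given_a = is_long[is_virginica].mean()   # likelihood P(B|A)

posterior = p_b_given_a * p_a / p_b      # Bayes' theorem gives P(A|B)
direct = is_virginica[is_long].mean()    # P(A|B) counted directly from the data

print('Posterior via Bayes: ', posterior)
print('Posterior via count: ', direct)
```

Both numbers agree, since Bayes' theorem is just a rearrangement of the same counts.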

### Split data into train and test sets

In [44]:
xT, xt, yT, yt = train_test_split(features, labels, test_size=0.2)

print('X Train: ', np.array(xT).shape)
print('X Test: ', np.array(xt).shape)
print('Y Train: ', np.array(yT).shape)
print('Y Test: ', np.array(yt).shape)

X Train:  (120, 4)
X Test:  (30, 4)
Y Train:  (120,)
Y Test:  (30,)
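
The split above is random, so the rows that land in each set change on every run. If a repeatable split with the same class balance in both sets is wanted, one option is to pass `random_state` and `stratify`, as in this sketch. The seed value 42 is an arbitrary choice, not something taken from the original notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# random_state fixes the shuffle; stratify keeps class proportions equal in both sets.
xT, xt, yT, yt = train_test_split(
    iris.data, iris.target,
    test_size=0.2, random_state=42, stratify=iris.target,
)

print('Train class counts: ', np.bincount(yT))
print('Test class counts: ', np.bincount(yt))
```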


### Train the model
i.e. fit the probabilistic classifier to the training data

In [59]:
from time import time
t = time()
gnb.fit(xT, yT)
print('Training Time: ', time() - t, 's')

Training Time:  0.0024912357330322266 s
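
Training is fast because fitting only estimates a prior for each class and a mean and variance for each feature within each class. The sketch below inspects some of those fitted values. For a deterministic illustration it fits on the full data set rather than on the training split used above, which is an assumption of this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
gnb = GaussianNB().fit(iris.data, iris.target)

print('Classes: ', gnb.classes_)            # the label values seen during fit
print('Class priors: ', gnb.class_prior_)   # P(y) for each class
print('Feature means per class:')
print(gnb.theta_)                           # shape (n_classes, n_features)
```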


### Calculate the accuracy of our model

In [None]:
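# accuracy_score expects (y_true, y_pred) and returns a fraction, so scale it to a percentage
print('Accuracy Of Model: ', accuracy_score(yt, gnb.predict(xt)) * 100, '%')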
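
A single accuracy number hides which classes get confused with each other. As a hedged sketch of a slightly fuller evaluation, the code below also builds a confusion matrix and looks at the class probabilities for one test flower. It retrains on its own split with `random_state=0` (an arbitrary seed chosen here) so that it runs on its own.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
xT, xt, yT, yt = train_test_split(iris.data, iris.target, test_size=0.2, random_state=0)

gnb = GaussianNB().fit(xT, yT)
pred = gnb.predict(xt)

acc = accuracy_score(yt, pred)                       # fraction of correct predictions
cm = confusion_matrix(yt, pred, labels=[0, 1, 2])    # rows: true class, columns: predicted class
proba = gnb.predict_proba(xt[:1])                    # posterior probabilities for the first test flower

print(f'Accuracy: {acc:.1%}')
print(cm)
print(proba)
```

The diagonal of the confusion matrix counts correct predictions, so its sum divided by the total equals the accuracy.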

## Additional Reading
- [Geeks For Geeks](https://www.geeksforgeeks.org/naive-bayes-classifiers/)
- Udacity Intro To Machine Learning Course