# Naive Bayes
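Naive Bayes is a classification algorithm that calculates the probability of an element belonging to each class and favors efficiency over accuracy. It is "naive" because it treats the features as independent of one another given the class. Bayes' Theorem states:
$$ p(class|data) = \frac{p(data|class) \cdot p(class)}{p(data)} $$
- $ p(class|data) $ is the probability of the class given the provided data
- $ p(data|class) $ is the probability of the data given the class
- $ p(class) $ is the prior probability of the class
- $ p(data) $ is the probability of the data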
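
To make the "naive" part concrete, here is a sketch of how the theorem is usually applied to a vector of measurements. The symbols $x_1, \dots, x_n$ (the individual features) and $\hat{y}$ (the predicted class) are notation introduced for this illustration only.

```latex
p(class \mid x_1, \dots, x_n) \propto p(class) \prod_{i=1}^{n} p(x_i \mid class)

\hat{y} = \arg\max_{class} \; p(class) \prod_{i=1}^{n} p(x_i \mid class)
```

Because $p(data)$ is the same for every class, it can be dropped when the only goal is to pick the most probable class.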


## Dataset
In this mini-project I use the **Iris Flower Species Dataset**, where the task is to predict the species of an iris flower from its measurements.

## Steps
1. #### Separate the dataset into three classes
    - [Iris-virginica] => 0
    - [Iris-versicolor] => 1
    - [Iris-setosa] => 2
2. #### Summarize the dataset
    - Calculate mean
    - Calculate standard deviation
3. #### Summarize data by class
    - Calculate mean
    - Calculate standard deviation
    - Calculate statistics
4. #### Gaussian Probability Density Function
    - Calculate probability density function
5. #### Class Probabilities
    - Calculate probability of each class
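
Step 4 can be sketched with the standard library alone. The function name `gaussian_pdf` and the sample values are assumptions chosen for illustration; the formula is the usual normal density, `exp(-(x - mean)^2 / (2 * std^2)) / (sqrt(2 * pi) * std)`.

```python
from math import exp, pi, sqrt

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std, evaluated at x."""
    exponent = exp(-((x - mean) ** 2) / (2 * std ** 2))
    return exponent / (sqrt(2 * pi) * std)

# Density at the mean of a standard normal, which equals 1 / sqrt(2 * pi)
peak = gaussian_pdf(0.0, 0.0, 1.0)
# One standard deviation away from the mean the density is lower
shoulder = gaussian_pdf(1.0, 0.0, 1.0)
```

Note that the standard deviation is squared in the exponent; it is not multiplied by 2.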

In [154]:
from math import sqrt, pi
import pandas as pd
import numpy as np

In [155]:
# Loading in the dataset
col_names = ['sepal_length','sepal_width','petal_length','petal_width','class']
dataset = pd.read_csv('iris.csv',names=col_names)
dataset

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,class
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa
...,...,...,...,...,...
145,6.7,3.0,5.2,2.3,Iris-virginica
146,6.3,2.5,5.0,1.9,Iris-virginica
147,6.5,3.0,5.2,2.0,Iris-virginica
148,6.2,3.4,5.4,2.3,Iris-virginica


In [156]:
# Mapping classes to integer values
classes = {'Iris-virginica':0, 'Iris-versicolor':1, 'Iris-setosa':2}
dataset['class'] = dataset['class'].map(classes)
dataset

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,class
0,5.1,3.5,1.4,0.2,2
1,4.9,3.0,1.4,0.2,2
2,4.7,3.2,1.3,0.2,2
3,4.6,3.1,1.5,0.2,2
4,5.0,3.6,1.4,0.2,2
...,...,...,...,...,...
145,6.7,3.0,5.2,2.3,0
146,6.3,2.5,5.0,1.9,0
147,6.5,3.0,5.2,2.0,0
148,6.2,3.4,5.4,2.3,0


In [157]:
# Splitting dataset into one DataFrame per class, keeping only the feature columns
Ivirg = dataset.loc[dataset['class'] == 0].drop(columns='class')
Ivers = dataset.loc[dataset['class'] == 1].drop(columns='class')
Iseto = dataset.loc[dataset['class'] == 2].drop(columns='class')
Ivirg

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width
100,6.3,3.3,6.0,2.5
101,5.8,2.7,5.1,1.9
102,7.1,3.0,5.9,2.1
103,6.3,2.9,5.6,1.8
104,6.5,3.0,5.8,2.2
105,7.6,3.0,6.6,2.1
106,4.9,2.5,4.5,1.7
107,7.3,2.9,6.3,1.8
108,6.7,2.5,5.8,1.8
109,7.2,3.6,6.1,2.5


In [158]:
# Grabbing statistics
Ivirg_stats = Ivirg.describe()
Ivirg_stats = Ivirg_stats.transpose()
Ivirg_stats

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
sepal_length,50.0,6.588,0.63588,4.9,6.225,6.5,6.9,7.9
sepal_width,50.0,2.974,0.322497,2.2,2.8,3.0,3.175,3.8
petal_length,50.0,5.552,0.551895,4.5,5.1,5.55,5.875,6.9
petal_width,50.0,2.026,0.27465,1.4,1.8,2.0,2.3,2.5


In [159]:
Ivers_stats = Ivers.describe()
Ivers_stats = Ivers_stats.transpose()
Ivers_stats

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
sepal_length,50.0,5.936,0.516171,4.9,5.6,5.9,6.3,7.0
sepal_width,50.0,2.77,0.313798,2.0,2.525,2.8,3.0,3.4
petal_length,50.0,4.26,0.469911,3.0,4.0,4.35,4.6,5.1
petal_width,50.0,1.326,0.197753,1.0,1.2,1.3,1.5,1.8


In [160]:
Iseto_stats = Iseto.describe()
Iseto_stats = Iseto_stats.transpose()
Iseto_stats

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
sepal_length,50.0,5.006,0.35249,4.3,4.8,5.0,5.2,5.8
sepal_width,50.0,3.418,0.381024,2.3,3.125,3.4,3.675,4.4
petal_length,50.0,1.464,0.173511,1.0,1.4,1.5,1.575,1.9
petal_width,50.0,0.244,0.10721,0.1,0.2,0.2,0.3,0.6


In [161]:
dict_stats = {0:Ivirg_stats,1:Ivers_stats,2:Iseto_stats}
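
For readers without pandas, the per-class summary step might look roughly like the sketch below. It uses only a handful of rows copied from the tables above, and the helper name `summarize_by_class` is hypothetical. `statistics.stdev` computes the sample standard deviation (dividing by n - 1), which is the same convention `DataFrame.describe()` uses for `std`.

```python
from collections import defaultdict
from statistics import mean, stdev

# A few rows copied from the tables above: four measurements plus the class label
rows = [
    (5.1, 3.5, 1.4, 0.2, 2),
    (4.9, 3.0, 1.4, 0.2, 2),
    (4.7, 3.2, 1.3, 0.2, 2),
    (6.3, 3.3, 6.0, 2.5, 0),
    (5.8, 2.7, 5.1, 1.9, 0),
    (7.1, 3.0, 5.9, 2.1, 0),
]

def summarize_by_class(rows):
    """Return {label: [(mean, std), ...]} with one (mean, std) pair per feature."""
    grouped = defaultdict(list)
    for *features, label in rows:
        grouped[label].append(features)
    summaries = {}
    for label, samples in grouped.items():
        # zip(*samples) turns the list of rows into a sequence of columns
        summaries[label] = [(mean(col), stdev(col)) for col in zip(*samples)]
    return summaries

summaries = summarize_by_class(rows)
```

With so few rows a feature can have a standard deviation of zero (petal width for class 2 here), which would break the density calculation; the full 50-row classes in the notebook do not have that problem.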

In [168]:
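# Calculate the Gaussian probability density function for x
def calculate_probability(x, stats):
    # x must list the features in the same order as the rows of stats
    x = np.asarray(x, dtype=float)
    # The variance is std squared (std**2), not std*2
    exponent = np.exp(-((x-stats['mean'])**2 / (2 * stats['std']**2)))
    return (1/(sqrt(2*pi)*stats['std'])*exponent)

# Return the class with the largest prior * likelihood
def calculate_class_probability(x):
    probabilities = dict()
    for i in range(len(classes)):
        # Prior: fraction of rows that belong to class i
        probabilities[i] = len(dataset[dataset['class']==i].index) / len(dataset['class'].index)
        # Naive likelihood: product of the per-feature densities
        probabilities[i] *= np.prod(calculate_probability(x, dict_stats[i]))
    max_key = max(probabilities, key=probabilities.get)
    return max_key

# Classify one sample: sepal_length, sepal_width, petal_length, petal_width
predicted_class = calculate_class_probability([5.7,2.9,4.2,1.3])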
predicted_class

1
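
As a cross-check that does not depend on pandas, the same decision can be sketched from the summary tables alone. The means and standard deviations below are copied from the `describe()` outputs above, and the equal priors follow from the count of 50 rows per class. The names `STATS`, `PRIORS`, and `posteriors` are made up for this sketch. Dividing by the sum of the scores turns them into posterior probabilities, something the notebook code does not need to do in order to pick a class.

```python
from math import exp, pi, sqrt

# (mean, std) per feature, copied from the summary tables above
STATS = {
    0: [(6.588, 0.63588), (2.974, 0.322497), (5.552, 0.551895), (2.026, 0.27465)],
    1: [(5.936, 0.516171), (2.77, 0.313798), (4.26, 0.469911), (1.326, 0.197753)],
    2: [(5.006, 0.35249), (3.418, 0.381024), (1.464, 0.173511), (0.244, 0.10721)],
}
# 50 rows per class out of 150 in total
PRIORS = {0: 50 / 150, 1: 50 / 150, 2: 50 / 150}

def gaussian_pdf(x, mean, std):
    return exp(-((x - mean) ** 2) / (2 * std ** 2)) / (sqrt(2 * pi) * std)

def posteriors(sample):
    """Return {label: p(class | sample)} under the naive independence assumption."""
    scores = {}
    for label, feature_stats in STATS.items():
        score = PRIORS[label]
        for value, (mean, std) in zip(sample, feature_stats):
            score *= gaussian_pdf(value, mean, std)
        scores[label] = score
    total = sum(scores.values())  # plays the role of p(data)
    return {label: score / total for label, score in scores.items()}

result = posteriors([5.7, 2.9, 4.2, 1.3])
predicted = max(result, key=result.get)
```

For this sample `predicted` is class 1 (Iris-versicolor), in agreement with the cell output above.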

#### Credits: [Naive Bayes Classifier From Scratch in Python](https://machinelearningmastery.com/naive-bayes-classifier-scratch-python/)