# Implementing Naive Bayes using scikit-learn

### scikit-learn offers several Naive Bayes variants; the three classic ones are:

 - Multinomial (`MultinomialNB`): http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html
 - Bernoulli (`BernoulliNB`): http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.BernoulliNB.html
 - Gaussian (`GaussianNB`): http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html
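The three variants differ in what they assume about the features. A minimal sketch, using made-up toy arrays only for illustration: `MultinomialNB` on non-negative counts, `BernoulliNB` on 0/1 indicators, and `GaussianNB` on real-valued measurements.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Made-up labels for six samples (two classes)
y = np.array([0, 0, 0, 1, 1, 1])

# MultinomialNB: non-negative counts (e.g. word counts per document)
X_counts = np.array([[3, 0, 1], [2, 0, 0], [4, 1, 0],
                     [0, 3, 2], [0, 4, 1], [1, 2, 3]])

# BernoulliNB: binary presence/absence features
X_binary = (X_counts > 0).astype(int)

# GaussianNB: continuous features, assumed normally distributed within each class
X_real = np.array([[0.1, -0.2], [-0.3, 0.4], [0.2, 0.1],
                   [5.1, 4.9], [4.7, 5.3], [5.2, 5.0]])

multinomial = MultinomialNB().fit(X_counts, y)
bernoulli = BernoulliNB().fit(X_binary, y)
gaussian = GaussianNB().fit(X_real, y)

print(multinomial.predict([[5, 0, 1]]))  # [0]
print(bernoulli.predict([[1, 0, 1]]))    # [0]
print(gaussian.predict([[4.8, 5.2]]))    # [1]
```

Picking the variant is mostly a question of matching its likelihood model to the data type; the Pima features used below are continuous, which is why this notebook uses `GaussianNB`.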
 
## A quick reminder: we have already implemented Gaussian Naive Bayes in the code above
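As a refresher, here is a compact NumPy sketch of what Gaussian Naive Bayes computes: class priors, per-class feature means and variances, and the class with the largest log posterior. The helper names `fit_gaussian_nb` and `predict_gaussian_nb` are hypothetical, the data is made up, and the small `var_smoothing` term mirrors the default behaviour of scikit-learn's `GaussianNB` so the two can be cross-checked.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB


def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Hypothetical helper: estimate priors, per-class means and variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Population variance (ddof=0) plus a tiny smoothing term for numerical stability
    epsilon = var_smoothing * X.var(axis=0).max()
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + epsilon
    return classes, priors, means, variances


def predict_gaussian_nb(model, X):
    """Hypothetical helper: pick the class with the largest log P(c) + log P(x|c)."""
    classes, priors, means, variances = model
    # Naive assumption: log P(x|c) is the sum of per-feature Gaussian log densities
    log_likelihood = -0.5 * (
        np.log(2 * np.pi * variances)[None, :, :]
        + (X[:, None, :] - means[None, :, :]) ** 2 / variances[None, :, :]
    ).sum(axis=2)
    log_posterior = np.log(priors)[None, :] + log_likelihood
    return classes[np.argmax(log_posterior, axis=1)]


# Made-up data: two 2-D Gaussian blobs
rng = np.random.default_rng(42)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(50, 2)),
               rng.normal([3.0, 3.0], 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

ours = predict_gaussian_nb(fit_gaussian_nb(X, y), X)
theirs = GaussianNB().fit(X, y).predict(X)
print((ours == theirs).all())
```

P(x) is left out because it is the same for every class and does not change which class wins.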


In [1]:
```python
# import dependencies
import numpy as np
import pandas as pd

# other dependencies that you might not need,
# only used to display images in the notebook
from IPython.display import Image, HTML
%matplotlib inline
```

In [2]:
```python
# `column` holds the names of all columns;
# the data is stored in the DataFrame `data`

column = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]
data = pd.read_csv('pima-indians-diabetes.data.csv', names=column)
```
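The CSV file has no header row (its first line is already a data row, as `data.head()` shows), which is why the column names are passed through `names=`. A minimal sketch of the same call on an in-memory copy of the first two rows, using `io.StringIO` instead of the real file, and what happens if `names=` is left out:

```python
import io
import pandas as pd

column = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin",
          "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]

# First two rows of the file; note there is no header line
raw_text = "6,148,72,35,0,33.6,0.627,50,1\n1,85,66,29,0,26.6,0.351,31,0\n"

# With names=: both lines are data rows
data = pd.read_csv(io.StringIO(raw_text), names=column)
print(data.shape)  # (2, 9)

# Without names=: the first data row is wrongly consumed as the header
no_names = pd.read_csv(io.StringIO(raw_text))
print(no_names.shape)  # (1, 9)
```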

In [3]:
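```python
data.head()
```

|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |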


In [4]:
```python
# as a reminder, this is Bayes' theorem
Image(url="images/bayes.PNG")
```

In formula form:

$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$$

where:
 - P(c|x) is the posterior probability of class c given the predictor (features).
 - P(c) is the prior probability of class c.
 - P(x|c) is the likelihood: the probability of the predictor given class c.
 - P(x) is the prior probability of the predictor (also called the evidence).
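A quick numeric check of the formula, with made-up probabilities that are not estimated from the Pima data: suppose 35% of patients are diabetic, and a high glucose reading occurs in 60% of diabetic and 20% of non-diabetic patients. P(x) then follows from the law of total probability.

```python
# Hypothetical numbers, chosen only to illustrate Bayes' theorem
p_c = 0.35              # P(c): prior probability of the class "diabetic"
p_x_given_c = 0.60      # P(x|c): probability of a high glucose reading if diabetic
p_x_given_not_c = 0.20  # P(x|not c): probability of a high reading if not diabetic

# P(x) via the law of total probability over the two classes
p_x = p_x_given_c * p_c + p_x_given_not_c * (1 - p_c)

# Posterior P(c|x) = P(x|c) * P(c) / P(x)
p_c_given_x = p_x_given_c * p_c / p_x
print(round(p_x, 2), round(p_c_given_x, 3))  # 0.34 0.618
```

With several features, Naive Bayes additionally assumes the features are conditionally independent given the class, so P(x|c) is computed as the product P(x_1|c) · P(x_2|c) · … · P(x_n|c).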

In [5]:

```python
X = data.iloc[:, 0:-1]  # X holds the features (every column except Outcome)
y = data.iloc[:, -1]    # y holds the labels (the Outcome column)
```
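`iloc[:, 0:-1]` selects every column except the last one, and `iloc[:, -1]` selects only the last one (`Outcome`). A minimal sketch on a made-up two-row frame, also showing a label-based equivalent with `drop`, which does not rely on `Outcome` being the last column:

```python
import pandas as pd

data = pd.DataFrame({
    "Glucose": [148, 85],
    "BMI": [33.6, 26.6],
    "Outcome": [1, 0],
})

# Position-based: all columns but the last, then only the last
X = data.iloc[:, 0:-1]
y = data.iloc[:, -1]

# Label-based equivalent
X_by_name = data.drop(columns="Outcome")
y_by_name = data["Outcome"]

print(list(X.columns), y.name)                   # ['Glucose', 'BMI'] Outcome
print(X.equals(X_by_name), y.equals(y_by_name))  # True True
```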

In [6]:
```python
# split the dataset into training and test sets with scikit-learn:
# the model is trained on the training set, and the test set is used to measure its accuracy

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
```
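Because the two classes are not equally frequent, passing `stratify=y` keeps the proportion of positive labels roughly the same in the training and test sets. A minimal sketch on made-up labels (35 positives out of 100); whether this changes the reported accuracy on the Pima split is not something this notebook measured.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 100 samples, 35 of them positive
X = np.arange(200).reshape(100, 2)
y = np.array([1] * 35 + [0] * 65)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42, stratify=y
)

# Both splits keep roughly the original 35% positive rate
print(len(y_train), len(y_test))
print(y_train.mean(), y_test.mean())
```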

In [7]:
```python
# build the model with Gaussian Naive Bayes

from sklearn.naive_bayes import GaussianNB

model = GaussianNB().fit(X_train, y_train)  # fit the model on the training data
```

In [8]:
```python
predicted_y = model.predict(X_test)  # predict labels for the test set
```

In [9]:
```python
from sklearn.metrics import accuracy_score

# measure how accurate the model is by comparing the predicted values with y_test;
# store the result in `accuracy` so the accuracy_score function is not overwritten
accuracy = accuracy_score(y_test, predicted_y)
print(accuracy)
```

0.7362204724409449


## wow!! 

### We got about 73.6% accuracy: the model predicted the correct Outcome for roughly 73.6% of the test samples.
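Accuracy alone can be misleading when one class dominates: in the Pima data roughly 65% of the records have `Outcome = 0`, so a model that always predicts 0 would already score around 65%. A minimal sketch of two extra checks, a majority-class baseline with `DummyClassifier` and a `confusion_matrix`, shown on made-up data rather than the Pima split:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Made-up data: class 1 tends to have larger feature values
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(130, 3)),
               rng.normal(1.0, 1.0, size=(70, 3))])
y = np.array([0] * 130 + [1] * 70)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42, stratify=y
)

model = GaussianNB().fit(X_train, y_train)
# Baseline that always predicts the most frequent training class
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

predicted = model.predict(X_test)
model_acc = accuracy_score(y_test, predicted)
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))
cm = confusion_matrix(y_test, predicted)

print("model accuracy:   ", model_acc)
print("baseline accuracy:", baseline_acc)
# Rows are true classes (0, 1), columns are predicted classes (0, 1)
print(cm)
```

A useful model should clearly beat the baseline, and the confusion matrix shows whether the errors fall mostly on missed positives or on false alarms.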

In [10]:
```python
# Let's test our model on a new, hand-made input
```

In [11]:
```python
# Create an empty DataFrame for the person we want to classify
person = pd.DataFrame()

# Fill in the feature values for this single row
person['Pregnancies'] = [7]
person['Glucose'] = [130]
person['BloodPressure'] = [86]
person['SkinThickness'] = [34]
person['Insulin'] = [0]
person['BMI'] = [33.5]
person['DiabetesPedigreeFunction'] = [0.564]
person['Age'] = [50]
# View the data
person
```
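An equivalent, more compact way to build the same one-row DataFrame is to pass a dict. Selecting the columns with the training feature list then guarantees they arrive in the order the model was fitted on (recent scikit-learn versions check feature names against those seen in `fit` and warn or raise on a mismatch). A minimal sketch; `feature_columns` is a hypothetical name for the `column` list minus `Outcome`:

```python
import pandas as pd

# Feature columns in the same order used for training (all columns except Outcome)
feature_columns = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
                   "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]

# One dict entry per feature; each value is a one-element list (one row).
# The keys are deliberately out of order here.
person = pd.DataFrame({
    "Age": [50],
    "Glucose": [130],
    "Pregnancies": [7],
    "BloodPressure": [86],
    "SkinThickness": [34],
    "Insulin": [0],
    "BMI": [33.5],
    "DiabetesPedigreeFunction": [0.564],
})

# Reorder to match training; raises KeyError if a feature is missing
person = person[feature_columns]
print(person)
```

Indexing with the list (rather than `reindex`) is a deliberate choice: a missing feature fails loudly instead of silently becoming `NaN`.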

|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 7 | 130 | 86 | 34 | 0 | 33.5 | 0.564 | 50 |


In [12]:
```python
# the new data is stored in the DataFrame person
predicted_y = model.predict(person)
```

In [13]:
```python
print(predicted_y)
```

[1]
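The output `[1]` means the model assigns this person to `Outcome = 1`, the diabetic class. To see how confident that call is, `predict_proba` returns the posterior probability of each class, and `predict` picks the class with the highest one. A minimal sketch on made-up data; in this notebook the equivalent call would be `model.predict_proba(person)`.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Made-up training data: 2 features, two well-separated classes
X_train = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
                    [4.0, 5.0], [4.2, 5.1], [3.8, 4.9]])
y_train = np.array([0, 0, 0, 1, 1, 1])
model = GaussianNB().fit(X_train, y_train)

new_point = np.array([[3.5, 4.5]])  # a single row to classify

# Shape (1, 2): columns follow model.classes_, i.e. P(class 0|x), P(class 1|x)
proba = model.predict_proba(new_point)
label = model.predict(new_point)

print(model.classes_, proba.round(3), label)
```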
