
# Implementing Naive Bayes using scikit-learn

### Three common types of naive Bayes in scikit-learn

 - Multinomial: http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html
 - Bernoulli: http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.BernoulliNB.html
 - Gaussian: http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html
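As a rough sketch on synthetic data (the arrays, the per-class rates and the variable names below are illustrative assumptions, not part of the original notebook), each variant is meant for a different kind of feature: `MultinomialNB` for non-negative counts, `BernoulliNB` for binary present/absent features, and `GaussianNB` for continuous values.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

rng = np.random.default_rng(0)
n_samples, n_features = 200, 5
y = rng.integers(0, 2, size=n_samples)  # two classes: 0 and 1

# Count features (e.g. word counts): the two classes favour different columns
lam_class0 = np.array([3, 3, 1, 1, 1])
lam_class1 = np.array([1, 1, 1, 3, 3])
lam = np.where(y[:, None] == 1, lam_class1, lam_class0)  # shape (200, 5)
X_counts = rng.poisson(lam)

# Binary features (present / absent), derived from the counts
X_binary = (X_counts > 1).astype(int)

# Continuous features: class 1 is shifted to the right
X_continuous = rng.normal(loc=y[:, None], scale=1.0, size=(n_samples, n_features))

models = {
    "MultinomialNB": (MultinomialNB(), X_counts),
    "BernoulliNB": (BernoulliNB(), X_binary),
    "GaussianNB": (GaussianNB(), X_continuous),
}

for name, (model, X) in models.items():
    model.fit(X, y)
    # training accuracy, only to show each model running on its own feature type
    print(f"{name}: {model.score(X, y):.2f}")
```

Each model is paired with the feature type it assumes; feeding continuous values to `MultinomialNB`, for instance, would violate its count-data assumption.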
 
 ## A quick reminder: we have implemented Gaussian naive Bayes in the code above



For comparison, the R package `naivebayes` takes a different approach. Its general function `naive_bayes()` detects the class of each feature in the dataset and, depending on the user's choices, can assume a different distribution for each feature. It currently supports the following class-conditional distributions:

- categorical distribution for discrete features
- Poisson distribution for non-negative integers
- Gaussian distribution for continuous features
- non-parametrically estimated densities via Kernel Density Estimation for continuous features

In addition, specialized functions are available which implement:

- Bernoulli Naive Bayes via `bernoulli_naive_bayes()`
- Multinomial Naive Bayes via `multinomial_naive_bayes()`
- Poisson Naive Bayes via `poisson_naive_bayes()`
- Gaussian Naive Bayes via `gaussian_naive_bayes()`
- Non-Parametric Naive Bayes via `nonparametric_naive_bayes()`
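scikit-learn's naive Bayes classifiers each assume one distribution family for all features, so the following is a hypothetical, hand-rolled Python sketch of the per-feature idea (the class `MixedNaiveBayes` and its `feature_types` parameter are made up for illustration and are not part of scikit-learn or the R package). Each column gets its own class-conditional distribution, a Gaussian for continuous values or a Poisson for non-negative counts, and the per-feature log-likelihoods are added because the features are assumed independent given the class.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm, poisson


class MixedNaiveBayes:
    """Hypothetical naive Bayes with a Gaussian or Poisson model per feature."""

    def __init__(self, feature_types):
        # feature_types[j] is "gaussian" or "poisson" for column j
        self.feature_types = feature_types

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.log_prior_ = np.log([np.mean(y == c) for c in self.classes_])
        self.params_ = []
        for c in self.classes_:
            Xc = X[y == c]
            # per-class mean (used by both families) and std (Gaussian only)
            self.params_.append((Xc.mean(axis=0), Xc.std(axis=0) + 1e-9))
        return self

    def _joint_log_likelihood(self, X):
        X = np.asarray(X, dtype=float)
        jll = np.zeros((X.shape[0], len(self.classes_)))
        for k, (mean, std) in enumerate(self.params_):
            jll[:, k] = self.log_prior_[k]
            for j, kind in enumerate(self.feature_types):
                if kind == "gaussian":
                    jll[:, k] += norm.logpdf(X[:, j], loc=mean[j], scale=std[j])
                else:  # "poisson": the rate is estimated by the class mean
                    jll[:, k] += poisson.logpmf(X[:, j], mu=mean[j] + 1e-9)
        return jll

    def predict_proba(self, X):
        jll = self._joint_log_likelihood(X)
        # normalise each row so the class probabilities sum to 1
        return np.exp(jll - logsumexp(jll, axis=1, keepdims=True))

    def predict(self, X):
        return self.classes_[np.argmax(self._joint_log_likelihood(X), axis=1)]


# Usage on synthetic data: one continuous column and one count column
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=300)
X = np.column_stack([
    rng.normal(loc=2.0 * y, scale=1.0),  # continuous feature
    rng.poisson(lam=1 + 3 * y),          # count feature
])
model = MixedNaiveBayes(["gaussian", "poisson"]).fit(X, y)
print(model.predict(X[:5]), y[:5])
```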

In [0]:
# import dependencies
import numpy as np
import pandas as pd

# other dependencies that you might not need,
# only used to display images in the notebook
from IPython.display import Image, HTML
%matplotlib inline

In [0]:
data = pd.read_csv('pima-indians-diabetes.csv')
data.columns

Index(['6', '148', '72', '35', '0', '33.6', '0.627', '50', '1'], dtype='object')

In [0]:
# the CSV file has no header row, so pandas used the first data row as column names;
# `column` holds the correct column names, and the data is stored in the DataFrame `data`

column = ["Pregnancies","Glucose","BloodPressure","SkinThickness","Insulin","BMI","DiabetesPedigreeFunction","Age","Outcome"]
data = pd.read_csv('pima-indians-diabetes.csv', names=column)
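To illustrate why the file is read a second time, here is a minimal sketch with a tiny in-memory CSV (two made-up rows and three of the column names, standing in for `pima-indians-diabetes.csv`). Without `names`, `pd.read_csv` treats the first row as the header; when `names` is passed, pandas behaves as if `header=None` and keeps every row as data.

```python
from io import StringIO

import pandas as pd

csv_text = "6,148,1\n1,85,0\n"  # no header row, like the Pima file

wrong = pd.read_csv(StringIO(csv_text))
print(list(wrong.columns), len(wrong))  # the first data row became the header

right = pd.read_csv(StringIO(csv_text), names=["Pregnancies", "Glucose", "Outcome"])
print(list(right.columns), len(right))  # both rows are kept as data
```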

In [0]:
data.head()

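|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |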


In [0]:
# as a reminder, this is Bayes' formula
Image(url="images/bayes.PNG")

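$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$$

where:
 - P(c|x) is the posterior probability of class c given the predictor (features).
 - P(c) is the prior probability of the class.
 - P(x|c) is the likelihood, i.e. the probability of the predictor given the class.
 - P(x) is the prior probability of the predictor (also called the evidence).

Naive Bayes additionally assumes that the features are independent given the class, so the likelihood factorizes as $P(x \mid c) = \prod_i P(x_i \mid c)$.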
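A small worked example of the formula, using made-up numbers (assumptions, not estimates from the Pima data): say 35% of patients have diabetes, and a "high glucose" reading occurs in 80% of diabetic and 20% of non-diabetic patients. P(x) is obtained by summing P(x|c)P(c) over both classes.

```python
# Made-up illustrative numbers, not estimated from the dataset
p_c = {"diabetes": 0.35, "no_diabetes": 0.65}          # P(c), the class priors
p_x_given_c = {"diabetes": 0.80, "no_diabetes": 0.20}  # P(x|c) for x = "high glucose"

# P(x): total probability of observing "high glucose"
p_x = sum(p_x_given_c[c] * p_c[c] for c in p_c)

# P(c|x) for each class via Bayes' theorem
posterior = {c: p_x_given_c[c] * p_c[c] / p_x for c in p_c}
print(p_x, posterior)
```

Here P(x) = 0.8 * 0.35 + 0.2 * 0.65 = 0.41, so P(diabetes | high glucose) = 0.28 / 0.41, roughly 0.68: a positive reading raises the probability well above the 0.35 prior.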

In [0]:
X = data.iloc[:,0:-1] # X holds the features of our dataset
y = data.iloc[:,-1]   # y holds the labels (the Outcome column)
X.head()

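|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 |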


In [0]:
X.shape  # (number of rows, number of features)

(768, 8)

In [0]:
# divide the dataset into training and test sets using scikit-learn:
# the model is trained on the training set, and the test set is then used to measure its accuracy

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
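For reference, a quick sketch of what `test_size=0.33` means in numbers: with a float `test_size`, scikit-learn rounds the test-set size up, so 768 rows give 254 test rows and 514 training rows. The random arrays below only mimic the shape of the Pima data (an assumption for illustration); the `stratify=y` argument, which the notebook does not use, keeps the class ratio similar in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy data with the same shape as the Pima dataset: 768 rows, 8 features
rng = np.random.default_rng(1)
X = rng.normal(size=(768, 8))
y = (rng.random(768) < 0.35).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
print(X_train.shape, X_test.shape)  # ceil(0.33 * 768) = 254 rows go to the test set

# Optional: stratify on y so both splits keep roughly the same share of class 1
X_tr_s, X_te_s, y_tr_s, y_te_s = train_test_split(
    X, y, test_size=0.33, random_state=1, stratify=y
)
print(y.mean(), y_tr_s.mean(), y_te_s.mean())
```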

In [0]:
X_train.head()

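|     | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age |
|---|---|---|---|---|---|---|---|---|
| 718 | 1 | 108 | 60 | 46 | 178 | 35.5 | 0.415 | 24 |
| 274 | 13 | 106 | 70 | 0 | 0 | 34.2 | 0.251 | 52 |
| 725 | 4 | 112 | 78 | 40 | 0 | 39.4 | 0.236 | 38 |
| 700 | 2 | 122 | 76 | 27 | 200 | 35.9 | 0.483 | 26 |
| 116 | 5 | 124 | 74 | 0 | 0 | 34.0 | 0.22 | 38 |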


In [0]:
# now build our model using Gaussian naive Bayes

from sklearn.naive_bayes import GaussianNB

model = GaussianNB().fit(X_train, y_train) # fit the model on the training set
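To see what `GaussianNB.fit` estimates, here is a from-scratch sketch on synthetic data (the data and the helper name `manual_gaussian_nb_proba` are illustrative assumptions): for each class it stores the prior P(c) and a mean and variance for every feature, then scores a sample by adding the log-prior to the Gaussian log-densities of its features. The `epsilon` term mirrors GaussianNB's `var_smoothing` parameter (default `1e-9`), which adds a small fraction of the largest feature variance to every variance for numerical stability.

```python
import numpy as np
from scipy.special import logsumexp
from sklearn.naive_bayes import GaussianNB

# Synthetic data: 3 continuous features, class 1 shifted by a different amount per feature
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=300)
X = rng.normal(loc=y[:, None] * [1.0, 2.0, 0.5], scale=1.0, size=(300, 3))


def manual_gaussian_nb_proba(X_train, y_train, X_new, var_smoothing=1e-9):
    classes = np.unique(y_train)
    # same variance smoothing as GaussianNB: a fraction of the largest feature variance
    epsilon = var_smoothing * np.var(X_train, axis=0).max()
    jll = []
    for c in classes:
        Xc = X_train[y_train == c]
        prior = len(Xc) / len(X_train)  # P(c)
        mean = Xc.mean(axis=0)           # per-feature mean for class c
        var = Xc.var(axis=0) + epsilon   # per-feature variance for class c
        # log P(c) + sum_i log N(x_i | mean_i, var_i)
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var)) - 0.5 * np.sum(
            (X_new - mean) ** 2 / var, axis=1
        )
        jll.append(np.log(prior) + log_lik)
    jll = np.column_stack(jll)
    # divide by P(x), done in log space for numerical stability
    return np.exp(jll - logsumexp(jll, axis=1, keepdims=True))


manual = manual_gaussian_nb_proba(X, y, X)
sklearn_proba = GaussianNB().fit(X, y).predict_proba(X)
print(np.allclose(manual, sklearn_proba))
```

Because the features are treated as independent given the class, the per-feature log-densities are simply summed, which is the factorized likelihood from Bayes' formula.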

In [0]:
y_predicted = model.predict(X_test) # predict the labels of the test set

In [0]:
from sklearn.metrics import accuracy_score

# measure how accurate our model is by comparing the predicted values with the y_test values
# (store the result in a new variable so the accuracy_score function is not overwritten)
accuracy = accuracy_score(y_test, y_predicted)
print(accuracy)

0.7598425196850394
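Accuracy is simply the fraction of predictions that match the true labels. A small sketch with made-up label arrays shows that `accuracy_score` equals the mean of the element-wise comparison; `confusion_matrix` (also in `sklearn.metrics`) breaks the same predictions down by class, which helps when the two outcomes are not equally frequent, as in this dataset.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up true and predicted labels (1 = diabetes, 0 = no diabetes)
y_true = np.array([0, 1, 1, 0, 0, 1, 0, 0])
y_pred = np.array([0, 1, 0, 0, 1, 1, 0, 0])

manual_accuracy = np.mean(y_true == y_pred)  # fraction of matching labels
print(manual_accuracy, accuracy_score(y_true, y_pred))

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]]
print(confusion_matrix(y_true, y_pred))
```

Here 6 of the 8 predictions match, so both values are 0.75; the confusion matrix shows one false positive and one false negative.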


In [0]:
model.score(X_test,y_test)

0.7598425196850394

## Wow!

### We got about 76% accuracy, i.e. the model predicts the correct outcome for roughly 76% of the test samples

In [0]:
# Let's test our model with a new, hand-picked input

In [0]:
# Create an empty DataFrame for the person whose outcome we want to predict
person = pd.DataFrame()

# Create some feature values for this single row
person['Pregnancies'] = [7]
person['Glucose'] = [130]
person['BloodPressure'] = [86]
person['SkinThickness'] = [34]
person['Insulin'] = [0]
person['BMI'] = [33.5]
person['DiabetesPedigreeFunction'] = [0.52]
person['Age'] = [50]
# View the data 
person

|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 7 | 130 | 86 | 34 | 0 | 33.5 | 0.52 | 50 |


In [0]:
# the data is stored in the DataFrame `person`
predicted_y = model.predict(person)

In [0]:
print(predicted_y)

[0]

The model predicts class 0 (Outcome = 0, i.e. no diabetes) for this person.
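`predict` only returns the most likely class. To see how confident the model is for a single person, `predict_proba` returns the posterior P(c|x) for each class, in the order of `model.classes_`. The sketch below trains on synthetic stand-in data with the same column names (the per-column means, spreads and class shift are hand-picked assumptions, since the CSV file is not bundled here), so its probabilities will differ from those of the notebook's model.

```python
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB

columns = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
           "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]

# Synthetic stand-in for the training data (NOT the real Pima values):
# hand-picked means/spreads per column, plus a made-up shift for class 1
rng = np.random.default_rng(0)
y_train = rng.integers(0, 2, size=200)
base = rng.normal(loc=[3, 120, 70, 20, 80, 32, 0.5, 33],
                  scale=[3, 30, 20, 15, 100, 8, 0.3, 12], size=(200, 8))
shift = y_train[:, None] * np.array([1, 30, 2, 2, 20, 4, 0.1, 5])
X_train = pd.DataFrame(base + shift, columns=columns)

model = GaussianNB().fit(X_train, y_train)

# The new person, with the same column names and order as the training data
person = pd.DataFrame([[7, 130, 86, 34, 0, 33.5, 0.52, 50]], columns=columns)

proba = model.predict_proba(person)  # one row: P(c | x) for each class in model.classes_
print(model.classes_, proba)
print(model.predict(person))         # the class with the highest posterior
```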
