# Gaussian Naive Bayes

### This notebook contains an implementation of the Gaussian Naive Bayes algorithm on the Pima Indians Diabetes dataset.

**Step 1:** Import all the necessary Python libraries.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

plt.rcParams['figure.figsize']= (20.0, 10.0)

**Step 2:** Load the data and perform basic data analysis.

In [2]:
data = pd.read_csv('data/diabetes.csv')
data.head()

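| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |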


In [3]:
data = data.astype(np.float64)
data.isnull().sum()

Pregnancies                 0
Glucose                     0
BloodPressure               0
SkinThickness               0
Insulin                     0
BMI                         0
DiabetesPedigreeFunction    0
Age                         0
Outcome                     0
dtype: int64
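
A zero count from `isnull()` does not rule out missing measurements: the preview above already shows `SkinThickness` and `Insulin` values of 0, which are unlikely to be real readings. The sketch below counts such zeros, using only the five preview rows as a stand-in for the full file. The list of columns in which 0 is treated as suspicious is an assumption made here, not something the dataset states.

```python
import pandas as pd

# The five preview rows shown above, used as a stand-in for data/diabetes.csv.
preview = pd.DataFrame({
    'Glucose': [148, 85, 183, 89, 137],
    'BloodPressure': [72, 66, 64, 66, 40],
    'SkinThickness': [35, 29, 0, 23, 35],
    'Insulin': [0, 0, 0, 94, 168],
    'BMI': [33.6, 26.6, 23.3, 28.1, 43.1],
})

# Assumption: a value of 0 in these columns is a placeholder, not a measurement.
suspect_columns = ['Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI']

# Count zeros per column; isnull() would report 0 for every one of them.
zero_counts = (preview[suspect_columns] == 0).sum()
print(zero_counts)
```

Running the same count on the full `data` frame would show how many rows are affected before deciding whether to impute or drop them.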

**Step 3:** Import the `train_test_split` function from scikit-learn's `model_selection` module and split the dataset into training and testing sets.

In [4]:
from sklearn.model_selection import train_test_split

In [5]:
X = data.drop('Outcome', axis=1)
Y = data['Outcome']

In [6]:
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.3)
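
The split above is random, so the scores reported further down will vary from run to run. As a hedged sketch, `train_test_split` also accepts `random_state` to fix the shuffle and `stratify` to keep the class ratio equal in both parts. The toy arrays and the seed value below are hypothetical and only illustrate the call.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 20 samples, 2 features, balanced binary labels.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(20, 2))
y_toy = np.array([0] * 10 + [1] * 10)

# random_state fixes the shuffle; stratify keeps the 50/50 class ratio
# in both the training and the testing part.
x_tr, x_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=42, stratify=y_toy
)
print(x_tr.shape, x_te.shape)
```

With a fixed seed the same rows land in the test set on every run, which makes accuracy figures comparable between experiments.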

**Step 4:** Naive Bayes implementation in scikit-learn

In [7]:
from sklearn.naive_bayes import GaussianNB

In [8]:
clf = GaussianNB()
clf.fit(x_train, y_train)

GaussianNB(priors=None, var_smoothing=1e-09)
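
For intuition, a Gaussian Naive Bayes model stores little more than a prior per class plus a mean and a variance per class and feature, and predicts the class with the highest log-posterior. The NumPy sketch below illustrates that idea on hypothetical toy data. It is not scikit-learn's implementation: the helper names `fit_gnb` and `predict_gnb` are made up here, and a fixed small constant `eps` stands in for the library's variance smoothing.

```python
import numpy as np

def fit_gnb(X, y, eps=1e-9):
    """Estimate per-class priors, feature means and feature variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # eps keeps variances strictly positive (simplified smoothing, an assumption).
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + eps
    return classes, priors, means, variances

def predict_gnb(X, classes, priors, means, variances):
    """Pick the class maximising log P(c) + sum_j log N(x_j | mean_cj, var_cj)."""
    diff = X[:, None, :] - means[None, :, :]            # shape (n, k, d)
    log_lik = -0.5 * (
        np.log(2 * np.pi * variances)[None, :, :] + diff ** 2 / variances[None, :, :]
    ).sum(axis=2)                                        # shape (n, k)
    log_post = np.log(priors)[None, :] + log_lik
    return classes[np.argmax(log_post, axis=1)]

# Hypothetical toy data: two well-separated clusters.
X_toy = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
                  [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

params = fit_gnb(X_toy, y_toy)
pred = predict_gnb(np.array([[1.1, 2.1], [5.1, 5.9]]), *params)
print(pred)
```

The "naive" part is the sum over features in `predict_gnb`: each feature contributes its own one-dimensional Gaussian term, as if features were independent given the class.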

In [9]:
y_pred = clf.predict(x_test)

In [10]:
from sklearn.metrics import accuracy_score

In [11]:
print(accuracy_score(y_test, y_pred))

0.7835497835497836


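Check the accuracy again with `clf.score`, which returns the mean accuracy on the given test data and should match the `accuracy_score` result above.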

In [12]:
clf.score(x_test, y_test)

0.7835497835497836
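
Both calls above report the same number because accuracy is simply the fraction of test labels that were predicted correctly. A minimal NumPy sketch with hypothetical labels (not taken from the diabetes data):

```python
import numpy as np

# Hypothetical true and predicted labels for eight samples.
y_true_toy = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred_toy = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Accuracy = number of matching labels / number of samples.
accuracy = np.mean(y_true_toy == y_pred_toy)
print(accuracy)
```

Applied to the notebook's own variables, `np.mean(y_test == y_pred)` should give the same value as `accuracy_score(y_test, y_pred)`.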