# Challenge - Diabetes Classification

### Diabetes Challenge!
#### Machine Learning in Health

- Diabetes is a condition that impairs the body's ability to process blood glucose, otherwise known as blood sugar. In the United States, an estimated 30.2 million people over 18 years of age have diagnosed or undiagnosed diabetes. The estimate's range is 27.9 to 32.7 million people.

- Without ongoing, careful management, diabetes can lead to a buildup of sugars in the blood, which can increase the risk of dangerous complications, including stroke and heart disease.

- Different kinds of diabetes can occur, and managing the condition depends on the type. Not all forms of diabetes stem from a person being overweight or leading an inactive lifestyle. In fact, some are present from childhood.

### Challenge

- In this problem you are given a diabetes data set with the following columns:

- ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome']

- The first eight columns are the input features and 'Outcome' is the label. Your task is to predict whether a person has diabetes or not (binary classification).

### Tasks

1) Plot a bar graph showing the number of examples in each class.

2) Classification task: classify each person as 1 (diabetic) or 0 (not diabetic) using a K-Nearest Neighbors classifier.
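
To make the second task concrete, here is a minimal sketch of how a K-Nearest Neighbors vote works, written with the standard library only. The helper name `knn_predict` and the toy rows are made up for illustration; the notebook itself relies on scikit-learn's `KNeighborsClassifier`, which may differ in details such as tie handling.

```python
from collections import Counter
import math


def knn_predict(train_rows, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training rows."""
    # Euclidean distance from the query to every training row, paired with that row's label.
    distances = [
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    ]
    # Keep the k closest rows (sort on distance only, so labels never influence the order).
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbours' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical, not from the challenge files): two features per person, label 1 = diabetic.
toy_rows = [(85, 22.0), (90, 24.5), (100, 26.0), (150, 33.0), (165, 35.5), (170, 38.0)]
toy_labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(toy_rows, toy_labels, query=(160, 34.0), k=3))  # prints 1
print(knn_predict(toy_rows, toy_labels, query=(95, 25.0), k=3))   # prints 0
```

The prediction depends entirely on distances, so the choice of `k` and the scale of each feature both influence the result.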

In [2]:
# importing basic required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [3]:
x_train = pd.read_csv('./Diabetes_XTrain.csv')
y_train = pd.read_csv('./Diabetes_YTrain.csv')
x_test = pd.read_csv('./Diabetes_Xtest.csv')

print(type(x_train), x_train.shape)
print(type(y_train), y_train.shape)
print(type(x_test), x_test.shape)

<class 'pandas.core.frame.DataFrame'> (576, 8)
<class 'pandas.core.frame.DataFrame'> (576, 1)
<class 'pandas.core.frame.DataFrame'> (192, 8)


In [4]:
x_train.head()

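|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 7 | 168 | 88 | 42 | 321 | 38.2 | 0.787 | 40 |
| 1 | 8 | 110 | 76 | 0 | 0 | 27.8 | 0.237 | 58 |
| 2 | 7 | 147 | 76 | 0 | 0 | 39.4 | 0.257 | 43 |
| 3 | 2 | 100 | 66 | 20 | 90 | 32.9 | 0.867 | 28 |
| 4 | 4 | 129 | 86 | 20 | 270 | 35.1 | 0.231 | 23 |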


In [5]:
x_train.isnull().sum()

Pregnancies                 0
Glucose                     0
BloodPressure               0
SkinThickness               0
Insulin                     0
BMI                         0
DiabetesPedigreeFunction    0
Age                         0
dtype: int64

In [6]:
y_train.head()

|   | Outcome |
|---|---|
| 0 | 1 |
| 1 | 0 |
| 2 | 1 |
| 3 | 1 |
| 4 | 0 |


In [7]:
y_train.isnull().sum()

Outcome    0
dtype: int64

In [42]:
y_train.value_counts()

Outcome
0          375
1          201
dtype: int64
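
Task 1 asks for a bar graph of these class counts. One possible way to draw it is sketched below; the helper `plot_class_counts` is a hypothetical name, and the `outcome` series is a stand-in rebuilt from the counts printed above rather than read from the CSV. In the notebook, `y_train['Outcome']` could be passed instead.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_class_counts(labels):
    """Draw one bar per class, with the bar height equal to the number of examples."""
    # Count examples per class and order the classes as 0, 1.
    counts = labels.value_counts().sort_index()
    fig, ax = plt.subplots(figsize=(4, 3))
    # Use string tick labels so the classes are shown as categories, not a numeric axis.
    ax.bar(counts.index.astype(str), counts.values)
    ax.set_xlabel("Outcome")
    ax.set_ylabel("Number of examples")
    ax.set_title("Training examples per class")
    return counts, ax


# Stand-in for y_train['Outcome'] (assumption: same class counts as shown above).
outcome = pd.Series([0] * 375 + [1] * 201, name="Outcome")
counts, ax = plot_class_counts(outcome)
print(counts.to_dict())
```

The classes are imbalanced (roughly two non-diabetic examples for every diabetic one), which is worth keeping in mind when reading accuracy figures.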

In [43]:
from sklearn.neighbors import KNeighborsClassifier

In [46]:
model = KNeighborsClassifier(n_neighbors=11)
model.fit(x_train, y_train['Outcome'])

In [49]:
# Mean accuracy on the training data (the same rows the model was fit on)
print("Training Score = ", model.score(x_train, y_train['Outcome']))

Training Score =  0.7934027777777778
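
The score above is measured on the rows the model was trained on, so it likely overstates accuracy on unseen people. KNN is also distance based, so features with large numeric ranges (such as Insulin) can dominate the neighbour search. The sketch below shows one way to address both points, using cross-validation and standardised features. It runs on synthetic data from `make_classification` as a stand-in with the same shape as the training set; it is an illustration under those assumptions, not the result for the challenge data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in (assumption): 576 rows and 8 features, like x_train.
X, y = make_classification(n_samples=576, n_features=8, n_informative=5, random_state=0)

# Standardise features inside the pipeline so the scaler is fit on the training folds only.
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=11))

# 5-fold cross-validation: each score is the accuracy on a fold the model did not train on.
scores = cross_val_score(pipeline, X, y, cv=5)
print("Mean CV accuracy =", scores.mean())
```

In the notebook, `x_train` and `y_train['Outcome']` could take the place of `X` and `y`, and the same loop could be repeated over several values of `n_neighbors` to compare them.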


In [51]:
y_pred = model.predict(x_test)
y_pred

array([1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0,
       1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
       0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0,
       1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1,
       0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1,
       0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0], dtype=int64)

In [54]:
f = pd.read_csv('./sample_submission.csv')
f.head(10)

|   | Outcome |
|---|---|
| 0 | 1 |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
| 5 | 1 |
| 6 | 1 |
| 7 | 1 |
| 8 | 1 |
| 9 | 1 |


In [57]:
# Wrap the predictions in a one-column DataFrame matching the sample submission layout
y_pred = pd.DataFrame({'Outcome': y_pred})
y_pred

|   | Outcome |
|---|---|
| 0 | 1 |
| 1 | 0 |
| 2 | 0 |
| 3 | 0 |
| 4 | 0 |
| ... | ... |
| 187 | 1 |
| 188 | 0 |
| 189 | 1 |
| 190 | 0 |


In [59]:
y_pred.value_counts()

Outcome
0          127
1           65
dtype: int64

In [62]:
y_pred.to_csv('predictions.csv', index=False)

In [63]:
d = pd.read_csv('./predictions.csv')
d.head()

|   | Outcome |
|---|---|
| 0 | 1 |
| 1 | 0 |
| 2 | 0 |
| 3 | 0 |
| 4 | 0 |
