# Pima Indians diabetes analysis using Keras

Created from the page http://machinelearningmastery.com/tutorial-first-neural-network-python-keras/

In [1]:
import pandas as pd
import requests
from urllib.parse import urljoin

Load the Keras libraries and set the random number seed.

In [2]:
from keras.models import Sequential
from keras.layers import Dense
import numpy
seed = 7  # set random seed for reproducibility
numpy.random.seed(seed)

Using TensorFlow backend.


In [3]:
url_base = 'http://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/'
url_names = 'pima-indians-diabetes.names'
url_data = 'pima-indians-diabetes.data'

req = requests.get(urljoin(url_base, url_names))
desc = req.text  # print(desc) to display description
url = urljoin(url_base, url_data)

In [4]:
columns = [
    'preg_count',  # 1. Number of times pregnant
    'glucose',  # 2. Plasma glucose concentration
    'bp',       # 3. Diastolic blood pressure (mm Hg)
    'triceps',  # 4. Triceps skin fold thickness (mm)
    'insulin',  # 5. 2-Hour serum insulin (mu U/ml)
    'bmi',      # 6. Body mass index (weight in kg/(height in m)^2)
    'dpf',      # 7. Diabetes pedigree function
    'age',      # 8. Age (years)
    'diabetes'  # 9. Class variable (0 or 1 - positive)
]

In [5]:
df = pd.read_csv(url, header=None, names=columns)

## Describe the dataset

In [6]:
df.describe()

| | preg_count | glucose | bp | triceps | insulin | bmi | dpf | age | diabetes |
|---|---|---|---|---|---|---|---|---|---|
| count | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 |
| mean | 3.845052 | 120.894531 | 69.105469 | 20.536458 | 79.799479 | 31.992578 | 0.471876 | 33.240885 | 0.348958 |
| std | 3.369578 | 31.972618 | 19.355807 | 15.952218 | 115.244002 | 7.88416 | 0.331329 | 11.760232 | 0.476951 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.078 | 21.0 | 0.0 |
| 25% | 1.0 | 99.0 | 62.0 | 0.0 | 0.0 | 27.3 | 0.24375 | 24.0 | 0.0 |
| 50% | 3.0 | 117.0 | 72.0 | 23.0 | 30.5 | 32.0 | 0.3725 | 29.0 | 0.0 |
| 75% | 6.0 | 140.25 | 80.0 | 32.0 | 127.25 | 36.6 | 0.62625 | 41.0 | 1.0 |
| max | 17.0 | 199.0 | 122.0 | 99.0 | 846.0 | 67.1 | 2.42 | 81.0 | 1.0 |


Split the data into input (X) and output (y) variables.

In [7]:
X = df.values[:, 0:8]
y = df.values[:, 8]

Use a fully connected network with three layers. The first layer has 12 neurons and takes 8 inputs, one for each of the 8 input variables. The second layer has 8 neurons, and the output layer has 1 neuron to predict the class (onset of diabetes or not).

The network weights are initialized with small random numbers drawn from a uniform distribution (**uniform**). The Keras uniform initializer draws values between -0.05 and 0.05 by default.

The rectified linear unit (**relu**) is used on the first two layers and the sigmoid function is used in the output layer. The sigmoid activation ensures that the output is between 0 and 1, so it is easy to map to class 0 or 1 by using a threshold of 0.5.

In [8]:
model = Sequential()
model.add(Dense(12, input_dim=8, init='uniform', activation='relu'))
model.add(Dense(8, init='uniform', activation='relu'))
model.add(Dense(1, init='uniform', activation='sigmoid'))
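
To make the layer sizes and activations concrete, the same 8 → 12 → 8 → 1 stack can be traced by hand in NumPy. This is only a sketch under stated assumptions: the helper names (`uniform_init`, `forward`) are hypothetical, the weights are drawn from [-0.05, 0.05] to mimic the uniform initializer, the biases start at zero, and the inputs are made-up numbers rather than the real dataset.

```python
import numpy as np

rng = np.random.default_rng(7)  # seed mirrors the notebook; any value works


def uniform_init(n_in, n_out, scale=0.05):
    """Hypothetical helper: weights uniform in [-scale, scale], zero biases."""
    return rng.uniform(-scale, scale, size=(n_in, n_out)), np.zeros(n_out)


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Layer sizes taken from the model above: 8 inputs -> 12 -> 8 -> 1
W1, b1 = uniform_init(8, 12)
W2, b2 = uniform_init(12, 8)
W3, b3 = uniform_init(8, 1)


def forward(X):
    """Forward pass: two relu layers followed by a sigmoid output."""
    h1 = relu(X @ W1 + b1)
    h2 = relu(h1 @ W2 + b2)
    return sigmoid(h2 @ W3 + b3)


X_demo = rng.uniform(0.0, 1.0, size=(5, 8))  # made-up inputs, not the real data
probs = forward(X_demo)                      # one probability per row
labels = (probs >= 0.5).astype(int)          # threshold at 0.5 to get a class

# Trainable parameters: (8*12 + 12) + (12*8 + 8) + (8*1 + 1)
n_params = sum(W.size + b.size for W, b in [(W1, b1), (W2, b2), (W3, b3)])
print(probs.shape, n_params)  # (5, 1) 221
```

The parameter count of 221 shows how small this network is; an untrained network like this one outputs probabilities close to 0.5 for every row.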

Compile the model using binary *crossentropy* as the loss function and *adam*, a gradient descent variant, as the optimizer. Report the classification *accuracy* as a metric.

In [9]:
model.compile(
    loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
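
As a rough illustration of what the loss and the metric measure, the sketch below computes the mean binary cross-entropy and the thresholded accuracy for a handful of made-up predictions. The function names and the clipping constant `eps` are assumptions for illustration, not the Keras implementation.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip so log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum(int(p >= threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


# Made-up labels and predicted probabilities
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.6]

print(binary_crossentropy(y_true, y_prob))
print(accuracy(y_true, y_prob))  # 0.5
```

The loss penalizes confident wrong predictions more heavily than hesitant ones, while accuracy only counts which side of the threshold each prediction falls on.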

Fit the model on the data for 150 epochs (full passes over all rows) with a small batch size of 10.

In [13]:
model.fit(X, y, nb_epoch=150, batch_size=10, verbose=0)

<keras.callbacks.History at 0x7fd9f1ada0b8>
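
The amount of work this call does follows from simple arithmetic. The sketch below assumes the final, smaller batch of each epoch is still processed rather than dropped.

```python
import math

n_samples = 768   # row count reported by df.describe()
batch_size = 10
epochs = 150

# 76 full batches of 10 rows plus one final batch of 8 rows
batches_per_epoch = math.ceil(n_samples / batch_size)
total_updates = batches_per_epoch * epochs

print(batches_per_epoch, total_updates)  # 77 11550
```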

Evaluate the model on the same data it was trained on and print the accuracy.

In [11]:
scores = model.evaluate(X, y)
print('{}: {}'.format(model.metrics_names[1], scores[1] * 100))

acc: 83.46354166666666


In [12]:
scores

[0.36076173620919388, 0.83463541666666663]
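
To put the accuracy in context, it can be converted back into a count of correctly classified rows and compared with the accuracy of always predicting the more common class. The numbers below are copied from the outputs in this notebook; the baseline comparison itself is an added illustration, not part of the original analysis.

```python
n_samples = 768
accuracy = 0.83463541666666663   # scores[1] from the output above
positive_rate = 0.348958         # mean of the diabetes column in df.describe()

n_correct = round(accuracy * n_samples)
n_positive = round(positive_rate * n_samples)
n_negative = n_samples - n_positive

# Accuracy obtained by always predicting the more common class
baseline = max(n_positive, n_negative) / n_samples

print(n_correct, n_positive, n_negative)  # 641 268 500
print(round(baseline * 100, 1))           # 65.1
```

Because the score is measured on the training data, it likely overstates how well the model would do on unseen patients.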

In [18]:
model.metrics_names

['loss', 'acc']