# The Data: diabetes

First, we use this data set from Kaggle, which tracks diabetes in Pima Native Americans. We use it to build a predictive model of how likely someone is to have or develop diabetes given their age, body mass index, glucose and insulin levels, skin thickness, etc.

The code below feeds these features (glucose, BMI, etc.) and labels (the single value yes [1] or no [0]) into a Keras neural network to build a model that predicts whether someone has or will get Type II diabetes. In the runs below, its accuracy on held-out test data ranges from about 66% to 76%.

In [1]:
import tensorflow as tf
from keras.models import Sequential
import pandas as pd
from keras.layers import Dense

data = pd.read_csv('diabetes.csv', delimiter=',')

Using TensorFlow backend.


Let’s start by browsing the data, listing the minimum, maximum, and average values of each column.

In [2]:
data.describe()

| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| count | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 |
| mean | 3.845052 | 120.894531 | 69.105469 | 20.536458 | 79.799479 | 31.992578 | 0.471876 | 33.240885 | 0.348958 |
| std | 3.369578 | 31.972618 | 19.355807 | 15.952218 | 115.244002 | 7.88416 | 0.331329 | 11.760232 | 0.476951 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.078 | 21.0 | 0.0 |
| 25% | 1.0 | 99.0 | 62.0 | 0.0 | 0.0 | 27.3 | 0.24375 | 24.0 | 0.0 |
| 50% | 3.0 | 117.0 | 72.0 | 23.0 | 30.5 | 32.0 | 0.3725 | 29.0 | 0.0 |
| 75% | 6.0 | 140.25 | 80.0 | 32.0 | 127.25 | 36.6 | 0.62625 | 41.0 | 1.0 |
| max | 17.0 | 199.0 | 122.0 | 99.0 | 846.0 | 67.1 | 2.42 | 81.0 | 1.0 |


In [3]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
Pregnancies                 768 non-null int64
Glucose                     768 non-null int64
BloodPressure               768 non-null int64
SkinThickness               768 non-null int64
Insulin                     768 non-null int64
BMI                         768 non-null float64
DiabetesPedigreeFunction    768 non-null float64
Age                         768 non-null int64
Outcome                     768 non-null int64
dtypes: float64(2), int64(7)
memory usage: 54.1 KB


## Check correlation with a heatmap

Next, run this code to see any correlation between variables. This is not needed for the final model, but it gives further insight into the data.

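pandas' data.corr() computes the Pearson correlation coefficient between every pair of columns, including each column with itself, giving a value between -1 and 1. Seaborn then draws that matrix as a heatmap, plotting every column against itself and every other column and coloring each cell by how strongly the two are correlated.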
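As a minimal sketch of what data.corr() computes, the block below builds a small made-up DataFrame (the values are hypothetical, not taken from diabetes.csv) and checks that pandas' default Pearson correlation matches the textbook formula: the co-deviation of two columns divided by the product of their spreads.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for diabetes.csv (values are made up).
df = pd.DataFrame({
    "Glucose": [85, 168, 89, 137, 116, 78],
    "BMI": [26.6, 43.1, 28.1, 43.1, 25.6, 31.0],
    "Outcome": [0, 1, 0, 1, 0, 1],
})


def pearson(x, y):
    """Pearson r: sum of co-deviations over the root of the summed squared deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())


corr = df.corr()  # DataFrame.corr() defaults to method="pearson"
manual = pearson(df["Glucose"], df["Outcome"])
print(round(manual, 4), round(corr.loc["Glucose", "Outcome"], 4))
```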

In [4]:
import seaborn as sns
import matplotlib.pyplot as plt

corr = data.corr()  # pairwise Pearson correlation between all columns
sns.heatmap(corr,
            xticklabels=corr.columns.values,
            yticklabels=corr.columns.values)

[Figure: correlation heatmap, with the nine columns (Pregnancies through Outcome) on both axes]

Items that are perfectly correlated have a correlation value of 1. Every metric is perfectly correlated with itself, as the tan diagonal line running across the middle of the chart shows.
There are not many orange squares in the chart, so no single variable is strongly correlated with the outcome (diabetes), and the variables are not strongly correlated with one another either. Note that correlation measures how closely two variables move together; it is not the probability of getting diabetes. But we will see that, taken together, these factors let us predict who will develop diabetes with up to roughly 75% accuracy.

You can also check the correlation between any two specific variables in a dataframe. Values such as 0.28 and 0.54 indicate only weak to moderate correlation, since they are well below 1.00.
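A short sketch of the two usual ways to read off one pair: index into the correlation matrix, or call Series.corr() on the two columns. The tiny DataFrame is hypothetical; in the notebook you would use data directly. The last line checks the diagonal of 1s discussed above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; in the notebook, use the `data` DataFrame loaded from diabetes.csv.
data = pd.DataFrame({
    "Pregnancies": [6, 1, 8, 1, 0, 5, 3, 10],
    "Age": [50, 31, 32, 21, 33, 30, 26, 29],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1, 25.6, 31.0, 35.3],
})

corr = data.corr()

# Two equivalent ways to get the correlation for one pair of columns.
from_matrix = corr.loc["Pregnancies", "Age"]
direct = data["Pregnancies"].corr(data["Age"])
print(round(from_matrix, 3), round(direct, 3))

# Every column is perfectly correlated with itself: the diagonal is all 1s.
print(np.diag(corr.to_numpy()))
```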

# Prepare the test and training data sets
Outcome is the column with the label (0 or 1).
The rest of the columns are the features.
We use the scikit-learn function train_test_split(X, y, test_size=0.33, random_state=42) to split the data into training and test data sets, giving 33% of the records to the test data set. The training data set is used to train the model, meaning find the weights and biases. The test data set is used to check its accuracy.
labels is not a NumPy array; it is a single column of the DataFrame (a pandas Series). So we use the NumPy np.ravel() function to flatten it into a 1-D array.

In [5]:
import numpy as np

labels=data['Outcome']
features = data.iloc[:,0:8]

from sklearn.model_selection import train_test_split

X=features

y=np.ravel(labels)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42) 
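To make the shapes concrete, here is a sketch on synthetic data of the same size as diabetes.csv (768 rows, 8 feature columns plus Outcome). The feature names f0 to f7 and the random values are placeholders, not the real columns. It shows that labels starts out as a pandas Series, that np.ravel() turns it into a 1-D NumPy array, and that test_size=0.33 sends 254 rows (33% of 768, rounded up) to the test set.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in with the same shape as diabetes.csv.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(768, 8)), columns=[f"f{i}" for i in range(8)])
data["Outcome"] = rng.integers(0, 2, size=768)

features = data.iloc[:, 0:8]   # first 8 columns are the features
labels = data["Outcome"]       # a pandas Series, not a NumPy array
y = np.ravel(labels)           # flattened into a 1-D ndarray

X_train, X_test, y_train, y_test = train_test_split(
    features, y, test_size=0.33, random_state=42)

print(type(labels).__name__, type(y).__name__, y.shape)
print(len(X_train), len(X_test))  # 514 254
```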

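Now we standardize the values: for each feature, we subtract the mean and divide by the standard deviation, so each x becomes (x - mean) / std. The mean and standard deviation come from the training set only and are then applied to both the training and test sets. This puts all features on a common scale, which is standard practice in machine learning.

StandardScaler does this in two steps: fit() learns each column's mean and standard deviation from X_train, and transform() applies the scaling.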

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler().fit(X_train)

X_train = scaler.transform(X_train)

X_test = scaler.transform(X_test)  
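A minimal sketch, on made-up numbers, checking that StandardScaler computes exactly (x - mean) / std per column, with the standard deviation taken over the training rows using the population formula (ddof=0), and that transform() reuses the training statistics for the test set. Fitting only on the training data keeps information from the test set from leaking into the model.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrices (rows = people, columns = features).
rng = np.random.default_rng(1)
X_train = rng.normal(loc=120, scale=30, size=(100, 3))
X_test = rng.normal(loc=120, scale=30, size=(40, 3))

scaler = StandardScaler().fit(X_train)  # learns mean_ and scale_ from training rows only
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)   # reuses the training mean and std

# Same result by hand: z = (x - mean) / std, population std (ddof=0).
manual = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)
print(np.allclose(X_train_std, manual))
print(X_train_std.mean(axis=0).round(6), X_train_std.std(axis=0).round(6))
```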

# The Keras sequential model
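The code below creates a Keras Sequential model, which builds the neural network as a linear stack of layers added one at a time. This is the simplest way to define a model in Keras; other approaches, such as the functional API, allow more complex architectures with branches or multiple inputs.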

## Activation function
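Pick an activation function for each layer. Each neuron computes z = (w • x) + b, and the activation function transforms z, adding the non-linearity that lets the network learn more than a linear model can. Some activations, such as the sigmoid used in our output layer, squash z into the range 0 to 1 so it can be read as a probability; a threshold (0.5 by default) then turns that probability into 1 (true) or 0 (false). (In general, a neuron's output is not the same as the final answer, diabetic, 1, or not, 0, since neural networks can also handle problems with more than two discrete outcomes.)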

For the first two layers we use the ReLU (rectified linear unit) activation function. ReLU is a common default for hidden layers, but it is not the only choice; you could, for example, try sigmoid instead. ReLU outputs 0 for every negative input and passes every positive input through unchanged.
In other words, f(x) = max(0, x). For example, f(-1) = max(0, -1) = 0, while f(2.5) = max(0, 2.5) = 2.5. A negative weighted input is set to 0, and a positive one is kept as-is rather than rounded to 1.
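A minimal NumPy sketch of ReLU, first on a few sample inputs and then inside one hypothetical dense layer, a = relu(W · x + b). The weights W, bias b, and input x are made-up numbers chosen only to show one unit clipped to 0 and one passed through.

```python
import numpy as np


def relu(z):
    """Rectified linear unit, element-wise: max(0, z)."""
    return np.maximum(0.0, z)


z = np.array([-2.0, -1.0, 0.0, 0.5, 2.5])
print(relu(z))  # [0.  0.  0.  0.5 2.5]

# One hypothetical dense layer with 3 inputs and 2 units: a = relu(W @ x + b)
W = np.array([[0.5, -1.0, 0.2],
              [1.0, 0.3, 0.4]])
b = np.array([0.1, -0.5])
x = np.array([1.0, 2.0, 3.0])
a = relu(W @ x + b)
print(a)  # first unit is clipped to 0, second passes through (about 2.3)
```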

There is no hard rule for picking an activation function; it is largely trial and error. Try different ones and see which produces the most accurate predictions. Common choices include sigmoid, tanh, softmax, ReLU, and Leaky ReLU. Some, such as softmax, are better suited to multi-class outputs than to binary ones.

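Sigmoid uses the logistic function, σ(z) = 1 / (1 + e^(-z)), where z = (w • x) + b. Large positive z gives values close to 1, large negative z gives values close to 0, and z = 0 gives exactly 0.5.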
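A minimal NumPy sketch of the sigmoid and of turning its output into a class label. The 0.5 cutoff is the threshold Keras' predict_classes applies to a single sigmoid output; the sample z values are arbitrary.

```python
import numpy as np


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)); maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])  # sample weighted inputs (w . x) + b
p = sigmoid(z)
print(p.round(3))

# Threshold at 0.5: strictly greater becomes class 1, otherwise class 0.
classes = (p > 0.5).astype(int)
print(classes)  # [0 0 0 1 1]
```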


In [6]:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()

model.add(Dense(8, activation='relu', input_shape=(8,)))

model.add(Dense(8, activation='relu'))

model.add(Dense(1, activation='sigmoid'))

Some notes on the code:

- input_shape: we only have to give the shape (dimensions) of the input on the first layer. It is (8,) because each sample is a vector of 8 features; Keras adds the batch dimension on its own.
- Dense: a fully connected layer. Each of its units computes (w • x) + b over all of its inputs and applies the activation function to the result. The first argument to Dense is the number of units (neurons), a parameter you can adjust to improve the accuracy of the model. Choosing the number of hidden units, like choosing the number of hidden layers, is a complex topic that is hard to explain, but it is one we can safely gloss over here. (The complexity of these two choices is why many people describe working with neural networks as an art. A mathematician would mock that lack of rigor.)
- loss: the goal of training is to minimize the loss function, which measures the difference between predicted and observed values. There are many loss functions to choose from. We pick binary_crossentropy because our labels are binary: 1 (diabetic) or 0 (not diabetic).
- optimizer: we use sgd, stochastic gradient descent. It repeatedly nudges the weights and biases in the direction that lowers the loss, using gradients computed on batches of training data. There are others, such as Adam and RMSprop.
- epochs: the number of complete passes over the training data. Training is iterative, and each epoch refines the weights further. You could add more epochs, but the accuracy might not change much; you just have to try and see.
- metrics: which metrics to report as training runs. accuracy is the fraction of samples the current model classifies correctly.
- batch_size: the number of samples processed before each weight update. Samples within a batch are processed together, and batches are processed one after another. With batch_size=1, as below, the weights are updated after every single sample.
- fit(): trains the model, that is, learns the weights and biases. The number of layers and units is fixed when you build the model; fit() does not change it.
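To make the loss and batch_size notes concrete, here is a hedged NumPy sketch of binary cross-entropy, the mean of -[y·log(p) + (1 - y)·log(1 - p)], plus the number of weight updates per epoch. The clipping constant eps is an assumption added to avoid log(0), and the 514-row training size follows from splitting 768 rows with test_size=0.33.

```python
import math

import numpy as np


def binary_crossentropy(y_true, p_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1 - y)*log(1 - p)] over all samples."""
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)  # avoid log(0)
    y = np.asarray(y_true, dtype=float)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))


def updates_per_epoch(n_samples, batch_size):
    """Weight updates (batches) in one epoch; the last batch may be smaller."""
    return math.ceil(n_samples / batch_size)


y_true = np.array([1, 0, 1, 0])
loss_right = binary_crossentropy(y_true, [0.9, 0.1, 0.8, 0.2])  # confident and correct
loss_wrong = binary_crossentropy(y_true, [0.1, 0.9, 0.2, 0.8])  # confident and wrong
print(round(loss_right, 4), round(loss_wrong, 4))

# 514 training rows: 768 rows split with test_size=0.33
print(updates_per_epoch(514, 1), updates_per_epoch(514, 10))  # 514 52
```

The loss is small when the predicted probability is close to the true label and grows quickly as a confident prediction turns out wrong, which is what gradient descent works to push down.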


In [7]:
model.compile(loss='binary_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
                   
model.fit(X_train, y_train,epochs=4, batch_size=1, verbose=1)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.callbacks.History at 0x1ad31857a48>

Above, we talked about the iterative process of solving a neural network for its weights and biases. That iteration happens over epochs, and the output above logs each epoch as it runs. Typically the accuracy rises quickly over the first epochs and then levels off.

In [8]:
for layer in model.layers:
    weights = layer.get_weights()  # [kernel, bias] for a Dense layer
    print(layer.name, [w.shape for w in weights])

In [9]:
!pip install pydot



We can also draw a picture of the layers and their shapes. It’s not very useful but nice to see.

As you would expect, the shape of the output is 1, since the output layer holds our single prediction.

In [10]:
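# predict_classes thresholds the sigmoid output at 0.5. It was removed in newer
# TensorFlow releases; there, use (model.predict(X_test) > 0.5).astype('int32')
y_pred = model.predict_classes(X_test)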

This prints the score. evaluate() returns the loss followed by each metric passed to compile(), so the list below is [loss, accuracy].

In [11]:
score = model.evaluate(X_test, y_test,verbose=1)

print(score)

[0.641478711695183, 0.6614173054695129]


So, our predictive model is about 66% accurate on the test set.
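A quick sanity check on that number, using only the standard library: with 768 rows and test_size=0.33 the test set has 254 rows, so accuracy can only move in steps of 1/254, and the 0.6614 reported by evaluate() corresponds to 168 correct predictions.

```python
import math

n_rows = 768
n_test = math.ceil(0.33 * n_rows)  # train_test_split rounds the test share up
print(n_test)  # 254

# Accuracy = correct predictions / test rows, so it moves in steps of 1/254.
reported = 0.6614173054695129  # accuracy returned by model.evaluate() above
correct = round(reported * n_test)
print(correct, correct / n_test)  # 168 0.6614173228346457
```

The small difference in the trailing digits comes from Keras computing the metric in 32-bit floats.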

# Hyperparameter tuning

## Tune Batch Size and Number of Epochs

In [43]:
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV
from keras.callbacks import EarlyStopping

def DL_Model():
    model = Sequential()
    model.add(Dense(8, activation='relu', input_shape=(8,)))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer= 'adam', metrics=['accuracy'])
    return model

In [49]:
model = KerasClassifier(build_fn=DL_Model, verbose=0)
batch_size = [10, 20, 40, 60, 80, 100]
epochs = [10, 50, 100]
param_grid = dict(batch_size=batch_size, epochs=epochs)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X_train, y_train)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))



Best: 0.690661 using {'batch_size': 10, 'epochs': 100}
0.607004 (0.040586) with: {'batch_size': 10, 'epochs': 10}
0.671206 (0.003298) with: {'batch_size': 10, 'epochs': 50}
0.690661 (0.034608) with: {'batch_size': 10, 'epochs': 100}
0.593385 (0.016918) with: {'batch_size': 20, 'epochs': 10}
0.669261 (0.051961) with: {'batch_size': 20, 'epochs': 50}
0.653697 (0.035852) with: {'batch_size': 20, 'epochs': 100}
0.552529 (0.053944) with: {'batch_size': 40, 'epochs': 10}
0.599222 (0.051499) with: {'batch_size': 40, 'epochs': 50}
0.628405 (0.011931) with: {'batch_size': 40, 'epochs': 100}
0.515564 (0.121114) with: {'batch_size': 60, 'epochs': 10}
0.603113 (0.055104) with: {'batch_size': 60, 'epochs': 50}
0.661479 (0.032541) with: {'batch_size': 60, 'epochs': 100}
0.529183 (0.054522) with: {'batch_size': 80, 'epochs': 10}
0.544747 (0.041930) with: {'batch_size': 80, 'epochs': 50}
0.651751 (0.022515) with: {'batch_size': 80, 'epochs': 100}
0.589494 (0.026787) with: {'batch_size': 100, 'epochs':

In [50]:
from sklearn.metrics import accuracy_score
y_pred=grid.predict(X_test)
accuracy_score(y_test,y_pred)

0.7362204724409449
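GridSearchCV trains one model for every combination in param_grid on every cross-validation fold. A minimal sketch using scikit-learn's ParameterGrid (the same expansion GridSearchCV iterates over) to count the runs for the batch-size/epoch grid above: 6 x 3 = 18 combinations, times cv=3 folds, plus a final refit of the best combination on all of X_train (GridSearchCV's default refit=True).

```python
from sklearn.model_selection import ParameterGrid

# Same grid as the batch-size/epoch search above.
param_grid = dict(batch_size=[10, 20, 40, 60, 80, 100], epochs=[10, 50, 100])
combos = list(ParameterGrid(param_grid))

cv = 3
n_fits = len(combos) * cv + 1  # +1: GridSearchCV refits the best combination (refit=True)

print(len(combos))  # 18
print(n_fits)       # 55
print(combos[0])    # {'batch_size': 10, 'epochs': 10}
```

This is why grid search gets expensive quickly: each extra value in any list multiplies the number of networks trained.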

## Tune Activation, Neurons, and Optimizer

In [23]:
def DL_Model(activation= 'linear', neurons= 5, optimizer='Adam'):
    model = Sequential()
    model.add(Dense(neurons, activation=activation, input_shape=(8,)))
    model.add(Dense(neurons, activation=activation))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer= optimizer, metrics=['accuracy'])
    return model

In [25]:
model = KerasClassifier(build_fn=DL_Model, verbose=0)
activation = ['softmax', 'relu', 'tanh', 'sigmoid']
neurons = [5, 10, 15]
optimizer = ['SGD', 'Adam','RMSprop']
# weight_constraint = [1, 2, 3]
# dropout_rate = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
param_grid = dict(activation = activation, neurons = neurons, optimizer = optimizer)
clf = KerasClassifier(build_fn= DL_Model, epochs=15, batch_size=5, verbose= 1)
model = GridSearchCV(estimator= clf, param_grid=param_grid, n_jobs=-1)
model.fit(X_train,y_train)

print("Max Accuracy Registered: {} using {}".format(round(model.best_score_, 3),
                                                    model.best_params_))

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15
Max Accuracy Registered: 0.679 using {'activation': 'relu', 'neurons': 15, 'optimizer': 'SGD'}


In [27]:
from sklearn.metrics import accuracy_score
y_pred=model.predict(X_test)
accuracy_score(y_test,y_pred)



0.6653543307086615

## Tune Learning Rate and Momentum

In [28]:
from keras.optimizers import SGD

def DL_Model(learn_rate=0.01, momentum=0):
    model = Sequential()
    model.add(Dense(8, activation='relu', input_shape=(8,)))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    optimizer = SGD(lr=learn_rate, momentum=momentum)
    model.compile(loss='binary_crossentropy', optimizer= optimizer, metrics=['accuracy'])
    return model

In [31]:
from keras.optimizers import SGD
model = KerasClassifier(build_fn=DL_Model, epochs=100, batch_size=10, verbose=0)
# define the grid search parameters
learn_rate = [0.001, 0.01, 0.1, 0.2, 0.3]
momentum = [0.0, 0.2, 0.4, 0.6, 0.8, 0.9]
param_grid = dict(learn_rate=learn_rate, momentum=momentum)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X_train, y_train)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))



Best: 0.669261 using {'learn_rate': 0.01, 'momentum': 0.0}
0.655642 (0.020429) with: {'learn_rate': 0.001, 'momentum': 0.0}
0.661479 (0.028479) with: {'learn_rate': 0.001, 'momentum': 0.2}
0.663424 (0.034974) with: {'learn_rate': 0.001, 'momentum': 0.4}
0.651751 (0.006041) with: {'learn_rate': 0.001, 'momentum': 0.6}
0.661479 (0.008760) with: {'learn_rate': 0.001, 'momentum': 0.8}
0.651751 (0.028398) with: {'learn_rate': 0.001, 'momentum': 0.9}
0.669261 (0.031239) with: {'learn_rate': 0.01, 'momentum': 0.0}
0.647860 (0.006367) with: {'learn_rate': 0.01, 'momentum': 0.2}
0.649805 (0.007822) with: {'learn_rate': 0.01, 'momentum': 0.4}
0.642023 (0.005086) with: {'learn_rate': 0.01, 'momentum': 0.6}
0.645914 (0.002419) with: {'learn_rate': 0.01, 'momentum': 0.8}
0.645914 (0.002419) with: {'learn_rate': 0.01, 'momentum': 0.9}
0.645914 (0.002419) with: {'learn_rate': 0.1, 'momentum': 0.0}
0.645914 (0.002419) with: {'learn_rate': 0.1, 'momentum': 0.2}
0.645914 (0.002419) with: {'learn_rate': 

In [32]:
from sklearn.metrics import accuracy_score
y_pred=grid.predict(X_test)
accuracy_score(y_test,y_pred)

0.6614173228346457

## Tune Network Weight Initialization

In [34]:
def DL_Model(init_mode='uniform'):
    model = Sequential()
    model.add(Dense(8, activation='relu', input_shape=(8,),kernel_initializer=init_mode))
    model.add(Dense(8, activation='relu',kernel_initializer=init_mode))
    model.add(Dense(1, activation='sigmoid',kernel_initializer=init_mode))
    model.compile(loss='binary_crossentropy', optimizer= 'adam', metrics=['accuracy'])
    return model

In [36]:
model = KerasClassifier(build_fn=DL_Model, epochs=100, batch_size=10, verbose=0)
# define the grid search parameters
init_mode = ['uniform', 'lecun_uniform', 'normal', 'zero', 'glorot_normal', 'glorot_uniform', 'he_normal', 'he_uniform']
param_grid = dict(init_mode=init_mode)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X_train, y_train)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.702335 using {'init_mode': 'normal'}
0.669261 (0.038336) with: {'init_mode': 'uniform'}
0.659533 (0.021570) with: {'init_mode': 'lecun_uniform'}
0.702335 (0.004840) with: {'init_mode': 'normal'}
0.645914 (0.002419) with: {'init_mode': 'zero'}
0.677043 (0.011176) with: {'init_mode': 'glorot_normal'}
0.696498 (0.028531) with: {'init_mode': 'glorot_uniform'}
0.671206 (0.030348) with: {'init_mode': 'he_normal'}
0.649805 (0.000963) with: {'init_mode': 'he_uniform'}


In [37]:
from sklearn.metrics import accuracy_score
y_pred=grid.predict(X_test)
accuracy_score(y_test,y_pred)

0.7598425196850394

## Tune Dropout Regularization

In [38]:
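def DL_Model(dropout_rate=0.0, weight_constraint=0):
    # Note: weight_constraint is accepted so GridSearchCV can pass it in, but it is
    # never applied to any layer, so score differences across its values below
    # are run-to-run noise. To use it, pass kernel_constraint=MaxNorm(weight_constraint)
    # (from keras.constraints) to the Dense layers.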
    model = Sequential()
    model.add(Dense(8, activation='relu', input_shape=(8,)))
    model.add(Dense(8, activation='relu'))
    model.add(Dropout(dropout_rate))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer= 'adam', metrics=['accuracy'])
    return model

In [41]:
model = KerasClassifier(build_fn=DL_Model, epochs=100, batch_size=10, verbose=0)
# define the grid search parameters
weight_constraint = [1, 2, 3, 4, 5]
dropout_rate = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
param_grid = dict(dropout_rate=dropout_rate, weight_constraint=weight_constraint)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X_train, y_train)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.688716 using {'dropout_rate': 0.3, 'weight_constraint': 2}
0.663424 (0.015023) with: {'dropout_rate': 0.0, 'weight_constraint': 1}
0.678988 (0.036925) with: {'dropout_rate': 0.0, 'weight_constraint': 2}
0.678988 (0.004024) with: {'dropout_rate': 0.0, 'weight_constraint': 3}
0.680934 (0.023014) with: {'dropout_rate': 0.0, 'weight_constraint': 4}
0.647860 (0.054443) with: {'dropout_rate': 0.0, 'weight_constraint': 5}
0.663424 (0.023160) with: {'dropout_rate': 0.1, 'weight_constraint': 1}
0.638132 (0.007251) with: {'dropout_rate': 0.1, 'weight_constraint': 2}
0.673152 (0.017004) with: {'dropout_rate': 0.1, 'weight_constraint': 3}
0.657588 (0.009001) with: {'dropout_rate': 0.1, 'weight_constraint': 4}
0.657588 (0.031811) with: {'dropout_rate': 0.1, 'weight_constraint': 5}
0.638132 (0.003934) with: {'dropout_rate': 0.2, 'weight_constraint': 1}
0.643969 (0.018212) with: {'dropout_rate': 0.2, 'weight_constraint': 2}
0.667315 (0.025828) with: {'dropout_rate': 0.2, 'weight_constraint': 

In [42]:
from sklearn.metrics import accuracy_score
y_pred=grid.predict(X_test)
accuracy_score(y_test,y_pred)

0.6614173228346457