### Create a model that predicts whether or not a patient has diabetes

In [12]:
import pandas as pd

In [13]:
train_df = pd.read_csv("data/diabetes_data.csv")

train_df.head()

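| | pregnancies | glucose | diastolic | triceps | insulin | bmi | dpf | age | diabetes |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |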


In [14]:
train_X = train_df.drop(columns=['diabetes'])

In [15]:
train_X.head()

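| | pregnancies | glucose | diastolic | triceps | insulin | bmi | dpf | age |
|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 |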


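With the target column `diabetes` dropped from the features, we encode it separately with the `to_categorical()` function from `tensorflow.keras.utils`, which one-hot encodes it. One-hot labels have the same two-column shape as the model's two-node softmax output, which is what the `categorical_crossentropy` loss expects.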

Before one-hot encoding, each label is a single integer:
0 -> Patient with no diabetes
1 -> Patient with diabetes

After one-hot encoding, each integer is replaced by a binary vector with one position per class:
[1 0] -> Patient with no diabetes
[0 1] -> Patient with diabetes
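To make the mapping concrete, here is a minimal NumPy sketch of the same encoding, applied to the first five labels from the table above. It is only an illustration of what one-hot encoding does, not how `to_categorical()` is implemented internally; for these two classes it produces the same float32 rows shown in the next cell.

```python
import numpy as np

# Integer labels as they appear in the `diabetes` column (0 = no diabetes, 1 = diabetes)
labels = np.array([1, 0, 1, 0, 1])

# Row i of the 2x2 identity matrix is the one-hot vector for class i,
# so indexing with the labels picks one row per sample
one_hot = np.eye(2, dtype="float32")[labels]
print(one_hot)
```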

In [16]:
from tensorflow.keras.utils import to_categorical

# Perform to_categorical on the target column
train_y = to_categorical(train_df.diabetes)

In [17]:
train_y[0:5]

array([[0., 1.],
       [1., 0.],
       [0., 1.],
       [1., 0.],
       [0., 1.]], dtype=float32)

In [18]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Create a model using the Sequential API
model = Sequential()

# Get the number of columns in the training data
n_cols = train_X.shape[1]

# Three hidden layers of 250 ReLU units, then a 2-unit softmax output layer (one unit per class)
model.add(Dense(250, activation='relu', input_shape=(n_cols,)))
model.add(Dense(250, activation='relu'))
model.add(Dense(250, activation='relu'))
model.add(Dense(2, activation='softmax'))
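Each `Dense` layer holds a weight matrix (inputs x units) and one bias per unit, so the size of this network can be worked out by hand. The sketch below does that arithmetic in plain Python, assuming the 8 feature columns shown in `train_X.head()`; the helper `dense_params` is our own, and the total should match the parameter count reported by `model.summary()`.

```python
# Each Dense layer has (inputs x units) weights plus one bias per unit
def dense_params(n_inputs, n_units):
    return n_inputs * n_units + n_units

n_cols = 8  # number of feature columns in train_X
layer_sizes = [250, 250, 250, 2]

total = 0
n_inputs = n_cols
for units in layer_sizes:
    total += dense_params(n_inputs, units)
    n_inputs = units  # the next layer's inputs are this layer's outputs

print(total)  # 128252
```

Almost all of the parameters sit in the two 250 x 250 hidden-to-hidden layers (62,750 each), which is worth keeping in mind for a dataset of only 768 rows.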

The last layer has two nodes, one per class, in the same order as the one-hot columns:
Node 1 -> Patients not having diabetes
Node 2 -> Patients having diabetes
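The `softmax` activation turns the last layer's two raw scores into non-negative values that sum to 1, so each output row can be read as one probability per class. A minimal NumPy sketch of the standard softmax formula, using made-up scores for a single patient:

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability; this does not change the result
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Hypothetical raw scores (logits) from the last layer for one patient
logits = np.array([[1.1, 0.0]])
probs = softmax(logits)
print(probs)
print(probs.sum(axis=-1))  # each row sums to 1
```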

In [19]:
# Compile model using accuracy to measure model performance
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

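We use `categorical_crossentropy` as the loss function: it measures how far the predicted class probabilities are from the one-hot labels, so a lower loss means better performance. The `adam` optimizer updates the weights to minimize this loss, and `accuracy` is reported alongside it as a metric.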
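For one sample with one-hot label y and predicted probabilities p, categorical cross-entropy is -sum(y_i * log(p_i)), which reduces to the negative log of the probability given to the true class. The NumPy sketch below compares three predictions for a patient without diabetes; the clipping constant `eps` is our own guard against log(0), and Keras's implementation may differ in such details.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0), then average -sum(y_true * log(y_pred)) over samples
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

y_true = np.array([[1.0, 0.0]])  # one-hot label: no diabetes
confident_right = np.array([[0.9, 0.1]])
unsure = np.array([[0.5, 0.5]])
confident_wrong = np.array([[0.1, 0.9]])

for name, pred in [("right", confident_right), ("unsure", unsure), ("wrong", confident_wrong)]:
    print(name, round(categorical_crossentropy(y_true, pred), 4))
```

The confident correct prediction gets the smallest loss and the confident wrong one the largest, which is why training drives this value down.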

In [20]:
from tensorflow.keras.callbacks import EarlyStopping

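# Stop training once the validation loss (monitored by default) has not improved for 3 consecutive epochs
early_stopping_monitor = EarlyStopping(patience=3)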

# Train model
model.fit(train_X, train_y, epochs=30, validation_split=0.2, callbacks=[early_stopping_monitor])

Train on 614 samples, validate on 154 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30


<tensorflow.python.keras.callbacks.History at 0x7fde6c4d7a50>
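`EarlyStopping` monitors `val_loss` by default and ends training once it has failed to improve for `patience` consecutive epochs; the log above stops at epoch 12 of the 30 requested, which is consistent with that. The function below is a simplified, hypothetical re-creation of the stopping rule in plain Python (it ignores options such as `min_delta`, `baseline` and `restore_best_weights`), run on made-up loss values rather than the ones from this run.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training stops, or None if it never stops.

    Simplified sketch: an epoch improves if its loss is strictly lower than the
    best seen so far; training stops once `patience` consecutive epochs fail to improve.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses: the best value is at epoch 3, then three worse epochs
losses = [0.70, 0.62, 0.58, 0.59, 0.60, 0.61, 0.55]
print(early_stopping_epoch(losses))  # 6
```

With the default `restore_best_weights=False`, the Keras callback leaves the model with the weights from the final epoch, not from the best one.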

In [21]:
import numpy as np

# One new patient, with features in the same order as the columns of train_X
test_X = np.array([[0, 88, 86, 20, 0, 34.6, 0.371, 37]])

test_y = model.predict(test_X)
print(test_y)

[[0.75040454 0.24959546]]
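Each row of `test_y` holds the softmax probabilities for one patient, in the column order produced by `to_categorical()`: index 0 is "no diabetes" and index 1 is "diabetes". This patient is therefore predicted not to have diabetes, with a probability of about 0.75. A short sketch of turning the probabilities into a label with `np.argmax` (the `class_names` list is our own, not part of the original notebook):

```python
import numpy as np

# Prediction printed above: one row per patient, one column per class
test_y = np.array([[0.75040454, 0.24959546]])

# Column order follows to_categorical: index 0 = no diabetes, index 1 = diabetes
class_names = ["no diabetes", "diabetes"]
predicted_index = np.argmax(test_y, axis=1)

for idx, probs in zip(predicted_index, test_y):
    print(f"{class_names[idx]} (p = {probs[idx]:.2f})")
```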


In [22]:
# Save the trained model to disk (the .h5 extension selects the HDF5 format)
model.save('keras_model.h5')