## Model Development

You are ready to create a diabetes model that predicts, from medical readings, whether a patient has diabetes.

Import the required libraries and packages.

In [1]:
import pandas as pd

import torch
import torch.nn as nn
import torch.nn.functional as F

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

Load the data into a pandas DataFrame.

In [23]:
data_file_name = './data/diabetes.csv'
data = pd.read_csv(data_file_name)

Split the data into the features (`X`) and the target variable (`y`).

In [24]:
X = data.drop('Outcome', axis=1)
y = data['Outcome']

Inspect the features and the target.

In [25]:
X.head(4)

   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigreeFunction  Age
0            6      148             72             35        0  33.6                     0.627   50
1            1       85             66             29        0  26.6                     0.351   31
2            8      183             64              0        0  23.3                     0.672   32
3            1       89             66             23       94  28.1                     0.167   21


In [26]:
y.head(4)

0    1
1    0
2    1
3    0
Name: Outcome, dtype: int64
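Two quick checks are often useful at this point: how balanced the target classes are, and how many zero readings each feature contains. The following is a minimal sketch that rebuilds only the four rows previewed above (a subset of the columns), so the names `preview` and `zero_counts` are illustrative and the counts say nothing about the full dataset.

```python
import pandas as pd

# The four preview rows shown above, restricted to a few columns for brevity
preview = pd.DataFrame({
    "Glucose": [148, 85, 183, 89],
    "SkinThickness": [35, 29, 0, 23],
    "Insulin": [0, 0, 0, 94],
    "Outcome": [1, 0, 1, 0],
})

# Class balance of the target
class_counts = preview["Outcome"].value_counts()

# Number of zero readings per feature (Outcome is excluded because 0 is a valid label)
zero_counts = (preview.drop("Outcome", axis=1) == 0).sum()

print(class_counts)
print(zero_counts)
```

Running the same two expressions on `data` instead of `preview` would report the figures for the whole file.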

Divide the data into training and test datasets.
Use the `train_test_split` function of scikit-learn to split the dataset into random train and test subsets. With `test_size=0.2`, 20% of the samples are held out for testing.

In [37]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

print(f"Number of samples in training set: {X_train.shape[0]}")
print(f"Number of samples in test set: {X_test.shape[0]}")

Number of samples in training set: 614
Number of samples in test set: 154
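The split above is purely random, so the share of positive cases can differ between the two subsets. As a hedged alternative (not what this notebook does), `train_test_split` accepts a `stratify` argument that preserves the class ratio. The sketch below uses made-up data; `X_demo` and `y_demo` are stand-ins, not the diabetes dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up stand-in data: 100 samples, 8 features, 30% positive labels
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 8))
y_demo = np.array([1] * 30 + [0] * 70)

# stratify=y_demo keeps the class ratio the same in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

print(f"Positive rate in training set: {y_tr.mean():.2f}")
print(f"Positive rate in test set: {y_te.mean():.2f}")
```

Whether stratification changes the results for this dataset is not something the notebook measures.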


Convert the data to PyTorch tensors: floating-point values for the features and integer class labels for the targets.

In [38]:
X_train = torch.FloatTensor(X_train.values)
X_test = torch.FloatTensor(X_test.values)
y_train = torch.LongTensor(y_train.values)
y_test = torch.LongTensor(y_test.values)
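The dtypes matter here: the linear layers use 32-bit floats by default, and `nn.CrossEntropyLoss` expects class-index targets as 64-bit integers. A minimal sketch of the same conversion with `torch.tensor` and an explicit `dtype`, using a couple of made-up rows (`features` and `labels` are illustrative names):

```python
import numpy as np
import torch

# Made-up values standing in for X_train.values and y_train.values
features = np.array([[6, 148, 72], [1, 85, 66]], dtype=np.int64)
labels = np.array([1, 0], dtype=np.int64)

# Explicit dtypes: float32 for model inputs, int64 (long) for class labels
features_tensor = torch.tensor(features, dtype=torch.float32)
labels_tensor = torch.tensor(labels, dtype=torch.long)

print(features_tensor.dtype, labels_tensor.dtype)
```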

Preview the training features tensor and its shape.

In [39]:
X_train

tensor([[7.0000e+00, 1.5000e+02, 7.8000e+01,  ..., 3.5200e+01, 6.9200e-01,
         5.4000e+01],
        [4.0000e+00, 9.7000e+01, 6.0000e+01,  ..., 2.8200e+01, 4.4300e-01,
         2.2000e+01],
        [0.0000e+00, 1.6500e+02, 9.0000e+01,  ..., 5.2300e+01, 4.2700e-01,
         2.3000e+01],
        ...,
        [4.0000e+00, 9.4000e+01, 6.5000e+01,  ..., 2.4700e+01, 1.4800e-01,
         2.1000e+01],
        [1.1000e+01, 8.5000e+01, 7.4000e+01,  ..., 3.0100e+01, 3.0000e-01,
         3.5000e+01],
        [5.0000e+00, 1.3600e+02, 8.2000e+01,  ..., 0.0000e+00, 6.4000e-01,
         6.9000e+01]])

In [40]:
X_train.shape

torch.Size([614, 8])

Preview the training target tensor (`y_train`) and its shape.

In [41]:
y_train

tensor([1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0,
        0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0,
        0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0,
        1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0,
        0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0,
        1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0,
        0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0,
        0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0,
        0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1,
        1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0,
        1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1,
        0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0,
        1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,

In [42]:
y_train.shape

torch.Size([614])

## Create the Model

Define a simple neural network model with PyTorch.
The network takes eight input features and outputs two values (one score per class), corresponding to whether a patient has diabetes or not.

In [43]:
# Seed for reproducible results
torch.manual_seed(20)

class ANN_model(nn.Module):
    def __init__(
        self, 
        num_input_features=8, 
        num_neurons_layer1=20, 
        num_neurons_layer2=10, 
        num_targets=2
    ):
        super().__init__()
        # Define the neural network layers
        self.f_connected1 = nn.Linear(num_input_features, num_neurons_layer1)
        self.f_connected2 = nn.Linear(num_neurons_layer1, num_neurons_layer2)
        self.out = nn.Linear(num_neurons_layer2, num_targets)
            
    def forward(self, X):
        # pass the data through the layers
        x = F.relu(self.f_connected1(X))
        x = F.relu(self.f_connected2(x))
        return self.out(x)
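A quick shape check helps confirm that the layers are wired as intended. The sketch below rebuilds the same layer sizes (8, 20, 10, 2) with `nn.Sequential` purely for brevity; `sketch_model` and the random `batch` are illustrative, not part of the notebook.

```python
import torch
import torch.nn as nn

# Same layer sizes as ANN_model, written with nn.Sequential for brevity
sketch_model = nn.Sequential(
    nn.Linear(8, 20), nn.ReLU(),
    nn.Linear(20, 10), nn.ReLU(),
    nn.Linear(10, 2),
)

batch = torch.randn(5, 8)  # 5 made-up patients, 8 features each
logits = sketch_model(batch)

# Weights plus biases: (8*20 + 20) + (20*10 + 10) + (10*2 + 2)
num_params = sum(p.numel() for p in sketch_model.parameters())

print(logits.shape)
print(num_params)
```

Each row of the output holds two unnormalized scores, one per class, which is why the prediction step later takes the `argmax`.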

Instantiate the model, then define the loss function, the optimizer, and the number of training epochs.

In [44]:
model = ANN_model()

# Backpropagation: loss and optimizer
loss_function = nn.CrossEntropyLoss()  # Cross-entropy loss is also available in TensorFlow
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # TensorFlow also provides Adam
epochs = 500
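`nn.CrossEntropyLoss` works directly on the raw scores (logits) returned by `forward`, which is why the model applies no softmax to its output layer. The sketch below illustrates this on made-up numbers by comparing the built-in loss with an explicit log-softmax followed by the negative log-likelihood loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, -1.0], [0.5, 1.5]])  # made-up model outputs
targets = torch.tensor([0, 1])                    # made-up class labels

# Built-in loss on raw logits
loss_builtin = nn.CrossEntropyLoss()(logits, targets)

# The same quantity computed in two explicit steps
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(loss_builtin.item(), loss_manual.item())
```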

Train the model.

In [45]:
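final_losses = []
model.train()
for epoch in range(epochs):
    # Forward pass: calling the model runs forward() and any registered hooks
    y_pred = model(X_train)
    loss = loss_function(y_pred, y_train)
    # Store a plain float so the computation graph is not kept alive
    final_losses.append(loss.item())
    # Backward pass and parameter update
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()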
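A simple way to confirm that training is working is to watch the loss fall over the epochs. The following sketch runs the same kind of full-batch loop on made-up, easily separable data; `demo_model`, `X_demo`, and `y_demo` are illustrative and unrelated to the diabetes data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Made-up data: the label is 1 when the first feature is positive
X_demo = torch.randn(200, 8)
y_demo = (X_demo[:, 0] > 0).long()

demo_model = nn.Sequential(nn.Linear(8, 20), nn.ReLU(), nn.Linear(20, 2))
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(demo_model.parameters(), lr=0.01)

losses = []
for epoch in range(1, 201):
    loss = loss_function(demo_model(X_demo), y_demo)
    losses.append(loss.item())  # .item() stores a float, not a graph-attached tensor
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if epoch % 50 == 0:
        print(f"Epoch {epoch}: loss {losses[-1]:.4f}")
```

A loss that stays flat or grows usually points to a learning rate that needs adjusting or to a data problem.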

Test the model accuracy by comparing its predictions on the test dataset with the true labels.

In [46]:
# Predict the labels of the test data
model.eval()
with torch.no_grad():
    # One forward pass over the whole test set; argmax picks the class with the larger score
    y_predicted = model(X_test).argmax(dim=1)

# Generate the classification report
print("Classification Report:")
print(classification_report(y_test.numpy(), y_predicted.numpy()))

Classification Report:
              precision    recall  f1-score   support

           0       0.84      0.85      0.85       107
           1       0.65      0.64      0.65        47

    accuracy                           0.79       154
   macro avg       0.75      0.74      0.75       154
weighted avg       0.78      0.79      0.79       154



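The trained model reaches an accuracy of about 79% on the test set. Recall for the diabetes class (0.64) is lower than for the no-diabetes class (0.85), so the model misses about 36% of the diabetic patients in the test set.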
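The accuracy figure in the report can also be computed directly with `accuracy_score`, which is imported at the top of the notebook, and a confusion matrix shows where the errors fall. The sketch below uses short made-up label lists (`y_true`, `y_hat`) rather than the model's actual predictions.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels: 5 negative and 3 positive cases
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_hat = [0, 0, 1, 0, 1, 0, 1, 0]

acc = accuracy_score(y_true, y_hat)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_hat)

print(f"Accuracy: {acc:.2f}")
print(cm)
```

In the matrix, the off-diagonal entry in the second row counts positive cases predicted as negative, which is the error type behind a low recall for class 1.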

You can improve the score by retraining the model with better data or more features, or by tuning the hyperparameters.
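One preprocessing step that may help, although this notebook does not evaluate it, is standardizing the features, since the readings sit on very different scales (glucose in the hundreds, pedigree function below one). The sketch below uses scikit-learn's `StandardScaler` on made-up data; the statistics are learned from the training subset only and reused for the test subset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Made-up features on very different scales (a glucose-like and a pedigree-like column)
X_train_demo = rng.normal(loc=[120.0, 0.5], scale=[30.0, 0.3], size=(200, 2))
X_test_demo = rng.normal(loc=[120.0, 0.5], scale=[30.0, 0.3], size=(50, 2))

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_demo)  # learn mean/std from training data only
X_test_scaled = scaler.transform(X_test_demo)        # reuse the training statistics

print(X_train_scaled.mean(axis=0).round(3), X_train_scaled.std(axis=0).round(3))
```

If scaling is applied during training, the same fitted scaler also has to be applied to any new patient data before prediction.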

## Test the Model with Sample Cases

Test the model with data from two patients: one patient with diabetes and one patient without diabetes.

In [57]:
# Dict for textual display of prediction
classes = {0: 'No diabetes', 1: 'Diabetes'}


def predict(patients):
    # Build a DataFrame; the feature order follows the dictionary keys,
    # which must match the column order used for training
    inputs = pd.DataFrame(patients)
    # Convert input data frame to tensor
    inputs_tensor = torch.FloatTensor(inputs.values)

    # Inference only: no gradients are needed
    with torch.no_grad():
        # Index of the larger output value per patient (either 0 or 1)
        prediction_indices = model(inputs_tensor).argmax(dim=1)

    return [classes[index.item()] for index in prediction_indices]


diabetes_patient = {
    "Pregnancies": 6.0,
    "Glucose": 110.0,
    "BloodPressure": 65.0,
    "SkinThickness": 15.0,
    "Insulin": 1.0,
    "BMI": 45.7,
    "DiabetesPedigreeFunction": 0.627,
    "Age": 50
}

no_diabetes_patient = {
    "Pregnancies": 0,
    "Glucose": 88.0,
    "BloodPressure": 60.0,
    "SkinThickness": 35.0,
    "Insulin": 1.0,
    "BMI": 45.7,
    "DiabetesPedigreeFunction": 0.27,
    "Age": 20
}

predictions = predict([diabetes_patient, no_diabetes_patient])
print(predictions)

['Diabetes', 'No diabetes']
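The `argmax` returns only the winning class. If a confidence-like value is also wanted, the two output scores can be passed through a softmax to obtain values that sum to one. The sketch below does this on made-up scores; treating these values as calibrated probabilities is an assumption that would need separate validation.

```python
import torch
import torch.nn.functional as F

# Made-up raw model outputs for two patients (columns: class 0, class 1)
logits = torch.tensor([[-0.3, 1.2], [2.0, -1.0]])

# Softmax over the class dimension turns scores into values that sum to 1 per row
probs = F.softmax(logits, dim=1)
predicted = probs.argmax(dim=1)

print(probs)
print(predicted)
```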
