## Model Development

We are now ready to create a diabetes model, using PyTorch, that predicts whether or not a patient has diabetes based on their current medical readings.

To start, we import the required libraries and packages. We then load the diabetes dataset, split it into training and test sets, and develop our model.

In [8]:
# Diabetes model using PyTorch
# Uses the data file:  diabetes.csv
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

After importing the required libraries and packages, we load the data into a DataFrame (df).

In [9]:
# This code assumes the data file is in a 'data' folder one directory above this notebook,
# and that the model will be saved to a 'model' folder at the same level.
data_file_name = '../data/diabetes.csv'
model_name = '../model/PytorchDiabetesModel.pt'

df = pd.read_csv(data_file_name)
X = df.drop('Outcome', axis=1)   # independent features (the medical readings)
y = df['Outcome']                # dependent variable (0 = no diabetes, 1 = diabetes)
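Before splitting, it can help to sanity-check the loaded frame: confirm the feature count, look for missing values, and check the class balance of `Outcome`. The sketch below assumes the column names listed in the prediction example later in this section; the rows are hypothetical placeholder values, not real patient data.

```python
import pandas as pd

columns = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
           "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]

# Hypothetical placeholder rows (NOT real patient data), same columns as diabetes.csv.
rows = [
    [2, 120.0, 70.0, 20.0, 80.0, 30.1, 0.35, 33, 0],
    [5, 160.0, 80.0, 32.0, 0.0, 38.2, 0.60, 51, 1],
    [0, 95.0, 65.0, 18.0, 60.0, 24.5, 0.20, 22, 0],
    [3, 140.0, 75.0, 25.0, 100.0, 33.0, 0.45, 40, 1],
    [1, 100.0, 60.0, 15.0, 50.0, 22.0, 0.15, 25, 0],
]
df = pd.DataFrame(rows, columns=columns)

# Split into features (X) and target (y), as in the cell above.
X = df.drop("Outcome", axis=1)
y = df["Outcome"]

print(X.shape)                         # (5, 8): eight input features per patient
print(df.isna().sum().sum())           # total count of missing values (0 here)
print(y.value_counts(normalize=True))  # fraction of rows in each class
```

On the real file, a strongly imbalanced `Outcome` would be worth knowing before reading too much into a single accuracy number.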

Before we can train the model, we need to divide the data into training and testing datasets. We use sklearn's `train_test_split` function to split the dataset into random training and test subsets, holding out 20% of the rows for testing (`test_size=0.2`).

Once we have done this, we create tensors. Tensors are specialized data structures that are similar to arrays and matrices. In PyTorch, we use tensors to encode the inputs and outputs of a model, as well as the model's parameters. Below, we initialize the tensors directly from the data: the features as 32-bit floats and the labels as 64-bit integers (class indices).

In [10]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Create tensors (multidimensional arrays): X = input features, y = target labels
X_train = torch.FloatTensor(X_train.values)
X_test = torch.FloatTensor(X_test.values)
y_train = torch.LongTensor(y_train.values)
y_test = torch.LongTensor(y_test.values)
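To make the split and the tensor dtypes concrete, here is a small sketch on synthetic data (100 random rows standing in for the real features). It also shows `stratify=y`, an optional `train_test_split` argument that the notebook does not use, which keeps the 0/1 ratio the same in both subsets. The `torch.tensor(..., dtype=...)` form is an equivalent, more current alternative to the legacy `torch.FloatTensor`/`torch.LongTensor` constructors.

```python
import numpy as np
import pandas as pd
import torch
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data standing in for the 8 diabetes features.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 8)))
y = pd.Series(np.array([0] * 65 + [1] * 35))

# stratify=y (an optional extra) keeps the class ratio equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# Features as float32; labels as int64 class indices, which is the target
# format nn.CrossEntropyLoss expects.
X_train_t = torch.tensor(X_train.values, dtype=torch.float32)
y_train_t = torch.tensor(y_train.values, dtype=torch.long)

print(X_train_t.shape, X_train_t.dtype)  # torch.Size([80, 8]) torch.float32
print(y_train_t.shape, y_train_t.dtype)  # torch.Size([80]) torch.int64
```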

Now we can create our model. We will eventually need a Python file to house our model and API code, so let's put the model into a class called "ANN_model" that we can reuse later in that Python (.py) file.

In [11]:
class ANN_model(nn.Module):
    def __init__(self, input_features=8, hidden1=20, hidden2=10, out_features=2):
        super().__init__()
        # Three fully connected layers: 8 inputs -> 20 -> 10 -> 2 outputs (one score per class)
        self.f_connected1 = nn.Linear(input_features, hidden1)
        self.f_connected2 = nn.Linear(hidden1, hidden2)
        self.out          = nn.Linear(hidden2, out_features)

    def forward(self, x):
        x = F.relu(self.f_connected1(x))
        x = F.relu(self.f_connected2(x))
        x = self.out(x)   # raw scores (logits); CrossEntropyLoss applies log-softmax internally
        return x

    def save(self, model_path):
        # Save this instance's weights (self, not the global `model`)
        torch.save(self.state_dict(), model_path)

    def load(self, model_path):
        self.load_state_dict(torch.load(model_path))
        self.eval()

torch.manual_seed(20)
model = ANN_model()

# Loss function and optimizer used for backpropagation
loss_function = nn.CrossEntropyLoss()   # cross-entropy loss is also standard in TensorFlow
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)   # Adam is also available in TensorFlow

epochs       = 500
final_losses = []
model.train()
for epoch in range(epochs):
    y_pred = model(X_train)                   # forward pass
    loss   = loss_function(y_pred, y_train)
    final_losses.append(loss.item())          # store a plain float, not the graph-attached tensor
    optimizer.zero_grad()
    loss.backward()                           # backward pass: compute gradients
    optimizer.step()                          # update the weights
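To see what this architecture computes, the sketch below rebuilds the same layer sizes as `ANN_model`'s defaults (8 -> 20 -> 10 -> 2) with `nn.Sequential`, only so the block runs on its own. It counts the trainable parameters and checks the output shape: each `nn.Linear(in, out)` layer holds `in*out` weights plus `out` biases.

```python
import torch
import torch.nn as nn

# Same layer sizes as ANN_model's defaults, rebuilt with nn.Sequential
# purely so this block is self-contained.
net = nn.Sequential(
    nn.Linear(8, 20), nn.ReLU(),
    nn.Linear(20, 10), nn.ReLU(),
    nn.Linear(10, 2),
)

# (8*20 + 20) + (20*10 + 10) + (10*2 + 2) = 180 + 210 + 22 = 412
n_params = sum(p.numel() for p in net.parameters())
print(n_params)  # 412

# A batch of 5 patients yields 5 rows of 2 raw scores (logits), one per class.
logits = net(torch.zeros(5, 8))
print(logits.shape)  # torch.Size([5, 2])
```

Because the output layer returns raw logits, no softmax is needed in `forward`: `nn.CrossEntropyLoss` expects logits and class-index targets.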

Once the model is trained, we should check its accuracy by comparing its predictions on the test set with the true labels.

In [12]:
# Accuracy: compare the model's predictions on the test data with the true labels

predictions = []
model.eval()
with torch.no_grad():
    for data in X_test:
        y_pred = model(data)
        predictions.append(y_pred.argmax().item())   # predicted class: 0 or 1

score = accuracy_score(y_test, predictions)  # number of correct predictions / len(y_test)
print(score)
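The row-by-row loop above can also be written as a single batched forward pass, taking `argmax` over the class dimension (`dim=1`). The sketch below uses a small hypothetical stand-in model and random test tensors rather than the trained `ANN_model`, and checks that both approaches give the same predictions.

```python
import torch
import torch.nn as nn
from sklearn.metrics import accuracy_score

torch.manual_seed(0)
# Hypothetical stand-ins for the trained model and the test tensors.
model = nn.Sequential(nn.Linear(8, 20), nn.ReLU(), nn.Linear(20, 2))
X_test = torch.randn(50, 8)
y_test = torch.randint(0, 2, (50,))

model.eval()
with torch.no_grad():
    # Row by row, as in the evaluation loop
    loop_preds = [model(row).argmax().item() for row in X_test]
    # Whole test set at once: logits of shape (50, 2), argmax per row
    batch_preds = model(X_test).argmax(dim=1)

acc = accuracy_score(y_test.numpy(), batch_preds.numpy())
print(acc)
```

The batched version avoids one Python-level call per row, which matters more as the test set grows.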

0.7922077922077922


We see that our model has an accuracy of nearly 80% (0.792 on the test set). This is a decent score for a first model. We could improve it by retraining the model with better data or more features, or by tuning the hyperparameters; we will see that in the next chapter.

Now that we have built a model, let's test it with data from two patients: one with diabetes and one without.

To make our model testing easier, let's create a prediction function.

In [13]:
# Create a prediction function

def predict(dataset):
    # Convert the input list of 8 readings to a float32 tensor
    predict_data_tensor = torch.tensor(dataset, dtype=torch.float32)
    with torch.no_grad():
        prediction_value = model(predict_data_tensor)   # tensor of 2 scores (logits)

    # Dict for textual display of the prediction
    outcomes = {0: 'No diabetes', 1: 'Diabetes Predicted'}

    # From the prediction tensor, get the index of the max value (either 0 or 1)
    prediction_index = prediction_value.argmax().item()
    return outcomes[prediction_index]

#test our prediction function
# Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, Age
dataset = [6.0, 110.0, 65.0, 15.0, 1.0, 45.7, 0.627, 50.0] #has diabetes
dataset_nb = [0, 88.0, 60.0, 35.0, 1.0, 45.7, 0.27, 20.0] #no diabetes

diabetes_prediction = predict(dataset)
print(diabetes_prediction)

no_diabetes_prediction = predict(dataset_nb)
print(no_diabetes_prediction)

Diabetes Predicted
No diabetes
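`predict` reports only the winning class. If a confidence value is also useful, the two logits can be turned into probabilities with a softmax. A minimal sketch, assuming a hypothetical logit pair for one patient (not output from the trained model):

```python
import torch
import torch.nn.functional as F

# Hypothetical raw model output (logits) for one patient: [no diabetes, diabetes].
logits = torch.tensor([0.5, 1.7])

probs = F.softmax(logits, dim=0)   # probabilities that sum to 1
outcomes = {0: 'No diabetes', 1: 'Diabetes Predicted'}
idx = probs.argmax().item()        # same index as logits.argmax()

print(outcomes[idx], round(probs[idx].item(), 3))
```

Softmax preserves the ordering of the scores, so the predicted class is unchanged; only the extra probability value is new.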


Now that we are satisfied with the model's accuracy and the predictions it makes, we can save the model for use in our applications. The following Python code saves the model's weights locally to the specified path.

In [None]:
# Save the model.
# For more information on saving and loading PyTorch models: https://pytorch.org/tutorials/beginner/saving_loading_models.html
# We save the model's state_dict as a '.pt' file. Another option is a pickle file; the following article describes that process:
# https://machinelearningmastery.com/save-load-machine-learning-models-python-scikit-learn/
# Note: the ../model folder must already exist, since torch.save does not create directories.

model.save(model_name)
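A quick way to confirm a save worked is a round trip: save the `state_dict`, load it into a fresh instance of the same architecture, and compare outputs. The sketch below uses a small hypothetical stand-in network and a temporary directory instead of `../model`; the same pattern applies to `ANN_model`'s `save` and `load` methods.

```python
import os
import tempfile
import torch
import torch.nn as nn

# Hypothetical stand-in network; the pattern is the same for ANN_model.
def make_net():
    return nn.Sequential(nn.Linear(8, 20), nn.ReLU(), nn.Linear(20, 2))

torch.manual_seed(0)
trained = make_net()

with tempfile.TemporaryDirectory() as tmp:
    # torch.save raises an error if the parent folder is missing, so create it first.
    model_dir = os.path.join(tmp, "model")
    os.makedirs(model_dir, exist_ok=True)
    path = os.path.join(model_dir, "PytorchDiabetesModel.pt")

    torch.save(trained.state_dict(), path)      # save only the learned weights

    restored = make_net()                       # fresh instance, same architecture
    restored.load_state_dict(torch.load(path))
    restored.eval()

x = torch.randn(3, 8)
with torch.no_grad():
    same = torch.equal(trained(x), restored(x))
print(same)  # True
```

Saving the `state_dict` rather than the whole model object means the loading code must define the same class first, which is one reason the notebook keeps the model in a reusable `ANN_model` class.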