In [17]:
import numpy as np  # NumPy: numerical arrays, used here to load the CSV file

# PyTorch is a library for building and training deep learning models
import torch
import torch.nn as nn 
import torch.optim as optim

The data is the Pima Indians onset of diabetes dataset, which describes patient medical records for Pima Indians and whether or not each patient had an onset of diabetes within 5 years.

# Loading Data

This is a binary classification problem (either 0 or 1). There are 8 input variables:

1. Number of times pregnant
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-hour serum insulin (μIU/ml)
6. Body mass index (weight in kg / (height in m)^2)
7. Diabetes pedigree function
8. Age (years)


Note that the data I am using is already cleaned. Usually, you would take a raw dataset and apply cleaning and feature engineering to make it usable.

In [18]:
# Load the dataset and split it into input (X) and output (Y) variables
dataset = np.loadtxt('pima-indians-diabetes.data.csv', delimiter=',')
X = dataset[:,0:8]
Y = dataset[:,8]

# Convert the data to PyTorch tensors first, with an explicit float32 dtype.
# NumPy loads the values as 64-bit floats while PyTorch layers use 32-bit floats by default, so this avoids an implicit conversion later.

X = torch.tensor(X, dtype=torch.float32)
Y = torch.tensor(Y, dtype=torch.float32).reshape(-1,1)
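
To see what the explicit dtype and the reshape do, here is a minimal sketch on a small made-up array. The values and the names `sample`, `labels`, `as_default`, `as_float32`, and `label_column` are illustrative only and are not part of the notebook's data.

```python
import numpy as np
import torch

# Illustrative values only; not taken from the dataset
sample = np.array([[6.0, 148.0, 72.0], [1.0, 85.0, 66.0]])
labels = np.array([1.0, 0.0])

print(sample.dtype)  # NumPy uses 64-bit floats by default

# torch.tensor keeps the 64-bit dtype unless one is given explicitly
as_default = torch.tensor(sample)
as_float32 = torch.tensor(sample, dtype=torch.float32)
print(as_default.dtype, as_float32.dtype)

# reshape(-1, 1) turns a flat vector of n labels into an (n, 1) column,
# matching the (n, 1) output of a model with a single output neuron
label_column = torch.tensor(labels, dtype=torch.float32).reshape(-1, 1)
print(label_column.shape)
```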




# Define the Model

A model can be defined as a sequence of layers. One option is an `nn.Sequential` model with the layers listed in order; the code below keeps that version commented out and defines an equivalent `nn.Module` subclass instead. Either way, the first layer must accept the correct number of input features (8 in this example). The best network structure is often found through trial and error. This example uses 3 fully connected layers.

Fully connected layers are defined with the `nn.Linear` class in PyTorch. Each one multiplies its input by a weight matrix and adds a bias. Specify the number of inputs as the first argument and the number of outputs as the second argument. The number of outputs is also called the number of neurons or nodes in the layer.

Each layer needs an activation function after it. Without one, the output of the matrix multiplication is passed unchanged to the next step, which is sometimes called a linear activation. This example uses ReLU (rectified linear unit) after the first two layers and the sigmoid function after the output layer.

A sigmoid on the output layer ensures the output is between 0 and 1, so it can be read as the probability of class 1. Note that using sigmoid in the hidden layers of a deep network can lead to vanishing gradients, which is one reason ReLU is usually preferred there: it is cheap to compute and often gives better results in terms of speed and accuracy.
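
As a quick illustration of the two activations, the sketch below applies each to a handful of arbitrary values. The tensor `z` and the names `relu_out` and `sigmoid_out` are made up for this example.

```python
import torch
import torch.nn as nn

# Arbitrary inputs chosen for illustration
z = torch.tensor([-3.0, -0.5, 0.0, 0.5, 3.0])

relu_out = nn.ReLU()(z)        # negative values become 0, positive values pass through
sigmoid_out = nn.Sigmoid()(z)  # every value is squashed into the open interval (0, 1)

print(relu_out)
print(sigmoid_out)
```

An input of 0 maps to 0.5 under the sigmoid, which is why 0.5 is the natural threshold when rounding the output to a class label.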

- The model expects rows of data with 8 variables
- The first hidden layer has 12 neurons, followed by a ReLU function
- The second hidden layer has 8 neurons, followed by a ReLU function
- The output layer has 1 neuron, followed by a sigmoid function

In [19]:
# model = nn.Sequential(
#     nn.Linear(8,12),
#     nn.ReLU(),
#     nn.Linear(12,8),
#     nn.ReLU(),
#     nn.Linear(8,1),
#     nn.Sigmoid()
# )

# print(model)
class PimaClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden1 = nn.Linear(8, 12)
        self.act1 = nn.ReLU()
        self.hidden2 = nn.Linear(12, 8)
        self.act2 = nn.ReLU()
        self.output = nn.Linear(8, 1)
        self.act_output = nn.Sigmoid()
 
    def forward(self, x):
        x = self.act1(self.hidden1(x))
        x = self.act2(self.hidden2(x))
        x = self.act_output(self.output(x))
        return x
 
model = PimaClassifier()
print(model)

PimaClassifier(
  (hidden1): Linear(in_features=8, out_features=12, bias=True)
  (act1): ReLU()
  (hidden2): Linear(in_features=12, out_features=8, bias=True)
  (act2): ReLU()
  (output): Linear(in_features=8, out_features=1, bias=True)
  (act_output): Sigmoid()
)
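
The layer sizes printed above determine how many weights the optimizer will have to tune. As a rough check, the sketch below rebuilds the same layer sizes with `nn.Sequential` (the name `sketch` is just for this example) and compares a hand count against what PyTorch reports.

```python
import torch.nn as nn

# Same layer sizes as the model above, written as nn.Sequential for brevity
sketch = nn.Sequential(
    nn.Linear(8, 12), nn.ReLU(),
    nn.Linear(12, 8), nn.ReLU(),
    nn.Linear(8, 1), nn.Sigmoid(),
)

# Each Linear layer holds in_features * out_features weights plus out_features biases
by_hand = (8 * 12 + 12) + (12 * 8 + 8) + (8 * 1 + 1)
counted = sum(p.numel() for p in sketch.parameters())

print(by_hand, counted)  # 221 221
```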


# Preparation for Training

The model is now defined, but you still need to specify the goal of the training. Training a neural network means finding the set of weights that best maps inputs to outputs in your dataset. The loss function measures how far the model's predictions are from the true labels. Since this is a binary classification problem, we will use binary cross entropy.

You also need an optimizer, the algorithm that adjusts the model weights to reduce the loss. We will use Adam, a popular variant of gradient descent that adapts the step size for each parameter automatically.

In [20]:
loss_fn = nn.BCELoss()  # binary cross entropy
optimizer = optim.Adam(model.parameters(), lr=0.001)

# lr is the learning rate, a configuration parameter of the optimizer.
# model.parameters() returns an iterator over all trainable parameters of the model, which tells the optimizer what to update.
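
To make the loss concrete, binary cross entropy for predicted probabilities p and labels y is -mean(y*log(p) + (1-y)*log(1-p)). The sketch below computes it by hand on three made-up predictions and compares the result with `nn.BCELoss`. The tensors `probs` and `labels` are illustrative values, not model output.

```python
import torch
import torch.nn as nn

# Made-up predicted probabilities and true labels, for illustration only
probs = torch.tensor([[0.9], [0.2], [0.6]])
labels = torch.tensor([[1.0], [0.0], [0.0]])

# Binary cross entropy written out: -mean(y*log(p) + (1-y)*log(1-p))
manual = -(labels * torch.log(probs) + (1 - labels) * torch.log(1 - probs)).mean()
builtin = nn.BCELoss()(probs, labels)

print(manual.item(), builtin.item())
```

Confident correct predictions (0.9 for a label of 1) contribute little to the loss, while the less certain 0.6 for a label of 0 contributes the most.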

# Training a Model

Training a neural network is organized into epochs and batches:

EPOCH - one pass of the entire training dataset through the model

BATCH - one or more samples passed to the model together, after which the gradient descent algorithm is executed for one iteration

The entire dataset is split into batches and you pass the batches one by one into a model using a training loop. Once you have exhausted all batches, you have finished 1 epoch. Then you can start over and refine the model.

The size of the batch is limited by the system's memory, and the number of computations required is linearly proportional to the size of the batch. The total number of batches over all epochs is how many times gradient descent runs to refine the model. This is a tradeoff: more iterations can produce a better model, but you also do not want the training to take too long to complete.
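
As a worked example of that count, the sketch below uses the batch size of 10 and the 100 epochs from the training loop in this notebook. The row count of 768 is an assumption: it is the size of the commonly distributed version of this dataset, and the notebook itself never prints it.

```python
import math

# Assumption: 768 rows, the size of the commonly distributed version of this dataset
n_samples = 768
batch_size = 10
n_epochs = 100

# The last batch of each epoch holds only the 8 leftover rows
batches_per_epoch = math.ceil(n_samples / batch_size)
total_updates = batches_per_epoch * n_epochs

print(batches_per_epoch, total_updates)  # 77 7700
```

Doubling the batch size would roughly halve the number of updates per epoch, which is the tradeoff described above.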

The goal of training is for the model to learn a good enough mapping from input data to output classification. It won't be perfect. The error should decrease over the early epochs and then level out in the later ones, which is known as model convergence.

The simplest way to build the training loop is with 2 nested loops, one for epochs and one for batches.

In [21]:
n_epochs = 100
batch_size = 10


for epoch in range(n_epochs):
    for i in range(0, len(X), batch_size):
        Xbatch = X[i:i+batch_size]
        y_pred = model(Xbatch)
        Ybatch = Y[i:i+batch_size]
        loss = loss_fn(y_pred, Ybatch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f'Finished epoch {epoch}, latest loss {loss}')

Finished epoch 0, latest loss 0.6136674284934998
Finished epoch 1, latest loss 0.5868992209434509
Finished epoch 2, latest loss 0.5670740008354187
Finished epoch 3, latest loss 0.5466744303703308
Finished epoch 4, latest loss 0.5284450650215149
Finished epoch 5, latest loss 0.508657693862915
Finished epoch 6, latest loss 0.48628512024879456
Finished epoch 7, latest loss 0.45772385597229004
Finished epoch 8, latest loss 0.4490693211555481
Finished epoch 9, latest loss 0.4439621567726135
Finished epoch 10, latest loss 0.4399668574333191
Finished epoch 11, latest loss 0.4279786944389343
Finished epoch 12, latest loss 0.42225590348243713
Finished epoch 13, latest loss 0.4138154685497284
Finished epoch 14, latest loss 0.41570693254470825
Finished epoch 15, latest loss 0.4139489531517029
Finished epoch 16, latest loss 0.4115311801433563
Finished epoch 17, latest loss 0.4028107523918152
Finished epoch 18, latest loss 0.41689756512641907
Finished epoch 19, latest loss 0.39571043848991394
Finis

# Evaluate the Model

We have trained the network on the entire dataset. Now we can evaluate its performance on that same dataset. This only shows how well the model fits the data it was trained on, not how it might perform on new data. To measure that, you could separate your data into train and test datasets.

You will generate predictions for each input, but then you still need to compute a score for the evaluation. 

In [22]:
#compute accuracy 
with torch.no_grad():
    y_pred = model(X)

accuracy = (y_pred.round() == Y).float().mean()
print(f'Accuracy {accuracy}')

# round() rounds each predicted probability to the nearest integer (0 or 1)
# == compares predictions with labels and returns a Boolean tensor
# float().mean() gives the number of matches (pred equals label) divided by the number of samples
# The no_grad() context is optional but recommended: it stops PyTorch from tracking how y_pred was computed, since no gradients are needed here

Accuracy 0.7669270634651184
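
The accuracy above is measured on the training data. One possible way to hold out a test set is sketched below, assuming a random 80/20 split. The ratio, the seed, and the stand-in tensors `X_demo` and `Y_demo` are arbitrary choices for illustration; they mimic the layout of `X` and `Y` but contain random values, not the real data.

```python
import torch

torch.manual_seed(0)  # fixed seed so the shuffle is repeatable

# Stand-in tensors with the same layout as X and Y (random values, not the real data)
X_demo = torch.rand(100, 8)
Y_demo = torch.randint(0, 2, (100, 1)).float()

# Shuffle the row indices, then keep the first 80% for training
perm = torch.randperm(len(X_demo))
n_train = int(0.8 * len(X_demo))
train_idx, test_idx = perm[:n_train], perm[n_train:]

X_train, Y_train = X_demo[train_idx], Y_demo[train_idx]
X_test, Y_test = X_demo[test_idx], Y_demo[test_idx]

print(X_train.shape, X_test.shape)
```

With a split like this, the training loop would iterate over `X_train` and `Y_train` only, and the accuracy computation would run on `X_test` and `Y_test`.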
