# Neural Network Introduction

Neural networks are often applied to complex problems like image recognition, natural language processing, and more. However, they can also be used for the standard classification and regression tasks that you are used to.

This week we will build a classification model on the PIMA diabetes data, which you already worked with in week 5. In the exercise you will also build a regression model.

If you find the code hard to follow, please check out the NN_Tensor_Basics.ipynb notebook, which introduces neural networks and PyTorch at a slower pace.

Let's get started!

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.nn.functional as F 
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

from sklearn.preprocessing import StandardScaler    
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report, mean_squared_error, mean_absolute_error, r2_score


In [2]:
# Data
pima = pd.read_csv('diabetes.csv')
pima.info()

y = pima['Outcome']
# Keep the labels as 0/1 int64 values here; trainData converts them to float32 for the loss
pima.drop('Outcome', axis = 1, inplace = True)
X = pima

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Pregnancies               768 non-null    int64  
 1   Glucose                   768 non-null    int64  
 2   BloodPressure             768 non-null    int64  
 3   SkinThickness             768 non-null    int64  
 4   Insulin                   768 non-null    int64  
 5   BMI                       768 non-null    float64
 6   DiabetesPedigreeFunction  768 non-null    float64
 7   Age                       768 non-null    int64  
 8   Outcome                   768 non-null    int64  
dtypes: float64(2), int64(7)
memory usage: 54.1 KB


In [3]:
# Split your Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=69)


In [4]:
# Scale your Data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
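`StandardScaler` learns each column's mean and standard deviation from the training split only, then applies z = (x - mean) / std to both splits, so nothing about the test set leaks into training. A minimal check on made-up numbers (not the PIMA data), assuming only NumPy and scikit-learn:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: two features, four training rows, two test rows
X_tr = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
X_te = np.array([[5.0, 50.0], [6.0, 60.0]])

scaler = StandardScaler()
Z_tr = scaler.fit_transform(X_tr)  # learns mean_ and scale_ from the training rows
Z_te = scaler.transform(X_te)      # reuses the training statistics

# The same result by hand, using the population std (ddof=0) as StandardScaler does
mu = X_tr.mean(axis=0)
sd = X_tr.std(axis=0)
print(np.allclose(Z_tr, (X_tr - mu) / sd))  # True
print(np.allclose(Z_te, (X_te - mu) / sd))  # True

# Only the training split is centred exactly on 0; the test split is not
print(Z_tr.mean(axis=0), Z_te.mean(axis=0))
```

This is why the notebook calls `fit_transform` on `X_train` but only `transform` on `X_test`.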

In [5]:
# Set the global parameters of your model

EPOCHS = 50
BATCH_SIZE = 64
LEARNING_RATE = 0.001

In [6]:
## train data
class trainData(Dataset):
    
    def __init__(self, X_data, y_data):
        self.X_data = torch.tensor(np.array(X_data), dtype = torch.float32)
        # float32 matches the default dtype of the model's .parameters(), so no casting is needed
        self.y_data = torch.tensor(np.array(y_data), dtype = torch.float32)
        
    def __getitem__(self, index):
        # DataLoader calls __getitem__ to fetch one (features, label) pair by index,
        # and __len__ (below) to find out how many samples there are
        return self.X_data[index], self.y_data[index]
        
    def __len__ (self):
        return len(self.X_data)


train_data = trainData(X_train, y_train)

## test data    
class testData(Dataset):
    
    def __init__(self, X_data):
        self.X_data = torch.tensor(np.array(X_data), dtype = torch.float32)
        
    def __getitem__(self, index):
        return self.X_data[index]
        
    def __len__ (self):
        return len(self.X_data)
    
test_data = testData(X_test)

In [7]:
## Data Loader 

train_loader = DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)
test_loader = DataLoader(dataset=test_data, batch_size=1)
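`DataLoader` only relies on the two methods defined above: `__len__` to know how many samples exist and `__getitem__` to fetch one by index; it then stacks samples into batches. The sketch below uses a hypothetical stand-in class, `ToyData`, filled with random numbers but shaped like this notebook's training split (768 rows minus the 254 test rows leaves 514 rows, with 8 features):

```python
import math
import torch
from torch.utils.data import Dataset, DataLoader

class ToyData(Dataset):
    """Hypothetical stand-in with the same shape as the PIMA training split."""
    def __init__(self, n_rows=514, n_features=8):
        self.X = torch.randn(n_rows, n_features)
        self.y = torch.randint(0, 2, (n_rows,)).float()

    def __getitem__(self, index):
        # DataLoader calls this once per sample index
        return self.X[index], self.y[index]

    def __len__(self):
        # DataLoader uses this to build the list of indices to sample from
        return len(self.X)

loader = DataLoader(ToyData(), batch_size=64, shuffle=True)

X_first, y_first = next(iter(loader))
print(X_first.shape, y_first.shape)  # torch.Size([64, 8]) torch.Size([64])

sizes = [X_batch.shape[0] for X_batch, _ in loader]
print(len(loader), math.ceil(514 / 64))  # 9 9
print(sizes[-1])  # 2: the final batch holds the 514 - 8 * 64 leftover rows
```

Note that each label batch has shape `(64,)` while the model outputs `(64, 1)`, which is why the training loop calls `y_batch.unsqueeze(1)`.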

In [8]:
class binaryClassification(nn.Module):
    def __init__(self):
        super(binaryClassification, self).__init__()
        # Number of input features is 8.
        self.layer_1 = nn.Linear(8, 64)
        # Number of outputs in one layer must == inputs in the next layer
        self.layer_2 = nn.Linear(64, 64)
        self.layer_out = nn.Linear(64, 1) 
        
        self.relu = nn.ReLU()
        # The activation function we will use; you could instead
        # call F.relu() directly inside forward()
        self.dropout = nn.Dropout(p=0.1)
        # During training, each activation is zeroed with probability p = 0.1
        self.batchnorm1 = nn.BatchNorm1d(64)
        # Performs mini-batch normalization 
        ## See additional resources at the bottom if curious
        self.batchnorm2 = nn.BatchNorm1d(64)
        
    def forward(self, inputs):
        # this is our forward propagation
        x = self.relu(self.layer_1(inputs))
        x = self.batchnorm1(x)
        x = self.relu(self.layer_2(x))
        x = self.batchnorm2(x)
        x = self.dropout(x)
        x = self.layer_out(x)
        # Return raw logits; the sigmoid is applied inside the loss and when predicting
        return x
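Dropout and `BatchNorm1d` behave differently in training and evaluation mode, which is why the notebook calls `model.train()` before the training loop and `model.eval()` before predicting. In training mode, dropout zeroes each activation with probability p and scales the survivors by 1 / (1 - p), and batch norm normalises with the current batch's statistics, so it refuses a batch of a single row. In eval mode, dropout does nothing and batch norm uses its running averages. A small demonstration, assuming only PyTorch:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1, 10)

drop.train()
train_out = drop(x)
print(train_out)  # every entry is 0.0 or 2.0: survivors are scaled by 1 / (1 - 0.5)
drop.eval()
eval_out = drop(x)
print(eval_out)   # all ones: dropout is the identity at evaluation time

bn = nn.BatchNorm1d(64)
single_row = torch.randn(1, 64)  # like one sample from a batch_size=1 loader

bn.train()
try:
    bn(single_row)
    raised = False
except ValueError as err:
    raised = True
    print("train mode:", err)  # batch statistics need more than one value per channel

bn.eval()
out = bn(single_row)  # normalises with running_mean / running_var instead
print("eval mode output shape:", tuple(out.shape))  # (1, 64)
```

The same rule explains the notebook's loaders: the last training batch has 2 rows (514 = 8 × 64 + 2), which is still enough for batch norm, and the `batch_size=1` test loader only works because `model.eval()` is called first.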

In [9]:
model = binaryClassification()
# Creating an instance of our model

print(model)
criterion = nn.BCEWithLogitsLoss()
# Binary cross-entropy loss computed on logits (raw model outputs):
# it applies the sigmoid internally, so forward() must not include one
optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)
# Adam optimizer

binaryClassification(
  (layer_1): Linear(in_features=8, out_features=64, bias=True)
  (layer_2): Linear(in_features=64, out_features=64, bias=True)
  (layer_out): Linear(in_features=64, out_features=1, bias=True)
  (relu): ReLU()
  (dropout): Dropout(p=0.1, inplace=False)
  (batchnorm1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (batchnorm2): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
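`BCEWithLogitsLoss` takes raw scores (logits) z and targets y in {0, 1} and averages -[y·log σ(z) + (1 - y)·log(1 - σ(z))]. PyTorch's documentation notes that combining the sigmoid with the loss is more numerically stable than a separate sigmoid followed by `BCELoss`. The NumPy sketch below compares the naive formula with one algebraically equivalent stable form, max(z, 0) - z·y + log(1 + e^(-|z|)); the helper names are ours, not PyTorch's:

```python
import numpy as np

def bce_naive(z, y):
    # Sigmoid first, then binary cross entropy on the probabilities
    p = 1.0 / (1.0 + np.exp(-z))
    return np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p)))

def bce_with_logits(z, y):
    # Same value, rewritten so we never take the log of a probability that rounds to 0 or 1
    return np.mean(np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z))))

z = np.array([2.0, -1.0, 0.5, -3.0])  # made-up logits
y = np.array([1.0, 0.0, 0.0, 1.0])    # made-up labels
print(np.isclose(bce_naive(z, y), bce_with_logits(z, y)))  # True

# An extreme logit: sigmoid(40) rounds to exactly 1.0, so log(1 - p) is log(0)
z_big, y_zero = np.array([40.0]), np.array([0.0])
with np.errstate(divide="ignore"):
    print(bce_naive(z_big, y_zero))   # inf
print(bce_with_logits(z_big, y_zero))  # 40.0
```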


In [10]:
def binary_acc(y_pred, y_test):
    # Percentage of predictions in the batch that match the labels
    y_pred_tag = torch.round(torch.sigmoid(y_pred))
    # sigmoid turns each logit into a probability; rounding makes it a 0/1 prediction

    correct_results_sum = (y_pred_tag == y_test).sum().float()
    acc = correct_results_sum/y_test.shape[0]
    acc = torch.round(acc * 100)
    
    return acc
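`binary_acc` predicts class 1 when the sigmoid probability rounds up, i.e. when it exceeds 0.5. Because the sigmoid is monotonic and σ(0) = 0.5, the same labels come from checking whether the logit itself is positive. Note that `torch.round` rounds halves to even, so a logit of exactly 0 becomes class 0 either way. A quick check on made-up logits:

```python
import torch

logits = torch.tensor([[-2.0], [0.0], [0.3], [4.0]])  # made-up model outputs, shape (N, 1)
labels = torch.tensor([[0.0], [0.0], [1.0], [0.0]])

via_sigmoid = torch.round(torch.sigmoid(logits))
via_sign = (logits > 0).float()  # threshold the raw logit at 0 instead
print(torch.equal(via_sigmoid, via_sign))  # True

acc = (via_sign == labels).float().mean() * 100
print(acc.item())  # 75.0
```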

In [11]:
## Training our model

model.train()
for e in range(1, EPOCHS+1):
    # Iterate through the epochs
    epoch_loss = 0
    epoch_acc = 0
    for X_batch, y_batch in train_loader:
        # Iterate through the batches our dataloader created
        optimizer.zero_grad()
        # Clear the gradients accumulated from the previous batch
        
        y_pred = model(X_batch)
        # From forward propagation
        
        loss = criterion(y_pred, y_batch.unsqueeze(1))
        # Calculate the loss
        acc = binary_acc(y_pred, y_batch.unsqueeze(1))
        # Calculate accuracy
        
        loss.backward()
        # Perform backwards propagation
        optimizer.step()
        # Update the weights
        
        epoch_loss += loss.item()
        epoch_acc += acc.item()
        

    print(f'Epoch {e:03}: | Loss: {epoch_loss/len(train_loader):.5f} | Acc: {epoch_acc/len(train_loader):.3f}')


Epoch 001: | Loss: 0.63094 | Acc: 65.333
Epoch 002: | Loss: 0.51134 | Acc: 77.667
Epoch 003: | Loss: 0.51412 | Acc: 78.111
Epoch 004: | Loss: 0.54557 | Acc: 73.778
Epoch 005: | Loss: 0.45924 | Acc: 78.556
Epoch 006: | Loss: 0.51186 | Acc: 75.556
Epoch 007: | Loss: 0.47909 | Acc: 77.111
Epoch 008: | Loss: 0.43489 | Acc: 82.333
Epoch 009: | Loss: 0.47919 | Acc: 76.667
Epoch 010: | Loss: 0.46396 | Acc: 77.333
Epoch 011: | Loss: 0.42739 | Acc: 84.000
Epoch 012: | Loss: 0.40304 | Acc: 84.222
Epoch 013: | Loss: 0.43886 | Acc: 77.889
Epoch 014: | Loss: 0.43034 | Acc: 79.889
Epoch 015: | Loss: 0.44224 | Acc: 79.222
Epoch 016: | Loss: 0.40896 | Acc: 85.000
Epoch 017: | Loss: 0.38559 | Acc: 83.778
Epoch 018: | Loss: 0.44857 | Acc: 74.889
Epoch 019: | Loss: 0.36999 | Acc: 85.667
Epoch 020: | Loss: 0.37189 | Acc: 85.000
Epoch 021: | Loss: 0.35637 | Acc: 86.000
Epoch 022: | Loss: 0.33963 | Acc: 87.000
Epoch 023: | Loss: 0.37231 | Acc: 86.333
Epoch 024: | Loss: 0.45434 | Acc: 82.667
Epoch 025: | Los
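The printed epoch loss and accuracy are means of per-batch means, so every batch counts equally. With 514 training rows (768 minus the 254 test rows) and `BATCH_SIZE = 64`, there are eight full batches plus one batch of 2, and that tiny batch weighs as much as a full one. Weighting by batch size gives the true per-sample average. A pure-Python sketch with made-up batch losses; `epoch_means` is a hypothetical helper:

```python
def epoch_means(batch_values, batch_sizes):
    """Return (unweighted mean of batch means, sample-weighted mean)."""
    unweighted = sum(batch_values) / len(batch_values)
    weighted = sum(v * n for v, n in zip(batch_values, batch_sizes)) / sum(batch_sizes)
    return unweighted, weighted

sizes = [64] * 8 + [2]        # 514 rows split into batches of 64
losses = [0.45] * 8 + [1.35]  # made-up: the 2-row batch happens to be hard
print(epoch_means(losses, sizes))  # roughly (0.55, 0.4535)
```

In the training loop, the weighted version amounts to accumulating `loss.item() * X_batch.size(0)` and dividing by `len(train_loader.dataset)` instead of `len(train_loader)`, since `BCEWithLogitsLoss` returns the mean over the batch by default.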

In [12]:
y_pred_list = []

model.eval()
with torch.no_grad():
    for X_batch in test_loader:
        y_test_pred = model(X_batch)
        y_test_pred = torch.sigmoid(y_test_pred)
        y_pred_tag = torch.round(y_test_pred)
        y_pred_list.append(y_pred_tag.cpu().numpy())

y_pred_list = [a.squeeze().tolist() for a in y_pred_list]

In [13]:
confusion_matrix(y_test, y_pred_list)

array([[139,  25],
       [ 41,  49]])

In [14]:
print(classification_report(y_test, y_pred_list))

              precision    recall  f1-score   support

           0       0.77      0.85      0.81       164
           1       0.66      0.54      0.60        90

    accuracy                           0.74       254
   macro avg       0.72      0.70      0.70       254
weighted avg       0.73      0.74      0.73       254



## Results

So we got an accuracy of 0.74 and, for the diabetic class (1), a recall of 0.54 and a precision of 0.66. This dataset is also a Kaggle competition, and the general consensus is that [78% is very good](https://www.kaggle.com/general/19387); there are some scores in the low 80s, while the best (unverified) result was 90.6%!
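The class-1 figures in the report can be read straight off the confusion matrix. scikit-learn puts true labels on the rows and predictions on the columns, so for labels 0 and 1 the layout is `[[TN, FP], [FN, TP]]`:

```python
# Confusion matrix printed above: [[139, 25], [41, 49]]
tn, fp = 139, 25
fn, tp = 41, 49

precision_1 = tp / (tp + fp)                 # 49 / 74: predicted diabetic, actually diabetic
recall_1 = tp / (tp + fn)                    # 49 / 90: diabetic patients the model caught
accuracy = (tp + tn) / (tn + fp + fn + tp)   # 188 / 254
print(round(precision_1, 2), round(recall_1, 2), round(accuracy, 2))  # 0.66 0.54 0.74
```

The low recall means the model misses 41 of the 90 diabetic patients in the test set.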

So our results are decent, but not the best they could be. We haven't tuned our model at all, and it isn't a very deep model either, with just two hidden layers of 64 units.

What would you do to improve the results? The 'Basics' notebook has some suggestions. We won't try to get better results here, as this is just an introduction to the structure of a network, but it is worth thinking about. I played around with the number of epochs to see if I could get an easy boost and actually got a worse accuracy with 500 epochs. That suggests the model overfit, so I should either increase the dropout rate or leave epochs at 50 and tune other parameters.
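Rather than hand-picking `EPOCHS`, one common alternative (not used in this notebook) is early stopping: hold out a validation split carved from the training data (never the test set), track its loss after each epoch, and stop once it has not improved for a set number of epochs. A minimal, framework-agnostic sketch; the `EarlyStopping` class and its `patience` and `min_delta` parameters are hypothetical names for illustration:

```python
class EarlyStopping:
    """Signal a stop when validation loss fails to improve for `patience` epochs in a row."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Made-up validation losses: improving, then rising as the model starts to overfit
val_losses = [0.60, 0.52, 0.48, 0.47, 0.49, 0.50, 0.53, 0.55]
stopper = EarlyStopping(patience=3)
for epoch, val_loss in enumerate(val_losses, start=1):
    if stopper.step(epoch, val_loss):
        print(f"stop at epoch {epoch}; best was epoch {stopper.best_epoch} ({stopper.best:.2f})")
        break
```

In a PyTorch loop you would typically also save a copy of `model.state_dict()` whenever the best loss improves, and load it back with `model.load_state_dict(...)` after stopping.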

# Your Turn

We just made a classification model that was a more complex version of a logistic regression. Your task is to make a network that is a more complex version of a linear regression to predict the BMI variable in the PIMA dataset. The data was not collected with BMI prediction in mind, so we shouldn't expect good results. But it is a great exercise to see if you can adapt the code above to generate a numeric prediction!

You should
- have at least 3 layers
- use the train_data and test_data classes from above (no need to rewrite them)
- print the mean squared error and the loss for each epoch. Printing out the MSE could be a little tricky, so start out with just the loss.
- print the root mean squared error (so it is on the same scale as BMI), the mean absolute error and r2 for the test set
- feel free to change parameters (you will need to change at least one) 


Good luck!

In [45]:
# Data

pima = pd.read_csv('diabetes.csv')
pima.info()

y = pima['BMI']
pima.drop('BMI', axis = 1, inplace = True)
X = pima

In [44]:
## Our new network


In [43]:
## Error metrics


#### How did we do?

Not very good, as expected. Our RMSE and MAE suggest that our predictions are 12-14 BMI points away from the truth on average. A 'healthy' BMI is 18-25 (a 7-point range) while overweight is 25-30 (a 5-point range), so being 12-14 points off on average will put most people in the wrong category. Our r2 score is also negative, meaning the model predicts worse than a horizontal line at the mean BMI.
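r2 compares the model's squared error with that of always predicting the mean of the true values: r2 = 1 - SS_res / SS_tot. A constant prediction at the mean scores exactly 0, so a negative r2 means the predictions are further off than that flat baseline. A small check with made-up BMI values (not the notebook's results):

```python
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

y_true = np.array([22.0, 27.5, 31.0, 35.5, 40.0])  # made-up BMI values
mean_pred = np.full_like(y_true, y_true.mean())     # the "horizontal line" baseline
bad_pred = y_true + 12.0                            # every prediction 12 points too high

print(r2_score(y_true, mean_pred))  # 0.0
print(r2_score(y_true, bad_pred))   # negative: worse than the baseline

rmse = np.sqrt(mean_squared_error(y_true, bad_pred))  # back on the BMI scale
print(rmse, mean_absolute_error(y_true, bad_pred))    # 12.0 12.0
```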

Tuning probably won't help much here, as the negative r2 suggests the data may simply not contain enough information to produce a strong prediction. The next step would be to set this method aside and try some classical methods, to determine whether the method doesn't fit the data or whether the data itself is substandard.

## Additional Resources

[Batch Normalization](https://machinelearningmastery.com/batch-normalization-for-training-of-deep-neural-networks/)

[Data Loaders](https://stanford.edu/~shervine/blog/pytorch-how-to-generate-data-parallel)