# Heart Failure Prediction using Neural Network 

## Overview

In this project, we will use [PyTorch](https://pytorch.org), a framework for building and training neural networks, to train a simple neural network on **Heart Failure Prediction**. PyTorch in a lot of ways behaves like the arrays from Numpy. These Numpy arrays, after all, are just tensors. PyTorch takes these tensors and makes it simple to move them to GPUs for the faster processing needed when training neural networks. It also provides a module that automatically calculates gradients (for backpropagation) and another module specifically for building neural networks.

In [1]:
import os
import sys

DATA_PATH = "lib/data/"

## 1 PyTorch Basics

It turns out neural network computations are just a bunch of linear algebra operations on tensors, a generalization of matrices. A vector is a 1-dimensional tensor, a matrix is a 2-dimensional tensor, an array with three indices is a 3-dimensional tensor (RGB color images for example). The fundamental data structure for neural networks are tensors and PyTorch (as well as pretty much every other deep learning framework) is built around tensors.

In [2]:
import random
import numpy as np
import torch
import torch.nn as nn

In [3]:
# set seed
seed = 24
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
os.environ["PYTHONHASHSEED"] = str(seed)

### 1.1 ReLU Implementation from scratch

In [26]:

def relu(x):

    """ 
    Implement a ReLU activation function from scratch.
    REFERENCE: https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html#torch.nn.ReLU
    input
        x: torch.Tensor
    output
        relu(x): torch.Tensor
    """
    zero_tensor = torch.zeros_like(x)
    return torch.maximum(zero_tensor,x)

### 1.2 Sigmoid Implementation from scratch

In [42]:
def sigmoid(x):

    """ 
    Implement a Sigmoid activation function from scratch.
    input
        x: torch.Tensor
    output
        sigmoid(x): torch.Tensor
    """
    tensor_ones = torch.ones_like(x)
    return torch.div(tensor_ones, tensor_ones + torch.exp(-x))

### 1.3 Softmax Implementation from scratch

Note that softmax degenerates to sigmoid when we have 2 classes.

In [85]:
def softmax(x):

    """ 
    Implement a Softmax activation function from scratch.
    input
        x: torch.Tensor, 2D matrix
    output
        softmax(x): torch.Tensor, 2D matrix with sum over rows is 1
    """
    return torch.exp(x) / torch.sum(torch.exp(x), dim = 1, keepdim = True)

### 1.4 Single layer network with sigmoid

Now, let us try to use the `sigmoid` function to calculate the output for a simple single layer network.

In [35]:
# Generate some data
# Features are 5 random normal variables
features = torch.randn((1, 5))
# weights for our data, random normal variables again
weights = torch.randn_like(features)
# and a bias term
bias = torch.randn((1, 1))

Above I generated data we can use to get the output of our simple network. This is all just random for now, going forward we will start using normal data.

`features = torch.randn((1, 5))` creates a tensor with shape (1, 5), one row and five columns, that contains values randomly distributed according to the normal distribution with a mean of zero and standard deviation of one.

`weights = torch.randn_like(features)` creates another tensor with the same shape as features, again containing values from a normal distribution.

`bias = torch.randn((1, 1))` creates a single value from a normal distribution.

Next, we can use the generated data to calculate the output of this simple single layer network. Input features are `features`, weights are `weights`, and bias are `bias`. Use `sigmoid` as the activation function.

In [46]:

def single_layer_network(features, weights, bias):

    """ 
    Calculate the output of this simple single layer network.
    input
        features: torch.Tensor
        weights: torch.Tensor
        bias: torch.Tensor
    output
        output of a sinlge layer network: torch.Tensor
    """
    return sigmoid(features.matmul(weights.T) + bias)

That is how we can calculate the output for a sinlge layer. The real power of this algorithm happens when we start stacking these individual units into layers and stacks of layers, into a network of neurons. The output of one layer of neurons becomes the input for the next layer.

## 2 NN with PyTorch

Deep learning networks tend to be massive with dozens or hundreds of layers, that is where the term "deep" comes from. PyTorch has a nice module `nn` that provides a nice way to efficiently build large neural networks.

In this project, we will train a neural network to predict heart failure. The data has been processed and saved in SVMLight format under `DATA_PATH`.

In [50]:
!ls {DATA_PATH}

features_svmlight.train  features_svmlight.val


### 2.1 Load the Data

In [51]:
import utils

""" load SVMLight data """
# training data
X_train, Y_train = utils.get_data_from_svmlight(DATA_PATH + "features_svmlight.train")
# validation data
X_val, Y_val = utils.get_data_from_svmlight(DATA_PATH + "features_svmlight.val")

""" convert to torch.tensor """
X_train = torch.from_numpy(X_train.toarray()).type(torch.float)
Y_train = torch.from_numpy(Y_train).type(torch.float)
X_val = torch.from_numpy(X_val.toarray()).type(torch.float)
Y_val = torch.from_numpy(Y_val).type(torch.float)

print("X_train shape:", X_train.shape)
print("Y_train shape:", Y_train.shape)
print("X_val shape:", X_val.shape)
print("Y_val shape:", Y_val.shape)

X_train shape: torch.Size([2485, 1473])
Y_train shape: torch.Size([2485])
X_val shape: torch.Size([604, 1473])
Y_val shape: torch.Size([604])


Now, we will create a `TensorDataset` to wrap those tensors. (https://pytorch.org/docs/stable/data.html#torch.utils.data.TensorDataset)

In [52]:
from torch.utils.data import TensorDataset

train_dataset = TensorDataset(X_train, Y_train)
val_dataset = TensorDataset(X_val, Y_val)

print("Size of train_dataset:", len(train_dataset))
print("Size of val_dataset:", len(val_dataset))

#we can index train_dataset to get a (data, label) tuple
print(train_dataset[0])
print([_t.shape for _t in train_dataset[0]])

Size of train_dataset: 2485
Size of val_dataset: 604
(tensor([0., 0., 0.,  ..., 0., 0., 0.]), tensor(0.))
[torch.Size([1473]), torch.Size([])]


Next, we will load the dataset into a dataloader so that we can we can use it to loop through the dataset for training and validating. (https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)

In [53]:
from torch.utils.data import DataLoader

# how many samples per batch to load
batch_size = 32

# prepare dataloaders
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size)

print("# of train batches:", len(train_loader))
print("# of val batchse:", len(val_loader))

# of train batches: 78
# of val batchse: 19


Note that the data loader is created with a batch size of $32$, and `shuffle=True`. 

The batch size is the number of images we get in one iteration from the data loader and pass through our network, often called a batch. 

And `shuffle=True` tells it to shuffle the dataset every time we start going through the data loader again.

In [54]:
train_iter = iter(train_loader)
x, y = next(train_iter)

print('Shape of a batch x:', x.shape)
print('Shape of a batch y:', y.shape)

Shape of a batch x: torch.Size([32, 1473])
Shape of a batch y: torch.Size([32])


### 2.2 Build the Model

Now, let us build a real NN model. For each patient, the NN model will take an input tensor of 1473-dim, and produce an output tensor of 1-dim (0 for normal, 1 for heart failure). The detailed model architecture is shown in the table below.

Layers | Configuration | Activation Function | Output Dimension (batch, feature)
--- | --- | --- | ---
fully connected | input size 1473, output size 64 | ReLU | (32, 64)
fully connected | input size 64, output size 32 | ReLU | (32, 32)
dropout | probability 0.5 | - | (32, 32)
fully connected | input size 32, output size 1 | Sigmoid | (32, 1)

In [59]:
# Build the MLP shown above using `nn.Linear`, `nn.Dropout`, `torch.relu`, `torch.sigmoid`.

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        self.fc1 = nn.Linear(1473, 64)
        self.fc2 = nn.Linear(64, 32)
        self.dropout = nn.Dropout(0.5)
        self.fc3 = nn.Linear(32, 1)

    def forward(self, x):
        return torch.sigmoid(self.fc3(self.dropout(self.fc2(torch.relu(self.fc1(x))))))

# initialize the NN
model = Net()
print(model)

Net(
  (fc1): Linear(in_features=1473, out_features=64, bias=True)
  (fc2): Linear(in_features=64, out_features=32, bias=True)
  (dropout): Dropout(p=0.5, inplace=False)
  (fc3): Linear(in_features=32, out_features=1, bias=True)
)


Now that we have a network, let's see what happens when we pass in an image.

In [63]:
# Grab some data 
train_iter = iter(train_loader)
x, y = next(train_iter)

# Forward pass through the network
output = model.forward(x)

print('Input x shape:', x.shape)
print('Output shape: ', output.shape)

Input x shape: torch.Size([32, 1473])
Output shape:  torch.Size([32, 1])


### 2.3 Train the Network

In this step, we will train the NN model. 

Neural networks with non-linear activations work like universal function approximators. There is some function that maps the input to the output. The power of neural networks is that we can train them to approximate this function, and basically any function given enough data and compute time.

In [64]:
model = Net()

### Losses in PyTorch.

Let us start by seeing how we calculate the loss with PyTorch. Through the `nn.module`, PyTorch provides losses such as the binary cross-entropy loss (`nn.BCELoss`). The loss is usually assigned to `criterion`. 

As noted in the last part, with a classification problem such as Heart Failure Prediction, we are using the Sigmoid function to predict heart failure probability. With a Sigmoid output, we want to use binary cross-entropy as the loss. To actually calculate the loss, we first define the criterion then pass in the output of the network and the correct labels.

In [65]:
# Define the loss (BCELoss), assign it to `criterion`.
# REFERENCE: https://pytorch.org/docs/stable/generated/torch.nn.BCELoss.html#torch.nn.BCELoss
criterion = nn.BCELoss()

### Optimizer in PyTorch.

Optimizer can update the weights with the gradients. We can get these from PyTorch's `optim` package. For example we can use stochastic gradient descent with `optim.SGD`.

In [68]:
# Define the optimizer (SGD) with learning rate 0.01, assign it to `optimizer`.
# REFERENCE: https://pytorch.org/docs/stable/optim.html
optimizer = torch.optim.SGD(model.parameters(),lr=0.01)

Now let us train the NN model we previously created.

First, let us implement the `evaluate` function that will be called to evaluate the model performance when training.

***Note:*** For prediction, probability > 0.5 is considered class 1 otherwise class 0

In [70]:
from sklearn.metrics import *

#input: Y_score,Y_pred,Y_true
#output: accuracy, auc, precision, recall, f1-score
def classification_metrics(Y_score, Y_pred, Y_true):
    acc, auc, precision, recall, f1score = accuracy_score(Y_true, Y_pred), \
                                           roc_auc_score(Y_true, Y_score), \
                                           precision_score(Y_true, Y_pred), \
                                           recall_score(Y_true, Y_pred), \
                                           f1_score(Y_true, Y_pred)
    return acc, auc, precision, recall, f1score

#input: model, loader
def evaluate(model, loader):
    model.eval()
    all_y_true = torch.LongTensor()
    all_y_pred = torch.LongTensor()
    all_y_score = torch.FloatTensor()
    for x, y in loader:
        y_hat = model(x)
        # convert shape from [batch size, 1] to [batch size]
        y_hat = y_hat.view(y_hat.shape[0])

        # obtain the predicted class (0, 1) by comparing y_hat against 0.5, and assign the predicted class to y_pred.
        y_pred = [1 if y >= 0.5 else 0 for y in y_hat]
        y_pred = torch.from_numpy(np.asarray(y_pred))
        
        all_y_true = torch.cat((all_y_true, y.to('cpu').long()), dim=0)
        all_y_pred = torch.cat((all_y_pred,  y_pred.to('cpu').long()), dim=0)
        all_y_score = torch.cat((all_y_score,  y_hat.to('cpu')), dim=0)
        
    acc, auc, precision, recall, f1 = classification_metrics(all_y_score.detach().numpy(), 
                                                             all_y_pred.detach().numpy(), 
                                                             all_y_true.detach().numpy())
    print(f"acc: {acc:.3f}, auc: {auc:.3f}, precision: {precision:.3f}, recall: {recall:.3f}, f1: {f1:.3f}")
    return acc, auc, precision, recall, f1

In [71]:
print("model perfomance before training:")
# initialized the model
# model = Net()
auc_train_init = evaluate(model, train_loader)[1]
auc_val_init = evaluate(model, val_loader)[1]

model perfomance before training:
acc: 0.417, auc: 0.552, precision: 0.000, recall: 0.000, f1: 0.000
acc: 0.392, auc: 0.517, precision: 0.000, recall: 0.000, f1: 0.000


Steps to train the model:
- Clear the gradients of all optimized variables
- Forward pass: compute predicted outputs by passing inputs to the model
- Calculate the loss
- Backward pass: compute gradient of the loss with respect to model parameters
- Perform a single optimization step (parameter update)
- Update average training loss

In [73]:
# number of epochs to train the model
n_epochs = 100

# prep model for training
model.train()

train_loss_arr = []
for epoch in range(n_epochs):
    
    train_loss = 0
    for x, y in train_loader:
        # Step 1. clear gradients
        optimizer.zero_grad()

        # Step 2. perform forward pass using `model`, save the output to y_hat
        y_hat = model(x)
        y_hat = y_hat.view(y_hat.shape[0])
        
        # Step 3. calculate the loss using `criterion`, save the output to loss        
        loss = criterion(y_hat, y)
        
        # Step 4. backward pass
        loss.backward()
        
        # Step 5. optimization
        optimizer.step()
        
        # Step 6. record loss
        train_loss += loss.item()
        
    train_loss = train_loss / len(train_loader)
    
    if epoch % 20 == 0:
        train_loss_arr.append(np.mean(train_loss))
        print('Epoch: {} \tTraining Loss: {:.6f}'.format(epoch, train_loss))
        evaluate(model, val_loader)

Epoch: 0 	Training Loss: 0.701075
acc: 0.445, auc: 0.556, precision: 0.808, recall: 0.114, f1: 0.200
Epoch: 20 	Training Loss: 0.649222
acc: 0.614, auc: 0.683, precision: 0.612, recall: 0.997, f1: 0.759
Epoch: 40 	Training Loss: 0.547752
acc: 0.712, auc: 0.725, precision: 0.704, recall: 0.907, f1: 0.793
Epoch: 60 	Training Loss: 0.480068
acc: 0.709, auc: 0.739, precision: 0.727, recall: 0.834, f1: 0.777
Epoch: 80 	Training Loss: 0.414760
acc: 0.714, auc: 0.738, precision: 0.727, recall: 0.847, f1: 0.782
