# **CSE6521(AU22): PyTorch Tutorial**

### Author: [Zexin (Jason) Xu](https://asonjay.github.io)

This notebook will serve as a basic introduction to PyTorch. This notebook will include 3 sessions: tensor, neural network, and a sample NLP task. After finishing this notebook, you will be able to build a fully-connected feed-forward neural network with PyTorch.

### Credit
* ["Word Window Classification" tutorial notebook](https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1204/materials/ww_classifier.ipynb) by Matt Lamm, from Winter 2020 offering of CS224N
* CSE224N: PyTorch Tutorial (Winter '22) by Dilara Soylu, Ethan Chi
* Official PyTorch Documentation on [Deep Learning with PyTorch: A 60 Minute Blitz](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html) by Soumith Chintala
* PyTorch Tutorial Notebook, [Build Basic Generative Adversarial Networks (GANs) | Coursera](https://www.coursera.org/learn/build-basic-generative-adversarial-networks-gans) by Sharon Zhou, offered on Coursera

 Thanks ``Dr. Yu Su`` for his precious feedback!

### Content
0. Packages
1. Tensors
    - Create a tensor
    - Data Type
    - Conversion with NumPy array
    - Initialize a tensor
    - Tensor view/reshape
    - Matrix operations
    - Vectorized operations
    - Indexing
2. Neural Network
    - Layers
        - Linear layer
        - Activation layer
    - Pile them up!
    - Create your own nn.Modules!
    - Basics: backward() and "grad"
    - Optimization
    - loss functions
3. Demo - Heart Disease Indicators
    - Collecting data
    - Train/valid/test split
    - Batch/dataloader
    - Feed-forward neural network
    - Training
    - Evaluation & confusion matrix
    - What is wrong?
4. What is next?

# Part 0: Packages 

[PyTorch](https://pytorch.org/) is an open source machine learning framework. Other popular framework on the market is [TensorFlow](https://www.tensorflow.org/). Feel free to chek it out.

First off, let's import packages.

In [2]:
import torch
import torch.nn as nn
import numpy as np
import pandas as pd

# Part I: Tensors
In this session, we will learn the basics about tensors, along with some basic manipulations about it. In short, tensor is a multi-dimensional matrix. And it is the most basic and important component of PyTorch framework. 

Without future ado, let's learn about tensors!

### Creating a tensor
Tensor can be created with **array** during initialization. Here, we are creating a `3x2` tensor:

In [3]:
tensor = torch.tensor([[0, 1], 
                       [2, 3], 
                       [4, 5]])
tensor

tensor([[0, 1],
        [2, 3],
        [4, 5]])

### Data Type
Each Tensor can have [**data types**](https://pytorch.org/docs/stable/tensors.html). You can specify the `dtype` attribute while creating the tensor. 

In [4]:
# Create a tensor with np.array
np_arr = np.array([[0, 1],
                   [2, 3],
                   [4, 5]])
tensor = torch.tensor(np_arr)
print(tensor)
print('-' * 25)

# Create a tensor with given data type
tensor = torch.tensor([[0, 1],
                       [2, 3],
                       [4, 5]], dtype=torch.float32)
print(tensor)
print('-' * 25)

# Create a tensor with long type (Tensor)
tensor = torch.LongTensor(np_arr)
print(tensor)


tensor([[0, 1],
        [2, 3],
        [4, 5]], dtype=torch.int32)
-------------------------
tensor([[0., 1.],
        [2., 3.],
        [4., 5.]])
-------------------------
tensor([[0, 1],
        [2, 3],
        [4, 5]])


### Conversion with NumPy array
Note that when creating a tensor, you can also convert a **NumPy array** to a tensor. And of course, you can do it other way around.

In [5]:
# Create a tensor with from_numpy function
tensor = torch.from_numpy(np.array([[0, 1], 
                                    [2, 3], 
                                    [4, 5]]))
print(tensor)
print('-' * 25)

# Convert a tensor to numpy array
np_arr = tensor.numpy()
print(np_arr)
print(type(np_arr))

tensor([[0, 1],
        [2, 3],
        [4, 5]], dtype=torch.int32)
-------------------------
[[0 1]
 [2 3]
 [4 5]]
<class 'numpy.ndarray'>


### Initialze a tensor
There are more ways to initialize a tensor. Below are multiple ways of doing so.

In [6]:
# Initialize a tensor with zeros
zeros = torch.zeros(3, 2)
print(zeros) 
print('-' * 25)

# Initialize a tensor with ones
ones = torch.ones(3, 2) 
print(ones)
print('-' * 25)

# Initialize a tensor with random values
# By default, torch.rand() returns a tensor with floating points values in [0, 1)
randoms = torch.rand(3, 2)  
print(randoms)
# Initialize a tensor with random values in [2, 12)
randoms1 = torch.rand(3, 2) * 10 + 2
print(randoms1)
print('-' * 25)

# Initialize a tensor with empty value
# Even though you can still print the empty tensor, but note that torch.empty() returns # a tensor filled with uninitialized data
empties = torch.empty(3, 2)
print(empties) 

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])
-------------------------
tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])
-------------------------
tensor([[0.8569, 0.3636],
        [0.2975, 0.5786],
        [0.0524, 0.0527]])
tensor([[ 2.1447,  4.5004],
        [11.8152,  6.1781],
        [ 9.2662,  8.8248]])
-------------------------
tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])


### Tensor view/reshape
Since tensor is a multi-dimensional matrix, PyTorch provides couple utilities to view/manipulate its size.

In [7]:
# Get the shape of a tensor
tensor = torch.tensor([[0, 1], 
                       [2, 3], 
                       [4, 5]])
print(tensor.shape)
print('-' * 25)

# Reshape a tensor
tensor = torch.tensor([[0, 1], 
                       [2, 3], 
                       [4, 5]])
tensor_reshape = tensor.reshape(2, 3)
print(tensor_reshape)
print(tensor_reshape.shape)
print(tensor.shape)

# --- Difference between view and reshape ---
# view() will try to change the shape of the tensor while keeping the underlying data 
# allocation the same, thus data will be shared between the two tensors. reshape() will # create a new underlying memory allocation if necessary. Thus, view() will require 
# your operation to be contiguous, while reshape() will not.


torch.Size([3, 2])
-------------------------
tensor([[0, 1, 2],
        [3, 4, 5]])
torch.Size([2, 3])
torch.Size([3, 2])


### Matrix operations
Tensor can perform matrix-like operations as well, such as element-wise addition/multiplication, and matrix multiplication, etc.

In [8]:
# Element-wise addition/multiplication
a = torch.tensor([[0, 1],
                  [2, 3],
                  [4, 5]])
b = torch.tensor([[6, 7],
                  [8, 9],
                  [10, 11]])
print(a + b)
print('-' * 25)
print(a * b)
print('-' * 25)

# Matrix multiplication
c = torch.tensor([[0, 1, 2],
                  [3, 4, 5]])
print(a.matmul(c))
print(a @ c) # Other way to do matrix multiplication

tensor([[ 6,  8],
        [10, 12],
        [14, 16]])
-------------------------
tensor([[ 0,  7],
        [16, 27],
        [40, 55]])
-------------------------
tensor([[ 3,  4,  5],
        [ 9, 14, 19],
        [15, 24, 33]])
tensor([[ 3,  4,  5],
        [ 9, 14, 19],
        [15, 24, 33]])


### Vectorized operations
Like matrix, tensor can perform **vectorized operations**: operations that be conducted in parallel over a particular dimension of a tensor. 

In [9]:
tensor1 = torch.arange(0, 12, dtype=torch.float32).reshape(3, 4)
print(tensor1)
print('-' * 25)
# sum over different dimensions
print(tensor1.sum(dim=0))
print(tensor1.sum(dim=1))
print('-' * 25)

tensor2 = torch.arange(0, 12, dtype=torch.float32).reshape(3, 2, 2)
print(tensor2)
print('-' * 25)
# sum over different dimensions
print(tensor2.sum(dim=0))
print(tensor2.sum(dim=1))
print(tensor2.sum(dim=2))
print('-' * 25)

# Other operations (std, mean, max, min, argmax, argmin, etc.)
print(tensor2.mean(dim=0))

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.]])
-------------------------
tensor([12., 15., 18., 21.])
tensor([ 6., 22., 38.])
-------------------------
tensor([[[ 0.,  1.],
         [ 2.,  3.]],

        [[ 4.,  5.],
         [ 6.,  7.]],

        [[ 8.,  9.],
         [10., 11.]]])
-------------------------
tensor([[12., 15.],
        [18., 21.]])
tensor([[ 2.,  4.],
        [10., 12.],
        [18., 20.]])
tensor([[ 1.,  5.],
        [ 9., 13.],
        [17., 21.]])
-------------------------
tensor([[4., 5.],
        [6., 7.]])


### Indexing
Tensor elements can be accessed with `[]` like array. In the meanwhile, you can also pass a tensor in `[]`. Note that that data type of index tensor should be a int/long.

In [10]:
tensor = torch.arange(0, 24, dtype=torch.float32).reshape(4, 3, 2)
print(tensor)
print('-' * 25)

print(tensor[0])
print(tensor[0, 1])
print(tensor[0, 1, 1])
print('-' * 25)

print(tensor[[1, 2], 1]) # Same as tensor[[1:3, 1]]
print('-' * 25)

tensor([[[ 0.,  1.],
         [ 2.,  3.],
         [ 4.,  5.]],

        [[ 6.,  7.],
         [ 8.,  9.],
         [10., 11.]],

        [[12., 13.],
         [14., 15.],
         [16., 17.]],

        [[18., 19.],
         [20., 21.],
         [22., 23.]]])
-------------------------
tensor([[0., 1.],
        [2., 3.],
        [4., 5.]])
tensor([2., 3.])
tensor(3.)
-------------------------
tensor([[ 8.,  9.],
        [14., 15.]])
-------------------------


In [11]:
# Indexing with a tensor
tensor = torch.arange(0, 12, dtype=torch.float32).reshape(3, 2, 2)
print(tensor)
print('-' * 25)
index = torch.tensor([0, 1, 0]) # Reprint 0th row two times
print(tensor[index])

tensor([[[ 0.,  1.],
         [ 2.,  3.]],

        [[ 4.,  5.],
         [ 6.,  7.]],

        [[ 8.,  9.],
         [10., 11.]]])
-------------------------
tensor([[[0., 1.],
         [2., 3.]],

        [[4., 5.],
         [6., 7.]],

        [[0., 1.],
         [2., 3.]]])


# Part II: Neural Network (nn.module)

In this session, we will look at some basic components to build a simple neural network from scractch using PyTorch. 

Layers are important during forward propagation. In this tutorial I will introduce some basic layers (nn.linear() and some activation functions). We are going to building our layers with `nn.sequential()`.

In the meanwhile, calculating gradients and back propagates the loss are significant when training our model. `backwards()`, **gradients, optimizer, and loss functions**  are the components in PyTorch to realize this.

After this session, you will develop a basic understanding of components that PyTorch offers to help us build our own neural network. Let's dive in!

## Layers
PyTorch provides a tons of [layers](https://pytorch.org/docs/stable/nn.html#). In this tutorial, we will mainly talk about building a neural network with linear layers.

### Linear Layer
Applies a linear transformation to the incoming data: 
y = xA^T + b (x = Input, y = Output, A = weight, b = bias)

Input: (∗,H_in) where ∗ means any number of dimensions including none and H_in = in_features.

Output: (∗,H_out) where all but the last dimension are the same shape as the input and H_out = out_features.



In [14]:
inp = torch.ones(2, 3, 4)
print(inp)
print('-' * 25)

linear = nn.Linear(4, 3)
print(linear(inp))
print('-' * 25)
# Check closely, we are transforming the input from 2x3x4 to 2x3x3
# nn.Linear applies a linear transformation to the last dimension of the input

print(list(linear.named_parameters())) # The weights and bias are randomly initialized

tensor([[[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]])
-------------------------
tensor([[[ 0.7863,  0.3632, -0.1071],
         [ 0.7863,  0.3632, -0.1071],
         [ 0.7863,  0.3632, -0.1071]],

        [[ 0.7863,  0.3632, -0.1071],
         [ 0.7863,  0.3632, -0.1071],
         [ 0.7863,  0.3632, -0.1071]]], grad_fn=<ViewBackward0>)
-------------------------
[('weight', Parameter containing:
tensor([[ 0.1067,  0.0534,  0.4368, -0.0046],
        [ 0.1344, -0.1546,  0.3087, -0.3330],
        [ 0.4091, -0.2223, -0.4721, -0.2260]], requires_grad=True)), ('bias', Parameter containing:
tensor([0.1940, 0.4076, 0.4042], requires_grad=True))]


### Activation Layer
Activation functions are used to add non-linearity to our network. Activation functions will not change the dimension of the input, thus it somewhat "activates" the input value by scaling it up or down. We are going to try out some basic functions as shown below:
1. Sigmoid
2. ReLU
3. Tanh
4. Softmax / LogSoftmax

In [164]:
sigmoid = nn.Sigmoid()
ReLU = nn.ReLU()
tanh = nn.Tanh()
softmax = nn.Softmax(dim=0)

tensor = torch.rand(2, 3)
print(tensor)
print('-' * 25)

# Apply different activation functions to same tensor
# Check how the values are transformed with different activation functions
out = sigmoid(tensor)
print(out)
print('-' * 25)
out = ReLU(tensor)
print(out)
print('-' * 25)
out = tanh(tensor)
print(out)
print('-' * 25)
out = softmax(tensor)
print(out)

tensor([[0.3810, 0.0616, 0.8571],
        [0.2484, 0.9643, 0.5612]])
-------------------------
tensor([[0.5941, 0.5154, 0.7020],
        [0.5618, 0.7240, 0.6367]])
-------------------------
tensor([[0.3810, 0.0616, 0.8571],
        [0.2484, 0.9643, 0.5612]])
-------------------------
tensor([[0.3636, 0.0615, 0.6947],
        [0.2434, 0.7462, 0.5089]])
-------------------------
tensor([[0.5331, 0.2885, 0.5734],
        [0.4669, 0.7115, 0.4266]])


## Pile them up!
To build a neural network, we will need multiple layers. And here is how we do it.

In [22]:
inp = torch.rand(2, 4, 6)

fc1 = nn.Linear(6, 4) 
ac1 = nn.Sigmoid()
fc2 = nn.Linear(4, 2)
ac2 = nn.ReLU()
fc3 = nn.Linear(2, 2)
softmax = nn.Softmax(dim=0)

out = fc1(inp)
out = ac1(out)
out = fc2(out)
out = ac2(out)
out = fc3(out)
out = softmax(out)
print(out)

tensor([[[0.4980, 0.4933],
         [0.5055, 0.4937],
         [0.5017, 0.4970],
         [0.5021, 0.4943]],

        [[0.5020, 0.5067],
         [0.4945, 0.5063],
         [0.4983, 0.5030],
         [0.4979, 0.5057]]], grad_fn=<SoftmaxBackward0>)


This is so redundant. Luckily, PyTorch provides us with a more compact module to pile our layers up!

In [166]:
inp = torch.rand(2, 4, 6)

model = nn.Sequential(
    nn.Linear(6, 4),
    nn.Sigmoid(),
    nn.Linear(4, 2),
    nn.ReLU(),
    nn.Linear(2, 2),
    nn.Softmax(dim=1)
)

print(model(inp)) # nice and clean :)

tensor([[[0.2472, 0.2518],
         [0.2509, 0.2516],
         [0.2502, 0.2478],
         [0.2517, 0.2488]],

        [[0.2495, 0.2488],
         [0.2503, 0.2486],
         [0.2463, 0.2528],
         [0.2539, 0.2498]]], grad_fn=<SoftmaxBackward0>)


## Create your own nn.Modules!
So far you have equipped with basic knowledge to build your own neural network class! Here is what basic structure looks like:

In [167]:
class FFNN(nn.Module):

  def __init__(self, inp, hid, out):
    # Call to the __init__ function of the super class
    super(FFNN, self).__init__()
    # Init parameters 
    self.input_size = inp
    self.hidden_size = hid
    self.output_size = out
    # Building models
    self.model = nn.Sequential(
        nn.Linear(self.input_size, self.hidden_size),
        nn.ReLU(),
        nn.Linear(self.hidden_size, self.output_size),
        nn.Sigmoid()
    )
    
  def forward(self, x): 
    return self.model(x)
  
# Let's test it with some random input
input = torch.rand(2, 4, 6)

ffnn = FFNN(inp=6, 
            hid=4, 
            out=2)

print(ffnn(input)) # same as "print(ffnn.forward(input))"

tensor([[[0.5382, 0.4697],
         [0.5660, 0.5021],
         [0.5391, 0.4662],
         [0.4726, 0.4748]],

        [[0.5094, 0.4699],
         [0.5244, 0.4723],
         [0.4983, 0.4488],
         [0.5111, 0.4967]]], grad_fn=<SigmoidBackward0>)


In [168]:
print(list(ffnn.named_parameters())) # Check parameters
# One thing to note that, only linear layers have parameters
# while activation functions do not have any parameters

[('model.0.weight', Parameter containing:
tensor([[ 0.0333,  0.0981, -0.0923, -0.3743,  0.0235,  0.3826],
        [ 0.0529,  0.0706, -0.1433,  0.1922, -0.0642,  0.3250],
        [-0.2854,  0.3561,  0.0471,  0.3494, -0.0483,  0.0387],
        [ 0.3905,  0.0492,  0.2485,  0.0249,  0.2655,  0.2105]],
       requires_grad=True)), ('model.0.bias', Parameter containing:
tensor([-1.1215e-01,  3.9474e-01,  2.4125e-04,  1.9658e-01],
       requires_grad=True)), ('model.2.weight', Parameter containing:
tensor([[ 0.2055,  0.1479,  0.4395, -0.3037],
        [ 0.3720, -0.3727,  0.0471, -0.2854]], requires_grad=True)), ('model.2.bias', Parameter containing:
tensor([0.1303, 0.3216], requires_grad=True))]


## Basics: backward() and "grad"
Now, we have our model. But that is not enough. Before taking about how to calculate the loss generated, let's first talk about how we back propagates the losses.

In [169]:
x = torch.tensor([5], dtype=torch.float32, requires_grad=True)
print(x)

x.backward()
print(x.grad) # Gradient of x with respect to itself / d(x)/d(x) = 1
print('-' * 25)

y = 6 * x ** 2
y.backward() # d(y)/d(x) + d(x)/d(x) = d(6x^2)/d(x) + 1 = 12x + 1 = 12 * 5 + 1 = 61
print(x.grad)

# ===================
# So far x.grad is updated accumulatively. This is also the nature when we do back 
# propagation - we are summing up all the losses. However, one thing to keep in mind
# that we need to reset the gradients to zero using `zero_grad()` after one iteration
# so that the losses are calculated correctly.
# ===================

# This shows how the gradients are accumulated
for i in range(5):
    y = 6 * x ** 2
    y.backward()
    print(x.grad)


tensor([5.], requires_grad=True)
tensor([1.])
-------------------------
tensor([61.])
tensor([121.])
tensor([181.])
tensor([241.])
tensor([301.])
tensor([361.])


## Optimization
Now, let's use a toy task to demostrate how to calculate loss, propagates it and updates the parameters. 

`torch.optim` modules contains multiple [optimizers](https://pytorch.org/docs/stable/optim.html) we can use. Then, `.parameters()` will be able to tell optimizer what parameters to update. `backward()` will then back propagates the gradient calculated.

**Toy Task:** 

getting a random tensor [0, 1), ground truth is a tensor with all 1s. After optimizing and back propagates, the model should train the parameters and output should be close to all ones.

In [170]:
import torch.optim as optim

class FFNN(nn.Module):

  def __init__(self, inp, hid, out):
    # Call to the __init__ function of the super class
    super(FFNN, self).__init__()
    # Init parameters 
    self.input_size = inp
    self.hidden_size = hid
    self.output_size = out
    # Building models
    self.model = nn.Sequential(
        nn.Linear(self.input_size, self.hidden_size),
        nn.ReLU(),
        nn.Linear(self.hidden_size, self.output_size),
        nn.Sigmoid()
    )
    
  def forward(self, x): 
    return self.model(x)

input = torch.rand(2, 2)
print(input)
goal = torch.ones(2, 2)
print(goal)

ffnn = FFNN(inp=2,
            hid=4,
            out=2)

## nn's parameters are passed into Adam with .parameters()
adam = optim.Adam(ffnn.parameters(), lr=0.05)
loss_function = nn.BCELoss()

pred = ffnn(input) # same as nn.forward(input)
print(f'Loss = {loss_function(pred, goal).item()}')

tensor([[0.9570, 0.3133],
        [0.8001, 0.9954]])
tensor([[1., 1.],
        [1., 1.]])
Loss = 0.7061628103256226


Now we have the loss function, what we need to do is just to updates our loss and updates the parameter. Let's start training!

In [171]:
print('Initial input:')
print(input)
print('-' * 25)
epochs = 100

for epoch in range(epochs):
    adam.zero_grad()
    pred = ffnn(input)
    loss = loss_function(pred, goal)
    loss.backward()
    adam.step()
    if epoch % 10 == 0:
        print(f'Epoch {epoch}/{epochs} - Loss = {loss.item()}')
print('-' * 25)
print('Final prediction:')
print(pred)

Initial input:
tensor([[0.9570, 0.3133],
        [0.8001, 0.9954]])
-------------------------
Epoch 0/100 - Loss = 0.7061628103256226
Epoch 10/100 - Loss = 0.2066076397895813
Epoch 20/100 - Loss = 0.0243238378316164
Epoch 30/100 - Loss = 0.004261927213519812
Epoch 40/100 - Loss = 0.0016252856003120542
Epoch 50/100 - Loss = 0.0010026042582467198
Epoch 60/100 - Loss = 0.0007797690341249108
Epoch 70/100 - Loss = 0.0006714300834573805
Epoch 80/100 - Loss = 0.0006035299156792462
Epoch 90/100 - Loss = 0.000552400597371161
-------------------------
Final prediction:
tensor([[0.9992, 0.9989],
        [0.9999, 0.9999]], grad_fn=<SigmoidBackward0>)


Not bad huh? Our model successfully predicts our input as close to torch.ones(2, 2) as possible! Good job :)

# Part III: Demo - Heart Disease Indicators
We have learned all the basics of PyTorch and we are ready to build a network to solve a problem! In this session, I will use a simple dataset to demonstrate how to build a PyTorch neural network from scratch.

This model will use a data science approach to indicate **whether a patient will have a heart disease based on the indicators provided.**

The model will include:

0. Dataset
1. train/test split
2. Dataloader
3. Batch
4. GPU
5. Training
6. Evaluation

## Collecting data
This dataset is imported from [kaggle](https://kaggle.com). Kaggle is a wonderful website to collect datasets. Feel free to check it out!

This dataset is from https://www.kaggle.com/datasets/alexteboul/heart-disease-health-indicators-dataset. It includes 253680 health indicators entries. Some indicators are BMI, age, diabetes, cholestrol, income, heart attack, etc. In this example, we are going to use heart attack as our label, using the rest data to predict whether a patient has heart attack or not.

In [172]:
dataframe = pd.read_csv('Data/heart_disease_health_indicators_BRFSS2015.csv')
print(dataframe.shape)
print(dataframe.head()) # Print out the first 5 rows

# First indicator is the label, so we skip it
indicators = dataframe.iloc[:, 1:].values
labels = dataframe.HeartDiseaseorAttack.values

(253680, 15)
   HeartDiseaseorAttack  HighBP  HighChol  CholCheck  Smoker  Stroke  \
0                     0       1         1          1       1       0   
1                     0       0         0          0       1       0   
2                     0       1         1          1       0       0   
3                     0       1         0          1       0       0   
4                     0       1         1          1       0       0   

   Diabetes  PhysActivity  Fruits  Veggies  HvyAlcoholConsump  AnyHealthcare  \
0         0             0       0        1                  0              1   
1         0             1       0        0                  0              0   
2         0             0       1        0                  0              1   
3         0             1       1        1                  0              1   
4         0             1       1        1                  0              1   

   NoDocbcCost  DiffWalk  Sex  
0            0         1    0  
1        

## Train/valid/test split
Spliting your data into different set is an important technique to evaluate your model. `sklearn` provides a function to achieve this. For more about train/valid/test split, check [this link](https://towardsdatascience.com/how-to-split-data-into-three-sets-train-validation-and-test-and-why-e50d22d3e54c)

In [173]:
from sklearn.model_selection import train_test_split

# Splitting the data
X_train, X_test, Y_train, Y_test = train_test_split(indicators, labels, test_size=0.2)

# ===================
# If you want to have train/valid/test split, you can split your test set again,
# thus you have 3 sets of data: train, valid, and test.
# ===================
print(X_train.shape)
print(X_test.shape)

(202944, 14)
(50736, 14)


## Batch/dataloader

One thing that is elegant about matrix is that we can batch our training data in bulk. Doing matrix operation to train our model is way faster and efficient than sending data one by one. PyTorch provides us `DataLoader` to do this in a simpler manner.

In [174]:
from torch.utils.data import DataLoader

train_data = list(zip(X_train, Y_train))
batch_size = 4
shuffle = True

dataloader = DataLoader(train_data, batch_size=batch_size, shuffle=shuffle)
counter = 0
batched_corpus, batched_labels = next(iter(dataloader))
print(len(batched_corpus))
print("Batched Input:")
print(batched_corpus)
print("Batched Labels:")
print(batched_labels)

4
Batched Input:
tensor([[1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1],
        [0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0],
        [1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1],
        [1, 0, 1, 1, 0, 2, 0, 1, 1, 0, 1, 0, 1, 1]])
Batched Labels:
tensor([0, 0, 0, 0])


## Feed-Forward Neural Network
Now we have equipped all the tools, let's put everything together!

In [175]:
import torch
import torch.nn as nn
import numpy as np
import pandas as pd

from torch.utils.data import DataLoader
import torch.optim as optim
from sklearn.model_selection import train_test_split

dataframe = pd.read_csv('Data/heart_disease_health_indicators_BRFSS2015.csv')
indicators = dataframe.iloc[:, 1:].values
labels = dataframe.HeartDiseaseorAttack.values
X_train, X_test, Y_train, Y_test = train_test_split(indicators, labels, test_size=0.2)

In [176]:
class FFNN(nn.Module):
    
    def __init__(self, hyper_params):
        super(FFNN, self).__init__()
        self.hyper_params = hyper_params
        
        # model
        self.model = nn.Sequential(
            nn.Linear(hyper_params['input_dim'], hyper_params['hidden_dim']),
            nn.Sigmoid(),
            nn.Linear(hyper_params['hidden_dim'], hyper_params['output_dim']),
            nn.LogSoftmax(dim=0)
        )
        
    def forward(self, x):
        return self.model(x)
    
    def predict(self, x):
        return torch.argmax(self.model(x), dim=1)

## Training
This is the most basic and simple framework of building neural network using PyTorch. 
- First, you need to specify your training hyperparameters. 
- Then, initialize your model, optimizer and loss functions. 
- Last, follow the workflow we have learned above and train your model iteratively.

In [177]:
# This code will make our training on GPU if available
print(torch.device('cuda' if torch.cuda.is_available() else 'cpu'))

# Hyperparameters
hyper_params = {
    "batch_size": 1024,
    "input_dim" : len(X_train[0]),
    "hidden_dim": int(len(X_train[0])),
    "output_dim": 2,
    "learning_rate": 0.005,
    "num_epochs": 100,
}

# Batching data
train_data = list(zip(X_train, Y_train))
dataloader = DataLoader(train_data, batch_size=hyper_params['batch_size'], shuffle=True)

# Initializing model/optimizer/loss function
ffnn = FFNN(hyper_params)
optimizer = optim.Adam(ffnn.parameters(), lr=hyper_params['learning_rate'])
loss_function = nn.NLLLoss()

# Training
for epoch in range(hyper_params['num_epochs']):
    total_loss = 0
    for batch in dataloader:
        X_batch, Y_batch = batch
        # Zero the grads!
        optimizer.zero_grad()
        pred = ffnn(X_batch.float())
        loss = loss_function(pred, Y_batch)
        # Backpropagation
        loss.backward()
        optimizer.step()
        # Calculate total loss
        total_loss += loss.item()
    if epoch % 10 == 0:
        print(f'Epoch {epoch}/{hyper_params["num_epochs"]} - Loss = {total_loss}')


cuda
Epoch 0/100 - Loss = 1368.977780342102
Epoch 10/100 - Loss = 1366.5116505622864
Epoch 20/100 - Loss = 1366.4105682373047
Epoch 30/100 - Loss = 1366.409029006958
Epoch 40/100 - Loss = 1366.3991341590881
Epoch 50/100 - Loss = 1366.3535523414612
Epoch 60/100 - Loss = 1366.3289322853088
Epoch 70/100 - Loss = 1366.3732919692993
Epoch 80/100 - Loss = 1366.3712034225464
Epoch 90/100 - Loss = 1366.3232707977295


## Evaluation & confusion matrix
Now it is the time to test our test result. One way is to calculate [confusion matrix](https://en.wikipedia.org/wiki/Confusion_matrix) for our model. 

Credits: https://www.cs.utexas.edu/~gdurrett/courses/fa2019/cs388.shtml

In [178]:
def print_evaluation(golds, predictions):
    """
    Prints evaluation statistics comparing golds and predictions, each of which is a sequence of 0/1 labels.
    Prints accuracy as well as precision/recall/F1 of the positive class, which can sometimes be informative if either
    the golds or predictions are highly biased.

    :param golds: gold labels
    :param predictions: pred labels
    :return:
    """
    num_correct = 0
    num_pos_correct = 0
    num_pred = 0
    num_gold = 0
    num_total = 0
    if len(golds) != len(predictions):
        raise Exception("Mismatched gold/pred lengths: %i / %i" % (len(golds), len(predictions)))
    for idx in range(0, len(golds)):
        gold = golds[idx]
        prediction = predictions[idx]
        if prediction == gold:
            num_correct += 1
        if prediction == 1:
            num_pred += 1
        if gold == 1:
            num_gold += 1
        if prediction == 1 and gold == 1:
            num_pos_correct += 1
        num_total += 1
    acc = float(num_correct) / num_total
    output_str = "Accuracy: %i / %i = %f" % (num_correct, num_total, acc)
    prec = float(num_pos_correct) / num_pred if num_pred > 0 else 0.0
    rec = float(num_pos_correct) / num_gold if num_gold > 0 else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec > 0 and rec > 0 else 0.0
    output_str += ";\nPrecision (fraction of predicted positives that are correct): %i / %i = %f" % (num_pos_correct, num_pred, prec)
    output_str += ";\nRecall (fraction of true positives predicted correctly): %i / %i = %f" % (num_pos_correct, num_gold, rec)
    output_str += ";\nF1 (harmonic mean of precision and recall): %f;\n" % f1
    print(output_str)

pred_test = ffnn.predict(torch.Tensor(X_test))
print_evaluation(Y_test, pred_test.tolist())

Accuracy: 35479 / 50736 = 0.699287;
Precision (fraction of predicted positives that are correct): 3734 / 17911 = 0.208475;
Recall (fraction of true positives predicted correctly): 3734 / 4814 = 0.775654;
F1 (harmonic mean of precision and recall): 0.328625;



## What is wrong?
Though we achieve a good accuracy, but if you paid close attention to precision score, it is not ideal. Our model having a bad performance when predicting positives. With that being said, even though our model achieves relatively high accuracy, but it may tend to predict non-positives more. Why does this happen?

Let's check our training data first:

In [179]:
print(f'Percentage of positive label: {sum(Y_train)/len(Y_train) * 100}%')

Percentage of positive label: 9.401115578681804%


Wow! This dataset is super imbalanced! Our data only has 9% positive labels. Thus, our model is trained with a huge ammount of "0" labels. No wonder it tends to predict "0" more.

# What is next?
Here I demonstrate a simple toy task to show you how to build a neural network from scratch. As you can see, there is a lot more we can improve. Some points we can work on:
- Collect a better (balanced) dataset
- Preprocessing your dataset
- Try different combinations of layers (adding dropouts, initializing layers differently)
- Play with different combinations of optimizers(`SGD`) and loss functions(`BCELoss`)
- Explore different tasks (NLP: sentiment analysis, CV: multiclass classification)
- Try out different models (LSTM, RNN, transformers, CNN)
- Research on different evaluation method

Thanks for reading! Wish you all the best and have a wonderful life <3