# Introduction to Neural Networks

## In this tutorial, we will learn how to build Neural Networks using PyTorch.

### In the first part, we will build an MLP to recognize handwritten digits using PyTorch. There will be three coding questions in this part.

### In the second part, we will build an Autoencoder and apply it to the same dataset.

First, we need to enable the GPU in Colab: click ```Edit```, then ```Notebook settings```, and select the T4 GPU. After that, we can use the GPU in this tutorial.

Before the example, please review some concepts and the examples in the slides.

We need to be familiar with the following concepts:

1. ```Forward Process```

2. ```Activation Functions```

3. ```Backpropagation Algorithm```

4. ```Regularization```

5. ```Loss Function```

6. ```Encoder```

7. ```Decoder```
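
As a quick refresher, the sketch below walks through the forward process, an activation function, a loss function, and backpropagation on a tiny network. The layer sizes, the input values, and the names `layer1` and `layer2` are made up for illustration and are not part of the tutorial's model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A toy network: 3 input features -> 4 hidden neurons -> 2 classes (arbitrary sizes).
layer1 = nn.Linear(3, 4)
layer2 = nn.Linear(4, 2)
activation = nn.ReLU()

x = torch.tensor([[1.0, 2.0, 3.0]])  # one data point
y = torch.tensor([1])                # its class label

# Forward process: linear layer, activation function, linear layer.
logits = layer2(activation(layer1(x)))

# Loss function: compares the predicted logits with the true label.
loss = nn.CrossEntropyLoss()(logits, y)

# Backpropagation: computes the gradient of the loss for every parameter.
loss.backward()

print(logits.shape)              # torch.Size([1, 2])
print(layer1.weight.grad.shape)  # torch.Size([4, 3])
```

After `backward()`, every parameter carries a `.grad` tensor of the same shape as the parameter itself, which is what an optimizer later uses to update the weights.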

## Import necessary libraries

matplotlib, tqdm, and plotly for visualization.

numpy, pandas, keras, sklearn, and torch for modelling.



In [2]:
import os
import gc
import time
import numpy as np
import pandas as pd

import warnings
warnings.filterwarnings('ignore')
import random
from tqdm.notebook import tqdm
import matplotlib.pyplot as plt
from sklearn.utils import shuffle
from keras.utils import to_categorical

import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
from plotly.subplots import make_subplots

import torch
import torch.nn as nn
from torch.optim import Adam
from torch.utils.data import Dataset, DataLoader

In [3]:
# you can use "#" to write comments in a code cell.
# now we set the random seeds so that results are reproducible
def set_seed(seed):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
    # is_available() returns a bool, so no comparison is needed
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
set_seed(2023)

<torch._C.Generator at 0x7a5e10fc1b90>

## Load data

### We first need to load the data from Google Drive, then define a PyTorch dataset
### Raw csv file
Here is the link to the train.csv file: https://drive.google.com/file/d/16o7yH3keerXyYOONwqM8AU56zFOKt-sL/view?usp=sharing

Here is the link to the test.csv file: https://drive.google.com/file/d/1E6GvUUnlhJShV9v04hdhqiub7ebIRt8o/view?usp=sharing

You need to download both files and put them in your own Google Drive.

### PyTorch dataset
To feed data into our model, we need to build a class with three methods: ```__init__```, ```__len__```, and ```__getitem__```.

```__init__``` initializes all components of the dataset class.

```__len__``` returns the length of the dataset, i.e. the total number of data points.

```__getitem__``` returns the data point at a given index; it is used during training and testing.
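
To make the roles of the three methods concrete, here is a minimal sketch with made-up data. `ToyDataset` and its 10 points with 4 features each are hypothetical and only serve to show how a `DataLoader` relies on the length and the index access of the dataset.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """Hypothetical dataset: 10 data points with 4 features each."""

    def __init__(self):
        # Set up everything the dataset needs: here, features and targets.
        self.features = torch.arange(40, dtype=torch.float32).reshape(10, 4)
        self.targets = torch.arange(10, dtype=torch.float32).reshape(10, 1)

    def __len__(self):
        # Total number of data points.
        return len(self.targets)

    def __getitem__(self, idx):
        # One (features, target) pair for the given index.
        return self.features[idx], self.targets[idx]

toy_set = ToyDataset()
# The DataLoader groups single data points into batches of 4.
toy_loader = DataLoader(toy_set, batch_size=4, shuffle=False)

batch_shapes = [tuple(X.shape) for X, y in toy_loader]
print(len(toy_set))   # 10
print(batch_shapes)   # [(4, 4), (4, 4), (2, 4)]
```

With 10 points and a batch size of 4, the last batch holds only the 2 remaining points.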

In [4]:
from google.colab import drive
# this will direct you to a page where you log in to your Google Drive
drive.mount('/content/drive')

Mounted at /content/drive


In [5]:
# the specific path depends on where you put the csv file
# here we load the target file into DataFrame format.
train_df = pd.read_csv('drive/MyDrive/digit-recognizer/train.csv')
test_df = pd.read_csv('drive/MyDrive/digit-recognizer/test.csv')


In [6]:
# have a glance at our data
train_df.head()

Unnamed: 0,label,pixel0,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,...,pixel774,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783
0,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,4,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [7]:
def to_tensor(data):
  # this function is used to change values to tensor format
  return [torch.FloatTensor(point) for point in data]

class MNISTDataset(Dataset):
  def __init__(self, df, X_col, y_col):
    # the features are the input of our model
    # the target is the output of our model
    # pixel values are divided by 255 to scale them to [0, 1]
    self.features = df[X_col].values/255
    self.targets = df[y_col].values.reshape((-1,1))

  def __len__(self):
    return len(self.targets)

  def __getitem__(self, idx):
    return to_tensor([self.features[idx], self.targets[idx]])

y_col = "label"

split = int(0.8*len(train_df))
# split train set into train and valid set
valid_df = train_df[split:].reset_index(drop=True)
train_df = train_df[:split].reset_index(drop=True)

In [8]:
X_col = list(train_df.columns[1:])
train_set = MNISTDataset(train_df, X_col, y_col)
valid_set = MNISTDataset(valid_df, X_col, y_col)

# then we use the datasets we built to initialize two DataLoaders
# For a description of the PyTorch DataLoader, please refer to the official documentation
# Here we only need to use them as a black box
train_loader = DataLoader(train_set, batch_size=1024, shuffle=True)
valid_loader = DataLoader(valid_set, batch_size=1024, shuffle=True)

In [9]:
####################assignment session I.1########################
# Assignment I.1 Requirements:
# You only need to choose one from I.1 and I.2.
# If you prefer, you could do both.
# Please try other batch_size values in the DataLoader
# See what happens in the experiments
# you could write the new DataLoader in this code block, and report
# your results as comments here
# your answer:
#
#
#
#
#






###################assignment session############################

## Build NN model

In the next section, we define an MLP model like the one we learned about in the example.

The NN in the example is a toy network. Here we build an MLP with two hidden layers of 20 and 15 neurons, followed by an output layer that produces 10 values, one per digit class.

We use ReLU as the activation function after each hidden layer.
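
As a sanity check on the layer sizes described above, the following sketch builds the same 784-20-15-10 structure with `nn.Sequential` and counts its trainable parameters. The names `sketch` and `dummy_batch` are placeholders chosen for this illustration, and the batch of zeros stands in for real images.

```python
import torch
import torch.nn as nn

# Same layer sizes as described above, written as a Sequential container.
sketch = nn.Sequential(
    nn.Linear(784, 20), nn.ReLU(),
    nn.Linear(20, 15), nn.ReLU(),
    nn.Linear(15, 10),
)

# A dummy batch of 32 flattened 28x28 images.
dummy_batch = torch.zeros(32, 784)
logits = sketch(dummy_batch)

# Each Linear layer has in*out weights plus out biases:
# (784*20 + 20) + (20*15 + 15) + (15*10 + 10) = 16175
n_params = sum(p.numel() for p in sketch.parameters())

print(logits.shape)  # torch.Size([32, 10])
print(n_params)      # 16175
```

Almost all of the parameters sit in the first layer, because it connects every one of the 784 input pixels to the 20 hidden neurons.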






In [10]:
class MLP(nn.Module):
  def __init__(self):
    super(MLP, self).__init__()
    self.relu_layer = nn.ReLU()
    self.nn_1 = nn.Linear(784,20)
    self.nn_2 = nn.Linear(20,15)
    self.output = nn.Linear(15,10)

  def forward(self, x):
    x = self.relu_layer(self.nn_1(x))
    x = self.relu_layer(self.nn_2(x))
    logits = self.output(x)
    return logits


In [11]:
####################assignment session II###########################
# Assignment II Requirements:
# Please try other activation functions from our slides in the MLP
# And use the new network in the following experiments
# See what happens
# you could write the new network in this code block, and report
# your results as comments here
# your answer:
#
#
#
#
#






###################assignment session############################

In [12]:
# define the device, network (MLP), and optimizer
# here we use the Adam optimizer, an advanced optimizer that is suitable for most ML tasks
# fall back to the CPU if no GPU is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
NN_network = MLP().to(device)
optimizer = Adam(params=NN_network.parameters(), lr=0.005)

# then we print the network to check its structure
print(NN_network)

MLP(
  (relu_layer): ReLU()
  (nn_1): Linear(in_features=784, out_features=20, bias=True)
  (nn_2): Linear(in_features=20, out_features=15, bias=True)
  (output): Linear(in_features=15, out_features=10, bias=True)
)


In [13]:
####################assignment session I.2########################
# Assignment I.2 Requirements:
# You only need to choose one from I.1 and I.2.
# If you prefer, you could do both.
# Please try other learning rates in the optimizer
# See what happens in the experiments
# you could write the new optimizer in this code block, and report
# your result as comments here
# your answer:
#
#
#
#
#






###################assignment session############################

In [14]:
# define the (multi-class) cross entropy loss and the accuracy
# our optimization goal is to reduce the loss function

def cross_entropy(y_true, y_pred):
  # targets have shape (batch, 1); CrossEntropyLoss expects integer labels of shape (batch,)
  y_true = y_true.long().squeeze()
  return nn.CrossEntropyLoss()(y_pred, y_true)

def acc(y_true, y_pred):
  y_true = y_true.long().squeeze()
  # the predicted class is the index of the largest logit
  y_pred = torch.argmax(y_pred, dim=1)
  return (y_true == y_pred).float().sum()/len(y_true)
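
To see what these two functions compute, the sketch below compares `nn.CrossEntropyLoss` with a manual calculation on a tiny hand-made batch. The logits and labels are invented for illustration, and the manual formula (mean negative log-softmax of the true class) is what the built-in loss computes with its default settings.

```python
import torch
import torch.nn as nn

# Three data points, three classes (made-up numbers).
logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 0.3, 3.0],
                       [1.5, 1.0, 0.0]])
labels = torch.tensor([0, 2, 1])

# Built-in loss: takes raw logits and integer class labels.
builtin_loss = nn.CrossEntropyLoss()(logits, labels)

# Same quantity by hand: negative log-probability of the true class, averaged.
log_probs = torch.log_softmax(logits, dim=1)
manual_loss = -log_probs[torch.arange(len(labels)), labels].mean()

# Accuracy: fraction of rows whose largest logit sits at the true label.
accuracy = (torch.argmax(logits, dim=1) == labels).float().mean()

print(torch.isclose(builtin_loss, manual_loss))  # tensor(True)
print(accuracy)  # two of the three rows are predicted correctly
```

Note that the network outputs raw logits: the softmax is applied inside the loss, so the model itself does not need a softmax layer.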

In [15]:
# now let's start training the model on the GPU
start = time.time()
print('starting traing...')

# these lists store the loss/acc of each epoch for visualization
train_losses, valid_losses = [],[]
train_accuracies, valid_accuracies = [],[]

for epoch in range(10):
  NN_network.train()
  print('epoch:{}'.format(epoch+1))
  batch_train_losses, batch_train_accuracies = [], []
  batch = 0
  for train_batch in train_loader:
    # here we clear the gradients in the optimizer to start a new batch of training
    # every time we update the network by BP, we need to clear the gradients first
    optimizer.zero_grad()

    train_X, train_y = train_batch
    # since we use the GPU, we need to move the variables to our device
    train_X = train_X.to(device)
    train_y = train_y.to(device)
    # calling the network runs the forward process
    train_preds = NN_network(train_X)
    # here we compute the loss function and the acc
    train_loss = cross_entropy(train_y, train_preds)
    train_acc = acc(train_y, train_preds)

    # now we compute the gradients of the network with backward()
    train_loss.backward()
    # here we let our optimizer update the parameters, using the learning rate we defined
    optimizer.step()
    end = time.time()
    batch = batch+1

    time_delta = np.round(end-start,3)
    # here we get the item() from tensor, and store it as the batch loss/acc
    batch_train_losses.append(np.round(train_loss.item(),3))
    batch_train_accuracies.append(np.round(train_acc.item(),3))
    # visualize the processing by printing
    if batch%20 == 0:
      print("Batch:{}, Train Loss:{}, Train Acc:{}, Time:{}".format(batch, train_loss, train_acc, time_delta))

  # compute the average loss/acc of those batches
  train_losses.append(np.mean(batch_train_losses))
  train_accuracies.append(np.mean(batch_train_accuracies))

  total_valid_loss = 0
  total_valid_points = 0
  total_valid_accuracy = 0

  # now we evaluate our model on the valid set; since we are not training the
  # model here, we wrap the loop in no_grad() so that no gradients are tracked.
  # you can observe the difference between training and evaluation
  NN_network.eval()
  with torch.no_grad():
    for valid_batch in valid_loader:
      valid_X, valid_y = valid_batch
      valid_X = valid_X.to(device)
      valid_y = valid_y.to(device)
      valid_preds = NN_network(valid_X)
      valid_loss = cross_entropy(valid_y, valid_preds)
      valid_acc = acc(valid_y, valid_preds)

      # here we count the batches, for computing the average loss/acc later
      total_valid_points += 1
      total_valid_loss += valid_loss.item()
      total_valid_accuracy += valid_acc.item()

  valid_loss = np.round(total_valid_loss/total_valid_points, 3)
  valid_acc = np.round(total_valid_accuracy/total_valid_points, 3)

  # we record those two into lists for visualization
  valid_losses.append(valid_loss)
  valid_accuracies.append(valid_acc)

  end = time.time()
  time_delta = np.round(end-start, 3)
  print('Epoch:{}, Valid Loss:{}, Valid Acc:{}, Time:{}'.format(epoch+1, valid_loss, valid_acc, time_delta))

print("finish training.")



starting traing...
epoch:1
Batch:20, Train Loss:1.2394614219665527, Train Acc:0.6953125, Time:1.495
Epoch:1, Valid Loss:0.61, Valid Acc:0.829, Time:2.072
epoch:2
Batch:20, Train Loss:0.45334598422050476, Train Acc:0.873046875, Time:2.666
Epoch:2, Valid Loss:0.39, Valid Acc:0.881, Time:3.257
epoch:3
Batch:20, Train Loss:0.30489853024482727, Train Acc:0.9052734375, Time:3.841
Epoch:3, Valid Loss:0.324, Valid Acc:0.907, Time:4.434
epoch:4
Batch:20, Train Loss:0.26262131333351135, Train Acc:0.927734375, Time:5.006
Epoch:4, Valid Loss:0.297, Valid Acc:0.914, Time:5.601
epoch:5
Batch:20, Train Loss:0.2510099411010742, Train Acc:0.9208984375, Time:6.404
Epoch:5, Valid Loss:0.272, Valid Acc:0.924, Time:7.007
epoch:6
Batch:20, Train Loss:0.2573152780532837, Train Acc:0.9287109375, Time:7.604
Epoch:6, Valid Loss:0.25, Valid Acc:0.925, Time:8.207
epoch:7
Batch:20, Train Loss:0.2785540819168091, Train Acc:0.9267578125, Time:8.873
Epoch:7, Valid Loss:0.254, Valid Acc:0.925, Time:9.809
epoch:8
Batch

In [16]:
# now we visualize the training process and observe how the loss changes over epochs
fig = go.Figure()

# epochs are numbered 1..N, so the x-axis needs N values to match the N recorded losses
fig.add_trace(go.Scatter(x=np.arange(1, len(valid_losses) + 1),
                         y=valid_losses, mode="lines+markers", name="valid",
                         marker=dict(color="indianred", line=dict(width=.5,
                                                                  color='rgb(0, 0, 0)'))))

fig.add_trace(go.Scatter(x=np.arange(1, len(train_losses) + 1),
                         y=train_losses, mode="lines+markers", name="train",
                         marker=dict(color="darkorange", line=dict(width=.5,
                                                                   color='rgb(0, 0, 0)'))))

fig.update_layout(xaxis_title="Epochs", yaxis_title="Cross Entropy",
                  title_text="Cross Entropy Change", template="plotly_white", paper_bgcolor="#f0f0f0")

fig.show()

In [17]:
####################assignment session III###########################
# Assignment III Requirements:
# we loaded both the train and test data in the previous code
# in this assignment you need to build a test DataLoader, and
# evaluate your trained model on the test dataset
# then report the loss and acc
# you could write new code in the following code blocks, and report
# your results as comments here
# your answer:
#
#
#
#
#






###################assignment session################################

# Autoencoder

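## In this part, we build an Autoencoder to replace the MLP in the MNIST experiments

The structure of an Autoencoder requires the same number of input neurons and output neurons. In our experiments, let's build an autoencoder-style network whose hidden layers have 20, 10, and 20 neurons. Note that the network defined in the next cell keeps a 10-value output layer, so it can be trained with the same cross-entropy loss and accuracy as the MLP.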
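
For comparison, a reconstruction-style autoencoder whose output has the same size as its input could be sketched as follows. This is an illustrative variant and not the network trained in this notebook: the class name `ReconstructionAutoencoder`, the final `Sigmoid`, the MSE loss, and the random placeholder images are all assumptions made for this sketch.

```python
import torch
import torch.nn as nn

class ReconstructionAutoencoder(nn.Module):
    """Hypothetical 784-20-10-20-784 autoencoder that rebuilds its input."""

    def __init__(self):
        super().__init__()
        # Encoder: compresses 784 pixels into a 10-value code.
        self.encoder = nn.Sequential(
            nn.Linear(784, 20), nn.ReLU(),
            nn.Linear(20, 10), nn.ReLU(),
        )
        # Decoder: expands the code back to 784 pixels in [0, 1].
        self.decoder = nn.Sequential(
            nn.Linear(10, 20), nn.ReLU(),
            nn.Linear(20, 784), nn.Sigmoid(),
        )

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code)

model = ReconstructionAutoencoder()
images = torch.rand(8, 784)  # placeholder for pixel values scaled to [0, 1]

reconstruction = model(images)
code = model.encoder(images)

# The target of the loss is the input itself, so no labels are needed.
loss = nn.MSELoss()(reconstruction, images)

print(code.shape)            # torch.Size([8, 10])
print(reconstruction.shape)  # torch.Size([8, 784])
```

In this variant the loss compares the output with the input image, which is why the input and output sizes have to match.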

In [23]:
class Autoencoder(nn.Module):
  def __init__(self):
    super(Autoencoder, self).__init__()
    self.relu_layer = nn.ReLU()
    self.nn_1 = nn.Linear(784,20)
    self.nn_2 = nn.Linear(20,10)
    self.nn_3 = nn.Linear(10,20)
    self.output = nn.Linear(20,10)

  def forward(self, x):
    x = self.relu_layer(self.nn_1(x))
    x = self.relu_layer(self.nn_2(x))
    x = self.relu_layer(self.nn_3(x))
    logits = self.output(x)
    return logits

In [24]:
# fall back to the CPU if no GPU is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
NN_network = Autoencoder().to(device)
optimizer = Adam(params=NN_network.parameters(), lr=0.005)

# then we print the network to check its structure
print(NN_network)

Autoencoder(
  (relu_layer): ReLU()
  (nn_1): Linear(in_features=784, out_features=20, bias=True)
  (nn_2): Linear(in_features=20, out_features=10, bias=True)
  (nn_3): Linear(in_features=10, out_features=20, bias=True)
  (output): Linear(in_features=20, out_features=10, bias=True)
)


In [25]:
# now let's start training the model on the GPU
start = time.time()
print('starting traing...')

# these lists store the loss/acc of each epoch for visualization
train_losses, valid_losses = [],[]
train_accuracies, valid_accuracies = [],[]

for epoch in range(10):
  NN_network.train()
  print('epoch:{}'.format(epoch+1))
  batch_train_losses, batch_train_accuracies = [], []
  batch = 0
  for train_batch in train_loader:
    # here we clear the gradients in the optimizer to start a new batch of training
    # every time we update the network by BP, we need to clear the gradients first
    optimizer.zero_grad()

    train_X, train_y = train_batch
    # since we use the GPU, we need to move the variables to our device
    train_X = train_X.to(device)
    train_y = train_y.to(device)
    # calling the network runs the forward process
    train_preds = NN_network(train_X)
    # here we compute the loss function and the acc
    train_loss = cross_entropy(train_y, train_preds)
    train_acc = acc(train_y, train_preds)

    # now we compute the gradients of the network with backward()
    train_loss.backward()
    # here we let our optimizer update the parameters, using the learning rate we defined
    optimizer.step()
    end = time.time()
    batch = batch+1

    time_delta = np.round(end-start,3)
    # here we get the item() from tensor, and store it as the batch loss/acc
    batch_train_losses.append(np.round(train_loss.item(),3))
    batch_train_accuracies.append(np.round(train_acc.item(),3))
    # visualize the processing by printing
    if batch%20 == 0:
      print("Batch:{}, Train Loss:{}, Train Acc:{}, Time:{}".format(batch, train_loss, train_acc, time_delta))

  # compute the average loss/acc of those batches
  train_losses.append(np.mean(batch_train_losses))
  train_accuracies.append(np.mean(batch_train_accuracies))

  total_valid_loss = 0
  total_valid_points = 0
  total_valid_accuracy = 0

  # now we evaluate our model on the valid set; since we are not training the
  # model here, we wrap the loop in no_grad() so that no gradients are tracked.
  # you can observe the difference between training and evaluation
  NN_network.eval()
  with torch.no_grad():
    for valid_batch in valid_loader:
      valid_X, valid_y = valid_batch
      valid_X = valid_X.to(device)
      valid_y = valid_y.to(device)
      valid_preds = NN_network(valid_X)
      valid_loss = cross_entropy(valid_y, valid_preds)
      valid_acc = acc(valid_y, valid_preds)

      # here we count the batches, for computing the average loss/acc later
      total_valid_points += 1
      total_valid_loss += valid_loss.item()
      total_valid_accuracy += valid_acc.item()

  valid_loss = np.round(total_valid_loss/total_valid_points, 3)
  valid_acc = np.round(total_valid_accuracy/total_valid_points, 3)

  # we record those two into lists for visualization
  valid_losses.append(valid_loss)
  valid_accuracies.append(valid_acc)

  end = time.time()
  time_delta = np.round(end-start, 3)
  print('Epoch:{}, Valid Loss:{}, Valid Acc:{}, Time:{}'.format(epoch+1, valid_loss, valid_acc, time_delta))

print("finish training.")

starting traing...
epoch:1
Batch:20, Train Loss:1.321092128753662, Train Acc:0.5634765625, Time:0.616
Epoch:1, Valid Loss:0.7, Valid Acc:0.782, Time:1.228
epoch:2
Batch:20, Train Loss:0.5376166701316833, Train Acc:0.8369140625, Time:2.062
Epoch:2, Valid Loss:0.414, Valid Acc:0.876, Time:2.677
epoch:3
Batch:20, Train Loss:0.45584404468536377, Train Acc:0.8623046875, Time:3.274
Epoch:3, Valid Loss:0.338, Valid Acc:0.905, Time:3.869
epoch:4
Batch:20, Train Loss:0.31345847249031067, Train Acc:0.8935546875, Time:4.469
Epoch:4, Valid Loss:0.311, Valid Acc:0.908, Time:5.062
epoch:5
Batch:20, Train Loss:0.2526029646396637, Train Acc:0.9228515625, Time:5.667
Epoch:5, Valid Loss:0.281, Valid Acc:0.916, Time:6.276
epoch:6
Batch:20, Train Loss:0.26279640197753906, Train Acc:0.92578125, Time:6.883
Epoch:6, Valid Loss:0.251, Valid Acc:0.925, Time:7.74
epoch:7
Batch:20, Train Loss:0.20235870778560638, Train Acc:0.9384765625, Time:8.478
Epoch:7, Valid Loss:0.234, Valid Acc:0.933, Time:9.403
epoch:8
Ba

In [None]:
#################### additional questions #########################
# you don't need to write down any answers for credit
# these questions are for critical thinking
# 1. Which one performs better on this data, the MLP or the Autoencoder?
# 2. When the network size increases, is it necessarily better than a smaller one?
# 3. Why do we need regularization? How will it influence large and small NNs?
# 4. Why do we want the output of an Autoencoder to have the same size as its input? Can we use other sizes instead?