# Neural Network

## 3.1 Dataloader
### What is Dataloader
`DataLoader` is a class that shuffles a dataset and organizes it into minibatches. We can import this class from `torch.utils.data`.

The job of a data loader is to sample minibatches from a dataset, giving us the flexibility to choose the size of the minibatch used for training in each iteration. The constructor takes a `Dataset` object as input, along with `batch_size` and a boolean `shuffle` argument that indicates whether the data should be reshuffled at the beginning of each epoch.
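As a minimal sketch of these constructor arguments, the example below wraps a small synthetic dataset in a `DataLoader`. The toy tensors and the names `toy_dataset` and `toy_loader` are made up for illustration and are not part of this chapter's data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data: 10 samples with 4 features each, plus integer labels
features = torch.arange(40, dtype=torch.float32).reshape(10, 4)
labels = torch.arange(10)
toy_dataset = TensorDataset(features, labels)

# batch_size sets how many samples each iteration yields;
# shuffle=True draws a new random order at the start of every epoch
toy_loader = DataLoader(toy_dataset, batch_size=4, shuffle=True)

# Collect the shape of the feature tensor in every minibatch of one epoch
batch_shapes = [tuple(x.shape) for x, _ in toy_loader]
print(batch_shapes)
```

Because 10 is not a multiple of 4, the last minibatch holds only the 2 remaining samples.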

In this chapter, we will perform a classification task on the Fashion MNIST dataset, which can be imported and downloaded directly through `torchvision.datasets.FashionMNIST`. PyTorch provides several datasets (CIFAR, COCO, Cityscapes, etc.) in the `torchvision` library; the full list of datasets is available [here](https://pytorch.org/docs/stable/torchvision/datasets.html).

In [None]:
# importing the required library
import torch
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import Dataset, DataLoader
import matplotlib.pyplot as plt
import numpy as np

In [None]:
# Loading/downloading the FashionMNIST dataset; the download might take some time
train_set = torchvision.datasets.FashionMNIST(
    root = '../data',
    train = True,
    download = True,
    transform = transforms.ToTensor()
    )
test_set = torchvision.datasets.FashionMNIST(
    root = '../data',
    train = False,
    download = True,
    transform = transforms.ToTensor()
    )

Load the datasets into `DataLoader`s and set the desired batch size for training.

In [None]:
train_loader = DataLoader(train_set, batch_size = 32, shuffle = True)
test_loader = DataLoader(test_set, batch_size = 32, shuffle = False)

In [None]:
# A view of the DataLoader

batch = next(iter(train_loader))
images, labels = batch

# Output the size of each batch
print(images.shape, labels.shape)

Each image is assigned one of the following labels:

- 0 T-shirt/top
- 1 Trouser
- 2 Pullover
- 3 Dress
- 4 Coat
- 5 Sandal
- 6 Shirt
- 7 Sneaker
- 8 Bag
- 9 Ankle boot

Let us plot some images to see what the dataset looks like.

In [None]:
# Converting numeric labels to text label

def labelsText(labels):
    labelDict = {
                 0: "T-shirt/Top",
                 1: "Trouser",
                 2: "Pullover",
                 3: "Dress",
                 4: "Coat", 
                 5: "Sandal", 
                 6: "Shirt",
                 7: "Sneaker",
                 8: "Bag",
                 9: "Ankle Boot"
                 }
    label = labels.item() if isinstance(labels, torch.Tensor) else labels
    return labelDict[label]

In [None]:
# Plotting out the images in the dataset

grid = torchvision.utils.make_grid(images[0:10], nrow = 10)

plt.figure(figsize = (15, 15))
plt.imshow(np.transpose(grid, (1, 2, 0)))

print("Labels: ")
for i in labels[0:10]:
    print(labelsText(i) + ", ", end = "")

## 3.2 Build your first Neural Network

### 3.2.1 Model Training
We have loaded our dataset into training and test sets. Now let us build a simple feedforward neural network to perform classification on this dataset.

PyTorch has a whole submodule dedicated to neural networks, called `torch.nn`. It contains the building blocks needed to create all sorts of neural network architectures.

A neural network can be built in two ways:
- Calling `nn.Sequential()` for a fast implementation of the network
- Subclassing `nn.Module` for more flexibility in designing the network, e.g. writing your own `forward()` method
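The two approaches can be sketched side by side on a deliberately tiny network. The layer sizes and the names `tiny_sequential` and `TinyNet` are illustrative assumptions, not the model used later in this chapter.

```python
import torch
import torch.nn as nn

# Option 1: nn.Sequential chains the layers in the order they are given
tiny_sequential = nn.Sequential(
    nn.Linear(8, 4),
    nn.ReLU(),
    nn.Linear(4, 2),
)

# Option 2: subclass nn.Module and write forward() explicitly
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc_1 = nn.Linear(8, 4)
        self.fc_2 = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc_2(torch.relu(self.fc_1(x)))

tiny_module = TinyNet()
x = torch.randn(5, 8)  # a batch of 5 samples with 8 features each
out_sequential = tiny_sequential(x)
out_module = tiny_module(x)
print(out_sequential.shape, out_module.shape)
```

Both models map a batch of shape `(5, 8)` to an output of shape `(5, 2)`; the subclassed version simply makes the data flow explicit, which helps once the forward pass is no longer a plain chain of layers.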


Now let us start building the Neural Network

In [None]:
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

We would like to build a 4-layer neural network with the ReLU activation function, applying dropout with 20% probability to reduce overfitting. Let us build our model using `nn.Sequential`.

In [None]:
# nn.Sequential()
torch.manual_seed(0)
model_sequential = nn.Sequential(nn.Linear(784,256),
                                 nn.Dropout(0.2),
                                 nn.ReLU(),
                                 nn.Linear(256,128),
                                 nn.Dropout(0.2),
                                 nn.ReLU(),
                                 nn.Linear(128,64),
                                 nn.Dropout(0.2),
                                 nn.ReLU(),
                                 nn.Linear(64,10),
                                )

We will build a wrapper function for our training called `training`. This wrapper function takes the following parameters:
- n_epochs
- optimizer
- model
- loss_fn
- train_loader
- writer (an instance of `SummaryWriter`, used to log to TensorBoard for visualization)

PyTorch supports TensorBoard, which provides the visualization and tooling needed for machine learning experimentation. It is a useful tool during training. Now let's define our training loop and log some metrics with TensorBoard.

To learn more about TensorBoard support in PyTorch, see the documentation [here](https://pytorch.org/docs/stable/tensorboard.html).

In [None]:
from torch.utils.tensorboard import SummaryWriter

def training(n_epochs, optimizer, model, loss_fn, train_loader, writer):
    for epoch in range(1, n_epochs + 1):
        loss_train = 0.0
        total = 0
        correct = 0
        for imgs, labels in train_loader:
            # Clearing gradient from previous mini-batch gradient computation  
            optimizer.zero_grad()
            
            # Reshape the tensor so that it fits the dimension of our input layer
            # Get predictions output from the model
            outputs = model(imgs.view(-1, 784))
            
            # Calculate the loss for the current batch
            loss = loss_fn(outputs, labels)
            
            # Calculating the gradient
            loss.backward()
            
            # Updating the weights and biases using optimizer.step
            optimizer.step()
            
            # Summing up the loss over each epoch
            loss_train += loss.item()
            
            # Calculating the accuracy
            predictions = torch.max(outputs, 1)[1]
            correct += (predictions == labels).sum().item()
            total += len(labels)

        accuracy = correct * 100 / total
        writer.add_scalar('Loss ', loss_train / len(train_loader), epoch)
        writer.add_scalar('Accuracy ', accuracy, epoch)
        print('Epoch {}, Training loss {} , Accuracy {:.2f} %'.format(epoch, loss_train / len(train_loader), accuracy))
    writer.close()

We can open TensorBoard from the terminal with the command `tensorboard --logdir=runs`. Remember to first change to the directory that contains this notebook.

Now we are ready for training. Let's use `SGD` as our optimizer and cross-entropy (`nn.CrossEntropyLoss`) as our loss function.

In [None]:
torch.manual_seed(0)
model_SGD = model_sequential 
optimizer = optim.SGD(model_SGD.parameters(), lr = 1e-3) 
loss_fn = nn.CrossEntropyLoss()
writer = SummaryWriter(comment = 'SGD')
training(
    n_epochs = 10,
    optimizer = optimizer,
    model = model_SGD,
    loss_fn = loss_fn,
    train_loader = train_loader,
    writer = writer
)

Let us build another model that applies log softmax as the activation function at the output layer and uses the negative log-likelihood loss function (`nn.NLLLoss`). Compare the results of the two settings. This time we build the model by subclassing `nn.Module`.

In [None]:
# Subclassing nn.Module
class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc_1 = nn.Linear(784, 256)
        self.act_1 = nn.ReLU()
        self.fc_2 = nn.Linear(256, 128)
        self.act_2 = nn.ReLU()
        self.fc_3 = nn.Linear(128, 64)
        self.act_3 = nn.ReLU()
        self.fc_4 = nn.Linear(64, 10)
        self.dropout = nn.Dropout(0.2)
        
    def forward(self, x):
        out = self.dropout(self.act_1(self.fc_1(x)))
        out = self.dropout(self.act_2(self.fc_2(out)))
        out = self.dropout(self.act_3(self.fc_3(out)))
        # apply log softmax at the output layer
        out = F.log_softmax(self.fc_4(out), dim = 1)
        return out
    
# Alternatively, you can use the functional API provided by PyTorch when defining the forward method. The architecture is the same.

class Classifier_F(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc_1 = nn.Linear(784, 256)
        self.fc_2 = nn.Linear(256, 128)
        self.fc_3 = nn.Linear(128, 64)
        self.fc_4 = nn.Linear(64, 10)
        
    def forward(self, x):
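        # F.dropout defaults to training=True, so pass self.training to switch it off in eval mode
        out = F.dropout(F.relu(self.fc_1(x)), p = 0.2, training = self.training)
        out = F.dropout(F.relu(self.fc_2(out)), p = 0.2, training = self.training)
        out = F.dropout(F.relu(self.fc_3(out)), p = 0.2, training = self.training)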
        out = F.log_softmax(self.fc_4(out), dim = 1)
        return out

In [None]:
torch.manual_seed(0)
model_SGD = Classifier() 
optimizer = optim.SGD(model_SGD.parameters(), lr = 1e-3) 
loss_fn = nn.NLLLoss()
writer = SummaryWriter(comment = 'SGD')
training(
    n_epochs = 10,
    optimizer = optimizer,
    model = model_SGD,
    loss_fn = loss_fn,
    train_loader = train_loader,
    writer = writer
)

`nn.CrossEntropyLoss` combines log softmax and negative log-likelihood loss in a single step. Therefore, when we use it, the model should output raw scores (logits): we can omit the activation function at the output layer, which also saves some memory during backpropagation.
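This equivalence can be checked numerically. The snippet below is a small sketch on random logits and made-up targets (not the Fashion MNIST data), comparing `nn.CrossEntropyLoss` on raw scores with `nn.NLLLoss` on log-softmax outputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(6, 10)           # raw scores for 6 samples and 10 classes
targets = torch.randint(0, 10, (6,))  # made-up class indices

# Cross-entropy applied directly to the raw scores
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# Log softmax followed by negative log-likelihood
loss_nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(loss_ce, loss_nll))
```

The two losses agree up to floating-point error, which is why a model trained with `nn.CrossEntropyLoss` does not need a log softmax layer of its own.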

Let us try another optimizer, `Adam`, for our training. The choice of optimizer is one of the hyperparameters that we can tune.

In [None]:
model_Adam = Classifier() 
optimizer = optim.Adam(model_Adam.parameters(), lr = 1e-3) 
# Classifier already applies log softmax at its output, so we pair it with NLLLoss
loss_fn = nn.NLLLoss()
writer = SummaryWriter(comment = 'Adam')
training(
    n_epochs = 10,
    optimizer = optimizer,
    model = model_Adam,
    loss_fn = loss_fn,
    train_loader = train_loader,
    writer = writer
)

In this case, we can see that `Adam` performs better than `SGD` with the same settings. Hyperparameter tuning is very important for obtaining the desired result.

### 3.2.2 Model Saving
After training the model, we would like to save it for future use. There are some useful functions you should be familiar with:

- `torch.save`: Serializes an object and saves it to disk. Models, tensors, and dictionaries of all kinds of objects can be saved using this function.
- `torch.load`: Uses pickle’s unpickling facilities to deserialize pickled object files into memory.
- `torch.nn.Module.load_state_dict`: Loads a model’s parameter dictionary from a deserialized state_dict.

To learn more about saving and loading models, see the tutorial [here](https://pytorch.org/tutorials/beginner/saving_loading_models.html).
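The following sketch shows how the three functions fit together. To avoid touching the file system it saves into an in-memory buffer, and it uses a single `nn.Linear` layer as a stand-in model; both choices are assumptions made for illustration.

```python
import io
import torch
import torch.nn as nn

source_model = nn.Linear(4, 2)

# Serialize only the parameters (the state_dict) into an in-memory buffer
buffer = io.BytesIO()
torch.save(source_model.state_dict(), buffer)
buffer.seek(0)

# A new instance with the same architecture is needed to receive the parameters
restored_model = nn.Linear(4, 2)
restored_model.load_state_dict(torch.load(buffer))

# Both models now produce the same output for the same input
x = torch.randn(3, 4)
same_output = torch.allclose(source_model(x), restored_model(x))
print(same_output)
```

In practice the buffer would be replaced by a file path, as in the cells that follow.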

#### Saving only the weights

In [None]:
import os
if not os.path.exists('../generated_model'):
    os.mkdir('../generated_model')

In [None]:
# Saving the weights only of the model
torch.save(model_Adam.state_dict(),  '../generated_model/mnist_state_dict.pt')

In [None]:
# To load the state_dict, you must have an instance of the model
modelLoad = Classifier()
modelLoad.load_state_dict(torch.load('../generated_model/mnist_state_dict.pt'))

#### Saving the entire model

In [None]:
# Saving the entire model
torch.save(model_Adam, '../generated_model/mnist_model.pt')

In [None]:
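# Loading model
# Note: newer PyTorch releases default to weights_only=True, which refuses to unpickle a full model;
# weights_only=False restores the old behavior and should only be used with files you trust
modelLoad = torch.load('../generated_model/mnist_model.pt', weights_only = False)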

### Add-ons: Saving Model in ONNX format
PyTorch also supports saving a model in the ONNX (Open Neural Network Exchange) file format, an open format built to represent machine learning models. Let's see how to do it.

In [None]:
import torch.onnx 
dummy_input = torch.randn(32, 784, requires_grad = True)
torch.onnx.export(model_Adam, dummy_input, '../generated_model/model.onnx', verbose = True, input_names = ['input'], output_names = ['output'])

In [None]:
import onnx
#loading the onnx format model
model = onnx.load('../generated_model/model.onnx')

### 3.2.3 Inference
After training, we usually run inference with the trained model to evaluate its performance. `model.eval()` sets the model to evaluation (inference) mode, which changes the behavior of layers such as dropout and batch normalization during the `forward` pass: dropout is disabled, and batch normalization uses its running statistics instead of the statistics of the current batch.
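The effect of the two modes on dropout can be seen in isolation. This is a small sketch on a standalone `nn.Dropout` layer with a made-up input of ones, rather than on the classifier itself.

```python
import torch
import torch.nn as nn

dropout = nn.Dropout(p=0.2)
x = torch.ones(1000)

dropout.train()         # training mode: elements are zeroed at random
out_train = dropout(x)  # surviving elements are scaled by 1 / (1 - p)

dropout.eval()          # evaluation mode: dropout acts as the identity
out_eval = dropout(x)

print((out_train == 0).any().item(), torch.equal(out_eval, x))
```

In training mode some entries are zeroed and the rest are scaled up, while in evaluation mode the input passes through unchanged.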

In [None]:
# Using previous loaded model
modelLoad.eval()           

After setting the model to inference mode, we can pass in the test data inside a
```python
with torch.no_grad():
```
block. Since gradients are not needed during inference, disabling gradient tracking saves some memory.
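As a brief sketch of what `torch.no_grad()` changes, the example below runs the same hypothetical linear layer with and without gradient tracking.

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 1)
x = torch.randn(2, 3)

out_tracked = layer(x)        # recorded in the autograd graph
with torch.no_grad():
    out_untracked = layer(x)  # same values, but no graph is recorded

print(out_tracked.requires_grad, out_untracked.requires_grad)
```

The values are identical in both cases; only the bookkeeping needed for backpropagation is skipped.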

In [None]:
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        outputs = modelLoad(images.view(-1, 784))
        predictions = torch.max(outputs, 1)[1]
        correct += (predictions == labels).sum()
        total += len(labels)
    accuracy_test = correct.item() * 100 / total
print("Test Accuracy : {:.2f} %".format(accuracy_test))

## 3.3 Build your second Neural Network
### 3.3.1 Model Training

Although there are many other machine learning techniques for multivariate linear regression, it is instructive to tackle it with deep learning.
<br>In this sub-section, we will perform this regression using PyTorch's `nn.Sequential`.

We will use the Real Estate dataset from the `realEstate.csv` for our linear regression example. 

Description of data:
- House Age
- Distance from the unit to MRT station
- The number of Convenience Stores around the unit
- House Unit Price per 1000 USD

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

First we use pandas to load in the csv.<br>
Note that in this dataset there are a total of $3$ features and $1$ label.<br>
Thus from the data we will use `.iloc[]` to distinguish the features and labels.

In [None]:
data = pd.read_csv("../data/Regression/realEstate.csv", header = 0)
n_features = 3
X = data.iloc[:, 0:3].values
y = data.iloc[:, 3].values

Following that, we split our dataset into 70/30 train/test ratio.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size = 0.7, shuffle = True, random_state = 1022)

Next, we apply feature scaling to `X_train` and `X_test` using `StandardScaler` from `scikit-learn`.<br>
*Note: fit the scaler on the training set only, then transform both the training and test sets.*
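To make the note concrete, here is a NumPy-only sketch of the standardization idea on made-up numbers: the mean and standard deviation are estimated from the training rows and then reused for the test rows. It illustrates the principle and is not a replacement for `StandardScaler`.

```python
import numpy as np

# Hypothetical toy data: one feature, 4 training rows and 2 test rows
X_train_toy = np.array([[1.0], [2.0], [3.0], [4.0]])
X_test_toy = np.array([[2.5], [10.0]])

# "fit": estimate the statistics from the training set only
mean = X_train_toy.mean(axis=0)
std = X_train_toy.std(axis=0)  # population standard deviation (ddof=0)

# "transform": apply the same statistics to both sets
X_train_scaled = (X_train_toy - mean) / std
X_test_scaled = (X_test_toy - mean) / std

print(X_train_scaled.mean(), X_train_scaled.std())
print(X_test_scaled.ravel())
```

The scaled training set has zero mean and unit standard deviation, whereas the test set generally does not, because its statistics were never used. This keeps information about the test set out of training.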

In [None]:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

In section 3.1, we touched on how `DataLoader`s are initialized and used in model training. It was simple: pass whatever `Dataset` we need into the `DataLoader` constructor. <br>

Here, we are using a custom dataset from a csv file, unlike the previous one, which was readily available from torchvision. Thus in this case, we will have to build our own by subclassing `torch.utils.data.Dataset`.

When subclassing `Dataset`, the PyTorch [documentation](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) notes that we have to override the `__getitem__()` method and optionally the `__len__()` method.<br>
We will have three methods in this `Dataset` class:
- `__init__(self, features, labels)`: passes the features and labels into the dataset
- `__len__(self)`: tells the dataset how many data instances there are
- `__getitem__(self, idx)`: retrieves a feature/label pair from the dataset by index

In [None]:
class Custom_Dataset(Dataset):
    def __init__(self, features, labels):
        self.features = torch.tensor(features, dtype = torch.float32)
        self.labels = torch.tensor(labels, dtype  = torch.float32)

    def __len__(self):
        return self.features.shape[0]
    
    def __getitem__(self, idx):
        return self.features[idx], self.labels[idx]

After feature scaling, we initialize our custom datasets and pass them to the `DataLoader` constructor. Our data is now prepared, and the next step is modeling.

In [None]:
train_dataset = Custom_Dataset(X_train, y_train)
test_dataset = Custom_Dataset(X_test, y_test)
train_loader = DataLoader(train_dataset, batch_size = 32)
test_loader = DataLoader(test_dataset, batch_size = 128 )

As stated previously, there are two approaches to modeling:
- Subclassing `nn.Module`
- Calling `nn.Sequential()`

`torch.nn.Sequential` is a container that accepts `nn.Module` layers as arguments and chains them in the order given. We will be implementing these layers:
1. nn.Linear(3,50)
2. nn.ReLU()
3. nn.Linear(50,25)
4. nn.ReLU()
5. nn.Linear(25,10)
6. nn.ReLU()
7. nn.Linear(10,1)

In [None]:
torch.manual_seed(123)
model_sequential = nn.Sequential(nn.Linear(n_features, 50),
                                 nn.ReLU(),
                                 nn.Linear(50, 25),
                                 nn.ReLU(),
                                 nn.Linear(25, 10),
                                 nn.ReLU(),
                                 nn.Linear(10, 1)
                                 )

For this regression problem, the loss/criterion we will use is the mean squared error loss, which in PyTorch is `nn.MSELoss()`.<br>
We will also use `Adam` as our optimizer.<br> Remember, every optimizer in `torch.optim` accepts `model.parameters()` so that it can update the model's parameters, hence we should always initialize the model before the optimizer.

In [None]:
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model_sequential.parameters(), lr = 0.01)

Now that our modeling is done, let's start training.

We will build a wrapper function for our training called `train_model`. This wrapper function takes the following parameters:
- model
- loader
- criterion (the loss function)
- optimizer
- epochs (optional)

Below is an overview and explanation of how our `train_model` function works:
1. In each epoch, each minibatch starts with `optimizer.zero_grad()`. This clears the gradients computed for the previous minibatch.
2. We get the features and labels by indexing our minibatch.
3. Compute forward propagation by calling `model(features)` and assigning the result to a variable `prediction`.
4. Compute the loss by calling `criterion(prediction, torch.unsqueeze(labels, dim=1))`.
    - We unsqueeze so that the labels have the same shape as the predictions, which is (batch_size, 1).
5. Compute backward propagation by calling `loss.backward()`.
6. Update the parameters of the model (its weights and biases) by calling `optimizer.step()`.
7. Increment our `running_loss` with the loss of the current batch.
8. At the end of each epoch, compute the average loss by dividing the accumulated loss by the number of minibatches (`len(loader)`), print it for the first epoch and every 100th epoch, and finally reset `running_loss` to zero for the next epoch.
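The shape issue mentioned in step 4 is worth a small sketch. The tensors below are made-up values chosen so the loss can be checked by hand; `torch.broadcast_shapes` is used only to show what shape the mismatched tensors would be broadcast to.

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss()
prediction = torch.tensor([[1.0], [2.0], [3.0]])  # shape (3, 1), like the model output
labels = torch.tensor([1.0, 2.0, 4.0])            # shape (3,), as returned by the dataset

# Matching shapes: the element-wise errors are 0, 0 and -1, so the mean squared error is 1/3
loss_matched = criterion(prediction, torch.unsqueeze(labels, dim=1))

# Mismatched shapes would be broadcast to (3, 3), comparing every prediction
# with every label instead of pairing them one to one
broadcast_shape = torch.broadcast_shapes(prediction.shape, labels.shape)

print(loss_matched.item(), tuple(broadcast_shape))
```

Without the unsqueeze, the loss would be averaged over all 9 prediction/label pairs, which silently trains the model on the wrong objective.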


In [None]:
def train_model(model, loader, criterion, optimizer, epochs=5000):
    # running_loss accumulates the minibatch losses within each epoch
    running_loss = 0.0
    for epoch in range(1, epochs + 1):
        for i, data in enumerate(loader):
            # zero the parameter gradients
            optimizer.zero_grad()
            features, labels = data[0], data[1]
            prediction = model(features)
            # unsqueeze the labels to shape (batch_size, 1) to match the predictions
            loss = criterion(prediction, torch.unsqueeze(labels, dim=1))
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        # report the average loss per minibatch for the first and every 100th epoch
        if epoch % 100 == 0 or epoch == 1:
            print(f"Epoch {epoch} Loss: {running_loss / len(loader)}")
        running_loss = 0.0

In [None]:
torch.manual_seed(0)
train_model(model_sequential, train_loader, criterion, optimizer)

### 3.3.2 Inference

Now let's evaluate our model. Use `model.eval()` to set the model to inference mode.

In [None]:
model_sequential.eval()

Let's say your house age is 10, the distance to the MRT station is 100 meters, and there are 6 convenience stores around the unit. Could you predict your house price? Let's use our trained model to find out.

In [None]:
with torch.no_grad():
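    # scikit-learn expects a NumPy array, so build the sample with NumPy before scaling
    inference = np.array([[10, 100, 6]], dtype = np.float32)
    inference = torch.from_numpy(scaler.transform(inference)).float()
    # call the model directly instead of invoking .forward()
    predict = model_sequential(inference)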
        
print("The prediction for your house price is :", predict.item() * 1000)

# Exercise

In this exercise, we will build a classifier for the MNIST handwritten digits dataset.

Construct `transform` by composing the following transforms:
- converting to tensor
- normalizing the tensor with mean=0.15 and std=0.3081

In [None]:
transform = transforms.Compose()

Obtain the MNIST dataset from `torchvision.datasets` and load the training and test sets into their respective `DataLoader`s.

In [None]:
from torchvision.datasets import MNIST

train = MNIST("../data", )
test = MNIST("../data",  )

In [None]:
train_loader = 
test_loader = 

Declare `SummaryWriter` for TensorBoard

In [None]:
writer =

Create a Model with the following layers:
- 4 linear/dense layers
- First 3 with ReLU activation functions

*Note: Remember to resize the incoming tensor first*

In [None]:
class Model(nn.Module):
    def __init__(self):
 

    def forward(self, x):
        return 

Initialize the model and load it to our **GPU**.

In [None]:
model = Model()
if torch.cuda.is_available():
    

Initialize criterion: `CrossEntropyLoss` and optimizer `Adam`.

In [None]:
criterion = 
optimizer = 

Build a wrapper function `train_model` to train the model using `CUDA`, and use `add_scalar` to log the loss so that TensorBoard shows a loss-against-epoch graph.<br>
Here is a checklist to help you keep track of what to do:
1. For each iteration in each epoch, zero the gradients of the parameters
2. Forward propagate
3. Calculate loss
4. Write the training loss to TensorBoard
5. Back propagate
6. Update the parameters
7. For each epoch, calculate the accuracy on our test set

In [None]:
def train_model(model, train_loader, test_loader, criterion, optimizer, epochs = 5):
    accuracy_list = []
    for epoch in range(epochs):
        total = 0
        correct = 0
        for i, data in enumerate(train_loader):

            
            
            
        print(f'\nAccuracy of network in epoch {epoch + 1}: {100 * correct / total}')
    writer.flush()

train_model(model, train_loader, test_loader, criterion, optimizer)
writer.close()

In [None]:
# Evaluate on whichever device the model lives on (GPU if available, otherwise CPU)
device = next(model.parameters()).device
total = 0
correct = 0
for data, labels in test_loader:
    data = data.to(device)
    with torch.no_grad():
        validation = model(data)
        _, prediction = torch.max(validation, 1)
        total += labels.size(0)
        correct += (prediction.cpu() == labels).sum().item()

print(f'Accuracy of the network: {100 * correct / total}')