# Lab 2
Neural networks for the MNIST dataset

## Useful links:

Information on PyTorch layers: [link](https://pytorch.org/docs/stable/nn.html)

And more specifically:


*   Linear layers: [link](https://pytorch.org/docs/stable/nn.html#linear-layers)
*   Loss layers: [link](https://pytorch.org/docs/stable/nn.html#loss-functions)
*   Activation functions: [link](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity)
*   Datasets and dataloaders: [link](https://pytorch.org/docs/stable/data.html)
*   Saving and loading models: [link](https://pytorch.org/tutorials/beginner/saving_loading_models.html)

t-SNE visualization: [link](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html)

# Information about network training:

### How to use loss function:
```
# define the loss function once (before training):
loss_function = nn.CrossEntropyLoss()

# to calculate our loss after the forward pass:
current_loss = loss_function(outputs, target)

# perform backward pass:
current_loss.backward()
```

### Network optimization (learning):
```
# define the optimizer once (before training):
optimizer = torch.optim.SGD(model.parameters(), lr=0.01) # optimizing parameters (weights) with learning rate 0.01

# before performing the backward pass clear the information about the gradients from the previous pass:
optimizer.zero_grad()

# after performing the backward pass
optimizer.step()
```

### Forward pass:
```
#if you have already defined a model, the only thing you have to do is:
outputs = model(inputs)
```
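
The three snippets above fit together in a fixed order within each training iteration. The following is a minimal sketch of a single step, assuming a tiny stand-in model and a random batch (the sizes 4, 3, and 8 are made up for illustration and have nothing to do with MNIST):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins: a tiny linear model and a random batch
model = nn.Linear(4, 3)
inputs = torch.randn(8, 4)           # batch of 8 samples with 4 features each
target = torch.randint(0, 3, (8,))   # class indices in the range [0, 3)

loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

weights_before = model.weight.detach().clone()

optimizer.zero_grad()                          # 1. clear gradients from the previous step
outputs = model(inputs)                        # 2. forward pass
current_loss = loss_function(outputs, target)  # 3. calculate the loss
current_loss.backward()                        # 4. backward pass (compute gradients)
optimizer.step()                               # 5. update the weights

weights_changed = not torch.equal(weights_before, model.weight)
print(current_loss.item(), weights_changed)
```

Note that `nn.CrossEntropyLoss` expects raw scores (logits) of shape (N, number of classes) and integer class labels of shape (N,), so no softmax is needed at the end of the network.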

# Information about defining the network:

Every model will have a similar structure:
```
class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # Define your layers here

    def forward(self, x):
        # Define the forward pass operations here
        return x
```

There are two distinct functions here: ```__init__``` and ```forward```.

In the ```__init__``` function we define the layers that we will later use in our forward pass.

In the ```forward``` function we define, step by step, what should happen in our forward pass.

# Layers
### Linear
The first layer we can use is a linear (or fully connected) layer. We can define it as:
```
# Linear layer with 5 inputs and 2 outputs (goes inside the __init__)
self.fc = torch.nn.Linear(in_features=5, out_features=2)

# moving data through the layer (goes into the forward function)
output = self.fc(input) 
```
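
To show where the two lines above go, here is a minimal sketch of a complete module built around that one layer. The class name `TinyNet` and the batch of 3 random samples are assumptions made for illustration only:

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        # the layer is created once, in __init__
        self.fc = nn.Linear(in_features=5, out_features=2)

    def forward(self, x):
        # the layer is applied to the data in forward
        return self.fc(x)

tiny = TinyNet()
batch = torch.randn(3, 5)   # 3 samples with 5 features each
out = tiny(batch)
print(out.shape)            # torch.Size([3, 2])

# 5*2 weights + 2 biases = 12 trainable parameters
n_params = sum(p.numel() for p in tiny.parameters())
print(n_params)
```

The layer only constrains the last dimension of its input, so the same module works for any batch size.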
### Activation
Activation functions don't have to be defined in the ```__init__``` function as long as they don't have any trainable parameters (and most of them don't have any).
```
# moving data through the layer with sigmoid (goes into the forward function)
output = torch.sigmoid(self.fc(input))  # F.sigmoid is deprecated in favor of torch.sigmoid

# moving data through the layer with relu (goes into the forward function)
output = F.relu(self.fc(input)) 
```

### Reshape
Frequently you will have to reshape your input (for example, flattening each 2D image into a 1D vector).
```
# if input is of shape (N, 10, 10)
output = input.view(-1, 100)
# now output is of shape (N, 100)
```
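
As a quick check of the reshape above, this sketch (with an assumed batch size of N = 4) compares three equivalent ways of flattening everything except the batch dimension:

```python
import torch

x = torch.zeros(4, 10, 10)          # N = 4 samples of shape 10x10

a = x.view(-1, 100)                 # infer N, fix the feature size
b = x.view(x.size(0), -1)           # fix N, infer the feature size
c = torch.flatten(x, start_dim=1)   # flatten every dimension after the batch one

print(a.shape, b.shape, c.shape)    # each is torch.Size([4, 100])
```

The second and third forms are often more convenient because they do not require knowing the flattened size in advance.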

### 2D Convolution
Definition of the convolution layer:
```
# Convolution layer with 10 filters of size 3x3. The input has 5 channels.
self.conv = nn.Conv2d(5, 10, kernel_size=3)

# Forward pass:
output = self.conv(input)
```

### 2D max pooling
```
# Performing a 2D max pooling operation with a kernel of size 2x2
output = F.max_pool2d(self.conv(input), 2)
```
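
Both operations change the spatial size of the data, which matters when a linear layer follows. The sketch below assumes a made-up input of 2 samples with 5 channels and 8x8 pixels. A 3x3 convolution without padding removes 2 pixels per dimension, and 2x2 max pooling halves each dimension:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv = nn.Conv2d(5, 10, kernel_size=3)   # 5 input channels, 10 filters of size 3x3

x = torch.randn(2, 5, 8, 8)   # hypothetical batch: N=2, 5 channels, 8x8 pixels
y = conv(x)                   # 8 - 3 + 1 = 6  -> shape (2, 10, 6, 6)
z = F.max_pool2d(y, 2)        # 6 / 2 = 3      -> shape (2, 10, 3, 3)

print(y.shape, z.shape)
```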

### Dropout
```
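# 1D dropout performed on the output of a linear layer;
# training=self.training switches dropout off after model.eval() (F.dropout is active by default)
output = F.dropout(self.linear(input), training=self.training)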
```
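
Dropout behaves differently during training and evaluation. The sketch below uses the module form, `nn.Dropout`, which follows `train()` and `eval()` automatically. The vector of 1000 ones and p=0.5 are assumptions chosen to make the effect easy to see:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()
y_train = drop(x)   # roughly half the values become 0, the rest are scaled by 1/(1-p) = 2

drop.eval()
y_eval = drop(x)    # in evaluation mode dropout does nothing

print(y_train.unique(), torch.equal(y_eval, x))
```

The scaling during training keeps the expected value of each activation unchanged, which is why no correction is needed at test time.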

# Data transformations
```
# simplest transformation (transforming image to PyTorch tensor):
transform = transforms.ToTensor()

# you can add more transformations (after converting the image to a tensor). The simplest is normalization (shown here for 1-channel, grayscale data):
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((mean,), (std,))])
```

# Saving and loading models
Keeping progress of our training is very important. Being able to save and load our previous models will become very helpful.

Working with the entire model:
```
PATH = "./mnist_model.pt"
# saving entire model:
torch.save(model, PATH)
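# loading entire model (recent PyTorch versions default to weights_only=True,
# so loading a full pickled model needs weights_only=False):
model = torch.load(PATH, weights_only=False)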
```

Saving more details, which is useful when stopping and resuming training:
```
PATH = "./mnist_model.pt"
# saving more detailed information:
torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'loss': loss,
            # ... (any other values you want to store)
            }, PATH)
# loading more detailed information:
model = Net() # initialize the object first
optimizer = torch.optim.SGD(model.parameters(), lr=0.01) # initialize the object first

checkpoint = torch.load(PATH)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']
```
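
To see that a checkpoint really restores the weights, here is a small round-trip sketch. It assumes a stand-in `nn.Linear` model instead of `Net`, a temporary directory instead of `PATH`, and an arbitrary epoch number:

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in for a trained model and its optimizer
model_a = nn.Linear(5, 2)
optimizer_a = torch.optim.SGD(model_a.parameters(), lr=0.01)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "checkpoint.pt")
    torch.save({
        'epoch': 3,
        'model_state_dict': model_a.state_dict(),
        'optimizer_state_dict': optimizer_a.state_dict(),
    }, path)
    checkpoint = torch.load(path)

# A freshly initialized model has different random weights until the state is loaded
model_b = nn.Linear(5, 2)
optimizer_b = torch.optim.SGD(model_b.parameters(), lr=0.01)
model_b.load_state_dict(checkpoint['model_state_dict'])
optimizer_b.load_state_dict(checkpoint['optimizer_state_dict'])

same_weights = torch.equal(model_a.weight, model_b.weight)
print(checkpoint['epoch'], same_weights)
```

After loading, call `model.train()` to resume training or `model.eval()` to run inference, so that layers such as dropout are in the right mode.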


# INSTRUCTIONS

Mount your Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Create a folder in your Google Drive named "data".

You can do it either manually or from the command line:
```
%cd /content/gdrive/My\ Drive/
%mkdir data
```

In [None]:
# general path:
data_path = "/content/gdrive/My Drive/data/"  # no backslash needed inside a Python string

Move to that folder:

In [None]:
# go to the folder:
%cd /content/gdrive/My\ Drive/data/
# print out the content of the folder:
%ls

Imports:

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader
import torchvision.transforms as transforms
from torchvision import datasets
import numpy as np
import matplotlib.pyplot as plt

Load the dataset

In [None]:
######## Read MNIST ########
# number of subprocesses to use for data loading
num_workers = 0 # 0 means the data is loaded in the main process
# how many samples per batch to load
batch_size = 64
# where the dataset is:
dataset_path = "./MNIST"

# convert data to torch.FloatTensor
transform = transforms.ToTensor()

# create training and test datasets
train_data = datasets.MNIST(root=dataset_path, train=True, download=True, transform=transform)
test_data = datasets.MNIST(root=dataset_path, train=False, download=True, transform=transform)

# create data loaders
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, num_workers=num_workers, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, num_workers=num_workers, shuffle=False)

Visualize the images

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
    
# obtain one batch of training images
dataiter = iter(train_loader)
images, labels = next(dataiter)
images = images.numpy()

# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(20):
    ax = fig.add_subplot(2, 20//2, idx+1, xticks=[], yticks=[])
    ax.imshow(np.squeeze(images[idx]), cmap='gray')
    # print out the correct label for each image
    # .item() gets the value contained in a Tensor
    ax.set_title(str(labels[idx].item()))

Define the network:

Use only a single linear (fully connected) layer, with no activation functions.

Remember to reshape the input in your forward pass (from Nx28x28 to Nx784) before feeding it to the linear layer.
N is the batch size.

In [None]:
class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # Define your layers here

    def forward(self, x):
        # Define the forward pass operations here
        return x

Initialize the network:

In [None]:
model = Net()
print(model)

Net()


Specify loss and optimization functions

In [None]:
# specify loss function
criterion = None # modify that

# specify optimizer
optimizer = None # modify that

Training the network.

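We will iterate through our dataset. For every iteration (batch) we need to:

1.   Clear the gradients from the previous iteration.
2.   Perform the forward pass.
3.   Calculate the loss.
4.   Perform the backward pass.
5.   Perform an optimization step (update the parameters).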


In [None]:
# number of epochs to train the model
n_epochs = 10

model.train() # prep model for training

for epoch in range(n_epochs):
    # monitor training loss
    train_loss = 0.0
    
    ###################
    # train the model #
    ###################
    for data, target in train_loader:
        
        # clear the gradients of all optimized variables
        
        # forward pass: compute predicted outputs by passing inputs to the model
        
        # calculate the loss
        loss = 0 # change this
        
        # backward pass: compute gradient of the loss with respect to model parameters
        
        # perform a single optimization step (parameter update)

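        # if you have a learning rate scheduler that steps every batch, perform its step here
        # (an epoch-based one, such as a step scheduler, steps once per epoch, after this inner loop)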
        
        # update running training loss
        train_loss += loss.item()*data.size(0)
        
    # print training statistics 
    # calculate average loss over an epoch
    train_loss = train_loss/len(train_loader.dataset)

    print('Epoch: {} \tTraining Loss: {:.6f}'.format(epoch+1, train_loss))

Test the network:

In [None]:
correct = 0
total = 0
model.eval()  # prep model for testing

with torch.no_grad():
    for data, target in test_loader:
        outputs = model(data)
        _, predicted = torch.max(outputs, 1)
        total += target.size(0)
        correct += (predicted == target).sum().item()

print('Accuracy of the network on the test set: %d %%' % (100 * correct / total))

Now modify the code above to perform the test pass after every training epoch.

In two separate arrays keep track of the training loss and the testing accuracy after every epoch.

Plot them using the code below:

In [None]:
# Plotting:
x_range = np.arange(1, n_epochs+1)
fig, axs = plt.subplots(2)
axs[0].plot(x_range, train_loss_progress, c='b', label="Train loss")
axs[1].plot(x_range, test_accuracy_progress, c='r', label="Test accuracy")
axs[0].legend()
axs[1].legend()
plt.show()
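
The plotting code assumes two lists, `train_loss_progress` and `test_accuracy_progress`, with one entry per epoch. One possible way to fill the second list is to wrap the test pass in a function and call it at the end of every epoch. The following is only a sketch: the function name `evaluate_accuracy` and the toy data are assumptions, and an identity "model" is used so that the result can be checked by hand:

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

def evaluate_accuracy(model, loader):
    """Return the percentage of correctly classified samples in loader."""
    correct, total = 0, 0
    model.eval()  # prep model for testing
    with torch.no_grad():
        for data, target in loader:
            predicted = model(data).argmax(dim=1)
            total += target.size(0)
            correct += (predicted == target).sum().item()
    return 100.0 * correct / total

# Toy check: the "model" returns its input, so the predictions are [0, 1, 0, 1]
scores = torch.tensor([[2.0, 1.0], [0.0, 3.0], [5.0, 4.0], [1.0, 2.0]])
labels = torch.tensor([0, 1, 0, 0])   # 3 of the 4 predictions match
toy_loader = DataLoader(TensorDataset(scores, labels), batch_size=2)

test_accuracy_progress = []
accuracy = evaluate_accuracy(nn.Identity(), toy_loader)
test_accuracy_progress.append(accuracy)   # one entry per epoch in a real run
print(test_accuracy_progress)
```

In a real training loop, call `model.train()` again before the next epoch, because the function switches the model to evaluation mode.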

# Further tasks:
Perform the following modifications (one at a time):

1.   Change the model to run on the GPU. Compare the runtime.
2.   Modify the network to have 3 linear layers. Choose their dimensions.
3.   Compare using sigmoid vs relu activation functions. The last layer does not require an activation function.
4.   Check the impact of the batch size. Use batch sizes 4, 16, and 128.
5.   For the 3-layer network with relu activations:
    *   Train for 25 and 50 epochs.
    *   For the 25-epoch training, add a learning rate scheduler to your training procedure and compare the impact it has on the performance. Use a step scheduler that decreases the learning rate by a factor of 10 after every 10 epochs.
    *   On top of that, add momentum=0.9 to the optimizer.
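
For the GPU, scheduler, and momentum tasks above, the setup could look roughly like the sketch below. It assumes a placeholder `nn.Linear` model in place of your network, and it reads "decrease by 10" as `gamma=0.1` in `torch.optim.lr_scheduler.StepLR`. The loop only records the learning rate; the actual training code is left out:

```python
import torch
import torch.nn as nn

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(784, 10).to(device)   # placeholder model, not the lab solution

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# multiply the learning rate by 0.1 after every 10 epochs (assumed interpretation)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

lr_history = []
for epoch in range(25):
    lr_history.append(optimizer.param_groups[0]['lr'])
    # ... train for one epoch here; move every batch with data.to(device), target.to(device) ...
    optimizer.step()    # placeholder for the per-batch updates of a real epoch
    scheduler.step()    # an epoch-based scheduler steps once per epoch

print(lr_history[0], lr_history[10], lr_history[20])
```

When timing GPU runs, keep in mind that the model and every batch must be on the same device, otherwise the forward pass raises an error.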

In your research report, provide the results and an analysis of the conducted experiments.


# Using a CNN:
Substitute your current network with a Convolutional Neural Network with the following layers:

*   2D convolution with 10 kernels of size 5
*   2D convolution with 20 kernels of size 5
*   fully connected (linear) layer with output size 50
*   final fully connected layer with output size 10

Modify the transforms for the dataset to include normalization of the data. For MNIST, the mean is 0.1307 and the std is 0.3081.

Perform the following modifications (one at a time) to the forward pass:
1.   Use relu activation functions for both conv layers and for the first linear layer.
2.   Add a dropout layer between the two linear layers.
3.   Add a 2D max pooling layer before the relu activation on both conv layers.
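
The input size of the first linear layer depends on whether pooling is used, so it changes between the modifications above. One way to find it is to push a dummy MNIST-sized tensor through the convolutional part and look at the resulting shape. This is a sketch only, assuming no padding, stride 1, and a 2x2 pooling kernel (the task does not fix the pooling size):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(1, 10, kernel_size=5)    # MNIST images have 1 channel
conv2 = nn.Conv2d(10, 20, kernel_size=5)
dummy = torch.zeros(1, 1, 28, 28)          # one MNIST-sized image

# without pooling: 28 -> 24 -> 20, so 20 channels * 20 * 20 = 8000 features
no_pool = conv2(conv1(dummy))
flat_no_pool = no_pool.flatten(1).size(1)

# with 2x2 max pooling after each conv: 28 -> 24 -> 12 -> 8 -> 4, so 20 * 4 * 4 = 320
pooled = F.max_pool2d(conv2(F.max_pool2d(conv1(dummy), 2)), 2)
flat_pooled = pooled.flatten(1).size(1)

print(flat_no_pool, flat_pooled)
```

The printed numbers are the `in_features` values the first linear layer would need in each variant.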

Note the differences in performance when modifying the network.

In your research report, provide the results and an analysis of the conducted experiments.

