## Lab Assignment: MNIST Classification Task

Design your own MNIST classification model (see the video recording for an explanation of the MNIST dataset). You may choose your own hyperparameters, including:
- Number of layers
- Number of neurons in each layer
- Learning rate
- Number of training epochs
- Optimizer

Using a fully-connected network, you should be able to achieve >90% accuracy on the test set. Please report your hyperparameter selections and test accuracy in a summary at the end of the notebook.

To load the MNIST dataset, we will use `torchvision`, which provides the dataset along with useful transformations. Start by defining the batch sizes you want for your training and test sets.

In [1]:
import torch
import numpy as np


In [2]:
import torchvision
train_batch_size = 100 #Define train batch size
test_batch_size  = 80 #Define test batch size (can be larger than train batch size)


# Use the following code to load and normalize the dataset.
# 0.1307 and 0.3081 are the mean and standard deviation of the MNIST training
# pixels; the final Lambda flattens each 1x28x28 image into a 784-element vector.
transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize((0.1307,), (0.3081,)),
    torchvision.transforms.Lambda(lambda x: torch.flatten(x)),
])

train_loader = torch.utils.data.DataLoader(
  torchvision.datasets.MNIST('./files/', train=True, download=True, transform=transform),
  batch_size=train_batch_size, shuffle=True)

# The test set does not need shuffling: accuracy does not depend on sample order
test_loader = torch.utils.data.DataLoader(
  torchvision.datasets.MNIST('./files/', train=False, download=True, transform=transform),
  batch_size=test_batch_size, shuffle=False)
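
To make the effect of the transform pipeline concrete, here is a minimal sketch that applies the same normalization and flattening by hand to a synthetic image, so it runs without downloading MNIST. The helper `normalize_and_flatten` and the `fake_image` tensor are illustrative assumptions, not part of the assignment code; only the constants 0.1307 and 0.3081 come from the loading code.

```python
import torch

# Constants copied from the loading code (MNIST pixel mean and std).
MEAN, STD = 0.1307, 0.3081

def normalize_and_flatten(image: torch.Tensor) -> torch.Tensor:
    """Mimic Normalize((MEAN,), (STD,)) followed by torch.flatten on one image."""
    return torch.flatten((image - MEAN) / STD)

# Synthetic stand-in for one ToTensor() output: 1 channel, 28x28, values in [0, 1].
fake_image = torch.zeros(1, 28, 28)
fake_image[0, 10:18, 10:18] = 1.0  # a white square on a black background

flat = normalize_and_flatten(fake_image)
print(flat.shape)         # torch.Size([784])
print(flat.min().item())  # black pixels map to about -0.4242
print(flat.max().item())  # white pixels map to about 2.8215
```

The flattened length of 784 is what fixes the input size of the first `nn.Linear` layer in the network.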

In [3]:
from torch import nn
# Define your network:
class Network(nn.Module):
  def __init__(self): # Can provide additional inputs for initialization
    # Define the network layer(s) and activation function(s)
    super().__init__()
    self.layer1 = nn.Linear(784, 256) # 784 = 28*28 flattened input pixels
    self.act1 = nn.ReLU()
    self.layer2 = nn.Linear(256, 64)
    self.act2 = nn.ReLU()
    self.layer3 = nn.Linear(64, 10) # one output logit per digit class

  def forward(self, x):
    # Two hidden layers with ReLU, then a linear output layer (raw logits)
    x = self.act1(self.layer1(x))
    x = self.act2(self.layer2(x))
    output = self.layer3(x)
    return output
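
As a quick sanity check on the size of this architecture, the sketch below counts the trainable parameters of a stack of fully-connected layers with biases. The helper `count_linear_params` is a hypothetical name introduced for illustration; the layer sizes are the ones used in `Network`.

```python
def count_linear_params(layer_sizes):
    """Count weights and biases for consecutive fully-connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total

# Input, two hidden layers, and output sizes taken from the Network class
sizes = [784, 256, 64, 10]
print(count_linear_params(sizes))  # 218058
```

Most of the parameters (784 * 256 + 256 = 200960) sit in the first layer, so changing the width of the first hidden layer has the largest effect on model size.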

In [4]:
device = torch.device('cpu')
print(device)


cpu


In [5]:
# Define your model, optimizer, and loss function
model = Network().to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
epochs = 5
criterion = nn.CrossEntropyLoss()
train_accuracy = []
for epoch in range(epochs):
  train_sample_counts = 0
  epoch_corrects = 0
  train_loss = 0
  for x_train, y_train in train_loader:
    # Move the batch to the selected device
    x_train, y_train = x_train.to(device), y_train.to(device)
    # Calculate training loss on model
    y_pred = model(x_train)
    loss = criterion(y_pred, y_train)
    optimizer.zero_grad() # reset the gradients
    loss.backward() # backpropagate to compute the gradients
    optimizer.step() # perform parameter update
    # Compute model metrics
    predictions = torch.max(y_pred, 1)[1] # index of the largest logit is the predicted class
    epoch_corrects += (predictions == y_train).sum().item()
    train_sample_counts += len(y_train)
    train_loss += loss.item() * x_train.size(0) # batch mean loss times batch size

  epoch_accuracy = epoch_corrects/train_sample_counts
  train_accuracy.append(epoch_accuracy)
  train_loss_epoch = train_loss/train_sample_counts
  print()
  print(f'Epoch: {epoch} \tTraining Loss: {train_loss_epoch:.6f} \tTraining Accuracy:  {epoch_accuracy:.6f}')
# The test set is evaluated in the next cell



Epoch: 0 	Training Loss: 0.268843 	Training Accuracy:  0.921317

Epoch: 1 	Training Loss: 0.104336 	Training Accuracy:  0.968000

Epoch: 2 	Training Loss: 0.070362 	Training Accuracy:  0.977767

Epoch: 3 	Training Loss: 0.050124 	Training Accuracy:  0.984400

Epoch: 4 	Training Loss: 0.039948 	Training Accuracy:  0.987050
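
The training loop multiplies `loss.item()` by the batch size before accumulating because `nn.CrossEntropyLoss` returns the mean loss over a batch by default. The sketch below shows why that weighting gives a per-sample epoch average even when batches differ in size. The function name and the numbers are made up for illustration; in this notebook the 60000 training images divide evenly into batches of 100, so the weighting only matters if the batch size is changed.

```python
def epoch_mean_loss(batch_mean_losses, batch_sizes):
    """Combine per-batch mean losses into a per-sample mean over the epoch."""
    weighted_sum = sum(loss * n for loss, n in zip(batch_mean_losses, batch_sizes))
    return weighted_sum / sum(batch_sizes)

# Made-up values: two full batches of 100 and a smaller final batch of 50
losses = [0.30, 0.20, 0.10]
sizes = [100, 100, 50]

print(epoch_mean_loss(losses, sizes))  # about 0.22
print(sum(losses) / len(losses))       # about 0.20, the unweighted mean over-counts the small batch
```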


In [6]:
epoch_corrects = 0
test_sample_counts = 0
model.eval() # switch to evaluation mode (good practice, even without dropout)
with torch.no_grad(): # gradients are not needed for evaluation
    for x_test, y_test in test_loader:
        # Predict on the test batch
        x_test, y_test = x_test.to(device), y_test.to(device)
        y_pred = model(x_test)
        predictions = torch.max(y_pred, 1)[1]
        epoch_corrects += (predictions == y_test).sum().item()
        test_sample_counts += len(y_test)

epoch_accuracy = epoch_corrects/test_sample_counts
print(f'Test Accuracy:  {epoch_accuracy:.6f}')
 

Test Accuracy:  0.975500


### Model Summary

#### A. Which hyperparameters did you vary to achieve optimal performance? Please describe how you selected (random sampling? grid search? etc.) the hyperparameters you varied. 
I varied the Adam optimizer's learning rate and the number of layers in the fully connected neural network. I also tested both sigmoid and ReLU activation functions.
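
The answer above does not say how the candidate settings were chosen. Purely as an illustration of what a grid search over these three hyperparameters could look like, the sketch below enumerates every combination of a small search space. The helper `build_grid` and all candidate values are hypothetical and are not the values actually tried in this notebook.

```python
from itertools import product

def build_grid(search_space):
    """Return one dict per combination of the candidate values."""
    keys = list(search_space)
    combos = product(*(search_space[k] for k in keys))
    return [dict(zip(keys, values)) for values in combos]

# Hypothetical candidate values, for illustration only
search_space = {
    "lr": [1e-2, 1e-3, 1e-4],
    "hidden_sizes": [(256,), (256, 64)],
    "activation": ["relu", "sigmoid"],
}

grid = build_grid(search_space)
print(len(grid))  # 12 configurations (3 * 2 * 2)
print(grid[0])    # {'lr': 0.01, 'hidden_sizes': (256,), 'activation': 'relu'}
```

Each configuration would then be trained and scored, ideally on a validation split held out from the training set rather than on the test set.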

#### B. For the model that achieved the best performance, what values did you use for:
1. Number of layers <br />
I designed a 3-layer neural network (two hidden layers and one output layer). <br />
2. Neurons in each layer <br />
Hidden Layer 1: 256 <br />
Hidden Layer 2: 64 <br />
Output Layer: 10 <br />

3. Activation function <br />
I used the ReLU activation function in both hidden layers. The output layer has no activation and produces raw logits, which is what `nn.CrossEntropyLoss` expects. <br />
4. Learning Rate <br />
I used the Adam optimizer with a learning rate of 1e-3. <br />
5. Number of Training Epochs <br />
I trained the network for 5 epochs. <br />
6. Other hyperparameters/layers (dropout, regularization, etc., if applicable) <br />
I did not use any other hyperparameters or layers (no dropout or regularization).
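
In the code, the output layer `layer3` is followed by no activation, because `nn.CrossEntropyLoss` applies log-softmax to the raw logits internally. As a rough sketch of what that loss computes for a single sample, the pure-Python function below evaluates the same quantity by hand. The name `cross_entropy_from_logits` and the example logits are assumptions made for illustration.

```python
import math

def cross_entropy_from_logits(logits, target):
    """Negative log-softmax probability of the target class for one sample."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# Made-up logits for a 3-class problem
logits = [2.0, 0.5, -1.0]
print(cross_entropy_from_logits(logits, 0))  # small loss: class 0 has the largest logit
print(cross_entropy_from_logits(logits, 2))  # larger loss: class 2 has the smallest logit
```

Adding a softmax or ReLU after the last layer would therefore change the quantity being optimized, which is why the network returns logits directly.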

#### C. What was the accuracy on the test set for your best-performing network? <br />
Best Test Accuracy = 0.9755 <br />


