# Homework 3: Intro to PyTorch

In this homework, you will create and train a simple network with the [Fashion-MNIST dataset](https://github.com/zalandoresearch/fashion-mnist), a drop-in replacement for the MNIST dataset. Fashion-MNIST is a set of 28x28 greyscale images of clothes, consisting of a training set of 60,000 examples and a test set of 10,000 examples. 

<img src='fashion-mnist-sprite.png' width=500px>

It's important for you to write the code yourself and get it to work. If you use any online resources for your work, please cite them to get full points. Feel free to consult practical_04 for coding help.

In [None]:
import torch
import torchvision
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
import numpy as np

# Defining Hyperparameters of the model
Before we proceed, let's define some hyperparameters of the model. Hyperparameters of a neural network are settings that are fixed before the learning process (training) begins and are not learned from the training data. They are usually chosen by the developer and play a crucial role in training. We will define several hyperparameters throughout this notebook, including:
* `epoch`: An epoch is a complete pass through the entire training dataset. During one epoch, the model processes all the training samples. 
* `batch_size`: Batch size is the number of data points used to compute each weight update, i.e., in each `iteration`[1]. Typical small batch sizes are 32, 64, 128, 256, and 512, while large batches can contain thousands of examples. Batch size determines how often the weights are updated: the smaller the batches, the more frequent and quicker the individual updates. The larger the batch size, the more accurate the estimate of the gradient of the cost with respect to the parameters.

**Note[1]**: While it may sound confusing, an `iteration` here means one pass (forward and backward) over a single batch, i.e., one weight update. The number of iterations per epoch is therefore `total # of samples / batch_size`, rounded up when the last batch is incomplete.
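The iteration count from the note can be checked with a little arithmetic. This is a minimal sketch; the helper name `iterations_per_epoch` is hypothetical, and it assumes the count you care about is what `len(trainloader)` reports for a `DataLoader`, which rounds up unless the loader is built with `drop_last=True`.

```python
import math


def iterations_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches (weight updates) needed to see every sample once."""
    if drop_last:
        # The incomplete final batch is discarded, so round down.
        return num_samples // batch_size
    # The final, possibly smaller, batch still counts as one iteration.
    return math.ceil(num_samples / batch_size)


# Fashion-MNIST has 60,000 training samples.
print(iterations_per_epoch(60_000, 4))                   # 15000
print(iterations_per_epoch(60_000, 64))                  # 938
print(iterations_per_epoch(60_000, 64, drop_last=True))  # 937
```

With `batch_size = 4`, as in the next cell, one epoch is 15,000 weight updates.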

In [None]:
# Defining hyperparameters: batch_size
batch_size = 4

# Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])
# Download and load the training data
trainset = datasets.FashionMNIST('./data', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True)

# Download and load the test data
testset = datasets.FashionMNIST('./data', download=True, train=False, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=True)
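`transforms.ToTensor()` scales pixel values to [0, 1], and `transforms.Normalize((0.5,), (0.5,))` then applies `(x - mean) / std` to each channel, which maps [0, 1] onto [-1, 1]. The sketch below reproduces that arithmetic for single pixel values, together with the inverse `img / 2 + 0.5` used by `imshow` in the next cell. The helper names are hypothetical.

```python
def normalize(x: float, mean: float = 0.5, std: float = 0.5) -> float:
    """What Normalize((0.5,), (0.5,)) does to one pixel value in [0, 1]."""
    return (x - mean) / std


def unnormalize(y: float) -> float:
    """Inverse mapping, matching `img / 2 + 0.5` in imshow."""
    return y / 2 + 0.5


for pixel in (0.0, 0.5, 1.0):
    y = normalize(pixel)
    print(pixel, "->", y, "->", unnormalize(y))
# 0.0 -> -1.0 -> 0.0
# 0.5 -> 0.0 -> 0.5
# 1.0 -> 1.0 -> 1.0
```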

Here we can see a few of the images.

In [None]:
def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.grid(False)
    plt.show()

classes = ('T-shirt/top', 'Trouser', 'Pullover', 'Dress',
           'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot')

# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join(f'{classes[labels[j]]:5s}' for j in range(batch_size)))


# Building a simple fully connected network - (Must be completed by students)

Here you should define your network using only linear (fully connected) layers. Each image is 28x28, a total of 784 pixels, and there are 10 classes. You should include at least **two** hidden layers. You may use ReLU activations for the hidden layers and return either the logits or the log-softmax from the forward pass; pair raw logits with `nn.CrossEntropyLoss` and log-softmax output with `nn.NLLLoss`. It's up to you how many layers you add and the size (number of neurons) of each layer. Make sure you import the necessary modules.
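When choosing layer sizes, it helps to see how quickly a fully connected network grows. This sketch assumes every layer has a bias term (the `nn.Linear` default). The hidden widths 256 and 128 are illustrative only, not a recommendation. For an equivalent PyTorch model, the result should match `sum(p.numel() for p in model.parameters())`.

```python
def count_linear_params(layer_sizes):
    """Total weights + biases in a stack of nn.Linear layers.

    layer_sizes lists the width of every layer, input first and output last,
    e.g. [784, h1, h2, 10]. Each nn.Linear(n_in, n_out) holds an
    n_out x n_in weight matrix plus n_out biases; ReLU adds no parameters.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total


# Hypothetical widths: two hidden layers of 256 and 128 units.
sizes = [28 * 28, 256, 128, 10]
print(count_linear_params(sizes))  # 235146
```

Almost all of these parameters sit in the first layer (784 x 256 weights), so the width of the first hidden layer dominates the model size.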

In [None]:
# TODO: Define your network architecture here


# Train the network - (Must be completed by students)

Now you should create your network and train it. First, define the loss function, also called [the criterion](http://pytorch.org/docs/master/nn.html#loss-functions) (something like `nn.CrossEntropyLoss`), and [the optimizer](http://pytorch.org/docs/master/optim.html) (typically `optim.SGD` or `optim.Adam`).
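The forward pass can return either raw logits or log-probabilities, and that choice determines the criterion. `nn.CrossEntropyLoss` applies log-softmax internally and expects logits, while `nn.NLLLoss` expects log-softmax output. The check below uses random tensors as stand-ins for network output and arbitrary target labels to show that the two pairings produce the same loss.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 10)           # stand-in output: 4 samples, 10 classes
targets = torch.tensor([3, 0, 9, 1])  # arbitrary class indices for illustration

# Option A: the network returns raw logits -> nn.CrossEntropyLoss.
loss_a = nn.CrossEntropyLoss()(logits, targets)

# Option B: the network returns log-softmax -> nn.NLLLoss.
log_probs = torch.log_softmax(logits, dim=1)
loss_b = nn.NLLLoss()(log_probs, targets)

print(loss_a.item(), loss_b.item())
print(torch.allclose(loss_a, loss_b))  # True
```

Avoid passing raw logits to `nn.NLLLoss`. It only negates the entry at each target index, so on unnormalized scores it does not compute a cross-entropy.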

Then write the training code, preferably as a function, because you will reuse the same training function to train a different model later in this notebook. Remember that the training pass is a fairly straightforward process:

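* Clear the gradients left over from the previous step with `optimizer.zero_grad()` (PyTorch accumulates gradients by default).
* Make a forward pass through the network to get the logits.
* Use the logits to calculate the loss.
* Perform a backward pass through the network with `loss.backward()` to calculate the gradients.
* Take a step with the optimizer (`optimizer.step()`) to update the weights.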
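The steps above, together with clearing the gradients via `optimizer.zero_grad()`, fit into one reusable function. This is a minimal sketch of the common PyTorch pattern. It assumes the model returns logits and flattens its own input (for example with `nn.Flatten()`). The name `train` and its signature are illustrative, and the toy usage runs on random tensors rather than Fashion-MNIST, so its loss values say nothing about the 0.4 target.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset


def train(model, loader, criterion, optimizer, epochs=1):
    """Train `model` and return the mean training loss of each epoch."""
    epoch_losses = []
    for epoch in range(epochs):
        model.train()
        running_loss = 0.0
        for inputs, targets in loader:
            optimizer.zero_grad()              # clear gradients from the previous step
            logits = model(inputs)             # forward pass
            loss = criterion(logits, targets)  # compute the loss
            loss.backward()                    # backward pass: compute gradients
            optimizer.step()                   # update the weights
            running_loss += loss.item()
        epoch_losses.append(running_loss / len(loader))
        print(f"epoch {epoch + 1}: training loss {epoch_losses[-1]:.4f}")
    return epoch_losses


# Toy usage on random data shaped like flattened 28x28 images (not Fashion-MNIST).
torch.manual_seed(0)
X = torch.randn(64, 784)
y = torch.randint(0, 10, (64,))
toy_loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)
toy_model = nn.Sequential(nn.Linear(784, 10))
losses = train(toy_model, toy_loader, nn.CrossEntropyLoss(),
               optim.SGD(toy_model.parameters(), lr=0.01), epochs=10)
```

Returning the per-epoch losses makes it easy to plot training curves for both models and compare them later.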

By adjusting the hyperparameters (number of hidden units, learning rate, `batch_size`, number of epochs, etc.), you should be able to get the training loss below 0.4.

In [None]:
# TODO: Create the network, define the criterion and optimizer


In [None]:
# TODO: Train the network here



Save this model by writing a function that saves the model weights. Remember to specify the path where the model will be saved.
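A common way to do this in PyTorch is to save the model's `state_dict` with `torch.save` and restore it with `load_state_dict`. This is a minimal sketch. The helper names `save_weights` and `load_weights` and the file name are hypothetical, and the temporary directory stands in for whatever path you choose.

```python
import os
import tempfile

import torch
from torch import nn


def save_weights(model: nn.Module, path: str) -> None:
    """Save only the learned parameters (the state_dict), not the whole object."""
    torch.save(model.state_dict(), path)


def load_weights(model: nn.Module, path: str) -> nn.Module:
    """Load saved weights into a model that has the same architecture."""
    model.load_state_dict(torch.load(path, map_location="cpu"))
    model.eval()  # switch to evaluation mode before making predictions
    return model


# Usage with a stand-in model; in the notebook, pass your trained network instead.
path = os.path.join(tempfile.gettempdir(), "fc_model_weights.pth")
original = nn.Linear(784, 10)
save_weights(original, path)
restored = load_weights(nn.Linear(784, 10), path)
print(torch.equal(original.weight, restored.weight))  # True
```

A `state_dict` holds only tensors, so loading requires you to first construct the same architecture. This is why the same helper also works for the CNN later.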

In [None]:
# TODO: Save the model weights here

The next cell loads some test images and displays their ground-truth labels. Load your saved model and show its predicted labels for these images.
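To get predicted labels, run the model in evaluation mode without tracking gradients and take the highest-scoring class for each image. This is a minimal sketch. The helper name `predict_labels` is hypothetical, and a randomly initialized stand-in model and random images replace your loaded model and `testimages`.

```python
import torch
from torch import nn


def predict_labels(model: nn.Module, images: torch.Tensor) -> torch.Tensor:
    """Return the index of the highest-scoring class for each image."""
    model.eval()
    with torch.no_grad():        # no gradients needed for inference
        logits = model(images)
    return logits.argmax(dim=1)  # argmax is the same for logits or log-softmax output


# Stand-in model and images (random); in the notebook use your loaded model and `testimages`.
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
fake_images = torch.randn(4, 1, 28, 28)
predicted = predict_labels(toy_model, fake_images)
print(predicted.shape)  # torch.Size([4])
```

The indices can then be mapped to names with the `classes` tuple defined earlier, e.g. `' '.join(f'{classes[predicted[j]]:5s}' for j in range(4))`.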

In [None]:
dataiter = iter(testloader)
testimages, testlabels = next(dataiter)

# show the test images and their ground-truth labels
imshow(torchvision.utils.make_grid(testimages))
print('GroundTruth: ', ' '.join(f'{classes[testlabels[j]]:5s}' for j in range(batch_size)))

In [None]:
# TODO: Check the predicted labels for the testimages

## Create a CNN model - (Must be completed by students)

In this part, create a simple CNN using convolution, max or average pooling, and fully connected layers. If you follow any online resources, cite them. At every layer, state the number of input channels, the number of output channels, and the shape of the input to that layer; these details are required for full points, and missing details will be penalized.
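To track shapes layer by layer, use the output-size rule `out = floor((in + 2 * padding - kernel) / stride) + 1`. It applies to `nn.Conv2d`, `nn.MaxPool2d`, and `nn.AvgPool2d` with dilation 1 and the default `ceil_mode=False`. The sketch below walks a 1x28x28 image through a hypothetical LeNet-style stack. The channel counts 6 and 16 and the kernel sizes are illustrative, not required.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a conv or pooling layer (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical stack for a 1 x 28 x 28 Fashion-MNIST image.
h = 28
h = conv_out(h, kernel=5)            # Conv2d(1, 6, 5):   1x28x28 -> 6x24x24
h = conv_out(h, kernel=2, stride=2)  # MaxPool2d(2):      6x24x24 -> 6x12x12
h = conv_out(h, kernel=5)            # Conv2d(6, 16, 5):  6x12x12 -> 16x8x8
h = conv_out(h, kernel=2, stride=2)  # MaxPool2d(2):      16x8x8  -> 16x4x4
print(h, 16 * h * h)                 # 4 256 -> the first nn.Linear needs 256 inputs
```

Adding `padding=2` to a 5x5 convolution keeps the spatial size unchanged (`conv_out(28, 5, padding=2)` is 28). This is one way to build deeper stacks without shrinking the image too quickly.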

In [None]:
# TODO: Define your network architecture here

# Train your network here and save your model weights 

Reuse the training and weight-saving functions you defined earlier for this new model.

In [None]:
# TODO: Create the network, define the criterion and optimizer

In [None]:
# TODO: Call the training function

In [None]:
# TODO: Save your model weights

# Test your model on the same test images

In [None]:
# TODO: Check the predicted labels for the testimages

# Answer the following questions:
* Which of the two models performs better? Why?
* Does changing any hyperparameters (batch_size, learning rate, number of layers, neurons per layer) improve the model performance? Why?
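The first question is easier to answer with a number than with four sample images. One option, sketched here as an assumption rather than a requirement, is to compare each model's accuracy on the full test set. The helper name `accuracy` is hypothetical, and the toy usage runs on random data with an untrained stand-in model.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def accuracy(model: nn.Module, loader: DataLoader) -> float:
    """Fraction of samples in `loader` whose predicted class matches the label."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            predicted = model(images).argmax(dim=1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    return correct / total


# Stand-in data and model; in the notebook, pass each trained model with `testloader`.
torch.manual_seed(0)
toy_data = TensorDataset(torch.randn(20, 1, 28, 28), torch.randint(0, 10, (20,)))
toy_loader = DataLoader(toy_data, batch_size=4)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
print(accuracy(toy_model, toy_loader))
```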