In [1]:
# Initialize Otter
import otter
grader = otter.Notebook("ps11.ipynb")

# Problem Set 11: Introduction to PyTorch
For the final problem set of the semester, you will learn to use the deep learning framework PyTorch. Instructions for installing it are located [here](https://pytorch.org/get-started/locally/). You will also need to install the `torchvision` module using pip.

A few notes on this problem set:
- Neural networks can potentially take a very long time to train. You are responsible for ensuring that your uploaded solution runs without timing out on the autograder. We have verified that it is possible to do this and receive full credit.
- Questions 1c and 2b are each worth three points: one point if your network has >70% test accuracy, two points for >75%, and three points for >80%.

In [2]:
import torch
from torch import nn
from torch.nn import functional as F
from torch.utils.data import Dataset, DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
from tqdm import tqdm
import matplotlib.pyplot as plt
rng_seed = 507
torch.manual_seed(rng_seed)

<torch._C.Generator at 0x7fe660c460f0>

We'll be using the Fashion MNIST dataset, which consists of 28x28 grayscale images, each showing one of 10 different articles of clothing.

In [3]:
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

Run this cell to view a random sample from the training dataset.

In [4]:
labels_map = {
    0: "T-Shirt",
    1: "Trouser",
    2: "Pullover",
    3: "Dress",
    4: "Coat",
    5: "Sandal",
    6: "Shirt",
    7: "Sneaker",
    8: "Bag",
    9: "Ankle Boot",
}
figure = plt.figure(figsize=(8, 8))
cols, rows = 3, 3
# for i in range(1, cols * rows + 1):
#     sample_idx = torch.randint(len(training_data), size=(1,)).item()
#     img, label = training_data[sample_idx]
#     figure.add_subplot(rows, cols, i)
#     plt.title(labels_map[label])
#     plt.axis("off")
#     plt.imshow(255 - img.squeeze(), cmap="gray")
# plt.show()

<Figure size 576x576 with 0 Axes>

Here are some helper functions used throughout this assignment.

In [5]:
def train_loop(model, transform_fn, loss_fn, optimizer, dataloader, num_epochs):
    tbar = tqdm(range(num_epochs))
    for _ in tbar:
        loss_total = 0.
        for i, (x, y) in enumerate(dataloader):
            x = transform_fn(x)
            pred = model(x)
            loss = loss_fn(pred, y.squeeze(-1))
            ## Parameter updates
            model.zero_grad()
            loss.backward()
            optimizer.step()

            loss_total += loss.item()
        tbar.set_description(f"Train loss: {loss_total/len(dataloader)}")
        
    return loss_total/len(dataloader)

In [6]:
def calculate_test_accuracy(model, transform_fn, test_dataloader):
    y_true = []
    y_pred = []
    tf = nn.Flatten()
    for (xi, yi) in test_dataloader:
        xi = transform_fn(xi)
        pred = model(xi)
        yi_pred = pred.argmax(-1)
        y_true.append(yi)
        y_pred.append(yi_pred)
    y_true = torch.cat(y_true, dim = 0)
    y_pred = torch.cat(y_pred, dim = 0)

    accuracy = (y_true == y_pred).float().mean()
    return accuracy
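
The accuracy computation at the end of `calculate_test_accuracy` can be checked by hand on a tiny example. The sketch below uses made-up scores for four samples and three classes rather than real model output; `toy_pred`, `toy_true`, `toy_labels`, and `toy_accuracy` are hypothetical names used only for illustration.

```python
import torch

# Hypothetical scores for 4 samples and 3 classes (illustrative values only).
toy_pred = torch.tensor([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
])
toy_true = torch.tensor([1, 0, 2, 2])

# argmax over the last dimension picks the highest-scoring class per sample.
toy_labels = toy_pred.argmax(-1)

# Comparing with the true labels gives booleans; their mean is the accuracy.
toy_accuracy = (toy_true == toy_labels).float().mean()
print(toy_labels, toy_accuracy)
```

Three of the four predictions match here, so the accuracy is 0.75.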

# Question 1: MLP

**1(a)** On PS10 you implemented a multilayer perceptron (MLP) using JAX. Now you will implement one using PyTorch and train it to classify images.

Recall that an MLP consists of a linear input layer, an activation function, and a linear output layer. Write a class called `MultiClassMLP` that subclasses `nn.Module`. This module contains one attribute, `net`, which is an `nn.Sequential` object that is called in the `.forward(x)` method.
Your task is to write the `__init__()` method to correctly construct `net`.

For example, if `num_features=784, num_hidden=256, num_classes=10`:

```
>>> mlp = MultiClassMLP(28**2, 256, 10)
>>> mlp.net

Sequential(
  (0): Linear(in_features=784, out_features=256, bias=True)
  (1): Sigmoid()
  (2): Linear(in_features=256, out_features=10, bias=True)
  (3): LogSoftmax(dim=-1)
)
```
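
Because the last layer is `LogSoftmax(dim=-1)`, the network outputs log probabilities rather than probabilities. As a minimal sketch, assuming a random fake batch in place of real images, the code below builds the same layer stack as the printed example and checks that exponentiating each output row gives values that sum to 1.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Same layer stack as the printed example, built directly for illustration.
net = nn.Sequential(
    nn.Linear(784, 256),
    nn.Sigmoid(),
    nn.Linear(256, 10),
    nn.LogSoftmax(dim=-1),
)

x = torch.rand(5, 784)                   # fake batch of 5 flattened 28x28 images
log_probs = net(x)                       # shape (5, 10), every entry <= 0
row_sums = log_probs.exp().sum(dim=-1)   # each row should sum to about 1
print(log_probs.shape, row_sums)
```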

In [7]:
class MultiClassMLP(nn.Module):
    def __init__(self, num_features, num_hidden, num_classes):
        """
        Arguments:
            num_features: The number of features in the input.
            num_hidden: Number of hidden features in the hidden layer.
            num_classes: Number of possible classes in the output
        """
        super().__init__()
        self.net = nn.Sequential(
                nn.Linear(num_features, num_hidden,bias=True),
                nn.Sigmoid(),
                nn.Linear(num_hidden, num_classes,bias=True),
                nn.LogSoftmax(dim=-1)
                )
        
    def forward(self, x):
        return self.net(x)

In [8]:
mlp = MultiClassMLP(28**2, 256, 10)
isinstance(mlp, nn.Module)
isinstance(mlp.net, nn.Sequential)

True

In [9]:
grader.check("q1a")

<!-- BEGIN QUESTION -->

**1(b)** Construct a `DataLoader` object of the Fashion MNIST training dataset.

In [11]:
train_dataloader = torch.utils.data.DataLoader(
                                training_data, batch_size=128
                                )

In [12]:
train_dataloader

<torch.utils.data.dataloader.DataLoader at 0x7fe640ce2f10>
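
To see what a `DataLoader` yields without downloading anything, here is a small sketch on synthetic data. The `TensorDataset` of random tensors is a stand-in for Fashion MNIST, and the names `fake_images`, `fake_labels`, `fake_data`, and `loader` are assumptions made for this example.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for Fashion MNIST: 10 fake images with integer labels.
fake_images = torch.rand(10, 1, 28, 28)
fake_labels = torch.randint(0, 10, (10,))
fake_data = TensorDataset(fake_images, fake_labels)

# With 10 samples and batch_size=4, the loader yields batches of 4, 4, and 2.
loader = DataLoader(fake_data, batch_size=4)
batch_shapes = [(x.shape, y.shape) for x, y in loader]
for x_shape, y_shape in batch_shapes:
    print(x_shape, y_shape)
```

Passing `shuffle=True` makes the loader reshuffle the samples every epoch, which is a common choice for training data.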

<!-- END QUESTION -->

**1(c)** Initialize a `MultiClassMLP` object called `mlp` and train it using the `train_loop()` function given at the beginning of the assignment (do not modify the `train_loop()` function). We will test your trained `mlp` object on unseen test data.

Hints:
-  You need to initialize a `torch.optim.Optimizer` object for gradient descent. A standard choice is `torch.optim.Adam` with a learning rate of `1e-3`.
-  You need to flatten the Fashion MNIST images before passing them to `MultiClassMLP`. This should be done with the `transform_fn` argument to `train_loop`. Try `nn.Flatten()`.
-  The outputs of `MultiClassMLP` are the log probabilities of each class. To train your model, you should therefore use the negative log-likelihood loss, `nn.NLLLoss()`, as the loss function.
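
The flattening and loss hints can be tried on fake data first. In this sketch the image batch, scores, and targets are all random placeholders (assumptions, not Fashion MNIST). It shows that `nn.Flatten()` turns a `(N, 1, 28, 28)` batch into `(N, 784)`, and that `nn.NLLLoss()` applied to log-softmax outputs matches `nn.CrossEntropyLoss()` applied to the raw scores.

```python
import torch
from torch import nn
from torch.nn import functional as F

torch.manual_seed(0)

batch = torch.rand(8, 1, 28, 28)       # fake image batch
flat = nn.Flatten()(batch)             # shape (8, 784)

logits = torch.randn(8, 10)            # made-up raw scores
targets = torch.randint(0, 10, (8,))   # made-up class labels

# NLLLoss expects log probabilities, which a LogSoftmax layer provides.
nll = nn.NLLLoss()(F.log_softmax(logits, dim=-1), targets)
# CrossEntropyLoss combines log-softmax and NLL, so it takes raw scores.
ce = nn.CrossEntropyLoss()(logits, targets)
print(flat.shape, nll.item(), ce.item())
```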

In [13]:
mlp = MultiClassMLP(784,256,10)
mlp_optimizer = torch.optim.Adam(mlp.parameters(),lr=0.003) 
train_loop(model=mlp, transform_fn=nn.Flatten(), loss_fn=nn.NLLLoss(), optimizer=mlp_optimizer, 
           dataloader=train_dataloader, num_epochs=30)
       

Train loss: 0.1183923603549822: 100%|██████████| 30/30 [02:15<00:00,  4.50s/it] 


0.1183923603549822

In [14]:
print(mlp)

MultiClassMLP(
  (net): Sequential(
    (0): Linear(in_features=784, out_features=256, bias=True)
    (1): Sigmoid()
    (2): Linear(in_features=256, out_features=10, bias=True)
    (3): LogSoftmax(dim=-1)
  )
)


In [15]:
test_data = datasets.FashionMNIST(root="data", train=False, download=True, transform=ToTensor())
logistic_test_dataloader = DataLoader(test_data, batch_size=1000, shuffle=True, num_workers=0)
accuracy = calculate_test_accuracy(mlp, nn.Flatten(), logistic_test_dataloader)
accuracy >0.75

tensor(True)

In [16]:
grader.check("q1c")

# Question 2: ConvNets

**2(a)** Convolutional Neural Networks (CNNs) are neural networks that take advantage of spatial structure in inputs such as images. Because a convolutional layer reuses the same small filters across the whole image, CNNs are often more parameter-efficient than MLPs.
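
One rough way to see the difference between the two layer types is to count parameters. The sketch below compares a fully connected layer on flattened 28x28 images with a small convolutional layer. The layer sizes are illustrative assumptions, `count_params` is a hypothetical helper, and parameter count is only one aspect of efficiency.

```python
from torch import nn

def count_params(module):
    """Total number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

linear = nn.Linear(28 * 28, 256)         # fully connected layer
conv = nn.Conv2d(1, 64, kernel_size=3)   # 64 filters of size 3x3

linear_params = count_params(linear)     # 784 * 256 weights + 256 biases
conv_params = count_params(conv)         # 64 * 1 * 3 * 3 weights + 64 biases
print(linear_params, conv_params)
```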

Write a class called `MultiClassConvNet`, which adds convolutional layers to the MLP in Question 1. Just like `MultiClassMLP`, your class should have a single attribute called `net` of type `nn.Sequential` that is called in the forward method.
`convnet.net` should have the following structure:
```
>>> convnet = MultiClassConvNet(
    side_length=28,
    conv_channels_1=64,
    conv_channels_2=32,
    linear_hidden=256,
    num_classes=10
)
>>> convnet.net
Sequential(
  (0): Conv2d(...)
  (1): MaxPool2d(...)
  (2): ReLU()
  (3): Conv2d(...)
  (4): MaxPool2d(...)
  (5): ReLU()
  (6): Flatten(...)
  (7): Linear(...)
  (8): ReLU()
  (9): Linear(..., out_features=10)
  (10): LogSoftmax(dim=-1)
)
```

There are various parameters that must be supplied to each layer. Your job is to experiment with them and understand how they affect classification accuracy. 

Hint: To calculate the size of `in_features` for the first `Linear` layer, you need to keep track of how each `Conv2d` and `MaxPool2d` layer changes the image dimensions. We provide the function `conv_out_size` to help with this.
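
The shape bookkeeping in the hint can be sketched as follows. The kernel sizes, strides, and channel counts are illustrative assumptions, not required values, and `conv_out_size` is redefined here with the same formula as the provided helper so that the example runs on its own. The result is then checked against real layers applied to a fake batch.

```python
import torch
from torch import nn

def conv_out_size(slen, kernel_size, stride):
    # Same formula as the provided helper (assumes no padding).
    return int((slen - kernel_size) / stride + 1)

# Illustrative layer settings; other choices give other sizes.
s = 28
s = conv_out_size(s, kernel_size=3, stride=1)   # Conv2d:    28 -> 26
s = conv_out_size(s, kernel_size=2, stride=2)   # MaxPool2d: 26 -> 13
s = conv_out_size(s, kernel_size=4, stride=1)   # Conv2d:    13 -> 10
s = conv_out_size(s, kernel_size=2, stride=2)   # MaxPool2d: 10 -> 5

# Check the arithmetic against real layers on a fake batch.
layers = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=3, stride=1),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(64, 32, kernel_size=4, stride=1),
    nn.MaxPool2d(kernel_size=2, stride=2),
)
out = layers(torch.rand(1, 1, 28, 28))
print(s, out.shape)
```

With these settings the flattened feature count is `32 * s * s`, which is what the first `Linear` layer would need as `in_features`.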

In [17]:
def conv_out_size(slen, kernel_size, stride):
    return int((slen - kernel_size) / stride + 1)

class MultiClassConvNet(torch.nn.Module):
    def __init__(self, side_length, conv_channels_1, conv_channels_2, linear_hidden, num_classes):
        """
        Arguments:
            side_length: Side-length of input images (assumed to be square)
            conv_channels_1: Number of channels output from first conv layer
            conv_channels_2: Number of channels output from second conv layer
            linear_hidden: Number of hidden units in linear layer
            num_classes: Number of classes in output
        """
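        super().__init__()
        # Track the spatial size through each conv/pool layer (no padding),
        # so the first Linear layer adapts to side_length.
        s = conv_out_size(side_length, kernel_size=3, stride=1)  # conv 1
        s = conv_out_size(s, kernel_size=2, stride=2)            # pool 1
        s = conv_out_size(s, kernel_size=4, stride=1)            # conv 2
        s = conv_out_size(s, kernel_size=2, stride=2)            # pool 2
        self.net = nn.Sequential(
                nn.Conv2d(1, conv_channels_1, kernel_size=3, stride=1, padding=0),
                nn.MaxPool2d(kernel_size=2, stride=2, padding=0),
                nn.ReLU(),
                nn.Conv2d(conv_channels_1, conv_channels_2, kernel_size=4, stride=1, padding=0),
                nn.MaxPool2d(kernel_size=2, stride=2, padding=0),
                nn.ReLU(),
                nn.Flatten(),
                nn.Linear(conv_channels_2 * s * s, linear_hidden),
                nn.ReLU(),
                nn.Linear(linear_hidden, num_classes),
                nn.LogSoftmax(dim=-1)
                )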
        
    def forward(self, x):
        return self.net(x)

In [18]:
convnet = MultiClassConvNet(side_length=28, conv_channels_1=64, conv_channels_2=32, linear_hidden=256, num_classes=10)

In [19]:
grader.check("q2a")

**2(b)** Initialize a `MultiClassConvNet` object called `convnet` and train it using the `train_loop` function as in Question 1.

In [20]:
train_dataloader = torch.utils.data.DataLoader(
                                training_data, batch_size=30
                                )

<!-- BEGIN QUESTION -->



In [21]:
convnet = MultiClassConvNet(side_length=28, conv_channels_1=64, conv_channels_2=32, linear_hidden=256, num_classes=10)

convnet_optimizer = torch.optim.Adam(convnet.parameters(),lr=0.003) 

# Conv2d expects image-shaped input, so pass each batch through unchanged.
train_loop(model=convnet, transform_fn=nn.Identity(), loss_fn=nn.NLLLoss(), optimizer=convnet_optimizer,
           dataloader=train_dataloader, num_epochs=5)

Train loss: 0.2158765374980867: 100%|██████████| 5/5 [05:35<00:00, 67.03s/it] 


0.2158765374980867

In [22]:
grader.check("q2b")

<!-- END QUESTION -->



---

To double-check your work, the cell below will rerun all of the autograder tests.

In [23]:
grader.check_all()

q1a results: All test cases passed!

q1c results: All test cases passed!

q2a results: All test cases passed!

q2b results: All test cases passed!

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

Upload this .zip file to Gradescope for grading.

In [25]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False)