<a href="https://colab.research.google.com/github/wingated/cs474_labs_f2019/blob/master/DL_Lab3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab 3: Intro to CNNs and DNNs

## Objectives

* Build and train a deep conv net
* Explore and implement various initialization techniques
* Implement a parameterized module in Pytorch
* Use a principled loss function

## Video Tutorial
[https://youtu.be/3TAuTcx-VCc](https://youtu.be/3TAuTcx-VCc)

## Deliverable
For this lab, you will submit an ipython notebook via learningsuite.
This is where you build your first deep neural network!

For this lab, we'll be combining several different concepts that we've covered during class,
including new layer types, initialization strategies, and an understanding of convolutions.

## Grading Standards:
* 20% Part 0: Successfully followed lab video and typed in code
* 20% Part 1: Re-implement Conv2D and CrossEntropy loss function
* 20% Part 2: Implement different initialization strategies
* 10% Part 3: Print parameters, plot train/test accuracy
* 10% Reach 85% validation accuracy from parts 1-3
* 10% Part 4: Convolution parameters quiz
* 10% Tidy and legible figures, including labeled axes where appropriate
___

### Part 0
Watch and follow video tutorial:

[https://youtu.be/3TAuTcx-VCc](https://youtu.be/3TAuTcx-VCc)

**TODO:**

* Watch tutorial

**DONE:**

In [3]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import numpy as np
import matplotlib.pyplot as plt
from torchvision import transforms, utils, datasets
from tqdm import tqdm
from torch.nn.parameter import Parameter
import pdb

assert torch.backends.mps.is_available(), "You need to request a GPU from Runtime > Change Runtime"

In [5]:
from torch.nn.parameter import Parameter
import pdb

In [10]:
# Use the dataset class you created in lab2
class FashionMNISTProcessedDataset(Dataset):
    def __init__(self, root, train=True):
        self.data = datasets.FashionMNIST(root, train=train, transform=transforms.ToTensor(), download=True)
    
    def __getitem__(self, i):
        x, y = self.data[i]
        return x, y

    def __len__(self):
        return(len(self.data))

___

### Part 1
Re-implement a Conv2D module with parameters and a CrossEntropy loss function.

**TODO:**

* CrossEntropyLoss 
* Conv2D

**DONE:**

___

### Part 2
Implement a few initialization strategies which can include Xe initialization
(sometimes called Xavier), Orthogonal initialization, and uniform random.
You can specify which strategy you want to use with a parameter. 



Helpful links include:
*  [Orthogonal Initialization](https://hjweide.github.io/orthogonal-initialization-in-convolutional-layers) (or the original paper: http://arxiv.org/abs/1312.6120)
*  http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization

**TODO:**
* Parameterize custom Conv2D for different initilization strategies
* Xe
* Orthogonal
* Uniform

**DONE:**



In [87]:
a = torch.rand(10,10,2)

In [89]:
a.dtype

torch.float32

In [131]:
class CrossEntropyLoss(nn.Module):
  pass

class Conv2d(nn.Module):
  def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None):
    self.__dict__.update(locals()) # keys are arguments we pass in! that way we can just use a dot for any of the argument inputs when we want to reference them, ie self.in_channels
    super(Conv2d, self).__init__()

    self.weight = Parameter(torch.Tensor(out_channels,
                               in_channels, *kernel_size))
    self.bias = Parameter(torch.Tensor(out_channels))

    self.weight.data.uniform_(-1,1)
    self.bias.data.uniform_(0,0)

    self.weight.data = self.weight.data.to("mps")
    self.bias.data = self.bias.data.to("mps")

  def forward(self, x):
    return F.conv2d(x, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups)

  # def extra_repr(self) -> str:
  #   return super().extra_repr()

Numpy will produce float64 by default, but we need float32, so to get around this, do something like this:
\begin{align*}
    \verb+torch.from_numpy(np.random.rand(10,10).astype(np.float32))+
\end{align*}
and this will make sure the datatypes are all the same.

In [132]:
class ConvNetwork(nn.Module):
  def __init__(self, dataset):
    super(ConvNetwork, self).__init__()
    x, y = dataset[0]
    c, h, w = x.size()
    output = 10

    self.net = nn.Sequential(
      Conv2d(c, 10, (3,3), padding=(1,1)),
      Conv2d(10, output, (28,28), padding=(0,0)) # 28 x 28 indicates the image size
    )

  def forward(self, x):
    return self.net(x).squeeze(2).squeeze(2)

In [134]:
# Initialize device 
device = "mps"
# Initialize Datasets
train_dataset = FashionMNISTProcessedDataset("/tmp/fashionmnist", train=True)
val_dataset = FashionMNISTProcessedDataset("tmp/fashionmnist",train=False)
# Initialize DataLoaders
train_loader = DataLoader(train_dataset, batch_size=42, pin_memory=True)
val_loader = DataLoader(val_dataset, batch_size=42)
# Initialize Model
model = ConvNetwork(train_dataset)
model = model.to(device)
# Initialize Objective and Optimizer and other parameters
objective = nn.CrossEntropyLoss() #TODO: create Cross entropy loss
optimizer = optim.Adam(model.parameters(), lr=1e-4)
# Initialize empty train and validation loss lists
train_loss = []
val_loss = []
# Number of epochs to run through
num_epochs = 100

In [135]:
# model.net[0].weight.data.uniform_(-1,1)
# model.net[0].weight.data

# o, i, k1, k2 = model.net[0].weight.size()
# W = np.random.randn(*model.net[0].weight.data.size())

# model.net[0].weight.data = torch.from_numpy(W)

In [136]:
# Run your training and validation loop and collect stats
loop = tqdm(total=len(train_loader) * num_epochs, position=0)
for epoch in range(num_epochs):
    batch = 0
    for x, y_truth in train_loader:
        x, y_truth = x.to(device), y_truth.to(device)

        optimizer.zero_grad()

        y_pred = model(x)
        loss = objective(y_pred, y_truth)

        if epoch % 2 == 0 and batch == 0:
            val_loss_list = []
            train_loss.append(loss.item())
            for val_x, val_y_truth in val_loader:
                val_x, val_y_truth = val_x.to(device), val_y_truth.to(device)
                val_y_pred = model(val_x)
                val_loss_list.append(objective(val_y_pred, val_y_truth).item())
            val_loss.append(np.mean(val_loss_list))
        
        loop.set_description("epoch no.:" + str(epoch) + " batch no.:" + str(batch) + " loss:" + str(loss.item()) + " val_loss:" + str(val_loss[-1]))

        loss.backward()
        optimizer.step()

        batch += 1

loop.close()


  0%|          | 0/142900 [02:16<?, ?it/s]
epoch no.:2 batch no.:428 loss:8.282930374145508 val_loss:4.594259302735828:   0%|          | 0/142900 [00:30<?, ?it/s]  

KeyboardInterrupt: 


___

### Part 3
Print the number of parameters in your network and plot accuracy of your training and validation 
set over time. You should experiment with some deep networks and see if you can get a network 
with close to 1,000,000 parameters.

Once you've experimented with multiple network setups and the different initialization strategies, plot the best-performing experiment here. You should be able to exceed 85% accuracy on the validation set.

**TODO:**
* Experiment with Deep Networks
* Plot accuracy of training and validation set over time
* Print out number of parameters in the model 
* Plot experiment results with 85% or better validation accuracy

**DONE:**


In [3]:

# Go back up and try a few different networks and initialization strategies
# Plot loss if you want
# Plot accuracy



In [2]:
# Compute and print the number of parameters in the model


___

### Part 4
Learn about how convolution layers affect the shape of outputs, and answer the following quiz questions. Include these in a new markdown cell in your jupyter notebook.


*Using a Kernel size of 3×3 what should the settings of your 2d convolution be that results in the following mappings (first answer given to you)*

* (c=3, h=10, w=10) ⇒ (c=10, h=8, w=8) : (out_channels=10, kernel_size=(3, 3), padding=(0, 0))
* (c=3, h=10, w=10) ⇒ (c=22, h=10, w=10) : **Your answer in bold here**
* (c=3, h=10, w=10) ⇒ (c=65, h=12, w=12) : **Your answer in bold here**
* (c=3, h=10, w=10) ⇒ (c=7, h=20, w=20) : **Your answer in bold here**

*Using a Kernel size of 5×5:*)

* (c=3, h=10, w=10) ⇒ (c=10, h=8, w=8) : (out_channels=10, kernel_size=(5, 5), padding=(1, 1))
* (c=3, h=10, w=10) ⇒ (c=100, h=10, w=10) : **Your answer in bold here**
* (c=3, h=10, w=10) ⇒ (c=23, h=12, w=12) : **Your answer in bold here**
* (c=3, h=10, w=10) ⇒ (c=5, h=24, w=24) : **Your answer in bold here**

*Using Kernel size of 5×3:*

* (c=3, h=10, w=10) ⇒ (c=10, h=8, w=8) : **Your answer in bold here**
* (c=3, h=10, w=10) ⇒ (c=100, h=10, w=10) : **Your answer in bold here**
* (c=3, h=10, w=10) ⇒ (c=23, h=12, w=12) : **Your answer in bold here**
* (c=3, h=10, w=10) ⇒ (c=5, h=24, w=24) : **Your answer in bold here**

*Determine the kernel that requires the smallest padding size to make the following mappings possible:*

* (c=3, h=10, w=10) ⇒ (c=10, h=9, w=7) : **Your answer in bold here**
* (c=3, h=10, w=10) ⇒ (c=22, h=10, w=10) : **Your answer in bold here**

**TODO:**

* Answer all the questions above 

**DONE:**


In [1]:
# Write some test code for checking the answers for these problems (example shown in the video)
