<a href="https://colab.research.google.com/github/wingated/cs474_labs_f2019/blob/master/DL_Lab3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab 3: Intro to CNNs and DNNs

## Objectives

* Build and train a deep conv net
* Explore and implement various initialization techniques
* Implement a parameterized module in Pytorch
* Use a principled loss function

## Video Tutorial
[https://youtu.be/3TAuTcx-VCc](https://youtu.be/3TAuTcx-VCc)

## Deliverable
For this lab, you will submit an ipython notebook via learningsuite.
This is where you build your first deep neural network!

For this lab, we'll be combining several different concepts that we've covered during class,
including new layer types, initialization strategies, and an understanding of convolutions.

## Grading Standards:
* 20% Part 0: Successfully followed lab video and typed in code
* 20% Part 1: Re-implement Conv2D and CrossEntropy loss function
* 20% Part 2: Implement different initialization strategies
* 10% Part 3: Print parameters, plot train/test accuracy
* 10% Reach 85% validation accuracy from parts 1-3
* 10% Part 4: Convolution parameters quiz
* 10% Tidy and legible figures, including labeled axes where appropriate
___

### Part 0
Watch and follow video tutorial:

[https://youtu.be/3TAuTcx-VCc](https://youtu.be/3TAuTcx-VCc)

**TODO:**

* Watch tutorial

**DONE:**

In [1]:
!pip3 install torch
!pip3 install torchvision
!pip3 install tqdm



You should consider upgrading via the 'c:\users\johnson\appdata\local\programs\python\python39\python.exe -m pip install --upgrade pip' command.




You should consider upgrading via the 'c:\users\johnson\appdata\local\programs\python\python39\python.exe -m pip install --upgrade pip' command.




You should consider upgrading via the 'c:\users\johnson\appdata\local\programs\python\python39\python.exe -m pip install --upgrade pip' command.


In [3]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import numpy as np
import matplotlib.pyplot as plt
from torchvision import transforms, utils, datasets
from tqdm import tqdm
from torch.nn.parameter import Parameter
import pdb

assert torch.cuda.is_available(), "You need to request a GPU from Runtime > Change Runtime"

In [12]:
X = np.random.random((3*3*5, 7))
U, _, Vt = np.linalg.svd(X, full_matrices=False)

In [15]:
U.shape, Vt.shape

((45, 7), (7, 7))

In [18]:
U.T.dot(U).shape

(7, 7)

In [25]:
W=U.T.reshape(7,5,3,3)

In [28]:
(W[0,:]*W[2,:]).sum()

-7.632783294297951e-17

In [23]:
a=torch.tensor()
a.shape

torch.Size([3])

In [4]:
class LinearNetwork(nn.Module):
  def __init__(self,dataset, num_inner_neurons):
    super(LinearNetwork, self).__init__()
    x,y = dataset[0]
    c,h,w = x.size()
    out_dim = 10
    self.num_inner_neurons = num_inner_neurons

    self.net = nn.Sequential(nn.Linear(c*h*w, self.num_inner_neurons), nn.ReLU(), nn.Linear(self.num_inner_neurons, out_dim))

  def forward(self,x):
    n,c,h,w = x.size()
    flattened = x.view(n,c*h*w)
    return self.net(flattened)


class Conv2d(nn.Module):
  def __init__(self,in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None, init="xe"):
    self.__dict__.update(locals())
    super(Conv2d, self).__init__()
    self.weight = Parameter(torch.Tensor(out_channels, in_channels, kernel_size[0], kernel_size[1]))
    self.bias = Parameter(torch.Tensor(out_channels))
    if init=="uniform":
      self.weight.data._uniform(-1,1)
      self.bias.data[:] = 0
    elif init=="orth":
      X = np.random.random((kernel_size[0]*kernel_size[1]*in_channels, out_channels))
      U, _, Vt = np.linalg.svd(X, full_matrices=False)
      print(X.shape, U.shape)
      W = U.T.reshape(out_channels, in_channels, kernel_size[0], kernel_size[1])
      self.weight.data[:] = torch.tensor(W)
      self.bias.data[:] = 0
    elif init=="xe":
      var = 2/in_channels**2
      
      self.weight.data = torch.normal( )._normal(0,2/in_channels**2)
      self.bias.data[:] = 0
    else:
      raise ValueError("Unknown init type")


    


  def forward(self,x):
    return F.conv2d(x,self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups)

  def extra_repr(self):
    return "?"


class ConvNetwork(nn.Module):
  def __init__(self, dataset,init="xe"):
    super(ConvNetwork,self).__init__()
    x,y = dataset[0]
    c,h,w = x.size()
    out_dim = 10

    self.nt = nn.Sequential( Conv2d(c,10,(3,3), padding=(1,1),init=init),
                             torch.nn.ReLU(),
                             Conv2d(10,out_dim,(28,28), padding=(0,0),init=init) )

  def forward(self,x):
    return self.net(x).squeeze(2).squeeze(2)


class ConvNetwork2(nn.Module):
  def __init__(self, dataset,init="xe"):
    super(ConvNetwork,self).__init__()
    x,y = dataset[0]
    c,h,w = x.size()
    out_dim = 10

    self.nt = nn.Sequential( Conv2d(c,10,(3,3), padding=(1,1),init=init),
                             torch.nn.ReLU(),
                             Conv2d(10,20,(3,3), padding=(1,1),init=init),
                             torch.nn.ReLU(),
                             Conv2d(20,out_dim,(28,28), padding=(0,0),init=init))
  def forward(self,x):
    return self.net(x).squeeze(2).squeeze(2)

class ConvNetwork3(nn.Module):
  def __init__(self, dataset,init="xe"):
    super(ConvNetwork,self).__init__()
    x,y = dataset[0]
    c,h,w = x.size()
    out_dim = 10

    self.nt = nn.Sequential( Conv2d(c,10,(3,3), padding=(1,1),init=init),
                             torch.nn.ReLU(),
                             Conv2d(10,10,(3,3), padding=(1,1),init=init),
                             torch.nn.MaxPool2d(2),
                             Conv2d(10,20,(3,3), padding=(1,1),init=init),
                             torch.nn.ReLU(),
                             Conv2d(20,out_dim,(14,14), padding=(0,0),init=init)
                              )

  def forward(self,x):
    return self.net(x).squeeze(2).squeeze(2)
  

    
class FMPDataset(Dataset):
  def __init__(self, root, train=True):
    self.data = datasets.FashionMNIST(root, train=train, transform= transforms.ToTensor(), download = True)

  def __getitem__(self,i):
    return self.data[i]

  def __len__(self):
    return len(self.data)

class CrossEntropyLoss(nn.Module):
  def __init__(self):
    super(CrossEntropyLoss, self).__init__()

  def forward(self, y_hat, y_truth):
    yhe = y_hat.exp()
    score = yhe/yhe.sum(dim=1, keepdim=True)
    score_of_correct = score[range(y_truth.size(0)), y_truth]
    return -torch.log(score_of_correct).mean()
    

In [6]:
train_dataset = FMPDataset("/tmp/fashionmnist", train=True)
val_dataset = FMPDataset("/tmp/fashionmnist", train=False)

bs = 42
#model = LinearNetwork(train_dataset,1000)
#model = model.cuda()
train_loader = DataLoader(train_dataset, batch_size = bs, pin_memory = True)
validation_loader = DataLoader(val_dataset, batch_size = bs)


#objective = torch.nn.CrossEntropyLoss()
myobjective = CrossEntropyLoss()


def do_it(model):
  model = model.cuda()
  optimizer= optim.SGD(model.parameters(), lr = 1e-4)
  losses = []
  validations = []

  num_epochs = 30
  loop =tqdm(total=len(train_loader)*num_epochs, position = 0)

  for epoch in range(num_epochs):
    for  batch, (x,y_truth) in enumerate(train_loader):
      x,y_truth = x.cuda(non_blocking=True), y_truth.cuda(non_blocking=True)

      optimizer.zero_grad()
      y_hat =  model(x)
      loss = myobjective(y_hat, y_truth)

      #assert loss - myobjective(y_hat, y_truth) < 1e-6, f"myloss {myobjective(y_hat, y_truth)} != loss {loss}"

      loss.backward()

      losses.append(loss.item())
      accuracy = 0

      #loop.set_description("batch:{} loss:{:.4f} val_loss:?".format(batch, loss.item()))
      loop.update(1)

      optimizer.step()

      if batch %1000 == 0:
        val = np.mean( [myobjective(model(x.cuda()), y.cuda()).item() for x,y in validation_loader])
        validations.append((len(losses),val))

      loop.set_description("batch:{} loss:{:.4f} val_loss:{:.4f}".format(batch, loss.item(), validations[-1][1]))

  loop.close()


In [31]:
do_it(LinearNetwork(train_dataset,1000))

batch:1428 loss:0.9209 val_loss:0.9089: 100%|██████████| 42870/42870 [07:28<00:00, 95.64it/s] 


In [7]:
do_it(ConvNetwork(train_dataset,init="orth"))

(9, 10) (9, 9)


ValueError: cannot reshape array of size 81 into shape (10,1,3,3)

In [8]:
do_it(ConvNetwork(train_dataset,init="xe"))

AttributeError: 'Tensor' object has no attribute '_normal'

In [None]:
do_it(ConvNetwork(train_dataset,init="uniform"))

___

### Part 1
Re-implement a Conv2D module with parameters and a CrossEntropy loss function.

**TODO:**

* CrossEntropyLoss 
* Conv2D

**DONE:**

___

### Part 2
Implement a few initialization strategies which can include Xe initialization
(sometimes called Xavier), Orthogonal initialization, and uniform random.
You can specify which strategy you want to use with a parameter. 



Helpful links include:
*  [Orthogonal Initialization](https://hjweide.github.io/orthogonal-initialization-in-convolutional-layers) (or the original paper: http://arxiv.org/abs/1312.6120)
*  http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization

**TODO:**
* Parameterize custom Conv2D for different initilization strategies
* Xe
* Orthogonal
* Uniform

**DONE:**



In [7]:
y_hat.size()

torch.Size([24, 10])

In [7]:
y = torch.rand(3,4)

In [9]:
y.exp().log()-y

tensor([[ 0.0000e+00,  4.4238e-08, -5.5879e-08,  0.0000e+00],
        [-4.8429e-08,  1.4901e-08,  0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,  5.9605e-08,  0.0000e+00, -5.9605e-08]])


___

### Part 3
Print the number of parameters in your network and plot accuracy of your training and validation 
set over time. You should experiment with some deep networks and see if you can get a network 
with close to 1,000,000 parameters.

Once you've experimented with multiple network setups and the different initialization strategies, plot the best-performing experiment here. You should be able to exceed 85% accuracy on the validation set.

**TODO:**
* Experiment with Deep Networks
* Plot accuracy of training and validation set over time
* Print out number of parameters in the model 
* Plot experiment results with 85% or better validation accuracy

**DONE:**


In [None]:

# Go back up and try a few different networks and initialization strategies
# Plot loss if you want
# Plot accuracy



In [None]:
# Compute and print the number of parameters in the model


___

### Part 4
Learn about how convolution layers affect the shape of outputs, and answer the following quiz questions. Include these in a new markdown cell in your jupyter notebook.


*Using a Kernel size of 3×3 what should the settings of your 2d convolution be that results in the following mappings (first answer given to you)*

* (c=3, h=10, w=10) ⇒ (c=10, h=8, w=8) : (out_channels=10, kernel_size=(3, 3), padding=(0, 0))
* (c=3, h=10, w=10) ⇒ (c=22, h=10, w=10) : **Your answer in bold here**
* (c=3, h=10, w=10) ⇒ (c=65, h=12, w=12) : **Your answer in bold here**
* (c=3, h=10, w=10) ⇒ (c=7, h=20, w=20) : **Your answer in bold here**

*Using a Kernel size of 5×5:*)

* (c=3, h=10, w=10) ⇒ (c=10, h=8, w=8) : (out_channels=10, kernel_size=(5, 5), padding=(1, 1))
* (c=3, h=10, w=10) ⇒ (c=100, h=10, w=10) : **Your answer in bold here**
* (c=3, h=10, w=10) ⇒ (c=23, h=12, w=12) : **Your answer in bold here**
* (c=3, h=10, w=10) ⇒ (c=5, h=24, w=24) : **Your answer in bold here**

*Using Kernel size of 5×3:*

* (c=3, h=10, w=10) ⇒ (c=10, h=8, w=8) : **Your answer in bold here**
* (c=3, h=10, w=10) ⇒ (c=100, h=10, w=10) : **Your answer in bold here**
* (c=3, h=10, w=10) ⇒ (c=23, h=12, w=12) : **Your answer in bold here**
* (c=3, h=10, w=10) ⇒ (c=5, h=24, w=24) : **Your answer in bold here**

*Determine the kernel that requires the smallest padding size to make the following mappings possible:*

* (c=3, h=10, w=10) ⇒ (c=10, h=9, w=7) : **Your answer in bold here**
* (c=3, h=10, w=10) ⇒ (c=22, h=10, w=10) : **Your answer in bold here**

**TODO:**

* Answer all the questions above 

**DONE:**


In [None]:
# Write some test code for checking the answers for these problems (example shown in the video)
