<a href="https://colab.research.google.com/github/ajfisch/deeplearning_bootcamp_2020/blob/master/advanced_vision_tutorial_2020.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Advanced Computer Vision Concepts with Fashion MNIST
In this tutorial, we'll take you through developing your own custom CNNs, using pre-trained CNN models, and applying common computer vision best practices.

Let's get started!

In [0]:
# http://pytorch.org/
# Install PyTorch and torchvision if they are not already available.
!pip install torch torchvision
import torch
print(torch.__version__)
print(torch.cuda.is_available())

In [0]:
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from tqdm import tqdm
import matplotlib.pyplot as plt
import numpy as np


In [0]:
#@title Helper Function to display Images { display-mode: "form" }
def plot_images(images, cls_true):
    assert len(images) == len(cls_true) == 9
    
    # Create figure with 3x3 sub-plots.
    fig, axes = plt.subplots(3, 3, figsize=(4,4))
    fig.subplots_adjust(hspace=0.3, wspace=0.3)

    for i, ax in enumerate(axes.flat):
        # Plot image.
        ax.imshow(np.array(images[i], dtype='float').reshape((28,28))*255, cmap='binary')

        # Show the true class as the label on the x-axis.
        xlabel = "True: {0}".format(cls_true[i])
        ax.set_xlabel(xlabel)
        
        # Remove ticks from the plot.
        ax.set_xticks([])
        ax.set_yticks([])
    
    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()

# The Task: Fashion-MNIST, Clothing Classification
<img src="https://github.com/zalandoresearch/fashion-mnist/blob/master/doc/img/fashion-mnist-sprite.png?raw=true">

In this lab, we'll build a neural network to classify articles of clothing.



## Step 1: Loading Data and Preprocessing
Let's start by loading the data.
We're going to normalize our images to have zero mean and unit variance. We'll do this using some [torchvision](https://pytorch.org/docs/stable/torchvision/index.html) transforms. This generally helps stabilize learning and is common practice.
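
As a rough illustration of what a normalization transform does, the sketch below computes a per-channel mean and standard deviation and then applies `(x - mean) / std`. The helper names `channel_stats` and `normalize` and the random stand-in images are assumptions made for this example; they are not part of torchvision or of this notebook.

```python
import torch

def channel_stats(images):
    # images: float tensor of shape (N, C, H, W)
    # Reduce over the batch and spatial dimensions, keeping one value per channel
    mean = images.mean(dim=(0, 2, 3))
    std = images.std(dim=(0, 2, 3))
    return mean, std

def normalize(images, mean, std):
    # Broadcast the per-channel statistics over (N, C, H, W)
    return (images - mean.view(1, -1, 1, 1)) / std.view(1, -1, 1, 1)

torch.manual_seed(0)
fake_images = torch.rand(64, 1, 28, 28)  # random stand-in for grayscale images
mean, std = channel_stats(fake_images)
normalized = normalize(fake_images, mean, std)
print(normalized.mean().item(), normalized.std().item())
```

After normalization the printed mean should be close to 0 and the standard deviation close to 1, which is the property the transform is meant to provide when the statistics match the data.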

In [0]:
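# Mean of .1307 and stdv of .3081 are the statistics commonly used for the
# original MNIST digits; Fashion-MNIST's own train-set statistics differ, so
# the normalized images are only approximately zero-mean, unit-variance.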
normalize_image = transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,))
                ])

# Dataset is loaded from torchvision
all_train = datasets.FashionMNIST('data', train=True, download=True, transform=normalize_image)

num_train = int(len(all_train)*.8)
train = [all_train[i] for i in range(num_train)]
dev = [all_train[i] for i in range(num_train,len(all_train))]
test = datasets.FashionMNIST('data', train=False, download=True, 
                      transform=normalize_image)
                           


### Review Question:
1. What functions does the FashionMNIST dataset object have to implement?

In [0]:
# Reload without transforms so each item is a raw PIL image for plotting
all_train = datasets.FashionMNIST('data', train=True, download=True)
num_examples = 9
images, labels = [], []
for i in range(num_examples):
  images.append(all_train[i][0])
  labels.append(all_train[i][1])
    
plot_images(images, labels)

In [0]:
train[0][0].size()

## Step 2: Building a model

All PyTorch models should be implemented as subclasses of `nn.Module`.

To build a model you need to:
a) define what parameters it'll need in its `__init__` function
b) define the model's computation, using those parameters, in a `forward` function.


To keep things simple, let's define a simple linear classifier, like logistic regression. We'll experiment with more complex models soon.

In [0]:
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        # Learn one weight per input value for each of the 10 classes (a linear classifier)
        self.fc = nn.Linear(3*28*28, 10)

    def forward(self, x):
        batch_size, num_channels, height, width = x.size()
        # Flatten image
        x = x.view(batch_size, -1)
        # Put it through linear classifier
        return self.fc(x)
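
A quick way to check a model like this is to push a dummy batch through it and look at the output shape and parameter count. The sketch below is self-contained; the class name `LinearClassifier` is a hypothetical stand-in that mirrors the linear model above, and the random batch is not real data.

```python
import torch
import torch.nn as nn

class LinearClassifier(nn.Module):  # hypothetical name; mirrors the linear model above
    def __init__(self, in_features=3 * 28 * 28, num_classes=10):
        super().__init__()
        self.fc = nn.Linear(in_features, num_classes)

    def forward(self, x):
        # Flatten (B, C, H, W) into (B, C*H*W) before the linear layer
        return self.fc(x.view(x.size(0), -1))

classifier = LinearClassifier()
dummy_batch = torch.rand(4, 3, 28, 28)  # 4 fake 3-channel 28x28 images
logits = classifier(dummy_batch)
num_params = sum(p.numel() for p in classifier.parameters())
print(logits.shape)  # one row of 10 class scores per image
print(num_params)    # 3*28*28*10 weights plus 10 biases
```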


## Step 3. Defining our training procedure

To train our model, let's introduce a couple new PyTorch ideas.

A [DataLoader](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) is an iterator that goes over our entire dataset and selects batches. 
We'll be using this to iterate through our train/dev/test sets.
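
To make the batching behavior concrete, here is a minimal sketch that wraps a toy dataset in a `DataLoader`. The all-zero images and the names `toy_dataset` and `toy_loader` are placeholders invented for this example.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 10 fake single-channel 28x28 images with integer labels 0..9
fake_images = torch.zeros(10, 1, 28, 28)
fake_labels = torch.arange(10)
toy_dataset = TensorDataset(fake_images, fake_labels)

toy_loader = DataLoader(toy_dataset, batch_size=4, shuffle=False)
batch_sizes = [x.size(0) for x, y in toy_loader]
print(batch_sizes)  # the last batch is smaller because 10 is not a multiple of 4
```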

Let's initialize these loaders now.

An [Optimizer](https://pytorch.org/docs/stable/optim.html) defines an update rule. In class, we've discussed vanilla SGD, which is one method to compute the next weight, given the current weight and gradient. There are plenty of other optimizers you can try from the PyTorch library; the code below uses Adam.
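
The vanilla SGD rule can be written as `w_new = w - lr * grad`. The sketch below takes a single `optim.SGD` step on a toy quadratic loss and compares it with the same update done by hand; the toy weights and loss are assumptions chosen only to keep the arithmetic easy to follow.

```python
import torch
import torch.optim as optim

lr = 0.1
w = torch.tensor([1.0, -2.0], requires_grad=True)
w_before = w.detach().clone()

optimizer = optim.SGD([w], lr=lr)
optimizer.zero_grad()
loss = (w ** 2).sum()  # the gradient of this loss is 2 * w
loss.backward()
grad = w.grad.clone()
optimizer.step()

# The same update written by hand: w_new = w - lr * grad
manual_update = w_before - lr * grad
print(w.detach(), manual_update)
```

Other optimizers keep extra per-parameter state (for example running averages of gradients), so their steps will generally not match this hand-written rule.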


In [0]:
# Training settings
epochs = 10
lr = .01
momentum = 0.5  # not used by the Adam optimizer below; relevant for e.g. optim.SGD

batch_size = 32
train_loader = torch.utils.data.DataLoader(train, batch_size=batch_size, shuffle=True)
dev_loader = torch.utils.data.DataLoader(dev, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(test, batch_size=batch_size, shuffle=True)


model = Model()
optimizer = optim.Adam(model.parameters(), lr=lr)


### Review Question:
1. What are the steps of training?
2. Will this be any different in vision vs NLP?

In [0]:
def train_epoch(model, train_loader, optimizer, epoch):
    # Fall back to the CPU when no GPU is available.
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model.train() # Set the nn.Module to train mode. 
    model = model.to(device)
    total_loss = 0
    correct = 0
    num_samples = len(train_loader.dataset)
    for batch_idx, (x, target) in enumerate(train_loader): # 1) Get batch
        x, target = x.to(device), target.to(device)
        B, C, H, W = x.size()
        # Repeat the single grayscale channel 3 times, since the models expect 3-channel input
        x = x.expand([B,3,H,W]).contiguous()
        # Reset gradient data to 0
        optimizer.zero_grad()
        # Get prediction for batch
        output = model(x)
        # 2) Compute loss (averaged over the batch)
        loss = F.cross_entropy(output, target)
        # 3) Do backprop
        loss.backward()
        # 4) Update model
        optimizer.step()
        
        ## Do book-keeping to track accuracy and avg loss
        pred = output.max(1, keepdim=True)[1] # get the index of the max logit
        correct += pred.eq(target.view_as(pred)).sum().item()
        # Undo the batch mean so the final division gives a per-sample average;
        # .item() also avoids keeping the computation graph
        total_loss += loss.item() * B

    print('Train Epoch: {} \tLoss: {:.4f}, Accuracy: {}/{} ({:.0f}%)'.format(
            epoch, total_loss / num_samples, 
            correct, 
            num_samples,
            100. * correct / num_samples))


## Step 3.5 Define our evaluation loop
Similar to above, we'll also loop through our dev or test set, and compute our loss and accuracy. 
This lets us see how well our model is generalizing. 
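
As a small worked example of the two quantities tracked during evaluation, the sketch below computes accuracy and loss for three hand-written predictions. The logits and targets are made up for illustration. It also shows why the reduction mode matters when averaging over a dataset: the summed loss equals the batch-mean loss times the batch size.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.1], [0.3, 1.5], [0.9, 0.2]])  # 3 examples, 2 classes
targets = torch.tensor([0, 1, 1])

with torch.no_grad():  # evaluation needs no gradients
    preds = logits.argmax(dim=1)
    num_correct = (preds == targets).sum().item()
    mean_loss = F.cross_entropy(logits, targets)                  # averaged over the batch
    sum_loss = F.cross_entropy(logits, targets, reduction='sum')  # summed over the batch

accuracy = num_correct / len(targets)
print(num_correct, accuracy)
```

Summing per-example losses across batches and dividing once by the dataset size gives a per-example average that does not depend on the batch size.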

In [0]:
def eval_epoch(model, test_loader, name):
    # Fall back to the CPU when no GPU is available.
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model = model.to(device)
    model.eval() # Set the nn.Module to eval mode.
    test_loss = 0
    correct = 0
    with torch.no_grad(): # No gradients are needed for evaluation
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            B, C, H, W = data.size()
            # Repeat the single grayscale channel 3 times, as in training
            data = data.expand([B,3,H,W]).contiguous()
            output = model(data)
            # Sum (not average) the batch loss so the division below is per-sample
            test_loss += F.cross_entropy(output, target, reduction='sum').item()
            pred = output.max(1, keepdim=True)[1] # get the index of the max logit
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    print('\n{} set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        name,
        test_loss, 
        correct, 
        len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))


## Step 4: Training the model

In [0]:

for epoch in range(1, epochs + 1):
    train_epoch(model, train_loader, optimizer, epoch)
    eval_epoch(model,  dev_loader, "Dev")
    print("---")

## Step 5. Experiment with an MLP
Let's try a more complex model.

In [0]:
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.fc1 = nn.Linear(3*28*28, 200)
        self.fc2 = nn.Linear(200, 200)
        self.fc3 = nn.Linear(200, 10)
        

    def forward(self, x):
        batch_size, num_channels, height, width = x.size()
        x = x.view(batch_size, -1)
        hidden = F.relu(self.fc1(x))
        hidden = F.relu(self.fc2(hidden))
        logit = self.fc3(hidden)
        return logit
    
model = Model()
optimizer = optim.Adam(model.parameters(), lr=lr)

for epoch in range(1, epochs + 1):
    train_epoch(model, train_loader, optimizer, epoch)
    eval_epoch(model,  dev_loader, "Dev")
    print("---")

## Step 6. Try a CNN


In [0]:
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.hidden_dim = 512
        self.conv1 = nn.Conv2d(3, self.hidden_dim // 4, kernel_size=3, stride=2)
        self.conv2 = nn.Conv2d(self.hidden_dim // 4, self.hidden_dim // 2, kernel_size=3, stride=2)
        self.conv3 = nn.Conv2d(self.hidden_dim // 2, self.hidden_dim, kernel_size=3, stride=1)
        self.fc = nn.Linear(self.hidden_dim, 10)
        
    def forward(self, x):
        batch_size, num_channels, height, width = x.size()
        
        hidden = F.relu(self.conv1(x))
        hidden = F.relu(self.conv2(hidden))
        hidden = F.relu(self.conv3(hidden))
        hidden = hidden.view((batch_size, self.hidden_dim, -1))
        hidden,_ = torch.max(hidden, dim=-1)
        logit = self.fc(hidden)
        return logit

model = Model()
optimizer = optim.Adam(model.parameters(), lr=lr)

for epoch in range(1, epochs + 1):
    train_epoch(model, train_loader, optimizer, epoch)
    eval_epoch(model,  dev_loader, "Dev")
    print("---")
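
To see what the final `view` and `torch.max` in the CNN above pool over, it can help to trace the spatial size through the three convolutions. The sketch below uses the standard output-size formula for a convolution without dilation; the helper `conv_output_size` is a hypothetical name introduced for this example.

```python
def conv_output_size(size, kernel_size, stride, padding=0):
    # Standard formula for a convolution without dilation
    return (size + 2 * padding - kernel_size) // stride + 1

size = 28
sizes = [size]
for kernel_size, stride in [(3, 2), (3, 2), (3, 1)]:  # conv1, conv2, conv3
    size = conv_output_size(size, kernel_size, stride)
    sizes.append(size)

positions_pooled = sizes[-1] ** 2
print(sizes, positions_pooled)
```

Under these assumptions the feature map shrinks from 28 to 13 to 6 to 4 pixels per side, so the max is taken over 16 spatial positions for each of the 512 channels before the final linear layer.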

## Try State-of-the-Art Benchmarks:

In [0]:
import torchvision.models

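# Load ImageNet-pretrained weights (newer torchvision releases prefer the
# `weights=` argument over `pretrained=True`)
model = torchvision.models.resnet18(pretrained=True)
# Replace the 1000-class ImageNet head with a 10-class head for Fashion-MNIST
model.fc = nn.Linear(512, 10)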
print(model)

In [0]:
batch_size = 64
train_loader = torch.utils.data.DataLoader(train, batch_size=batch_size, shuffle=True)
dev_loader = torch.utils.data.DataLoader(dev, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(test, batch_size=batch_size, shuffle=True)

lr = 1e-3
epochs = 10
optimizer = optim.Adam(model.parameters(), lr=lr)

for epoch in range(1, epochs + 1):
    train_epoch(model, train_loader, optimizer, epoch)
    eval_epoch(model,  dev_loader, "Dev")
    print("---")

## Step 7. Explore further.
You can try different model architectures, different optimizers, learning rates and regularization strategies. Neural networks are incredibly flexible, and so the space to explore is enormous. Once you're done exploring, take your best model (i.e. the one that achieves the best results on the dev set) and run it on test!
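
One possible way to keep track of the best model is to snapshot the model state whenever the dev accuracy improves. The sketch below shows only the selection logic in plain Python; `select_best` is a hypothetical helper, and the accuracy values are made-up numbers for illustration, not measured results.

```python
import copy

def select_best(history):
    """Return (epoch, dev_accuracy, state) for the entry with the highest dev accuracy."""
    best = None
    for epoch, dev_accuracy, state in history:
        if best is None or dev_accuracy > best[1]:
            # Copy the state so later training cannot overwrite the snapshot
            best = (epoch, dev_accuracy, copy.deepcopy(state))
    return best

# Made-up numbers purely for illustration, not measured results
toy_history = [
    (1, 0.80, {"weights": [0.1]}),
    (2, 0.86, {"weights": [0.2]}),
    (3, 0.84, {"weights": [0.3]}),
]
best_epoch, best_accuracy, best_state = select_best(toy_history)
print(best_epoch, best_accuracy)
```

In a real run the stored state would typically be `model.state_dict()`, restored later with `model.load_state_dict(...)`. Note that the evaluation function in this notebook prints its metrics, so it would need to return the accuracy for this pattern to apply.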

In [0]:
eval_epoch(model,  test_loader, "Test")

### Discussion Questions:
1. We saw that dev performance sometimes goes down during training. Why? What should we do about it?
2. Given another pretrained model, how would you retrofit it to your task?
3. When should you not use a pretrained model?
4. Could we use an RNN for vision tasks? What would that imply?
5. How would we adapt a CNN to 3D images?
6. Given a new task, which model should you try first? Why?