Do not delete this cell. It defines custom LaTeX commands.
$$
\newcommand{\xb}{\boldsymbol{x}}
\newcommand{\wb}{\boldsymbol{w}}
\newcommand{\pb}{\boldsymbol{p}}
\newcommand{\1}{\mathbb{1}}
$$

# **Convolutions and Convolutional Neural Networks**

**Here you'll experiment with convolutions, on CPUs and GPUs, and with convolutional neural networks.**

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import skimage.data
import skimage.color
import scipy.misc
import scipy.signal
import time

import torch
from torch.autograd import Variable
import torch.nn.functional as F

from matplotlib import rcParams
rcParams['axes.grid'] = False

## **Convolutions with SciPy**

**Let's start by loading a simple image of coffee using scikit-image, converting it to grayscale, and viewing it.**

**You will likely get an error when you run the following line of code. This issue has to do with Google's Colaboratory environment. To fix it, just restart the runtime (`Runtime -> Restart Runtime`) and then run all of the code above again (`Runtime -> Run Before`).**

In [None]:
image = skimage.color.rgb2gray(skimage.data.coffee()).astype(np.float32)

**In the following cell, write code to print this image's `dtype`, `shape`, and minimum and maximum values.**

In [None]:
# TODO


**Let's view the image:**

In [None]:
plt.imshow(image)
plt.axis('image')
plt.set_cmap('gray')

**Now let's create a 15 x 15 averaging filter:**

In [None]:
kernel_shape = [15, 15]
kernel = np.ones(kernel_shape, dtype=np.float32) / np.prod(kernel_shape)

**In the following Markdown cell, answer: Why are we dividing by the product of `kernel_shape`'s elements here?**

**We can then apply the kernel to our image.**

In [None]:
image_smoothed = scipy.signal.convolve2d(image, kernel, mode='same')
kernel.shape

**Copy the previous line of code to the cell below and use IPython's `%timeit` magic to see how long this convolution takes.**

In [None]:
%timeit # TODO

**In the following Markdown cell, answer: Approximately how many milliseconds does it take for this 2-D convolution to complete?**

**In the following Markdown cell, answer: We specified `mode='same'` so that the output image has the same size as the input image. If we instead retained only *valid* outputs – those computed using only values within `image` and `kernel` – what would the shape of the output image be?**

**In the following Markdown cell, answer: Expanding on the previous question, suppose you convolve an image of shape `[HEIGHT, WIDTH]` with a kernel of smaller shape `[K_HEIGHT, K_WIDTH]`, where `K_HEIGHT` and `K_WIDTH` are odd. Then what is the shape of the output of the convolution if only *valid* outputs are retained?**

**Let's visualize the output of this convolution.**

In [None]:
plt.imshow(image_smoothed)
plt.axis('image')

**In the following Markdown cell, answer: Why is there an artificial dark border surrounding this output image (which is not present in the original image above)?**

**This is the result of using `mode='same'`. Here the original image is effectively padded with 0s so that a 'valid' convolution yields an output that has the same shape as the input image. These 0s are darker than the actual image, so when we include them in our averages, we see this artificial border.**
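
**As a sketch of this effect on a toy input (the 5 x 5 image of ones and the 3 x 3 kernel are illustrative choices, not part of the exercise): with `mode='same'`, interior outputs stay at 1, while edge and corner outputs drop because padded zeros enter the average.**

```python
import numpy as np
import scipy.signal

# Toy input: a constant image of ones (illustrative, not the coffee image).
toy_image = np.ones((5, 5), dtype=np.float32)
# 3 x 3 averaging kernel, built the same way as the 15 x 15 one above.
toy_kernel = np.ones((3, 3), dtype=np.float32) / 9.0

toy_smoothed = scipy.signal.convolve2d(toy_image, toy_kernel, mode='same')

# Interior pixels average nine ones; border pixels also average padded zeros.
print(toy_smoothed[2, 2])  # interior: all 9 values lie inside the image
print(toy_smoothed[0, 2])  # edge: 6 of the 9 values lie inside the image
print(toy_smoothed[0, 0])  # corner: 4 of the 9 values lie inside the image
```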

## **Convolutions with PyTorch (CPU only)**

In [None]:
image_ = Variable(torch.from_numpy(image))
kernel_ = Variable(torch.from_numpy(kernel))
kernel_.shape

**In the following Markdown cell, answer: Look up the documentation for `torch.nn.functional.conv2d`. What shape does it expect for `input`, and what shape does it expect for `weight`? (Note that in our usage, the argument `groups` is 1.)**

**In the following cell, write code to reshape `image_` and `kernel_` so that they can be passed to `torch.nn.functional.conv2d`.**

In [None]:
# TODO
# TODO

**Now let's define appropriate padding (so that our output image again remains the same size as the input image) and use PyTorch's `conv2d` to perform the convolution.**

In [None]:
padding = (kernel_shape[0] // 2, kernel_shape[1] // 2)
image_smoothed_ = F.conv2d(image_, kernel_, padding=padding)

**Copy the previous cell's code to the cell below and use IPython's `%timeit` magic to see how long this convolution takes in PyTorch.**

In [None]:
%timeit # TODO

**In the following Markdown cell, answer: Approximately how many milliseconds does it take for this 2-D convolution to complete?**

**In the following Markdown cell, answer: How much faster is PyTorch's implementation in comparison to SciPy's? (To answer this, just compute the ratio $T_\text{SciPy}$ / $T_\text{PyTorch}$.)**

**In the following Markdown cell, answer: Can you guess why PyTorch is faster here? (It's fine if you aren't sure; if so, just leave it blank.)**

**Again let's visualize the output to make sure it's what we expect.**

In [None]:
plt.imshow(image_smoothed_.data.numpy().squeeze())
plt.axis('image')

## **Convolutions with PyTorch (GPU)**

**Now let's move on to using CUDA in PyTorch, to leverage GPUs. (If you haven't heard of CUDA, take a quick look at https://en.wikipedia.org/wiki/CUDA.)**

In [None]:
assert torch.cuda.is_available()

**If the above `assert` fails, hit `Edit -> Notebook Settings` and make sure GPU acceleration is enabled.**

**We can then move our image and kernel onto the GPU and apply the smoothing operation as a convolution.**

In [None]:
image_ = image_.cuda()
kernel_ = kernel_.cuda()

In [None]:
image_smoothed_ = F.conv2d(image_, kernel_, padding=padding)

**Copy the above code to the cell below and use IPython's `%timeit` magic to see how long this convolution takes in PyTorch using our GPU.**

In [None]:
%timeit # TODO
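
**One caveat when timing GPU code: CUDA kernels are launched asynchronously, so a timer can stop before the GPU has finished its work. The sketch below shows one way to account for this by calling `torch.cuda.synchronize()` before reading the clock. The helper name `time_conv2d`, the random 400 x 600 input, and the repeat count are assumptions made for illustration; the code falls back to the CPU if no GPU is present.**

```python
import time
import torch
import torch.nn.functional as F

def time_conv2d(device, repeats=10):
    """Average seconds per conv2d call on `device` (hypothetical helper)."""
    x = torch.rand(1, 1, 400, 600, device=device)         # stand-in image
    w = torch.ones(1, 1, 15, 15, device=device) / 225.0   # averaging kernel
    use_cuda = torch.device(device).type == 'cuda'
    F.conv2d(x, w, padding=7)  # warm-up call, excluded from the timing
    if use_cuda:
        torch.cuda.synchronize()  # wait for the warm-up to finish
    start = time.perf_counter()
    for _ in range(repeats):
        F.conv2d(x, w, padding=7)
    if use_cuda:
        torch.cuda.synchronize()  # wait for all queued kernels to finish
    return (time.perf_counter() - start) / repeats

device = 'cuda' if torch.cuda.is_available() else 'cpu'
seconds = time_conv2d(device)
print('{}: {:.3f} ms per call'.format(device, 1000 * seconds))
```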

**In the following Markdown cell, answer: Approximately how many milliseconds does it take for this 2-D convolution to complete?**

**In the following Markdown cell, answer: How much faster is PyTorch's GPU implementation in comparison to SciPy's CPU implementation? And how much faster is PyTorch's GPU implementation than PyTorch's CPU implementation? (Answer these as done above, as $T_\text{SciPy}$ / $T_\text{PyTorch GPU}$ and $T_\text{PyTorch CPU}$ / $T_\text{PyTorch GPU}$.)**

**Now let's go on to convolve an RGB image (height x width x 3) with a kernel that's 15 x 15 x 3.**

In [None]:
image = skimage.data.coffee().astype(np.float32)
image /= image.max()
plt.imshow(image)

**In the following cell, write code to print this image's `dtype`, `shape`, and minimum and maximum values.**

In [None]:
# TODO

**Let's create a 3D kernel for convolution:**

In [None]:
kernel_shape = [15, 15, 3]
kernel = np.ones(kernel_shape, dtype=np.float32) / np.prod(kernel_shape)

**Turn the image and kernel into tensors:**

In [None]:
image_ = Variable(torch.from_numpy(image).cuda())
kernel_ = Variable(torch.from_numpy(kernel).cuda())

**In the following cell, write code to permute and reshape axes so that `image_` and `kernel_` have the shapes expected by `torch.nn.functional.conv2d`. (You can use `permute` and `unsqueeze` here.)**

In [None]:
# TODO
# TODO

**After the `permute`, we need to make our Variables contiguous. (`permute` changes the order in which we view memory, but avoids rearranging the order explicitly. Thus we need to explicitly reorder the memory so that future manipulations can operate as expected.)**

In [None]:
image_ = image_.contiguous()
kernel_ = kernel_.contiguous()
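
**To see what `contiguous` changes, here is a minimal sketch on a toy 2-D tensor (the tensor and the axis order are illustrative and deliberately differ from the exercise above). `permute` only changes the strides used to read the same memory; `contiguous` copies the data into the new order, after which operations such as `view` work.**

```python
import torch

a = torch.arange(6).reshape(2, 3)   # stored row by row: 0, 1, 2, 3, 4, 5
b = a.permute(1, 0)                 # a view with swapped axes, no copy

print(a.stride(), b.stride())       # strides differ, storage is shared
print(b.is_contiguous())

try:
    b.view(6)                       # view needs compatible memory layout
except RuntimeError as err:
    print('view failed:', type(err).__name__)

c = b.contiguous()                  # copies the data into the new order
print(c.stride(), c.is_contiguous())
print(c.view(6).tolist())
```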

**In the following cell, write code to print the shape of `image_` and `kernel_`, and confirm they're what you expect.**

In [None]:
# TODO

**Convolve the image again with the kernel.**

In [None]:
output_ = F.conv2d(image_, kernel_, padding=padding)

**In the following cell, write code to print the `type` and `shape` of `output_.data`.**

In [None]:
# TODO

**In the following Markdown cell, answer: Why does the output have 1 output channel instead of 3?**

**Finally, let's visualize the result.**

In [None]:
plt.imshow(output_.data.cpu().numpy().squeeze())
plt.axis('image')

## **MNIST Classification with Extremely Simple CNNs**

**First, let's set up the necessary environment and constants:**

In [None]:
import torchvision

from pathlib import Path
HOME = Path.home()
MNIST_PATH = HOME / 'data' / 'mnist'

NUM_CLASSES = 10
CHANNELS = 1
HEIGHT = 28
WIDTH = 28

**We're going to load the official training set and never touch the true test set, which consists of 10,000 separate examples, in these experiments. We'll instead split the official training set into one set for training and one for validation.**

In [None]:
official_mnist_train = torchvision.datasets.MNIST(str(MNIST_PATH), train=True, download=True)
official_train_images = official_mnist_train.train_data.numpy().astype(np.float32)
# np.int was removed in NumPy 1.24; use an explicit 64-bit integer type.
official_train_labels = official_mnist_train.train_labels.numpy().astype(np.int64)

In [None]:
print(official_train_images.shape)
print(official_train_labels.shape)

**Let's view a few examples:**

In [None]:
example_images = np.concatenate(official_train_images[:10], axis=1)
example_labels = official_train_labels[:10]
print(example_labels)
plt.imshow(example_images)

**Here we'll split our training set into 55000 for training and the rest for validation.**

In [None]:
train_images, val_images = np.split(official_train_images, [55000])
train_labels, val_labels = np.split(official_train_labels, [55000])

In [None]:
print(train_images.shape, train_labels.shape)
print(val_images.shape, val_labels.shape)
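
**As a small illustration of how `np.split` behaves with a list of indices (the toy array is an assumption for demonstration only): a single index cuts the array into everything before that position and everything from it onward. The split is positional, with no shuffling.**

```python
import numpy as np

toy = np.arange(10)
# One split index yields two pieces: toy[:7] and toy[7:].
head, tail = np.split(toy, [7])
print(head, tail)
```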

**And we'll normalize our data in one of the simplest ways possible: centering and scaling on an image-by-image basis.**

In [None]:
def normalize_stats_image_by_image(images):
  mean = images.mean(axis=(1,2), keepdims=True)
  stdev = images.std(axis=(1,2), keepdims=True)
  return (images - mean) / stdev

In [None]:
train_images = normalize_stats_image_by_image(train_images)
val_images = normalize_stats_image_by_image(val_images)

**We can print the mean and stddev to make sure that normalization is done correctly.**

In [None]:
print(train_images[:3].mean(axis=(1, 2)))
print(train_images[:3].std(axis=(1, 2)))
print(val_images[:3].mean(axis=(1, 2)))
print(val_images[:3].std(axis=(1, 2)))

**We'll define a function to return a batch of examples. Since we assume a GPU is available, we also move these images to the GPU.**

In [None]:
def batch(batch_size, training=True):
  """Create a batch of examples.
  
  This creates a batch of input images and a batch of corresponding
  ground-truth labels. We assume CUDA is available (with a GPU).
  
  Args:
    batch_size: An integer.
    training: A boolean. If True, grab examples from the training
      set; otherwise, grab them from the validation set.
  
  Returns:
    A tuple,
    input_batch: A Variable of floats with shape
      [batch_size, 1, height, width]
    label_batch: A Variable of ints with shape
      [batch_size].
  """
  if training:
    random_ind = np.random.choice(train_images.shape[0], size=batch_size, replace=False)
    input_batch = train_images[random_ind]
    label_batch = train_labels[random_ind]
  else:
    input_batch = val_images[:batch_size]
    label_batch = val_labels[:batch_size]
  
  input_batch = input_batch[:, np.newaxis, :, :]
  
  volatile = not training
  input_batch = Variable(torch.from_numpy(input_batch).cuda(), volatile=volatile)
  label_batch = Variable(torch.from_numpy(label_batch).cuda(), volatile=volatile)
  
  return input_batch, label_batch
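
**A note for recent PyTorch versions, stated as an assumption about your environment rather than as part of the original notebook: since PyTorch 0.4 the `volatile` flag no longer has any effect (it only triggers a warning), and `torch.no_grad()` is the replacement for disabling gradient tracking during evaluation. A minimal sketch, using a small `Linear` layer as a stand-in model:**

```python
import torch

layer = torch.nn.Linear(4, 2)   # stand-in model, for illustration only
inputs = torch.rand(3, 4)

# Outside no_grad, the output is tracked for backpropagation.
tracked = layer(inputs)

# Inside no_grad, no graph is recorded, which saves memory at evaluation time.
with torch.no_grad():
    untracked = layer(inputs)

print(tracked.requires_grad, untracked.requires_grad)
```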

**Below, you will define a `SimpleCNN` with some significant restrictions on the model class:**

**(1) Input to conv_final needs to be a single pixel (see comments where it is defined).** 

**(2) Only Convolutions and ReLUs can be used. In other words, do not use max pooling, do not use dropout, etc.**

**The purpose of this is to (1) gain competency with the basic settings for convolutions and (2) develop a practical sense for how important these basic settings are.**

**Target: Try to achieve better than 2% error.**

**Hint 1: You can use the `stride` argument in the convolutions.**

**Hint 2: This can easily be achieved in well under 5000 iterations using the same optimizer settings as below (Adam with a learning rate of 0.001).**
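
**To illustrate Hint 1, here is a sketch of how `stride` changes the spatial size of a convolution's output. For input size $H$, kernel size $k$, padding $p$, and stride $s$, the output size is $\lfloor (H + 2p - k) / s \rfloor + 1$. The 8 output channels and the 3 x 3 kernel below are arbitrary choices for demonstration, not a suggested architecture.**

```python
import torch

x = torch.zeros(1, 1, 28, 28)   # one MNIST-sized input

# Stride 1 with padding k // 2 keeps the spatial size.
same = torch.nn.Conv2d(1, 8, kernel_size=3, padding=1)
# Stride 2 roughly halves it: floor((28 + 2 - 3) / 2) + 1 = 14.
halved = torch.nn.Conv2d(1, 8, kernel_size=3, padding=1, stride=2)

print(same(x).shape)
print(halved(x).shape)
```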

In [None]:
class SimpleCNN(torch.nn.Module):
  """A simple convolutional network.
  
  Map from inputs with shape [batch_size, 1, height, width] to
  outputs with shape [batch_size, 10], one score per class.
  """
  
  def __init__(self):
    super().__init__()
    self.conv1 = torch.nn.Conv2d(1, 32, kernel_size=7, padding=7//2) # feel free to change these parameters.
    # TODO
    # (You may also need to modify conv_final.)
    # self.conv2 = 
    # self.conv3 = 
    # self.conv4 = 
    # self.conv5 = 
    
    # Here the input to conv_final should be a single pixel, as can be obtained
    # by pooling spatially over all pixels. The goal of conv_final is to map
    # from some number of channels to 10, one for each possible class.
    
    # Here, in_channel = 128, but feel free to change that. All other parameters for conv_final should remain the same.
    self.conv_final = torch.nn.Conv2d(128, 10, kernel_size=1)
    
  def forward(self, x):
    x = F.relu(self.conv1(x))
    # TODO
    

**And instantiate our model... notice again that we assume CUDA is available, and that moving all parameters to the GPU is as simple as running `model.cuda()`.**

In [None]:
model = SimpleCNN()
model.cuda()

In [None]:
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

**Helper function that performs the forward pass, backpropagation, and then gradient update steps for a single batch.**

In [None]:
def train_step(batch_size=128):
  
  model.train()

  input_batch, label_batch = batch(batch_size, training=True)
  output_batch = model(input_batch)
  
  loss = F.cross_entropy(output_batch, label_batch)
  _, pred_batch = torch.max(output_batch, dim=1)
  error_rate = 1.0 - (pred_batch == label_batch).float().mean()

  optimizer.zero_grad()
  loss.backward()
  
  optimizer.step()
  
  # Return plain Python floats so they can be compared, formatted, and plotted.
  return loss.item(), error_rate.item()
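
**The error-rate computation used in this helper can be checked by hand on a toy batch. In the sketch below the scores and labels are made up for illustration: three of the four predictions match their labels, so the error rate is 0.25.**

```python
import torch

# Four examples, three classes: each row holds one example's class scores.
logits = torch.tensor([[2.0, 1.0, 0.0],
                       [0.0, 3.0, 1.0],
                       [1.0, 0.0, 5.0],
                       [4.0, 0.0, 1.0]])
labels = torch.tensor([0, 1, 1, 0])

# torch.max over dim=1 returns (max values, indices); the indices are the predictions.
_, preds = torch.max(logits, dim=1)
toy_error_rate = 1.0 - (preds == labels).float().mean()
print(preds.tolist(), toy_error_rate.item())
```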

**Evaluation function**

In [None]:
def val():
  
  model.eval()
  input_batch, label_batch = batch(val_images.shape[0], training=False)
  output_batch = model(input_batch)

  loss = F.cross_entropy(output_batch, label_batch)
  _, pred_batch = torch.max(output_batch, dim=1)
  error_rate = 1.0 - (pred_batch == label_batch).float().mean()
  
  # Return plain Python floats so they can be compared, formatted, and plotted.
  return loss.item(), error_rate.item()

**Finally, let's train, and also plot loss and error rate as a function of iteration.**

In [None]:
# Let's make sure we always start from scratch (that is,
# without starting from parameters from a previous run).
for module in model.children():
  module.reset_parameters()

info = []
fig, ax = plt.subplots(2, 1, sharex=True)
num_steps = 5000
num_steps_per_val = 50
best_val_err = 1.0
for step in range(num_steps):
  train_loss, train_err = train_step()
  if step % num_steps_per_val == 0:
    val_loss, val_err = val()
    if val_err < best_val_err:
      best_val_err = val_err
      print('Step {:5d}: Obtained a best validation error of {:.3f}.'.format(step, best_val_err))
    info.append([step, train_loss, val_loss, train_err, val_err])
    x, y11, y12, y21, y22 = zip(*info)
    ax[0].plot(x, y11, x, y12)
    ax[0].legend(['Train loss', 'Val loss'])
    ax[1].plot(x, y21, x, y22)
    ax[1].legend(['Train err', 'Val err'])
    ax[1].set_ylim([0.0, 0.25])