
# **Notebook 10.3: 2D Convolution**

This notebook investigates the 2D convolution operation. It asks you to hand-code the convolution so that we can be sure we are computing the same thing as PyTorch. The next notebook uses the convolutional layers in PyTorch directly.

Work through the cells below, running each cell in turn. In various places you will see the words "TODO". Follow the instructions at these places and make predictions about what is going to happen or write code to complete the functions.

Contact me at udlbookmail@gmail.com if you find any mistakes or have any suggestions.

In [1]:
import numpy as np
import torch
# Set to print in reasonable form
np.set_printoptions(precision=3, floatmode="fixed")
torch.set_printoptions(precision=3)

This routine performs convolution in PyTorch.

In [2]:
# Perform convolution in PyTorch
def conv_pytorch(image, conv_weights, stride=1, pad=1):
  # Convert image and kernel to tensors
  image_tensor = torch.from_numpy(image) # (batchSize, channelsIn, imageHeightIn, imageWidthIn)
  conv_weights_tensor = torch.from_numpy(conv_weights) # (channelsOut, channelsIn, kernelHeight, kernelWidth)
  # Do the convolution
  output_tensor = torch.nn.functional.conv2d(image_tensor, conv_weights_tensor, stride=stride, padding=pad)
  # Convert back from PyTorch and return
  return output_tensor.numpy() # (batchSize, channelsOut, imageHeightOut, imageWidthOut)
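
As a quick sanity check of the shape conventions in the comments above, the sketch below calls `torch.nn.functional.conv2d` on a toy input. The all-ones image and kernel (`toy_image`, `toy_weights`) are illustrative assumptions, not part of the notebook.

```python
import numpy as np
import torch

# Toy data (assumption for illustration): one single-channel 3x3 image of ones
toy_image = np.ones((1, 1, 3, 3))    # (batchSize, channelsIn, imageHeightIn, imageWidthIn)
toy_weights = np.ones((1, 1, 3, 3))  # (channelsOut, channelsIn, kernelHeight, kernelWidth)

toy_out = torch.nn.functional.conv2d(
    torch.from_numpy(toy_image),
    torch.from_numpy(toy_weights),
    stride=1,
    padding=1,
).numpy()

print(toy_out.shape)  # (1, 1, 3, 3): a 3x3 kernel with padding 1 and stride 1 keeps the size
print(toy_out[0, 0])  # each entry counts how many non-padded pixels the kernel overlaps
```

With this input the centre value is 9 (the kernel overlaps the whole image), the corners are 4, and the edges are 6, which makes the effect of the zero padding easy to see.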

We'll start with the simplest 2D convolution: one input channel, one output channel, and a single image in the batch.

In [3]:
# Perform convolution in numpy
def conv_numpy_1(image, weights, pad=1):

    # Perform zero padding
    if pad != 0:
        image = np.pad(image, ((0, 0), (0, 0), (pad, pad), (pad, pad)), 'constant')

    # Get sizes of image array and kernel weights
    batchSize,  channelsIn, imageHeightIn, imageWidthIn = image.shape
    channelsOut, channelsIn, kernelHeight, kernelWidth = weights.shape

    # Get size of output arrays
    imageHeightOut = np.floor(1 + imageHeightIn - kernelHeight).astype(int)
    imageWidthOut = np.floor(1 + imageWidthIn - kernelWidth).astype(int)

    # Create output
    out = np.zeros((batchSize, channelsOut, imageHeightOut, imageWidthOut), dtype=np.float32)

    # !!!!!! NOTE THERE IS A SUBTLETY HERE !!!!!!!!
    # I have padded the image with zeros above, so it is surrounded by a "ring" of zeros
    # That means that the image indexes are all off by one
    # This actually makes your code simpler

    for c_y in range(imageHeightOut):
      for c_x in range(imageWidthOut):
        for c_kernel_y in range(kernelHeight):
          for c_kernel_x in range(kernelWidth):
            # TODO -- Retrieve the image pixel and the weight from the convolution
            # Only one image in batch, one input channel and one output channel, so these indices should all be zero
            # Replace the two lines below
            this_pixel_value = 1.0
            this_weight = 1.0


            # Multiply these together and add to the output at this position
            out[0, 0, c_y, c_x] += np.sum(this_pixel_value * this_weight)

    return out
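
The "off by one" subtlety in the comments can be checked on a toy array. The sketch below is one possible reading rather than the notebook's reference solution: once the image is padded, the kernel window for output position `(c_y, c_x)` starts at `(c_y, c_x)` in the padded array, so no extra offset is needed. The names `toy`, `padded`, `kernel`, and `window_sum` are hypothetical.

```python
import numpy as np

# Toy 3x3 image with values 1..9 and an all-ones 3x3 kernel (illustrative only)
toy = np.arange(1.0, 10.0).reshape(1, 1, 3, 3)
kernel = np.ones((1, 1, 3, 3))
padded = np.pad(toy, ((0, 0), (0, 0), (1, 1), (1, 1)), 'constant')

def window_sum(c_y, c_x):
    """Accumulate pixel * weight over the 3x3 window for one output position."""
    total = 0.0
    for c_kernel_y in range(3):
        for c_kernel_x in range(3):
            # Original pixel (0, 0) now lives at padded[0, 0, 1, 1], so the
            # window for output (c_y, c_x) starts at padded[0, 0, c_y, c_x]
            total += padded[0, 0, c_y + c_kernel_y, c_x + c_kernel_x] * kernel[0, 0, c_kernel_y, c_kernel_x]
    return total

print(window_sum(0, 0))  # 12.0 = 1 + 2 + 4 + 5 (the rest of the window is padding)
print(window_sum(1, 1))  # 45.0 = 1 + 2 + ... + 9 (the window covers the whole image)
```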

In [4]:
# Set random seed so we always get same answer
np.random.seed(1)
n_batch = 1
image_height = 4
image_width = 6
channels_in = 1
kernel_size = 3
channels_out = 1

# Create random input image
input_image = np.random.normal(size=(n_batch, channels_in, image_height, image_width))
# Create random convolution kernel weights
conv_weights = np.random.normal(size=(channels_out, channels_in, kernel_size, kernel_size))

# Perform convolution using PyTorch
conv_results_pytorch = conv_pytorch(input_image, conv_weights, stride=1, pad=1)
print("PyTorch Results")
print(conv_results_pytorch)

# Perform convolution in numpy
print("Your results")
conv_results_numpy = conv_numpy_1(input_image, conv_weights)
print(conv_results_numpy)

PyTorch Results
[[[[-0.929 -2.760  0.716  0.114  0.560 -0.387]
   [-1.515  0.283  1.008  0.466 -1.094  2.004]
   [-1.634  3.555 -2.154 -0.892 -1.856  2.299]
   [ 0.565 -0.947 -0.629  2.996 -1.811 -0.533]]]]
Your results
[[[[9.000 9.000 9.000 9.000 9.000 9.000]
   [9.000 9.000 9.000 9.000 9.000 9.000]
   [9.000 9.000 9.000 9.000 9.000 9.000]
   [9.000 9.000 9.000 9.000 9.000 9.000]]]]


Let's now add the possibility of using different strides.

In [5]:
# Perform convolution in numpy
def conv_numpy_2(image, weights, stride=1, pad=1):

    # Perform zero padding
    if pad != 0:
        image = np.pad(image, ((0, 0), (0, 0), (pad, pad), (pad, pad)), 'constant')

    # Get sizes of image array and kernel weights
    batchSize,  channelsIn, imageHeightIn, imageWidthIn = image.shape
    channelsOut, channelsIn, kernelHeight, kernelWidth = weights.shape

    # Get size of output arrays
    imageHeightOut = np.floor(1 + (imageHeightIn - kernelHeight) / stride).astype(int)
    imageWidthOut = np.floor(1 + (imageWidthIn - kernelWidth) / stride).astype(int)

    # Create output
    out = np.zeros((batchSize, channelsOut, imageHeightOut, imageWidthOut), dtype=np.float32)

    for c_y in range(imageHeightOut):
      for c_x in range(imageWidthOut):
        for c_kernel_y in range(kernelHeight):
          for c_kernel_x in range(kernelWidth):
            # TODO -- Retrieve the image pixel and the weight from the convolution
            # Only one image in batch, one input channel and one output channel, so these indices should all be zero
            # Replace the two lines below
            this_pixel_value = 1.0
            this_weight = 1.0


            # Multiply these together and add to the output at this position
            out[0, 0, c_y, c_x] += np.sum(this_pixel_value * this_weight)

    return out
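
The output-size expression used in `conv_numpy_2` can be tried in isolation. The sketch below wraps it in a hypothetical helper, `conv_output_size`, which takes the unpadded size and adds the padding itself, and evaluates it for a 12x10 image with a 3x3 kernel, stride 2, and padding 1. It also lists where each window would start in the padded image, assuming the window for output row `c_y` begins at row `c_y * stride`.

```python
import numpy as np

def conv_output_size(size_in, kernel_size, stride=1, pad=1):
    """Output length along one axis; size_in is the size before padding."""
    padded_size = size_in + 2 * pad
    return int(np.floor(1 + (padded_size - kernel_size) / stride))

height_out = conv_output_size(12, 3, stride=2, pad=1)
width_out = conv_output_size(10, 3, stride=2, pad=1)
print(height_out, width_out)  # 6 5

# Assumed window start rows in the padded image (one per output row)
row_starts = [c_y * 2 for c_y in range(height_out)]
print(row_starts)  # [0, 2, 4, 6, 8, 10]
```

The last window starts at row 10 and covers rows 10 to 12, which fits inside the padded height of 14.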

In [10]:
# Set random seed so we always get same answer
np.random.seed(5)
n_batch = 1
image_height = 12
image_width = 10
channels_in = 1
kernel_size = 3
channels_out = 1
stride = 2

# Create random input image
input_image = np.random.normal(size=(n_batch, channels_in, image_height, image_width))
# Create random convolution kernel weights
conv_weights = np.random.normal(size=(channels_out, channels_in, kernel_size, kernel_size))

# Perform convolution using PyTorch
conv_results_pytorch = conv_pytorch(input_image, conv_weights, stride, pad=1)
print("PyTorch Results")
print(conv_results_pytorch)

# Perform convolution in numpy
print("Your results")
conv_results_numpy = conv_numpy_2(input_image, conv_weights, stride, pad=1)
print(conv_results_numpy)

PyTorch Results
[[[[ 2.491 -3.810  3.688 -5.666  2.376]
   [ 1.027  2.763 -1.846 -4.658 -0.816]
   [-1.933  3.386 -2.622 -8.267 -1.252]
   [ 2.441 -4.493  3.510 -2.429 10.071]
   [-7.415  2.632 -4.259  0.024 -0.152]
   [ 0.064  0.714  3.127 -5.161  0.015]]]]
Your results
[[[[9.000 9.000 9.000 9.000 9.000]
   [9.000 9.000 9.000 9.000 9.000]
   [9.000 9.000 9.000 9.000 9.000]
   [9.000 9.000 9.000 9.000 9.000]
   [9.000 9.000 9.000 9.000 9.000]
   [9.000 9.000 9.000 9.000 9.000]]]]


Now we'll introduce multiple input and output channels.

In [7]:
# Perform convolution in numpy
def conv_numpy_3(image, weights, stride=1, pad=1):

    # Perform zero padding
    if pad != 0:
        image = np.pad(image, ((0, 0), (0, 0), (pad, pad), (pad, pad)), 'constant')

    # Get sizes of image array and kernel weights
    batchSize,  channelsIn, imageHeightIn, imageWidthIn = image.shape
    channelsOut, channelsIn, kernelHeight, kernelWidth = weights.shape

    # Get size of output arrays
    imageHeightOut = np.floor(1 + (imageHeightIn - kernelHeight) / stride).astype(int)
    imageWidthOut = np.floor(1 + (imageWidthIn - kernelWidth) / stride).astype(int)

    # Create output
    out = np.zeros((batchSize, channelsOut, imageHeightOut, imageWidthOut), dtype=np.float32)

    for c_y in range(imageHeightOut):
      for c_x in range(imageWidthOut):
        for c_channel_out in range(channelsOut):
          for c_channel_in in range(channelsIn):
            for c_kernel_y in range(kernelHeight):
              for c_kernel_x in range(kernelWidth):
                  # TODO -- Retrieve the image pixel and the weight from the convolution
                  # Only one image in batch so this index should be zero
                  # Replace the two lines below
                  this_pixel_value = 1.0
                  this_weight = 1.0

                  # Multiply these together and add to the output at this position
                  out[0, c_channel_out, c_y, c_x] += np.sum(this_pixel_value * this_weight)
    return out
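
With multiple input channels, each output value accumulates one product per input channel and kernel position, so five input channels and a 3x3 kernel contribute 5 x 3 x 3 = 45 terms. The sketch below computes one output value two ways, with explicit loops and with a sliced window, assuming stride 1 and padding 1. The random data, the seed, and the chosen output position are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)           # illustrative seed, not the notebook's
image = rng.normal(size=(1, 5, 4, 6))    # (batchSize, channelsIn, height, width)
weights = rng.normal(size=(2, 5, 3, 3))  # (channelsOut, channelsIn, kernelHeight, kernelWidth)
padded = np.pad(image, ((0, 0), (0, 0), (1, 1), (1, 1)), 'constant')

c_channel_out, c_y, c_x = 1, 2, 3        # an arbitrary output position

# Explicit loops over input channels and kernel offsets
value_loop = 0.0
for c_channel_in in range(5):
    for c_kernel_y in range(3):
        for c_kernel_x in range(3):
            pixel = padded[0, c_channel_in, c_y + c_kernel_y, c_x + c_kernel_x]
            weight = weights[c_channel_out, c_channel_in, c_kernel_y, c_kernel_x]
            value_loop += pixel * weight

# The same sum as one elementwise product over a (channelsIn, 3, 3) window
window = padded[0, :, c_y:c_y + 3, c_x:c_x + 3]
value_sliced = np.sum(window * weights[c_channel_out])

print(np.isclose(value_loop, value_sliced))  # True
```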

In [9]:
# Set random seed so we always get same answer
np.random.seed(3)
n_batch = 1
image_height = 4
image_width = 6
channels_in = 5
kernel_size = 3
channels_out = 2

# Create random input image
input_image = np.random.normal(size=(n_batch, channels_in, image_height, image_width))
# Create random convolution kernel weights
conv_weights = np.random.normal(size=(channels_out, channels_in, kernel_size, kernel_size))

# Perform convolution using PyTorch
conv_results_pytorch = conv_pytorch(input_image, conv_weights, stride=1, pad=1)
print("PyTorch Results")
print(conv_results_pytorch)

# Perform convolution in numpy
print("Your results")
conv_results_numpy = conv_numpy_3(input_image, conv_weights, stride=1, pad=1)
print(conv_results_numpy)

PyTorch Results
[[[[  1.668  -1.370  -4.506  -4.427   2.117   3.741]
   [  0.831   7.274  -3.644  -7.844  -1.633  -1.190]
   [  3.960  -0.221  -1.455   0.909   3.149   4.979]
   [ -6.651  -0.460  -1.989  -4.566   2.490  -3.966]]

  [[  1.272   5.984   8.939   3.852   7.571  -2.872]
   [  1.776  -1.231 -13.235   4.913  -4.042  -5.306]
   [  0.852   1.450  -5.927  -2.641  -4.822   7.290]
   [ -3.734 -10.068  -1.031  -0.834  -1.489   2.699]]]]
Your results
[[[[45.000 45.000 45.000 45.000 45.000 45.000]
   [45.000 45.000 45.000 45.000 45.000 45.000]
   [45.000 45.000 45.000 45.000 45.000 45.000]
   [45.000 45.000 45.000 45.000 45.000 45.000]]

  [[45.000 45.000 45.000 45.000 45.000 45.000]
   [45.000 45.000 45.000 45.000 45.000 45.000]
   [45.000 45.000 45.000 45.000 45.000 45.000]
   [45.000 45.000 45.000 45.000 45.000 45.000]]]]


Now we'll do the full convolution with multiple images (batch size > 1), multiple input channels, and multiple output channels.

In [11]:
# Perform convolution in numpy
def conv_numpy_4(image, weights, stride=1, pad=1):

    # Perform zero padding
    if pad != 0:
        image = np.pad(image, ((0, 0), (0, 0), (pad, pad), (pad, pad)), 'constant')

    # Get sizes of image array and kernel weights
    batchSize,  channelsIn, imageHeightIn, imageWidthIn = image.shape
    channelsOut, channelsIn, kernelHeight, kernelWidth = weights.shape

    # Get size of output arrays
    imageHeightOut = np.floor(1 + (imageHeightIn - kernelHeight) / stride).astype(int)
    imageWidthOut = np.floor(1 + (imageWidthIn - kernelWidth) / stride).astype(int)

    # Create output
    out = np.zeros((batchSize, channelsOut, imageHeightOut, imageWidthOut), dtype=np.float32)

    for c_batch in range(batchSize):
      for c_y in range(imageHeightOut):
        for c_x in range(imageWidthOut):
          for c_channel_out in range(channelsOut):
            for c_channel_in in range(channelsIn):
              for c_kernel_y in range(kernelHeight):
                for c_kernel_x in range(kernelWidth):
                    # TODO -- Retrieve the image pixel and the weight from the convolution
                    # Replace the two lines below
                    this_pixel_value = 1.0
                    this_weight = 1.0



                    # Multiply these together and add to the output at this position
                    out[c_batch, c_channel_out, c_y, c_x] += np.sum(this_pixel_value * this_weight)
    return out
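
The seven nested loops are easy to follow but slow. As an independent cross-check, the same computation can be sketched without loops. The version below is an assumption-laden alternative, not the notebook's intended solution: it relies on `numpy.lib.stride_tricks.sliding_window_view` (available in NumPy 1.20 and later) and `np.einsum`, and the function name `conv_numpy_vectorized` is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_numpy_vectorized(image, weights, stride=1, pad=1):
    """Loop-free sketch of 2D convolution (cross-correlation, as in PyTorch)."""
    if pad != 0:
        image = np.pad(image, ((0, 0), (0, 0), (pad, pad), (pad, pad)), 'constant')
    kernel_height, kernel_width = weights.shape[2:]
    # All stride-1 windows: (batch, channelsIn, rows, cols, kernelHeight, kernelWidth)
    windows = sliding_window_view(image, (kernel_height, kernel_width), axis=(2, 3))
    # Keep every stride-th window along both spatial axes
    windows = windows[:, :, ::stride, ::stride]
    # Sum over input channels (c) and kernel offsets (i, j)
    return np.einsum('bchwij,ocij->bohw', windows, weights)

rng = np.random.default_rng(0)  # illustrative seed, not the notebook's
demo_image = rng.normal(size=(3, 5, 4, 6))
demo_weights = rng.normal(size=(2, 5, 3, 3))
demo_out = conv_numpy_vectorized(demo_image, demo_weights, stride=1, pad=1)
print(demo_out.shape)  # (3, 2, 4, 6)
```

Comparing this output with the loop version using `np.allclose` is a cheap way to catch indexing mistakes in either implementation.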

In [12]:
# Set random seed so we always get same answer
np.random.seed(2)
n_batch = 3
image_height = 4
image_width = 6
channels_in = 5
kernel_size = 3
channels_out = 2

# Create random input image
input_image = np.random.normal(size=(n_batch, channels_in, image_height, image_width))
# Create random convolution kernel weights
conv_weights = np.random.normal(size=(channels_out, channels_in, kernel_size, kernel_size))

# Perform convolution using PyTorch
conv_results_pytorch = conv_pytorch(input_image, conv_weights, stride=1, pad=1)
print("PyTorch Results")
print(conv_results_pytorch)

# Perform convolution in numpy
print("Your results")
conv_results_numpy = conv_numpy_4(input_image, conv_weights, stride=1, pad=1)
print(conv_results_numpy)

PyTorch Results
[[[[ 8.062e-01 -3.786e+00 -1.075e+01 -2.258e-01 -7.847e+00  1.674e+00]
   [ 1.966e-01 -8.558e+00  1.009e+01 -8.302e+00  1.185e+01  7.742e+00]
   [-7.061e+00  9.208e+00 -1.715e-01  7.060e+00 -2.804e+00 -1.008e+01]
   [ 2.709e-01  1.096e+00  1.808e+00  6.055e+00 -1.035e+01 -9.043e-02]]

  [[-2.195e+00  3.800e+00  3.770e+00 -4.008e-01 -2.155e+00 -1.616e+00]
   [ 4.218e-01  2.774e+00 -3.730e+00  5.771e+00 -1.281e+01 -4.113e+00]
   [ 2.439e+00 -4.663e+00  9.183e+00  2.609e+00  7.661e+00 -5.823e+00]
   [ 2.060e+00 -2.956e+00 -5.518e-02  3.499e+00 -8.731e+00  2.459e+00]]]


 [[[ 2.463e+00  3.771e+00 -2.656e+00 -3.177e+00  6.315e+00  2.460e+00]
   [ 1.398e+01  8.217e-02  1.021e+00 -3.030e+00 -4.402e+00  1.266e-01]
   [-5.394e+00  1.450e+00 -3.830e-01 -1.446e+00 -3.505e+00 -7.624e+00]
   [ 1.046e+00 -9.458e+00  3.869e+00 -1.060e+01  8.047e+00 -1.939e-01]]

  [[ 3.009e-01 -4.726e+00  3.535e+00 -2.766e+00 -1.904e+00 -6.620e+00]
   [-1.327e+00  9.387e+00 -4.522e+00  1.672e+01 -7.83