# Convolution 101

Welcome to "Convolution 101", a hands-on practical to help you understand convolutions, a key concept in image processing and computer vision. In this Colab, you'll implement and visualize hard-coded convolution kernels on the famous mandrill test image. This practical experience will solidify your understanding and prepare you for future studies in neural networks and machine learning.

Convolutions are crucial for analyzing and manipulating images to extract features and patterns. We'll experiment with various kernels, observe their effects, and discuss their applications. Let's begin by downloading the mandrill image and diving into the exercises.

In [1]:
from PIL import Image
import torch
from torch import nn
from torch.nn import functional as F
import numpy as np
import matplotlib.pyplot as plt


def torch_to_jpg(tensor):
  # Convert a tensor image (channel / height / width) to a numpy image, then to a Pillow image
  if len(tensor.shape) == 3:
    tensor = tensor.permute(1, 2, 0)
  if tensor.shape[-1] == 1:
    tensor = tensor[..., 0]
  return Image.fromarray(tensor.numpy().astype(np.uint8))

def jpg_to_torch(jpg_image):
  # Convert a Pillow image to a numpy array (height / width / channel), then to a float tensor
  img = torch.tensor(np.array(jpg_image)).float()
  return img.permute(2, 0, 1)  # In torch the order is channel / height / width

Load the mandril.jpg image and convert it to a torch tensor:

In [None]:
# FIXME
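One possible way to fill in this cell is sketched below. It assumes the image sits next to the notebook as mandril.jpg; so that the snippet runs anywhere, it writes a small synthetic RGB image to a temporary file and loads that instead. The `load_image` helper is hypothetical, and `jpg_to_torch` is a copy of the helper defined above.

```python
import os
import tempfile

import numpy as np
import torch
from PIL import Image


def jpg_to_torch(jpg_image):
  # Same helper as above: height / width / channel image -> channel / height / width tensor
  img = torch.tensor(np.array(jpg_image)).float()
  return img.permute(2, 0, 1)


def load_image(path):
  # Hypothetical helper: force RGB so the tensor always has 3 channels
  return jpg_to_torch(Image.open(path).convert("RGB"))


# Stand-in for mandril.jpg: a synthetic 48 x 64 image written to a temp file
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "synthetic.jpg")
pixels = np.random.randint(0, 256, (48, 64, 3), dtype=np.uint8)
Image.fromarray(pixels).save(path)

img = load_image(path)  # in the practical: load_image("mandril.jpg")
print(img.shape, img.dtype)
```

The printed shape is channel / height / width, which is the layout the convolution calls later in this practical expect.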

As always in deep learning, whether you are working with images, text, audio, or internal representations, look at the shape!

In [None]:
# FIXME

A grayscale image can be made simply by averaging the channels. Do it:

In [None]:
# FIXME
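As a hedged sketch of the averaging step, the snippet below uses a small random tensor as a stand-in for the mandrill image (the 4 x 5 size is an arbitrary assumption). Keeping the channel dimension with `keepdim=True` is one design choice among others; it leaves a 1-channel image that `torch_to_jpg` can still display.

```python
import torch

# Stand-in for the mandrill tensor: 3 channels, 4 x 5 pixels, values in [0, 255)
img = torch.rand(3, 4, 5) * 255

# Average over the channel dimension (dim 0) and keep it as a size-1 dimension
gray = img.mean(dim=0, keepdim=True)
print(gray.shape)
```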

Now, let's try a random convolution kernel. We initialize its values with "Kaiming normal" initialization, which is the usual choice in convolutional networks. We'll see later exactly what it is.

Remember that in a CNN, a "kernel" is actually several convolution kernels. There is one $k \times k$ kernel per input channel and per output channel, so we have $C_o \times C_i$ kernels of size $k \times k$.

The final size is therefore $C_o \times C_i \times k \times k$, with the output channels first, which is the layout torch expects.
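To check the layout of the weight tensor in practice, here is a minimal sketch that builds a standard `nn.Conv2d` layer and prints the shape of its weights. The channel counts (3 in, 8 out) are arbitrary assumptions, chosen to differ so that the order of the dimensions is visible.

```python
import torch
from torch import nn

# 3 input channels, 8 output channels, 5 x 5 kernels
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=5)

# PyTorch stores the weights as (output channels, input channels, k, k)
print(conv.weight.shape)  # torch.Size([8, 3, 5, 5])
```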

In the following block, we define a $5 \times 5$ kernel. There are $C_i = 3$ input channels (i.e. RGB) which we will map onto $C_o = 3$ output channels:

In [5]:
w = # FIXME
w = # FIXME
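One possible way to complete the two lines above is to allocate an uninitialized tensor of the right size and then fill it in place with `nn.init.kaiming_normal_`. This is a sketch under the assumption that the default arguments of that function are acceptable here.

```python
import torch
from torch import nn

# 3 output channels, 3 input channels, 5 x 5 kernels
w = torch.empty(3, 3, 5, 5)

# In-place Kaiming (He) normal initialization with default arguments
nn.init.kaiming_normal_(w)
print(w.shape, w.std())
```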

In [None]:
# torch wants a "batch" of images, so we add a new dimension to get a batch of size 1

o = F.conv2d(
    img[None],
    w
)
torch_to_jpg(o[0])
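To see what this call does to the tensor shape, here is a minimal sketch on a random stand-in image (the sizes 32 and 40 are arbitrary assumptions). Without padding and with stride 1, a $k \times k$ kernel shrinks each spatial side by $k - 1$ pixels.

```python
import torch
from torch.nn import functional as F

img = torch.rand(3, 32, 40)   # stand-in image: channel / height / width
w = torch.randn(3, 3, 5, 5)   # 3 output channels, 3 input channels, 5 x 5

o = F.conv2d(img[None], w)    # default: no padding, stride 1
print(o.shape)                # torch.Size([1, 3, 28, 36])
```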

Pretty colors, right? But not very useful. What about handcrafting a $5 \times 5$ blurring kernel?

In [None]:
w = torch.zeros(3, 3, 5, 5)
w[0, 0, :, :] = # FIXME
w[1, 1, :, :] = # FIXME
w[2, 2, :, :] = # FIXME

o = F.conv2d(img[None], w)
torch_to_jpg(o[0])
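A hedged sketch of one possible blurring kernel is a box blur: every entry of the $5 \times 5$ window gets the weight $1/25$, so each output pixel is the mean of its neighbourhood. The random 16 x 16 image is a stand-in for the mandrill, and other blurring kernels (for example a Gaussian) would work too.

```python
import torch
from torch.nn import functional as F

w = torch.zeros(3, 3, 5, 5)
for c in range(3):
  # Output channel c only reads input channel c, with uniform weights
  w[c, c, :, :] = 1.0 / 25

img = torch.rand(3, 16, 16) * 255  # stand-in image
o = F.conv2d(img[None], w)

# The top-left output pixel is the mean of the top-left 5 x 5 patch
print(o[0, 0, 0, 0], img[0, :5, :5].mean())
```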

Visualize the kernel values and interpret them. Remember, we don't want to mix channels!

In [None]:
# FIXME
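As a minimal sketch of such an inspection, the snippet below rebuilds a box-blur kernel (an assumption about how the previous cell was filled in) and prints a few slices instead of plotting them. The diagonal blocks hold the blur weights, while the off-diagonal blocks are zero, which is what keeps the channels separate.

```python
import torch

w = torch.zeros(3, 3, 5, 5)
for c in range(3):
  w[c, c, :, :] = 1.0 / 25

# Red output from red input: a uniform 5 x 5 grid, every neighbour weighs the same
print(w[0, 0])

# Red output from green input: all zeros, so channels are not mixed
print(w[0, 1].abs().sum())

# The weights of each output channel sum to 1, so overall brightness is preserved
print(w.sum(dim=(1, 2, 3)))
```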

Now build an identity kernel, which will do... nothing:

In [None]:
w = torch.zeros(3, 3, 5, 5)
w[0, 0, 2, 2] = # FIXME
w[1, 1, 2, 2] = # FIXME
w[2, 2, 2, 2] = # FIXME

o = F.conv2d(img[None], w)
torch_to_jpg(o[0])
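One possible identity kernel puts a single 1 at the centre of each diagonal block. The sketch below, run on a random stand-in image, checks that the output matches the input up to the 2-pixel border that a $5 \times 5$ convolution without padding removes.

```python
import torch
from torch.nn import functional as F

w = torch.zeros(3, 3, 5, 5)
for c in range(3):
  # Only the centre pixel of the same channel contributes
  w[c, c, 2, 2] = 1.0

img = torch.rand(3, 16, 16) * 255  # stand-in image
o = F.conv2d(img[None], w)

# Same pixels as the input, minus a 2-pixel border on each side
print(torch.allclose(o[0], img[:, 2:-2, 2:-2]))
```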

Now try an edge detection kernel from https://en.wikipedia.org/wiki/Kernel_(image_processing). This kind of kernel was extremely useful before CNNs and can be found in many algorithms, such as SIFT.

In [None]:
w = torch.zeros(1, 3, 3, 3)

mat = torch.tensor([
    # FIXME
])

w[0, 0, :, :] = mat
w[0, 1, :, :] = mat
w[0, 2, :, :] = mat

o = F.conv2d(img[None], w)  # The 3 input channels are summed into 1 output channel
o = 255 * (o - o.min()) / (o.max() - o.min())  # Rescale to [0, 255]
torch_to_jpg(o[0])
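As a hedged illustration, the sketch below uses the 8-neighbour kernel (centre 8, surrounded by -1) listed on that Wikipedia page; other edge kernels from the page would serve equally well. It is applied to a synthetic image with a single vertical step, an assumption made so that the response is easy to read: flat regions give 0, and the two columns around the step give a negative then a positive response.

```python
import torch
from torch.nn import functional as F

mat = torch.tensor([
    [-1., -1., -1.],
    [-1.,  8., -1.],
    [-1., -1., -1.],
])

w = torch.zeros(1, 3, 3, 3)
w[0, :, :, :] = mat  # same 3 x 3 filter on each of the 3 input channels

# Synthetic image: left half black (0), right half white (1)
img = torch.zeros(3, 6, 6)
img[:, :, 3:] = 1.0

o = F.conv2d(img[None], w)
print(o[0, 0, 0])  # one output row: 0, -9, 9, 0
```

The kernel weights sum to zero, which is why constant regions vanish and only the step survives.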

Let's replicate the effect of average pooling, which halves the height and width of an image, using a convolution kernel. We'll first perform the operation with F.avg_pool2d, and then obtain the same result with F.conv2d.

In [None]:
pooled = # FIXME
plt.figure(figsize=(13, 8))
plt.subplot(1, 2, 1)
plt.imshow(torch_to_jpg(pooled[0]))
plt.title(f"w/ avg pooling, {pooled[0].shape}")

w = torch.zeros(3, 3, 2, 2)
w[0, 0, :, :] = # FIXME
w[1, 1, :, :] = # FIXME
w[2, 2, :, :] = # FIXME
res = # FIXME (Call convolution)
res = 255 * (res - res.min()) / (res.max() - res.min())  # Rescale to [0, 255]

plt.subplot(1, 2, 2)
plt.imshow(torch_to_jpg(res[0]))
plt.title(f"w/ convolution, {res[0].shape}");
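A minimal sketch of the equivalence, on a random stand-in batch rather than the mandrill image: a $2 \times 2$ average pooling matches a convolution with stride 2 whose diagonal blocks are filled with $1/4$. The comparison is made before any min-max rescaling, since rescaling shifts the values slightly.

```python
import torch
from torch.nn import functional as F

x = torch.rand(1, 3, 8, 10) * 255  # stand-in batch: 1 image, 3 channels, 8 x 10

# Average pooling over 2 x 2 windows; the stride defaults to the kernel size
pooled = F.avg_pool2d(x, kernel_size=2)

# Equivalent convolution: uniform 2 x 2 weights per channel, stride 2
w = torch.zeros(3, 3, 2, 2)
for c in range(3):
  w[c, c, :, :] = 0.25
res = F.conv2d(x, w, stride=2)

print(pooled.shape, res.shape)
print(torch.allclose(pooled, res, atol=1e-3))
```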