### Understanding Convolutional Neural Networks (CNNs)
In this notebook we apply the convolution operation to 2D images.
Let us start by importing the required modules.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

CNNs are a class of neural networks used for tasks involving images, such as classification, detection, and segmentation.
They work by detecting patterns in images using filters (kernels) in a process called convolution.
Convolution slides a small matrix (called a kernel or filter) over an image and, at each position, multiplies the kernel element-wise with the image patch it covers and sums the products.
This allows CNNs to detect edges, textures, and patterns.
A 2D grayscale image is represented by a 2D array in which each element is the intensity of one pixel.

In [None]:
# Define a small 5x5 example image
image = np.array([[2, 2, 2, 2, 2],
                  [2, 4, 4, 4, 2],
                  [2, 4, 8, 4, 2],
                  [2, 4, 4, 4, 2],
                  [2, 2, 2, 2, 2]])

Let us now visualize the image represented by this array.

In [None]:
def visualize_image(image):
    
    .........................

visualize_image(image)

We can now define an array that represents a filter (or kernel) acting on the image. Here we use a 3x3 filter, although smaller or larger sizes are possible.

In [None]:
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])

We can visualize the kernel as well.

In [None]:
def visualize_kernel(kernel):
    
   .........................

visualize_kernel(kernel)

At each position, the kernel is multiplied element-wise with the image subarray it covers, giving an array with the same shape as the kernel. The elements of this array are then summed to produce a single scalar, which becomes one entry of the output.
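To make the multiply-and-sum step concrete, the following minimal sketch computes one output value by hand. It reuses the 5x5 example image and the 3x3 kernel defined above; the names `window`, `products` and `value` are illustrative only.

```python
import numpy as np

image = np.array([[2, 2, 2, 2, 2],
                  [2, 4, 4, 4, 2],
                  [2, 4, 8, 4, 2],
                  [2, 4, 4, 4, 2],
                  [2, 2, 2, 2, 2]])

kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])

# Top-left 3x3 window of the image (rows 0-2, columns 0-2)
window = image[0:3, 0:3]

# The element-wise product has the same shape as the kernel
products = window * kernel

# Summing its entries gives one scalar of the output
value = products.sum()
print(products)
print(value)  # -8
```

Repeating this for every window position fills in the whole output array.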
When performing a convolution, two important parameters are the padding and the stride.
Padding refers to the practice of adding extra rows and columns around an image to control how the convolutional filter interacts with its edges. In our case, we will use the function np.pad to apply padding to the image:
    image_padded = np.pad(image, pad_width=padding, mode='constant', constant_values=0)
Here pad_width=padding sets the number of pixels added on each side of the image. For example:
- padding=1: Adds 1 row/column on all sides.
- padding=2: Adds 2 rows/columns on all sides.
Padding can preserve the spatial dimensions: with stride 1 and a 3x3 kernel, padding=1 makes the output the same size as the input. It also lets the kernel be centred on edge pixels, so that the borders of the image contribute fully to the output.
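As a quick illustration of zero padding, the sketch below pads a stand-in 5x5 array of ones (an assumption made purely so the added border is easy to see) and prints the resulting shapes.

```python
import numpy as np

# Stand-in 5x5 image of ones, assumed here only for illustration
demo_image = np.ones((5, 5), dtype=int)

# Zero padding of 1 and 2 pixels on every side
padded_1 = np.pad(demo_image, pad_width=1, mode='constant', constant_values=0)
padded_2 = np.pad(demo_image, pad_width=2, mode='constant', constant_values=0)

print(padded_1.shape)  # (7, 7)
print(padded_2.shape)  # (9, 9)
print(padded_1)        # original values surrounded by a border of zeros
```

Each unit of padding adds two rows and two columns in total, one on each side.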
Stride is the number of pixels the kernel moves at each step as it slides across the image. With stride=1, neighbouring regions overlap as much as possible and the output is as large as it can be for a given kernel and padding; with stride>1, the overlap shrinks and the output becomes smaller.
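The combined effect of padding and stride on the output size can be checked with a small helper. The function name `conv_output_size` is hypothetical; the formula is the one used for output_height and output_width in the convolution function below, written in terms of the unpadded input length.

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Output length along one axis for input length n and kernel length k."""
    return (n + 2 * padding - k) // stride + 1

# 5-pixel axis, 3-pixel kernel
print(conv_output_size(5, 3))                       # 3
print(conv_output_size(5, 3, padding=1))            # 5 (same size as the input)
print(conv_output_size(5, 3, stride=2))             # 2
print(conv_output_size(5, 3, stride=2, padding=1))  # 3
```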

In [None]:
# Implementing Convolution Manually

def convolution2d(image, kernel, stride=1, padding=0):
    """
    Perform 2D convolution between an image and a kernel.
    """
    # Apply padding
    image_padded = np.pad(image, pad_width=padding, mode='constant', constant_values=0)
    
    # Compute the output dimensions
    kernel_height, kernel_width = kernel.shape
    output_height = ((image_padded.shape[0] - kernel_height) // stride) + 1
    output_width = ((image_padded.shape[1] - kernel_width) // stride) + 1
    
    # Initialize the output feature map
    output = np.zeros((output_height, output_width))
    
    # Perform convolution
    .........................
    
    return output
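For reference, one possible way to write the sliding-window loop is sketched below. It is not the only solution: the function name `convolution2d_sketch` and the helper variables `top`, `left` and `window` are hypothetical. As is common in deep learning, the kernel is not flipped, so strictly speaking this computes a cross-correlation.

```python
import numpy as np

def convolution2d_sketch(image, kernel, stride=1, padding=0):
    """Hypothetical reference version of the sliding-window loop."""
    image_padded = np.pad(image, pad_width=padding, mode='constant', constant_values=0)
    kernel_height, kernel_width = kernel.shape
    output_height = (image_padded.shape[0] - kernel_height) // stride + 1
    output_width = (image_padded.shape[1] - kernel_width) // stride + 1
    output = np.zeros((output_height, output_width))

    for i in range(output_height):
        for j in range(output_width):
            # Top-left corner of the current window in the padded image
            top, left = i * stride, j * stride
            window = image_padded[top:top + kernel_height, left:left + kernel_width]
            # Multiply element-wise and sum to get one output value
            output[i, j] = np.sum(window * kernel)
    return output

image = np.array([[2, 2, 2, 2, 2],
                  [2, 4, 4, 4, 2],
                  [2, 4, 8, 4, 2],
                  [2, 4, 4, 4, 2],
                  [2, 2, 2, 2, 2]])
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])

result = convolution2d_sketch(image, kernel)
print(result)  # rows: (-8, 0, 8), (-10, 0, 10), (-8, 0, 8)
```

With this kernel, the left half of the output is negative and the right half positive, reflecting the direction of the intensity change on each side of the bright centre.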

Now we can apply the convolution to the example image.

In [None]:
# Apply convolution on the example image
output_feature_map = .........................

And visualize the result.

In [None]:
.........................

A commonly used technique in CNNs is MaxPooling. It is a downsampling operation that reduces the spatial dimensions (height and width) of feature maps while retaining the strongest responses. The key idea is to condense information by keeping only the maximum value from each small region (usually a square) of the feature map. Here, again, the two important parameters are the size of the region over which the maximum is computed and the stride.

In [None]:
def max_pooling(image, size=2, stride=2):
    """
    Perform max pooling on an image with a given pool size and stride.
    """
    output_height = (image.shape[0] - size) // stride + 1
    output_width = (image.shape[1] - size) // stride + 1
    pooled = np.zeros((output_height, output_width))
    
    .........................
    
    return pooled
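One possible way to fill in the pooling loop is sketched below, under the same caveat as before: the name `max_pooling_sketch` and the 4x4 test array `demo` are hypothetical and chosen only for illustration.

```python
import numpy as np

def max_pooling_sketch(image, size=2, stride=2):
    """Hypothetical reference version of the max-pooling loop."""
    output_height = (image.shape[0] - size) // stride + 1
    output_width = (image.shape[1] - size) // stride + 1
    pooled = np.zeros((output_height, output_width))

    for i in range(output_height):
        for j in range(output_width):
            # Top-left corner of the current pooling region
            top, left = i * stride, j * stride
            pooled[i, j] = np.max(image[top:top + size, left:left + size])
    return pooled

# 4x4 array holding the values 0..15, row by row
demo = np.arange(16).reshape(4, 4)
pooled_demo = max_pooling_sketch(demo)
print(pooled_demo)  # 2x2 array with the values 5, 7, 13, 15
```

Each 2x2 block of the input is replaced by its largest entry, which halves both dimensions.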

Now we can apply MaxPooling to the feature map.

In [None]:
pooled_feature_map = .........................

And plot the result, which in this case is trivial: with the default padding=0 and stride=1, the 3x3 feature map is pooled down to a single value.

In [None]:
.........................

With the previous steps in place, we have all the elements needed to build a CNN-like pipeline by combining convolution and MaxPooling. We also add an intermediate step, a ReLU activation function, which sets negative values to zero. The basic pipeline is: Convolution -> Activation (ReLU, using np.maximum) -> MaxPooling.
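The ReLU step can be illustrated on its own. In the sketch below the array is typed in by hand; its values are what the 3x3 kernel above yields on the 5x5 example image with stride 1 and no padding.

```python
import numpy as np

# Feature map of the example image (stride 1, no padding), entered by hand
feature_map_demo = np.array([[-8.0, 0.0, 8.0],
                             [-10.0, 0.0, 10.0],
                             [-8.0, 0.0, 8.0]])

# ReLU: keep positive values, replace negative ones with 0
relu_demo = np.maximum(feature_map_demo, 0)
print(relu_demo)  # rows: (0, 0, 8), (0, 0, 10), (0, 0, 8)
```

Only the positive responses on the right-hand side survive the activation.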

In [None]:
# Step 1: Convolution
feature_map = .........................

# Step 2: Activation Function (ReLU)
feature_map_relu = .........................

# Step 3: Max Pooling
pooled_output = .........................

And we can visualize all the steps of the pipeline.

In [None]:
.........................

Apply the pipeline to a bigger synthetic image.

In [None]:
# Create a simple synthetic image
synthetic_image = np.array(.........................)

# Kernel for edge detection
edge_kernel = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]])

# Apply convolution, ReLU, and max pooling
synthetic_feature_map = .........................
synthetic_feature_map_relu = .........................
synthetic_pooled_output = .........................

# Visualize Results
.........................
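Any 2D array can serve as the synthetic image. As one purely hypothetical choice, the sketch below builds a 10x10 black image with a bright 4x4 square (the name `synthetic_demo` and the square's position are assumptions) and evaluates the edge kernel at the two vertical borders of the square.

```python
import numpy as np

# Hypothetical synthetic image: dark background with a bright 4x4 square
synthetic_demo = np.zeros((10, 10))
synthetic_demo[3:7, 3:7] = 1.0

edge_kernel = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]])

# Kernel response on a window straddling the left border of the square
left_edge = np.sum(synthetic_demo[3:6, 2:5] * edge_kernel)
# Kernel response on a window straddling the right border of the square
right_edge = np.sum(synthetic_demo[3:6, 5:8] * edge_kernel)

print(left_edge, right_edge)  # -3.0 3.0
```

The two borders give responses of opposite sign, so after ReLU only the border with the positive response remains visible.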

We will now apply the pipeline to a realistic image. For this, first load the image.

In [None]:
def load_image(image_path, size=(128, 128)):
    """
    Load an image, resize it, and convert to grayscale.
    """
    image = Image.open(image_path).convert('L')  # Convert to grayscale
    image = image.resize(size)
    image_np = np.array(image)
    return image_np

And visualize it.

In [None]:
def show_image(image, title="Image"):
    """
    Display a grayscale image.
    """
    plt.imshow(image, cmap='gray')
    plt.title(title)
    plt.axis('off')
    plt.show()

In [None]:
image_path = "cat.4150.jpg"
image = load_image(image_path)
show_image(image, title="Original Grayscale Image")

We will use three different kernels to study their impact on the image.

In [None]:
# Example kernels (filters)
kernel_1 = np.array([[1, 0, -1],
                    [1, 0, -1],
                    [1, 0, -1]])

kernel_2 = np.array([[0, -1, 0],
                    [-1, 5, -1],
                    [0, -1, 0]])

kernel_3 = np.array([[1, 1, 1],
                    [1, 1, 1],
                    [1, 1, 1]]) / 9.0
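Before applying the kernels, a simple sanity check is to look at the sum of their entries, which is the response each kernel gives on a region of constant intensity. The sketch below is an illustration only; `flat_patch` and `responses` are hypothetical names.

```python
import numpy as np

kernel_1 = np.array([[1, 0, -1],
                     [1, 0, -1],
                     [1, 0, -1]])
kernel_2 = np.array([[0, -1, 0],
                     [-1, 5, -1],
                     [0, -1, 0]])
kernel_3 = np.array([[1, 1, 1],
                     [1, 1, 1],
                     [1, 1, 1]]) / 9.0

# A flat 3x3 patch with constant intensity 100 (illustrative value)
flat_patch = np.full((3, 3), 100.0)

responses = {}
for name, k in [("kernel_1", kernel_1), ("kernel_2", kernel_2), ("kernel_3", kernel_3)]:
    responses[name] = np.sum(flat_patch * k)
    print(name, "sum of entries:", k.sum(), "response on flat patch:", responses[name])
```

A kernel whose entries sum to 0 gives no response on flat regions, while a kernel whose entries sum to 1 leaves their intensity unchanged.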

Apply each of these kernels to the image and observe what effect each one has.

In [None]:
# Step 1: Convolution with kernel 1 (do the same for the other kernels)
feature_map_1 = .........................

# Step 2: Apply ReLU activation
relu_feature_map_1 = .........................

# Step 3: Max Pooling
pooled_relu_feature_map_1 = .........................

# Visualize the CNN-like behavior
.........................