In [1]:
import numpy as np
import pandas as pd

Now it’s time to discuss **pooling**, a downscaling operation that usually follows a convolutional layer. It is a pretty straightforward operation.

## How Pooling Works
The task of pooling is to reduce the dimensionality of the result coming in from the convolutional layer by keeping what’s relevant and discarding the rest.

#### Region
The process is simple – you define an `n x n` **region** and **stride** size. The region represents a small matrix that slides over the image and works with individual pools. A **pool** is just a fancy word for a small matrix on the convolutional output from which, most commonly, the maximum value is kept. A good starting value for the region size is `2×2`.

#### Stride
The **stride** represents the number of pixels the region moves to the right after completing a single step. When the region reaches the end of a row, it moves down by the stride size and repeats the process. A good starting value for the stride is 2. Opting for a stride size lower than 2 doesn’t make much sense, as you’ll see shortly.
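Region size and stride together determine how many pools fit along each axis. The sketch below uses the same formula as the `forward()` method later in this notebook and assumes no padding; `pooled_size` is a hypothetical helper name used only for illustration.

```python
def pooled_size(input_size: int, pool_size: int, stride: int) -> int:
    """Number of pool positions along one axis, assuming no padding."""
    return (input_size - pool_size) // stride + 1


# A 4-pixel axis with a 2-pixel region:
size_stride_1 = pooled_size(4, 2, 1)  # 3 positions
size_stride_2 = pooled_size(4, 2, 2)  # 2 positions
print(size_stride_1, size_stride_2)
```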

#### Max Pooling
The most common type of pooling is **Max Pooling**, which means only the highest value of a region is kept. You’ll sometimes encounter **Average Pooling**, but not nearly as often. Max pooling is a good place to start because it keeps the most activated pixels (ones with the highest values) and discards the rest. On the other hand, averaging would **even out the values**. You don’t want that most of the time.
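To see the difference on a single pool, here is a minimal sketch with made-up values: max pooling keeps the strongest activation, while average pooling blends it with the weaker ones.

```python
import numpy as np

# One 2x2 pool with a single strong activation (values are illustrative).
pool = np.array([[1, 2],
                 [9, 0]])

max_value = pool.max()   # 9   -> the strongest activation survives
avg_value = pool.mean()  # 3.0 -> the strong activation is evened out
print(max_value, avg_value)
```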

#### Example

While we’re on the topic of how pooling works, let’s see what happens to a small `4×4` matrix when you apply max pooling to it. We’ll use a region size of `2×2` and the stride size of `1`:


<center><img src="img/maxpooling_1.png" alt="Max Pooling with the region size of 2×2 and the stride size of 1" width="650" height="320" /></center>
<p style="text-align: center; font-size: small;"><i><b>Figure 1.</b> Max Pooling with the region size of 2×2 and the stride size of 1</i></p>

A total of 9 pools was extracted from the input matrix, and only the largest value from each pool was kept. As a result, pooling reduced the dimensionality by a single pixel in height and width. That’s why opting for a stride size lower than 2 makes no sense, as pooling just barely reduced the dimensionality.

Let’s apply the pooling operation once again, but this time with a stride size of 2 pixels:


<center><img src="img/maxpooling_2.png" alt="Max Pooling with the region size of 2×2 and the stride size of 2" width="650" height="250" /></center>
<p style="text-align: center; font-size: small;"><i><b>Figure 2.</b> Max Pooling with the region size of 2×2 and the stride size of 2 </i></p>

Much better – we now have only four pools to work with, and both the height and the width are cut in half.
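The two cases can be reproduced with a few lines of NumPy. This is a minimal single-channel sketch; the matrix values are made up for illustration and are not the ones shown in the figures, and `max_pool_2d` is a hypothetical helper name.

```python
import numpy as np


def max_pool_2d(matrix: np.ndarray, pool_size: int, stride: int) -> np.ndarray:
    """Max pooling over a single 2D matrix, without padding."""
    h, w = matrix.shape
    h_out = (h - pool_size) // stride + 1
    w_out = (w - pool_size) // stride + 1
    output = np.zeros((h_out, w_out), dtype=matrix.dtype)
    for i in range(h_out):
        for j in range(w_out):
            # Cut out the current pool and keep only its largest value.
            pool = matrix[i * stride:i * stride + pool_size,
                          j * stride:j * stride + pool_size]
            output[i, j] = pool.max()
    return output


matrix = np.array([[1, 3, 2, 4],
                   [5, 6, 1, 2],
                   [7, 2, 9, 1],
                   [3, 4, 1, 8]])

pooled_stride_1 = max_pool_2d(matrix, pool_size=2, stride=1)
# [[6 6 4]
#  [7 9 9]
#  [7 9 9]]   -> 9 pools, output is 3x3

pooled_stride_2 = max_pool_2d(matrix, pool_size=2, stride=2)
# [[6 4]
#  [7 9]]     -> 4 pools, output is 2x2
```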

## Implementation
Next, let’s see how to implement the pooling logic from scratch in Python. We'll build the MaxPool as a regular NN layer, with `forward()` and `backward()` functions.

In [None]:
class Layer:
    def __init__(self):
        self.input = None
        self.output = None

    def forward(self, input: np.ndarray) -> np.ndarray:
        # TODO: return output
        pass

    def backward(self, output_gradient: np.ndarray, learning_rate: float) -> np.ndarray:
        # TODO: update parameters and return input gradient
        pass

class MaxPool(Layer):
    def __init__(self, pool_size: int, stride: int):
        # pool_size is the side length of the square region described above
        self.pool_size = pool_size
        self.stride = stride

### Forward propagation

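The forward propagation is pretty straightforward. We simply need to slide the pool over the input and keep the maximum value of each pool, separately for every filter. The helper `set_max_val_indices()` called at the end of the loop is defined in the complete class in the backward propagation section.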

In [None]:
class MaxPool(Layer):
    def __init__(self, pool_size: int, stride: int):
        self.pool_size = pool_size
        self.stride = stride
        self.max_val_indices = None
        self.input_shape = None

    def forward(self, input: np.ndarray) -> np.ndarray:
        # Get the dimensions of the input
        num_filters, h, w = input.shape

        # Define the output dimensions
        h_out = (h - self.pool_size) // self.stride + 1
        w_out = (w - self.pool_size) // self.stride + 1

        # Store the indices of the max values here. They are needed for the backward propagation.
        self.max_val_indices = np.zeros((num_filters, h_out, w_out, 2))
        self.input_shape = input.shape

        # Initialize the output
        output = np.zeros((num_filters, h_out, w_out))
        # Iterate over the input
        for h_out_idx in range(h_out):
            for w_out_idx in range(w_out):
                # Get the pool
                h_start = h_out_idx*self.stride
                h_end = h_start + self.pool_size
                w_start = w_out_idx*self.stride
                w_end = w_start + self.pool_size
                pool = input[:, h_start:h_end, w_start:w_end]
                # Get the max values for that pool (the number of max values == the number of filters)
                max_values = np.max(pool, axis=(1, 2))
                output[:, h_out_idx, w_out_idx] = max_values
                # NB: Remember the indices for these max values. These indices will be used for the backward propagation.
                self.set_max_val_indices(pool, h_out_idx, w_out_idx, max_values)
        # Return the output
        return output
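The explicit loops above favor readability. For the common case where the stride equals the pool size (non-overlapping pools), the same maxima can be computed with a reshape. This is an alternative sketch, not the notebook's implementation: `max_pool_non_overlapping` is a hypothetical name, and it does not record the max indices needed for the backward pass.

```python
import numpy as np


def max_pool_non_overlapping(x: np.ndarray, pool_size: int) -> np.ndarray:
    """Max pooling for stride == pool_size on an input of shape (filters, h, w)."""
    num_filters, h, w = x.shape
    h_out, w_out = h // pool_size, w // pool_size
    # Drop trailing rows/columns that do not fill a whole pool.
    x = x[:, :h_out * pool_size, :w_out * pool_size]
    # Split each spatial axis into (number of pools, position inside the pool).
    blocks = x.reshape(num_filters, h_out, pool_size, w_out, pool_size)
    # Reduce over the two "inside the pool" axes.
    return blocks.max(axis=(2, 4))


x = np.array([[[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 1],
               [3, 4, 1, 8]]])  # one filter, 4x4

pooled = max_pool_non_overlapping(x, pool_size=2)
# [[[6 4]
#   [7 9]]]
```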

### Backward propagation

Unlike convolution, a pooling operation has no parameters, so there are no **weight** or **bias** **derivatives** to compute here. Thus, the only derivative we need to compute is with respect to the input, $\frac{\partial Y}{\partial X}$. We know that the derivative w.r.t. the input will have the **same shape as the input**.

Max pooling during backpropagation works by passing the gradient to the specific element that was the maximum during the forward pass, while assigning **zero gradient to all other elements** within that pooling window. Essentially, the "winner" (the max value) from the forward pass acts as a router, directing the incoming gradient to its origin in the previous layer. This routing is facilitated by storing the location of the maximum value during the forward pass, which is then used to distribute the gradient in the backward pass.

##### Steps for Max Pooling Backward Propagation
* **Identify the "Winner"**: During the forward pass, the max pooling layer identifies the maximum value within each pooling window and stores the location (index) of that maximum value. 
* **Initialize Gradient Map**: Create a gradient map of the same size as the input feature map, filled with zeros.
* **Propagate Gradient**: Take the incoming gradient from the subsequent layer and place it only at the stored index within the gradient map, effectively routing it to the correct location. 
* **Zero Out Others**: All other elements in the gradient map remain zero, because their small changes would not affect the maximum value during the forward pass.
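* **Sum Gradients (If Necessary)**: If pooling windows overlap (stride smaller than the pool size), one input element can be the maximum of several windows, and the gradients routed to it should be summed. If several elements tie for the maximum within a window, the implementation below routes the whole gradient to just one of them (the last tied element it finds).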
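The steps above can be sketched on a single 4×4 channel with a `2×2` pool and a stride of 2. The input and gradient values are made up for illustration. Note one assumption that differs from the class in this notebook: `np.argmax` picks the first element on ties, whereas the class keeps the last one.

```python
import numpy as np

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 1],
              [3, 4, 1, 8]], dtype=float)
# Gradient arriving from the next layer, one value per pool.
output_gradient = np.array([[0.1, 0.2],
                            [0.3, 0.4]])
pool_size, stride = 2, 2

# Gradient map with the same shape as the input, all zeros.
input_gradient = np.zeros_like(x)
for i in range(output_gradient.shape[0]):
    for j in range(output_gradient.shape[1]):
        pool = x[i * stride:i * stride + pool_size,
                 j * stride:j * stride + pool_size]
        # Locate the "winner" inside the pool (first element on ties).
        r, c = np.unravel_index(np.argmax(pool), pool.shape)
        # Route the gradient to the winner; += sums if pools overlap.
        input_gradient[i * stride + r, j * stride + c] += output_gradient[i, j]

print(input_gradient)
# [[0.  0.  0.  0.2]
#  [0.  0.1 0.  0. ]
#  [0.3 0.  0.4 0. ]
#  [0.  0.  0.  0. ]]
```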

In [None]:
class MaxPool(Layer):
    def __init__(self, pool_size: int, stride: int):
        self.pool_size = pool_size
        self.stride = stride
        self.max_val_indices = None
        self.input_shape = None

    def forward(self, input: np.ndarray) -> np.ndarray:
        # Get the dimensions of the input
        num_filters, h, w = input.shape

        # Define the output dimensions
        h_out = (h - self.pool_size) // self.stride + 1
        w_out = (w - self.pool_size) // self.stride + 1

        # Store the indices of the max values here. They are needed for the backward propagation.
        self.max_val_indices = np.zeros((num_filters, h_out, w_out, 2))
        self.input_shape = input.shape

        # Initialize the output
        output = np.zeros((num_filters, h_out, w_out))
        # Iterate over the input
        for h_out_idx in range(h_out):
            for w_out_idx in range(w_out):
                # Get the pool
                h_start = h_out_idx*self.stride
                h_end = h_start + self.pool_size
                w_start = w_out_idx*self.stride
                w_end = w_start + self.pool_size
                pool = input[:, h_start:h_end, w_start:w_end]
                # Get the max values for that pool (the number of max values == the number of filters)
                max_values = np.max(pool, axis=(1, 2))
                output[:, h_out_idx, w_out_idx] = max_values
                # NB: Remember the indices for these max values. These indices will be used for the backward propagation.
                self.set_max_val_indices(pool, h_out_idx, w_out_idx, max_values)
        # Return the output
        return output

    def set_max_val_indices(self, pool: np.ndarray, h_out_idx: int, w_out_idx: int, max_values: np.ndarray) -> None:
        """ We can't use np.argmax directly since its axis parameter accepts only a single value, and not a tuple of values. """
        # For each filter
        for filter_idx in range(pool.shape[0]):
            # For each row
            for pool_h_idx in range(pool.shape[1]):
                # For each column
                for pool_w_idx in range(pool.shape[2]):
                    # IMPORTANT: We store the indices of the max value within a pool. On ties, the last match wins.
                    if pool[filter_idx, pool_h_idx, pool_w_idx] == max_values[filter_idx]:
                        self.max_val_indices[filter_idx, h_out_idx, w_out_idx, 0] = pool_h_idx
                        self.max_val_indices[filter_idx, h_out_idx, w_out_idx, 1] = pool_w_idx

    def backward(self, output_gradient: np.ndarray, learning_rate: float) -> np.ndarray:
        # Get the dimensions of the output
        num_filters, h_out, w_out = output_gradient.shape
        # Initialize the input gradient by setting all values to zero by default.
        input_gradient = np.zeros(self.input_shape)

        # Iterate over the output. For each value in the output matrix, we update a single value in the input matrix.
        for filter_idx in range(num_filters):
            for h_out_idx in range(h_out):
                for w_out_idx in range(w_out):
                    # Get the index of the maximum value within a pool.
                    max_val_pool_indices = self.max_val_indices[filter_idx, h_out_idx, w_out_idx]
                    pool_h_idx, pool_w_idx = max_val_pool_indices[0], max_val_pool_indices[1]
                    # Calculate the index of the maximum value in the input matrix.
                    input_h_idx = int(h_out_idx*self.stride + pool_h_idx)
                    input_w_idx = int(w_out_idx*self.stride + pool_w_idx)
                    # Add the gradient at that index. All other values remain zero. Using += sums the
                    # gradients when overlapping pools (stride < pool_size) share the same max element.
                    input_gradient[filter_idx, input_h_idx, input_w_idx] += output_gradient[filter_idx, h_out_idx, w_out_idx]
        # Return the input gradient
        return input_gradient
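The triple loop in `set_max_val_indices()` can also be replaced by flattening each filter's pool, calling `argmax` once, and converting the flat index back to a (row, column) pair. This is an alternative sketch rather than the notebook's method; `max_indices_in_pool` is a hypothetical name, and on ties it returns the first maximum, whereas the loop version keeps the last.

```python
import numpy as np


def max_indices_in_pool(pool: np.ndarray) -> np.ndarray:
    """Return the (row, col) of the max for each filter; pool has shape (filters, ph, pw)."""
    num_filters, pool_h, pool_w = pool.shape
    # Flatten the two spatial axes so argmax can work on a single axis.
    flat_idx = pool.reshape(num_filters, -1).argmax(axis=1)
    # Convert flat positions back to 2D coordinates inside the pool.
    rows, cols = np.unravel_index(flat_idx, (pool_h, pool_w))
    return np.stack([rows, cols], axis=1)


pool = np.array([[[1, 3],
                  [5, 6]],
                 [[2, 4],
                  [1, 2]]])  # two filters, one 2x2 pool each

indices = max_indices_in_pool(pool)
# [[1 1]    -> filter 0: max 6 at row 1, col 1
#  [0 1]]   -> filter 1: max 4 at row 0, col 1
```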

You can run a CNN with MaxPooling by executing the `CnnWithPoolingDemo.py` file.

 * **NB**: The **random weight generation** at the beginning could harm the training results. Sometimes the starting point leads to a suboptimal **local minimum** and the model can't get out of it with the current implementation. So, you might need to run the example a few times until you observe the error converging to $0$.