<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Explanation of the Code</title>
</head>
<body>
    <h1>Explanation of the Code</h1>
    <h2>1. Class Initialization:</h2>
    <ul>
        <li>
            The <code>__init__</code> method accepts the following parameters:
            <ul>
                <li><strong>pool_size:</strong> The height and width of the pooling window (typically 2x2 or 3x3).</li>
                <li><strong>stride:</strong> The step size for moving the window. If not provided, it defaults to <code>pool_size</code>, so adjacent windows do not overlap.</li>
                <li><strong>padding:</strong> Either <code>'same'</code> (zero-padding around the input) or <code>'valid'</code> (no padding).</li>
            </ul>
        </li>
    </ul>
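    <p>
        The three arguments can be normalized as shown in the sketch below. It uses a hypothetical helper, <code>normalize_pool_args</code>, which is not part of the class that follows. The helper accepts either an int or a <code>(height, width)</code> tuple for <code>pool_size</code> and <code>stride</code>, falls back to <code>pool_size</code> when no stride is given, and rejects unknown padding modes.
    </p>

```python
def normalize_pool_args(pool_size, stride=None, padding="valid"):
    """Hypothetical helper: return (pool_size, stride, padding) with sizes as (height, width) tuples."""

    def to_pair(value):
        # An int n means an n x n window (or step); a 2-tuple is read as (height, width).
        if isinstance(value, int):
            return (value, value)
        height, width = value
        return (int(height), int(width))

    pool_pair = to_pair(pool_size)
    # Without an explicit stride, windows tile the input without overlapping.
    stride_pair = pool_pair if stride is None else to_pair(stride)
    if padding not in ("valid", "same"):
        raise ValueError(f"padding must be 'valid' or 'same', got {padding!r}")
    return pool_pair, stride_pair, padding


print(normalize_pool_args(2))          # ((2, 2), (2, 2), 'valid')
print(normalize_pool_args((3, 3), 2))  # ((3, 3), (2, 2), 'valid')
```
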

    <h2>2. Padding:</h2>
    <ul>
        <li>
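            If <code>padding='same'</code>, <code>forward</code> adds <code>(pool_size - 1) // 2</code> zeros to each side of the height and width axes before pooling. With an odd window and stride 1, this keeps the spatial size unchanged. With an even window, however, no padding is added at all, so the output does not always match TensorFlow's <code>'same'</code> size of <code>ceil(input_size / stride)</code>. Also, because the padding consists of zeros, a border window whose real values are all negative returns 0 rather than its true maximum.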
        </li>
    </ul>
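    <p>
        The sketch below compares the symmetric rule used in <code>forward</code> with a TensorFlow-style <code>'same'</code> rule. Under the TensorFlow-style rule, the output length is <code>ceil(n / s)</code> and any odd leftover padding goes after the input. The function names are hypothetical. The last few lines show why padding with <code>-inf</code> instead of zeros keeps padded cells from ever becoming the maximum.
    </p>

```python
import math

import numpy as np


def symmetric_padding(k):
    # Rule used by forward(): (k - 1) // 2 cells on each side, so an even k gets no padding.
    p = (k - 1) // 2
    return p, p


def tf_same_padding(n, k, s):
    # TensorFlow-style 'same': pad so the output length is ceil(n / s);
    # when the total is odd, the extra cell goes after the input.
    out = math.ceil(n / s)
    total = max((out - 1) * s + k - n, 0)
    return total // 2, total - total // 2


def output_length(n, k, s, pad_before, pad_after):
    # Number of window positions along one axis of the padded input.
    return (n + pad_before + pad_after - k) // s + 1


for n, k, s in [(55, 3, 2), (55, 2, 2), (6, 2, 1)]:
    sym = output_length(n, k, s, *symmetric_padding(k))
    same = output_length(n, k, s, *tf_same_padding(n, k, s))
    print(f"n={n}, k={k}, s={s}: symmetric -> {sym}, ceil(n/s) rule -> {same}")

# Zero padding changes the result at the border when every real value is negative.
x = np.full((3, 3), -5.0)
zero_padded = np.pad(x, 1, mode="constant", constant_values=0.0)
inf_padded = np.pad(x, 1, mode="constant", constant_values=-np.inf)
print(zero_padded[0:3, 0:3].max())  # 0.0, a value that does not occur in x
print(inf_padded[0:3, 0:3].max())   # -5.0
```

    <p>
        For the AlexNet-style case (55, 3, 2) both rules give 28. For a 2×2 window the symmetric rule yields 27 instead of 28.
    </p>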

    <h2>3. Forward Pass (Pooling Operation):</h2>
    <ul>
        <li>
            For each output position <code>(h, w)</code> and each channel, we extract the input region whose top-left corner is at <code>(h * stride[0], w * stride[1])</code> and whose size equals the pooling window.
        </li>
        <li>
            The maximum value within each region is recorded in the output tensor.
        </li>
    </ul>
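    <p>
        The next example applies the same window-and-stride indexing to small single-channel inputs. It first uses a 2×2 window with stride 2, then an overlapping 3×3 window with stride 2. The function <code>max_pool_channel</code> is a hypothetical stand-in that mirrors the inner loop of <code>forward</code> for one channel with <code>'valid'</code> padding.
    </p>

```python
import numpy as np


def max_pool_channel(x, k=(2, 2), s=(2, 2)):
    """Max-pool one 2-D channel with window k and stride s ('valid' padding)."""
    out_h = (x.shape[0] - k[0]) // s[0] + 1
    out_w = (x.shape[1] - k[1]) // s[1] + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for h in range(out_h):
        for w in range(out_w):
            # The window's top-left corner is at (h * s[0], w * s[1]).
            region = x[h * s[0]: h * s[0] + k[0], w * s[1]: w * s[1] + k[1]]
            out[h, w] = region.max()
    return out


x = np.arange(16).reshape(4, 4)
print(max_pool_channel(x))
# [[ 5  7]
#  [13 15]]

# Overlapping windows, as in the AlexNet example: 3x3 window, stride 2.
y = np.arange(25).reshape(5, 5)
print(max_pool_channel(y, k=(3, 3), s=(2, 2)))
# [[12 14]
#  [22 24]]
```
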

    <h2>4. Example Usage:</h2>
    <ul>
        <li>
            We simulate a 55×55 feature map with 96 channels, the shape of the input to AlexNet's first pooling layer, and apply a <code>MaxPooling2D</code> layer with a 3×3 pooling window and a stride of 2.
        </li>
        <li>
            Each 55×55 channel is down-sampled to 27×27 while all 96 channels are kept, so the output shape is 27×27×96.
        </li>
    </ul>
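    <p>
        The output size is the number of window positions that fit along each axis. In the formula below, k is the window size, s the stride, and p the padding added to each side: <code>(k - 1) // 2</code> for <code>'same'</code> in this implementation and 0 for <code>'valid'</code>. For the example (<code>'valid'</code>, k = 3, s = 2):
    </p>

```latex
H_{\text{out}} = \left\lfloor \frac{H_{\text{in}} + 2p - k}{s} \right\rfloor + 1
             = \left\lfloor \frac{55 + 0 - 3}{2} \right\rfloor + 1
             = 26 + 1 = 27
```

    <p>
        The width follows the same formula, and the channel count passes through unchanged.
    </p>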

    <h2>Key Differences from TensorFlow's MaxPooling2D</h2>
    <ul>
        <li>
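            <strong>Training &amp; Inference Modes:</strong> Max pooling has no trainable parameters and behaves identically during training and inference. Like TensorFlow's layer, this implementation therefore needs no separate modes. Unlike TensorFlow, however, it implements only the forward pass and computes no gradients for backpropagation.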
        </li>
        <li>
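            <strong>Batch Processing:</strong> The code processes a single image of shape <code>(height, width, channels)</code>. TensorFlow's layer, by default, expects a batch of shape <code>(batch, height, width, channels)</code>. To handle batches, you would extend this implementation to loop over, or vectorize across, the batch dimension.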
        </li>
        <li>
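            <strong>Efficient Computation:</strong> This implementation is written for clarity. It runs one Python loop iteration per output element and channel, which amounts to 27 × 27 × 96 = 69,984 iterations in the example. As a result, it is much slower than TensorFlow's <code>MaxPooling2D</code>, which uses optimized native kernels and can run on a GPU.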
        </li>
    </ul>
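    <p>
        Both the batch and the speed limitations can be addressed with NumPy alone. The sketch below is one possible approach, not TensorFlow's algorithm. It assumes input of shape <code>(batch, height, width, channels)</code> (TensorFlow's default <code>channels_last</code> layout), <code>'valid'</code> padding, and NumPy 1.20 or later for <code>numpy.lib.stride_tricks.sliding_window_view</code>. The name <code>max_pool_nhwc</code> is hypothetical.
    </p>

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def max_pool_nhwc(x, pool_size=(2, 2), stride=None):
    """Vectorized 'valid' max pooling for a batch of shape (N, H, W, C)."""
    stride = pool_size if stride is None else stride
    # Read-only view of every window over the H and W axes (no copy);
    # shape becomes (N, H - kh + 1, W - kw + 1, C, kh, kw).
    windows = sliding_window_view(x, pool_size, axis=(1, 2))
    # Keep only the window positions the stride lands on.
    windows = windows[:, ::stride[0], ::stride[1]]
    # Reduce over the two trailing window axes.
    return windows.max(axis=(-2, -1))


batch = np.random.randn(4, 55, 55, 96)
out = max_pool_nhwc(batch, pool_size=(3, 3), stride=(2, 2))
print(out.shape)  # (4, 27, 27, 96)
```

    <p>
        Building the window view and applying the stride are both cheap indexing operations. The actual computation is the final <code>max</code> reduction, which NumPy runs in compiled code instead of a Python loop.
    </p>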

    <p>
        This custom implementation can be used for learning purposes or as part of a larger neural network framework.
    </p>
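    <p>
        To use the layer inside a trainable network, you would also need a backward pass, which the class does not implement. A common approach for max pooling routes each upstream gradient to the input position that held the window's maximum and gives every other position zero gradient. The sketch below applies this to a single <code>(height, width, channels)</code> input with <code>'valid'</code> padding. The function <code>max_pool_backward</code> is hypothetical, and ties go to the first maximum that <code>np.argmax</code> finds.
    </p>

```python
import numpy as np


def max_pool_backward(image, grad_output, pool_size=(2, 2), stride=(2, 2)):
    """Gradient of 'valid' max pooling with respect to its (H, W, C) input."""
    kh, kw = pool_size
    sh, sw = stride
    grad_input = np.zeros_like(image, dtype=float)
    out_h, out_w, channels = grad_output.shape
    for h in range(out_h):
        for w in range(out_w):
            for c in range(channels):
                region = image[h * sh: h * sh + kh, w * sw: w * sw + kw, c]
                # argmax returns a flat index; convert it to (row, col) inside the window.
                dr, dc = np.unravel_index(np.argmax(region), region.shape)
                # Accumulate, because overlapping windows can share the same maximum.
                grad_input[h * sh + dr, w * sw + dc, c] += grad_output[h, w, c]
    return grad_input


image = np.array([[1.0, 3.0], [2.0, 0.0]]).reshape(2, 2, 1)
grad = max_pool_backward(image, np.ones((1, 1, 1)))
print(grad[:, :, 0])
# [[0. 1.]
#  [0. 0.]]
```
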
</body>
</html>


import numpy as np

class MaxPooling2D:
    def __init__(self, pool_size=(2, 2), stride=None, padding='valid'):
        """
        Initializes the MaxPooling2D layer.
        
        Parameters:
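        - pool_size (int or tuple): Size of the pooling window (height, width).
        - stride (int or tuple): Step size for moving the pooling window. Defaults to pool_size.
        - padding (str): 'valid' (no padding) or 'same' (zero padding).
        """
        # Accept an int (square window/step) or a (height, width) tuple
        self.pool_size = (pool_size, pool_size) if isinstance(pool_size, int) else tuple(pool_size)
        if stride is None:
            stride = self.pool_size  # Default: non-overlapping windows
        self.stride = (stride, stride) if isinstance(stride, int) else tuple(stride)
        if padding not in ('valid', 'same'):
            raise ValueError("padding must be 'valid' or 'same'")
        self.padding = padding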

    def apply_padding(self, image, pad_h, pad_w):
        """
        Pads the input image with zeros if necessary.
        
        Parameters:
        - image (np.array): The input image.
        - pad_h (int): Padding height.
        - pad_w (int): Padding width.
        
        Returns:
        - np.array: Padded image.
        """
        return np.pad(image, ((pad_h, pad_h), (pad_w, pad_w), (0, 0)), mode='constant', constant_values=0)

    def forward(self, image):
        """
        Performs the max pooling operation.
        
        Parameters:
        - image (np.array): Input image with shape (height, width, channels).
        
        Returns:
        - np.array: The pooled image.
        """
        # Image dimensions
        image_height, image_width, channels = image.shape
        pool_height, pool_width = self.pool_size

        # Padding if necessary
        if self.padding == 'same':
            pad_h = (pool_height - 1) // 2
            pad_w = (pool_width - 1) // 2
            image = self.apply_padding(image, pad_h, pad_w)
        elif self.padding == 'valid':
            pad_h, pad_w = 0, 0

        # Calculate output dimensions
        output_height = (image.shape[0] - pool_height) // self.stride[0] + 1
        output_width = (image.shape[1] - pool_width) // self.stride[1] + 1
        
        # Initialize the output array
        pooled_output = np.zeros((output_height, output_width, channels))

        # Perform pooling operation
        for h in range(output_height):
            for w in range(output_width):
                for c in range(channels):
                    # Extract the region of interest (ROI) for the current filter position
                    region = image[
                        h * self.stride[0]: h * self.stride[0] + pool_height,
                        w * self.stride[1]: w * self.stride[1] + pool_width,
                        c
                    ]
                    # Apply max pooling to the region
                    pooled_output[h, w, c] = np.max(region)

        return pooled_output

# Example usage
# Simulated feature map: height=55, width=55, 96 channels (no batch dimension),
# the shape of the input to AlexNet's first pooling layer
image = np.random.randn(55, 55, 96)

# Instantiate the layer with a 3x3 window, stride 2, and 'valid' padding (overlapping pooling, as in AlexNet)
# Non-overlapping alternative: MaxPooling2D(pool_size=(2, 2), stride=(2, 2), padding='valid')
max_pool = MaxPooling2D(pool_size=(3, 3), stride=(2, 2), padding='valid')


# Apply the max pooling operation to the image
output = max_pool.forward(image)
print("Input Shape:\n", image.shape)
print("Output Shape of MaxPooling2D:\n", output.shape)


Input Shape:
 (55, 55, 96)
Output Shape of MaxPooling2D:
 (27, 27, 96)
