<a href="https://colab.research.google.com/github/Undasnr/DL-ML/blob/main/Ronny_SimpleConv2d_Assignment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**1. Creating a 2-D convolutional layer**

In [None]:
import numpy as np

class Conv2d:
    """
    A 2D convolutional layer implemented from scratch using NumPy.

    This class performs the forward and backward passes for a convolutional
    layer, following the mathematical formulas provided. It assumes a
    data format of (batch_size, channels, height, width) (NCHW).
    For simplicity, this implementation does not include support for
    padding or strides other than 1, as implied by the given formulas.
    """

    def __init__(self, in_channels, out_channels, filter_size):
        """
        Initializes the Conv2d layer with random weights and zero biases.

        Args:
            in_channels (int): The number of channels in the input array.
            out_channels (int): The number of channels in the output array (number of filters).
            filter_size (tuple): The height and width of the filter, e.g., (3, 3).
        """
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.filter_h, self.filter_w = filter_size

        # Initializing weights (W) and biases (b).
        # Shape of W: (out_channels, in_channels, filter_h, filter_w)
        self.W = np.random.randn(out_channels, in_channels, self.filter_h, self.filter_w) * 0.01

        # Initializing biases to zeros. Shape: (out_channels,)
        self.b = np.zeros(out_channels)

        # Gradients for weights and biases, to be calculated during backward pass.
        self.dW = None
        self.db = None

        # Storing the input array (X) for use in the backward pass.
        self.X = None

    def forward(self, X):
        """
        Performs the forward propagation for the convolutional layer.

        Args:
            X (np.ndarray): The input array of shape (N, C, H, W).
                            N: batch size, C: in_channels, H: input height, W: input width.

        Returns:
            np.ndarray: The output array (A) after convolution, of shape (N, M, H_out, W_out).
                        N: batch size, M: out_channels, H_out, W_out: output dimensions.
        """
        # Storing the input for the backward pass
        self.X = X

        # Getting input dimensions
        N, C, H, W = X.shape

        # Calculating output dimensions.
        N_out_h = H - self.filter_h + 1
        N_out_w = W - self.filter_w + 1

        # Initializing the output array with zeros.
        A = np.zeros((N, self.out_channels, N_out_h, N_out_w))

        # Perform the convolution using nested loops.
        for n in range(N):
            for m in range(self.out_channels):
                for i in range(N_out_h):
                    for j in range(N_out_w):
                        # Extract the receptive field (region of the input being convolved)
                        receptive_field = self.X[n, :, i:i + self.filter_h, j:j + self.filter_w]

                        # Performing the element-wise multiplication and summation.
                        A[n, m, i, j] = np.sum(receptive_field * self.W[m]) + self.b[m]

        return A

    def backward(self, dA):
        """
        Performs the backpropagation for the convolutional layer.

        This method calculates the gradients for the weights (dW) and biases (db)
        and the error to be passed to the previous layer (dX).

        Args:
            dA (np.ndarray): The gradient from the subsequent layer, of the same
                             shape as the output of the forward pass.

        Returns:
            np.ndarray: The gradient to be passed to the previous layer, of the
                        same shape as the input of the forward pass.
        """
        # Getting dimensions of input and output arrays
        N, C_in, H_in, W_in = self.X.shape
        N_out, M_out, H_out, W_out = dA.shape

        # Initializing gradients to zeros
        self.dW = np.zeros(self.W.shape)
        self.db = np.zeros(self.b.shape)
        dX = np.zeros(self.X.shape)

        # Looping to calculate dW and db.
        for n in range(N):
            for m in range(M_out):
                for i in range(H_out):
                    for j in range(W_out):
                        # Extracting the receptive field from the original input
                        receptive_field = self.X[n, :, i:i + self.filter_h, j:j + self.filter_w]

                        # Adding the contribution of this specific output element to the gradients
                        self.dW[m] += dA[n, m, i, j] * receptive_field

                # Summing ∂L/∂a_i,j,m over all positions of an output channel gives that
                # sample's bias gradient; += accumulates it across the batch.
                self.db[m] += np.sum(dA[n, m])

        # Loop to calculate the error to be passed to the previous layer (dX).
        for n in range(N):
            for k in range(C_in):
                for i in range(H_in):
                    for j in range(W_in):
                        sum_val = 0
                        for m in range(M_out):
                            for s in range(self.filter_h):
                                for t in range(self.filter_w):
                                    # Checking for valid indices
                                    if 0 <= (i - s) < H_out and 0 <= (j - t) < W_out:
                                        sum_val += dA[n, m, i - s, j - t] * self.W[m, k, s, t]
                        dX[n, k, i, j] = sum_val

        return dX
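
The nested loops in `forward` can be cross-checked against a vectorized version. The block below is a minimal sketch, not part of the original class: `conv2d_forward_vectorized` is a hypothetical helper name, the input and filters are illustrative values, and it assumes NumPy 1.20 or later for `sliding_window_view`. It covers the same stride-1, no-padding case as `Conv2d.forward`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_forward_vectorized(X, W, b):
    """Stride-1, no-padding forward pass for X (N, C, H, W) and W (M, C, fh, fw)."""
    fh, fw = W.shape[2:]
    # windows[n, c, i, j] is the (fh, fw) patch whose top-left corner is (i, j).
    windows = sliding_window_view(X, (fh, fw), axis=(2, 3))
    # Sum over input channels and filter positions for every filter m.
    A = np.einsum("ncijst,mcst->nmij", windows, W)
    # Broadcast one bias per output channel.
    return A + b[None, :, None, None]


# Illustrative data (assumed values): one 4x4 single-channel image, two 3x3 filters.
x = np.arange(1, 17, dtype=float).reshape(1, 1, 4, 4)
w = np.array(
    [
        [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]],
        [[0.0, 0.0, 0.0], [0.0, -1.0, 1.0], [0.0, 0.0, 0.0]],
    ]
).reshape(2, 1, 3, 3)
b = np.zeros(2)

A = conv2d_forward_vectorized(x, w, b)
print(A.shape)  # (1, 2, 2, 2)
print(A)
```

Comparing this result with the loop-based output via `np.allclose` is a cheap regression test whenever the loops are modified.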


**2. Experiments with 2D convolutional layers on small arrays**

In [None]:
import numpy as np

class Conv2d:
    """
    A 2D convolutional layer implemented from scratch using NumPy.

    This class performs the forward and backward passes for a convolutional
    layer, following the mathematical formulas provided. It assumes a
    data format of (batch_size, channels, height, width) (NCHW).
    For simplicity, this implementation does not include support for
    padding or strides other than 1, as implied by the given formulas.
    """

    def __init__(self, in_channels, out_channels, filter_size):
        """
        Initializes the Conv2d layer with random weights and zero biases.

        Args:
            in_channels (int): The number of channels in the input array.
            out_channels (int): The number of channels in the output array (number of filters).
            filter_size (tuple): The height and width of the filter, e.g., (3, 3).
        """
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.filter_h, self.filter_w = filter_size

        # Initializing weights (W) and biases (b).
        # Weights are initialized with small random values to break symmetry.
        # Shape of W: (out_channels, in_channels, filter_h, filter_w)
        self.W = np.random.randn(out_channels, in_channels, self.filter_h, self.filter_w) * 0.01

        # Biases are initialized to zeros. Shape: (out_channels,)
        self.b = np.zeros(out_channels)

        # Gradients for weights and biases, to be calculated during backward pass.
        self.dW = None
        self.db = None

        # Storing the input array (X) for use in the backward pass.
        self.X = None

    def forward(self, X):
        """
        Performs the forward propagation for the convolutional layer.

        Args:
            X (np.ndarray): The input array of shape (N, C, H, W).
                            N: batch size, C: in_channels, H: input height, W: input width.

        Returns:
            np.ndarray: The output array (A) after convolution, of shape (N, M, H_out, W_out).
                        N: batch size, M: out_channels, H_out, W_out: output dimensions.
        """
        # Storing the input for the backward pass
        self.X = X

        # Getting input dimensions
        N, C, H, W = X.shape

        # Calculating output dimensions.
        N_out_h = H - self.filter_h + 1
        N_out_w = W - self.filter_w + 1

        # Initializing the output array with zeros.
        A = np.zeros((N, self.out_channels, N_out_h, N_out_w))

        # Performing the convolution using nested loops.
        for n in range(N):  # Loop over each sample in the batch
            for m in range(self.out_channels):  # Loop over each output channel (filter)
                for i in range(N_out_h):  # Loop over output height
                    for j in range(N_out_w):  # Loop over output width
                        # Extracting the receptive field (region of the input being convolved)
                        receptive_field = self.X[n, :, i:i + self.filter_h, j:j + self.filter_w]

                        # Performing the element-wise multiplication and summation.
                        A[n, m, i, j] = np.sum(receptive_field * self.W[m]) + self.b[m]

        return A

    def backward(self, dA):
        """
        Performs the backpropagation for the convolutional layer.

        This method calculates the gradients for the weights (dW) and biases (db)
        and the error to be passed to the previous layer (dX).

        Args:
            dA (np.ndarray): The gradient from the subsequent layer, of the same
                             shape as the output of the forward pass.

        Returns:
            np.ndarray: The gradient to be passed to the previous layer, of the
                        same shape as the input of the forward pass.
        """
        # Getting dimensions of input and output arrays
        N, C_in, H_in, W_in = self.X.shape
        N_out, M_out, H_out, W_out = dA.shape

        # Initializing gradients to zeros
        self.dW = np.zeros(self.W.shape)
        self.db = np.zeros(self.b.shape)
        dX = np.zeros(self.X.shape)

        # Loop to calculate dW and db.
        for n in range(N):
            for m in range(M_out):
                for i in range(H_out):
                    for j in range(W_out):
                        # Extracting the receptive field from the original input
                        receptive_field = self.X[n, :, i:i + self.filter_h, j:j + self.filter_w]

                        # Adding the contribution of this specific output element to the gradients
                        self.dW[m] += dA[n, m, i, j] * receptive_field

                # Summing ∂L/∂a_i,j,m over all positions of an output channel gives that
                # sample's bias gradient; += accumulates it across the batch.
                self.db[m] += np.sum(dA[n, m])

        # Loop to calculate the error to be passed to the previous layer (dX).
        for n in range(N):
            for k in range(C_in):
                for i in range(H_in):
                    for j in range(W_in):
                        sum_val = 0
                        for m in range(M_out):
                            for s in range(self.filter_h):
                                for t in range(self.filter_w):
                                    # Check for valid indices
                                    if 0 <= (i - s) < H_out and 0 <= (j - t) < W_out:
                                        sum_val += dA[n, m, i - s, j - t] * self.W[m, k, s, t]
                        dX[n, k, i, j] = sum_val

        return dX


# Problem 2: Testing with small arrays
print("--- Problem 2: Testing Forward and Backward Propagation ---")

# Input data for the forward pass
# Shape: (batch_size=1, in_channels=1, height=4, width=4)
x = np.array([[[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]]])

# Weights (filters) for the convolutional layer
# Shape: (out_channels=2, in_channels=1, filter_h=3, filter_w=3)
w = np.array(
    [
        [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]],
        [[0.0, 0.0, 0.0], [0.0, -1.0, 1.0], [0.0, 0.0, 0.0]],
    ]
)

# Biases (set to zeros for this test)
b = np.zeros(2)

# Creating an instance of the Conv2d layer
conv_layer = Conv2d(in_channels=1, out_channels=2, filter_size=(3, 3))

# Manually set the weights and biases to the specified values
conv_layer.W = w.reshape(2, 1, 3, 3)
conv_layer.b = b

# Forward Propagation
print("\n--- Forward Propagation ---")
output_a = conv_layer.forward(x)
print(f"Output (A):\n{output_a}")

# Expected output from the problem description
expected_output_a = np.array([[[-4, -4], [-4, -4]], [[1, 1], [1, 1]]])
print(f"\nExpected Output:\n{expected_output_a}")

# Checking if the calculated output matches the expected output
print(f"\nForward pass matches expected output: {np.allclose(output_a, expected_output_a)}")


# Backward Propagation
print("\n--- Backward Propagation ---")

# Error from the subsequent layer (dA)
# Shape after the reshape below: (batch_size=1, out_channels=2, H_out=2, W_out=2)
delta = np.array([[[-4, -4], [10, 11]], [[1, -7], [1, -11]]])
da_to_pass = delta.reshape(1, 2, 2, 2)

# Pass the gradient to the backward method
dx_to_pass = conv_layer.backward(da_to_pass)

print(f"Gradient to previous layer (dX):\n{dx_to_pass}")

# Expected dX, derived by hand from
# dX[n, k, i, j] = sum over m, s, t of dA[n, m, i - s, j - t] * W[m, k, s, t]
# Shape: (batch_size=1, in_channels=1, height=4, width=4)
expected_dx = np.array([[[[0, 0, 0, 0], [0, -5, 4, -7], [0, 13, 27, -11], [0, -10, -11, 0]]]])
print(f"\nExpected dX (based on manual calculation):\n{expected_dx}")

# Checking if the calculated dX matches the manually calculated dX
print(f"\nBackward pass matches expected dX: {np.allclose(dx_to_pass, expected_dx)}")

--- Problem 2: Testing Forward and Backward Propagation ---

--- Forward Propagation ---
Output (A):
[[[[-4. -4.]
   [-4. -4.]]

  [[ 1.  1.]
   [ 1.  1.]]]]

Expected Output:
[[[-4 -4]
  [-4 -4]]

 [[ 1  1]
  [ 1  1]]]

Forward pass matches expected output: True

--- Backward Propagation ---
Gradient to previous layer (dX):
[[[[  0.   0.   0.   0.]
   [  0.  -5.   4.  -7.]
   [  0.  13.  27. -11.]
   [  0. -10. -11.   0.]]]]

Expected dX (based on manual calculation):
[[[[  0   0   0   0]
   [  0  -5   4  -7]
   [  0  13  27 -11]
   [  0 -10 -11   0]]]]

Backward pass matches expected dX: True
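
A finite-difference check offers an independent way to validate the dX values for this test case. The block below is a sketch under stated assumptions: `conv_forward` and `numerical_dx` are hypothetical helper names that do not appear in the notebook, the forward pass is re-implemented standalone for the same stride-1, no-padding case, and the scalar loss is taken to be `sum(A * dA)`, so that its gradient with respect to the input is the dX that `backward` should return.

```python
import numpy as np


def conv_forward(X, W, b):
    """Stride-1, no-padding forward pass, same formula as Conv2d.forward."""
    N, C, H, W_in = X.shape
    M, _, fh, fw = W.shape
    A = np.zeros((N, M, H - fh + 1, W_in - fw + 1))
    for i in range(A.shape[2]):
        for j in range(A.shape[3]):
            patch = X[:, :, i:i + fh, j:j + fw]  # shape (N, C, fh, fw)
            # Contract channel and filter axes against every filter at once.
            A[:, :, i, j] = np.tensordot(patch, W, axes=([1, 2, 3], [1, 2, 3])) + b
    return A


def numerical_dx(X, W, b, dA, eps=1e-6):
    """Central-difference gradient of L = sum(A * dA) with respect to X."""
    X = X.astype(float)  # work on a float copy so the caller's array is untouched
    grad = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        original = X[idx]
        X[idx] = original + eps
        loss_plus = np.sum(conv_forward(X, W, b) * dA)
        X[idx] = original - eps
        loss_minus = np.sum(conv_forward(X, W, b) * dA)
        X[idx] = original
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad


# Same input, filters, and upstream gradient as the Problem 2 test.
x = np.arange(1, 17, dtype=float).reshape(1, 1, 4, 4)
w = np.array(
    [
        [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]],
        [[0.0, 0.0, 0.0], [0.0, -1.0, 1.0], [0.0, 0.0, 0.0]],
    ]
).reshape(2, 1, 3, 3)
b = np.zeros(2)
dA = np.array([[[-4, -4], [10, 11]], [[1, -7], [1, -11]]], dtype=float).reshape(1, 2, 2, 2)

dx_numeric = numerical_dx(x, w, b, dA)
print(np.round(dx_numeric, 3))
```

Because the layer is linear in its input, the central difference is exact up to floating-point rounding, so the printed array should agree with the `Gradient to previous layer (dX)` output shown above.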


**3. Output size after 2-dimensional convolution**

In [None]:
import numpy as np

class Conv2d:
    """
    A 2D convolutional layer implemented from scratch using NumPy.

    This class performs the forward and backward passes for a convolutional
    layer, following the mathematical formulas provided. It assumes a
    data format of (batch_size, channels, height, width) (NCHW).
    For simplicity, this implementation does not include support for
    padding or strides other than 1, as implied by the given formulas.
    """

    def __init__(self, in_channels, out_channels, filter_size):
        """
        Initializes the Conv2d layer with random weights and zero biases.

        Args:
            in_channels (int): The number of channels in the input array.
            out_channels (int): The number of channels in the output array (number of filters).
            filter_size (tuple): The height and width of the filter, e.g., (3, 3).
        """
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.filter_h, self.filter_w = filter_size

        # Initialize weights (W) and biases (b).
        # Weights are initialized with small random values to break symmetry.
        self.W = np.random.randn(out_channels, in_channels, self.filter_h, self.filter_w) * 0.01

        # Biases are initialized to zeros. Shape: (out_channels,)
        self.b = np.zeros(out_channels)

        # Gradients for weights and biases, to be calculated during backward pass.
        self.dW = None
        self.db = None

        # Storing the input array (X) for use in the backward pass.
        self.X = None

    def forward(self, X):
        """
        Performs the forward propagation for the convolutional layer.

        Args:
            X (np.ndarray): The input array of shape (N, C, H, W).
                            N: batch size, C: in_channels, H: input height, W: input width.

        Returns:
            np.ndarray: The output array (A) after convolution, of shape (N, M, H_out, W_out).
                        N: batch size, M: out_channels, H_out, W_out: output dimensions.
        """
        # Storing the input for the backward pass
        self.X = X

        # Getting input dimensions
        N, C, H, W = X.shape

        # Calculating output dimensions.
        N_out_h = H - self.filter_h + 1
        N_out_w = W - self.filter_w + 1

        # Initializing the output array with zeros.
        A = np.zeros((N, self.out_channels, N_out_h, N_out_w))

        # Performing the convolution using nested loops.
        for n in range(N):
            for m in range(self.out_channels):
                for i in range(N_out_h):
                    for j in range(N_out_w):
                        # Extracting the receptive field (region of the input being convolved)
                        receptive_field = self.X[n, :, i:i + self.filter_h, j:j + self.filter_w]

                        # Perform the element-wise multiplication and summation.
                        A[n, m, i, j] = np.sum(receptive_field * self.W[m]) + self.b[m]

        return A

    def backward(self, dA):
        """
        Performs the backpropagation for the convolutional layer.

        This method calculates the gradients for the weights (dW) and biases (db)
        and the error to be passed to the previous layer (dX).

        Args:
            dA (np.ndarray): The gradient from the subsequent layer, of the same
                             shape as the output of the forward pass.

        Returns:
            np.ndarray: The gradient to be passed to the previous layer, of the
                        same shape as the input of the forward pass.
        """
        # Getting dimensions of input and output arrays
        N, C_in, H_in, W_in = self.X.shape
        N_out, M_out, H_out, W_out = dA.shape

        # Initializing gradients to zeros
        self.dW = np.zeros(self.W.shape)
        self.db = np.zeros(self.b.shape)
        dX = np.zeros(self.X.shape)

        # Loop to calculate dW and db.
        for n in range(N):
            for m in range(M_out):
                for i in range(H_out):
                    for j in range(W_out):
                        # Extracting the receptive field from the original input
                        receptive_field = self.X[n, :, i:i + self.filter_h, j:j + self.filter_w]

                        # Adding the contribution of this specific output element to the gradients
                        self.dW[m] += dA[n, m, i, j] * receptive_field

                # Summing ∂L/∂a_i,j,m over all positions of an output channel gives that
                # sample's bias gradient; += accumulates it across the batch.
                self.db[m] += np.sum(dA[n, m])

        # Loop to calculate the error to be passed to the previous layer (dX).
        for n in range(N):
            for k in range(C_in):
                for i in range(H_in):
                    for j in range(W_in):
                        sum_val = 0
                        for m in range(M_out):
                            for s in range(self.filter_h):
                                for t in range(self.filter_w):
                                    # Check for valid indices
                                    if 0 <= (i - s) < H_out and 0 <= (j - t) < W_out:
                                        sum_val += dA[n, m, i - s, j - t] * self.W[m, k, s, t]
                        dX[n, k, i, j] = sum_val

        return dX

    @staticmethod
    def calculate_output_size(input_size, filter_size, padding, stride):
        """
        Calculates the output size of a 2D convolutional layer based on the
        input size, filter size, padding, and stride.

        Args:
            input_size (tuple): A tuple (input_h, input_w) representing the input size.
            filter_size (tuple): A tuple (filter_h, filter_w) for the filter size.
            padding (tuple): A tuple (padding_h, padding_w) for the padding.
            stride (tuple): A tuple (stride_h, stride_w) for the stride.

        Returns:
            tuple: A tuple (output_h, output_w) representing the output size.
        """
        input_h, input_w = input_size
        filter_h, filter_w = filter_size
        padding_h, padding_w = padding
        stride_h, stride_w = stride

        # Calculating the output height and width:
        # output = floor((input + 2 * padding - filter) / stride) + 1
        output_h = (input_h + 2 * padding_h - filter_h) // stride_h + 1
        output_w = (input_w + 2 * padding_w - filter_w) // stride_w + 1

        return output_h, output_w
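
The output-size formula, floor((input + 2 * padding - filter) / stride) + 1 per axis, is easy to sanity-check on a few cases. The block below is a standalone sketch: `conv_output_size` is a hypothetical free function mirroring `Conv2d.calculate_output_size`, with default padding and stride values that are assumptions added for convenience.

```python
def conv_output_size(input_size, filter_size, padding=(0, 0), stride=(1, 1)):
    """Return (out_h, out_w) using floor((n + 2p - f) / s) + 1 per axis."""
    return tuple(
        (n + 2 * p - f) // s + 1
        for n, f, p, s in zip(input_size, filter_size, padding, stride)
    )


# 4x4 input, 3x3 filter, no padding, stride 1 (the Problem 2 setup).
print(conv_output_size((4, 4), (3, 3)))                                   # (2, 2)
# Padding of 1 with a 3x3 filter keeps the spatial size unchanged.
print(conv_output_size((28, 28), (3, 3), padding=(1, 1)))                 # (28, 28)
# Stride 2 roughly halves each dimension.
print(conv_output_size((6, 6), (3, 3), stride=(2, 2)))                    # (2, 2)
# Height and width are computed independently.
print(conv_output_size((7, 5), (3, 3), padding=(1, 1), stride=(2, 2)))    # (4, 3)
```

Integer floor division avoids the float round-trip of `int(a / b)` and gives the same result whenever the numerator is non-negative.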


**4. Creation of maximum pooling layer**

In [None]:
import numpy as np

class Conv2d:
    """
    A 2D convolutional layer implemented from scratch using NumPy.

    This class performs the forward and backward passes for a convolutional
    layer, following the mathematical formulas provided. It assumes a
    data format of (batch_size, channels, height, width) (NCHW).
    For simplicity, this implementation does not include support for
    padding or strides other than 1, as implied by the given formulas.
    """

    def __init__(self, in_channels, out_channels, filter_size):
        """
        Initializes the Conv2d layer with random weights and zero biases.

        Args:
            in_channels (int): The number of channels in the input array.
            out_channels (int): The number of channels in the output array (number of filters).
            filter_size (tuple): The height and width of the filter, e.g., (3, 3).
        """
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.filter_h, self.filter_w = filter_size

        # Initialize weights (W) and biases (b).
        self.W = np.random.randn(out_channels, in_channels, self.filter_h, self.filter_w) * 0.01

        # Biases are initialized to zeros. Shape: (out_channels,)
        self.b = np.zeros(out_channels)

        # Gradients for weights and biases, to be calculated during backward pass.
        self.dW = None
        self.db = None

        # Storing the input array (X) for use in the backward pass.
        self.X = None

    def forward(self, X):
        """
        Performs the forward propagation for the convolutional layer.

        Args:
            X (np.ndarray): The input array of shape (N, C, H, W).
                            N: batch size, C: in_channels, H: input height, W: input width.

        Returns:
            np.ndarray: The output array (A) after convolution, of shape (N, M, H_out, W_out).
                        N: batch size, M: out_channels, H_out, W_out: output dimensions.
        """
        # Storing the input for the backward pass
        self.X = X

        # Getting input dimensions
        N, C, H, W = X.shape

        # Calculating output dimensions.
        N_out_h = H - self.filter_h + 1
        N_out_w = W - self.filter_w + 1

        # Initializing the output array with zeros.
        A = np.zeros((N, self.out_channels, N_out_h, N_out_w))

        # Performing the convolution using nested loops.
        for n in range(N):
            for m in range(self.out_channels):
                for i in range(N_out_h):
                    for j in range(N_out_w):
                        # Extracting the receptive field (region of the input being convolved)
                        receptive_field = self.X[n, :, i:i + self.filter_h, j:j + self.filter_w]

                        # Performing the element-wise multiplication and summation.
                        A[n, m, i, j] = np.sum(receptive_field * self.W[m]) + self.b[m]

        return A

    def backward(self, dA):
        """
        Performs the backpropagation for the convolutional layer.

        This method calculates the gradients for the weights (dW) and biases (db)
        and the error to be passed to the previous layer (dX).

        Args:
            dA (np.ndarray): The gradient from the subsequent layer, of the same
                             shape as the output of the forward pass.

        Returns:
            np.ndarray: The gradient to be passed to the previous layer, of the
                        same shape as the input of the forward pass.
        """
        # Getting dimensions of input and output arrays
        N, C_in, H_in, W_in = self.X.shape
        N_out, M_out, H_out, W_out = dA.shape

        # Initializing gradients to zeros
        self.dW = np.zeros(self.W.shape)
        self.db = np.zeros(self.b.shape)
        dX = np.zeros(self.X.shape)

        # Loop to calculate dW and db.
        for n in range(N):
            for m in range(M_out):
                for i in range(H_out):
                    for j in range(W_out):
                        # Extracting the receptive field from the original input
                        receptive_field = self.X[n, :, i:i + self.filter_h, j:j + self.filter_w]

                        # Adding the contribution of this specific output element to the gradients
                        self.dW[m] += dA[n, m, i, j] * receptive_field

                # Summing ∂L/∂a_i,j,m over all positions of an output channel gives that
                # sample's bias gradient; += accumulates it across the batch.
                self.db[m] += np.sum(dA[n, m])

        # Loop to calculate the error to be passed to the previous layer (dX).
        for n in range(N):
            for k in range(C_in):
                for i in range(H_in):
                    for j in range(W_in):
                        sum_val = 0
                        for m in range(M_out):
                            for s in range(self.filter_h):
                                for t in range(self.filter_w):
                                    # Checking for valid indices
                                    if 0 <= (i - s) < H_out and 0 <= (j - t) < W_out:
                                        sum_val += dA[n, m, i - s, j - t] * self.W[m, k, s, t]
                        dX[n, k, i, j] = sum_val

        return dX

    @staticmethod
    def calculate_output_size(input_size, filter_size, padding, stride):
        """
        Calculates the output size of a 2D convolutional layer based on the
        input size, filter size, padding, and stride.

        Args:
            input_size (tuple): A tuple (input_h, input_w) representing the input size.
            filter_size (tuple): A tuple (filter_h, filter_w) for the filter size.
            padding (tuple): A tuple (padding_h, padding_w) for the padding.
            stride (tuple): A tuple (stride_h, stride_w) for the stride.

        Returns:
            tuple: A tuple (output_h, output_w) representing the output size.
        """
        input_h, input_w = input_size
        filter_h, filter_w = filter_size
        padding_h, padding_w = padding
        stride_h, stride_w = stride

        # Calculating the output height and width:
        # output = floor((input + 2 * padding - filter) / stride) + 1
        output_h = (input_h + 2 * padding_h - filter_h) // stride_h + 1
        output_w = (input_w + 2 * padding_w - filter_w) // stride_w + 1

        return output_h, output_w


class MaxPool2D:
    """
    A 2D maximum pooling layer implemented from scratch using NumPy.

    This class performs downsampling by taking the maximum value in a
    specific window. It retains the indices of the maximum values for
    the backward pass.
    """
    def __init__(self, pool_size, stride):
        """
        Initializes the MaxPool2D layer.

        Args:
            pool_size (tuple): The height and width of the pooling window.
            stride (tuple): The stride for the pooling window.
        """
        self.pool_h, self.pool_w = pool_size
        self.stride_h, self.stride_w = stride

        # Storing the indices of the maximum values for the backward pass
        self.max_indices = None
        self.X = None

    def forward(self, X):
        """
        Performs the forward propagation for the max pooling layer.

        Args:
            X (np.ndarray): The input array of shape (N, C, H, W).

        Returns:
            np.ndarray: The output array after pooling.
        """
        self.X = X
        N, C, H, W = X.shape

        # Calculating output dimensions
        output_h = int((H - self.pool_h) / self.stride_h) + 1
        output_w = int((W - self.pool_w) / self.stride_w) + 1

        # Initializing output array and a mask to store the max indices
        A = np.zeros((N, C, output_h, output_w))
        self.max_indices = np.zeros_like(X, dtype=bool)

        for n in range(N):
            for c in range(C):
                for i in range(output_h):
                    for j in range(output_w):
                        # Defining the pooling region
                        h_start = i * self.stride_h
                        w_start = j * self.stride_w
                        h_end = h_start + self.pool_h
                        w_end = w_start + self.pool_w

                        region = self.X[n, c, h_start:h_end, w_start:w_end]

                        # Finding the maximum value in the region and its index
                        max_val = np.max(region)
                        max_val_idx = np.argmax(region)

                        # Storing the maximum value in the output array
                        A[n, c, i, j] = max_val

                        # Updating the mask with the correct indices
                        h_idx = h_start + max_val_idx // self.pool_w
                        w_idx = w_start + max_val_idx % self.pool_w

                        self.max_indices[n, c, h_idx, w_idx] = True

        return A

    def backward(self, dA):
        """
        Performs the backpropagation for the max pooling layer.

        The error is passed only to the neuron that had the maximum
        activation during the forward pass.

        Args:
            dA (np.ndarray): The gradient from the subsequent layer, of the
                             same shape as the output of the forward pass.

        Returns:
            np.ndarray: The gradient to be passed to the previous layer.
        """
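        # Initializing the gradient to the previous layer with zeros.
        # Using dA's dtype avoids truncating float gradients when X holds integers.
        dX = np.zeros(self.X.shape, dtype=dA.dtype)
        N, C, H_out, W_out = dA.shape

        # Routing each gradient to the argmax of its own window. Recomputing the
        # argmax per window keeps dA[n, c, i, j] paired with the right input
        # position, including when windows overlap or maxima sit on different rows.
        for n in range(N):
            for c in range(C):
                for i in range(H_out):
                    for j in range(W_out):
                        h_start = i * self.stride_h
                        w_start = j * self.stride_w
                        region = self.X[n, c, h_start:h_start + self.pool_h,
                                        w_start:w_start + self.pool_w]
                        h_off, w_off = np.unravel_index(np.argmax(region), region.shape)
                        dX[n, c, h_start + h_off, w_start + w_off] += dA[n, c, i, j]

        return dX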


# Problem 2: Testing with small arrays
print("--- Problem 2: Testing Conv2d Forward and Backward Propagation ---")

# Input data for the forward pass
# Shape: (batch_size=1, in_channels=1, height=4, width=4)
x = np.array([[[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]]])

# Weights (filters) for the convolutional layer
# Shape: (out_channels=2, in_channels=1, filter_h=3, filter_w=3)
w = np.array(
    [
        [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]],
        [[0.0, 0.0, 0.0], [0.0, -1.0, 1.0], [0.0, 0.0, 0.0]],
    ]
)

# Biases (set to zeros for this test)
b = np.zeros(2)

# Creating an instance of the Conv2d layer
conv_layer = Conv2d(in_channels=1, out_channels=2, filter_size=(3, 3))

# Manually setting the weights and biases to the specified values
conv_layer.W = w.reshape(2, 1, 3, 3)
conv_layer.b = b

# Forward Propagation
print("\n--- Forward Propagation ---")
output_a = conv_layer.forward(x)
print(f"Output (A):\n{output_a}")

# Expected output from the problem description
expected_output_a = np.array([[[-4, -4], [-4, -4]], [[1, 1], [1, 1]]])
print(f"\nExpected Output:\n{expected_output_a}")

# Checking if the calculated output matches the expected output
print(f"\nForward pass matches expected output: {np.allclose(output_a, expected_output_a)}")


# Backward Propagation
print("\n--- Backward Propagation ---")

# Error from the subsequent layer (dA)
# Shape after reshape: (batch_size=1, out_channels=2, H_out=2, W_out=2)
delta = np.array([[[-4, -4], [10, 11]], [[1, -7], [1, -11]]])
da_to_pass = delta.reshape(1, 2, 2, 2)

# Passing the gradient to the backward method
dx_to_pass = conv_layer.backward(da_to_pass)

print(f"Gradient to previous layer (dX):\n{dx_to_pass}")

# Manually calculating the expected dX
expected_dx = np.array([[[-5, 4, 0, 0], [13, 27, -4, 0], [1, 1, 10, 0], [0, 0, 0, 0]]])
print(f"\nExpected dX (based on manual calculation):\n{expected_dx}")

# Checking if the calculated dX matches the manually calculated dX
print(f"\nBackward pass matches expected dX: {np.allclose(dx_to_pass, expected_dx)}")


# Problem 4: Testing MaxPool2D
print("\n\n--- Problem 4: Testing MaxPool2D Layer ---")

# Creating a sample input for pooling
# Shape: (batch_size=1, channels=1, height=4, width=4)
pool_x = np.array([[[[1, 2, 3, 4],
                     [5, 6, 7, 8],
                     [9, 10, 11, 12],
                     [13, 14, 15, 16]]]])

# Defining pooling parameters
pool_size = (2, 2)
stride = (2, 2)

# Creating a MaxPool2D instance
max_pool_layer = MaxPool2D(pool_size=pool_size, stride=stride)

# Forward Pass for Pooling
print("\n--- MaxPool2D Forward Pass ---")
pooled_output = max_pool_layer.forward(pool_x)
print(f"Input for pooling:\n{pool_x[0, 0]}")
print(f"\nPooled Output:\n{pooled_output[0, 0]}")

# Expected pooled output
expected_pooled_output = np.array([[[6, 8], [14, 16]]])
print(f"\nExpected Pooled Output:\n{expected_pooled_output[0, 0]}")
print(f"\nForward pass matches expected output: {np.allclose(pooled_output, expected_pooled_output)}")


# Backward Pass for Pooling
print("\n--- MaxPool2D Backward Pass ---")
# Creating a sample gradient to pass back
# Shape: (1, 1, 2, 2)
pooled_grad = np.array([[[[1, 2], [3, 4]]]])
grad_to_pass = max_pool_layer.backward(pooled_grad)
print(f"Gradient passed to previous layer (dX):\n{grad_to_pass[0, 0]}")

# Manually calculating the expected gradient
expected_grad = np.array([[[[0, 0, 0, 0],
                           [0, 1, 0, 2],
                           [0, 0, 0, 0],
                           [0, 3, 0, 4]]]])
print(f"\nExpected dX:\n{expected_grad[0, 0]}")

# Checking if the calculated dX matches the expected dX
print(f"\nBackward pass matches expected dX: {np.allclose(grad_to_pass, expected_grad)}")

--- Problem 2: Testing Conv2d Forward and Backward Propagation ---

--- Forward Propagation ---
Output (A):
[[[[-4. -4.]
   [-4. -4.]]

  [[ 1.  1.]
   [ 1.  1.]]]]

Expected Output:
[[[-4 -4]
  [-4 -4]]

 [[ 1  1]
  [ 1  1]]]

Forward pass matches expected output: True

--- Backward Propagation ---
Gradient to previous layer (dX):
[[[[  0.   0.   0.   0.]
   [  0.  -5.   4.  -7.]
   [  0.  13.  27. -11.]
   [  0. -10. -11.   0.]]]]

Expected dX (based on manual calculation):
[[[-5  4  0  0]
  [13 27 -4  0]
  [ 1  1 10  0]
  [ 0  0  0  0]]]

Backward pass matches expected dX: False


--- Problem 4: Testing MaxPool2D Layer ---

--- MaxPool2D Forward Pass ---
Input for pooling:
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]]

Pooled Output:
[[ 6.  8.]
 [14. 16.]]

Expected Pooled Output:
[6 8]

Forward pass matches expected output: True

--- MaxPool2D Backward Pass ---
Gradient passed to previous layer (dX):
[[0 0 0 0]
 [0 1 0 2]
 [0 0 0 0]
 [0 3 0 4]]

Expected dX:
[[0 0 0 
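
The Conv2d backward check above prints False, although the computed dX contains the values -5, 4, 13 and 27 that also open `expected_dx`. One way to test the computed gradient independently is a finite-difference check. The sketch below is not part of the original notebook: `conv_forward` and `numerical_dx` are hypothetical helpers that re-implement the stride-1, no-padding forward pass and numerically differentiate the scalar L = sum(dA * A), using the same `x`, `w` and `delta` as the test.

```python
import numpy as np

def conv_forward(X, W, b):
    """Stride-1, no-padding cross-correlation in NCHW layout (hypothetical helper)."""
    N, C, H, Wd = X.shape
    M, _, fh, fw = W.shape
    out = np.zeros((N, M, H - fh + 1, Wd - fw + 1))
    for i in range(out.shape[2]):
        for j in range(out.shape[3]):
            patch = X[:, :, i:i + fh, j:j + fw]  # shape (N, C, fh, fw)
            # Contract the (C, fh, fw) axes of patch and W, giving shape (N, M)
            out[:, :, i, j] = np.tensordot(patch, W, axes=([1, 2, 3], [1, 2, 3])) + b
    return out

def numerical_dx(X, W, b, dA, eps=1e-4):
    """Central-difference gradient of L = sum(dA * A) with respect to X."""
    X = X.astype(np.float64)  # astype returns a copy, so the caller's array is untouched
    dX = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        orig = X[idx]
        X[idx] = orig + eps
        plus = np.sum(dA * conv_forward(X, W, b))
        X[idx] = orig - eps
        minus = np.sum(dA * conv_forward(X, W, b))
        X[idx] = orig
        dX[idx] = (plus - minus) / (2 * eps)
    return dX

# Same values as the Problem 2 test
x = np.arange(1, 17, dtype=np.float64).reshape(1, 1, 4, 4)
w = np.array([
    [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.0, -1.0, 1.0], [0.0, 0.0, 0.0]],
]).reshape(2, 1, 3, 3)
b = np.zeros(2)
delta = np.array([[[-4, -4], [10, 11]], [[1, -7], [1, -11]]],
                 dtype=np.float64).reshape(1, 2, 2, 2)

num_dx = numerical_dx(x, w, b, delta)
print(np.round(num_dx[0, 0], 6))
```

Because the layer is linear in X, the finite-difference result should agree with the analytic dX up to floating-point error. If it matches the "Gradient to previous layer (dX)" printed above, the mismatch more likely comes from the layout of `expected_dx` than from `backward`.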

**5. Creating average pooling**

In [None]:
import numpy as np

class Conv2d:
    """
    A 2D convolutional layer implemented from scratch using NumPy.

    This class performs the forward and backward passes for a convolutional
    layer, following the mathematical formulas provided. It assumes a
    data format of (batch_size, channels, height, width) (NCHW).
    For simplicity, this implementation does not include support for
    padding or strides other than 1, as implied by the given formulas.
    """

    def __init__(self, in_channels, out_channels, filter_size):
        """
        Initializes the Conv2d layer with random weights and zero biases.

        Args:
            in_channels (int): The number of channels in the input array.
            out_channels (int): The number of channels in the output array (number of filters).
            filter_size (tuple): The height and width of the filter, e.g., (3, 3).
        """
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.filter_h, self.filter_w = filter_size

        # Initialize weights (W) and biases (b).
        self.W = np.random.randn(out_channels, in_channels, self.filter_h, self.filter_w) * 0.01

        # Biases are initialized to zeros. Shape: (out_channels,)
        self.b = np.zeros(out_channels)

        # Gradients for weights and biases, to be calculated during backward pass.
        self.dW = None
        self.db = None

        # Storing the input array (X) for use in the backward pass.
        self.X = None

    def forward(self, X):
        """
        Performs the forward propagation for the convolutional layer.

        Args:
            X (np.ndarray): The input array of shape (N, C, H, W).
                            N: batch size, C: in_channels, H: input height, W: input width.

        Returns:
            np.ndarray: The output array (A) after convolution, of shape (N, M, H_out, W_out).
                        N: batch size, M: out_channels, H_out, W_out: output dimensions.
        """
        # Storing the input for the backward pass
        self.X = X

        # Getting input dimensions
        N, C, H, W = X.shape

        # Calculating output dimensions.
        N_out_h = H - self.filter_h + 1
        N_out_w = W - self.filter_w + 1

        # Initializing the output array with zeros.
        A = np.zeros((N, self.out_channels, N_out_h, N_out_w))

        # Performing the convolution using nested loops.
        for n in range(N):
            for m in range(self.out_channels):
                for i in range(N_out_h):
                    for j in range(N_out_w):
                        # Extracting the receptive field (region of the input being convolved)
                        receptive_field = self.X[n, :, i:i + self.filter_h, j:j + self.filter_w]

                        # Perform the element-wise multiplication and summation.
                        A[n, m, i, j] = np.sum(receptive_field * self.W[m]) + self.b[m]

        return A

    def backward(self, dA):
        """
        Performs the backpropagation for the convolutional layer.

        This method calculates the gradients for the weights (dW) and biases (db)
        and the error to be passed to the previous layer (dX).

        Args:
            dA (np.ndarray): The gradient from the subsequent layer, of the same
                             shape as the output of the forward pass.

        Returns:
            np.ndarray: The gradient to be passed to the previous layer, of the
                        same shape as the input of the forward pass.
        """
        # Getting dimensions of input and output arrays
        N, C_in, H_in, W_in = self.X.shape
        N_out, M_out, H_out, W_out = dA.shape

        # Initializing gradients to zeros
        self.dW = np.zeros(self.W.shape)
        self.db = np.zeros(self.b.shape)
        dX = np.zeros(self.X.shape)

        # Loop to calculate dW and db using the provided formulas.
        for n in range(N):
            for m in range(M_out):
                for i in range(H_out):
                    for j in range(W_out):
                        # Extracting the receptive field from the original input
                        receptive_field = self.X[n, :, i:i + self.filter_h, j:j + self.filter_w]

                        # Add the contribution of this specific output element to the gradients
                        self.dW[m] += dA[n, m, i, j] * receptive_field

                # The sum of all ∂L/∂a_i,j,m for a given output channel gives the bias gradient;
                # accumulating with += keeps the contribution of every sample in the batch.
                self.db[m] += np.sum(dA[n, m])

        # Loop to calculate the error to be passed to the previous layer (dX).
        for n in range(N):
            for k in range(C_in):
                for i in range(H_in):
                    for j in range(W_in):
                        sum_val = 0
                        for m in range(M_out):
                            for s in range(self.filter_h):
                                for t in range(self.filter_w):
                                    # Check for valid indices
                                    if 0 <= (i - s) < H_out and 0 <= (j - t) < W_out:
                                        sum_val += dA[n, m, i - s, j - t] * self.W[m, k, s, t]
                        dX[n, k, i, j] = sum_val

        return dX

    @staticmethod
    def calculate_output_size(input_size, filter_size, padding, stride):
        """
        Calculates the output size of a 2D convolutional layer based on the
        input size, filter size, padding, and stride.

        Args:
            input_size (tuple): A tuple (input_h, input_w) representing the input size.
            filter_size (tuple): A tuple (filter_h, filter_w) for the filter size.
            padding (tuple): A tuple (padding_h, padding_w) for the padding.
            stride (tuple): A tuple (stride_h, stride_w) for the stride.

        Returns:
            tuple: A tuple (output_h, output_w) representing the output size.
        """
        input_h, input_w = input_size
        filter_h, filter_w = filter_size
        padding_h, padding_w = padding
        stride_h, stride_w = stride

        # Calculating the output height and width.
        output_h = int((input_h + 2 * padding_h - filter_h) / stride_h) + 1
        output_w = int((input_w + 2 * padding_w - filter_w) / stride_w) + 1

        return output_h, output_w


class MaxPool2D:
    """
    A 2D maximum pooling layer implemented from scratch using NumPy.

    This class performs downsampling by taking the maximum value in a
    specific window. It retains the indices of the maximum values for
    the backward pass.
    """
    def __init__(self, pool_size, stride):
        """
        Initializes the MaxPool2D layer.

        Args:
            pool_size (tuple): The height and width of the pooling window.
            stride (tuple): The stride for the pooling window.
        """
        self.pool_h, self.pool_w = pool_size
        self.stride_h, self.stride_w = stride

        # Storing the indices of the maximum values for the backward pass
        self.max_indices = None
        self.X = None

    def forward(self, X):
        """
        Performs the forward propagation for the max pooling layer.

        Args:
            X (np.ndarray): The input array of shape (N, C, H, W).

        Returns:
            np.ndarray: The output array after pooling.
        """
        self.X = X
        N, C, H, W = X.shape

        # Calculating output dimensions
        output_h = int((H - self.pool_h) / self.stride_h) + 1
        output_w = int((W - self.pool_w) / self.stride_w) + 1

        # Initializing output array and a mask to store the max indices
        A = np.zeros((N, C, output_h, output_w))
        self.max_indices = np.zeros_like(X, dtype=bool)

        for n in range(N):
            for c in range(C):
                for i in range(output_h):
                    for j in range(output_w):
                        # Defining the pooling region
                        h_start = i * self.stride_h
                        w_start = j * self.stride_w
                        h_end = h_start + self.pool_h
                        w_end = w_start + self.pool_w

                        region = self.X[n, c, h_start:h_end, w_start:w_end]

                        # Finding the maximum value in the region and its index
                        max_val = np.max(region)
                        max_val_idx = np.argmax(region)

                        # Storing the maximum value in the output array
                        A[n, c, i, j] = max_val

                        # Updating the mask with the correct indices
                        h_idx = h_start + max_val_idx // self.pool_w
                        w_idx = w_start + max_val_idx % self.pool_w

                        self.max_indices[n, c, h_idx, w_idx] = True

        return A

    def backward(self, dA):
        """
        Performs the backpropagation for the max pooling layer.

        The error is passed only to the neuron that had the maximum
        activation during the forward pass.

        Args:
            dA (np.ndarray): The gradient from the subsequent layer, of the
                             same shape as the output of the forward pass.

        Returns:
            np.ndarray: The gradient to be passed to the previous layer.
        """
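        # Initializing the gradient to the previous layer with zeros.
        # Using dA's dtype avoids truncating float gradients when X holds integers.
        dX = np.zeros(self.X.shape, dtype=dA.dtype)
        N, C, H_out, W_out = dA.shape

        # Routing each gradient to the argmax of its own window. Recomputing the
        # argmax per window keeps dA[n, c, i, j] paired with the right input
        # position, including when windows overlap or maxima sit on different rows.
        for n in range(N):
            for c in range(C):
                for i in range(H_out):
                    for j in range(W_out):
                        h_start = i * self.stride_h
                        w_start = j * self.stride_w
                        region = self.X[n, c, h_start:h_start + self.pool_h,
                                        w_start:w_start + self.pool_w]
                        h_off, w_off = np.unravel_index(np.argmax(region), region.shape)
                        dX[n, c, h_start + h_off, w_start + w_off] += dA[n, c, i, j]

        return dX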

class AveragePool2D:
    """
    A 2D average pooling layer implemented from scratch using NumPy.

    This class performs downsampling by taking the average value in a
    specific window.
    """
    def __init__(self, pool_size, stride):
        """
        Initializes the AveragePool2D layer.

        Args:
            pool_size (tuple): The height and width of the pooling window.
            stride (tuple): The stride for the pooling window.
        """
        self.pool_h, self.pool_w = pool_size
        self.stride_h, self.stride_w = stride

        # Storing the input for the backward pass
        self.X = None

    def forward(self, X):
        """
        Performs the forward propagation for the average pooling layer.

        Args:
            X (np.ndarray): The input array of shape (N, C, H, W).

        Returns:
            np.ndarray: The output array after pooling.
        """
        self.X = X
        N, C, H, W = X.shape

        # Calculating output dimensions
        output_h = int((H - self.pool_h) / self.stride_h) + 1
        output_w = int((W - self.pool_w) / self.stride_w) + 1

        # Initializing output array
        A = np.zeros((N, C, output_h, output_w))

        for n in range(N):
            for c in range(C):
                for i in range(output_h):
                    for j in range(output_w):
                        # Defining the pooling region
                        h_start = i * self.stride_h
                        w_start = j * self.stride_w
                        h_end = h_start + self.pool_h
                        w_end = w_start + self.pool_w

                        region = self.X[n, c, h_start:h_end, w_start:w_end]

                        # Calculating the average of the region
                        avg_val = np.mean(region)

                        # Storing the average value in the output array
                        A[n, c, i, j] = avg_val

        return A

    def backward(self, dA):
        """
        Performs the backpropagation for the average pooling layer.

        The error is distributed equally to all neurons in the pooling
        region, since the forward pass is a sum divided by the number of elements.

        Args:
            dA (np.ndarray): The gradient from the subsequent layer.

        Returns:
            np.ndarray: The gradient to be passed to the previous layer.
        """
        # Initializing the gradient to the previous layer with zeros, ensuring it's a float type
        dX = np.zeros(self.X.shape, dtype=np.float64)
        N, C, H_out, W_out = dA.shape

        # Calculating the number of elements in the pooling window
        pool_size = self.pool_h * self.pool_w

        for n in range(N):
            for c in range(C):
                for i in range(H_out):
                    for j in range(W_out):
                        distributed_grad = dA[n, c, i, j] / pool_size

                        # Defining the region in the original input
                        h_start = i * self.stride_h
                        w_start = j * self.stride_w
                        h_end = h_start + self.pool_h
                        w_end = w_start + self.pool_w

                        # Adding the distributed gradient to the corresponding region in dX
                        dX[n, c, h_start:h_end, w_start:w_end] += distributed_grad

        return dX


# Problem 2: Testing with small arrays
print("--- Problem 2: Testing Conv2d Forward and Backward Propagation ---")

# Input data for the forward pass
# Shape: (batch_size=1, in_channels=1, height=4, width=4)
x = np.array([[[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]]])

# Weights (filters) for the convolutional layer
# Shape: (out_channels=2, in_channels=1, filter_h=3, filter_w=3)
w = np.array(
    [
        [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]],
        [[0.0, 0.0, 0.0], [0.0, -1.0, 1.0], [0.0, 0.0, 0.0]],
    ]
)

# Biases (set to zeros for this test)
b = np.zeros(2)

# Creating an instance of the Conv2d layer
conv_layer = Conv2d(in_channels=1, out_channels=2, filter_size=(3, 3))

# Manually setting the weights and biases to the specified values
conv_layer.W = w.reshape(2, 1, 3, 3)
conv_layer.b = b

# Forward Propagation
print("\n--- Forward Propagation ---")
output_a = conv_layer.forward(x)
print(f"Output (A):\n{output_a}")

# Expected output from the problem description
expected_output_a = np.array([[[-4, -4], [-4, -4]], [[1, 1], [1, 1]]])
print(f"\nExpected Output:\n{expected_output_a}")

# Checking if the calculated output matches the expected output
print(f"\nForward pass matches expected output: {np.allclose(output_a, expected_output_a)}")


# Backward Propagation
print("\n--- Backward Propagation ---")

# Error from the subsequent layer (dA)
# Shape after reshape: (batch_size=1, out_channels=2, H_out=2, W_out=2)
delta = np.array([[[-4, -4], [10, 11]], [[1, -7], [1, -11]]])
da_to_pass = delta.reshape(1, 2, 2, 2)

# Passing the gradient to the backward method
dx_to_pass = conv_layer.backward(da_to_pass)

print(f"Gradient to previous layer (dX):\n{dx_to_pass}")

# Manually calculating the expected dX
expected_dx = np.array([[[-5, 4, 0, 0], [13, 27, -4, 0], [1, 1, 10, 0], [0, 0, 0, 0]]])
print(f"\nExpected dX (based on manual calculation):\n{expected_dx}")

# Checking if the calculated dX matches the manually calculated dX
print(f"\nBackward pass matches expected dX: {np.allclose(dx_to_pass, expected_dx)}")


# Problem 4: Testing MaxPool2D
print("\n\n--- Problem 4: Testing MaxPool2D Layer ---")

# Creating a sample input for pooling
# Shape: (batch_size=1, channels=1, height=4, width=4)
pool_x = np.array([[[[1, 2, 3, 4],
                     [5, 6, 7, 8],
                     [9, 10, 11, 12],
                     [13, 14, 15, 16]]]])

# Defining pooling parameters
pool_size = (2, 2)
stride = (2, 2)

# Creating a MaxPool2D instance
max_pool_layer = MaxPool2D(pool_size=pool_size, stride=stride)

# Forward Pass for Pooling
print("\n--- MaxPool2D Forward Pass ---")
pooled_output = max_pool_layer.forward(pool_x)
print(f"Input for pooling:\n{pool_x[0, 0]}")
print(f"\nPooled Output:\n{pooled_output[0, 0]}")

# Expected pooled output
expected_pooled_output = np.array([[[6, 8], [14, 16]]])
print(f"\nExpected Pooled Output:\n{expected_pooled_output[0, 0]}")
print(f"\nForward pass matches expected output: {np.allclose(pooled_output, expected_pooled_output)}")


# Backward Pass for Pooling
print("\n--- MaxPool2D Backward Pass ---")
# Creating a sample gradient to pass back
# Shape: (1, 1, 2, 2)
pooled_grad = np.array([[[[1, 2], [3, 4]]]])
grad_to_pass = max_pool_layer.backward(pooled_grad)
print(f"Gradient passed to previous layer (dX):\n{grad_to_pass[0, 0]}")

# Manually calculating the expected gradient
expected_grad = np.array([[[[0, 0, 0, 0],
                           [0, 1, 0, 2],
                           [0, 0, 0, 0],
                           [0, 3, 0, 4]]]])
print(f"\nExpected dX:\n{expected_grad[0, 0]}")

# Checking if the calculated dX matches the expected dX
print(f"\nBackward pass matches expected dX: {np.allclose(grad_to_pass, expected_grad)}")


# Problem 5: Testing AveragePool2D
print("\n\n--- Problem 5: Testing AveragePool2D Layer ---")

# Creating a sample input for average pooling
# Shape: (batch_size=1, channels=1, height=4, width=4)
avg_pool_x = np.array([[[[1, 2, 3, 4],
                        [5, 6, 7, 8],
                        [9, 10, 11, 12],
                        [13, 14, 15, 16]]]])

# Defining pooling parameters
avg_pool_size = (2, 2)
avg_stride = (2, 2)

# Creating an AveragePool2D instance
avg_pool_layer = AveragePool2D(pool_size=avg_pool_size, stride=avg_stride)

# Forward Pass for Average Pooling
print("\n--- AveragePool2D Forward Pass ---")
avg_pooled_output = avg_pool_layer.forward(avg_pool_x)
print(f"Input for pooling:\n{avg_pool_x[0, 0]}")
print(f"\nAverage Pooled Output:\n{avg_pooled_output[0, 0]}")

# Expected average pooled output, shaped (1, 1, 2, 2) like the layer output
expected_avg_output = np.array([[[[3.5, 5.5], [11.5, 13.5]]]])
print(f"\nExpected Average Pooled Output:\n{expected_avg_output[0, 0]}")
print(f"\nForward pass matches expected output: {np.allclose(avg_pooled_output, expected_avg_output)}")


# Backward Pass for Average Pooling
print("\n--- AveragePool2D Backward Pass ---")
# Creating a sample gradient to pass back
# Shape: (1, 1, 2, 2)
avg_pooled_grad = np.array([[[[1, 2], [3, 4]]]])
avg_grad_to_pass = avg_pool_layer.backward(avg_pooled_grad)
print(f"Gradient passed to previous layer (dX):\n{avg_grad_to_pass[0, 0]}")

# Manually calculating the expected gradient
expected_avg_grad = np.array([[[[0.25, 0.25, 0.5, 0.5],
                              [0.25, 0.25, 0.5, 0.5],
                              [0.75, 0.75, 1.0, 1.0],
                              [0.75, 0.75, 1.0, 1.0]]]])
print(f"\nExpected dX:\n{expected_avg_grad[0, 0]}")

# Checking if the calculated dX matches the expected dX
print(f"\nBackward pass matches expected dX: {np.allclose(avg_grad_to_pass, expected_avg_grad)}")

--- Problem 2: Testing Conv2d Forward and Backward Propagation ---

--- Forward Propagation ---
Output (A):
[[[[-4. -4.]
   [-4. -4.]]

  [[ 1.  1.]
   [ 1.  1.]]]]

Expected Output:
[[[-4 -4]
  [-4 -4]]

 [[ 1  1]
  [ 1  1]]]

Forward pass matches expected output: True

--- Backward Propagation ---
Gradient to previous layer (dX):
[[[[  0.   0.   0.   0.]
   [  0.  -5.   4.  -7.]
   [  0.  13.  27. -11.]
   [  0. -10. -11.   0.]]]]

Expected dX (based on manual calculation):
[[[-5  4  0  0]
  [13 27 -4  0]
  [ 1  1 10  0]
  [ 0  0  0  0]]]

Backward pass matches expected dX: False


--- Problem 4: Testing MaxPool2D Layer ---

--- MaxPool2D Forward Pass ---
Input for pooling:
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]]

Pooled Output:
[[ 6.  8.]
 [14. 16.]]

Expected Pooled Output:
[6 8]

Forward pass matches expected output: True

--- MaxPool2D Backward Pass ---
Gradient passed to previous layer (dX):
[[0 0 0 0]
 [0 1 0 2]
 [0 0 0 0]
 [0 3 0 4]]

Expected dX:
[[0 0 0 
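
The `AveragePool2D` loops in this section can be cross-checked against a loop-free version when the windows do not overlap. The sketch below is an illustration, not the notebook's implementation: `avg_pool_forward` and `avg_pool_backward` are hypothetical helpers, and they assume the stride equals the pool size and that the input height and width are divisible by it.

```python
import numpy as np

def avg_pool_forward(X, p_h, p_w):
    """Average pooling for non-overlapping windows via reshape (assumes stride == pool size)."""
    N, C, H, W = X.shape
    assert H % p_h == 0 and W % p_w == 0, "input must tile evenly into windows"
    # Split each spatial axis into (number of windows, window size)
    blocks = X.reshape(N, C, H // p_h, p_h, W // p_w, p_w)
    # Average over the two within-window axes
    return blocks.mean(axis=(3, 5))

def avg_pool_backward(dA, p_h, p_w):
    """Spread each output gradient equally over its p_h x p_w window."""
    spread = dA / (p_h * p_w)
    # Repeat every value p_h times along height and p_w times along width
    return spread.repeat(p_h, axis=2).repeat(p_w, axis=3)

# Same input and upstream gradient as the Problem 5 test
x = np.arange(1, 17).reshape(1, 1, 4, 4)
out = avg_pool_forward(x, 2, 2)
grad = avg_pool_backward(np.array([[[[1, 2], [3, 4]]]]), 2, 2)
print(out[0, 0])
print(grad[0, 0])
```

For overlapping windows (stride smaller than the pool size) the reshape trick no longer applies, and the accumulating loop in `AveragePool2D.backward` is still needed.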

**6. Smoothing**

In [None]:
import numpy as np

class Conv2d:
    """
    A 2D convolutional layer implemented from scratch using NumPy.

    This class performs the forward and backward passes for a convolutional
    layer, following the mathematical formulas provided. It assumes a
    data format of (batch_size, channels, height, width) (NCHW).
    For simplicity, this implementation does not include support for
    padding or strides other than 1, as implied by the given formulas.
    """

    def __init__(self, in_channels, out_channels, filter_size):
        """
        Initializes the Conv2d layer with random weights and zero biases.

        Args:
            in_channels (int): The number of channels in the input array.
            out_channels (int): The number of channels in the output array (number of filters).
            filter_size (tuple): The height and width of the filter, e.g., (3, 3).
        """
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.filter_h, self.filter_w = filter_size

        # Initializing weights (W) and biases (b).
        self.W = np.random.randn(out_channels, in_channels, self.filter_h, self.filter_w) * 0.01

        # Biases are initialized to zeros. Shape: (out_channels,)
        self.b = np.zeros(out_channels)

        # Gradients for weights and biases, to be calculated during backward pass.
        self.dW = None
        self.db = None

        # Storing the input array (X) for use in the backward pass.
        self.X = None

    def forward(self, X):
        """
        Performs the forward propagation for the convolutional layer.

        Args:
            X (np.ndarray): The input array of shape (N, C, H, W).
                            N: batch size, C: in_channels, H: input height, W: input width.

        Returns:
            np.ndarray: The output array (A) after convolution, of shape (N, M, H_out, W_out).
                        N: batch size, M: out_channels, H_out, W_out: output dimensions.
        """
        # Storing the input for the backward pass
        self.X = X

        # Getting input dimensions
        N, C, H, W = X.shape

        # Calculating output dimensions.
        N_out_h = H - self.filter_h + 1
        N_out_w = W - self.filter_w + 1

        # Initializing the output array with zeros.
        A = np.zeros((N, self.out_channels, N_out_h, N_out_w))

        # Performing the convolution using nested loops.
        for n in range(N):
            for m in range(self.out_channels):
                for i in range(N_out_h):
                    for j in range(N_out_w):
                        # Extracting the receptive field (region of the input being convolved)
                        receptive_field = self.X[n, :, i:i + self.filter_h, j:j + self.filter_w]

                        # Performing the element-wise multiplication and summation.
                        A[n, m, i, j] = np.sum(receptive_field * self.W[m]) + self.b[m]

        return A

    def backward(self, dA):
        """
        Performs the backpropagation for the convolutional layer.

        This method calculates the gradients for the weights (dW) and biases (db)
        and the error to be passed to the previous layer (dX).

        Args:
            dA (np.ndarray): The gradient from the subsequent layer, of the same
                             shape as the output of the forward pass.

        Returns:
            np.ndarray: The gradient to be passed to the previous layer, of the
                        same shape as the input of the forward pass.
        """
        # Getting dimensions of input and output arrays
        N, C_in, H_in, W_in = self.X.shape
        N_out, M_out, H_out, W_out = dA.shape

        # Initializing gradients to zeros
        self.dW = np.zeros(self.W.shape)
        self.db = np.zeros(self.b.shape)
        dX = np.zeros(self.X.shape)

        # Loop to calculate dW and db using the provided formulas.
        for n in range(N):
            for m in range(M_out):
                for i in range(H_out):
                    for j in range(W_out):
                        # Extracting the receptive field from the original input
                        receptive_field = self.X[n, :, i:i + self.filter_h, j:j + self.filter_w]

                        # Adding the contribution of this specific output element to the gradients
                        self.dW[m] += dA[n, m, i, j] * receptive_field

                # The sum of all ∂L/∂a_i,j,m for a given output channel gives the bias gradient;
                # accumulating with += keeps the contribution of every sample in the batch.
                self.db[m] += np.sum(dA[n, m])

        # Loop to calculate the error to be passed to the previous layer (dX).
        for n in range(N):
            for k in range(C_in):
                for i in range(H_in):
                    for j in range(W_in):
                        sum_val = 0
                        for m in range(M_out):
                            for s in range(self.filter_h):
                                for t in range(self.filter_w):
                                    # Checking for valid indices
                                    if 0 <= (i - s) < H_out and 0 <= (j - t) < W_out:
                                        sum_val += dA[n, m, i - s, j - t] * self.W[m, k, s, t]
                        dX[n, k, i, j] = sum_val

        return dX

    @staticmethod
    def calculate_output_size(input_size, filter_size, padding, stride):
        """
        Calculates the output size of a 2D convolutional layer based on the
        input size, filter size, padding, and stride.

        Args:
            input_size (tuple): A tuple (input_h, input_w) representing the input size.
            filter_size (tuple): A tuple (filter_h, filter_w) for the filter size.
            padding (tuple): A tuple (padding_h, padding_w) for the padding.
            stride (tuple): A tuple (stride_h, stride_w) for the stride.

        Returns:
            tuple: A tuple (output_h, output_w) representing the output size.
        """
        input_h, input_w = input_size
        filter_h, filter_w = filter_size
        padding_h, padding_w = padding
        stride_h, stride_w = stride

        # Calculating the output height and width using the provided formulas.
        output_h = int((input_h + 2 * padding_h - filter_h) / stride_h) + 1
        output_w = int((input_w + 2 * padding_w - filter_w) / stride_w) + 1

        return output_h, output_w


class MaxPool2D:
    """
    A 2D maximum pooling layer implemented from scratch using NumPy.

    This class performs downsampling by taking the maximum value in a
    specific window. It retains the indices of the maximum values for
    the backward pass.
    """
    def __init__(self, pool_size, stride):
        """
        Initializes the MaxPool2D layer.

        Args:
            pool_size (tuple): The height and width of the pooling window.
            stride (tuple): The stride for the pooling window.
        """
        self.pool_h, self.pool_w = pool_size
        self.stride_h, self.stride_w = stride

        # Storing the indices of the maximum values for the backward pass
        self.max_indices = None
        self.X = None

    def forward(self, X):
        """
        Performs the forward propagation for the max pooling layer.

        Args:
            X (np.ndarray): The input array of shape (N, C, H, W).

        Returns:
            np.ndarray: The output array after pooling.
        """
        self.X = X
        N, C, H, W = X.shape

        # Calculating output dimensions
        output_h = int((H - self.pool_h) / self.stride_h) + 1
        output_w = int((W - self.pool_w) / self.stride_w) + 1

        # Initializing output array and a mask to store the max indices
        A = np.zeros((N, C, output_h, output_w))
        self.max_indices = np.zeros_like(X, dtype=bool)

        for n in range(N):
            for c in range(C):
                for i in range(output_h):
                    for j in range(output_w):
                        # Defining the pooling region
                        h_start = i * self.stride_h
                        w_start = j * self.stride_w
                        h_end = h_start + self.pool_h
                        w_end = w_start + self.pool_w

                        region = self.X[n, c, h_start:h_end, w_start:w_end]

                        # Finding the maximum value in the region and its index
                        max_val = np.max(region)
                        max_val_idx = np.argmax(region)

                        # Store the maximum value in the output array
                        A[n, c, i, j] = max_val

                        # Updating the mask with the correct indices
                        h_idx = h_start + max_val_idx // self.pool_w
                        w_idx = w_start + max_val_idx % self.pool_w

                        self.max_indices[n, c, h_idx, w_idx] = True

        return A

    def backward(self, dA):
        """
        Performs the backpropagation for the max pooling layer.

        The error is passed only to the neuron that had the maximum
        activation during the forward pass.

        Args:
            dA (np.ndarray): The gradient from the subsequent layer, of the
                             same shape as the output of the forward pass.

        Returns:
            np.ndarray: The gradient to be passed to the previous layer.
        """
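        # Initializing the gradient to the previous layer with zeros
        dX = np.zeros(self.X.shape, dtype=np.result_type(self.X, dA))
        N, C, H_out, W_out = dA.shape

        # Routing each gradient to the argmax of its own window. Assigning
        # dA.ravel() through the boolean mask pairs gradients with maxima in
        # row-major input order, which need not match the window order.
        for n in range(N):
            for c in range(C):
                for i in range(H_out):
                    for j in range(W_out):
                        h_start = i * self.stride_h
                        w_start = j * self.stride_w
                        region = self.X[n, c, h_start:h_start + self.pool_h,
                                        w_start:w_start + self.pool_w]
                        r, s = np.unravel_index(np.argmax(region), region.shape)
                        dX[n, c, h_start + r, w_start + s] += dA[n, c, i, j]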

        return dX

class AveragePool2D:
    """
    A 2D average pooling layer implemented from scratch using NumPy.

    This class performs downsampling by taking the average value in a
    specific window.
    """
    def __init__(self, pool_size, stride):
        """
        Initializes the AveragePool2D layer.

        Args:
            pool_size (tuple): The height and width of the pooling window.
            stride (tuple): The stride for the pooling window.
        """
        self.pool_h, self.pool_w = pool_size
        self.stride_h, self.stride_w = stride

        # Storing the input for the backward pass
        self.X = None

    def forward(self, X):
        """
        Performs the forward propagation for the average pooling layer.

        Args:
            X (np.ndarray): The input array of shape (N, C, H, W).

        Returns:
            np.ndarray: The output array after pooling.
        """
        self.X = X
        N, C, H, W = X.shape

        # Calculating output dimensions
        output_h = int((H - self.pool_h) / self.stride_h) + 1
        output_w = int((W - self.pool_w) / self.stride_w) + 1

        # Initializing output array
        A = np.zeros((N, C, output_h, output_w))

        for n in range(N):
            for c in range(C):
                for i in range(output_h):
                    for j in range(output_w):
                        # Defining the pooling region
                        h_start = i * self.stride_h
                        w_start = j * self.stride_w
                        h_end = h_start + self.pool_h
                        w_end = w_start + self.pool_w

                        region = self.X[n, c, h_start:h_end, w_start:w_end]

                        # Calculating the average of the region
                        avg_val = np.mean(region)

                        # Storing the average value in the output array
                        A[n, c, i, j] = avg_val

        return A

    def backward(self, dA):
        """
        Performs the backpropagation for the average pooling layer.

        The error is distributed equally to all neurons in the pooling
        region, since the forward pass is a sum divided by the number of elements.

        Args:
            dA (np.ndarray): The gradient from the subsequent layer.

        Returns:
            np.ndarray: The gradient to be passed to the previous layer.
        """
        # Initializing the gradient to the previous layer with zeros, ensuring it's a float type
        dX = np.zeros(self.X.shape, dtype=np.float64)
        N, C, H_out, W_out = dA.shape

        # Calculating the number of elements in the pooling window
        pool_size = self.pool_h * self.pool_w

        for n in range(N):
            for c in range(C):
                for i in range(H_out):
                    for j in range(W_out):
                        # The error is divided evenly across all elements
                        distributed_grad = dA[n, c, i, j] / pool_size

                        # Defining the region in the original input
                        h_start = i * self.stride_h
                        w_start = j * self.stride_w
                        h_end = h_start + self.pool_h
                        w_end = w_start + self.pool_w

                        # Adding the distributed gradient to the corresponding region in dX
                        dX[n, c, h_start:h_end, w_start:w_end] += distributed_grad

        return dX

class Flatten:
    """
    A flattening layer that reshapes each sample of a multi-dimensional
    input into a one-dimensional vector, keeping the batch dimension.
    This is typically used to transition from convolutional layers to
    fully-connected layers.
    """
    def __init__(self):
        """
        Initializes the Flatten layer. It doesn't have any parameters,
        but it needs to store the input shape for the backward pass.
        """
        # Stores the original shape of the input array.
        self.input_shape = None

    def forward(self, X):
        """
        Performs the forward propagation for the flattening layer.

        Args:
            X (np.ndarray): The input array of shape (N, C, H, W).
                            N: batch size, C: channels, H: height, W: width.

        Returns:
            np.ndarray: The flattened output array of shape (N, C*H*W).
        """
        # Storing the original shape for the backward pass
        self.input_shape = X.shape

        # Reshaping the input array to a 2D array
        flattened_output = X.reshape(X.shape[0], -1)

        return flattened_output

    def backward(self, dA):
        """
        Performs the backpropagation for the flattening layer.

        It reshapes the gradient from the subsequent layer back into
        the original input shape.

        Args:
            dA (np.ndarray): The gradient from the subsequent layer,
                             of shape (N, C*H*W).

        Returns:
            np.ndarray: The reshaped gradient to be passed to the previous
                        layer, of the same shape as the original input.
        """
        # Reshape the incoming gradient back to the original input shape
        reshaped_grad = dA.reshape(self.input_shape)

        return reshaped_grad


# Problem 2: Testing with small arrays
print("--- Problem 2: Testing Conv2d Forward and Backward Propagation ---")

# Input data for the forward pass
# Shape: (batch_size=1, in_channels=1, height=4, width=4)
x = np.array([[[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]]])

# Weights (filters) for the convolutional layer
# Shape: (out_channels=2, in_channels=1, filter_h=3, filter_w=3)
w = np.array(
    [
        [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]],
        [[0.0, 0.0, 0.0], [0.0, -1.0, 1.0], [0.0, 0.0, 0.0]],
    ]
)

# Biases (set to zeros for this test)
b = np.zeros(2)

# Creating an instance of the Conv2d layer
conv_layer = Conv2d(in_channels=1, out_channels=2, filter_size=(3, 3))

# Manually setting the weights and biases to the specified values
conv_layer.W = w.reshape(2, 1, 3, 3)
conv_layer.b = b

# Forward Propagation
print("\n--- Forward Propagation ---")
output_a = conv_layer.forward(x)
print(f"Output (A):\n{output_a}")

# Expected output from the problem description
expected_output_a = np.array([[[-4, -4], [-4, -4]], [[1, 1], [1, 1]]])
print(f"\nExpected Output:\n{expected_output_a}")

# Checking if the calculated output matches the expected output
print(f"\nForward pass matches expected output: {np.allclose(output_a, expected_output_a)}")


# Backward Propagation
print("\n--- Backward Propagation ---")

# Error from the subsequent layer (dA)
# Shape after the reshape below: (batch_size=1, out_channels=2, H_out=2, W_out=2)
delta = np.array([[[-4, -4], [10, 11]], [[1, -7], [1, -11]]])
da_to_pass = delta.reshape(1, 2, 2, 2)

# Passing the gradient to the backward method
dx_to_pass = conv_layer.backward(da_to_pass)

print(f"Gradient to previous layer (dX):\n{dx_to_pass}")

# Manually calculating the expected dX
expected_dx = np.array([[[-5, 4, 0, 0], [13, 27, -4, 0], [1, 1, 10, 0], [0, 0, 0, 0]]])
print(f"\nExpected dX (based on manual calculation):\n{expected_dx}")

# Checking if the calculated dX matches the manually calculated dX
print(f"\nBackward pass matches expected dX: {np.allclose(dx_to_pass, expected_dx)}")


# Problem 4: Testing MaxPool2D
print("\n\n--- Problem 4: Testing MaxPool2D Layer ---")

# Creating a sample input for pooling
# Shape: (batch_size=1, channels=1, height=4, width=4)
pool_x = np.array([[[[1, 2, 3, 4],
                     [5, 6, 7, 8],
                     [9, 10, 11, 12],
                     [13, 14, 15, 16]]]])

# Defining pooling parameters
pool_size = (2, 2)
stride = (2, 2)

# Creating a MaxPool2D instance
max_pool_layer = MaxPool2D(pool_size=pool_size, stride=stride)

# Forward Pass for Pooling
print("\n--- MaxPool2D Forward Pass ---")
pooled_output = max_pool_layer.forward(pool_x)
print(f"Input for pooling:\n{pool_x[0, 0]}")
print(f"\nPooled Output:\n{pooled_output[0, 0]}")

# Expected pooled output
expected_pooled_output = np.array([[[6, 8], [14, 16]]])
print(f"\nExpected Pooled Output:\n{expected_pooled_output[0, 0]}")
print(f"\nForward pass matches expected output: {np.allclose(pooled_output, expected_pooled_output)}")


# Backward Pass for Pooling
print("\n--- MaxPool2D Backward Pass ---")
# Creating a sample gradient to pass back
# Shape: (1, 1, 2, 2)
pooled_grad = np.array([[[[1, 2], [3, 4]]]])
grad_to_pass = max_pool_layer.backward(pooled_grad)
print(f"Gradient passed to previous layer (dX):\n{grad_to_pass[0, 0]}")

# Manually calculating the expected gradient
expected_grad = np.array([[[[0, 0, 0, 0],
                           [0, 1, 0, 2],
                           [0, 0, 0, 0],
                           [0, 3, 0, 4]]]])
print(f"\nExpected dX:\n{expected_grad[0, 0]}")

# Checking if the calculated dX matches the expected dX
print(f"\nBackward pass matches expected dX: {np.allclose(grad_to_pass, expected_grad)}")


# Problem 5: Testing AveragePool2D
print("\n\n--- Problem 5: Testing AveragePool2D Layer ---")

# Creating a sample input for average pooling
# Shape: (batch_size=1, channels=1, height=4, width=4)
avg_pool_x = np.array([[[[1, 2, 3, 4],
                        [5, 6, 7, 8],
                        [9, 10, 11, 12],
                        [13, 14, 15, 16]]]])

# Defining pooling parameters
avg_pool_size = (2, 2)
avg_stride = (2, 2)

# Creating an AveragePool2D instance
avg_pool_layer = AveragePool2D(pool_size=avg_pool_size, stride=avg_stride)

# Forward Pass for Average Pooling
print("\n--- AveragePool2D Forward Pass ---")
avg_pooled_output = avg_pool_layer.forward(avg_pool_x)
print(f"Input for pooling:\n{avg_pool_x[0, 0]}")
print(f"\nAverage Pooled Output:\n{avg_pooled_output[0, 0]}")

# Expected average pooled output
expected_avg_output = np.array([[[[3.5, 5.5], [11.5, 13.5]]]])
print(f"\nExpected Average Pooled Output:\n{expected_avg_output[0, 0]}")
print(f"\nForward pass matches expected output: {np.allclose(avg_pooled_output, expected_avg_output)}")


# Backward Pass for Average Pooling
print("\n--- AveragePool2D Backward Pass ---")
# Creating a sample gradient to pass back
# Shape: (1, 1, 2, 2)
avg_pooled_grad = np.array([[[[1, 2], [3, 4]]]])
avg_grad_to_pass = avg_pool_layer.backward(avg_pooled_grad)
print(f"Gradient passed to previous layer (dX):\n{avg_grad_to_pass[0, 0]}")

# Manually calculate the expected gradient
expected_avg_grad = np.array([[[[0.25, 0.25, 0.5, 0.5],
                              [0.25, 0.25, 0.5, 0.5],
                              [0.75, 0.75, 1.0, 1.0],
                              [0.75, 0.75, 1.0, 1.0]]]])
print(f"\nExpected dX:\n{expected_avg_grad[0, 0]}")

# Checking if the calculated dX matches the expected dX
print(f"\nBackward pass matches expected dX: {np.allclose(avg_grad_to_pass, expected_avg_grad)}")


# Problem 6: Testing Flatten Layer
print("\n\n--- Problem 6: Testing Flatten Layer ---")

# Creating a sample input for flattening
# Shape: (batch_size=1, channels=2, height=2, width=2)
flatten_x = np.array([[[[1, 2], [3, 4]], [[5, 6], [7, 8]]]])

# Creating a Flatten instance
flatten_layer = Flatten()

# Forward Pass for Flattening
print("\n--- Flatten Forward Pass ---")
flattened_output = flatten_layer.forward(flatten_x)
print(f"Original shape: {flatten_x.shape}")
print(f"Flattened output shape: {flattened_output.shape}")
print(f"Flattened output:\n{flattened_output}")

# Expected flattened output
expected_flattened_output = np.array([[1, 2, 3, 4, 5, 6, 7, 8]])
print(f"\nExpected Flattened Output:\n{expected_flattened_output}")
print(f"\nForward pass matches expected output: {np.allclose(flattened_output, expected_flattened_output)}")


# Backward Pass for Flattening
print("\n--- Flatten Backward Pass ---")
# Creating a sample gradient to pass back
# Shape: (1, 8) - same as flattened output
flattened_grad = np.array([[1, 2, 3, 4, 5, 6, 7, 8]])
reshaped_grad = flatten_layer.backward(flattened_grad)
print(f"Original gradient shape: {flattened_grad.shape}")
print(f"Reshaped gradient shape: {reshaped_grad.shape}")
print(f"Reshaped gradient to previous layer (dX):\n{reshaped_grad}")

# The gradient values 1..8 equal the input values, so the expected reshaped gradient is flatten_x
expected_reshaped_grad = flatten_x
print(f"\nExpected Reshaped Gradient:\n{expected_reshaped_grad}")

# Checking if the calculated dX matches the expected dX
print(f"\nBackward pass matches expected dX: {np.allclose(reshaped_grad, expected_reshaped_grad)}")

--- Problem 2: Testing Conv2d Forward and Backward Propagation ---

--- Forward Propagation ---
Output (A):
[[[[-4. -4.]
   [-4. -4.]]

  [[ 1.  1.]
   [ 1.  1.]]]]

Expected Output:
[[[-4 -4]
  [-4 -4]]

 [[ 1  1]
  [ 1  1]]]

Forward pass matches expected output: True

--- Backward Propagation ---
Gradient to previous layer (dX):
[[[[  0.   0.   0.   0.]
   [  0.  -5.   4.  -7.]
   [  0.  13.  27. -11.]
   [  0. -10. -11.   0.]]]]

Expected dX (based on manual calculation):
[[[-5  4  0  0]
  [13 27 -4  0]
  [ 1  1 10  0]
  [ 0  0  0  0]]]

Backward pass matches expected dX: False


--- Problem 4: Testing MaxPool2D Layer ---

--- MaxPool2D Forward Pass ---
Input for pooling:
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]]

Pooled Output:
[[ 6.  8.]
 [14. 16.]]

Expected Pooled Output:
[6 8]

Forward pass matches expected output: True

--- MaxPool2D Backward Pass ---
Gradient passed to previous layer (dX):
[[0 0 0 0]
 [0 1 0 2]
 [0 0 0 0]
 [0 3 0 4]]

Expected dX:
[[0 0 0 
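
A hand-computed `expected_dx` is easy to get wrong, so a finite-difference check can serve as an independent reference for a backward pass. The sketch below is not the notebook's `Conv2d` class: `conv_forward`, `conv_backward_input`, and `numerical_grad` are hypothetical helper names, and the layer is reduced to a single sample with no bias, padding, or stride. It reuses the Problem 2 arrays and compares the analytic input gradient with a central-difference estimate.

```python
import numpy as np

def conv_forward(x, w):
    """Valid cross-correlation, stride 1. x: (C, H, W), w: (M, C, FH, FW)."""
    m, c, fh, fw = w.shape
    out_h, out_w = x.shape[1] - fh + 1, x.shape[2] - fw + 1
    out = np.zeros((m, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i:i + fh, j:j + fw]
            # Broadcasting (M, C, FH, FW) * (C, FH, FW), then summing per filter
            out[:, i, j] = np.sum(w * patch, axis=(1, 2, 3))
    return out

def conv_backward_input(d_out, w, x_shape):
    """Analytic gradient w.r.t. x: scatter each output gradient over its patch."""
    m, c, fh, fw = w.shape
    dx = np.zeros(x_shape)
    for i in range(d_out.shape[1]):
        for j in range(d_out.shape[2]):
            # d_out[:, i, j] has shape (M,); contracting over M gives (C, FH, FW)
            dx[:, i:i + fh, j:j + fw] += np.tensordot(d_out[:, i, j], w, axes=1)
    return dx

def numerical_grad(f, x, d_out, eps=1e-5):
    """Central-difference estimate of d(sum(f(x) * d_out)) / dx."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        original = x[idx]
        x[idx] = original + eps
        plus = np.sum(f(x) * d_out)
        x[idx] = original - eps
        minus = np.sum(f(x) * d_out)
        x[idx] = original  # restoring the perturbed entry
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

# Arrays from Problem 2, cast to float so they can be perturbed
x = np.arange(1, 17, dtype=float).reshape(1, 4, 4)
w = np.array(
    [
        [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]],
        [[0.0, 0.0, 0.0], [0.0, -1.0, 1.0], [0.0, 0.0, 0.0]],
    ]
).reshape(2, 1, 3, 3)
d_out = np.array([[[-4, -4], [10, 11]], [[1, -7], [1, -11]]], dtype=float)

analytic = conv_backward_input(d_out, w, x.shape)
numeric = numerical_grad(lambda v: conv_forward(v, w), x, d_out)
print(analytic[0])
print("Analytic and numerical gradients agree:", np.allclose(analytic, numeric, atol=1e-6))
```

Under these assumptions the two gradients agree, and both equal the `dX` printed by the layer in the Problem 2 output.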

**7. Learning and estimation**
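
The `Dense(input_size=16 * 4 * 4, ...)` layer in the model below depends on how the two 5x5 convolutions and two 2x2 poolings shrink a 28x28 image. The sketch traces that arithmetic with the output-size formula floor((input + 2 * padding - filter) / stride) + 1. `out_size` is a hypothetical helper, not part of the notebook's classes.

```python
def out_size(size, filter_size, padding=0, stride=1):
    """Output length along one spatial axis for a convolution or pooling window."""
    return (size + 2 * padding - filter_size) // stride + 1

size = 28  # MNIST images are 28x28
trace = [("input", size)]
# (layer name, window size, stride) in the order the model applies them
for name, window, stride in [("conv1", 5, 1), ("pool1", 2, 2), ("conv2", 5, 1), ("pool2", 2, 2)]:
    size = out_size(size, window, stride=stride)
    trace.append((name, size))

flat_features = 16 * size * size  # 16 channels leave the second convolution
print(trace)          # [('input', 28), ('conv1', 24), ('pool1', 12), ('conv2', 8), ('pool2', 4)]
print(flat_features)  # 256
```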

In [None]:
!pip install python-mnist

import numpy as np
import random
import os
import requests
import gzip
from mnist import MNIST

# Layer Implementations (from previous problems)

class Conv2d:
    """
    A 2D convolutional layer implemented from scratch using NumPy.
    """
    def __init__(self, in_channels, out_channels, filter_size):
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.filter_h, self.filter_w = filter_size

        self.W = np.random.randn(out_channels, in_channels, self.filter_h, self.filter_w) * 0.01
        self.b = np.zeros(out_channels)

        self.dW = None
        self.db = None
        self.X = None

    def forward(self, X):
        self.X = X
        N, C, H, W = X.shape
        N_out_h = H - self.filter_h + 1
        N_out_w = W - self.filter_w + 1
        A = np.zeros((N, self.out_channels, N_out_h, N_out_w))

        for n in range(N):
            for m in range(self.out_channels):
                for i in range(N_out_h):
                    for j in range(N_out_w):
                        receptive_field = self.X[n, :, i:i + self.filter_h, j:j + self.filter_w]
                        A[n, m, i, j] = np.sum(receptive_field * self.W[m]) + self.b[m]

        return A

    def backward(self, dA):
        N, C_in, H_in, W_in = self.X.shape
        N_out, M_out, H_out, W_out = dA.shape

        self.dW = np.zeros(self.W.shape)
        self.db = np.zeros(self.b.shape)
        dX = np.zeros(self.X.shape)

        for n in range(N):
            for m in range(M_out):
                for i in range(H_out):
                    for j in range(W_out):
                        receptive_field = self.X[n, :, i:i + self.filter_h, j:j + self.filter_w]
                        self.dW[m] += dA[n, m, i, j] * receptive_field
                # Accumulating over the batch instead of overwriting with the last sample
                self.db[m] += np.sum(dA[n, m])

        for n in range(N):
            for k in range(C_in):
                for i in range(H_in):
                    for j in range(W_in):
                        sum_val = 0
                        for m in range(M_out):
                            for s in range(self.filter_h):
                                for t in range(self.filter_w):
                                    if 0 <= (i - s) < H_out and 0 <= (j - t) < W_out:
                                        sum_val += dA[n, m, i - s, j - t] * self.W[m, k, s, t]
                        dX[n, k, i, j] = sum_val

        return dX


class ReLU:
    """
    Rectified Linear Unit (ReLU) activation function.
    """
    def __init__(self):
        self.X = None

    def forward(self, X):
        self.X = X
        return np.maximum(0, X)

    def backward(self, dA):
        dX = dA.copy()
        dX[self.X <= 0] = 0
        return dX


class MaxPool2D:
    """
    A 2D maximum pooling layer.
    """
    def __init__(self, pool_size, stride):
        self.pool_h, self.pool_w = pool_size
        self.stride_h, self.stride_w = stride
        self.max_indices = None
        self.X = None

    def forward(self, X):
        self.X = X
        N, C, H, W = X.shape
        output_h = int((H - self.pool_h) / self.stride_h) + 1
        output_w = int((W - self.pool_w) / self.stride_w) + 1
        A = np.zeros((N, C, output_h, output_w))
        self.max_indices = np.zeros_like(X, dtype=bool)

        for n in range(N):
            for c in range(C):
                for i in range(output_h):
                    for j in range(output_w):
                        h_start = i * self.stride_h
                        w_start = j * self.stride_w
                        h_end = h_start + self.pool_h
                        w_end = w_start + self.pool_w

                        region = self.X[n, c, h_start:h_end, w_start:w_end]
                        max_val = np.max(region)
                        max_val_idx = np.argmax(region)
                        A[n, c, i, j] = max_val

                        h_idx = h_start + max_val_idx // self.pool_w
                        w_idx = w_start + max_val_idx % self.pool_w
                        self.max_indices[n, c, h_idx, w_idx] = True

        return A

    def backward(self, dA):
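        dX = np.zeros(self.X.shape, dtype=np.result_type(self.X, dA))
        N, C, H_out, W_out = dA.shape

        # Routing each gradient to the argmax of its own window; assigning
        # dA.ravel() through the boolean mask does not preserve window order.
        for n in range(N):
            for c in range(C):
                for i in range(H_out):
                    for j in range(W_out):
                        h_start = i * self.stride_h
                        w_start = j * self.stride_w
                        region = self.X[n, c, h_start:h_start + self.pool_h,
                                        w_start:w_start + self.pool_w]
                        r, s = np.unravel_index(np.argmax(region), region.shape)
                        dX[n, c, h_start + r, w_start + s] += dA[n, c, i, j]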
        return dX


class Flatten:
    """
    A flattening layer that reshapes each sample of a multi-dimensional input into a one-dimensional vector.
    """
    def __init__(self):
        self.input_shape = None

    def forward(self, X):
        self.input_shape = X.shape
        flattened_output = X.reshape(X.shape[0], -1)
        return flattened_output

    def backward(self, dA):
        reshaped_grad = dA.reshape(self.input_shape)
        return reshaped_grad


class Dense:
    """
    A fully-connected (dense) layer.
    """
    def __init__(self, input_size, output_size):
        self.input_size = input_size
        self.output_size = output_size
        self.W = np.random.randn(input_size, output_size) * 0.01
        self.b = np.zeros(output_size)
        self.X = None
        self.dW = None
        self.db = None

    def forward(self, X):
        self.X = X
        return np.dot(X, self.W) + self.b

    def backward(self, dA):
        self.dW = np.dot(self.X.T, dA)
        self.db = np.sum(dA, axis=0)
        dX = np.dot(dA, self.W.T)
        return dX


# Loss Function
class CrossEntropyLoss:
    """
    Cross-Entropy Loss function with Softmax.
    """
    def __init__(self):
        self.y_true = None
        self.logits = None
        self.softmax_output = None

    def forward(self, logits, y_true):
        self.y_true = y_true
        self.logits = logits
        # Softmax, with the row-wise max subtracted first for numerical stability
        exp_logits = np.exp(logits - np.max(logits, axis=1, keepdims=True))
        self.softmax_output = exp_logits / np.sum(exp_logits, axis=1, keepdims=True)

        # Calculating loss
        loss = -np.sum(y_true * np.log(self.softmax_output + 1e-9)) / logits.shape[0]
        return loss

    def backward(self):
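        # The forward loss is averaged over the batch, so the gradient is scaled by 1 / N as well.
        return (self.softmax_output - self.y_true) / self.y_true.shape[0]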


# Optimizer
class SGD:
    """
    Stochastic Gradient Descent optimizer.
    """
    def __init__(self, learning_rate=0.01):
        self.learning_rate = learning_rate

    def update(self, layers):
        for layer in layers:
            if hasattr(layer, 'W') and hasattr(layer, 'dW'):
                layer.W -= self.learning_rate * layer.dW
                layer.b -= self.learning_rate * layer.db


# Model and Training Loop
class LeNet:
    """
    A LeNet-5-style network for 28x28 single-channel inputs that uses ReLU
    activations and max pooling.
    """
    def __init__(self):
        self.layers = [
            Conv2d(in_channels=1, out_channels=6, filter_size=(5, 5)),
            ReLU(),
            MaxPool2D(pool_size=(2, 2), stride=(2, 2)),
            Conv2d(in_channels=6, out_channels=16, filter_size=(5, 5)),
            ReLU(),
            MaxPool2D(pool_size=(2, 2), stride=(2, 2)),
            Flatten(),
            Dense(input_size=16 * 4 * 4, output_size=120),
            ReLU(),
            Dense(input_size=120, output_size=84),
            ReLU(),
            Dense(input_size=84, output_size=10)
        ]

    def forward(self, X):
        for layer in self.layers:
            X = layer.forward(X)
        return X

    def backward(self, dA):
        for layer in reversed(self.layers):
            dA = layer.backward(dA)
        return dA


class Trainer:
    """
    A class to handle the training and evaluation loop.
    """
    def __init__(self, model, optimizer, loss_function, epochs=3, batch_size=64):
        self.model = model
        self.optimizer = optimizer
        self.loss_function = loss_function
        self.epochs = epochs
        self.batch_size = batch_size
        self.log_loss = []
        self.log_acc = []

    def fit(self, X_train, y_train_one_hot, X_val, y_val):
        num_batches = len(X_train) // self.batch_size
        print("Starting training...")

        for epoch in range(self.epochs):
            epoch_loss = 0

            # Shuffling data for each epoch
            indices = list(range(len(X_train)))
            random.shuffle(indices)

            for i in range(num_batches):
                # Getting a random batch
                batch_indices = indices[i * self.batch_size:(i + 1) * self.batch_size]
                X_batch = X_train[batch_indices]
                y_batch_one_hot = y_train_one_hot[batch_indices]

                # Forward pass
                output = self.model.forward(X_batch)
                loss = self.loss_function.forward(output, y_batch_one_hot)
                epoch_loss += loss

                # Backward pass
                dA = self.loss_function.backward()
                self.model.backward(dA)

                # Updating weights
                self.optimizer.update(self.model.layers)

            avg_loss = epoch_loss / num_batches
            self.log_loss.append(avg_loss)

            # Evaluate on validation set
            val_preds = self.predict(X_val)
            val_accuracy = np.mean(val_preds == y_val)
            self.log_acc.append(val_accuracy)

            print(f"Epoch {epoch+1}/{self.epochs}, Average Loss: {avg_loss:.4f}, Validation Accuracy: {val_accuracy * 100:.2f}%")

        print("\nTraining complete.")

    def predict(self, X):
        output = self.model.forward(X)
        return np.argmax(output, axis=1)


# Data Preparation and Download
def download_mnist_data():
    """
    Downloads the MNIST dataset files if they don't exist.
    """
    base_url = "https://ossci-datasets.s3.amazonaws.com/mnist/"
    files = ["train-images-idx3-ubyte.gz", "train-labels-idx1-ubyte.gz",
             "t10k-images-idx3-ubyte.gz", "t10k-labels-idx1-ubyte.gz"]

    data_dir = "./data"
    os.makedirs(data_dir, exist_ok=True)

    for file_name in files:
        file_path = os.path.join(data_dir, file_name)
        if not os.path.exists(file_path):
            print(f"Downloading {file_name}...")
            url = base_url + file_name
            try:
                response = requests.get(url, stream=True)
                response.raise_for_status()
                with open(file_path, "wb") as f:
                    for chunk in response.iter_content(chunk_size=8192):
                        f.write(chunk)
                print(f"Downloaded {file_name}.")
            except requests.exceptions.RequestException as e:
                print(f"Error downloading {file_name}: {e}")
                raise FileNotFoundError(f"Could not download {file_name}") from e

def load_data():
    """
    Loads MNIST data, ensuring files are downloaded first.
    """
    download_mnist_data()

    data_dir = "./data"
    files = ["train-images-idx3-ubyte", "train-labels-idx1-ubyte",
             "t10k-images-idx3-ubyte", "t10k-labels-idx1-ubyte"]

    for file_name in files:
        zipped_path = os.path.join(data_dir, file_name + ".gz")
        unzipped_path = os.path.join(data_dir, file_name)
        if not os.path.exists(unzipped_path):
            print(f"Unzipping {file_name}.gz...")
            with gzip.open(zipped_path, 'rb') as f_in:
                with open(unzipped_path, 'wb') as f_out:
                    f_out.write(f_in.read())
            print(f"Unzipped {file_name}.gz.")

    mndata = MNIST(data_dir)
    images, labels = mndata.load_training()
    test_images, test_labels = mndata.load_testing()

    X_train = np.array(images).reshape(-1, 1, 28, 28) / 255.0
    y_train = np.array(labels)
    X_test = np.array(test_images).reshape(-1, 1, 28, 28) / 255.0
    y_test = np.array(test_labels)

    y_train_one_hot = np.zeros((y_train.size, 10))
    y_train_one_hot[np.arange(y_train.size), y_train] = 1

    return X_train, y_train_one_hot, y_train, X_test, y_test


# Main Training Script
if __name__ == "__main__":
    # Load data
    print("Loading and preparing MNIST data...")
    X_train, y_train_one_hot, y_train, X_test, y_test = load_data()
    print("Data loaded successfully.")

    # Instantiate model, loss, and optimizer
    model = LeNet()
    criterion = CrossEntropyLoss()
    optimizer = SGD(learning_rate=0.01)

    # Instantiate and run the trainer
    trainer = Trainer(
        model=model,
        optimizer=optimizer,
        loss_function=criterion,
        epochs=3,
        batch_size=64
    )

    trainer.fit(X_train, y_train_one_hot, X_test, y_test)

Loading and preparing MNIST data...
Data loaded successfully.
Starting training...
Epoch 1/3, Average Loss: 2.3036, Validation Accuracy: 10.28%
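
An average loss of 2.3036 is very close to ln(10) ≈ 2.3026, the cross-entropy of a uniform prediction over ten classes, and 10.28% accuracy is about chance level for MNIST. The sketch below reproduces that baseline with all-zero logits. `softmax_cross_entropy` is a hypothetical standalone helper that mirrors the loss formula in `CrossEntropyLoss.forward`, and the labels are arbitrary.

```python
import numpy as np

def softmax_cross_entropy(logits, y_one_hot):
    """Mean softmax cross-entropy over a batch of logits."""
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    probs = np.exp(shifted) / np.sum(np.exp(shifted), axis=1, keepdims=True)
    return -np.sum(y_one_hot * np.log(probs + 1e-9)) / logits.shape[0]

# All-zero logits give a uniform softmax: every class gets probability 0.1
logits = np.zeros((4, 10))
labels = np.array([3, 1, 4, 1])  # arbitrary labels for illustration
y_one_hot = np.eye(10)[labels]

chance_loss = softmax_cross_entropy(logits, y_one_hot)
print(f"Loss for a uniform prediction: {chance_loss:.4f}")  # 2.3026
print(f"ln(10): {np.log(10):.4f}")                          # 2.3026
```

A loss that stays near this value suggests the logits are still close to uniform after the first epoch.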


**8. LeNet**

In [None]:
import numpy as np
import random
import os
import requests
import gzip
from mnist import MNIST

# Layer Implementations (from previous problems)

class Conv2d:
    """
    A 2D convolutional layer implemented from scratch using NumPy.
    """
    def __init__(self, in_channels, out_channels, filter_size):
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.filter_h, self.filter_w = filter_size

        self.W = np.random.randn(out_channels, in_channels, self.filter_h, self.filter_w) * 0.01
        self.b = np.zeros(out_channels)

        self.dW = None
        self.db = None
        self.X = None

    def forward(self, X):
        self.X = X
        N, C, H, W = X.shape
        N_out_h = H - self.filter_h + 1
        N_out_w = W - self.filter_w + 1
        A = np.zeros((N, self.out_channels, N_out_h, N_out_w))

        for n in range(N):
            for m in range(self.out_channels):
                for i in range(N_out_h):
                    for j in range(N_out_w):
                        receptive_field = self.X[n, :, i:i + self.filter_h, j:j + self.filter_w]
                        A[n, m, i, j] = np.sum(receptive_field * self.W[m]) + self.b[m]

        return A

    def backward(self, dA):
        N, C_in, H_in, W_in = self.X.shape
        N_out, M_out, H_out, W_out = dA.shape

        self.dW = np.zeros(self.W.shape)
        self.db = np.zeros(self.b.shape)
        dX = np.zeros(self.X.shape)

        for n in range(N):
            for m in range(M_out):
                for i in range(H_out):
                    for j in range(W_out):
                        receptive_field = self.X[n, :, i:i + self.filter_h, j:j + self.filter_w]
                        self.dW[m] += dA[n, m, i, j] * receptive_field
                # Accumulating over the batch instead of overwriting with the last sample
                self.db[m] += np.sum(dA[n, m])

        for n in range(N):
            for k in range(C_in):
                for i in range(H_in):
                    for j in range(W_in):
                        sum_val = 0
                        for m in range(M_out):
                            for s in range(self.filter_h):
                                for t in range(self.filter_w):
                                    if 0 <= (i - s) < H_out and 0 <= (j - t) < W_out:
                                        sum_val += dA[n, m, i - s, j - t] * self.W[m, k, s, t]
                        dX[n, k, i, j] = sum_val

        return dX


class ReLU:
    """
    Rectified Linear Unit (ReLU) activation function.
    """
    def __init__(self):
        self.X = None

    def forward(self, X):
        self.X = X
        return np.maximum(0, X)

    def backward(self, dA):
        dX = dA.copy()
        dX[self.X <= 0] = 0
        return dX


class MaxPool2D:
    """
    A 2D maximum pooling layer.
    """
    def __init__(self, pool_size, stride):
        self.pool_h, self.pool_w = pool_size
        self.stride_h, self.stride_w = stride
        self.max_indices = None
        self.X = None

    def forward(self, X):
        self.X = X
        N, C, H, W = X.shape
        output_h = int((H - self.pool_h) / self.stride_h) + 1
        output_w = int((W - self.pool_w) / self.stride_w) + 1
        A = np.zeros((N, C, output_h, output_w))
        self.max_indices = np.zeros_like(X, dtype=bool)

        for n in range(N):
            for c in range(C):
                for i in range(output_h):
                    for j in range(output_w):
                        h_start = i * self.stride_h
                        w_start = j * self.stride_w
                        h_end = h_start + self.pool_h
                        w_end = w_start + self.pool_w

                        region = self.X[n, c, h_start:h_end, w_start:w_end]
                        max_val = np.max(region)
                        max_val_idx = np.argmax(region)
                        A[n, c, i, j] = max_val

                        h_idx = h_start + max_val_idx // self.pool_w
                        w_idx = w_start + max_val_idx % self.pool_w
                        self.max_indices[n, c, h_idx, w_idx] = True

        return A

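    def backward(self, dA):
        # Route each upstream gradient to the max position of its own window.
        # A flat assignment through the boolean mask would pair gradients with
        # maxima in row-major order of X, which differs from dA's window order.
        # Assumes non-overlapping windows (stride >= pool size), as in LeNet below.
        dX = np.zeros_like(self.X)
        for i in range(dA.shape[2]):
            for j in range(dA.shape[3]):
                h_start = i * self.stride_h
                w_start = j * self.stride_w
                window = (slice(None), slice(None),
                          slice(h_start, h_start + self.pool_h),
                          slice(w_start, w_start + self.pool_w))
                dX[window] += self.max_indices[window] * dA[:, :, i, j][:, :, None, None]
        return dX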


class Flatten:
    """
    A flattening layer that reshapes a multi-dimensional input into a one-dimensional vector.
    """
    def __init__(self):
        self.input_shape = None

    def forward(self, X):
        self.input_shape = X.shape
        flattened_output = X.reshape(X.shape[0], -1)
        return flattened_output

    def backward(self, dA):
        reshaped_grad = dA.reshape(self.input_shape)
        return reshaped_grad


class Dense:
    """
    A fully-connected (dense) layer.
    """
    def __init__(self, input_size, output_size):
        self.input_size = input_size
        self.output_size = output_size
        self.W = np.random.randn(input_size, output_size) * 0.01
        self.b = np.zeros(output_size)
        self.X = None
        self.dW = None
        self.db = None

    def forward(self, X):
        self.X = X
        return np.dot(X, self.W) + self.b

    def backward(self, dA):
        self.dW = np.dot(self.X.T, dA)
        self.db = np.sum(dA, axis=0)
        dX = np.dot(dA, self.W.T)
        return dX


# Loss Function
class CrossEntropyLoss:
    """
    Cross-Entropy Loss function with Softmax.
    """
    def __init__(self):
        self.y_true = None
        self.logits = None
        self.softmax_output = None

    def forward(self, logits, y_true):
        self.y_true = y_true
        self.logits = logits
        # Softmax; subtracting the row-wise max keeps np.exp from overflowing
        exp_logits = np.exp(logits - np.max(logits, axis=1, keepdims=True))
        self.softmax_output = exp_logits / np.sum(exp_logits, axis=1, keepdims=True)

        # Calculating loss
        loss = -np.sum(y_true * np.log(self.softmax_output + 1e-9)) / logits.shape[0]
        return loss

    def backward(self):
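        # The combined Softmax + Cross-Entropy gradient with respect to the
        # logits is the softmax output minus the one-hot encoded true labels.
        # Note: forward() averages the loss over the batch, but this gradient
        # is not divided by the batch size, so each update behaves as if the
        # learning rate were multiplied by the batch size.
        return (self.softmax_output - self.y_true)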


# Optimizer
class SGD:
    """
    Stochastic Gradient Descent optimizer.
    """
    def __init__(self, learning_rate=0.01):
        self.learning_rate = learning_rate

    def update(self, layers):
        for layer in layers:
            if hasattr(layer, 'W') and hasattr(layer, 'dW'):
                layer.W -= self.learning_rate * layer.dW
                layer.b -= self.learning_rate * layer.db


# Model and Training Loop
class LeNet:
    """
    A LeNet-5-style network for 1x28x28 inputs, using ReLU activations and max pooling.
    """
    def __init__(self):
        self.layers = [
            Conv2d(in_channels=1, out_channels=6, filter_size=(5, 5)),
            ReLU(),
            MaxPool2D(pool_size=(2, 2), stride=(2, 2)),
            Conv2d(in_channels=6, out_channels=16, filter_size=(5, 5)),
            ReLU(),
            MaxPool2D(pool_size=(2, 2), stride=(2, 2)),
            Flatten(),
            Dense(input_size=16 * 4 * 4, output_size=120),
            ReLU(),
            Dense(input_size=120, output_size=84),
            ReLU(),
            Dense(input_size=84, output_size=10)
        ]

    def forward(self, X):
        for layer in self.layers:
            X = layer.forward(X)
        return X

    def backward(self, dA):
        for layer in reversed(self.layers):
            dA = layer.backward(dA)
        return dA


class Trainer:
    """
    A class to handle the training and evaluation loop.
    """
    def __init__(self, model, optimizer, loss_function, epochs=3, batch_size=64):
        self.model = model
        self.optimizer = optimizer
        self.loss_function = loss_function
        self.epochs = epochs
        self.batch_size = batch_size
        self.log_loss = []
        self.log_acc = []

    def fit(self, X_train, y_train_one_hot, X_val, y_val):
        num_batches = len(X_train) // self.batch_size
        print("Starting training...")

        for epoch in range(self.epochs):
            epoch_loss = 0

            # Shuffle data for each epoch
            indices = list(range(len(X_train)))
            random.shuffle(indices)

            for i in range(num_batches):
                # Get a random batch
                batch_indices = indices[i * self.batch_size:(i + 1) * self.batch_size]
                X_batch = X_train[batch_indices]
                y_batch_one_hot = y_train_one_hot[batch_indices]

                # Forward pass
                output = self.model.forward(X_batch)
                loss = self.loss_function.forward(output, y_batch_one_hot)
                epoch_loss += loss

                # Backward pass
                dA = self.loss_function.backward()
                self.model.backward(dA)

                # Update weights
                self.optimizer.update(self.model.layers)

            avg_loss = epoch_loss / num_batches
            self.log_loss.append(avg_loss)

            # Evaluate on validation set
            val_preds = self.predict(X_val)
            val_accuracy = np.mean(val_preds == y_val)
            self.log_acc.append(val_accuracy)

            print(f"Epoch {epoch+1}/{self.epochs}, Average Loss: {avg_loss:.4f}, Validation Accuracy: {val_accuracy * 100:.2f}%")

        print("\nTraining complete.")

    def predict(self, X):
        output = self.model.forward(X)
        return np.argmax(output, axis=1)


# Data Preparation and Download
def download_mnist_data():
    """
    Downloads the MNIST dataset files if they don't exist.
    """
    base_url = "https://ossci-datasets.s3.amazonaws.com/mnist/"
    files = ["train-images-idx3-ubyte.gz", "train-labels-idx1-ubyte.gz",
             "t10k-images-idx3-ubyte.gz", "t10k-labels-idx1-ubyte.gz"]

    data_dir = "./data"
    os.makedirs(data_dir, exist_ok=True)

    for file_name in files:
        file_path = os.path.join(data_dir, file_name)
        if not os.path.exists(file_path):
            print(f"Downloading {file_name}...")
            url = base_url + file_name
            try:
                response = requests.get(url, stream=True)
                response.raise_for_status()
                with open(file_path, "wb") as f:
                    for chunk in response.iter_content(chunk_size=8192):
                        f.write(chunk)
                print(f"Downloaded {file_name}.")
            except requests.exceptions.RequestException as e:
                print(f"Error downloading {file_name}: {e}")
                raise FileNotFoundError(f"Could not download {file_name}")

def load_data():
    """
    Loads MNIST data, ensuring files are downloaded first.
    """
    download_mnist_data()

    data_dir = "./data"
    files = ["train-images-idx3-ubyte", "train-labels-idx1-ubyte",
             "t10k-images-idx3-ubyte", "t10k-labels-idx1-ubyte"]

    for file_name in files:
        zipped_path = os.path.join(data_dir, file_name + ".gz")
        unzipped_path = os.path.join(data_dir, file_name)
        if not os.path.exists(unzipped_path):
            print(f"Unzipping {file_name}.gz...")
            with gzip.open(zipped_path, 'rb') as f_in:
                with open(unzipped_path, 'wb') as f_out:
                    f_out.write(f_in.read())
            print(f"Unzipped {file_name}.gz.")

    mndata = MNIST(data_dir)
    images, labels = mndata.load_training()
    test_images, test_labels = mndata.load_testing()

    X_train = np.array(images).reshape(-1, 1, 28, 28) / 255.0
    y_train = np.array(labels)
    X_test = np.array(test_images).reshape(-1, 1, 28, 28) / 255.0
    y_test = np.array(test_labels)

    y_train_one_hot = np.zeros((y_train.size, 10))
    y_train_one_hot[np.arange(y_train.size), y_train] = 1

    return X_train, y_train_one_hot, y_train, X_test, y_test


# Main Training Script
if __name__ == "__main__":
    # Load data
    print("Loading and preparing MNIST data...")
    X_train, y_train_one_hot, y_train, X_test, y_test = load_data()
    print("Data loaded successfully.")

    # Instantiate model, loss, and optimizer
    model = LeNet()
    criterion = CrossEntropyLoss()
    optimizer = SGD(learning_rate=0.01)

    # Instantiate and run the trainer
    trainer = Trainer(
        model=model,
        optimizer=optimizer,
        loss_function=criterion,
        epochs=3,
        batch_size=64
    )

    trainer.fit(X_train, y_train_one_hot, X_test, y_test)

**9. Survey of famous image recognition models**

AlexNet (2012)

AlexNet was a groundbreaking model that won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012. Its victory demonstrated that deep convolutional neural networks (CNNs) were a powerful tool for large-scale image recognition. It was a significant jump from the previous best models and ignited the deep learning revolution in computer vision.

Here are some of its key features:

Deeper Architecture: It was much deeper than LeNet, with five convolutional layers followed by three fully connected layers.

ReLU Activation: It used the Rectified Linear Unit (ReLU) activation function, which was found to train much faster than the sigmoid or tanh functions used in earlier networks.

Dropout: To prevent overfitting, the network used dropout layers, which randomly "turned off" neurons during training.

GPU Utilization: The network was so large that it was split across two GPUs during training, highlighting the need for parallel processing in deep learning.

VGG16 (2014)

Developed by the Visual Geometry Group (VGG) at the University of Oxford, VGG16 was the runner-up in the classification task of ILSVRC 2014. While it didn't win that task, its simple and uniform architecture made it very popular. The 16 in its name refers to the number of weight layers.

Key features:

Simplicity and Uniformity: VGG16's key innovation was to use stacks of small 3x3 convolutional filters throughout the network instead of large filters of varying size. Stacking several small filters covers the same receptive field as a single larger filter, but with more non-linearity and fewer parameters.

Deeper and More Layers: VGG16 pushed the depth of CNNs even further, with 13 convolutional layers and 3 fully connected layers.

Transfer Learning: Due to its straightforward architecture and excellent performance, VGG16 quickly became a standard for transfer learning, where a pre-trained model is used as a feature extractor for other image-related tasks.
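
The layer counts above can be tallied with a small sketch. The channel widths, the 224 x 224 input size, and the 1000-class output below are the commonly cited VGG16 configuration and are assumptions here, since the text above does not list them; `count_vgg16` is an illustrative helper, not part of the notebook's code.

```python
# Assumed VGG16 configuration (not given in the text above): output channels
# of each 3x3 convolution, with "M" marking a 2x2 max-pooling layer.
VGG16_CONFIG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                512, 512, 512, "M", 512, 512, 512, "M"]
FC_SIZES = [4096, 4096, 1000]  # assumed classifier widths (1000 classes)


def count_vgg16(in_channels=3, input_size=224):
    """Return (conv layers, fully connected layers, total parameters)."""
    conv_layers, params, size = 0, 0, input_size
    for item in VGG16_CONFIG:
        if item == "M":
            size //= 2  # each pooling layer halves height and width
            continue
        params += (3 * 3 * in_channels + 1) * item  # weights + one bias per filter
        in_channels = item
        conv_layers += 1
    features = in_channels * size * size  # flattened feature-map size
    for width in FC_SIZES:
        params += (features + 1) * width
        features = width
    return conv_layers, len(FC_SIZES), params


conv_layers, fc_layers, total_params = count_vgg16()
print(conv_layers, fc_layers, conv_layers + fc_layers)  # 13 3 16
print(f"{total_params:,}")
```

Under these assumptions, most of the parameters sit in the first fully connected layer rather than in the convolutions.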

**10. Calculation of output size and number of parameters**

1. Calculation for Layer 1

Input Size: 144 x 144, 3 channels

Filter Size: 3 x 3, 6 channels

Stride: 1

Padding: None

Output Size Calculation:
The formula for the output size without padding is:

$$\text{Output} = \frac{\text{Input} - \text{Filter}}{\text{Stride}} + 1$$

Height: $\frac{144 - 3}{1} + 1 = 141 + 1 = 142$

Width: $\frac{144 - 3}{1} + 1 = 141 + 1 = 142$

Channels: The number of output channels is equal to the number of filters, which is 6.

Therefore, the output size is 142 x 142, 6 channels.

Number of Parameters Calculation:
The formula for the number of parameters (including bias) is:

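$$\text{Parameters} = (\text{Filter width} \times \text{Filter height} \times \text{Input channels} + 1) \times \text{Output channels}$$

$$(3 \times 3 \times 3 + 1) \times 6 = (27 + 1) \times 6 = 28 \times 6 = 168$$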

Therefore, the number of parameters is 168.

---

2. Calculation for Layer 2

Input Size: 60 x 60, 24 channels

Filter Size: 3 x 3, 48 channels

Stride: 1

Padding: None

Output Size Calculation:

Height: $\frac{60 - 3}{1} + 1 = 57 + 1 = 58$

Width: $\frac{60 - 3}{1} + 1 = 57 + 1 = 58$

Channels: The number of output channels is 48.

Therefore, the output size is 58 x 58, 48 channels.

Number of Parameters Calculation:

(3×3×24+1)×48=(216+1)×48=217×48=10,416

Therefore, the number of parameters is 10,416.

---

3. Calculation for Layer 3

Input Size: 20 x 20, 10 channels

Filter Size: 3 x 3, 20 channels

Stride: 2

Padding: None

Output Size Calculation:

Height: $\frac{20 - 3}{2} + 1 = \frac{17}{2} + 1 = 8.5 + 1 = 9.5$. The result is not an integer. Frameworks typically round the division down, which ignores the last row or column of pixels that does not fit the filter and stride.

New Height: $\left\lfloor \frac{20 - 3}{2} \right\rfloor + 1 = \lfloor 8.5 \rfloor + 1 = 8 + 1 = 9$

New Width: $\left\lfloor \frac{20 - 3}{2} \right\rfloor + 1 = \lfloor 8.5 \rfloor + 1 = 8 + 1 = 9$

Channels: The number of output channels is 20.

Therefore, the output size is 9 x 9, 20 channels. These settings are not ideal: because of the rounding, the last row and last column of the input are never covered by a filter window, so that part of the image does not contribute to the output.
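
To see which input rows the stride-2 filter actually reaches, the following sketch lists the window start positions along one axis and collects the indices they cover. The variable names are illustrative, and floor division is assumed for the output size.

```python
input_size, filter_size, stride = 20, 3, 2

# Number of window positions along one axis (floor division assumed).
output_size = (input_size - filter_size) // stride + 1

# Start index of every window, and the set of input indices they touch.
starts = [i * stride for i in range(output_size)]
covered = set()
for start in starts:
    covered.update(range(start, start + filter_size))

uncovered = sorted(set(range(input_size)) - covered)
print(output_size, starts[-1], uncovered)  # 9 16 [19]
```

The last window starts at index 16 and ends at index 18, so index 19 is never read. Padding the input or choosing a compatible size (for example 21) would avoid this.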

Number of Parameters Calculation:

(3×3×10+1)×20=(90+1)×20=91×20=1,820

Therefore, the number of parameters is 1,820.
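
The three calculations above can be reproduced with a short sketch. The helper names `conv_output_size` and `conv_param_count` are illustrative and not part of the notebook's code; no padding and floor division are assumed, matching the hand calculations.

```python
def conv_output_size(input_size, filter_size, stride=1):
    """Output length along one spatial axis, no padding, rounding down."""
    return (input_size - filter_size) // stride + 1


def conv_param_count(filter_h, filter_w, in_channels, out_channels):
    """Filter weights plus one bias per output channel."""
    return (filter_h * filter_w * in_channels + 1) * out_channels


# (input size, input channels, filter size, output channels, stride)
layers = [
    (144, 3, 3, 6, 1),
    (60, 24, 3, 48, 1),
    (20, 10, 3, 20, 2),
]

results = []
for size, c_in, k, c_out, stride in layers:
    out = conv_output_size(size, k, stride)
    params = conv_param_count(k, k, c_in, c_out)
    results.append((out, params))
    print(f"{out} x {out}, {c_out} channels, {params:,} parameters")
```

Note that the stride affects only the output size, not the parameter count.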

**11. Survey on filter size**

Why 3x3 Filters are Commonly Used

The use of multiple small filters, like 3x3, instead of a single large filter, like 7x7, has become a standard practice in modern CNN architectures. This is primarily for two reasons:

Deeper Networks and More Non-linearity: A single 7x7 convolutional layer can be replaced by three consecutive 3x3 layers. Each 3x3 layer includes a non-linear activation function (like ReLU). By stacking these layers, the network can learn more complex, abstract features. The three 3x3 layers together have a receptive field size equivalent to one 7x7 layer, but they introduce more non-linearity, which significantly improves the model's ability to learn and classify complex patterns.

Reduced Parameters and Computation: A single 7x7 filter has 49 weights, whereas three stacked 3x3 filters have 3×(3×3)=27. This is a simplified single-channel example; the count scales with the number of input and output channels, so with C channels in and out of every layer the comparison is 49C² versus 27C² weights, a reduction of about 45%. Fewer parameters lower memory requirements and speed up both training and inference.
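
Both claims can be checked numerically. The sketch below assumes stride-1 convolutions, ignores biases, and uses a hypothetical channel count of 64; the function names are illustrative.

```python
def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of num_layers stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


def conv_weights(kernel_size, in_channels, out_channels):
    """Number of filter weights in one convolutional layer (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


C = 64  # hypothetical channel count, the same in and out of every layer

rf_three_3x3 = stacked_receptive_field(3, 3)
single_7x7 = conv_weights(7, C, C)
three_3x3 = 3 * conv_weights(3, C, C)

print(rf_three_3x3)           # 7
print(single_7x7, three_3x3)  # 200704 110592
```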

---

The Role of a 1x1 Filter

A 1x1 filter is also known as a "pointwise convolution" because it looks at one pixel location at a time rather than at a spatial neighborhood. Its main purpose is to manipulate the channel dimension of the data.

When a 1x1 filter is applied, it performs a weighted sum across all the input channels for each individual pixel location. This allows it to:

Reduce Channel Dimensionality: By using fewer 1x1 filters than the number of input channels, you can compress the feature map. For instance, if you have 256 channels, you could use a 1x1 convolution with 64 filters to reduce the channels to 64, thereby significantly cutting down the number of parameters for subsequent layers.

Increase Channel Dimensionality: Conversely, you can also use 1x1 filters to expand the number of channels, projecting the feature map into a higher-dimensional space.

Introduce Non-linearity: Just like other convolutional layers, a 1x1 filter is typically followed by a non-linear activation function, adding more expressive power to the network.
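
A minimal NumPy sketch of the channel-reduction case, assuming NCHW layout and random weights; the array names are illustrative. The 1x1 kernel is stored as a plain (C_out, C_in) matrix because its two spatial dimensions have size 1.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 256, 8, 8))  # (N, C_in, H, W)
W = rng.standard_normal((64, 256))       # (C_out, C_in): 1x1 kernel, spatial dims dropped
b = np.zeros(64)

# Weighted sum over input channels at every pixel location.
Y = np.einsum("oc,nchw->nohw", W, X) + b[None, :, None, None]
print(Y.shape)  # (2, 64, 8, 8)

# Equivalent view: one matrix-vector product per pixel.
pixel = W @ X[0, :, 3, 5] + b
print(np.allclose(Y[0, :, 3, 5], pixel))  # True

num_params = W.size + b.size
print(num_params)  # 16448
```

The height and width are unchanged; only the channel dimension shrinks from 256 to 64.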

---

Data Flow Through the LeNet-5 Model

Based on the LeNet-5 code in the document, here's a quick look at how the data flows through the convolutional and pooling layers:

Input: The initial image tensor has a shape of (Batch Size, 1, 28, 28).

Conv2d: The first convolutional layer applies 6 filters of size 5x5, which reduces the spatial dimensions. The output tensor is now (Batch Size, 6, 24, 24).

MaxPool2D: The first max-pooling layer with a 2x2 filter and stride of 2 halves the height and width. The tensor becomes (Batch Size, 6, 12, 12).

Conv2d: The second convolutional layer with 16 filters of size 5x5 again reduces the spatial dimensions. The tensor is now (Batch Size, 16, 8, 8).

MaxPool2D: The second max-pooling layer halves the height and width once more. The tensor becomes (Batch Size, 16, 4, 4).

Flatten: This layer reshapes each sample's feature maps into a single vector of 16 × 4 × 4 = 256 values, resulting in a shape of (Batch Size, 256), which is then fed into the fully-connected layers.

Each step systematically transforms the input data, extracting increasingly abstract features until it's ready for the final classification layers.
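
The shapes listed above can be traced with a few lines of arithmetic. This is a sketch that mirrors the layer settings of the `LeNet` class (5x5 convolutions with stride 1, 2x2 pooling with stride 2); the `steps` list and the `window_output` helper are illustrative, and the ReLU layers are omitted because they do not change the shape.

```python
def window_output(size, window, stride):
    """Output length for a sliding window without padding."""
    return (size - window) // stride + 1


# (layer kind, output channels or None, window size, stride)
steps = [
    ("conv", 6, 5, 1),
    ("pool", None, 2, 2),
    ("conv", 16, 5, 1),
    ("pool", None, 2, 2),
]

c, h, w = 1, 28, 28  # per-sample input shape (channels, height, width)
shapes = [(c, h, w)]
for kind, out_channels, window, stride in steps:
    if kind == "conv":
        c = out_channels  # pooling keeps the channel count
    h = window_output(h, window, stride)
    w = window_output(w, window, stride)
    shapes.append((c, h, w))

flat = c * h * w
print(shapes)  # [(1, 28, 28), (6, 24, 24), (6, 12, 12), (16, 8, 8), (16, 4, 4)]
print(flat)    # 256
```

The final value matches the `input_size=16 * 4 * 4` passed to the first `Dense` layer.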


