<a href="https://colab.research.google.com/github/drpetros11111/AI_Sciencs/blob/CNN/CNN_convolution_and_pooling_(2).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import numpy as np

In [None]:
def f_padd(I,p):
    numRows = I.shape[0]
    numCols = I.shape[1]
    zeroRows = np.zeros((p,numCols))
    I = np.vstack((zeroRows,I))
    I = np.vstack((I,zeroRows))
    zeroCols = np.zeros((numRows+2*p,p))
    I = np.hstack((zeroCols,I))
    I = np.hstack((I,zeroCols))
    return I

# Add Padding Function
This function, f_padd, adds padding to an image I with a specified padding width p.

Padding is often used in image processing to maintain the dimensions of an image after applying operations like convolution.

Let's break down the function step by step:


---
## Function Breakdown
Function Definition and Input Parameters

    def f_padd(I, p):

##I

The input image (a 2D array).

---


##p

The padding width (number of rows/columns of zeros to add around the image).



---


##Get the Dimensions of the Input Image

    numRows = I.shape[0]
    numCols = I.shape[1]

##numRows

Number of rows in the input image.

##numCols

Number of columns in the input image.

## Create Rows of Zeros for Padding

    zeroRows = np.zeros((p, numCols))

---
##zeroRows

A 2D array of zeros with p rows and the same number of columns as the input image.

This will be used to pad the top and bottom of the image.

## Add Zero Rows to the Top and Bottom of the Image

    I = np.vstack((zeroRows, I))
    I = np.vstack((I, zeroRows))

---
##np.vstack

Stacks arrays vertically (row-wise).

The first np.vstack adds zeroRows to the top of the image.

The second np.vstack adds zeroRows to the bottom of the image.

----
##Create Columns of Zeros for Padding

    zeroCols = np.zeros((numRows + 2 * p, p))


##Shape of zeroCols
The shape of the zeroCols array is determined by the tuple (numRows + 2 * p, p).

    numRows + 2 * p

numRows is the number of rows in the original image.

    2 * p

accounts for the padding added to the top and bottom of the image.

Therefore, numRows + 2 * p is the total number of rows in the image after adding the top and bottom padding.

    p

is the number of columns of zeros added on each side of the image, that is, the width of the padding strips on the left and right.

##Creating the Array of Zeros

    zeroCols = np.zeros((numRows + 2 * p, p))

np.zeros creates an array filled with zeros.

The shape of this array is

    (numRows + 2 * p, p)

meaning it has numRows + 2 * p rows and p columns.

##Visual Example
Suppose the original image I has dimensions 3 x 3 (i.e., numRows = 3 and numCols = 3), and we want to add a padding width p = 1.

The number of rows in zeroCols will be 3 + 2 * 1 = 5.

The number of columns in zeroCols will be 1.

So, zeroCols will look like this:

    zeroCols = [[0],
               [0],
               [0],
               [0],
               [0]]

##Applying zeroCols to the Image
After creating zeroCols, it will be added to the left and right sides of the padded image (which already has top and bottom padding):

    I = np.hstack((zeroCols, I))  # Adds `zeroCols` to the left side
    I = np.hstack((I, zeroCols))  # Adds `zeroCols` to the right side

##Here's how it works step-by-step

## Image with Top and Bottom Padding

    [[0, 0, 0],
     [1, 2, 3],
     [4, 5, 6],
     [7, 8, 9],
     [0, 0, 0]]

##After Adding Left Padding:

    [[0, 0, 0, 0],
    [0, 1, 2, 3],
    [0, 4, 5, 6],
    [0, 7, 8, 9],
    [0, 0, 0, 0]]

##After Adding Right Padding:

    [[0, 0, 0, 0, 0],
    [0, 1, 2, 3, 0],
    [0, 4, 5, 6, 0],
    [0, 7, 8, 9, 0],
    [0, 0, 0, 0, 0]]

##Summary

The line

    zeroCols = np.zeros((numRows + 2 * p, p))

creates a vertical strip of zeros that matches the height of the image after top and bottom padding and has a width of p columns.

This allows us to pad the left and right sides of the image, completing the symmetrical padding on all four sides.

-----
##zeroCols

A 2D array of zeros with numRows + 2 * p rows (the height of the image after adding the zero rows) and p columns. This will be used to pad the left and right of the image.

Add Zero Columns to the Left and Right of the Image


    I = np.hstack((zeroCols, I))
    I = np.hstack((I, zeroCols))

----
## np.hstack

Stacks arrays horizontally (column-wise).

The first np.hstack adds zeroCols to the left of the image.

The second np.hstack adds zeroCols to the right of the image.

----
###Return the Padded Image

    return I

-----
##Summary

The function f_padd effectively adds a border of zeros around the input image I with a width specified by p.

The result is an image with additional rows and columns of zeros, which can be useful for various image processing tasks. Here's an example to illustrate:

---
##Example

Given an input image I:

    I = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]

If p = 1, the function adds a border of zeros around this image:

    I = [[0, 0, 0, 0, 0],
        [0, 1, 2, 3, 0],
        [0, 4, 5, 6, 0],
        [0, 7, 8, 9, 0],
        [0, 0, 0, 0, 0]]

This padded image has a border of zeros of width p added to the top, bottom, left, and right of the original image.
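
As a cross-check, the same zero border can be produced with NumPy's built-in `np.pad`. The sketch below restates `f_padd` in a slightly condensed form so that it runs on its own, then compares the two results. It is meant as an illustration, not as a replacement for the function above.

```python
import numpy as np

def f_padd(I, p):
    # Same logic as the notebook's f_padd, condensed so this cell runs on its own
    numRows, numCols = I.shape
    zeroRows = np.zeros((p, numCols))
    I = np.vstack((zeroRows, I, zeroRows))        # pad top and bottom
    zeroCols = np.zeros((numRows + 2 * p, p))
    return np.hstack((zeroCols, I, zeroCols))     # pad left and right

I = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

padded_manual = f_padd(I, 1)
padded_builtin = np.pad(I, 1, mode="constant", constant_values=0)

print(padded_manual.shape)                            # (5, 5)
print(np.array_equal(padded_manual, padded_builtin))  # True
```

Note that `f_padd` returns a float array because `np.zeros` defaults to floats, while `np.pad` keeps the input's integer dtype; the values are the same.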

In [None]:
def f_conv2d(I,K,p):
    fSize = K.shape[0]
    I2 = f_padd(I,p)
    numRows = I2.shape[0]
    numCols = I2.shape[1]

    C = np.zeros((numRows-2*p,numCols-2*p))

    for i in range(numRows-fSize+1):
        for j in range(numCols-fSize+1):
            A = I2[i:i+fSize,j:j+fSize]
            C[i,j] = (A.flatten()*K.flatten()).sum()
    return C


# Convolution Function
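This snippet defines a function f_conv2d that performs a 2D convolution operation on an input image I using a kernel K with padding p. The kernel is applied without being flipped, so strictly speaking this is a cross-correlation, which is the operation that CNN literature usually calls convolution.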

The function outputs the result of the convolution.

Here's a detailed explanation of each part of the code:



---
## Function Definition

    def f_conv2d(I, K, p):

This line defines the function f_conv2d which takes three parameters:

###I

the input image (a 2D numpy array).

###K

the kernel (a smaller 2D numpy array).

###p

the padding size (an integer).

----
## Filter Size and Padded Image

    fSize = K.shape[0]
    I2 = f_padd(I, p)

###fSize = K.shape[0]

This line gets the size of the kernel (assuming it's a square matrix) by accessing the number of rows (or columns, since it's square).

###I2 = f_padd(I, p)

This line calls the f_padd function (defined above) to pad the input image I with p rows and columns of zeros around the border, resulting in a padded image I2.

----
###Shape of the Padded Image

    numRows = I2.shape[0]
    numCols = I2.shape[1]

numRows and numCols store the dimensions of the padded image I2.

----
##Initialize Output Matrix

    C = np.zeros((numRows-2*p, numCols-2*p))

###Explanation


The line initializes the output matrix C, which will store the results of the convolution operation.

####Shape Calculation

numRows and numCols are the dimensions of the padded image I2.

2*p accounts for the padding added to both sides (top and bottom for rows, left and right for columns).

numRows - 2*p and numCols - 2*p give the dimensions of the original image I before padding.

##Why Subtract 2*p?
When you pad an image with p rows/columns of zeros, you add p zeros to each side:

The total number of additional rows is 2*p (i.e., p rows at the top + p rows at the bottom).

The total number of additional columns is 2*p (i.e., p columns on the left + p columns on the right).


###Ensuring the Correct Output Size
The original image I has dimensions (originalNumRows, originalNumCols).


After padding, the padded image I2 has dimensions (originalNumRows + 2*p, originalNumCols + 2*p).


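During convolution, the kernel has numRows - fSize + 1 valid positions along each axis of the padded image. This equals the original image size only when fSize = 2*p + 1 (for example a 3x3 kernel with p = 1, or a 5x5 kernel with p = 2), which is the combination that f_forwardPass sets up. For any other combination the loop visits a different number of positions than C has entries, so the function as written assumes fSize = 2*p + 1.

Under that assumption, the output matrix C is initialized to have the same dimensions as the original image, which is

    (numRows - 2*p, numCols - 2*p).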

##Visual Example
Let's say your original image I is 3x3, and you pad it with p = 1.

The padded image I2 will be 5x5 (3 + 2*1).

To store the convolution results, you need a matrix C of the same size as the original image (3x3), hence C = np.zeros((5-2*1, 5-2*1)) becomes C = np.zeros((3, 3)).

The line

    C = np.zeros((numRows-2*p, numCols-2*p))

ensures that the output matrix C has the correct dimensions to store the results of the convolution operation, excluding the padded borders.

This way, C will have the same dimensions as the original input image I.

### C

The output matrix (or convolved image) is initialized with zeros. It is 2*p smaller than the padded image in both dimensions, so it matches the original image I rather than the padded image I2.
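
The relationship between the size of C and the number of kernel positions can be checked with a small sketch. It uses `np.pad` in place of `f_padd` so that it runs on its own, and the kernel sizes are picked only for illustration.

```python
import numpy as np

I = np.ones((3, 3))
p = 1
I2 = np.pad(I, p, mode="constant")                        # padded image, 5x5
C = np.zeros((I2.shape[0] - 2 * p, I2.shape[1] - 2 * p))  # 3x3, sized as in f_conv2d

for fSize in (3, 2):
    positions = I2.shape[0] - fSize + 1   # kernel start positions per axis
    fits = positions == C.shape[0]        # does the loop range match C?
    print(fSize, positions, C.shape[0], fits)
# 3 3 3 True
# 2 4 3 False
```

With a 3x3 kernel and p = 1 the loop range and the size of C agree. With a 2x2 kernel the loop would try to write a fourth row and column that C does not have.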

----
##Perform Convolution

    for i in range(numRows - fSize + 1):
        for j in range(numCols - fSize + 1):
            A = I2[i:i+fSize, j:j+fSize]
            C[i, j] = (A.flatten() * K.flatten()).sum()

----
##The outer for loop

iterates over the rows of the padded image I2, stopping at

    numRows - fSize + 1

to ensure the kernel K doesn't go out of bounds.

###Why Add 1?
The +1 in numRows - fSize + 1 accounts for the starting position being inclusive.

Without +1, the filter would not cover all valid starting positions.

 For a 2x2 filter on a 5x5 image, it can start from position 0, 1, 2, or 3, which is 4 valid positions (5 - 2 + 1 = 4).

 numRows - fSize + 1 ensures the loop covers all valid positions from top-left to bottom-right.

The kernel slides across every possible position where it fits entirely within the padded image.

The padding ensures the kernel can also slide over the boundary regions of the original image.

Adding +1 in numRows - fSize + 1 ensures that the last valid position is included, because Python's range function excludes its stop value: range(n) yields 0 through n - 1.
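
A minimal sketch of the 2x2 filter on a 5x5 image mentioned above, listing the start positions that the loop bound produces:

```python
numRows, fSize = 5, 2

# range() stops one short of its argument, so this yields 0..3
starts = list(range(numRows - fSize + 1))
print(starts)                  # [0, 1, 2, 3]

# The last window starts at row 3 and covers rows 3 and 4,
# so it ends exactly on the last row of the image.
last_row_covered = starts[-1] + fSize - 1
print(last_row_covered)        # 4
```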

----
###The inner for loop

iterates over the columns, similarly stopping at numCols - fSize + 1.

###A = I2[i:i+fSize, j:j+fSize]

This line extracts a submatrix A from the padded image I2 starting at (i, j) with the same size as the kernel K.

---

    C[i, j] = (A.flatten() * K.flatten()).sum()

This line performs element-wise multiplication of the flattened versions of A and K, then sums the result to get a single scalar value, which is assigned to the corresponding position (i, j) in the output matrix C.
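
The multiply-and-sum step can be tried on a single window. The sketch below uses a small hand-picked window A and kernel K (both chosen for illustration). It also shows what a flipped kernel would give, to make clear that the function applies K as-is.

```python
import numpy as np

A = np.array([[1, 2],
              [4, 5]])
K = np.array([[1, 0],
              [0, -1]])

via_flatten = (A.flatten() * K.flatten()).sum()   # form used in f_conv2d
via_product = np.sum(A * K)                       # equivalent, without flattening
print(via_flatten, via_product)                   # -4 -4

# A textbook convolution would flip the kernel in both axes first
flipped = np.sum(A * K[::-1, ::-1])
print(flipped)                                    # 4
```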

----
##Return the Convolution Result

    return C

The function returns the convolved image C.

-----
##Summary
The function f_conv2d performs the following steps:

Pads the input image I with p zeros on all sides.

Initializes an output matrix C to store the results of the convolution.

Iterates over the padded image to perform the convolution operation, extracting submatrices of the same size as the kernel, performing element-wise multiplication, summing the results, and storing them in the corresponding positions in C.

Returns the convolved image C.
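
The steps above can also be written without explicit Python loops. The sketch below is one possible variant, not the notebook's implementation: it assumes NumPy 1.20 or newer for `sliding_window_view`, and the names `conv2d_loop` and `conv2d_windows` are made up for this example. Both versions size the output from the number of window positions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_loop(I, K, p):
    # Loop version; output size is the number of valid window positions
    f = K.shape[0]
    I2 = np.pad(I, p, mode="constant")
    out = np.zeros((I2.shape[0] - f + 1, I2.shape[1] - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(I2[i:i + f, j:j + f] * K)
    return out

def conv2d_windows(I, K, p):
    # Same computation using a view of all f x f windows
    I2 = np.pad(I, p, mode="constant")
    windows = sliding_window_view(I2, K.shape)    # shape: (rows, cols, f, f)
    return np.einsum("ijkl,kl->ij", windows, K)   # multiply and sum per window

I = np.arange(1, 10).reshape(3, 3)
K = np.array([[1, 0, -1],
              [1, 0, -1],
              [1, 0, -1]])

print(np.allclose(conv2d_loop(I, K, 1), conv2d_windows(I, K, 1)))  # True
print(conv2d_loop(I, K, 1).shape)                                  # (3, 3)
```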

----
##Example Usage


    import numpy as np

    # Define a simple input image and kernel
    I = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

    # A 3x3 kernel with p = 1 satisfies fSize = 2*p + 1, so the output fits in C
    K = np.array([[1, 0, -1],
                  [1, 0, -1],
                  [1, 0, -1]])

    # Define padding size
    p = 1

    # Perform the convolution
    result = f_conv2d(I, K, p)

    print(result)

This code snippet should help you understand how the convolution operation is performed using the f_conv2d function.

In [None]:
def f_ReLU(C):
    C[C<0] = 0
    return C

# RELU Function Definition

    def f_ReLU(C):
       C[C < 0] = 0
       return C



---


## Purpose of the Function
This function implements the ReLU (Rectified Linear Unit) activation function.

The ReLU function is commonly used in neural networks, particularly in deep learning, because it introduces non-linearity to the model and helps mitigate the vanishing gradient problem.


---


## How the Function Works

###Input:

The function takes a single input parameter C, which is typically a NumPy array.

This array represents the output of a layer in a neural network before applying the activation function.

----
##ReLU Activation

The ReLU function sets all negative values in the input array C to zero. This is achieved using the line C[C < 0] = 0.

C < 0 creates a boolean array where each element is True if the corresponding element in C is less than zero, and False otherwise.

C[C < 0] = 0 uses this boolean array to index into C and set all elements where the condition is True to zero.

----
##Return

The modified array C, with all negative values replaced by zeros, is then returned.

----
##Example
Let's look at a simple example to see how it works:

    import numpy as np

    # Example input array
    C = np.array([[1, -2, 3], [-4, 5, -6]])

    # Applying the ReLU function
    C_relu = f_ReLU(C)

    print(C_relu)

----
##Output

    [[1 0 3]
    [0 5 0]]

-----
##Explanation of the Example
###Input Array C:

    [[ 1, -2,  3],
    [-4,  5, -6]]

-----
##Applying ReLU

Positive values remain unchanged.

Negative values are set to zero.

-----
##Output Array C_relu

    [[1, 0, 3],
    [0, 5, 0]]

-----
##Summary
ReLU Activation Function: The function f_ReLU is a straightforward implementation of the ReLU activation function.

It modifies the input array C in-place by setting all negative values to zero and returns the modified array.

Use in Neural Networks: ReLU is widely used in neural networks because it helps to introduce non-linearity, which is crucial for learning complex patterns, and it mitigates the vanishing gradient problem by ensuring that gradients do not become too small during backpropagation.

This function is essential in the context of neural networks and deep learning, where activation functions play a critical role in the performance and convergence of the model.
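
Because f_ReLU modifies its argument in place, the caller's array is changed as well. If the original values are still needed, a non-mutating variant is one option. The sketch below uses `np.maximum`, which returns a new array; it is an alternative shown for comparison, not the notebook's function.

```python
import numpy as np

C = np.array([[1, -2, 3], [-4, 5, -6]])

# np.maximum compares element-wise against 0 and returns a new array
C_relu = np.maximum(C, 0)

print(C_relu.tolist())  # [[1, 0, 3], [0, 5, 0]]
print(C.tolist())       # [[1, -2, 3], [-4, 5, -6]]  (original left untouched)
```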

In [None]:
def f_sigmoid(f,w,bf):
    x = w.dot(f)+bf
    y_hat = 1/(1+np.exp(-x))
    return y_hat

# Sigmoid Function Definition

    def f_sigmoid(f, w, bf):
       x = w.dot(f) + bf
       y_hat = 1 / (1 + np.exp(-x))
       return y_hat

------
## Purpose of the Function
This function calculates the output of a single-layer neural network using the sigmoid activation function.

The sigmoid function maps any real-valued number into the range (0, 1), making it useful for binary classification problems.


---


## Parameters
###f

This is typically a vector representing the input features.

###w

This is a weight vector that is used to scale the input features.

###bf

This is a bias term that is added after the weighted sum of the input features.

-----
##Steps in the Function
###Weighted Sum with Bias

    x = w.dot(f) + bf

###w.dot(f)

This performs the dot product between the weight vector w and the input feature vector f. The dot product is a single number that represents the weighted sum of the input features.

###+ bf

This adds the bias term bf to the weighted sum. The bias helps in adjusting the output along with the weights.

-----
##Sigmoid Activation

    y_hat = 1 / (1 + np.exp(-x))

###np.exp(-x)

This calculates the exponential of -x.

###1 / (1 + np.exp(-x))

This applies the sigmoid function to x. The sigmoid function squashes the input x into a value between 0 and 1.
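
A quick sketch of the squashing behaviour, using a standalone helper named `sigmoid` (a name introduced here for illustration) and a few hand-picked inputs:

```python
import numpy as np

def sigmoid(x):
    # Same formula as in f_sigmoid, applied directly to x
    return 1 / (1 + np.exp(-x))

x = np.array([-10.0, 0.0, 10.0])
y = sigmoid(x)

print(y[1])                          # 0.5
print(np.all((y > 0) & (y < 1)))     # True: outputs stay strictly between 0 and 1
print(np.isclose(y[0] + y[2], 1.0))  # True: sigmoid(-x) = 1 - sigmoid(x)
```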

----
##Return

    return y_hat

The function returns y_hat, which is the output of the sigmoid activation function.

This value is interpreted as the probability or confidence level of the input belonging to a certain class (e.g., class 1 in binary classification).

-----
##Example
Let's look at a simple example to see how it works:

    import numpy as np

    # Example input
    f = np.array([0.5, 1.5])  # Input feature vector
    w = np.array([0.3, 0.7])  # Weight vector
    bf = 0.1                  # Bias term

    # Applying the sigmoid function
    y_hat = f_sigmoid(f, w, bf)

    print(y_hat)

## Output

    0.7858349830425586

## Explanation of the Example
Input Feature Vector f: [0.5, 1.5]

Weight Vector w: [0.3, 0.7]

Bias Term bf: 0.1

Weighted Sum Calculation:

w.dot(f) = 0.3*0.5 + 0.7*1.5 = 0.15 + 1.05 = 1.2

Adding bias: x = 1.2 + 0.1 = 1.3

Sigmoid Activation Calculation:

np.exp(-1.3) ≈ 0.27253179

y_hat = 1 / (1 + 0.27253179) ≈ 0.7858349830425586

Output y_hat: 0.7858349830425586

This value indicates a high confidence that the input belongs to the positive class (close to 1).

----
##Summary
Function Purpose: The f_sigmoid function calculates the output of a single-layer neural network using the sigmoid activation function.

Weighted Sum with Bias: It first computes the weighted sum of the input features and adds the bias term.

Sigmoid Activation: It then applies the sigmoid activation function to squash the result into a range between 0 and 1.

Output: The output is the activated value, representing the predicted probability or confidence level for the input.

In [None]:
def f_forwardPass(I,K,b,w,bf):
    p = int(K.shape[0]/2)
    C = f_conv2d(I,K,p)
    C = C+b
    C = f_ReLU(C)
    S = f_pool(C)
    f = S.flatten()
    y_hat = f_sigmoid(f,w,bf)
    return C,f,y_hat

# Forward Pass Function Definition

This function performs a forward pass through a convolutional neural network (CNN) layer, followed by a fully connected layer.

The forward pass involves convolution, adding bias, applying the ReLU activation function, pooling, flattening the result, and finally using a sigmoid activation to produce the output.

----
##Parameters

I: The input image or feature map.

K: The convolutional kernel (filter).

b: The bias term for the convolutional layer.

w: The weight vector for the fully connected layer.

bf: The bias term for the fully connected layer.

-----
##Steps in the Function
###Padding Calculation:

    p = int(K.shape[0] / 2)

This calculates the amount of padding needed. For a kernel with an odd size fSize, a padding of fSize // 2 (fSize / 2 rounded down) ensures that the output feature map has the same spatial dimensions as the input.

----
----
Let's break down and explain the purpose of this line of code:

    p = int(K.shape[0] / 2)

##Purpose of the Line
This line calculates the amount of padding required to ensure that the output feature map has the same spatial dimensions (height and width) as the input image after the convolution operation.

##Explanation
###Kernel Shape

    K.shape[0]

K is the convolutional kernel (or filter), and K.shape[0] gives the height (or number of rows) of the kernel.

Assuming the kernel is square, K.shape[0] is equal to K.shape[1].

###Padding Calculation:

    K.shape[0] / 2

This expression calculates half the height of the kernel. For example, if the kernel is of size 3x3, K.shape[0] is 3, and 3 / 2 gives 1.5.

### Integer Conversion

    int(K.shape[0] / 2)

The result of the division is converted to an integer using int(), which discards the fractional part. So, if the kernel size is 3, 1.5 becomes 1. The integer division K.shape[0] // 2 gives the same value directly.

----
-----

##Example
Consider a few different kernel sizes to see how padding is calculated:

###Kernel Size 3x3
    K.shape[0] = 3
    K.shape[0] / 2 = 1.5
    int(3 / 2) = 1
    p = 1

##Kernel Size 5x5
    K.shape[0] = 5
    K.shape[0] / 2 = 2.5
    int(5 / 2) = 2
    p = 2

----
##Purpose of Padding
Padding is used to control the spatial dimensions of the output feature map after applying the convolution operation.

By padding the input image appropriately, you ensure that the output feature map has the same height and width as the input image.

----
##Why Padding?
Let's take an example to understand why padding is necessary:

###Without Padding
Assume you have a 5x5 input image and a 3x3 kernel.

If you perform a convolution without padding, the output feature map will be smaller than the input image:

###Input Image: 5x5
###Kernel: 3x3

    Output Feature Map: (5 - 3 + 1) x (5 - 3 + 1) = 3x3

This reduction in size occurs because the kernel cannot be applied to the borders of the input image without going out of bounds.

##With Padding
To maintain the original size of the input image, you can add padding around the borders:

    Padding: 1 (for a 3x3 kernel)
    Input Image after Padding: 7x7 (5 original + 2 padding)
    Kernel: 3x3

Output Feature Map: (7 - 3 + 1) x (7 - 3 + 1) = 5x5

By adding 1 pixel of padding around the borders, the output feature map retains the same height and width as the input image.

----
##Conclusion
The line p = int(K.shape[0] / 2) calculates the amount of padding needed to ensure that the convolution operation does not reduce the spatial dimensions of the input image.

This is particularly useful in convolutional neural networks (CNNs) where maintaining the input size across layers is often desirable.
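
The padding rule can be tabulated for a few kernel sizes. In the sketch below the input width n = 5 and the even kernel size 4 are chosen purely for illustration. The even case shows that the rule keeps the size unchanged only for odd kernels.

```python
n = 5  # input width, chosen for illustration

rows = []
for fSize in (3, 5, 4):
    p = fSize // 2                 # same value as int(fSize / 2) for positive sizes
    out = n + 2 * p - fSize + 1    # number of kernel positions after padding
    rows.append((fSize, p, out))
    print(fSize, p, out)
# 3 1 5
# 5 2 5
# 4 2 6
```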

----
-----
##Convolution

    C = f_conv2d(I, K, p)

f_conv2d is called to perform a 2D convolution of the input image I with the kernel K, using padding p.

##Adding Bias

    C = C + b

The bias term b is added to the result of the convolution.

##ReLU Activation

    C = f_ReLU(C)

The ReLU activation function is applied to introduce non-linearity by setting all negative values in C to zero.

##Pooling

    S = f_pool(C)
    
f_pool is called to perform a pooling operation on the activated feature map C, reducing its spatial dimensions.
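
The notebook does not show the body of f_pool at this point. As a rough idea of what such a step could look like, here is a sketch of 2x2 max pooling with stride 2. The name `max_pool_2x2` and the choice of max pooling are assumptions made for this example; the actual f_pool may differ.

```python
import numpy as np

def max_pool_2x2(C):
    # Hypothetical stand-in for f_pool: 2x2 max pooling with stride 2
    h, w = C.shape
    h2, w2 = h // 2, w // 2
    # Drop a trailing odd row/column, then group the rest into 2x2 blocks
    blocks = C[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2)
    return blocks.max(axis=(1, 3))   # maximum of each 2x2 block

C = np.array([[1, 3, 2, 0],
              [4, 2, 1, 1],
              [0, 0, 5, 6],
              [1, 2, 7, 8]])

print(max_pool_2x2(C).tolist())  # [[4, 2], [2, 8]]
```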

##Flattening

    f = S.flatten()

The pooled feature map S is flattened into a one-dimensional vector f, preparing it for the fully connected layer.

##Fully Connected Layer and Sigmoid Activation:

    y_hat = f_sigmoid(f, w, bf)

The sigmoid activation function is applied to the output of the fully connected layer.

This layer computes a weighted sum of the input vector f with weights w and adds the bias term bf.

----
##Return Values

    return C, f, y_hat

The function returns three values:

C: The feature map after convolution, bias addition, and ReLU activation.

f: The flattened feature vector after pooling.

y_hat: The final output of the forward pass after applying the sigmoid activation function.

-----
##Example
Let's walk through an example to see how it works:

    import numpy as np

    # Define dummy inputs and parameters
    I = np.random.rand(5, 5)  # Example input image
    K = np.random.rand(3, 3)  # Example kernel
    b = 1.0  # Bias term for convolution
    w = np.random.rand(9)  # Weights for fully connected layer (5x5 map subsampled to 3x3, then flattened)
    bf = 0.5  # Bias term for fully connected layer

    # Simplified stand-ins for the helper functions used by f_forwardPass
    def f_conv2d(I, K, p):
        # Simplified convolution function
        I2 = np.pad(I, ((p, p), (p, p)), mode='constant')
        C = np.zeros_like(I)
        for i in range(I.shape[0]):
            for j in range(I.shape[1]):
                C[i, j] = np.sum(I2[i:i+K.shape[0], j:j+K.shape[1]] * K)
        return C  # return only after every position has been filled

    def f_ReLU(C):
        C[C < 0] = 0
        return C

    def f_pool(C):
        # Simplified pooling stand-in: keeps every second row and column
        # (plain subsampling by a factor of 2, not max pooling)
        S = C[::2, ::2]
        return S

    def f_sigmoid(f, w, bf):
        x = w.dot(f) + bf
        y_hat = 1 / (1 + np.exp(-x))
        return y_hat

    # Perform the forward pass (f_forwardPass is defined in the cell above)
    C, f, y_hat = f_forwardPass(I, K, b, w, bf)
    print("Convolved feature map (C):", C)
    print("Flattened feature vector (f):", f)
    print("Output (y_hat):", y_hat)

##Summary
Padding Calculation: Determines how much padding is needed to maintain the input size after convolution.

###Convolution

Applies the kernel to the input image, adding padding to maintain the spatial dimensions.

###Adding Bias

Adds a bias term to each element of the convolved feature map.

###ReLU Activation

Applies the ReLU function to introduce non-linearity.

###Pooling

Reduces the spatial dimensions of the feature map.

###Flattening

Converts the pooled feature map into a one-dimensional vector.

###Fully Connected Layer and Sigmoid Activation

Applies a weighted sum and sigmoid function to produce the final output.

###Return Values

Provides the convolved feature map, flattened feature vector, and final output.

This function demonstrates the key steps involved in a forward pass through a simple CNN and fully connected layer.

In [None]:
def f_getGradient_w(y_hat,y,f):
    Dw = np.squeeze(np.zeros((1,len(f))))
    a = (y_hat-y)*y_hat*(1-y_hat)
    for i in range(len(f)):
        Dw[i] = a*f[i]
    return Dw

# 1st Backpropagation Function - Calculates the Gradient (Single Neuron Backpropagation)

This function f_getGradient_w calculates the gradient of the weights (denoted as Dw) for a single neuron in a neural network during the backpropagation process.

The gradient is used to update the weights in order to minimize the loss function.


---


#Parameters

##y_hat

The predicted output of the neuron (a single value).

##y

The true label or target value (a single value).

##f

The flattened feature vector (input to the neuron).

----
#Steps in the Function

##Initialize the Gradient Array

    Dw = np.squeeze(np.zeros((1, len(f))))

np.zeros((1, len(f))) creates a 1xN array filled with zeros, where N is the length of the feature vector f.

----
##np.squeeze

removes single-dimensional entries from the shape of the array, converting it to a 1D array of length N.

Let's create an array with a shape that includes a single-dimensional entry:

    import numpy as np
    array = np.zeros((1, 5))
    print(array.shape)  # Output: (1, 5)

This creates a 2D array with one row and five columns.

The shape of the array is (1, 5), meaning it has one row and five columns.

##Applying np.squeeze:

Now, let's apply np.squeeze to this array:

    squeezed_array = np.squeeze(array)
    print(squeezed_array.shape)  # Output: (5,)
    
np.squeeze removes the single-dimensional entry from the shape.

The shape of the array after squeezing is (5,), meaning it's now a 1D array with five elements.

##Explanation
The original array array has a shape of (1, 5).

This means it has one row and five columns.

When we apply np.squeeze, it removes the dimension of size 1 (the single row), resulting in a 1D array with the shape (5,).

##Applying to the Code Snippet
In the context of f_getGradient_w:

    Dw = np.squeeze(np.zeros((1, len(f))))

##Creating a Zero Array

###np.zeros((1, len(f)))

creates a 2D array with shape (1, len(f)), where len(f) is the number of features.

This means it creates a single row with len(f) columns, all initialized to zero.

##Applying np.squeeze

np.squeeze removes the single-dimensional entry (the single row), converting it to a 1D array.

After squeezing, the shape of Dw is (len(f),), which is a 1D array with len(f) elements.

##Why Use np.squeeze?

In many machine learning and numerical computing tasks, we often need to ensure that arrays have the correct dimensions.

np.squeeze is used here to convert a 2D array with one row into a 1D array, which simplifies further calculations and operations.

##Conclusion
np.squeeze is a useful function in NumPy to remove single-dimensional entries from the shape of an array.

In the provided code, it ensures that the gradient array Dw has the correct shape (1D array with length equal to the number of features) for further processing.
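
For comparison, the same 1D array of zeros can be created directly by passing a single length to `np.zeros`. The sketch below checks that both forms give the same result; the feature vector is made up for the example.

```python
import numpy as np

f = [0.5, 0.3, 0.2]  # example feature vector

Dw_squeezed = np.squeeze(np.zeros((1, len(f))))  # form used in f_getGradient_w
Dw_direct = np.zeros(len(f))                     # 1D array created directly

print(Dw_squeezed.shape, Dw_direct.shape)        # (3,) (3,)
print(np.array_equal(Dw_squeezed, Dw_direct))    # True
```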

-----
##Compute the Error Term

    a = (y_hat - y) * y_hat * (1 - y_hat)

This term a is the gradient of the loss with respect to the neuron's pre-activation x = w.dot(f) + bf. Its form is consistent with a squared-error loss 0.5 * (y_hat - y)**2.

y_hat - y is the difference between the predicted output and the true output, which is the derivative of that loss with respect to y_hat.

y_hat * (1 - y_hat) is the derivative of the sigmoid activation function (which is commonly used in neural networks).

The product of these terms gives the gradient of the loss with respect to the pre-activation x.

-----
##Calculate the Gradient for Each Weight

    for i in range(len(f)):
       Dw[i] = a * f[i]

Iterate over each element in the feature vector f.

For each feature f[i], calculate the gradient Dw[i] by multiplying the error term a with the corresponding feature value f[i].

This follows the chain rule in calculus, where the gradient of the weight is the product of the gradient of the output and the input feature.
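
Since every entry of Dw is the same scalar a times the matching entry of f, the loop can also be written as a single array multiplication. A minimal sketch with made-up values:

```python
import numpy as np

y_hat, y = 0.8, 1.0
f = np.array([0.5, 0.3, 0.2])

a = (y_hat - y) * y_hat * (1 - y_hat)

# Loop form, as in f_getGradient_w
Dw_loop = np.zeros(len(f))
for i in range(len(f)):
    Dw_loop[i] = a * f[i]

# Vectorized form: scalar times array
Dw_vec = a * f

print(np.allclose(Dw_loop, Dw_vec))  # True
```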

----
##Return the Gradient

    return Dw

The function returns the calculated gradient array Dw.

------
#Example
Suppose y_hat = 0.8, y = 1, and f = [0.5, 0.3, 0.2].

##Initialize the Gradient Array

    Dw = np.squeeze(np.zeros((1, 3)))  # Dw = [0.0, 0.0, 0.0]

##Compute the Error Term

    a = (0.8 - 1) * 0.8 * (1 - 0.8)  # a = -0.2 * 0.16 = -0.032

## Calculate the Gradient for Each Weight

    Dw[0] = -0.032 * 0.5  # Dw[0] = -0.016
    Dw[1] = -0.032 * 0.3  # Dw[1] = -0.0096
    Dw[2] = -0.032 * 0.2  # Dw[2] = -0.0064

## Return the Gradient

    return Dw  # Dw = [-0.016, -0.0096, -0.0064]

----   
#Conclusion
This function computes the gradient of the loss function with respect to the weights of a neuron.

The gradients are essential for updating the weights during the training process to minimize the loss function and improve the model's predictions.
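
One way to gain confidence in a gradient formula is to compare it with a finite-difference estimate. The sketch below does this under the assumption that the loss is the half squared error 0.5 * (y_hat - y)**2, which the notebook does not state explicitly. The helper names `sigmoid` and `loss` and all input values are made up for the example.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def loss(w, f, bf, y):
    # Assumed loss: half squared error on the sigmoid output
    y_hat = sigmoid(w.dot(f) + bf)
    return 0.5 * (y_hat - y) ** 2

f = np.array([0.5, 1.5])
w = np.array([0.3, 0.7])
bf = 0.1
y = 1.0

# Analytic gradient, same formula as f_getGradient_w
y_hat = sigmoid(w.dot(f) + bf)
analytic = (y_hat - y) * y_hat * (1 - y_hat) * f

# Central finite differences, one weight at a time
eps = 1e-6
numeric = np.zeros_like(w)
for i in range(len(w)):
    step = np.zeros_like(w)
    step[i] = eps
    numeric[i] = (loss(w + step, f, bf, y) - loss(w - step, f, bf, y)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-8))  # True
```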

In [None]:
def f_getGradient_f(y_hat,y,w):
    Df = np.squeeze(np.zeros((1,len(w))))
    a = (y_hat-y)*y_hat*(1-y_hat)
    for i in range(len(w)):
        Df[i] = a*w[i]
    return Df

# 2nd Backpropagation Function - Calculates the gradient of the loss with respect to the features

f_getGradient_w and f_getGradient_f both calculate gradients, but they differ in the inputs they use and in how the result is used when training the network.

--------------
## f_getGradient_f

## Inputs

y_hat (predicted value), y (true value), w (weight vector)

## Outputs

Df (gradient with respect to features)

    def f_getGradient_f(y_hat, y, w):
        # Initialize gradient vector with zeros
        Df = np.squeeze(np.zeros((1, len(w))))

        # Calculate the error term 'a'
        a = (y_hat - y) * y_hat * (1 - y_hat)

        # Compute the gradient for each feature: dL/df[i] = a * w[i]
        for i in range(len(w)):
            Df[i] = a * w[i]

        return Df

This function computes the gradient of the loss with respect to each feature by considering the contribution of each weight. Again, the error term a adjusts the gradient based on the difference between the predicted and true values.

-----
#Context in Neural Network Training
f_getGradient_w: This function is used during the backpropagation step to update the weights of the network.

The gradients calculated here indicate how much each weight should be adjusted to minimize the loss.

f_getGradient_f: This function calculates how much each input feature (here, the flattened output of the pooling step) influences the loss. This gradient is what gets passed further back through the network, for example to f_getGradient_S.

----
#Example:
Assume we have:

Predicted value y_hat = 0.8

True value y = 1

Feature vector f = [0.5, 1.2, -0.7]

Weight vector w = [0.3, -0.8, 0.5]

## Using f_getGradient_w:

    gradients_w = f_getGradient_w(0.8, 1, [0.5, 1.2, -0.7])
    print("Gradients with respect to weights:", gradients_w)

##Using f_getGradient_f:

    gradients_f = f_getGradient_f(0.8, 1, [0.3, -0.8, 0.5])
    print("Gradients with respect to features:", gradients_f)

Both functions will provide different gradient vectors, reflecting their respective influences on the loss function.
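
To make the difference concrete, the two gradient vectors for the values above can be worked out by hand. The sketch below recomputes them directly from the formulas, without calling the notebook's functions, so it runs on its own.

```python
import numpy as np

y_hat, y = 0.8, 1.0
f = np.array([0.5, 1.2, -0.7])   # feature vector
w = np.array([0.3, -0.8, 0.5])   # weight vector

a = (y_hat - y) * y_hat * (1 - y_hat)   # shared error term, about -0.032

gradients_w = a * f   # what f_getGradient_w computes
gradients_f = a * w   # what f_getGradient_f computes

print(np.round(gradients_w, 4).tolist())  # [-0.016, -0.0384, 0.0224]
print(np.round(gradients_f, 4).tolist())  # [-0.0096, 0.0256, -0.016]
```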

----
#Conclusion
The main difference between the two functions lies in their targets for gradient computation:

f_getGradient_w focuses on the gradients of the weights using the feature vector.

f_getGradient_f focuses on the gradients of the features using the weight vector.

In summary, f_getGradient_w and f_getGradient_f are integral steps in the backpropagation process. f_getGradient_w provides the gradient used to update the weights, while f_getGradient_f provides the gradient that is propagated back to the earlier layers.

In this notebook, f_getGradient_w is presented first and f_getGradient_f second. Both reuse the same error term a, so the error is propagated consistently from the output layer back through the network.








In [None]:
def f_getGradient_bf(y_hat,y):
    Dbf = (y_hat-y)*y_hat*(1-y_hat)
    return Dbf

# 3rd Backpropagation Function - Calculates the gradient of the loss with respect to the bias term in the output layer

This function calculates the gradient of the loss with respect to the bias term in the output layer. In the context of a neural network, the bias term affects the activation function and needs to be adjusted to minimize the loss.

----
## Calculation

It uses the error term, computed as the difference between the predicted output (y_hat) and the actual output (y), multiplied by the derivative of the sigmoid activation function (since y_hat is the output of a sigmoid function).
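One way to sanity-check this formula is a finite-difference comparison. The sketch below assumes the loss is the half squared error 0.5 * (y_hat - y)**2, which is the loss whose derivative matches the expression in f_getGradient_bf; that choice of loss and the helper names are assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def half_squared_error(x, y):
    # Assumed loss: 0.5 * (sigmoid(x) - y)^2, where x = w.dot(f) + bf
    return 0.5 * (sigmoid(x) - y) ** 2

x, y = 0.3, 1.0          # pre-activation value and target (arbitrary)
y_hat = sigmoid(x)

# Analytic gradient, same expression as f_getGradient_bf.
# Because x = w.dot(f) + bf, dx/dbf = 1, so dLoss/dbf equals dLoss/dx.
analytic = (y_hat - y) * y_hat * (1 - y_hat)

# Central finite difference with respect to the bias
eps = 1e-6
numeric = (half_squared_error(x + eps, y) - half_squared_error(x - eps, y)) / (2 * eps)

print(analytic, numeric)
```

The two printed values should agree to many decimal places, which supports reading y_hat * (1 - y_hat) as the sigmoid derivative in the formula.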




In [None]:
def f_getGradient_S(Df):
    n = int(len(Df)**0.5)
    DS = Df.reshape((n,n))
    return DS

# 4th Backpropagation Function - Reshapes the flattened gradient vector Df back into its original 2D shape

---
##Input Gradient Vector (Df):

Df is assumed to be a 1D array (vector) that represents the gradient of some feature or activation, which was flattened from its original 2D shape.

##Calculate the Dimension (n):

    n = int(len(Df) ** 0.5)

This calculates the size of one dimension of the original 2D shape. It assumes that Df was flattened from a square matrix, so the number of elements in Df should be a perfect square.

len(Df) gives the total number of elements in the 1D gradient vector.

Taking the square root of len(Df) gives the dimension of the original 2D matrix, n.

int() converts the result to an integer. This is necessary because the ** 0.5 operation always returns a floating-point number (4.0 for 16), while reshape requires integer dimensions.

-----
##Reshape the Vector (DS):

    DS = Df.reshape((n, n))

This reshapes the 1D gradient vector Df into a 2D matrix with shape (n, n).
The reshape method transforms the vector back into the 2D shape it was before being flattened.

----
##Return the Reshaped Matrix

The function returns the reshaped matrix DS, which now matches the original 2D dimensions from which Df was flattened.
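The function relies on len(Df) being a perfect square. A minimal sketch of a variant that checks this explicitly is shown below; reshape_square_gradient is a hypothetical name, and math.isqrt (Python 3.8+) is swapped in for the int(... ** 0.5) expression so that the square root is computed in integer arithmetic.

```python
import math
import numpy as np

def reshape_square_gradient(Df):
    # Hypothetical variant of f_getGradient_S that validates its input
    Df = np.asarray(Df)
    n = math.isqrt(Df.size)          # exact integer square root
    if n * n != Df.size:
        raise ValueError(f"expected a perfect-square length, got {Df.size}")
    return Df.reshape((n, n))

DS = reshape_square_gradient(np.arange(1, 17))
print(DS.shape)
```

With a length that is not a perfect square, the original function would also fail, but only inside reshape; the explicit check simply reports the problem more directly.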

-----
##Example
Assume Df is a gradient vector with 16 elements. This suggests that the original matrix was of size 4x4. Here's what happens:

    Df = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16])
    n = int(len(Df) ** 0.5)  # n = int(16 ** 0.5) = 4
    DS = Df.reshape((n, n))

The reshaped matrix DS would be:

    [[ 1,  2,  3,  4],
     [ 5,  6,  7,  8],
     [ 9, 10, 11, 12],
     [13, 14, 15, 16]]

-----
##Summary
The function f_getGradient_S is used to convert a flattened 1D gradient vector back into its original 2D form.

This is useful in neural networks for visualizing or updating the gradients associated with a particular feature or layer in its spatial dimensions.

In [None]:
def f_getGradient_C(DS,C):
    r = C.shape[0]
    c = C.shape[1]
    DC = np.zeros((r,c))
    for i in range(0,r,2):
        for j in range(0,c,2):
            C_block = C[i:i+2,j:j+2]
            ind = np.unravel_index(np.argmax(C_block,axis=None),C_block.shape)
            DC[i+ind[0],j+ind[1]] = DS[int(i/2),int(j/2)]
    return DC

# The Function computes the gradient with respect to the convolutional layer's output (C)

The function f_getGradient_C computes the gradient with respect to the convolutional layer's output (C) using the gradient from the pooling layer (DS).

This function backpropagates the gradients through a max-pooling layer.

-----
##Function Explanation

    def f_getGradient_C(DS, C):
        r = C.shape[0]
        c = C.shape[1]
        DC = np.zeros((r, c))
        for i in range(0, r, 2):
            for j in range(0, c, 2):
                C_block = C[i:i+2, j:j+2]
                ind = np.unravel_index(np.argmax(C_block, axis=None), C_block.shape)
                DC[i+ind[0], j+ind[1]] = DS[int(i/2), int(j/2)]
        return DC

----
##Detailed Explanation
###Input and Initialization

DS: This is the gradient coming from the subsequent layer (the pooling layer in this case). It has the same shape as the output of the pooling layer.

C: This is the output of the convolutional layer before pooling.

r and c: These are the number of rows and columns of the matrix C.

DC: This is the gradient of the loss with respect to the output of the convolutional layer, initialized to a zero matrix of the same shape as C.

###Loop Over the Convolution Output

The function iterates over the convolutional output C in steps of 2 (since it's assumed that the pooling window is 2x2 and stride is 2).

###Pooling Window and Gradient Assignment

C_block: This extracts a 2x2 block from the matrix C.

ind: This finds the index of the maximum value in the C_block. The max-pooling operation records which index within the 2x2 block had the maximum value.

The gradient DS for the corresponding pooled region is then assigned to the location in DC that corresponds to the maximum value in C_block.

###Returning the Gradient:

The function returns DC, which is the gradient with respect to the convolutional layer's output.

-----
#Example to Illustrate
Let's assume C is the output from a convolutional layer, and DS is the gradient from the pooling layer:

    import numpy as np

    # Example convolutional layer output (4x4 matrix)
    C = np.array([
        [1, 3, 2, 4],
        [5, 6, 7, 8],
        [9, 2, 4, 1],
        [3, 7, 5, 6]
    ])

    # Example pooling layer gradient (2x2 matrix)
    DS = np.array([
        [1, 2],
        [3, 4]
    ])

    def f_getGradient_C(DS, C):
        r = C.shape[0]
        c = C.shape[1]
        DC = np.zeros((r, c))
        for i in range(0, r, 2):
            for j in range(0, c, 2):
                C_block = C[i:i+2, j:j+2]
                ind = np.unravel_index(np.argmax(C_block, axis=None), C_block.shape)
                DC[i+ind[0], j+ind[1]] = DS[int(i/2), int(j/2)]
        return DC

###Explanation
Looping Over Blocks in the Convolutional Output:

    for i in range(0, r, 2):
        for j in range(0, c, 2):

These loops iterate over the convolutional output matrix C in steps of 2.
r and c are the dimensions (number of rows and columns) of the matrix C.

The step of 2 is used because the max-pooling operation typically uses a 2x2 window with a stride of 2.

###Extracting a 2x2 Block:

    C_block = C[i:i+2, j:j+2]

This line extracts a 2x2 block from C, starting at position (i, j).

C_block is a small 2x2 matrix containing elements from C.

###Finding the Index of the Maximum Value in the Block:

    ind = np.unravel_index(np.argmax(C_block, axis=None), C_block.shape)

np.argmax(C_block, axis=None) finds the index of the maximum value in C_block as if C_block were a flattened array. If several entries tie for the maximum, it returns the first one.

np.unravel_index then converts this flat index back into a tuple of coordinates within the 2x2 block.

ind is a tuple representing the row and column indices of the maximum value within the C_block.
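The two calls can be tried on a single block in isolation. This is a small illustrative sketch; the second block with tied maxima is an added example, not taken from the notebook.

```python
import numpy as np

C_block = np.array([[1, 3],
                    [5, 6]])
flat_index = np.argmax(C_block, axis=None)          # position in the flattened block
ind = np.unravel_index(flat_index, C_block.shape)   # (row, column) inside the block
print(flat_index, ind)

# With tied maxima, np.argmax returns the first occurrence in row-major order
tied_block = np.array([[7, 7],
                       [1, 7]])
tied_ind = np.unravel_index(np.argmax(tied_block), tied_block.shape)
print(tied_ind)
```

For the first block the flat index is 3, which unravels to row 1, column 1. For the tied block the gradient would be routed to row 0, column 0 only.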

###Assigning the Gradient to the Corresponding Position in DC:

    DC[i+ind[0], j+ind[1]] = DS[int(i/2), int(j/2)]

This line assigns the gradient value from DS to the corresponding position in DC.

    DS[int(i/2), int(j/2)]

retrieves the gradient from the down-sampled gradient matrix DS. The indices int(i/2) and int(j/2) map the current 2x2 block position to the corresponding position in the smaller DS matrix.

    DC[i+ind[0], j+ind[1]]

places the retrieved gradient value into DC at the position corresponding to the maximum value within the 2x2 block.

-----
#Summary
The loop iterates over each 2x2 block in the convolutional output matrix C.
For each block, it identifies the position of the maximum value (which was selected during max-pooling).

It then assigns the corresponding gradient value from the pooled gradient matrix DS back to the original position in DC.

-----
#Example
To clarify, let's consider a small example:

##Convolutional Output C:

    [[1, 3, 2, 4],
     [5, 6, 7, 8],
     [9, 2, 4, 1],
     [3, 7, 5, 6]]

##Pooled Gradient DS:

    [[1, 2],
     [3, 4]]

The function will perform the following steps for each 2x2 block:

For the block at (0,0):

    [[1, 3],
     [5, 6]]

Maximum value is 6 at (1,1).

Assign DS[0,0] (which is 1) to DC[0+1, 0+1], so DC[1,1] = 1.

For the block at (0,2):

    [[2, 4],
     [7, 8]]

Maximum value is 8 at (1,1).

Assign DS[0,1] (which is 2) to DC[1,3], so DC[1,3] = 2.

For the block at (2,0):

    [[9, 2],
     [3, 7]]

Maximum value is 9 at (0,0).

Assign DS[1,0] (which is 3) to DC[2,0], so DC[2,0] = 3.

For the block at (2,2):

    [[4, 1],
     [5, 6]]

Maximum value is 6 at (1,1).

Assign DS[1,1] (which is 4) to DC[3,3], so DC[3,3] = 4.

Resulting DC:

    [[0, 0, 0, 0],
     [0, 1, 0, 2],
     [3, 0, 0, 0],
     [0, 0, 0, 4]]

This shows how the gradients are backpropagated from the pooled output to the original convolutional output, assigning them to the positions of the maximum values within each pooling block.

------

    # Call the function
    DC = f_getGradient_C(DS, C)
    print(DC)

------
#Result

    [[0. 0. 0. 0.]
     [0. 1. 0. 2.]
     [3. 0. 0. 0.]
     [0. 0. 0. 4.]]

Here, the DC matrix shows the gradients from DS placed at the positions of the maximum values in each 2x2 block of C; all other entries are zero.

------------------
#Summary
The f_getGradient_C function essentially backpropagates the gradients through a max-pooling layer by distributing the gradient from the pooled output back to the positions of the maximum values in the convolutional output.

This is an essential step in implementing backpropagation in convolutional neural networks (CNNs), ensuring the correct gradients are computed for the convolutional layer parameters.
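Two properties follow from this routing rule and can be checked numerically: DC has the same shape as C, and because every 2x2 block receives exactly one value from DS, the entries of DC add up to the entries of DS. The sketch below uses a hypothetical re-implementation, maxpool_backward_sketch, that follows the same rule with integer division, and assumes C has even dimensions.

```python
import numpy as np

def maxpool_backward_sketch(DS, C):
    # Same routing rule as f_getGradient_C, written with integer division
    DC = np.zeros(C.shape)
    for i in range(0, C.shape[0], 2):
        for j in range(0, C.shape[1], 2):
            block = C[i:i+2, j:j+2]
            di, dj = np.unravel_index(np.argmax(block), block.shape)
            DC[i + di, j + dj] = DS[i // 2, j // 2]
    return DC

rng = np.random.default_rng(0)
C = rng.normal(size=(6, 6))
DS = rng.normal(size=(3, 3))
DC = maxpool_backward_sketch(DS, C)

# Each 2x2 block receives exactly one value, so the totals agree
print(np.isclose(DC.sum(), DS.sum()))
# At most one non-zero entry per block
print(np.count_nonzero(DC) <= DS.size)
```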

In [None]:
def f_getChainRuleGradients(C,DC,I,u,v):
    DKuv = 0
    for i in range(C.shape[0]):
        for j in range(C.shape[1]):
            if C[i,j]>0 and i-u>=0 and j-v>=0 and i-u<C.shape[0] and j-v<C.shape[1]:
                DKuv = DKuv + (I[i-u,j-v]*DC[i,j])
    return DKuv

# f_getChainRuleGradients - The gradient of the loss with respect to one kernel element K[u,v]
###C

The output of the convolutional layer after applying the activation function (in this case, ReLU).

###DC

The gradient of the loss with respect to C, which is propagated back from the pooling layer.

###I

The original input image to the convolutional layer.

###u, v

The coordinates of the element in the convolutional kernel K for which we are calculating the gradient.

------
##Function Logic

###Initialization:

    DKuv = 0

This initializes the gradient DKuv of the kernel element at position (u,v) to zero.

###Loop Over Each Element in C

    for i in range(C.shape[0]):
        for j in range(C.shape[1]):

This loops over each element in the convolutional output C.

###Conditions to Ensure Valid Indices:

    if C[i,j] > 0 and i-u >= 0 and j-v >= 0 and i-u < C.shape[0] and j-v < C.shape[1]:

C[i, j] > 0: This checks if the element C[i,j] is part of the region that was activated by the ReLU function. Since ReLU sets negative values to zero and its derivative is zero there, only positive values contribute to the gradient.

i-u >= 0 and j-v >= 0 and i-u < C.shape[0] and j-v < C.shape[1]: These conditions ensure that the indices (i−u,j−v) are within the bounds of the input image I. The upper bounds are checked against C.shape, which works in this notebook because C has the same shape as I (a 32x32 input and 256 = 16x16 pooled features).

-----
##Accumulate Gradient

    DKuv = DKuv + (I[i-u, j-v] * DC[i, j])

If the conditions are satisfied, the gradient DKuv is updated by adding the product of:

I[i-u, j-v]: The corresponding element in the input image.

DC[i, j]: The gradient of the loss with respect to C[i,j].

###Return the Gradient:

    return DKuv

After looping through all elements, the function returns the calculated gradient DKuv.

-----
##Example
Let's consider a small example to illustrate the calculation:

###Input Matrices
Input Image I:

    [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

Output of Convolution C:

    [[1, 3],
     [5, 6]]

Gradient with Respect to C (DC):

    [[0.1, 0.2],
     [0.3, 0.4]]

Kernel Element Position (u,v) = (0,0)

###Calculation
Initialize DKuv:

    DKuv = 0

Loop through each element in C:

For i=0, j=0:

    DKuv += I[0-0, 0-0] * DC[0, 0]
    DKuv += 1 * 0.1
    DKuv = 0.1

For i=0, j=1:

    DKuv += I[0-0, 1-0] * DC[0, 1]
    DKuv += 2 * 0.2
    DKuv = 0.5

For i=1, j=0:

    DKuv += I[1-0, 0-0] * DC[1, 0]
    DKuv += 4 * 0.3
    DKuv = 1.7

For i=1, j=1:

    DKuv += I[1-0, 1-0] * DC[1, 1]
    DKuv += 5 * 0.4
    DKuv = 3.7

Return DKuv:

    return 3.7

The function f_getChainRuleGradients accumulates the gradient for the specific kernel element (u,v) by considering the influence of each activated element in C and the corresponding elements in the input image I. This approach follows the chain rule to propagate the gradients back through the layers.
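The hand calculation can be reproduced in code. The sketch below uses chain_rule_gradient_sketch, a hypothetical copy of the loop in f_getChainRuleGradients, so that it runs on its own; it also evaluates the offset (u,v) = (1,1), which is an added case, to show the bounds check in action.

```python
import numpy as np

def chain_rule_gradient_sketch(C, DC, I, u, v):
    # Mirrors the loop in f_getChainRuleGradients
    DKuv = 0.0
    for i in range(C.shape[0]):
        for j in range(C.shape[1]):
            in_bounds = 0 <= i - u < C.shape[0] and 0 <= j - v < C.shape[1]
            if C[i, j] > 0 and in_bounds:
                DKuv += I[i - u, j - v] * DC[i, j]
    return DKuv

I = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
C = np.array([[1, 3], [5, 6]])
DC = np.array([[0.1, 0.2], [0.3, 0.4]])

g00 = chain_rule_gradient_sketch(C, DC, I, 0, 0)
g11 = chain_rule_gradient_sketch(C, DC, I, 1, 1)
print(g00, g11)
```

For (0,0) the result is about 3.7, matching the steps above. For (1,1) only the position i=1, j=1 passes the bounds check, so the result is I[0,0] * DC[1,1] = 0.4.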

In [None]:
def f_getGradient_K(C,I,y_hat,y,w):
    Df = f_getGradient_f(y_hat,y,w)
    DS = f_getGradient_S(Df)
    DC = f_getGradient_C(DS,C)
    DK = np.zeros((5,5))
    for u in range(-2,3):
        for v in range(-2,3):
            DK[u+2,v+2] = f_getChainRuleGradients(C,DC,I,u,v)
    return DK,DC

# f_getGradient_K function

This function computes the gradient of the loss with respect to the convolutional kernel K during the backpropagation process in a convolutional neural network (CNN).


---


## Calculate Gradient with Respect to Flattened Output

    Df = f_getGradient_f(y_hat, y, w)

###f_getGradient_f(y_hat, y, w):

This function calculates the gradient of the loss with respect to the flattened feature map output from the pooling layer.

----
##Parameters
y_hat: The predicted output.

y: The true label.

w: The weights of the fully connected layer.

------
##Reshape Gradient to Match Pooled Output Dimensions

    DS = f_getGradient_S(Df)

###f_getGradient_S(Df)

This function reshapes the gradient Df from a flattened vector back into a 2D shape that matches the pooled output dimensions.

------
##Calculate Gradient with Respect to Convolutional Output:

    DC = f_getGradient_C(DS, C)

###f_getGradient_C(DS, C)

This function calculates the gradient of the loss with respect to the output of the convolutional layer (before pooling).


###Parameters
DS: The gradient with respect to the pooled output.

C: The output of the convolutional layer.

----

##Initialize Kernel Gradient Matrix:

    DK = np.zeros((5,5))

DK: This matrix will hold the gradients of the kernel K.

Assuming a 5x5 kernel, it's initialized to zeros.

------
###Calculate Gradients for Each Element in the Kernel:

    for u in range(-2, 3):
        for v in range(-2, 3):
            DK[u+2, v+2] = f_getChainRuleGradients(C, DC, I, u, v)

This double loop iterates over the kernel's dimensions.

For a 5x5 kernel, u and v range from -2 to 2.

----
###f_getChainRuleGradients(C, DC, I, u, v):

This function calculates the gradient for a specific element in the kernel using the chain rule.

###Parameters:
C: The output of the convolutional layer.

DC: The gradient with respect to the convolutional layer's output.

I: The input image.

u, v: The current indices in the kernel.

Note: u+2 and v+2 adjust the indices to correctly map into the 5x5 kernel matrix DK.
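The per-element double loop can also be written with array slices. The sketch below is an alternative formulation, not the notebook's method: kernel_gradient_loop repeats the logic of f_getChainRuleGradients as a reference, and kernel_gradient_sliced (a hypothetical name) computes the same sum by multiplying shifted slices. It assumes I and C have the same shape, as they do in this notebook.

```python
import numpy as np

def kernel_gradient_loop(C, DC, I, u, v):
    # Reference: same logic as f_getChainRuleGradients
    total = 0.0
    H, W = C.shape
    for i in range(H):
        for j in range(W):
            if C[i, j] > 0 and 0 <= i - u < H and 0 <= j - v < W:
                total += I[i - u, j - v] * DC[i, j]
    return total

def kernel_gradient_sliced(C, DC, I, u, v):
    # Hypothetical vectorised variant: mask DC once, then multiply shifted slices
    H, W = C.shape
    G = DC * (C > 0)
    i0, i1 = max(u, 0), min(H, H + u)   # rows i with 0 <= i - u < H
    j0, j1 = max(v, 0), min(W, W + v)   # columns j with 0 <= j - v < W
    return np.sum(G[i0:i1, j0:j1] * I[i0 - u:i1 - u, j0 - v:j1 - v])

rng = np.random.default_rng(1)
I = rng.integers(1, 255, size=(8, 8)).astype(float)
C = rng.normal(size=(8, 8))
DC = rng.normal(size=(8, 8))

DK_loop = np.zeros((5, 5))
DK_fast = np.zeros((5, 5))
for u in range(-2, 3):
    for v in range(-2, 3):
        # u + 2 and v + 2 map offsets -2..2 to array indices 0..4
        DK_loop[u + 2, v + 2] = kernel_gradient_loop(C, DC, I, u, v)
        DK_fast[u + 2, v + 2] = kernel_gradient_sliced(C, DC, I, u, v)

print(np.allclose(DK_loop, DK_fast))
```

The sliced version avoids the Python-level loop over every pixel, which may matter for larger images; the loop version is easier to compare line by line with the explanation above.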

-----
###Return the Gradient Matrices:

    return DK, DC

-----
##Summary of the Gradient Calculation Process
1. Gradient with Respect to the Flattened Features (Df): Calculate the gradient of the loss with respect to the flattened feature vector that feeds the fully connected layer.

2. Gradient with Respect to Pooled Output (DS): Reshape the gradient Df to match the dimensions of the pooled output.

3. Gradient with Respect to Convolutional Layer Output (DC): Route each pooled gradient back to the position of the maximum value in its 2x2 block of C.

4. Gradient with Respect to Kernel (DK): Iterate over each element in the kernel and use the chain rule to compute its gradient.

This ensures that gradients are propagated back correctly through each layer, allowing for the correct adjustment of parameters (weights and biases) during the training process of the neural network.

In [None]:
def f_getGradient_b(C,DC):
    Db = DC[C>0].sum()
    return Db

In [None]:
def f_backwardPass(I,C,f,w,y_hat,y):
    Dw = f_getGradient_w(y_hat,y,f)
    Dbf = f_getGradient_bf(y_hat,y)
    DK,DC = f_getGradient_K(C,I,y_hat,y,w)
    Db = f_getGradient_b(C,DC)
    return DK,Db,Dw,Dbf

In [None]:
def f_initParams():
    K = 0.01*np.random.randn(5,5)
    b = 0.01*np.random.randn()
    w = np.squeeze(0.01*np.random.randn(1,256))
    bf = 0.01*np.random.randn()
    return K,b,w,bf

In [None]:
I = np.random.randint(1,255,(32,32))
y = 0
K,b,w,bf = f_initParams()
for i in range(50):
    C,f,y_hat = f_forwardPass(I,K,b,w,bf)
    print(y_hat)
    DK,Db,Dw,Dbf = f_backwardPass(I,C,f,w,y_hat,y)
    alpha = 0.001
    K = K - alpha*DK
    b = b - alpha*Db
    w = w - alpha*Dw
    bf = bf - alpha*Dbf



0.4920229785037398
0.243209155495617
0.15288964553136547
0.11512223380183137
0.09087306899243046
0.07531031211479094
0.06470764726937782
0.05729752408381704
0.051654478064740786
0.04719946859083668
0.04359662849889232
0.04056794188721934
0.03802546596762972
0.03587733059268647
0.03405203009207826
0.03244757526677582
0.031024403919923484
0.029752018091007512
0.028606511851509826
0.027568868555206745
0.026623761696283544
0.025758693538737613
0.02496336530069008
0.02422920923141936
0.023549035905569685
0.022915914406668098
0.022323590582331607
0.021769937027563663
0.021251060744915732
0.0207635869665655
0.020304574630617984
0.019871447996215756
0.01946194089732216
0.019074050978987237
0.018706001881194704
0.018355699463212194
0.018021189144092755
0.01770243160950251
0.017398265894446933
0.01710764556369455
0.0168296247186427
0.016563346025295723
0.016308030428783798
0.01606296828194199
0.015827511664852217
0.01560106771175928
0.015383092793555568
0.015173087429734881
0.014970591824611076


In [None]:
y_hat

0.895626736610168

In [None]:
len(f)

256

In [None]:
y_hat

0.4961399908819838

In [None]:
DK

array([[-2.97547376, -2.61208367, -1.25452159, -0.10679289, -1.12689902],
       [-2.60374285, -2.91080737, -2.63396244, -3.35799435, -3.24738384],
       [-3.56845218, -3.09257297, -1.02921786, -4.64212009, -2.74420624],
       [-3.32659253, -2.67697302, -3.12026933, -3.79688228, -1.79560188],
       [-2.45224328, -3.83832392, -1.55289401, -2.82055551, -2.72611557]])

In [None]:
Dw

array([-0.34803887, -0.81728514, -0.87373002, -1.14537418, -0.59310609,
       -0.20351208, -0.72914047, -0.88524479, -0.41872099, -0.41498201,
       -0.37638551, -0.73271683, -0.30267096, -0.51834231, -0.77523831,
       -0.30602767, -0.47541978, -0.5287802 , -0.39389314, -0.2601277 ,
       -1.15820767, -0.16973692, -0.11222836, -0.        , -0.46965639,
       -0.75465536, -0.47732463, -0.34511674, -0.30446064, -0.49239171,
       -0.40545826, -0.71765816, -0.23753064, -0.23499351, -1.42790353,
       -0.17978169, -0.51620326, -1.13837197, -0.25103392, -0.66151841,
       -1.21015924, -0.46574599, -0.43922686, -0.37903103, -0.18672564,
       -0.43357395, -1.08880907, -0.        , -0.6861942 , -0.58266859,
       -0.49492594, -0.40179354, -0.52538117, -0.50736274, -1.15901314,
       -0.39998735, -0.22748738, -0.18657371, -0.68763762, -0.        ,
       -0.54415166, -0.26445015, -0.66209606, -0.31885034, -0.18546286,
       -0.31396223, -0.89413688, -0.55533751, -0.40425145, -0.93

In [None]:
Db

-0.020351432784903194

In [None]:
Dbf

-0.1060162056883096

# Summary of the CNN code
The code in this notebook is a simplified implementation of a Convolutional Neural Network (CNN) for training with backpropagation.

It includes functions for forward propagation, backpropagation, and parameter updates.

Here’s a detailed explanation of what each part of the code does:

---
---

#1. Function Definitions
##1.1 f_padd(I, p)

###Purpose

Adds padding to the input image I.

###Parameters
I: Input image.

p: Padding size.

###Operation

Adds p rows of zeros to the top and bottom.

Adds p columns of zeros to the left and right.

Returns the padded image.
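The stacking steps listed here give the same result as NumPy's built-in np.pad with its default constant mode. The sketch below repeats the manual steps on a small array and compares them with np.pad; it is shown as an alternative for comparison, not as the notebook's implementation.

```python
import numpy as np

I = np.arange(1, 7).reshape(2, 3)
p = 2

# Manual padding, following the steps listed above
zero_rows = np.zeros((p, I.shape[1]))
manual = np.vstack((zero_rows, I, zero_rows))
zero_cols = np.zeros((manual.shape[0], p))
manual = np.hstack((zero_cols, manual, zero_cols))

# NumPy's built-in helper; the default mode is constant padding with zeros
padded = np.pad(I, p)

print(manual.shape, padded.shape)
```

A 2x3 input with p = 2 becomes 6x7 in both cases. One difference is that the manual version returns floats because of np.zeros, while np.pad keeps the input's dtype.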

------

##1.2 f_conv2d(I, K, p)

###Purpose

Performs 2D convolution on the input image I with kernel K and padding p.

###Parameters
I: Input image.

K: Convolutional kernel.

p: Padding size.

###Operation
Pads the input image I.

Applies the convolution operation using kernel K.

Returns the feature map C.

------
##1.3 f_ReLU(C)

###Purpose

Applies the ReLU activation function.

###Parameters

C: Input feature map.

###Operation:
Sets all negative values in C to zero.

Returns the activated feature map.

------
##1.4 f_pool(C)

###Purpose

Performs max pooling on the feature map C.

###Parameters

C: Input feature map.

###Operation
Applies 2x2 max pooling.

Reduces the dimensions of C by half.

Returns the pooled feature map S.
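For illustration, 2x2 max pooling with stride 2 can be written with a reshape. This is a sketch under the assumption that both dimensions of C are even; maxpool2x2_sketch is a hypothetical name and is not the f_pool defined earlier in the notebook.

```python
import numpy as np

def maxpool2x2_sketch(C):
    # Assumes both dimensions of C are even
    H, W = C.shape
    # Axes after reshape: block row, row in block, block column, column in block
    blocks = C.reshape(H // 2, 2, W // 2, 2)
    return blocks.max(axis=(1, 3))

C = np.array([[1, 3, 2, 4],
              [5, 6, 7, 8],
              [9, 2, 4, 1],
              [3, 7, 5, 6]])
S = maxpool2x2_sketch(C)
print(S)
```

On the 4x4 matrix used in the f_getGradient_C example, this gives [[6, 8], [9, 6]], the maxima of the four 2x2 blocks.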

------
##1.5 f_sigmoid(f, w, bf)

###Purpose

Computes the output of a sigmoid activation function.

###Parameters

f: Flattened pooled feature map.

w: Weights.

bf: Bias for the sigmoid function.

###Operation
Computes the weighted sum x = w.dot(f) + bf.

Applies the sigmoid function to x.

Returns the sigmoid output y_hat.

------
##1.6 f_forwardPass(I, K, b, w, bf)

###Purpose

Performs forward propagation through the network.

###Parameters
I: Input image.

K: Convolutional kernel.

b: Bias for the convolution layer.

w: Weights for the sigmoid function.

bf: Bias for the sigmoid function.

###Operation
Applies convolution, adds bias, applies ReLU, and performs pooling.

Flattens the pooled feature map and computes the sigmoid activation.

Returns the feature map C, the flattened pooled features f, and the prediction y_hat.

------
##1.7 f_getGradient_w(y_hat, y, f)

###Purpose

Computes the gradient with respect to weights w.

###Parameters
y_hat: Predicted output.

y: Actual target.

f: Flattened pooled feature map.

###Operation
Calculates the gradient using the chain rule.

Returns the gradient with respect to w.

------
##1.8 f_getGradient_f(y_hat, y, w)
###Purpose

Computes the gradient with respect to the flattened feature map f.

###Parameters
y_hat: Predicted output.

y: Actual target.

w: Weights.

###Operation
Calculates the gradient using the chain rule.

Returns the gradient with respect to f.

-----
##1.9 f_getGradient_bf(y_hat, y)

###Purpose

Computes the gradient with respect to the bias bf in the sigmoid function.

###Parameters
y_hat: Predicted output.

y: Actual target.

###Operation
Calculates the gradient with respect to bf.

Returns the gradient.

-----
##1.10 f_getGradient_S(Df)

###Purpose

Reshapes the gradient vector Df into the shape of the pooled feature map S.

###Parameters
Df: Gradient vector.

###Operation
Reshapes Df into a square matrix.

Returns the reshaped gradient matrix DS.

-----
##1.11 f_getGradient_C(DS, C)
###Purpose

Computes the gradient with respect to the convolutional layer C from the pooled gradients DS.

###Parameters
DS: Gradient matrix for the pooled feature map.

C: Feature map before pooling.

###Operation:
Maps gradients from DS back to the locations in C that contributed to the pooling.

Returns the gradient with respect to C.

------
##1.12 f_getChainRuleGradients(C, DC, I, u, v)
###Purpose

Computes the gradient for each kernel weight using the chain rule.

###Parameters
C: Feature map before pooling.

DC: Gradient matrix for the convolutional feature map C.

I: Input image.

u, v: Offsets for the kernel position.

###Operation
Accumulates I[i-u, j-v] * DC[i, j] over all positions (i, j) where C[i, j] is positive and the shifted index is in bounds.

Returns the gradient for a particular kernel weight.

------
##1.13 f_getGradient_K(C, I, y_hat, y, w)
###Purpose

Computes the gradient with respect to the convolutional kernel K.

###Parameters
C: Feature map before pooling.

I: Input image.

y_hat: Predicted output.

y: Actual target.

w: Weights.

###Operation
Calculates gradients for the kernel using the chain rule.

Returns the gradient with respect to K and the gradient matrix DC.

-----
##1.14 f_getGradient_b(C, DC)
###Purpose

Computes the gradient with respect to the bias b.

###Parameters
C: Feature map before pooling.

DC: Gradient matrix for the convolutional layer.

###Operation:
Sums up gradients where C is positive.

Returns the gradient with respect to b.
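A small illustration of the masked sum, using made-up values for C and DC:

```python
import numpy as np

C = np.array([[1.0, -2.0],
              [0.0,  3.0]])
DC = np.array([[0.1, 0.2],
               [0.3, 0.4]])

mask = C > 0              # True where the ReLU was active
Db = DC[mask].sum()       # same expression as in f_getGradient_b
print(mask)
print(Db)
```

Only the entries at (0,0) and (1,1) are positive in C, so Db is 0.1 + 0.4 = 0.5. The entry where C is exactly zero is excluded along with the negative one.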

------
##1.15 f_backwardPass(I, C, f, w, y_hat, y)
###Purpose

Performs backpropagation to compute gradients for all parameters.

###Parameters
I: Input image.

C: Feature map before pooling.

f: Flattened pooled feature map.

w: Weights.

y_hat: Predicted output.

y: Actual target.

###Operation:
Calls gradient functions to compute gradients for weights, biases, and kernel.

Returns gradients for K, b, w, and bf.

-------
##1.16 f_initParams()
###Purpose

Initializes all trainable parameters of the network.

###Operation
Initializes the 5x5 convolution kernel K, the convolution bias b, the 256 output-layer weights w, and the output bias bf with small random values (standard normal samples scaled by 0.01).

Returns initialized parameters.

-----
-----
#2. Main Code Execution
###Initialization

A 32x32 input image I is randomly generated with integer values from 1 to 254, and the target is set to y = 0.

Parameters K, b, w, and bf are initialized.

###Training Loop

The loop runs for 50 iterations on this single image. Each iteration performs:

Forward Pass: Computes the forward pass through the network, producing output y_hat.

Backward Pass: Computes gradients for all parameters using the backward pass.

Parameter Update: Updates parameters (K, b, w, bf) using gradient descent with a learning rate alpha = 0.001.
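The same forward, backward, update cycle can be seen in isolation on a single sigmoid unit. The sketch below is a toy stand-in for the full network: the feature vector, the learning rate of 0.5, and the assumed weight gradient (error term times f) are illustrative choices, not values from the notebook.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

f = np.array([0.5, 1.2, -0.7])   # fixed feature vector (toy values)
y = 0                            # target, as in the notebook's training cell
w = np.zeros(3)
bf = 0.0
alpha = 0.5                      # learning rate chosen for this toy problem

history = []
for step in range(50):
    y_hat = sigmoid(w.dot(f) + bf)            # forward pass
    history.append(y_hat)
    a = (y_hat - y) * y_hat * (1 - y_hat)     # error term, as in f_getGradient_bf
    w = w - alpha * a * f                     # assumed weight gradient: a * f
    bf = bf - alpha * a                       # bias gradient: a

print(history[0], history[-1])
```

The prediction starts at 0.5 and moves toward the target 0 with every step, the same qualitative behaviour as the y_hat values printed by the notebook's training loop.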

-----
-----
#Summary

This program demonstrates a basic implementation of forward and backward propagation in a simple CNN-like structure. It includes padding, convolution, ReLU activation, pooling, and sigmoid activation.

The backward propagation functions calculate gradients needed to update the weights and biases in the network, allowing the model to learn from the data over multiple iterations.