## Back Propagation in Convolutional Neural Networks 

Let's assume the function f is a convolution between Input X and a Filter F. Input X is a 3x3 matrix and Filter F is a 2x2 matrix, as shown below:

![](https://miro.medium.com/max/692/1*VNr0GiFEwjmwj2v9YmPn5Q.png)

Convolving Input X with Filter F (stride 1, no padding) gives us a 2x2 output O. This can be represented as:

![](https://miro.medium.com/max/1392/1*Q2GGz43E-o5FEtaDXuw8tA.png)

![](https://miro.medium.com/max/1000/1*K7dINARev0NUB-HWp9mbwA.gif)
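Written out element by element, the forward pass looks as follows. This is a sketch that assumes stride 1, no padding, and 1-based indices where $X_{ij}$ is the entry in row $i$, column $j$ (the index names are an assumption made here for illustration, not taken from the figures):

$$
\begin{aligned}
O_{11} &= X_{11}F_{11} + X_{12}F_{12} + X_{21}F_{21} + X_{22}F_{22} \\
O_{12} &= X_{12}F_{11} + X_{13}F_{12} + X_{22}F_{21} + X_{23}F_{22} \\
O_{21} &= X_{21}F_{11} + X_{22}F_{12} + X_{31}F_{21} + X_{32}F_{22} \\
O_{22} &= X_{22}F_{11} + X_{23}F_{12} + X_{32}F_{21} + X_{33}F_{22}
\end{aligned}
$$

Note that the filter is applied without flipping, which is the convention most deep learning code uses when it says "convolution".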

This gives us the forward pass! Let's get to the backward pass. As mentioned earlier, during the backward pass we receive the loss gradient with respect to the output O, ∂L/∂O, from the next layer. Combining this with the chain rule, we get:

![](https://miro.medium.com/max/800/1*w8VkZ50foXWTmoXDDnr8tg.png)

As seen above, we can find the local gradients of the output O, ∂O/∂X and ∂O/∂F. Using the loss gradient ∂L/∂O passed back from the next layer together with the chain rule, we can then calculate ∂L/∂X and ∂L/∂F.
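Because O is a matrix, the chain rule sums over every output element that depends on a given weight or input. In index form (the indices $i, j, m, n$ are notation assumed here for illustration):

$$
\frac{\partial L}{\partial F_{ij}} = \sum_{m,n} \frac{\partial L}{\partial O_{mn}} \, \frac{\partial O_{mn}}{\partial F_{ij}},
\qquad
\frac{\partial L}{\partial X_{ij}} = \sum_{m,n} \frac{\partial L}{\partial O_{mn}} \, \frac{\partial O_{mn}}{\partial X_{ij}}
$$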

So let's find the gradients for X and F: ∂L/∂X and ∂L/∂F.

### Finding ∂L/∂F

As before, this takes two steps:

1. Find the local gradient ∂O/∂F.
2. Find ∂L/∂F using the chain rule.

![](https://miro.medium.com/max/418/1*2nAF7_I4J7xLtpS8m_mXeA.png)
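For the 3x3 input and 2x2 filter, the two steps can be written out explicitly. This is a sketch assuming stride 1, no padding, and 1-based row/column indices. Each weight $F_{ij}$ multiplies exactly one input entry in each output, so the local gradients ∂O/∂F are simply entries of X, and the chain rule gives:

$$
\begin{aligned}
\frac{\partial L}{\partial F_{11}} &= \frac{\partial L}{\partial O_{11}}X_{11} + \frac{\partial L}{\partial O_{12}}X_{12} + \frac{\partial L}{\partial O_{21}}X_{21} + \frac{\partial L}{\partial O_{22}}X_{22} \\
\frac{\partial L}{\partial F_{12}} &= \frac{\partial L}{\partial O_{11}}X_{12} + \frac{\partial L}{\partial O_{12}}X_{13} + \frac{\partial L}{\partial O_{21}}X_{22} + \frac{\partial L}{\partial O_{22}}X_{23} \\
\frac{\partial L}{\partial F_{21}} &= \frac{\partial L}{\partial O_{11}}X_{21} + \frac{\partial L}{\partial O_{12}}X_{22} + \frac{\partial L}{\partial O_{21}}X_{31} + \frac{\partial L}{\partial O_{22}}X_{32} \\
\frac{\partial L}{\partial F_{22}} &= \frac{\partial L}{\partial O_{11}}X_{22} + \frac{\partial L}{\partial O_{12}}X_{23} + \frac{\partial L}{\partial O_{21}}X_{32} + \frac{\partial L}{\partial O_{22}}X_{33}
\end{aligned}
$$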

If we look closely, the equation above has the same form as our convolution operation: ∂L/∂F is the convolution of Input X with the loss gradient ∂L/∂O.

![](https://miro.medium.com/max/907/1*auj7ULC2kRCa99_6u1QSNA.jpeg)
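The idea that ∂L/∂F is obtained by sliding ∂L/∂O over X can be sketched in a few lines of NumPy. The helper name `correlate_valid` and the sample values for `X` and `dO` are made up for illustration; the sketch assumes stride 1, no padding, and a single channel.

```python
import numpy as np

def correlate_valid(A, K):
    """Slide K over A (stride 1, no padding) and sum the elementwise products."""
    kh, kw = K.shape
    out_h = A.shape[0] - kh + 1
    out_w = A.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(A[i:i + kh, j:j + kw] * K)
    return out

# Illustrative values only: a 3x3 input and a 2x2 upstream gradient dL/dO
X = np.arange(1.0, 10.0).reshape(3, 3)
dO = np.array([[1.0, 0.5],
               [-1.0, 2.0]])

# dL/dF has the same shape as the 2x2 filter
dF = correlate_valid(X, dO)
print(dF)
```

Here the upstream gradient plays the role of the "filter", which is why the result has the filter's shape rather than the input's.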

### Finding ∂L/∂X

Similarly, we can find the gradient of the loss L with respect to the input matrix X.

![](https://miro.medium.com/max/414/1*ndW3KqLjW9ht_8ZFjQOHDw.png)
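Written out for every input entry, the same chain-rule step looks as follows. This is a sketch assuming stride 1, no padding, and 1-based row/column indices. Corner entries of X feed a single output, edge entries feed two, and the center entry feeds all four:

$$
\begin{aligned}
\frac{\partial L}{\partial X_{11}} &= \frac{\partial L}{\partial O_{11}}F_{11} \\
\frac{\partial L}{\partial X_{12}} &= \frac{\partial L}{\partial O_{11}}F_{12} + \frac{\partial L}{\partial O_{12}}F_{11} \\
\frac{\partial L}{\partial X_{13}} &= \frac{\partial L}{\partial O_{12}}F_{12} \\
\frac{\partial L}{\partial X_{21}} &= \frac{\partial L}{\partial O_{11}}F_{21} + \frac{\partial L}{\partial O_{21}}F_{11} \\
\frac{\partial L}{\partial X_{22}} &= \frac{\partial L}{\partial O_{11}}F_{22} + \frac{\partial L}{\partial O_{12}}F_{21} + \frac{\partial L}{\partial O_{21}}F_{12} + \frac{\partial L}{\partial O_{22}}F_{11} \\
\frac{\partial L}{\partial X_{23}} &= \frac{\partial L}{\partial O_{12}}F_{22} + \frac{\partial L}{\partial O_{22}}F_{12} \\
\frac{\partial L}{\partial X_{31}} &= \frac{\partial L}{\partial O_{21}}F_{21} \\
\frac{\partial L}{\partial X_{32}} &= \frac{\partial L}{\partial O_{21}}F_{22} + \frac{\partial L}{\partial O_{22}}F_{21} \\
\frac{\partial L}{\partial X_{33}} &= \frac{\partial L}{\partial O_{22}}F_{22}
\end{aligned}
$$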

The computation above can be obtained with a different type of convolution known as full convolution. To obtain the gradient with respect to the input matrix, we rotate the filter F by 180 degrees and compute the full convolution of the rotated filter with the loss gradient ∂L/∂O, as represented in the image below.

![](https://miro.medium.com/max/1189/1*YX5CVe6W7sOpKqJ8b2dVhg.jpeg)

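In a full convolution, the rotated filter visits every position where it overlaps the gradient matrix by at least one element, and entries outside the matrix count as zeros. The procedure is shown in the figure below.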

![](https://miro.medium.com/max/510/1*OiCxYNaVWOhOu36AXZoPgQ.jpeg)
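A minimal sketch of this procedure in NumPy is shown below. It assumes a square filter, stride 1, and a single channel. The function name `full_conv_input_grad` and the sample values are hypothetical, chosen only for illustration. Zero-padding ∂L/∂O by `f - 1` on every side and then sliding the rotated filter over it is one way to realize the "full" mode.

```python
import numpy as np

def full_conv_input_grad(dO, F):
    """Sketch: dL/dX via the 180-degree-rotated filter slid over zero-padded dO."""
    f = F.shape[0]                    # assumes a square (f, f) filter
    F_rot = np.rot90(F, 2)            # rotate the filter by 180 degrees
    dO_pad = np.pad(dO, f - 1)        # zero-pad f-1 rows/cols on every side
    out_h = dO_pad.shape[0] - f + 1
    out_w = dO_pad.shape[1] - f + 1
    dX = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            dX[i, j] = np.sum(dO_pad[i:i + f, j:j + f] * F_rot)
    return dX

# Illustrative values only: a 2x2 filter and a 2x2 upstream gradient dL/dO
F = np.array([[1.0, 2.0],
              [3.0, 4.0]])
dO = np.array([[1.0, 0.5],
               [-1.0, 2.0]])

# The result has the shape of the original 3x3 input
dX = full_conv_input_grad(dO, F)
print(dX)
```

The output is larger than ∂L/∂O by `f - 1` in each dimension, which recovers the shape of the original input.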

### Implementation of forward and backward pass of a convolution function

```python
import numpy as np


def conv_forward(X, W):
    '''
    The forward computation for a convolution function

    Arguments:
    X -- output activations of the previous layer, numpy array of shape (n_H_prev, n_W_prev) assuming input channels = 1
    W -- weights, numpy array of shape (f, f) assuming number of filters = 1

    Returns:
    H -- conv output, numpy array of shape (n_H, n_W)
    cache -- cache of values needed for conv_backward() function
    '''

    # Retrieving dimensions from X's shape
    (n_H_prev, n_W_prev) = X.shape

    # Retrieving the filter size from W's shape (W is assumed to be square)
    f = W.shape[0]

    # Compute the output dimensions assuming no padding and stride = 1
    n_H = n_H_prev - f + 1
    n_W = n_W_prev - f + 1

    # Initialize the output H with zeros
    H = np.zeros((n_H, n_W))

    # Looping over vertical (h) and horizontal (w) axis of the output
    for h in range(n_H):
        for w in range(n_W):
            # Window of X covered by the filter at output position (h, w)
            x_slice = X[h:h+f, w:w+f]
            H[h, w] = np.sum(x_slice * W)

    # Saving information in 'cache' for backprop
    cache = (X, W)

    return H, cache
```

```python
def conv_backward(dH, cache):
    '''
    The backward computation for a convolution function

    Arguments:
    dH -- gradient of the cost with respect to output of the conv layer (H), numpy array of shape (n_H, n_W) assuming channels = 1
    cache -- cache of values needed for conv_backward(), output of conv_forward()

    Returns:
    dX -- gradient of the cost with respect to input of the conv layer (X), numpy array of shape (n_H_prev, n_W_prev) assuming channels = 1
    dW -- gradient of the cost with respect to the weights of the conv layer (W), numpy array of shape (f, f) assuming single filter
    '''

    # Retrieving information from the "cache"
    (X, W) = cache

    # Retrieving the filter size from W's shape (W is assumed to be square)
    f = W.shape[0]

    # Retrieving dimensions from dH's shape
    (n_H, n_W) = dH.shape

    # Initializing dX, dW with the correct shapes
    dX = np.zeros(X.shape)
    dW = np.zeros(W.shape)

    # Looping over vertical (h) and horizontal (w) axis of the output
    for h in range(n_H):
        for w in range(n_W):
            # Each output position contributes its input window, scaled by dH[h, w], to dW
            dW += X[h:h+f, w:w+f] * dH[h, w]
            # The same output position sends W, scaled by dH[h, w], back to that window of dX
            dX[h:h+f, w:w+f] += W * dH[h, w]

    return dX, dW
```
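One way to sanity-check a backward pass like this is a numerical gradient check. The sketch below is self-contained: `forward`, `backward`, and `numerical_grad` are compact stand-ins written for this example (hypothetical names, not the functions above), and the loss is assumed to be `sum(O * G)` for a fixed random `G`, so that ∂L/∂O is simply `G`.

```python
import numpy as np

def forward(X, W):
    """Compact single-channel forward pass (stride 1, no padding, square W)."""
    f = W.shape[0]
    n_H, n_W = X.shape[0] - f + 1, X.shape[1] - f + 1
    return np.array([[np.sum(X[h:h + f, w:w + f] * W) for w in range(n_W)]
                     for h in range(n_H)])

def backward(dH, X, W):
    """Compact backward pass: accumulate dW and dX over every output position."""
    f = W.shape[0]
    dX, dW = np.zeros_like(X), np.zeros_like(W)
    for h in range(dH.shape[0]):
        for w in range(dH.shape[1]):
            dW += X[h:h + f, w:w + f] * dH[h, w]
            dX[h:h + f, w:w + f] += W * dH[h, w]
    return dX, dW

def numerical_grad(loss_fn, A, eps=1e-6):
    """Central-difference estimate of d(loss)/dA, perturbing A in place."""
    grad = np.zeros_like(A)
    for idx in np.ndindex(A.shape):
        old = A[idx]
        A[idx] = old + eps
        plus = loss_fn()
        A[idx] = old - eps
        minus = loss_fn()
        A[idx] = old              # restore the original value
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 5))
W = rng.standard_normal((3, 3))
G = rng.standard_normal((3, 3))   # stands in for dL/dO from the next layer

def loss():
    # Assumed toy loss: linear in the output, so dL/dO equals G
    return np.sum(forward(X, W) * G)

dX, dW = backward(G, X, W)
dX_ok = np.allclose(dX, numerical_grad(loss, X), atol=1e-5)
dW_ok = np.allclose(dW, numerical_grad(loss, W), atol=1e-5)
print(dX_ok, dW_ok)
```

If the analytic and numerical gradients disagree, the usual suspects are an indexing mistake in the loop or a mismatch between the shapes of `dH` and the forward output.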