<a href="https://colab.research.google.com/github/Abhishek0697/Deep-Learning/blob/main/Build%20CNNs%20in%20Numpy/CNNs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Some Facts about Convolutional Neural Networks

1. CNNs make the explicit assumption that the inputs are images.

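2. CNNs are approximately **Translation Invariant**, i.e. if a network sees an object, say a flower, only in the top-left corner of images during training, it can still identify a flower elsewhere in the image at test time. More precisely, the convolution itself is translation *equivariant* (shifting the input shifts the feature map by the same amount), and pooling adds a degree of invariance on top. Multilayer perceptrons lack this property.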

3. **Filters**, also called kernels, have a small spatial extent (height and width). However, each filter spans the entire depth of the input volume. For example, for CIFAR-10 input images of size 32x32x3 and a kernel size of 5x5, each filter has shape 5x5x3.

4. Neurons in the same depth slice share the same filter weights. This **Parameter Sharing** greatly reduces the number of learnable parameters in a layer, which lowers its memory footprint and makes it far cheaper than a fully connected layer on the same input.

5. CNNs are not naturally invariant to transformations such as scaling or rotation.

6. The initial layers learn simple patterns such as horizontal and vertical edges, while the deeper layers can learn complex structures such as a car wheel or a bird's beak.
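
To make points 3 and 4 concrete, here is a minimal sketch that counts learnable parameters for the CIFAR-10 example above. The choice of 16 filters, and the comparison with a fully connected layer that produces the same number of output values (stride 1, no padding), are assumptions made purely for illustration.

```python
# Input volume from the CIFAR-10 example: 32x32 pixels, 3 channels
H, W, C = 32, 32, 3
K = 5             # kernel size from the example (5x5)
num_filters = 16  # assumed value, for illustration only

# Each filter spans the full input depth: K x K x C weights plus one bias
params_per_filter = K * K * C + 1
conv_params = num_filters * params_per_filter

# Output size with stride 1 and no padding (assumed)
out_h = H - K + 1
out_w = W - K + 1

# A fully connected layer mapping the flattened input to the same number of outputs
fc_inputs = H * W * C
fc_outputs = num_filters * out_h * out_w
fc_params = fc_inputs * fc_outputs + fc_outputs

print("conv layer parameters:", conv_params)
print("fully connected parameters:", fc_params)
```

Under these assumptions the convolutional layer needs about a thousand parameters, while the fully connected layer needs tens of millions.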

# Code for the Forward Pass of a CNN Layer

### A Convolutional Layer accepts an input of shape (B, C, H, W)
- B: Batch size, i.e. the number of input samples in a batch
- C: Number of channels, e.g. C = 3 for an RGB image
- H: Height of images
- W: Width of images
<br>
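
Image libraries often store a batch in channels-last order (B, H, W, C). The sketch below uses randomly generated data as a stand-in for real images and shows one way to rearrange such a batch into the (B, C, H, W) layout described above. The variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch of 4 RGB images, 32x32 pixels, stored channels-last: (B, H, W, C)
batch_nhwc = rng.random((4, 32, 32, 3))

# Move the channel axis in front of the spatial axes: (B, H, W, C) -> (B, C, H, W)
batch_nchw = np.transpose(batch_nhwc, (0, 3, 1, 2))

B, C, H, W = batch_nchw.shape
print(B, C, H, W)  # 4 3 32 32
```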

### Parameters 
- S: Stride
- P: Padding
- F: Number of Filters
- K: Filter size, e.g. K = 3 for a 3x3 filter or K = 5 for a 5x5 filter
<br>

### The shape of the output of a layer is computed as

- $\text{Output Width} = \frac{W - K + 2P}{S} + 1$

- $\text{Output Height} = \frac{H - K + 2P}{S} + 1$
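
A small helper makes the formula easy to check for concrete values. This is only a sketch: the function name `conv_output_size` is hypothetical, the filter size is written as `kernel` (K in the parameter list), and a non-integer result is assumed to be rounded down.

```python
def conv_output_size(size, kernel, pad, stride):
    """Output length along one spatial dimension: (size - kernel + 2*pad) / stride + 1.

    Assumes a non-integer result is rounded down.
    """
    return (size - kernel + 2 * pad) // stride + 1


# CIFAR-10 sized input (32x32) with a 5x5 filter
print(conv_output_size(32, kernel=5, pad=0, stride=1))  # 28
print(conv_output_size(32, kernel=5, pad=2, stride=1))  # 32
print(conv_output_size(32, kernel=5, pad=2, stride=2))  # 16
```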

import numpy as np

def forward_pass(x, w, b, stride, pad):
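    """
    Naive forward pass of a convolutional layer.

    Input:
    1. x: input data of shape (B, C, H, W)
    2. w: filter weights of shape (F, C, HH, WW)
    3. b: biases, of shape (F,)
    4. stride: step size of the sliding window along height and width
    5. pad: number of zero pixels added to each side of the height and width dimensions

    Return: out: output of the forward pass, of shape (B, F, output_height, output_width)
    """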
    
    B,C,H,W = np.shape(x)
    F,C,HH,WW = np.shape(w)
    
    # Zero-pad only the two spatial dimensions (H and W) using np.pad;
    # the batch and channel dimensions are left untouched
    Padded_x = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), 'constant', constant_values=0)

    
    '''
    Calculate the output shape using the formula mentioned before
    (integer division rounds down when the stride does not divide evenly)
    '''
    output_height = (H - HH + 2 * pad) // stride + 1
    output_width = (W - WW + 2 * pad) // stride + 1

    out = np.zeros((B,F,output_height,output_width))


    '''
    Now we will perform the convolution of the padded input with the filters.
    A rough description of the loop:

    for each sample in the batch:
      for each filter:
        for each output row (sliding along the height dimension):
          for each output column (sliding along the width dimension):

            initialize a temp variable that accumulates the sum across channels

            for each channel:
              step 1: get the input window (frame) under the filter
              step 2: multiply the window element-wise with the filter, sum the products, and add the result to the temp variable

            assign the sum plus the filter's bias to the output

            move right by incrementing the width pointer by the stride
          move down by incrementing the height pointer by the stride
    '''

    for n in range(B):
        for f in range(F):
            ii = 0  # top row of the current window in the padded input

            for i in range(output_height):
                jj = 0  # left column of the current window in the padded input
                for j in range(output_width):

                    sum_conv_frames = 0

                    for c in range(C):
                        frame = Padded_x[n, c, :, :]
                        convolution_frame = frame[ii:(ii + HH), jj:(jj + WW)]
                        sum_conv_frames += np.sum(np.multiply(convolution_frame, w[f, c, :, :]))

                    out[n, f, i, j] = sum_conv_frames + b[f]
                    jj += stride

                ii += stride
    return out
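
As a cross-check for the loop-based version, the same computation can be sketched in vectorized form. This is an alternative formulation, not the implementation above: the function name `forward_pass_vectorized` is hypothetical, `b` is assumed to be a NumPy array, and the code relies on `np.lib.stride_tricks.sliding_window_view`, which requires NumPy 1.20 or newer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def forward_pass_vectorized(x, w, b, stride, pad):
    """Vectorized sketch of a convolutional forward pass.

    x: (B, C, H, W), w: (F, C, HH, WW), b: NumPy array of shape (F,)
    Returns an array of shape (B, F, output_height, output_width).
    """
    _, _, HH, WW = w.shape

    # Zero-pad only the spatial dimensions
    padded = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode="constant")

    # All HH x WW windows at stride 1: shape (B, C, rows, cols, HH, WW)
    windows = sliding_window_view(padded, (HH, WW), axis=(2, 3))

    # Keep every `stride`-th window along height and width
    windows = windows[:, :, ::stride, ::stride]

    # Multiply each window with each filter, then sum over channels and window positions
    out = np.einsum("bchwij,fcij->bfhw", windows, w)

    # Add one bias per filter, broadcast over batch and spatial dimensions
    return out + b[None, :, None, None]


# Tiny example: a 3x3 input of ones, one 2x2 filter of ones, bias 0.5
x = np.ones((1, 1, 3, 3))
w = np.ones((1, 1, 2, 2))
b = np.array([0.5])

out = forward_pass_vectorized(x, w, b, stride=1, pad=0)
print(out.shape)  # (1, 1, 2, 2)
print(out[0, 0])  # every entry is 4 * 1 + 0.5 = 4.5
```

On real data, `np.allclose(forward_pass(x, w, b, stride, pad), forward_pass_vectorized(x, w, b, stride, pad))` should return `True` if both versions agree.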

# Some Handy Tricks for Designing Convolutional Neural Networks

1. The resolution of the input can be reduced by using a stride greater than 1 or by adding a pooling layer after the convolution layer.

2. To keep the output resolution the same as the input resolution of the layer (with a stride of 1), use a padding of (K - 1) / 2 for an odd kernel size K:
- For a 3x3 kernel, use a padding of 1
- For a 5x5 kernel, use a padding of 2
- For a 7x7 kernel, use a padding of 3, and so on
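
Both tricks can be checked numerically with the output-size formula from earlier. The sketch below uses hypothetical helper names (`conv_output_size`, `same_padding`) and assumes odd kernel sizes and, for the padding rule, a stride of 1.

```python
def conv_output_size(size, kernel, pad, stride):
    """Output length along one spatial dimension, rounded down."""
    return (size - kernel + 2 * pad) // stride + 1


def same_padding(kernel):
    """Padding that preserves resolution at stride 1 (assumes an odd kernel size)."""
    return (kernel - 1) // 2


size = 32  # assumed input resolution, for illustration

# Trick 2: with stride 1, a padding of (K - 1) / 2 keeps the resolution unchanged
for kernel in (3, 5, 7):
    pad = same_padding(kernel)
    print(kernel, pad, conv_output_size(size, kernel, pad, stride=1))

# Trick 1: a stride of 2 roughly halves the resolution
print(conv_output_size(size, kernel=3, pad=1, stride=2))  # 16
```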

