# Convolutional Neural Networks (CNN)
---------------------------------------

## Introduction to CNNs

Wikipedia describes CNN as 'a regularized type of feed-forward neural network that learns feature engineering by itself via filters (or kernel) optimization.'

In simpler words, a CNN is a type of neural network that learns to understand and analyse images and other grid-structured data. What makes a CNN special is that it learns features directly from the data instead of relying on humans to handcraft them. It does this by optimizing the weights of its filters during training.

Within the broad field of ML and AI, CNNs are among the most versatile neural networks. They are useful for tasks such as image classification and object detection.

### Importance of CNNs

CNNs are one of the most important families of models in ML. They are used in many applications, such as image classification, object detection, and speech recognition. These algorithms have been in use for many years and remain in use today. They also appear in generative models such as GANs, and in application areas such as medical image processing and natural language processing.

### Terms used in CNNs
1) Convolution:
    - Convolution is a mathematical operation on two functions (f and g) that produces a third function. This third function expresses how f is modified by g. It is called the convolution of f and g; in a CNN, it corresponds to the output of a convolution layer, not of the whole network.

   → Mathematical Representation of Convolution:


   
   ![image.png](attachment:image.png)

   In the image,
    - f and g are the functions being convolved.
    - (f*g)(t) is the output of the convolution at point t.
    - t is the real-valued variable at which we evaluate the convolution result. For example, if we're working with images, t corresponds to the location of a specific pixel in the image.
    - τ is the integration variable: one function is evaluated at τ and the other at t − τ (that is, reflected and shifted by t), and their product is integrated over all values of τ.
    - dτ is the differential of τ. It indicates that the integral is taken with respect to τ; it is not a derivative of g.
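
For sampled data such as pixels, the integral becomes a sum. The following is a minimal sketch of that discrete form, assuming NumPy is available; `conv1d_full` and the sample values of `f` and `g` are made up for illustration, and `np.convolve` is used only as a cross-check.

```python
import numpy as np

def conv1d_full(f, g):
    """Discrete convolution: out[t] = sum over tau of f[tau] * g[t - tau]."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    out = np.zeros(len(f) + len(g) - 1)
    for t in range(len(out)):
        for tau in range(len(f)):
            # Only add terms where the shifted index t - tau falls inside g.
            if 0 <= t - tau < len(g):
                out[t] += f[tau] * g[t - tau]
    return out

f = [1, 2, 3]        # hypothetical sample values
g = [0, 1, 0.5]      # hypothetical sample values
manual = conv1d_full(f, g)
reference = np.convolve(f, g)  # NumPy's built-in convolution (default mode "full")
print(manual, reference)
```

Both results should agree, which shows that the nested loop is just the sum version of the integral described in the list above.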


2) Filter / Kernel :
    - A filter, or kernel, is a small matrix of weights that slides over the input data (like an image). At each position it performs element-wise multiplication with the part of the input it currently covers and sums the results into a single output pixel.

    - These small matrices are at the heart of what makes a CNN work. Because a kernel has only a few weights, it keeps the number of values to learn small, which improves efficiency and performance, and kernels are the primary components that help CNN models extract useful features from input data.

    * Okay, but how does a filter even work?

        → A filter can be understood like cupping your hands around your eyes and sliding them along a window. The size of the opening, how fast or slow you slide it, and how you process what you see all change your understanding of what is on the other side of the window.

        → A step by step example of how filters work:

            Size → Sliding → Convolution operation → Learning → Feature Extraction

        Let's take a matrix example to understand it better.


        
        ![image-2.png](attachment:image-2.png)

        In the image, the filter size is 2x2. The highlighted region is the Local Receptive Field (LRF) of the 2x2 kernel.

    The concept of the LRF is inspired by the fact that many neurons in our visual cortex respond only to stimuli located in a limited region of the visual field. So, the LRF is basically what you focus on in your visual field. Interesting, isn't it?

    <i>A neuron in a CNN's convolution layer can be understood as looking at one local region (its LRF) of the input.</i>


    <i>A neuron in a CNN's fully connected layer can be understood as a tiny processor that takes the outputs of the previous layer, computes a weighted sum of them, and provides that weighted sum as its output. To compute the weighted sum, each input of the neuron is multiplied by a corresponding weight, which is learned by the network during training, and the products are added together.</i>
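
The sliding, multiply-and-sum behaviour of a filter described above can be sketched in a few lines. This is an illustrative sketch assuming NumPy, with the kernel moving one pixel at a time and no border handling; `apply_filter`, the 4x4 `image`, and the 2x2 `kernel` values are hypothetical and are not taken from the figure.

```python
import numpy as np

def apply_filter(image, kernel):
    """Slide `kernel` over `image` one pixel at a time and sum the products."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            patch = image[r:r + kh, c:c + kw]   # local receptive field
            out[r, c] = np.sum(patch * kernel)  # element-wise multiply, then sum
    return out

image = np.arange(16, dtype=float).reshape(4, 4)  # hypothetical 4x4 "image"
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])                  # hypothetical 2x2 filter
feature_map = apply_filter(image, kernel)
print(feature_map.shape)
print(feature_map)
```

Note that the kernel is not flipped here. Deep learning libraries typically compute this unflipped form (cross-correlation) and still call it convolution, which makes no practical difference because the kernel weights are learned.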



3) Weight sharing :
    - Weight sharing, or parameter sharing, is a key feature of a CNN that sets it apart from fully connected neural networks.

    - A kernel's weights are shared across all the neurons that compute the same feature: the same small matrix is reused at every position of the input. So, in this image:
        
        ![image-2.png](attachment:image-2.png)

    The weights applied to the shaded field are the same weights that are applied at every other position the kernel visits.
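
One way to see the effect of weight sharing is to count learnable parameters. The sketch below is a back-of-the-envelope comparison under stated assumptions: one bias per filter or output unit, and hypothetical layer sizes chosen only for illustration. `conv_params` and `dense_params` are helper names invented here.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Shared kernel weights plus one bias per filter; independent of image size."""
    return (kernel_h * kernel_w * in_channels + 1) * out_channels

def dense_params(in_h, in_w, in_channels, out_units):
    """Every input value gets its own weight per output unit, plus one bias each."""
    return (in_h * in_w * in_channels + 1) * out_units

# Eight 3x3 filters on a single-channel image: same count for any image size.
print(conv_params(3, 3, 1, 8))
# Eight fully connected units on a 28x28 and on a 224x224 single-channel image.
print(dense_params(28, 28, 1, 8))
print(dense_params(224, 224, 1, 8))
```

The convolutional count stays fixed as the image grows, while the fully connected count grows with the number of pixels.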





4) Stride:
    - Stride is the parameter that determines the step size, i.e. the number of positions the kernel moves at each step when sliding over the input data. The stride is inversely related to the size of the feature map: a larger stride produces a smaller feature map. With a stride of 1, a kernel larger than 1x1 produces overlapping receptive fields, meaning the same input pixels contribute to several neighbouring outputs.

    - If the stride is `n`, the kernel moves `n` pixels at a time, both horizontally and vertically.

    #### Stride: How does it affect CNN models?
    1) Dimensionality Reduction:
        - Dimensionality reduction, as the term implies, is the process of reducing the number of values that have to be processed. Here it means that the feature map produced by the kernel is smaller (has smaller dimensions) than the input image.

        Relationship between dimension and stride:
        - output dimensions ∝ 1/stride (approximately)

    2) Relationship between stride and computation time:
        - computation time ∝ 1/stride (approximately, per dimension)
        - Since the kernel is applied fewer times when the stride is larger, a larger stride reduces computation time and improves computational efficiency.

    3) Model Capacity
        - If the stride is too large, the model loses detailed information, because the filter samples the input more coarsely and may skip some pixels entirely. This can reduce the model's accuracy.

    
    * Let's look at an example to understand stride a bit better.
        - Output Size = floor((Input Size - Filter Size) / Stride) + 1 (Size can stand for either height or width as needed; this is the general formula when no padding is used.)

        ![image-2.png](attachment:image-2.png)

        In this image, a 2x2 kernel is being applied. Let us consider only the horizontal direction (width), with an input width of 7 and a stride of 2. The size of the resulting feature map can be calculated as:

        Output size :

        = floor((7-2)/2)+1

        = floor(5/2)+1

        = floor(2.5)+1

        A fractional number of steps is not possible, so 2.5 has to be rounded to an integer. Rounding up is possible in some setups, but it is conventional to round down, which is what the `Floor Function` does: the last partial step, where the kernel would run past the edge of the input, is dropped.

        = 2+1

        = 3
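
The output-size formula can be checked with a short sketch. It assumes no padding and a positive integer stride; `conv_output_size` and `kernel_positions` are hypothetical helper names, and the input width of 7 with a filter size of 2 follows the worked example above.

```python
def conv_output_size(input_size, filter_size, stride=1):
    """floor((input - filter) / stride) + 1, using integer floor division."""
    return (input_size - filter_size) // stride + 1

def kernel_positions(input_size, filter_size, stride=1):
    """Start indices where the kernel fits entirely inside the input."""
    return list(range(0, input_size - filter_size + 1, stride))

for stride in (1, 2, 3):
    size = conv_output_size(7, 2, stride)
    positions = kernel_positions(7, 2, stride)
    # The number of valid start positions equals the output size.
    print(stride, size, positions)
```

Listing the start positions makes the floor visible: with a stride of 2 the kernel starts at columns 0, 2, and 4, and a fourth step would run past the edge of the input. It also shows that a larger stride means fewer kernel applications.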


5) Padding

6) Pooling/Subsampling

7) Fully Connected Layer

8) Flattening

9) Activation Functions

10) Epoch

11) Batch Size

12) Loss Function

13) Optimizer

14) Feature Map

15) Transfer Learning

16) Data Augmentation

17) Learning Rate


## Parts of a CNN
A typical Convolutional Neural Network consists of various layers, each of which is responsible for a specific task. The most common layers in CNNs are:




