# **PYTORCH CV - CONVOLUTIONAL NEURAL NETS**

### What is Convolution?

Convolution is a mathematical operation used to **combine two functions (or data) to produce a third function** that shows how one function modifies the other.

In image processing, convolution is commonly used to extract features (such as edges, textures, shapes, or patterns) by applying a filter (a.k.a. a kernel) to the image.

* ***For example:*** *A 3x3 filter (e.g., edge detection filter or kernel) slides over the image (matrix data) and multiplies the filter values with the image pixel values (matrix values), summing them up to create a new value. This is repeated across the entire image (matrix data) to create a **"feature map"** of the image (matrix data).*
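
The slide-multiply-sum procedure described above can be sketched in a few lines. This is a minimal illustration, not an optimized implementation: the 4x4 `image`, the vertical-edge `kernel`, and the helper name `slide_filter` are all made up for the example. Note that PyTorch's `F.conv2d` applies the kernel without flipping it (strictly, cross-correlation), which matches the description here.

```python
import torch
import torch.nn.functional as F

# Hypothetical 4x4 "image" with pixel values 0..15
image = torch.arange(16, dtype=torch.float32).reshape(4, 4)

# Illustrative 3x3 vertical-edge kernel (left column minus right column)
kernel = torch.tensor([[1., 0., -1.],
                       [1., 0., -1.],
                       [1., 0., -1.]])

def slide_filter(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding), multiplying and summing."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = torch.zeros(out_h, out_w)
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]           # region under the filter
            feature_map[i, j] = (patch * kernel).sum()  # element-wise multiply, then sum
    return feature_map

manual = slide_filter(image, kernel)

# F.conv2d expects input of shape (batch, channels, H, W)
# and weights of shape (out_channels, in_channels, kH, kW)
builtin = F.conv2d(image[None, None], kernel[None, None])[0, 0]

print(manual)                           # 2x2 feature map
print(torch.allclose(manual, builtin))  # the loop agrees with the built-in op
```

Every entry of the feature map is -6 here, because the pixel values increase by the same amount from left to right everywhere in this toy image.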


The animation below walks through the calculation of a convolution step by step:

<img src="../resources/Convolution_Operation.gif" width=50%></img>
*A 6x6 matrix, representing an image data matrix, is being convolved with a 3x3 filter (kernel) using a step size of 3. This means that during each iteration, the filter shifts 3 steps horizontally and vertically across the matrix. Typically, the step size is set to 1, allowing the filter to move only one step at a time.*

<br>

<font color="salmon"><b><i>⭑Note: A 6x6 matrix convolved with a 3x3 filter (no padding) can produce outputs of different dimensions depending on the step size (stride). Along each dimension the output size is ⌊(6 − 3) / stride⌋ + 1. If the step size is 1, the output will be a 4x4 matrix. With a step size of 2 or 3, the output will be a 2x2 matrix (with a step size of 2, the last row and column of the input are never covered). A step size greater than 3 leaves room for only one filter position, so the output shrinks to a single 1x1 value.</i></b></font>
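
The relationship between input size, filter size, and stride can be checked with a small helper. This is a sketch using the standard output-size formula for a convolution; the function name `conv_output_size` and its `padding` argument are illustrative choices for this example.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one dimension: floor((n + 2p - k) / s) + 1."""
    if kernel_size > input_size + 2 * padding:
        raise ValueError("kernel does not fit inside the (padded) input")
    return (input_size + 2 * padding - kernel_size) // stride + 1

# 6x6 input, 3x3 filter, no padding
for stride in (1, 2, 3, 4):
    size = conv_output_size(6, 3, stride)
    print(f"stride={stride}: {size}x{size} output")
```

For a 6x6 input and a 3x3 filter this gives 4x4 at stride 1, 2x2 at strides 2 and 3, and 1x1 at stride 4.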

### How Do Convolutional Neural Networks (CNNs) Work?

CNNs are specialized neural networks designed for processing grid-structured data such as images. A typical CNN processes its input in the following stages:
1. **Convolution Layer:**
    * Filters (small matrices) slide over the input image and extract features like edges, corners, or textures. This creates feature maps.
    * Example: A 3x3 edge-detection filter highlights edges in an image.


2. **ReLU Activation:**
    * Applies the non-linear function $\mathrm{ReLU}(x) = \max(0, x)$ to the feature maps, introducing non-linearity by setting negative values to zero.


3. **Pooling (Downsampling):**
    * Reduces the size of feature maps while retaining important features (e.g., MaxPooling takes the maximum value in a region).
    * Example: A 2x2 pooling layer with a stride of 2 reduces a 4x4 feature map to 2x2.
    * Types of Pooling:
        - **Max Pooling (MaxPool):** Takes the maximum value in each pooling region (e.g., 2x2 or 3x3), retaining the strongest activation in the region.
        - **Average Pooling (AvgPool):** Averages all values in the pooling region, giving a smoother feature representation.
        - **Min Pooling (MinPool):** Takes the minimum value in each pooling region, keeping the weakest activation. It is rarely used in practice, and PyTorch provides no built-in layer for it.


4. **Fully Connected Layer:**
    * Flattened feature maps are passed to traditional dense layers for final classification or regression tasks.


5. **Softmax/Output Layer:**
    * Softmax converts the final raw scores (logits) into class probabilities that sum to 1; the class with the highest probability is the prediction.
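
The five stages above can be put together in a small PyTorch module. This is a minimal sketch under assumptions that are not part of the notes: the class name `TinyCNN`, the 1x28x28 input size, the 8 filters, and the 10 output classes are all hypothetical choices for illustration.

```python
import torch
from torch import nn

class TinyCNN(nn.Module):
    """Hypothetical minimal CNN for 1x28x28 inputs and 10 classes."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3)  # 1. convolution
        self.relu = nn.ReLU()                                                # 2. activation
        self.pool = nn.MaxPool2d(kernel_size=2)                              # 3. pooling
        self.flatten = nn.Flatten()
        self.fc = nn.Linear(8 * 13 * 13, num_classes)                        # 4. fully connected

    def forward(self, x):
        x = self.conv(x)      # (N, 1, 28, 28) -> (N, 8, 26, 26)
        x = self.relu(x)      # negative values become 0
        x = self.pool(x)      # (N, 8, 26, 26) -> (N, 8, 13, 13)
        x = self.flatten(x)   # (N, 8, 13, 13) -> (N, 1352)
        return self.fc(x)     # raw scores (logits), shape (N, num_classes)

model = TinyCNN()
batch = torch.randn(4, 1, 28, 28)       # random stand-in for 4 grayscale images
logits = model(batch)
probs = torch.softmax(logits, dim=1)    # 5. softmax over the class dimension

print(logits.shape)       # torch.Size([4, 10])
print(probs.sum(dim=1))   # each row sums to 1
```

Softmax is kept outside the module here because `nn.CrossEntropyLoss` expects raw logits during training and applies log-softmax internally; an explicit softmax is usually needed only when probabilities are wanted at inference time.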

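The pooling types listed under step 3 can be compared on a single small feature map. The values below are made up for illustration. Since PyTorch has no built-in min-pooling function, the sketch obtains it by negating the input, max-pooling, and negating the result; that trick is a choice made for this example.

```python
import torch
import torch.nn.functional as F

# Hypothetical 4x4 feature map
feature_map = torch.tensor([[1., 3., 2., 0.],
                            [5., 7., 4., 6.],
                            [9., 8., 1., 2.],
                            [0., 4., 3., 5.]])
x = feature_map[None, None]  # add batch and channel dims: (1, 1, 4, 4)

# 2x2 regions; the stride defaults to the kernel size, so 4x4 -> 2x2
max_pooled = F.max_pool2d(x, kernel_size=2)[0, 0]
avg_pooled = F.avg_pool2d(x, kernel_size=2)[0, 0]
min_pooled = -F.max_pool2d(-x, kernel_size=2)[0, 0]  # min(x) == -max(-x)

print(max_pooled)  # [[7, 6], [9, 5]]
print(avg_pooled)  # [[4, 3], [5.25, 2.75]]
print(min_pooled)  # [[1, 0], [0, 1]]
```
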
<h3 align="center"> What Makes CNNs Special? </h3>

##### Automatic Feature Extraction
CNNs learn important features (edges, patterns) from data automatically, unlike traditional models where features are manually engineered.
##### Translation Invariance
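Because the same filter is applied at every position, a pattern produces the same response wherever it appears in the input. Strictly speaking, convolution is translation-equivariant (shifting the input shifts the feature map by the same amount), and pooling then makes the network approximately invariant to where the pattern is. For example, a face in an image can be detected whether it's in the center or a corner.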
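
A small experiment makes this concrete. The sketch below assumes a hand-picked 3x3 kernel and a synthetic 10x10 image containing one bright 2x2 patch; both are illustrative choices. The patch is moved by 2 pixels, and the feature map moves with it while its strongest response stays the same.

```python
import torch
import torch.nn.functional as F

# Illustrative 3x3 Laplacian-like kernel, shaped (out_channels, in_channels, kH, kW)
kernel = torch.tensor([[0., 1., 0.],
                       [1., -4., 1.],
                       [0., 1., 0.]])[None, None]

# Synthetic 10x10 image with one bright 2x2 patch, kept away from the border
image = torch.zeros(1, 1, 10, 10)
image[0, 0, 3:5, 3:5] = 1.0

# The same pattern moved 2 pixels down and 2 pixels right
shifted = torch.roll(image, shifts=(2, 2), dims=(2, 3))

fmap = F.conv2d(image, kernel)
fmap_shifted = F.conv2d(shifted, kernel)

# The feature map shifts along with the input ...
equivariant = torch.allclose(
    fmap_shifted, torch.roll(fmap, shifts=(2, 2), dims=(2, 3))
)
# ... while a global max over the map (a crude global pooling) does not change
same_peak = torch.isclose(fmap.amax(), fmap_shifted.amax()).item()

print(equivariant, same_peak)
```

The comparison is exact here only because the patch stays clear of the image border; near the edges, responses get cut off and the match becomes approximate.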
##### Parameter Efficiency
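Because each filter's weights are shared across every position in the input, a convolutional layer needs far fewer parameters than a fully connected layer over the same input, making CNNs more memory- and compute-efficient.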
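
To get a feel for the difference, the sketch below counts the parameters of a convolutional layer and of a dense layer that maps the same input to the same number of outputs. The 28x28 input, the 8 filters, and the helper `count_parameters` are assumptions made for this example.

```python
from torch import nn

def count_parameters(module):
    """Total number of learnable values in a module."""
    return sum(p.numel() for p in module.parameters())

# 8 filters of size 3x3 over a 1-channel 28x28 image -> 8 feature maps of 26x26
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3)

# A dense layer producing the same number of outputs from the flattened image
dense = nn.Linear(28 * 28, 8 * 26 * 26)

conv_params = count_parameters(conv)    # 8 * (3 * 3) weights + 8 biases
dense_params = count_parameters(dense)  # 784 * 5408 weights + 5408 biases

print(conv_params, dense_params)
```

With these sizes the convolutional layer has 80 parameters, against 4,245,280 for the dense layer.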
##### Great for Images and Spatial Data
CNNs excel at capturing spatial hierarchies (edges → shapes → objects) in data like images, videos, or even audio spectrograms.