# Intro to CNNs

![rainbow](https://github.com/ancilcleetus/My-Learning-Journey/assets/25684256/839c3524-2a1d-4779-85a0-83c562e1e5e5)

# 📖 TABLE OF CONTENTS

- [1. Convolutional Neural Networks Animation]()
- [2. Evolution of CNNs]()
- [3. Basics of Convolution Operation]()
- [4. Convolution Operation with Padding and Stride]()
- [5. Calculations for image size after convolution operation]()
- [6. Convolution Layer]()
- [7. No of Trainable Parameters]()
  - [No of Trainable Parameters - CNNs Vs ANNs]()
  - [No of Trainable Parameters in a CNN]()
- [8. A Sample CNN]()
- [9. Some well known CNN Models]()
  - [1. LeNet-5]()
  - [2. AlexNet]()
  - [3. VGG16]()

![rainbow](https://github.com/ancilcleetus/My-Learning-Journey/assets/25684256/839c3524-2a1d-4779-85a0-83c562e1e5e5)

# 1. Convolutional Neural Networks Animation

In [None]:
!pip install -q mediapy

In [None]:
# CNN Animation

import mediapy
cnn_gif = mediapy.read_video("data/images/CV_01_Intro_to_CNNs-01-CNN.gif")
mediapy.show_video(cnn_gif)



![rainbow](https://github.com/ancilcleetus/My-Learning-Journey/assets/25684256/839c3524-2a1d-4779-85a0-83c562e1e5e5)

# 2. Evolution of CNNs

Although Traditional ML Algorithms perform well on tabular data, their performance on Image Processing tasks was poor. A 2D image had to be flattened into a 1D vector before it could be fed to a Traditional ML Algorithm. Images, however, represent information using local features: pixels that are close together in 2D jointly represent a particular feature of the image. Flattening breaks up these 2D neighbourhoods, so the spatial information in the image is not properly captured.
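To make the loss of locality concrete, here is a minimal sketch in plain Python. The 4x4 "image" and the helper `flat_index` are illustrative assumptions, not part of the original notebook. It shows that horizontally adjacent pixels stay adjacent after row-major flattening, while vertically adjacent pixels end up a full row width apart.

```python
# A tiny 4x4 "image" whose pixel values are simply their row-major indices.
width, height = 4, 4
image = [[row * width + col for col in range(width)] for row in range(height)]

# Flatten to a 1D vector, as a traditional ML model would require.
flat = [pixel for row in image for pixel in row]


def flat_index(row, col, width=4):
    """Position of pixel (row, col) in the row-major flattened vector."""
    return row * width + col


# Horizontal neighbours remain next to each other after flattening...
horizontal_gap = flat_index(1, 2) - flat_index(1, 1)
# ...but vertical neighbours are separated by a full row width.
vertical_gap = flat_index(2, 1) - flat_index(1, 1)

print(horizontal_gap, vertical_gap)  # 1 4
```

For a 28x28 image the vertical gap would be 28 positions, so a model that sees only the flat vector has no built-in notion that those two pixels touch.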

Research on the visual cortex of cats inspired the development of CNNs (Convolutional Neural Networks). In a CNN, the 2D image is passed through multiple Convolutional layers. Each Convolutional layer has multiple kernels / filters. These kernels capture different features from the image such as edges, shapes, depth, color etc. After the image has passed through multiple layers and enough information has been extracted for an Image Processing task such as Image Classification, we have a set of feature maps. All of these 2D feature maps are flattened to create a 1D vector called an image embedding.

Initially, researchers used traditional ML models to classify these image embeddings, but accuracy was low. They then tried Multilayer Perceptrons and obtained much better results. Thus, a fully connected neural network (a Multilayer Perceptron) is trained on these image embeddings to perform Image Classification.

![rainbow](https://github.com/ancilcleetus/My-Learning-Journey/assets/25684256/839c3524-2a1d-4779-85a0-83c562e1e5e5)

# 3. Basics of Convolution Operation

In [None]:
# Convolution Operation: Step 1

from IPython import display
display.Image("data/images/CV_01_Intro_to_CNNs-02-Convolution-01.jpg")

In [None]:
# Convolution Operation: Step 2

from IPython import display
display.Image("data/images/CV_01_Intro_to_CNNs-02-Convolution-02.jpg")

In [None]:
# Convolution Operation Animation

import mediapy
conv_gif_01 = mediapy.read_video("data/images/CV_01_Intro_to_CNNs-02-Convolution-03.gif")
mediapy.show_video(conv_gif_01)



![rainbow](https://github.com/ancilcleetus/My-Learning-Journey/assets/25684256/839c3524-2a1d-4779-85a0-83c562e1e5e5)

# 4. Convolution Operation with Padding and Stride
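
1. Padding: Adds a border of pixels (typically zeros) around the image, which preserves the spatial resolution of the image over multiple layers
2. Stride: The step size with which the kernel moves across the image. A stride greater than 1 reduces the number of kernel positions, and hence the output size => Reduction of Computational Complexity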
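
The effect of padding and stride can be sketched with a naive single-channel convolution in NumPy. This is a minimal illustration under stated assumptions, not an optimized or framework implementation: `conv2d` is a hypothetical helper, padding is zero padding applied equally on all sides, and the operation is the cross-correlation that CNN layers actually compute (the kernel is not flipped).

```python
import numpy as np


def conv2d(image, kernel, padding=0, stride=1):
    """Naive single-channel 2D convolution (cross-correlation, as in CNNs)."""
    # Zero-pad all four sides by `padding` pixels.
    padded = np.pad(image, padding, mode="constant", constant_values=0)
    k_h, k_w = kernel.shape
    # Number of valid kernel positions along each axis (floor division).
    out_h = (padded.shape[0] - k_h) // stride + 1
    out_w = (padded.shape[1] - k_w) // stride + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            window = padded[top:top + k_h, left:left + k_w]
            # Element-wise multiply the window with the kernel and sum.
            output[i, j] = np.sum(window * kernel)
    return output


image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3))

no_pad = conv2d(image, kernel)                        # output shrinks to 3x3
same = conv2d(image, kernel, padding=1)               # padding keeps 5x5
strided = conv2d(image, kernel, padding=1, stride=2)  # stride 2 gives 3x3

print(no_pad.shape, same.shape, strided.shape)  # (3, 3) (5, 5) (3, 3)
```

With padding 1 the 5x5 input keeps its size, while stride 2 cuts the number of kernel positions from 25 to 9.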

In [None]:
# Convolution Operation with Padding and Stride Animation

import mediapy
conv_gif_02 = mediapy.read_video("data/images/CV_01_Intro_to_CNNs-02-Convolution-04-with-Padding-and-Stride.gif")
mediapy.show_video(conv_gif_02)



![rainbow](https://github.com/ancilcleetus/My-Learning-Journey/assets/25684256/839c3524-2a1d-4779-85a0-83c562e1e5e5)

# 5. Calculations for image size after convolution operation

Assuming input image size as $n \times n$, kernel size as $k \times k$, padding $p$ and stride $s$, we have the following equations for the image size after convolution operation:

- No padding ($p = 0$), stride $s = 1$ $\implies$ $(n - k + 1) \times (n - k + 1)$
- Padding = $p$, stride $s = 1$ $\implies$ $(n + 2p - k + 1) \times (n + 2p - k + 1)$
- No padding ($p = 0$), stride = $s$ $\implies$ $(\frac{n - k}{s} + 1) \times (\frac{n - k}{s} + 1)$
- Padding = $p$, stride = $s$ $\implies$ $(\frac{n + 2p - k}{s} + 1) \times (\frac{n + 2p - k}{s} + 1)$

---

**Note:** If the expression $\frac{n + 2p - k}{s}$ results in a fractional number, you typically have two common options for handling this situation in Convolutional Neural Networks:
1. Floor Division: Round down the result of the division to the nearest integer.
2. Ceiling Division: Round up the result of the division to the nearest integer.

The choice between these two options depends on the specific implementation and the desired behavior. In most deep learning frameworks, such as PyTorch or TensorFlow, the default behavior is floor division. However, you might find cases where ceiling division is used depending on the specific needs of the model or the layer.

By default, image size after convolution operation = $(\left\lfloor \frac{n + 2p - k}{s} \right\rfloor  + 1) \times (\left\lfloor \frac{n + 2p - k}{s} \right\rfloor  + 1)$
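
The formula with floor division can be written as a one-line helper. This is a minimal sketch; `conv_output_size` is a hypothetical name used here for illustration, and it assumes a square input, a square kernel and the same padding on every side.

```python
def conv_output_size(n, k, p=0, s=1):
    """Side length of the output for an n x n input, k x k kernel,
    padding p and stride s, using floor division."""
    return (n + 2 * p - k) // s + 1


print(conv_output_size(28, 3))        # 26: no padding, stride 1
print(conv_output_size(28, 3, p=1))   # 28: padding 1 preserves the size
print(conv_output_size(7, 3, s=2))    # 3: (7 - 3) / 2 divides evenly
print(conv_output_size(8, 3, s=2))    # 3: (8 - 3) / 2 = 2.5 is floored to 2
```

In the last case, ceiling division would give 4 instead of 3, which is why it matters to know which rounding rule a given layer uses.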

![rainbow](https://github.com/ancilcleetus/My-Learning-Journey/assets/25684256/839c3524-2a1d-4779-85a0-83c562e1e5e5)

# 6. Convolution Layer

In [None]:
# Convolution Operation with RGB input (Three Channels)

from IPython import display
display.Image("data/images/CV_01_Intro_to_CNNs-02-Convolution-05-Three-Channels.jpg")

In [None]:
# Convolution Operation with multiple filters

from IPython import display
display.Image("data/images/CV_01_Intro_to_CNNs-02-Convolution-06-Multiple-Filters.jpg")

In [None]:
# Max Pooling and Average Pooling

import mediapy
conv_gif_03 = mediapy.read_video("data/images/CV_01_Intro_to_CNNs-02-Convolution-07-Max-and-Avg-Pooling.gif")
mediapy.show_video(conv_gif_03)



In [None]:
# Convolution Layer

from IPython import display
display.Image("data/images/CV_01_Intro_to_CNNs-02-Convolution-08-Convolution-Layer.jpg")

![rainbow](https://github.com/ancilcleetus/My-Learning-Journey/assets/25684256/839c3524-2a1d-4779-85a0-83c562e1e5e5)

# 7. No of Trainable Parameters

## No of Trainable Parameters - CNNs Vs ANNs

1. **No of Trainable Parameters in CNNs:**
    - No of Trainable Parameters = $(F \times F \times D + 1) \times K$ where:
        - F $\implies$ Filter Size (3 for a 3x3 filter)
        - D $\implies$ Input Depth (No of channels in input)
        - K $\implies$ No of filters
    
    - **Example 1**
        - Input image $\implies$ (64x64x3)
        - Filters $\implies$ 50 (3x3) filters
        - No of Trainable Parameters = $(3 \times 3 \times 3 \times 50) + 50 = 1350 + 50 = 1400$

    - **Example 2**
        - Input image $\implies$ (1080x1080x3)
        - Filters $\implies$ 50 (3x3) filters
        - No of Trainable Parameters = $(3 \times 3 \times 3 \times 50) + 50 = 1350 + 50 = 1400$

2. **No of Trainable Parameters in ANNs:**

    In the case of ANNs (i.e. fully connected layers), in the first layer with 50 neurons, we have:

    - Input image (64x64x3) $\implies$ No of Trainable Parameters = $(64 \times 64 \times 3 \times 50) + 50 = 614400 + 50 = 614450 \approx 614k$
    
    - Input image (1080x1080x3) $\implies$ No of Trainable Parameters = $(1080 \times 1080 \times 3 \times 50) + 50 = 174960000 + 50 = 174960050 \approx 175M$

**Note**

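**Whatever the size of the image,** the no of trainable parameters in a convolutional layer depends only on the filter size, the input depth and the no of filters. Hence, **the no of trainable parameters in the convolutional layers of a CNN will be the same for the same architecture,** whereas in a fully connected layer it grows with the size of the input.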
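
The comparison above can be reproduced with a short calculation. This is a minimal sketch; `conv_params` and `dense_params` are hypothetical helper names, and each filter or neuron is assumed to have exactly one bias term.

```python
def conv_params(filter_size, input_depth, num_filters):
    """Trainable parameters of a conv layer: (F * F * D + 1) * K."""
    return (filter_size * filter_size * input_depth + 1) * num_filters


def dense_params(num_inputs, num_neurons):
    """Trainable parameters of a fully connected layer: (inputs + 1) * neurons."""
    return (num_inputs + 1) * num_neurons


for side in (64, 1080):
    conv = conv_params(filter_size=3, input_depth=3, num_filters=50)
    dense = dense_params(num_inputs=side * side * 3, num_neurons=50)
    print(f"{side}x{side}x3 -> conv: {conv}, dense: {dense}")

# 64x64x3 -> conv: 1400, dense: 614450
# 1080x1080x3 -> conv: 1400, dense: 174960050
```

The convolutional count never uses the image side length, which is why it stays at 1400 for both inputs.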

## No of Trainable Parameters in a CNN

Let's break down the parameter calculations for each layer of a typical CNN for MNIST Digits Classification step by step:

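1. **Input Layer (28x28 grayscale image):**
    - The input to the network is a grayscale image of size 28x28, which means the input dimensions are:
        - Width: 28
        - Height: 28
        - Depth (number of channels for grayscale): 1
    - There are no parameters associated with the input layer because it simply provides data to the network.

2. **1st Convolutional Layer (64 filters of 3x3):**
    - Input to this layer: 28x28x1 (height, width, depth)
    - Number of filters, K: 64
    - Filter size, F: 3x3
    - Stride, S: 1 (assuming the default stride)
    - Padding, P: 0 (assuming 'valid' padding, i.e. no padding is added and the output dimensions shrink)
    - Output width/height = $\frac {(W + 2P - F)}{S} + 1 = \frac {28 + (2 \times 0) - 3}{1} + 1 = 26$, so the output is 26x26x64
    - No of Trainable Parameters = $(F \times F \times D + 1) \times K = (3 \times 3 \times 1 + 1) \times 64 = 640$

3. **1st Max Pooling Layer (2x2 pooling):**
    - Input to this layer: 26x26x64
    - Output width/height = $\left\lfloor \frac{26}{2} \right\rfloor = 13$ (assuming the pooling stride equals the pool size), so the output is 13x13x64
    - No of Trainable Parameters = 0 (pooling has no weights)

4. **2nd Convolutional Layer (64 filters of 3x3):**
    - Input to this layer: 13x13x64
    - Output width/height = $\frac {13 + (2 \times 0) - 3}{1} + 1 = 11$, so the output is 11x11x64
    - No of Trainable Parameters = $(3 \times 3 \times 64 + 1) \times 64 = 577 \times 64 = 36928$

5. **2nd Max Pooling Layer (2x2 pooling):**
    - Input to this layer: 11x11x64
    - Output width/height = $\left\lfloor \frac{11}{2} \right\rfloor = 5$, so the output is 5x5x64
    - No of Trainable Parameters = 0

6. **Flatten Layer:**
    - Input to this layer: 5x5x64
    - Output: a 1D vector of $5 \times 5 \times 64 = 1600$ values
    - No of Trainable Parameters = 0

7. **1st Dense Layer (128 nodes):**
    - Input to this layer: 1600 values
    - No of Trainable Parameters = $(1600 + 1) \times 128 = 204928$

8. **2nd Dense Layer (10 nodes):**
    - Input to this layer: 128 values
    - No of Trainable Parameters = $(128 + 1) \times 10 = 1290$

Total No of Trainable Parameters = $640 + 36928 + 204928 + 1290 = 243786$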

We can verify the no of trainable parameters by building the above CNN model:

In [None]:
from keras import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

In [None]:
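model = Sequential()

# Block 1: 28x28x1 -> conv 26x26x64 -> pool 13x13x64
model.add(Conv2D(64, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D((2, 2)))

# Block 2: 13x13x64 -> conv 11x11x64 -> pool 5x5x64
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))

# 5x5x64 feature maps -> 1D image embedding of 1600 values
model.add(Flatten())

# Fully connected classifier with 10 output classes (digits 0-9)
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))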

In [None]:
model.summary()
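
As a cross-check that does not need Keras, the layer shapes and parameter counts of the model above can be traced by hand. This is a minimal sketch: `trace_cnn` is a hypothetical helper (not a Keras API), and it assumes 3x3 convolutions with 'valid' padding and stride 1, and 2x2 max pooling with stride 2, which are the Keras defaults for the layers as configured above.

```python
def trace_cnn(input_size=28, input_depth=1):
    """Return (layer name, output shape, trainable params) for the MNIST CNN."""
    size, depth = input_size, input_depth
    rows = []
    for block in ("1", "2"):
        # 3x3 convolution with 64 filters, no padding, stride 1.
        size = size - 3 + 1
        params = (3 * 3 * depth + 1) * 64
        depth = 64
        rows.append(("conv" + block, (size, size, depth), params))
        # 2x2 max pooling halves width and height (floor) and has no weights.
        size = size // 2
        rows.append(("pool" + block, (size, size, depth), 0))
    # Flatten the feature maps into a 1D vector.
    flat = size * size * depth
    rows.append(("flatten", (flat,), 0))
    # Dense layers: one weight per input plus one bias, per neuron.
    rows.append(("dense1", (128,), (flat + 1) * 128))
    rows.append(("dense2", (10,), (128 + 1) * 10))
    return rows


rows = trace_cnn()
for name, shape, params in rows:
    print(f"{name:8s} {str(shape):14s} {params}")

total = sum(params for _, _, params in rows)
print("Total trainable parameters:", total)  # 243786
```

The total printed here should match the "Trainable params" figure reported by `model.summary()`.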

![rainbow](https://github.com/ancilcleetus/My-Learning-Journey/assets/25684256/839c3524-2a1d-4779-85a0-83c562e1e5e5)

# 8. A Sample CNN

In [None]:
# A sample CNN

from IPython import display
display.Image("data/images/CV_01_Intro_to_CNNs-02-Convolution-09-Sample-CNN.jpg")

![rainbow](https://github.com/ancilcleetus/My-Learning-Journey/assets/25684256/839c3524-2a1d-4779-85a0-83c562e1e5e5)

# 9. Some well known CNN Models

## 1. LeNet-5

In [None]:
# LeNet-5

from IPython import display
display.Image("data/images/CV_01_Intro_to_CNNs-02-Convolution-10-LeNet5.jpg")

![rainbow](https://github.com/ancilcleetus/My-Learning-Journey/assets/25684256/839c3524-2a1d-4779-85a0-83c562e1e5e5)

## 2. AlexNet

In [None]:
# AlexNet

from IPython import display
display.Image("data/images/CV_01_Intro_to_CNNs-02-Convolution-11-AlexNet.jpg")

![rainbow](https://github.com/ancilcleetus/My-Learning-Journey/assets/25684256/839c3524-2a1d-4779-85a0-83c562e1e5e5)

## 3. VGG16

In [None]:
# VGG16

from IPython import display
display.Image("data/images/CV_01_Intro_to_CNNs-02-Convolution-12-VGG16.jpg")

![rainbow](https://github.com/ancilcleetus/My-Learning-Journey/assets/25684256/839c3524-2a1d-4779-85a0-83c562e1e5e5)