<a href="https://colab.research.google.com/github/aaronhowellai/machine-learning-projects/blob/main/machine%20learning%20algorithms/theory/deep%20learning/computer%20vision/Understanding%20CNNs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Understanding CNNs** 🕸️

## **Part of a 3-Part Series:** Convolutional Neural Networks, Part Two
* An introduction to CNNs, following Pierian Training's Udemy bootcamp course "PyTorch for Deep Learning with Python".

* As part of an introduction to CNNs for image processing and multi-class classification, I explore:
  * Image Filters & Kernels
  * Convolutional Layers
  * Pooling Layers

## **What is Computer Vision?**
* Computer Vision is a general term for the use of computer programs to process image data, and a prominent field in modern artificial intelligence, for tasks such as image classification, object detection and biometric verification.
  * [Wikipedia](https://en.wikipedia.org/wiki/Computer_vision)

## **What are Convolutional Neural Networks?**
* A CNN is a neural network architecture that is extremely effective at processing image data.
#### ***Below is an example of a CNN classifying an image into one of 1000 different labels, with a probability score as the output:***

![image description](https://raw.githubusercontent.com/aaronhowellai/machine-learning-projects/main/machine%20learning%20concepts/cnn%20architecture.png)

#### ***Below is an example of the AlexNet CNN architecture:***

![image description](https://raw.githubusercontent.com/aaronhowellai/machine-learning-projects/main/machine%20learning%20concepts/alexnet%20architecture.png)

* [Link](https://github.com/aaronhowellai/machine-learning-projects/blob/main/machine%20learning%20concepts/cnn%20architecture.png) to image on my "/machine learning concepts" GitHub page
* [Link](https://github.com/aaronhowellai/machine-learning-projects/tree/main/machine%20learning%20concepts) to other Machine Learning concepts



-------

# **Image Filters & Kernels** (Computer Vision)

* A filter is essentially an **image kernel**: a small matrix that is applied across an entire image.
  * A blur filter is a popular filter that is often used in computer graphics programs and software such as Adobe Photoshop.

  $$
  \begin{pmatrix}
  0.0625 & 0.125 & 0.0625 \\
  0.125 & 0.25 & 0.125 \\
  0.0625 & 0.125 & 0.0625
  \end{pmatrix}
  $$
* When a filter (image kernel) matrix is applied to an image, each pixel value under the kernel is multiplied by the corresponding kernel weight.
  * CNNs learn these weights automatically during training.
* The products are then summed across the entire kernel to give a single output pixel value.
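
As a rough illustration of the multiply-and-sum step, here is a minimal NumPy sketch that applies the blur kernel above to a single 3x3 patch of pixels. The `patch` values are made up for illustration and do not come from the course.

```python
import numpy as np

# The 3x3 blur kernel from the matrix above; its weights sum to 1.
blur_kernel = np.array([
    [0.0625, 0.125, 0.0625],
    [0.125,  0.25,  0.125],
    [0.0625, 0.125, 0.0625],
])

# Hypothetical 3x3 patch of grayscale pixel values (assumed for illustration).
patch = np.array([
    [10.0, 20.0, 30.0],
    [40.0, 50.0, 60.0],
    [70.0, 80.0, 90.0],
])

# Multiply each pixel by the matching kernel weight, then sum the products.
blurred_pixel = float(np.sum(patch * blur_kernel))
print(blurred_pixel)  # 50.0
```

Because the kernel weights sum to 1, the result is a weighted average of the patch, which is why this kernel blurs rather than brightens or darkens the image.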

### **Convolutions**
![image description](https://raw.githubusercontent.com/aaronhowellai/machine-learning-projects/main/machine%20learning%20concepts/convolutions%2C%20convolution%20kernels%2C%20computer%20vision.png)

* In the context of CNNs, **image filters** are referred to as **convolutional kernels**
* The process of passing them over an image is known as **convolution**.
* Padding adds an extra border of values (typically zeros) around the original pixels so that the kernel can also be applied at the edges, preserving the image size and the information at the borders.
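
The sliding process and the effect of padding can be sketched in plain NumPy. This is a simplified illustration, not the implementation a deep learning library uses: the function name `convolve2d`, the 5x5 test image and the box-blur kernel are all assumptions made for the example, and the stride is fixed at 1.

```python
import numpy as np

def convolve2d(image, kernel, padding=0):
    """Slide `kernel` over a 2D `image` with stride 1.

    Like most deep learning libraries, this does not flip the kernel,
    so strictly speaking it computes a cross-correlation.
    """
    if padding > 0:
        # Zero padding: add a border of zeros around the image.
        image = np.pad(image, padding, mode="constant", constant_values=0)
    k_h, k_w = kernel.shape
    out_h = image.shape[0] - k_h + 1
    out_w = image.shape[1] - k_w + 1
    output = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            window = image[row:row + k_h, col:col + k_w]
            output[row, col] = np.sum(window * kernel)
    return output

image = np.arange(25, dtype=float).reshape(5, 5)   # hypothetical 5x5 image
kernel = np.full((3, 3), 1.0 / 9.0)                # hypothetical box-blur kernel

no_pad = convolve2d(image, kernel)
same_pad = convolve2d(image, kernel, padding=1)
print(no_pad.shape)    # (3, 3): the border is lost
print(same_pad.shape)  # (5, 5): padding preserves the image size
```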



# **Convolutional Layer**
* **Objective:**
  * Understand the architecture of a CNN that allows the network to select the optimal weights for the convolutional kernel in the **convolutional layer**.

* In [Part One](https://github.com/aaronhowellai/machine-learning-projects/blob/main/machine%20learning%20algorithms/ANN%20Image%20Classifier%20with%20MNIST.ipynb) of this **3-part series** of notebooks on CNNs:
  * I used an ANN to classify handwritten digits with the MNIST Dataset, resulting in >100k parameters for tiny 28x28 pixel images.
    * Applying the same approach to high-definition images would demand tens of millions of parameters and extremely long training times, which is far less efficient than a CNN.
  * All the 2D (matrix) information is lost by flattening the image data into a 1D array.
  * ANNs will only work well for image classification of very similar, well centred images.

## **How can CNNs avoid restrictions on model performance?**
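* A CNN uses **convolutional layers** to avoid these restrictions: kernels are applied to the 2D image directly, so spatial information is kept and far fewer parameters are needed than in a fully connected network.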

## **When are convolutional layers created?**
* A convolutional layer is created when multiple convolutional kernels are applied to the input image matrices
  * The layer will then be trained to compute the best kernel weight values

## **How do CNNs reduce the total number of parameters?**

![image description](https://raw.githubusercontent.com/aaronhowellai/machine-learning-projects/main/machine%20learning%20concepts/local%20connectivity%2C%20reducing%20parameters%20with%20cnns.png)

* CNNs reduce the total number of parameters by relying on **local connectivity**.
* Not all neurons in the network are fully connected.
  * Instead, each neuron in the next layer is connected only to a small local subset of neurons in the previous layer, and the weights of these local connections form the filters.

* There can be multiple filters, and the weights are computed by the network.
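
To get a feel for the savings, the parameter counts of a fully connected layer and a convolutional layer can be compared with simple arithmetic. The layer sizes below (120 hidden units, 6 filters of size 3x3) are assumptions chosen for illustration and are not necessarily the ones used in Part One.

```python
def dense_params(n_inputs, n_outputs):
    # One weight per input-output pair, plus one bias per output unit.
    return n_inputs * n_outputs + n_outputs

def conv_params(in_channels, out_channels, kernel_size):
    # Each filter has kernel_size x kernel_size weights per input channel,
    # plus one bias; the same weights are reused at every image position.
    return out_channels * (in_channels * kernel_size * kernel_size) + out_channels

# Flattened 28x28 grayscale image fed to a hypothetical 120-unit dense layer.
dense = dense_params(28 * 28, 120)
# Hypothetical convolutional layer: 1 input channel, 6 filters of size 3x3.
conv = conv_params(1, 6, 3)

print(dense)  # 94200
print(conv)   # 60
```

The convolutional count does not depend on the image size at all, which is why the gap widens further for larger images.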

## **Convolutional Layers for 3D Tensors**
* Colour images can be thought of as 3D Tensors consisting of values between 0-255 (256 values) in Red, Green and Blue channels.
* Colour photos can be split into 3 dimensions:
  * Height (720)
  * Width (1280)
  * Colour Channels (3) (RGB)

* `image.shape`
  * `(720,1280,3)`
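
A small NumPy sketch of the same idea, using an all-zero placeholder array in place of a real photo (an assumption for illustration). The last step shows the channels-first layout that PyTorch convolutional layers expect.

```python
import numpy as np

# Placeholder "image": height 720, width 1280, 3 colour channels (RGB),
# stored as 8-bit integers in the range 0-255.
image = np.zeros((720, 1280, 3), dtype=np.uint8)
print(image.shape)  # (720, 1280, 3)

# Each colour channel is itself a 2D matrix of pixel intensities.
red_channel = image[:, :, 0]
print(red_channel.shape)  # (720, 1280)

# Reorder the axes to (channels, height, width).
channels_first = np.transpose(image, (2, 0, 1))
print(channels_first.shape)  # (3, 720, 1280)
```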

# **Pooling Layers** (aka Downsampling Layers)
**What else apart from local connectivity can help reduce the number of parameters in a Convolutional Neural Network?**
* Even with local connectivity, a CNN processing colour images with possibly 10s or 100s of filters can contain a huge number of parameters.
  * **Pooling layers** are a dimensionality reduction technique: they downsample the feature maps produced by convolutional layers, so later layers have fewer values to process.

**How are feature maps reduced in size by pooling layers?**
* Using subsampling.
  * **Max pooling** slides a window with a given size and stride length over the feature map and keeps only the maximum value within each window.

  $$
  \begin{pmatrix}
  0 & 0 & 0 & 0 \\
  0 & 0 & 0 & 0 \\
  0 & 0 & 0 & 0 \\
  0 & 0 & 0 & 0
  \end{pmatrix}
  $$
* For a 4x4 feature map with 16 values like the one shown above, you could use:
  * **Max Pooling:**
    * Window: 2x2
    * Stride: 2
  * This would downsample it to a 2x2 output with 4 values, a quarter of the original size, because the height and width are each halved.

* Some information is lost, but the most important parts are still kept in the subsampling.
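
Max pooling with a 2x2 window and a stride of 2 can be sketched in NumPy as follows. The function name `max_pool2d` and the feature map values are assumptions made for illustration; a framework's built-in pooling layer would be used in practice.

```python
import numpy as np

def max_pool2d(feature_map, window=2, stride=2):
    """Keep the maximum value inside each window of a 2D feature map."""
    out_h = (feature_map.shape[0] - window) // stride + 1
    out_w = (feature_map.shape[1] - window) // stride + 1
    pooled = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            r, c = row * stride, col * stride
            pooled[row, col] = feature_map[r:r + window, c:c + window].max()
    return pooled

# Hypothetical 4x4 feature map (16 values).
feature_map = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 4],
], dtype=float)

pooled = max_pool2d(feature_map)
print(pooled)
# [[4. 5.]
#  [6. 7.]]
```

The 16 input values are reduced to 4, and only the strongest response in each 2x2 region survives.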

* **Other types of pooling:**
  * Average Pooling (Common)
  * Global Pooling
  * Sum Pooling
  * Mixed Pooling
  * LP Pooling
  * Multi-scale Order-less Pooling (MOP)
  * Super-pixel Pooling
  * Compact Bilinear Pooling

**What is another common technique used when training CNNs?**
* Another common technique is called "Dropout".

  * Dropout can be thought of as a form of regularization to help prevent overfitting.
  * During training, the user specifies the probability with which units are randomly dropped (turned off), along with their connections.
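
A minimal NumPy sketch of dropout is shown below. It uses the "inverted dropout" convention, where the surviving units are scaled by `1 / (1 - p)` during training so that nothing needs to change at evaluation time. The function name `dropout`, its `training` flag and the example values are assumptions for illustration, and `p` is assumed to be less than 1.

```python
import numpy as np

def dropout(activations, p=0.5, training=True, rng=None):
    """Randomly zero units with probability p (inverted dropout, p < 1)."""
    if not training or p == 0.0:
        # At evaluation time all units are kept unchanged.
        return activations
    rng = np.random.default_rng() if rng is None else rng
    # True where a unit is kept, False where it is dropped.
    keep_mask = rng.random(activations.shape) >= p
    # Scale the kept units so the expected activation stays the same.
    return activations * keep_mask / (1.0 - p)

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable
activations = np.ones((2, 5))    # hypothetical layer output
dropped = dropout(activations, p=0.4, rng=rng)
print(dropped)  # each entry is either 0.0 or 1 / 0.6
```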


# **What are some famous CNN architectures?**
* LeNet-5 by Yann LeCun
* AlexNet by Alex Krizhevsky et al.
* GoogLeNet by Christian Szegedy et al. (Google)
* ResNet by Kaiming He et al.