# Convolution

>**Small tangent about language:**
- **Convoluted:** Excessively complicated and difficult to understand, like a bad sci-fi movie plot.
- **Convolved:** Past tense of "to convolve", a mathematical operation used in signal and image processing.

![Convolution in 1D](./images/image-1.png)

![Convolution in 2D](./images/image-2.png) ![Convolution in 2D](./images/image-3.png)

![Convolution in 3D](./images/image-4.png)

![With Bias Term](./images/image-5.png)

![With Bias Term](./images/image-6.png)

>**Additional Notes About Convolution**
- Edges always cause difficulties. We deal with this through *padding*.
- Image convolution in DL also involves downsampling. This is done via *stride* and *pooling*.
- N kernels produce an N-layer result. These layers are called "channels", but they are feature maps, not RGB color channels.
- Use odd kernel sizes (3, 5, 7 etc.) to have an exact center.
- Formally, CNNs implement cross-correlation, not convolution. But it doesn't actually matter because the kernels are empirically learned.
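
The last note can be checked on a toy 1D example. The sketch below uses plain Python lists and hypothetical helper names (`cross_correlate_1d`, `convolve_1d`); it only illustrates that the two operations differ by a kernel flip, and that without padding the output is shorter than the input.

```python
def cross_correlate_1d(signal, kernel):
    """Slide the kernel over the signal without flipping it (what CNN layers compute)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def convolve_1d(signal, kernel):
    """Textbook convolution: flip the kernel, then cross-correlate."""
    return cross_correlate_1d(signal, kernel[::-1])


signal = [0, 1, 2, 3, 4, 5]
kernel = [1, 0, -1]  # odd length, so the kernel has an exact center

xcorr = cross_correlate_1d(signal, kernel)  # [-2, -2, -2, -2]
conv = convolve_1d(signal, kernel)          # [2, 2, 2, 2]
```

A learned kernel can simply converge to the flipped version of what convolution would need, which is why the distinction does not matter in practice. Note also that 6 input samples give only 4 outputs: this is the edge problem that padding addresses.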

## Feature Maps & Convolution Kernels

![Feature Maps](./images/image-7.png)

>**Kernel concepts:**
- Kernels are filters that extract features from an image. The same kernel applied to different images will give different feature maps.
- Kernels are generally small (3x3, 5x5, 7x7).
- In DL, kernels begin random and are learned through gradient descent. After learning, kernels are the same for all images. Using pre-trained kernels is called "transfer learning".
- Kernels are not used to classify or make decisions; they are used to extract features. Those features are used for classification.
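
To make the first point concrete, here is a minimal sketch that applies one hand-picked kernel to two toy images. The helper `correlate_2d`, the kernel values and the images are all illustrative assumptions; in a CNN the kernel weights would be learned rather than chosen by hand.

```python
def correlate_2d(image, kernel):
    """Valid (no padding, stride 1) 2D cross-correlation on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hand-picked vertical-edge kernel (hypothetical values, not learned).
vertical_edge = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

# Image with a vertical edge: dark left half, bright right half.
image_a = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Image with a horizontal edge: dark top half, bright bottom half.
image_b = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [9, 9, 9, 9],
    [9, 9, 9, 9],
]

map_a = correlate_2d(image_a, vertical_edge)  # [[-27, -27], [-27, -27]]
map_b = correlate_2d(image_b, vertical_edge)  # [[0, 0], [0, 0]]
```

The same kernel responds strongly to the vertical edge and not at all to the horizontal one: the feature map describes where a feature occurs, and a later layer decides what to do with that information.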

![Channels Dimensions](./images/image-8.png)

>**Code:**
- [Part 1 - Convolution in code](https://github.com/Sayan-Roy-729/Data-Science/blob/main/Deep%20Learning/Using%20Pytorch/Part%2014%20-%20Convolution/Part%201%20-%20Convolution%20in%20code.ipynb)

## Convolution Parameters (stride, padding)

![Convolution Padding](./images/image-9.png)

>**Convolution & Padding:**
- Padding is used to increase the size of the convolution result, so that it can match the size of the previous layer (or image).
- Padding involves inserting one or more rows and columns around the image.
- Added rows/columns are symmetric!
- Padded numbers are usually zeros. It's also possible to wrap the image from top-to-bottom (circular convolution).
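
A small sketch of both padding styles on a nested-list image. The function name `pad_2d` and its `mode` argument are made up for this illustration and are not a PyTorch API.

```python
def pad_2d(image, pad, mode="zeros"):
    """Add `pad` rows/columns symmetrically on every side of a nested-list image."""
    h, w = len(image), len(image[0])
    padded = []
    for r in range(-pad, h + pad):
        row = []
        for c in range(-pad, w + pad):
            if mode == "circular":
                # Wrap around: indices past one edge continue from the opposite edge.
                row.append(image[r % h][c % w])
            elif 0 <= r < h and 0 <= c < w:
                row.append(image[r][c])
            else:
                row.append(0)
        padded.append(row)
    return padded


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

zero_padded = pad_2d(image, pad=1)
# [[0, 0, 0, 0, 0],
#  [0, 1, 2, 3, 0],
#  [0, 4, 5, 6, 0],
#  [0, 7, 8, 9, 0],
#  [0, 0, 0, 0, 0]]

wrapped = pad_2d(image, pad=1, mode="circular")
# [[9, 7, 8, 9, 7],
#  [3, 1, 2, 3, 1],
#  [6, 4, 5, 6, 4],
#  [9, 7, 8, 9, 7],
#  [3, 1, 2, 3, 1]]
```

With a 3x3 kernel and stride 1, the 5x5 padded image yields a 3x3 result, the same size as the original image.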

![Stride](./images/image-10.png)

>**Convolution and Stride**
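- Stride is used to decrease the size of the result of convolution. It is a mechanism of downsampling: the kernel itself keeps the same number of weights, but the smaller feature maps mean fewer parameters in the layers that follow (e.g., fully connected layers).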
- The stride parameter (should have been called skip IMHO) is an integer. Stride=1 gives the full result.
- Stride is usually the same for rows and columns, but can be different when warranted.

![Padding and Stride Formula](./images/image-11.png)
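
Assuming the figure shows the standard output-size relation $N_h = \lfloor (M_h + 2p - k) / s_h \rfloor + 1$ (with $M_h$ input pixels, $k$ kernel pixels, $p$ padding per side and $s_h$ stride), it can be wrapped in a small helper for choosing parameters. The function name is hypothetical and dilation is ignored.

```python
def conv_output_size(m, k, p=0, s=1):
    """Pixels along one dimension after convolution.

    m: input pixels, k: kernel pixels, p: padding per side, s: stride.
    Assumes the standard formula without dilation.
    """
    return (m + 2 * p - k) // s + 1


no_padding = conv_output_size(28, k=3)            # 26: the edges are lost
same_size = conv_output_size(28, k=3, p=1)        # 28: padding restores the size
wider_kernel = conv_output_size(28, k=5, p=2)     # 28: larger kernel needs more padding
downsampled = conv_output_size(28, k=3, p=1, s=2) # 14: stride 2 halves the size
```

The same function is applied separately to rows and columns when the kernel, padding or stride differ between the two dimensions.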

>**Code:**
- [Part 2 - The Conv2 Class in PyTorch](https://github.com/Sayan-Roy-729/Data-Science/blob/main/Deep%20Learning/Using%20Pytorch/Part%2014%20-%20Convolution/Part%202%20-%20The%20Conv2%20Class%20in%20PyTorch.ipynb)
- [Part 3 - CodeChallenge Choose the Parameters](https://github.com/Sayan-Roy-729/Data-Science/blob/main/Deep%20Learning/Using%20Pytorch/Part%2014%20-%20Convolution/Part%203%20-%20CodeChallenge%20Choose%20the%20Parameters.ipynb)

## Transpose Convolution

![Transpose Convolution](./images/image-12.png)
![Transpose Convolution](./images/image-13.png)

>**What transpose convolution is:**
- Transpose convolution means to scalar-multiply a kernel by each pixel in an image.
- As long as the kernel is >1 pixel, the result will be higher resolution than the original image.
- Transpose convolution is used for autoencoders and super-resolution CNNs.
- Transpose convolution takes the same parameters as "forward" convolution: kernel size, padding, stride.

$$N_h = s_h(M_h - 1) + k - 2p$$

where,
- $N_h$ = Number of pixels in output image (height).
- $M_h$ = Number of pixels in input image (height).
- $p$ = Padding
- $k$ = Number of pixels in kernel (height)
- $s_h$ = Stride
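
The formula can be checked against a direct 1D implementation of "scalar-multiply the kernel by each pixel". This is a sketch with hypothetical function names and no padding ($p = 0$), not the PyTorch implementation.

```python
def transpose_conv_output_size(m, k, p=0, s=1):
    """N = s * (M - 1) + k - 2p, the formula above applied to one dimension."""
    return s * (m - 1) + k - 2 * p


def transpose_conv_1d(signal, kernel, stride=1):
    """Scale a copy of the kernel by each input value and sum the overlapping copies."""
    out = [0] * transpose_conv_output_size(len(signal), len(kernel), s=stride)
    for i, value in enumerate(signal):
        for j, weight in enumerate(kernel):
            out[i * stride + j] += value * weight
    return out


signal = [1, 2, 3]
kernel = [1, 1]

overlapping = transpose_conv_1d(signal, kernel, stride=1)  # [1, 3, 5, 3]
upsampled = transpose_conv_1d(signal, kernel, stride=2)    # [1, 1, 2, 2, 3, 3]
```

With stride 1 the kernel copies overlap and are summed (3 pixels become 4); with stride 2 they sit side by side and the signal doubles in length (3 pixels become 6).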

>**Code:**
- [Part 4 - Transpose convolution](https://github.com/Sayan-Roy-729/Data-Science/blob/main/Deep%20Learning/Using%20Pytorch/Part%2014%20-%20Convolution/Part%204%20-%20Transpose%20convolution.ipynb)

## Max/Mean Pooling

![Mean Pooling](./images/image-14.png)

![Max Pooling](./images/image-15.png)

>**Why use a pooling layer?**
- Reduces dimensionality (fewer parameters)
- Selects for features over a broader spatial area (increased "receptive field" size)
- Deeper into the model, we want more channels with fewer pixels. This makes the representations increasingly abstract.

>**Max or Mean Pooling?**
- *Max Pooling*: Highlights sharp features. Useful for sparse data and increasing contrast.
- *Mean Pooling*: Smooths images (it's a low-pass filter). Useful for noisy data and to reduce the impact of outliers on learning.
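
The difference is easy to see on a toy image with one outlier pixel. The sketch below is plain Python with a hypothetical `pool_2d` helper; the pooling operation is passed in as a function.

```python
def pool_2d(image, size=2, stride=2, op=max):
    """Apply `op` to every size x size window of a nested-list image."""
    out = []
    for r in range(0, len(image) - size + 1, stride):
        row = []
        for c in range(0, len(image[0]) - size + 1, stride):
            window = [image[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(op(window))
        out.append(row)
    return out


def mean(values):
    return sum(values) / len(values)


image = [
    [1, 1, 0, 0],
    [1, 9, 0, 0],  # the 9 is a single sharp feature (or an outlier)
    [2, 2, 4, 4],
    [2, 2, 4, 4],
]

max_pooled = pool_2d(image, op=max)    # [[9, 0], [2, 4]]
mean_pooled = pool_2d(image, op=mean)  # [[3.0, 0.0], [2.0, 4.0]]
```

Max pooling keeps the 9 untouched, while mean pooling dilutes it to 3.0. Both reduce the 4x4 image to 2x2.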

![What are receptive fields?](./images/image-16.png)

>**What about deep ANNs?**
- Sufficiently deep ANNs will also work "just as well" (because of the universal approximation theorem).
- But they will be much more complex, have many more parameters and will be much harder to train.
- CNNs are a more efficient architecture for certain kinds of problems, namely image categorization.
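
A rough parameter count illustrates the efficiency argument. The layer sizes below are arbitrary assumptions (a 28x28 grayscale image, 16 kernels of 3x3), chosen only to show the order of magnitude.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit in a fully connected layer."""
    return n_inputs * n_units + n_units


def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus one bias per kernel in a 2D convolution layer."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


n_pixels = 28 * 28  # hypothetical 28x28 grayscale input

# Fully connected layer with one unit per pixel (one "map" the size of the image).
dense = dense_params(n_pixels, n_pixels)  # 615440

# Convolution layer producing 16 feature maps with 3x3 kernels.
conv = conv_params(in_channels=1, out_channels=16, kernel_size=3)  # 160
```

The convolution layer reuses the same few weights at every image location, so its parameter count does not depend on the image size.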

![Model](./images/image-17.png)

>**Parameters of Pooling:**
- *Spatial extent ("kernel size")*: The number of pixels in the pooling window. Typically set to 2 (actually 2x2).
- *Stride*: The number of pixels to skip for each window. Typically set to 2 (produces no overlap).
- Can also use (3, 3). Setting stride < kernel creates overlapping windows, which is less common.
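
To see how kernel size and stride interact, the sketch below lists which pixel indices each pooling window covers along one dimension. The helper name is hypothetical, and windows that do not fit entirely inside the input are dropped.

```python
def pooling_windows(length, kernel, stride):
    """(start, stop) index ranges covered by each pooling window along one dimension."""
    return [
        (start, start + kernel)
        for start in range(0, length - kernel + 1, stride)
    ]


no_overlap = pooling_windows(8, kernel=2, stride=2)
# [(0, 2), (2, 4), (4, 6), (6, 8)]

overlap = pooling_windows(8, kernel=3, stride=2)
# [(0, 3), (2, 5), (4, 7)]
```

With kernel 2 and stride 2 every pixel belongs to exactly one window. With kernel 3 and stride 2, pixels 2 and 4 are shared by neighboring windows, and pixel 7 is not covered at all in this sketch.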

>**Code:**
- [Part 5 - Max Mean pooling](https://github.com/Sayan-Roy-729/Data-Science/blob/main/Deep%20Learning/Using%20Pytorch/Part%2014%20-%20Convolution/Part%205%20-%20Max%20Mean%20pooling.ipynb)

## To Pool or To Stride?

![Pool vs Stride](./images/image-18.png)

| Pooling | Stride |
| :--: | :--: |
| Computationally fast | Somewhat slower |
| No parameters | Learned parameters |
| Kernel spans a smaller area (smaller receptive fields) | Kernel spans a larger area (larger receptive fields) |
| Highly stable | Can be unstable in complex architectures |

>**Conclusion:** Pooling is historical. No one really knows when to use which. Try both!

## Image Transform

>**Two reasons to transform images:**
- Pre-trained CNNs are coded for certain image sizes. You might need to resize your images to work, or convert to grayscale.
- Transforming images changes raw pixel values without changing the image information. Transforms are thus a way to increase the total amount of data.
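
Both reasons can be illustrated with two tiny transforms on nested lists. These are simplified stand-ins with made-up names: the grayscale conversion here is a plain channel average, which is an assumption and not necessarily the weighting a library would use.

```python
def horizontal_flip(image):
    """Mirror each row: raw pixel positions change, image content does not."""
    return [row[::-1] for row in image]


def to_grayscale(rgb_image):
    """Average the (r, g, b) values of each pixel (simple unweighted mean)."""
    return [[sum(pixel) / 3 for pixel in row] for row in rgb_image]


image = [
    [1, 2, 3],
    [4, 5, 6],
]
flipped = horizontal_flip(image)  # [[3, 2, 1], [6, 5, 4]]

rgb_image = [[(3, 6, 9), (0, 0, 0)]]
gray = to_grayscale(rgb_image)    # [[6.0, 0.0]]
```

The flipped image contains exactly the same pixel values in a different arrangement, so it can serve as an additional training sample.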

>**Code:**
- [Part 6 - Image Transformation](https://github.com/Sayan-Roy-729/Data-Science/blob/main/Deep%20Learning/Using%20Pytorch/Part%2014%20-%20Convolution/Part%206%20-%20Image%20Transformation.ipynb)

## Creating & Using Custom Datasets

![Data Loader](./images/image-19.png)

>**Order of operations when applying transformation:**
1. Import the data
2. Create a custom `Dataset` class
3. Define the transformations
4. Create a `Dataset` with your data and transformations.
5. Create a DataLoader (same as usual).

*Steps 1 and 2 are combined if you import a `torchvision` dataset that already accepts transformations.*
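
The steps above can be sketched without any dependencies. The class below only mimics the two methods a PyTorch dataset provides (`__len__` and `__getitem__`); in real code it would subclass `torch.utils.data.Dataset` and be wrapped in a `DataLoader`. The toy data, the `compose` helper and the transform functions are all hypothetical.

```python
class CustomDataset:
    """Stores images and labels; applies the transform when a sample is requested."""

    def __init__(self, images, labels, transform=None):
        if len(images) != len(labels):
            raise ValueError("images and labels must have the same length")
        self.images = images
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        image = self.images[index]
        if self.transform is not None:
            image = self.transform(image)
        return image, self.labels[index]


def compose(*transforms):
    """Chain several transforms into one callable, applied left to right."""
    def apply(image):
        for transform in transforms:
            image = transform(image)
        return image
    return apply


def flip(image):
    return [row[::-1] for row in image]


def scale(image):
    return [[pixel / 255 for pixel in row] for row in image]


# Step 1: import the data (toy 2x2 images here).
images = [[[0, 255], [255, 0]], [[255, 255], [0, 0]]]
labels = [0, 1]

# Step 3: define the transformations.
transform = compose(flip, scale)

# Step 4: create the dataset with data and transformations.
dataset = CustomDataset(images, labels, transform=transform)

# Step 5 stand-in: iterate over samples as a loader would.
samples = [dataset[i] for i in range(len(dataset))]
```

Because the transform runs inside `__getitem__`, the stored images stay untouched and the transformation is applied each time a sample is drawn.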

>**Code:**
- [Part 7 - Creating and using custom DataSets](https://github.com/Sayan-Roy-729/Data-Science/blob/main/Deep%20Learning/Using%20Pytorch/Part%2014%20-%20Convolution/Part%207%20-%20Creating%20and%20using%20custom%20DataSets.ipynb)