# CNN

## The canonical CNN architecture

>**Three types of layers in a CNN**

| Type | Purpose |
| :--: | :-- |
| Convolution | Learn filters (kernels) to create feature maps |
| Pooling | Reduce dimensionality and increase receptive field size |
| Fully connected | Prediction (categorical and/or continuous) |

![CNN Architecture](https://i0.wp.com/thecleverprogrammer.com/wp-content/uploads/2020/11/1-cnnlayer.png?resize=1024%2C259&ssl=1)

>**With increasing depth:**
- Image resolution (number of pixels per feature map) decreases
- Representation resolution (number of filters, i.e. channels) increases
- Layers become spatially smaller but wider (more channels)
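
The trend above can be checked with simple arithmetic. The sketch below is a minimal pure-Python illustration, assuming square inputs and the standard output-size formula `floor((N + 2p - k) / s) + 1`. The two-block layout in `blocks` and the helper names are hypothetical and are not taken from the course code.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer (square inputs)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(image_size, in_channels, blocks):
    """Return (channels, height, width) at the input and after each block.

    Each block is (out_channels, conv_kernel, conv_padding, pool_size) and
    stands for one stride-1 convolution followed by one max-pool whose
    stride equals its window size.
    """
    shapes = [(in_channels, image_size, image_size)]
    size = image_size
    for out_channels, kernel, padding, pool in blocks:
        size = conv_output_size(size, kernel, padding=padding)  # convolution
        size = conv_output_size(size, pool, stride=pool)        # pooling
        shapes.append((out_channels, size, size))
    return shapes


# Hypothetical two-block network on a 28x28 grayscale image (MNIST-sized).
blocks = [(10, 5, 1, 2), (20, 5, 1, 2)]
shapes = trace_shapes(28, 1, blocks)
for channels, height, width in shapes:
    print(f"{channels:3d} channels x {height}x{width} pixels")
```

Under these assumptions, each block shrinks the spatial size while the channel count grows, which is the "smaller but wider" pattern.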

## CNN to classify MNIST digits

![MNIST Our Architecture](./images/image-1.png)

>**Code:**
- Part 1 - CNN to classify MNIST digits
- Part 2 - Shifted MNIST

## Classify Gaussian Blurs

![2D Gaussians](./images/image-2.png)

![Architecture](./images/image-3.png)

>**Code:**
- Part 3 - Classify Gaussian blurs

>**Discussion:**
- The weights (convolution kernels) change during learning but are fixed after learning. Once training is done, they do not depend on the input image.
- The activations of each layer's channels (feature maps) are not learned; they are representations of the data. They can look completely different for different images, even though the weights are identical.
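
To make the distinction concrete, here is a minimal pure-Python sketch that applies one fixed kernel to two different images. The hand-written vertical-edge kernel stands in for learned weights, and the tiny 4x4 images and the `conv2d` helper are illustrative assumptions rather than the course's data or code.

```python
def conv2d(image, kernel):
    """'Valid' cross-correlation of a 2D list with a 2D kernel.

    This is the operation deep learning libraries usually call convolution.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# One fixed kernel (responds to left-to-right intensity increases).
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

# Two different inputs: a vertical edge and a horizontal edge.
vertical_edge = [[0, 0, 1, 1] for _ in range(4)]
horizontal_edge = [[0] * 4, [0] * 4, [1] * 4, [1] * 4]

map_a = conv2d(vertical_edge, kernel)
map_b = conv2d(horizontal_edge, kernel)
print("vertical edge   ->", map_a)
print("horizontal edge ->", map_b)
```

The kernel never changes between the two calls, yet the first feature map is strongly active and the second is all zeros.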

>**Code:**
- Part 4 - Examine feature map activations

>**Discussion:**
- Reminder 1: Weights are fixed after training; feature map activations only exist once data is passed through the model.
- Reminder 2: Convolution layers learn features but don't make decisions; fully-connected layers use those features to make decisions.
- Feature map activations can (should!) be qualitatively and quantitatively inspected. Empty or near-identical maps indicate gratuitous complexity.
- Simple models are "easy" to inspect; larger models become increasingly difficult. Develop the habit of thinking critically about model architecture.
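
One way to inspect feature maps quantitatively is to flag maps that are empty or nearly identical to another map. The sketch below is a minimal pure-Python version of that idea. The `audit_feature_maps` helper, the thresholds `empty_tol` and `corr_tol`, and the toy maps are all assumptions chosen for illustration, not values from the course.

```python
from itertools import combinations
from math import sqrt


def flatten(fmap):
    """Flatten a 2D feature map (list of rows) into a 1D list."""
    return [v for row in fmap for v in row]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def audit_feature_maps(maps, empty_tol=1e-6, corr_tol=0.99):
    """Return (indices of empty maps, index pairs of near-identical maps)."""
    flat = [flatten(m) for m in maps]
    empty = [i for i, f in enumerate(flat)
             if max(abs(v) for v in f) < empty_tol]
    redundant = [
        (i, j)
        for i, j in combinations(range(len(flat)), 2)
        if i not in empty and j not in empty
        and abs(pearson(flat[i], flat[j])) > corr_tol
    ]
    return empty, redundant


# Toy activations for four channels of one layer (made-up numbers).
maps = [
    [[0.0, 1.0], [2.0, 3.0]],  # informative
    [[0.0, 2.0], [4.0, 6.0]],  # scaled copy of map 0
    [[0.0, 0.0], [0.0, 0.0]],  # empty
    [[3.0, 0.0], [1.0, 2.0]],  # a different pattern
]
empty, redundant = audit_feature_maps(maps)
print("empty maps:", empty)
print("near-identical pairs:", redundant)
```

A single image is not enough to draw conclusions; in practice such a check would be averaged over many inputs before deciding that channels can be removed.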

>**Code:**
- Part 5 - CodeChallenge Softcoding
- Part 6 - CodeChallenge How wide the Linear Units

## Do autoencoders clean Gaussians?

![Autoencoder and CNN Architecture](./images/image-4.png)

>**Code:**
- Part 7 - Do autoencoders clean Gaussians
- Part 8 - CodeChallenge Autoencoders and occluded Gaussians
- Part 9 - CodeChallenge Custom loss functions

## Discover Gaussian Parameters

![DL vs Statistics](./images/image-6.png)
![DL vs Statistics](./images/image-7.png)

>**Code:**
- Part 10 - Find the Gaussian Parameters
- Part 11 - The EMNIST dataset (letter recognition)

## Dropout in CNN
- Dropout pushes units to learn to be independent of each other.
- But dependence between neighboring pixels is exactly what the kernel weights in a convolutional layer are meant to capture!
- Thus, strong dropout (p=0.5) in convolutional layers might prevent the model from learning spatial dependencies.
- Mild dropout (p=0.1 to 0.2) regularizes by adding noise.
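
The difference between mild and strong dropout can be seen in a minimal pure-Python sketch of element-wise "inverted" dropout, where each value is zeroed with probability p and the survivors are scaled by 1/(1-p). The `dropout` helper and the constant toy activations are assumptions for illustration; this is not the implementation of any particular library.

```python
import random


def dropout(values, p, rng):
    """Inverted dropout: zero each value with probability p and scale
    the surviving values by 1 / (1 - p) so the expected value is unchanged."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    return [v / keep if rng.random() >= p else 0.0 for v in values]


rng = random.Random(0)            # fixed seed for reproducibility
activations = [1.0] * 10_000      # toy activations, all equal to 1

results = {}
for p in (0.1, 0.5):
    dropped = dropout(activations, p, rng)
    zero_fraction = dropped.count(0.0) / len(dropped)
    mean = sum(dropped) / len(dropped)
    results[p] = (zero_fraction, mean)
    print(f"p={p}: {zero_fraction:.1%} zeroed, mean activation {mean:.3f}")
```

With p=0.5 about half of the values inside any kernel window are removed, while p=0.1 leaves most of each local neighborhood intact and acts more like added noise.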

![Dropout in CNN](./images/image-8.png)

>**Code:**
- Part 12 - codeChallengeBeatThis
- Part 13 - CodeChallenge Varying number of channels

>**How to choose the best CNN model?**
- There are so many options for architecture and metaparameters.
- It is literally impossible to evaluate even a reasonably complete set of possible models.
- Some architectures might be good for some datasets and bad for others.
- You can never be 100% confident that any given model (yours or anyone else's) is the best.
- There is no solution :(
- Start from existing models (published papers, GitHub, Kaggle, blogs, etc.). Try to find models that are similar to what you need.
- Use transfer learning.
- Experiment, tweak, guess... be creative and think big.
- Write down each change and the resulting performance!