# 1. Convolutional Neural Networks

## 1.1 Computer Vision

**CV Problems:**
- Image classification
- Object detection (what & where)
- Neural style transfer


**Challenge:**
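- Large images mean huge inputs: a 1000 x 1000 RGB image has 3 million input features, so a fully connected first layer with 1,000 hidden units would need about 3 billion weights.
- CNNs address this by using far fewer parameters per layer.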

## 1.2 Edge Detection

**Convolution Operations:**
- In deep learning, the nine numbers in a 3 x 3 filter are usually not set by hand; they are weights learned by the neural network.
- Thus, neural networks can learn to detect the low-level features (e.g. edges) in the image.
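
As a minimal sketch of the operation, the snippet below applies a hand-set vertical edge filter to a toy 6 x 6 image. The helper name `conv2d_valid` and the toy image are assumptions made for illustration; in a real network the nine filter values would be learned rather than fixed.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a 2D kernel over a 2D image (no padding, stride 1).

    This is cross-correlation, which is what deep learning calls convolution.
    """
    n_h, n_w = image.shape
    f_h, f_w = kernel.shape
    out = np.zeros((n_h - f_h + 1, n_w - f_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Element-wise product of the window and the kernel, then sum.
            out[i, j] = np.sum(image[i:i + f_h, j:j + f_w] * kernel)
    return out

# Toy image: bright left half (10), dark right half (0).
image = np.zeros((6, 6))
image[:, :3] = 10.0

# Hand-set vertical edge detector.
vertical_edge = np.array([[1, 0, -1],
                          [1, 0, -1],
                          [1, 0, -1]], dtype=float)

edges = conv2d_valid(image, vertical_edge)
print(edges)  # every row is [0, 30, 30, 0]
```

The large values in the middle columns mark the vertical edge between the bright and dark halves.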

![](./imgs/Convolution_schematic.gif)

## 1.3 Padding

**Downsides of Convolutions:**
- The image shrinks every time you apply a convolution
    - An n * n image convolved with an f * f filter --> output edge of n - f + 1
- Pixels on the edges are used less often than pixels in the center.

**Padding:**
- Add zeros around the image so the image size can be preserved after the convolution.
    - Output edge: n + 2p - f + 1, where p is the padding, i.e. how many pixels are added on each side of the image.
    
**Valid and Same Convolutions:**
- Valid: no padding (p = 0), so the output edge is n - f + 1
- Same: pad so that the output image has the same size as the input image, i.e. p = (f - 1) / 2 (stride 1, odd f)

![](./imgs/PAD.png)
*Padding of 2*
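
The padding arithmetic can be checked with a short sketch. The helper names `zero_pad` and `same_padding` are hypothetical, and the sketch assumes stride 1 and an odd filter size.

```python
import numpy as np

def zero_pad(image, p):
    """Add p rows/columns of zeros on every side of a 2D image."""
    return np.pad(image, pad_width=p, mode="constant", constant_values=0)

def same_padding(f):
    """Padding that keeps the output size equal to the input (odd f, stride 1)."""
    return (f - 1) // 2

n, f = 6, 3
image = np.ones((n, n))

p = same_padding(f)                # 1 for a 3 x 3 filter
padded = zero_pad(image, p)        # shape (8, 8)

valid_size = n - f + 1             # valid convolution: 4
same_size = n + 2 * p - f + 1      # same convolution: 6
print(padded.shape, valid_size, same_size)
```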

## 1.4 Strided Convolution

**Def:**
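- Move the filter ***S*** pixels at a time (the stride) instead of one pixel
- Output edge (rounded down when the division is not exact): $$ \left\lfloor \frac{n + 2p - f}{S} \right\rfloor + 1 $$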

![](./imgs/summary_con.png)

***This formula is valid for padding as well!***
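
A small sketch of the output-size formula, using integer division for the floor. The function name `conv_output_size` is an assumption, not something from the course material.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output edge length for an n x n input, f x f filter, padding p, stride s."""
    return (n + 2 * p - f) // s + 1

print(conv_output_size(7, 3, p=0, s=2))  # 3
print(conv_output_size(6, 3, p=0, s=1))  # 4 (valid convolution)
print(conv_output_size(6, 3, p=1, s=1))  # 6 (same convolution)
```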

## 1.5 Convolutions Over Volume

**Def on RGB Images:**
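- Change the filter to 3D (f * f * 3), with one slice per input channel
- The output of one filter is still 2D (a single channel) because the products are summed over all positions and channels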

**Multiple Filters:**
- Apply the 3D RGB convolution separately for each filter
- Stack all the outputs along the channel dimension at the end: $n' \times n' \times n_{\text{filters}}$, where $n' = n - f + 1$

![](./imgs/multifilter.png)
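
The following is a minimal, loop-based sketch of convolution over a volume with several filters (valid padding, stride 1). The function name `conv_volume` and the random inputs are assumptions for illustration; it is written for readability, not speed.

```python
import numpy as np

def conv_volume(image, filters):
    """Convolve an (n_h, n_w, n_c) image with (f, f, n_c, n_filters) filters."""
    n_h, n_w, n_c = image.shape
    f, _, f_c, n_filters = filters.shape
    assert f_c == n_c, "filter depth must match the number of input channels"

    out_h, out_w = n_h - f + 1, n_w - f + 1
    out = np.zeros((out_h, out_w, n_filters))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + f, j:j + f, :]          # (f, f, n_c) window
            for k in range(n_filters):
                # One number per filter: sum over height, width and channels.
                out[i, j, k] = np.sum(patch * filters[:, :, :, k])
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((6, 6, 3))        # toy RGB image
filters = rng.standard_normal((3, 3, 3, 2))   # two 3 x 3 x 3 filters

output = conv_volume(image, filters)
print(output.shape)  # (4, 4, 2)
```

Each filter contributes one 2D slice, and the slices are stacked along the last axis.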

## 1.6 One Layer of a Convolutional Network

**Notations:**

If layer $l$ is a convolution layer, 
- $f^{[l]}$ = filter size 
- $p^{[l]}$ = padding 
- $s^{[l]}$ = stride size 
- $n_{C}^{[l]}$ = number of filters 


Given input image size

$$ n_{H}^{[l-1]} \times n_{W}^{[l-1]} \times n_{C}^{[l-1]} $$

Output size is 

$$ n_{H}^{[l]} \times n_{W}^{[l]} \times n_{C}^{[l]} $$

where

$$ n_{H}^{[l]} = \left\lfloor \frac{n_{H}^{[l-1]} + 2p^{[l]} - f^{[l]}}{s^{[l]}} + 1 \right\rfloor $$


In layer $l$, 
- Each filter is: $f^{[l]} \times f^{[l]} \times n_{C}^{[l-1]}$
- Activations: $a^{[l]} \rightarrow n_{H}^{[l]} \times n_{W}^{[l]} \times n_{C}^{[l]}$
- Weights: $f^{[l]} \times f^{[l]} \times n_{C}^{[l-1]} \times n_{C}^{[l]}$
- Bias: $n_{C}^{[l]}$
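
To make the notation concrete, here is a small sketch that computes the output shape and parameter counts of one layer from the formulas above. The function name `conv_layer_summary` and the example sizes (39 x 39 x 3 input, ten 3 x 3 filters) are assumptions chosen for illustration.

```python
def conv_layer_summary(n_h_prev, n_w_prev, n_c_prev, f, p, s, n_c):
    """Output shape and parameter counts for one convolution layer."""
    n_h = (n_h_prev + 2 * p - f) // s + 1
    n_w = (n_w_prev + 2 * p - f) // s + 1
    weights = f * f * n_c_prev * n_c   # one f x f x n_c_prev filter per output channel
    biases = n_c                       # one bias per filter
    return {
        "output_shape": (n_h, n_w, n_c),
        "weights": weights,
        "biases": biases,
        "total_params": weights + biases,
    }

summary = conv_layer_summary(39, 39, 3, f=3, p=0, s=1, n_c=10)
print(summary)
# output_shape (37, 37, 10), weights 270, biases 10, total_params 280
```

Note that the parameter count does not depend on the input height or width, only on the filter size and channel counts.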


![](./imgs/eg-conv.png)

**LeNet5**

![](./imgs/lenet5.png)

![](./imgs/eg-convparm.png)
*Note: the numbers in this figure are wrong.*

## 1.7 Pooling Layers

**Def:**
- Only two hyperparameters, both fixed in advance: filter size (square) & stride
    - No parameters to learn!
- Two types 
    - max : used much more often
    - average : sometimes used in very deep layers to collapse the representation    


**Intuitive:**
- No rigorous proof, but intuitively max pooling records whether a feature is present in each region
- If the feature is detected anywhere in the pooling window, the output keeps a high number; if not, the max stays small
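
A minimal sketch of both pooling types on a single-channel input follows. The function name `pool2d` and the toy 4 x 4 input are assumptions for illustration.

```python
import numpy as np

def pool2d(x, f, s, mode="max"):
    """Pool a 2D array with an f x f window and stride s (no padding)."""
    n_h, n_w = x.shape
    out_h = (n_h - f) // s + 1
    out_w = (n_w - f) // s + 1
    reduce_fn = np.max if mode == "max" else np.mean
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * s:i * s + f, j * s:j * s + f]
            out[i, j] = reduce_fn(window)
    return out

x = np.array([[1, 3, 2, 1],
              [2, 9, 1, 1],
              [1, 3, 2, 3],
              [5, 6, 1, 2]], dtype=float)

max_out = pool2d(x, f=2, s=2, mode="max")
avg_out = pool2d(x, f=2, s=2, mode="average")
print(max_out)  # [[9, 2], [6, 3]]
print(avg_out)  # [[3.75, 1.25], [3.75, 2.0]]
```

The function has no weights at all, which is why pooling layers contribute nothing to the parameter count.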

## 1.8 Why Convolutions?

**Parameter Sharing:**
- A feature detector (e.g. an edge detector) that is useful in one part of the image is probably useful in another part of the image

**Sparsity of Connections:**
- In each layer, each output value depends only on a small number of inputs.
- In other words, each output pixel depends only on the input pixels covered by the filter at that position

**Translation Invariance:**
- Because the same filter is applied everywhere in the image, a feature can be detected regardless of where it appears
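
The effect of parameter sharing and sparse connections can be illustrated with a rough parameter count. The sizes below (a 32 x 32 x 3 input mapped to a 28 x 28 x 6 output with 5 x 5 filters) and the helper names are assumptions chosen for illustration.

```python
def conv_params(f, n_c_prev, n_c):
    """Parameters of a conv layer: one f x f x n_c_prev filter plus a bias per output channel."""
    return (f * f * n_c_prev + 1) * n_c

def dense_params(n_in, n_out):
    """Parameters of a fully connected layer: a weight per input/output pair plus biases."""
    return n_in * n_out + n_out

n_in = 32 * 32 * 3      # 3072 input values
n_out = 28 * 28 * 6     # 4704 output values

conv_total = conv_params(f=5, n_c_prev=3, n_c=6)
dense_total = dense_params(n_in, n_out)
inputs_per_output = 5 * 5 * 3   # sparsity: each output sees 75 inputs, not 3072

print(conv_total)         # 456
print(dense_total)        # 14455392
print(inputs_per_output)  # 75
```

Under these assumptions the convolution layer needs 456 parameters where a fully connected layer between the same volumes would need over 14 million.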


----------

# Quiz

**1. Because pooling layers do not have parameters, they do not affect the backpropagation (derivatives) calculation.**

***FALSE***
- The pooling layers do not have any parameters, so there is nothing for back propagation to change about them. So "back propagation does not affect the pooling layers"
- Back propagation must pass backwards through the pooling layers and the way it works depends on the type of pooling. For max pooling, the gradients are applied only to the maximum of the inputs to the pooling layer. For average pooling, the gradients are applied proportionally to all the inputs. So pooling layers do affect back propagation: something happens at the pooling layers during back propagation.

*By Paul Mielke*
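
As a sketch of the answer above, the snippet below routes an upstream gradient back through a single pooling window. The function names are hypothetical, and the max version assumes that ties (if any) all receive the gradient, which is a simplification.

```python
import numpy as np

def max_pool_window_backward(window, upstream_grad):
    """Pass the gradient only to the position(s) holding the window maximum."""
    mask = (window == window.max())
    return mask * upstream_grad

def avg_pool_window_backward(window, upstream_grad):
    """Spread the gradient equally over every position in the window."""
    return np.full(window.shape, upstream_grad / window.size)

window = np.array([[1.0, 3.0],
                   [2.0, 9.0]])
upstream_grad = 4.0

max_grad = max_pool_window_backward(window, upstream_grad)
avg_grad = avg_pool_window_backward(window, upstream_grad)
print(max_grad)  # [[0, 0], [0, 4]]
print(avg_grad)  # [[1, 1], [1, 1]]
```

Neither function updates any parameter, yet both change how the gradient reaches the earlier layers.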


**2. Which of the following statements about parameter sharing in ConvNets are true? (Check all that apply.)**
- It allows gradient descent to set many of the parameters to zero, thus making the connections sparse. ***FALSE***
- It reduces the total number of parameters, thus reducing overfitting. ***TRUE***
- It allows parameters learned for one task to be shared even for a different task (transfer learning). ***FALSE***
- It allows a feature detector to be used in multiple locations throughout the whole input image/input volume. ***TRUE***

---

# Assignments

**Implemented forward- and back-prop CNN!**
- How backprop is implemented in convolution, max pooling, and average pooling layers