# Convolutional Neural Networks
---

### Computer Vision
* The main architecture for computer vision applications is the **convolutional neural network (CNN)**
    * A key problem is input size. Large, detailed images have many pixels, so a fully connected network fed the raw pixels would need a huge number of parameters and a lot of computation.
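To make the cost concrete, the sketch below counts the parameters in the first layer of a fully connected network that takes raw pixels as input. The image size (1000 x 1000 x 3) and the 1000 hidden units are assumed values chosen for illustration.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Parameters of one fully connected layer: a weight per (input, unit) pair plus one bias per unit."""
    return n_inputs * n_units + n_units

# Illustrative (assumed) sizes: a 1000 x 1000 RGB image feeding 1000 hidden units.
n_inputs = 1000 * 1000 * 3
params = dense_params(n_inputs, 1000)
print(f"{n_inputs:,} inputs -> {params:,} parameters in the first layer")
```

Roughly 3 billion weights for a single layer is what makes fully connected networks impractical on large images. Convolution layers avoid this cost.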
---

### Edge Detection Example

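* A motivating example for understanding the convolution operation
* filter/kernel: a small matrix (here $3 \times 3$) that the image is convolved with
    * in the example, its columns are 1s, then 0s, then -1s
* convolution process:
    * place the filter over a $3 \times 3$ region of the image (starting at the top-left) and multiply each filter cell by the image cell beneath it
    * add up all the products and put that value in the corresponding cell of the output matrix
    * shift the filter by the stride (1 in this example) and repeat until the filter has covered the whole image
* Ng illustrates why the filter in the example is a vertical edge detector. On an image whose left half is bright and whose right half is dark, the output is large only where the filter straddles the boundary.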
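The process above can be sketched in plain Python. This is a minimal illustration rather than an efficient implementation, and the names `conv2d`, `image`, and `vertical` are hypothetical. It reproduces the vertical edge example: a bright left half next to a dark right half produces nonzero values only where the filter overlaps the boundary.

```python
from typing import List

Matrix = List[List[float]]

def conv2d(image: Matrix, kernel: Matrix, stride: int = 1) -> Matrix:
    """'Valid' 2D convolution as used in deep learning (no kernel flip, no padding)."""
    n_h, n_w = len(image), len(image[0])
    f_h, f_w = len(kernel), len(kernel[0])
    out_h = (n_h - f_h) // stride + 1
    out_w = (n_w - f_w) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            r0, c0 = i * stride, j * stride  # top-left corner of the current window
            # Multiply each kernel cell by the image cell beneath it, then sum.
            total = sum(
                kernel[a][b] * image[r0 + a][c0 + b]
                for a in range(f_h)
                for b in range(f_w)
            )
            row.append(total)
        out.append(row)
    return out

# 6x6 image: bright (10) left half, dark (0) right half, so one vertical edge in the middle.
image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]
vertical = [[1, 0, -1] for _ in range(3)]  # columns of 1s, 0s, -1s

for row in conv2d(image, vertical):
    print(row)  # each row is [0, 30, 30, 0]
```

The detected edge looks thick (two columns of 30) only because the 4 x 4 output is tiny. On a realistically sized image, the same band would be thin relative to the image.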
---

### More Edge Detection 
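* Ng continues with more vertical edge detector examples and then a horizontal edge detector, whose rows (rather than columns) are 1s, 0s, then -1s
* There has been debate in the literature about which numbers make the best hand-designed filters:
    * Sobel filter: puts more weight on the central row, $\begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix}$
    * Scharr filter: $\begin{bmatrix} 3 & 0 & -3 \\ 10 & 0 & -10 \\ 3 & 0 & -3 \end{bmatrix}$
* In deep learning, you don't have to hand-pick the 9 numbers. Instead, you can **learn** them with backpropagation by treating each cell of the filter as a parameter to tune.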
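Below are the filters mentioned above, written out as 3 x 3 matrices. Transposing a vertical edge detector gives the matching horizontal detector. The `learnable` filter is a hypothetical stand-in for the deep learning approach: its 9 entries start as small random values, and backpropagation would tune them. No training happens in this sketch.

```python
import random
from typing import List

Matrix = List[List[float]]

def transpose(m: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

# Hand-designed vertical edge detectors (columns: positive, zero, negative).
vertical = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]
sobel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]      # more weight on the central row
scharr = [[3, 0, -3], [10, 0, -10], [3, 0, -3]]

# Transposing gives the matching horizontal edge detectors (rows: positive, zero, negative).
for name, filt in [("vertical", vertical), ("sobel", sobel), ("scharr", scharr)]:
    print(name, "->", transpose(filt))

# Hypothetical learned filter: 9 small random starting values that training would adjust.
random.seed(0)
learnable = [[random.gauss(0.0, 0.1) for _ in range(3)] for _ in range(3)]
print(sum(len(row) for row in learnable), "trainable parameters")  # 9 trainable parameters
```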
----


### Padding 
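* Problems that padding can alleviate:
    * shrinking output: convolving an $n \times n$ image with an $f \times f$ filter gives an $(n - f + 1) \times (n - f + 1)$ output, so the image shrinks at every layer
    * throwing away information from the edges: corner and edge pixels are covered by far fewer filter positions than central pixels
* Padding adds a border of $p$ pixels (usually zeros) around the image
    * thus, the output size with padding is $(n + 2p - f + 1)$
* valid convolution: no padding, i.e. the output size is $n - f + 1$
* same convolution: pad so that the output size equals the input size, i.e. $n + 2p - f + 1 = n$, which gives $p = \dfrac{f - 1}{2}$
* $f$ is usually odd. This makes same padding symmetric (an integer $p$) and gives the filter a central pixel, which some of the literature treats as important.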
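The padding formulas above can be checked with a small sketch. The helper names `zero_pad`, `output_size`, and `same_padding` are hypothetical, and the sketch assumes stride 1 and zero padding.

```python
from typing import List

Matrix = List[List[float]]

def zero_pad(image: Matrix, p: int) -> Matrix:
    """Surround the image with a border of p zeros on every side."""
    width = len(image[0]) + 2 * p
    top = [[0] * width for _ in range(p)]
    middle = [[0] * p + list(row) + [0] * p for row in image]
    bottom = [[0] * width for _ in range(p)]
    return top + middle + bottom

def output_size(n: int, f: int, p: int) -> int:
    """Side length of the output for stride 1: n + 2p - f + 1."""
    return n + 2 * p - f + 1

def same_padding(f: int) -> int:
    """Padding for a 'same' convolution with stride 1: p = (f - 1) / 2, an integer only for odd f."""
    if f % 2 == 0:
        raise ValueError("same padding is only symmetric for odd f")
    return (f - 1) // 2

n, f = 6, 3
print(output_size(n, f, p=0))  # valid convolution: 4
p = same_padding(f)
print(p, output_size(n, f, p))  # same convolution: 1 6
print(zero_pad([[1, 2], [3, 4]], 1))
```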
---


### Strided Convolutions
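* stride $s$: how many pixels the filter moves between successive positions (both horizontally and vertically)
* for an $n \times n$ input, the output size will then be $\Bigl\lfloor\dfrac{n + 2p - f}{s} + 1\Bigr\rfloor$ by $\Bigl\lfloor\dfrac{n + 2p - f}{s} + 1\Bigr\rfloor$. The floor means the filter is only applied where it fits entirely inside the (padded) image.
* Ng notes that "convolution" in deep learning is defined a little differently from convolution in math and signal processing, where the filter is first flipped horizontally and vertically
    * strictly, the deep learning operation should be called cross-correlation, but by convention it is still called convolution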
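A minimal sketch of the strided output size and of the cross-correlation vs. convolution distinction follows. It assumes square images and kernels, and the function names are hypothetical. `true_convolve` is the math/signal-processing definition: it flips the kernel on both axes and then cross-correlates.

```python
import math
from typing import List

Matrix = List[List[float]]

def output_size(n: int, f: int, p: int = 0, s: int = 1) -> int:
    """floor((n + 2p - f) / s) + 1: only positions where the filter fits entirely are used."""
    return math.floor((n + 2 * p - f) / s) + 1

def cross_correlate(image: Matrix, kernel: Matrix, s: int = 1) -> Matrix:
    """What deep learning calls 'convolution': slide the kernel as-is with stride s (no padding)."""
    f = len(kernel)
    out_n = output_size(len(image), f, 0, s)
    return [
        [
            sum(kernel[a][b] * image[i * s + a][j * s + b] for a in range(f) for b in range(f))
            for j in range(out_n)
        ]
        for i in range(out_n)
    ]

def true_convolve(image: Matrix, kernel: Matrix, s: int = 1) -> Matrix:
    """Convolution in the mathematical sense: flip the kernel vertically and horizontally first."""
    flipped = [row[::-1] for row in kernel[::-1]]
    return cross_correlate(image, flipped, s)

print(output_size(7, 3, p=0, s=2))  # 3
print(output_size(6, 3, p=0, s=2))  # floor(1.5) + 1 = 2

image = [[r * 7 + c for c in range(7)] for r in range(7)]  # 7x7, values 0..48
corner = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]  # asymmetric kernel, so flipping matters
print(cross_correlate(image, corner, s=2)[0][0], true_convolve(image, corner, s=2)[0][0])  # 0 16
```

For symmetric kernels, the two operations agree. Because the filter values are learned anyway, skipping the flip costs nothing in deep learning.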
---
    

### Convolutions Over Volume 
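* to convolve a volume (e.g. an RGB image), the filter must have the same number of channels as the input, i.e. size (f_height, f_width, num_channels_in_input)
    * Ng has an example convolving a (6 x 6 x 3) image with a (3 x 3 x 3) filter
* The 3D filter/kernel is basically a 2D filter for each channel. At each position, all $3 \times 3 \times 3 = 27$ products are summed into a single number, so one filter produces a 2D output.
    * Each channel's 2D slice can hold different numbers depending on what you want to find, e.g. vertical edges in the red channel only
* To create a 3D output volume, apply 2 or more filters to the same input and stack their 2D outputs. The number of output channels equals the number of filters.
    * Ng illustrates that 2 filters of size (3 x 3 x 3) on a (6 x 6 x 3) image result in a (4 x 4 x 2) output volume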
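The sketch below extends the 2D idea to volumes. Each filter sums $f \cdot f \cdot n_C$ products per position, and the 2D maps from all filters are stacked as channels. It assumes valid convolution with stride 1 and `[row][col][channel]` indexing. The two example filters are hypothetical.

```python
from typing import List, Tuple

# Volumes are indexed [row][col][channel]; a filter is f x f x n_c (same n_c as the input).
Volume = List[List[List[float]]]

def conv_volume(image: Volume, filters: List[Volume]) -> Volume:
    """Valid, stride-1 convolution of an n x n x n_c input with several f x f x n_c filters."""
    n, n_c = len(image), len(image[0][0])
    f = len(filters[0])
    out_n = n - f + 1
    out = [[[0.0] * len(filters) for _ in range(out_n)] for _ in range(out_n)]
    for k, filt in enumerate(filters):
        if len(filt[0][0]) != n_c:
            raise ValueError("filter channels must match input channels")
        for i in range(out_n):
            for j in range(out_n):
                # One number per position: sum over height, width AND channels.
                out[i][j][k] = sum(
                    filt[a][b][c] * image[i + a][j + b][c]
                    for a in range(f) for b in range(f) for c in range(n_c)
                )
    return out

def shape(v: Volume) -> Tuple[int, int, int]:
    return (len(v), len(v[0]), len(v[0][0]))

image = [[[1.0, 2.0, 3.0] for _ in range(6)] for _ in range(6)]  # 6 x 6 x 3
# Hypothetical filters: vertical edges in the red channel only, and a plain sum over all channels.
red_vertical = [[[1, 0, 0], [0, 0, 0], [-1, 0, 0]] for _ in range(3)]
all_ones = [[[1, 1, 1] for _ in range(3)] for _ in range(3)]

out = conv_volume(image, [red_vertical, all_ones])
print(shape(out))  # (4, 4, 2)
```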
----
    
### One Layer of a Convolutional Network
* One layer of a convolutional network is as follows:
    * for each filter: input image (convolved_by) filter $=$ convolved_matrix $\rightarrow$ ReLU(convolved_matrix + $b$) $=$ output_matrix, where $b$ is a single real number (one bias per filter) added to every cell
    * stacking the output matrices of all filters gives the layer's output volume
* The parameters to tune are the cells of each filter plus the biases. Ng gives an example where an image is convolved with 10 filters of size (3x3x3). Each filter has 27 weights + 1 bias = 28 parameters, so 10 filters give 280 parameters in total.
    * One **really** good property of CNNs is that the number of parameters does not depend on the input size. In this case, there are 280 parameters whether the image is (3x3x3) or (1000x1000x3), which also makes CNNs less prone to overfitting.
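The parameter count above can be reproduced with a small sketch, alongside the bias-plus-ReLU step applied to one filter's convolved matrix. The names `conv_layer_params` and `layer_output` are hypothetical. The parameter count takes no image height or width, which is exactly why it stays fixed as the input grows.

```python
from typing import List

Matrix = List[List[float]]

def conv_layer_params(f: int, n_c_prev: int, n_filters: int) -> int:
    """(f * f * n_c_prev weights + 1 bias) per filter; the input's height/width never appear."""
    return (f * f * n_c_prev + 1) * n_filters

def layer_output(convolved: Matrix, b: float) -> Matrix:
    """ReLU(convolved_matrix + b) for a single filter; b is one scalar shared by every cell."""
    return [[max(0.0, z + b) for z in row] for row in convolved]

print(conv_layer_params(f=3, n_c_prev=3, n_filters=10))  # 280
print(layer_output([[-2.0, 1.0], [0.5, -0.5]], b=0.5))   # [[0.0, 1.5], [1.0, 0.0]]
```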


* Notation of CNNs
    * If layer $l$ is a convolution layer:
        * $f^{[l]}$ = filter size
        * $p^{[l]}$ = padding
        * $s^{[l]}$ = stride
        * $n^{[l]}_C$ = number of filters
        * Input size: $n^{[l-1]}_H \times n^{[l-1]}_W \times n^{[l-1]}_C$
        * convolved_matrix size: $n^{[l]}_H \times n^{[l]}_W \times n^{[l]}_C$
            * $n^{[l]}_H = \Bigl\lfloor \dfrac{n^{[l-1]}_H + 2p^{[l]} - f^{[l]}}{s^{[l]}} + 1 \Bigr\rfloor$
            * $n^{[l]}_W = \Bigl\lfloor \dfrac{n^{[l-1]}_W + 2p^{[l]} - f^{[l]}}{s^{[l]}} + 1 \Bigr\rfloor$
            * $n^{[l]}_C$ is the number of filters (defined above)
        * Each filter is $f^{[l]} \times f^{[l]} \times n^{[l-1]}_C$
        * Activations: $a^{[l]}$ has size $n^{[l]}_H \times n^{[l]}_W \times n^{[l]}_C$, the same as the convolved_matrix
            * Thus, for $m$ examples, the activations of the whole set $A^{[l]}$ have size $m \times n^{[l]}_H \times n^{[l]}_W \times n^{[l]}_C$
        * Weights size: $f^{[l]} \times f^{[l]} \times n^{[l-1]}_C \times n^{[l]}_C$
        * Bias size: $1 \times 1 \times 1 \times n^{[l]}_C$
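The notation above maps directly onto a shape calculator. The sketch below is hypothetical: the function name, the dictionary keys, and the batch size `m=64` are illustrative choices. It is checked against the (6 x 6 x 3) input with 2 filters of size (3 x 3 x 3) from the volume example.

```python
import math
from typing import Dict, Tuple

def conv_layer_shapes(n_h_prev: int, n_w_prev: int, n_c_prev: int,
                      f: int, p: int, s: int, n_c: int, m: int = 1) -> Dict[str, Tuple[int, ...]]:
    """Shapes for conv layer l, given layer l-1's activation shape and layer l's hyperparameters."""
    n_h = math.floor((n_h_prev + 2 * p - f) / s) + 1
    n_w = math.floor((n_w_prev + 2 * p - f) / s) + 1
    return {
        "a[l]": (n_h, n_w, n_c),           # one example
        "A[l]": (m, n_h, n_w, n_c),        # m examples
        "filter": (f, f, n_c_prev),        # each individual filter
        "W[l]": (f, f, n_c_prev, n_c),     # all filters stacked
        "b[l]": (1, 1, 1, n_c),            # one bias per filter
    }

shapes = conv_layer_shapes(6, 6, 3, f=3, p=0, s=1, n_c=2, m=64)
for name, dims in shapes.items():
    print(name, dims)
```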

---


### Simple Convolutional Network Example 
* Ng walks through a convolutional network:
    * input $\rightarrow$ (3x3) CONV $\rightarrow$ (5x5) CONV $\rightarrow$ (5x5) CONV $\rightarrow$ flatten to a vector $\rightarrow$ logistic/softmax function $\rightarrow \hat{y}$
    * the general trend is that height and width gradually go down as you go deeper into the network, while the number of channels goes up
* Types of layers in a convolutional network:
    * Convolution
    * Pooling 
    * Fully connected 
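The shrinking-height, growing-channels trend can be traced with the output-size formula. In the sketch below, the input (39 x 39 x 3), the strides, the zero padding, and the filter counts (10, 20, 40) are assumed values, chosen to match the lecture's walkthrough as best recalled. Treat them as illustrative.

```python
import math

def conv_out(n: int, f: int, p: int, s: int) -> int:
    """Output side length: floor((n + 2p - f) / s) + 1."""
    return math.floor((n + 2 * p - f) / s) + 1

# (f, p, s, number_of_filters) for each CONV layer; assumed illustrative sizes.
layers = [(3, 0, 1, 10), (5, 0, 2, 20), (5, 0, 2, 40)]

n, n_c = 39, 3  # assumed 39 x 39 x 3 input
trace = [(n, n, n_c)]
for f, p, s, n_filters in layers:
    n, n_c = conv_out(n, f, p, s), n_filters
    trace.append((n, n, n_c))
flat = n * n * n_c  # length of the vector fed to the logistic/softmax unit

print(trace)  # [(39, 39, 3), (37, 37, 10), (17, 17, 20), (7, 7, 40)]
print(flat)   # 1960
```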
----
### Pooling Layers 
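* Pooling reduces the size of the representation (speeding up computation) and makes some of the detected features more robust
* Ng illustrates the max pooling process:
    * slide an $f \times f$ window over the input with stride $s$, take the maximum of the cells inside the window, and put it in the corresponding cell of the max-pooled matrix
* Intuition: if a feature was detected anywhere in a region of the matrix/image, its large value is preserved in the output
* No parameters to learn! There are only the hyperparameters $f$ and $s$, which stay fixed (a common choice is $f = 2$, $s = 2$, which halves height and width).
* Ng illustrates an example
* For 3D inputs, do max pooling independently on each of the $n_C$ channels, so the output has the same number of channels as the input
* Average pooling also exists, but it is used less often than max pooling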
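The pooling step above can be sketched for a single 2D channel as follows. For a 3D input, you would apply the same function to each channel separately. The names and the 4 x 4 input are hypothetical, with $f = 2$, $s = 2$. Nothing in the sketch is learned.

```python
from typing import Callable, List

Matrix = List[List[float]]

def pool2d(x: Matrix, f: int, s: int, reduce_fn: Callable[[List[float]], float]) -> Matrix:
    """Slide an f x f window with stride s over one channel and summarize each window."""
    out_n = (len(x) - f) // s + 1
    return [
        [reduce_fn([x[i * s + a][j * s + b] for a in range(f) for b in range(f)])
         for j in range(out_n)]
        for i in range(out_n)
    ]

def max_pool(x: Matrix, f: int = 2, s: int = 2) -> Matrix:
    return pool2d(x, f, s, max)

def avg_pool(x: Matrix, f: int = 2, s: int = 2) -> Matrix:
    return pool2d(x, f, s, lambda window: sum(window) / len(window))

x = [[1, 3, 2, 1],
     [2, 9, 1, 1],
     [1, 3, 2, 3],
     [5, 6, 1, 2]]
print(max_pool(x))  # [[9, 2], [6, 3]]
print(avg_pool(x))  # [[3.75, 1.25], [3.75, 2.0]]
```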
---


### Convolutional Neural Network Example 
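* Ng gives an example of a CNN, inspired by LeNet-5
* 2 conventions for counting layers:
    * treat CONV1 and POOL1 as one layer (Ng's choice, since pooling has no weights and people usually count only layers with weights)
    * treat CONV1 and POOL1 as separate layers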
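To see why grouping CONV and POOL is natural, the sketch below tabulates activation shapes and parameter counts for one LeNet-5-style configuration. The sizes are assumptions for illustration: a 32 x 32 x 3 input, 6 and 16 filters of size 5 with stride 1, 2 x 2 max pooling with stride 2, fully connected layers of 120 and 84 units, and 10 softmax outputs. The activation shapes were worked out by hand with the output-size formula.

```python
def conv_params(f: int, n_c_prev: int, n_filters: int) -> int:
    """Weights plus one bias per filter."""
    return (f * f * n_c_prev + 1) * n_filters

def fc_params(n_in: int, n_out: int) -> int:
    """Dense weight matrix plus one bias per unit."""
    return n_in * n_out + n_out

# (layer, activation shape, number of parameters); assumed LeNet-5-style sizes.
rows = [
    ("input", (32, 32, 3), 0),
    ("CONV1", (28, 28, 6), conv_params(5, 3, 6)),
    ("POOL1", (14, 14, 6), 0),                 # pooling has no parameters
    ("CONV2", (10, 10, 16), conv_params(5, 6, 16)),
    ("POOL2", (5, 5, 16), 0),
    ("FC3", (120,), fc_params(5 * 5 * 16, 120)),
    ("FC4", (84,), fc_params(120, 84)),
    ("softmax", (10,), fc_params(84, 10)),
]

for name, shape, params in rows:
    print(f"{name:8} {str(shape):14} {params:>6}")
print("total", sum(params for _, _, params in rows))
```

In this configuration, the POOL rows contribute no parameters and most parameters sit in the fully connected layers, while activation sizes shrink steadily.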