#### Pacific Warriors Meetup 15-Jan-2020

# **Convolutional Neural Networks**

## What are CNNs?

A core deep learning technique used in image classification, object detection, and machine vision. <br> 
<br>
Inspired by research done on the Visual Cortex of mammals ([Hubel and Wiesel Cat experiment 1959](https://www.youtube.com/watch?v=IOHayh06LJ4))  <br> 
<br>
The visual cortex has a layered architecture. Each layer consists of groups of neurons specialized to recognize different shapes. <br> 
<br>
<img center src='images/cnn-human-vision-analogy.jpeg?raw=1' height=50% width=75% align="center"/>

# Architecture

**Typical Layers**

<br>
Convolutional Layer + Activation Function

Pooling Layer

Fully Connected Layer

<br>

<img src='images/cnn-arch-2.png?raw=1' height=50% width=75%/>

# Convolutional Layer + Activation function 

## Goal
Feature Extraction, Find Spatial Features <br>

Apply a series of image filters, aka '**Convolutional Kernels**', to the input image.<br> 
The filters are trained to recognize low-level features in an image.<br> The activations of each filter across the image form a '**Feature Map**' for that filter. <br> <br>


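**Activation Function** <br>
Introduces non-linearity by transforming each value in the feature map. For example, ReLU sets negative values to zero and leaves positive values unchanged.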
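
As a minimal sketch of what an activation function does to a feature map, the snippet below applies ReLU, one commonly used activation, element by element. The function name `relu` and the 2x3 feature map values are made up for illustration and are not taken from the slides.

```python
def relu(feature_map):
    """Apply ReLU element-wise: negative values become 0, others pass through."""
    return [[max(0, value) for value in row] for row in feature_map]


# Hypothetical 2x3 feature map (illustrative values only).
feature_map = [[-3, 5, 0],
               [12, -7, 4]]

activated = relu(feature_map)
print(activated)  # [[0, 5, 0], [12, 0, 4]]
```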


## Convolution Operation



<img center src='images/convolution-working2.png?raw=1' height=50% width=75% align="center"/>

(`88*1 + 126*0 + 145*1`) + (`86*1 + 125*1 + 142*0`) + (`85*0 + 124*0 + 141*0`) <br>
= (88 + 145) + (86 + 125 ) <br>
= 233 + 211 <br>
= 444
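
The arithmetic above can be reproduced with a short pure-Python sketch. It assumes a square kernel, no padding, and no kernel flip (the element-wise multiply-and-sum that deep learning libraries usually call convolution). The helper name `convolve2d` is hypothetical, and the 3x3 patch and kernel are read off the worked sum above.

```python
def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and return the feature map."""
    k = len(kernel)  # kernel is assumed to be square (k x k)
    out_rows = (len(image) - k) // stride + 1
    out_cols = (len(image[0]) - k) // stride + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            top, left = r * stride, c * stride
            # Multiply the kernel with the patch under it and sum the products.
            total = sum(
                image[top + i][left + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# The 3x3 image patch and kernel from the worked example.
patch = [[88, 126, 145],
         [86, 125, 142],
         [85, 124, 141]]
kernel = [[1, 0, 1],
          [1, 1, 0],
          [0, 0, 0]]

print(convolve2d(patch, kernel))  # [[444]]
```

Passing a larger image produces one value per kernel position, which together make up the feature map.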

Size of the feature map





## Convolutional Layer - Result

<img center src='images/stride-convolution.gif?raw=1' align="center"/>



# Pooling Layer

## Goal

Dimensionality Reduction

Reduces the x-y size of the extracted feature maps and keeps only the most *active* pixel values.<br>
Preserves Spatial Information. <br>
<br>
Few Types of Pooling <br>

*   Max Pooling (most common)
*   Average Pooling
*   Sum Pooling


### Max Pooling Example
A 2x2 pooling kernel with a stride of 2. Only the maximum pixel value in each 2x2 window remains in the new, pooled output.


<img src='images/maxpooling_ex.png?raw=1' height=50% width=50% />
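
A minimal sketch of 2x2 max pooling with a stride of 2 in plain Python. The helper name `max_pool2d` and the 4x4 feature map are hypothetical and are not the values shown in the figure.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum value of each size x size window."""
    pooled = []
    for top in range(0, len(feature_map) - size + 1, stride):
        row = []
        for left in range(0, len(feature_map[0]) - size + 1, stride):
            window = [
                feature_map[top + i][left + j]
                for i in range(size)
                for j in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 feature map (illustrative values only).
feature_map = [[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 0],
               [1, 8, 3, 4]]

print(max_pool2d(feature_map))  # [[6, 4], [8, 9]]
```

The 4x4 input shrinks to 2x2, and each output value still sits in the same relative position as the window it came from.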

# Fully Connected Layer

## Goal

Produce list of class scores & prediction. 

Last Fully Connected layer will have as many nodes as there are classes.

<img center src='https://github.com/codeisi/writeups/blob/master/deck-cnn/images/fully-connected-layers.gif?raw=1' height=50% width=75% align="center"/>
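
As a rough sketch, the last fully connected layer can be seen as one weighted sum per class, with the highest score giving the prediction. The function name `fully_connected` and all feature values, weights, and biases below are made up for illustration; a trained network would learn the weights and biases.

```python
def fully_connected(features, weights, biases):
    """Return one score per class: dot(weights[c], features) + biases[c]."""
    return [
        sum(w * x for w, x in zip(class_weights, features)) + b
        for class_weights, b in zip(weights, biases)
    ]


# Hypothetical flattened feature vector (4 values) and 3 output classes.
features = [1.0, 0.0, 2.0, 1.0]
weights = [[0.5, 1.0, 0.0, 0.5],   # class 0
           [1.0, 0.0, 1.0, 0.0],   # class 1
           [0.0, 0.5, 0.5, 0.0]]   # class 2
biases = [0.0, 0.5, 0.0]

scores = fully_connected(features, weights, biases)
predicted_class = scores.index(max(scores))  # index of the highest score
print(scores, predicted_class)  # [1.0, 3.5, 1.0] 1
```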

# Stride & Padding

## Stride

Number of pixels the filter moves at each step.

<img center src='images/full-padding-no-strides-transposed.gif?raw=1' height=25% width=25%  align="center"/>
<br>
<br>

## Padding

Addition of pixels to the edge of the image. <br>
Preserves border information.<br>

<img center src='images/padding-example.png?raw=1' height=25% width=50%  align="center"/>

<br>
<br>

## Padding Modes

Zeros (default): Add 1 or more layers of zeros (black) to the edges.<br>
Reflection ([Interesting discussion here](https://twitter.com/karpathy/status/720622989289644033)) : Pads with reflection of outer edges of the input image or filter tensor. <br>
Other options: Replicate, Circular 
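
To make the four padding modes concrete, here is a small sketch that pads a single row of pixels on both sides. The function `pad_row` and its mode strings are illustrative names chosen for this example, and the reflect branch assumes the pad width is smaller than the row length.

```python
def pad_row(row, pad, mode="zeros"):
    """Pad a 1-D row of pixels on both sides using the given mode."""
    n = len(row)
    if mode == "zeros":
        left, right = [0] * pad, [0] * pad
    elif mode == "reflect":
        # Mirror around the edge pixel without repeating it (needs pad < n).
        left = [row[i] for i in range(pad, 0, -1)]
        right = [row[n - 1 - i] for i in range(1, pad + 1)]
    elif mode == "replicate":
        # Repeat the edge pixel.
        left, right = [row[0]] * pad, [row[-1]] * pad
    elif mode == "circular":
        # Wrap around to the opposite side.
        left, right = row[n - pad:], row[:pad]
    else:
        raise ValueError(f"unknown padding mode: {mode}")
    return left + row + right


row = [1, 2, 3, 4]
print(pad_row(row, 2, "zeros"))      # [0, 0, 1, 2, 3, 4, 0, 0]
print(pad_row(row, 2, "reflect"))    # [3, 2, 1, 2, 3, 4, 3, 2]
print(pad_row(row, 2, "replicate"))  # [1, 1, 1, 2, 3, 4, 4, 4]
print(pad_row(row, 2, "circular"))   # [3, 4, 1, 2, 3, 4, 1, 2]
```

For a 2-D image the same idea is applied along both the rows and the columns.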


# Number of Parameters in a Convolutional Layer

Number of parameters in the convolutional layer = 
`K*F*F*D_in + K` (weights plus one bias per filter). <br>

K - Number of filters <br>
F - Filter/Kernel size <br>
D_in - Depth of the previous layer; for the input image this is typically 1 or 3 (grayscale and RGB, respectively). <br>
<br>
As the kernel size increases, <br> 
* The number of parameters increases <br>
* The size of the patterns detected increases <br>

As the number of filters increases, the number of parameters increases.
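
The formula can be checked with a few lines of Python. This is a sketch that assumes square filters and one bias per filter; `conv_param_count` is a hypothetical helper name, and the two layer configurations are only examples.

```python
def conv_param_count(num_filters, kernel_size, in_depth):
    """Parameters in a conv layer: K*F*F*D_in weights plus K biases."""
    weights = num_filters * kernel_size * kernel_size * in_depth
    biases = num_filters
    return weights + biases


# K=10 filters of size 3x3 on an RGB input (D_in=3).
print(conv_param_count(10, 3, 3))   # 280
# K=20 filters of size 5x5 on a 10-channel input.
print(conv_param_count(20, 5, 10))  # 5020
```

Growing the kernel from 3x3 to 5x5 multiplies the weights per filter by 25/9, while doubling the number of filters doubles the total.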


# Shape of a Convolutional Layer

The shape of a convolutional layer depends on the supplied values of kernel_size, input_shape, padding, and stride. 

The spatial dimensions of a convolutional layer can be calculated as: `(W_in−F+2P)/S+1`

K - the number of filters <br>
F - filter/kernel size <br>
S - stride <br>
P - padding <br>
W_in - input size (width or height of the previous layer, assumed square) <br>

The depth of the convolutional layer will always equal the number of filters K.
<br><br>

### Sample Calculation (Quiz 6.35)
Consider an input image that is 130x130 (x, y) with a depth of 3 (RGB), passed through the following layers in order: 
<br> <br>
`nn.Conv2d(3, 10, 3)`: stride=1, padding=0, output_size = (130-3+(2x0))/1+1 = 128, depth = 10
<br>
`nn.MaxPool2d(4, 4)`: output_size = 128/4 = 32 <br><br>
`nn.Conv2d(10, 20, 5, padding=2)`: stride=1, output_size = (32-5+(2x2))/1+1 = 32, depth = 20 <br>
`nn.MaxPool2d(2, 2)`: output_size = 32/2 = 16 <br>
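
The same calculation can be traced in plain Python without PyTorch. This sketch assumes square inputs and kernels and uses integer division, which is exact here because every size divides evenly. The helper names `conv_output_size` and `pool_output_size` are hypothetical.

```python
def conv_output_size(w_in, kernel_size, stride=1, padding=0):
    """Spatial size after a convolution: (W_in - F + 2P) / S + 1."""
    return (w_in - kernel_size + 2 * padding) // stride + 1


def pool_output_size(w_in, kernel_size, stride):
    """Spatial size after pooling: the same formula with no padding."""
    return (w_in - kernel_size) // stride + 1


size, depth = 130, 3  # 130x130 RGB input
shapes = []

size, depth = conv_output_size(size, kernel_size=3), 10
shapes.append(("Conv2d(3, 10, 3)", size, depth))

size = pool_output_size(size, kernel_size=4, stride=4)
shapes.append(("MaxPool2d(4, 4)", size, depth))

size, depth = conv_output_size(size, kernel_size=5, padding=2), 20
shapes.append(("Conv2d(10, 20, 5, padding=2)", size, depth))

size = pool_output_size(size, kernel_size=2, stride=2)
shapes.append(("MaxPool2d(2, 2)", size, depth))

for name, s, d in shapes:
    print(f"{name}: {s}x{s}x{d}")
```

The final output is 16x16 with a depth of 20, since pooling changes only the spatial size and the depth always equals the number of filters in the last convolution.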

# Kernel Size Considerations

<table style="width:80%" align=center>
  <tr>
    <th width=50% align=left>Smaller Filter Sizes</th>
    <th width=50% align=left>Larger Filter Sizes</th>
  </tr>
  <tr>
    <td>It has a smaller receptive field as it looks at very few pixels at once.</td>
    <td>Larger receptive field per layer.</td>
  </tr>
  <tr>
    <td>Highly local features extracted without much image overview.</td>
    <td>Quite generic features extracted spread across the image.</td>
  </tr>
  <tr>
    <td>Therefore captures smaller, complex features in the image.</td>
    <td>Therefore captures the basic components in the image.</td>
  </tr>
  <tr>
    <td>The amount of information extracted is vast and may be useful in later layers.</td>
    <td>The amount of information extracted is considerably smaller.</td>
  </tr>
  <tr>
    <td>Slow reduction in the image dimension can make the network deep</td>
    <td>Fast reduction in the image dimension makes the network shallow</td>
  </tr>
  <tr>
    <td>Better weight sharing</td>
    <td>Poorer weight sharing</td>
  </tr>
  <tr>
    <td>In an extreme scenario, using a 1x1 convolution is like treating each pixel as a useful feature.</td>
    <td>Using an image-sized filter is equivalent to a fully connected layer.</td>
  </tr>
</table>
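
The last row of the table can be illustrated with a small sketch: a filter as large as the image fits in only one position, so its single output is the same weighted sum that one fully connected neuron computes over the flattened pixels. Biases are left out, and the function names and the 2x2 values are made up for illustration.

```python
def image_sized_convolution(image, kernel):
    """Convolve with a kernel as large as the image: one position, one output."""
    return sum(
        image[i][j] * kernel[i][j]
        for i in range(len(image))
        for j in range(len(image[0]))
    )


def fully_connected_neuron(inputs, weights):
    """Weighted sum of all inputs, as computed by one fully connected neuron."""
    return sum(w * x for w, x in zip(weights, inputs))


# Hypothetical 2x2 image and a kernel of the same size.
image = [[1, 2],
         [3, 4]]
kernel = [[1, 0],
          [-1, 2]]

# Flatten both so the neuron sees the same pixels and weights.
flat_image = [pixel for row in image for pixel in row]
flat_kernel = [weight for row in kernel for weight in row]

conv_result = image_sized_convolution(image, kernel)
fc_result = fully_connected_neuron(flat_image, flat_kernel)
print(conv_result, fc_result)  # 6 6
```

With K such filters, the layer produces K outputs, matching a fully connected layer with K neurons.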

# References

[Understanding Convolutional Neural Networks](https://towardsdatascience.com/understanding-convolutional-neural-networks-221930904a8e)

[Deciding optimal kernel size for CNN](https://towardsdatascience.com/deciding-optimal-filter-size-for-cnns-d6f7b56f9363)