# Convolutional Neural Networks

<hr>

**Gentle introduction to CNNs**<br>

The key components of a CNN are the following:

1. **Patch filtering**

    A filter takes a small patch of the image as its input. The filter's weights are learned so that it recognizes a feature of the image that we would like to capture.

    If the input patch contains a feature similar to the filter, then the $\text{ReLU}$ activation function generates a large response.

    We break the image into patches of the same size, pass each patch through the filter weights, and collect the activations into a *feature map*. Each entry of the feature map answers the question: how much of the target feature exists in this patch?
    
    <img alt="Patch Filter" src="assets/patch_filter.png" width="400">
    
    <img alt="Convolution" src="assets/convolution_to_feature_map.png" width="400">

    
    
2. **Pooling**

    Since we only want to know whether the feature exists, not where it is, we run *pooling* to take the maximum value of each patch of the *feature map* from (1). These maximum values form a *pooled map*, which has smaller dimensions than the feature map.
    
    <img alt="Pooling" src="assets/pooling.png" width="400">
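
As a rough illustration of how pooling shrinks a feature map, here is a minimal NumPy sketch of max pooling over non-overlapping windows. The function name `max_pool2d`, the 2x2 window size, and the feature map values are assumptions made up for this example; they are not taken from the figures above.

```python
import numpy as np

def max_pool2d(feature_map, size=2):
    """Max pooling over non-overlapping size x size windows (hypothetical helper)."""
    h, w = feature_map.shape
    h_out, w_out = h // size, w // size
    # Drop any rows/columns that do not fill a complete window
    trimmed = feature_map[:h_out * size, :w_out * size]
    # Split into windows: axis 1 and axis 3 index positions inside each window
    windows = trimmed.reshape(h_out, size, w_out, size)
    return windows.max(axis=(1, 3))

# Made-up 4x4 feature map
feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 1, 5, 2],
                        [2, 0, 3, 6]], dtype=float)

pooled = max_pool2d(feature_map)
print(pooled.shape)  # (2, 2): half the size of the feature map in each dimension
print(pooled)
```

Each entry of `pooled` keeps only the strongest response in its window, so the exact position of the feature inside the window is discarded.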

    
****

**Convolution operation to create a feature map**

Here, we formally define the convolution as an operation between two functions $f$ and $g$:

$(f * g)(t) \equiv \int _{-\infty }^{+\infty } f(\tau )g(t-\tau )d\tau$

where $f$ is the image, $g$ is the patch filter, $\tau$ is the dummy variable of integration, and $t$ is the shift at which the overlap is evaluated. Intuitively, convolution *blends* the two functions by measuring how much they overlap. If an image patch (the input signal $f$) contains the feature encoded by the filter $g$, we expect the convolution to produce a large response, i.e. a large area under the curve of the product of the two functions (the overlap).
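
For discrete signals the integral becomes a sum, $(f * g)[t] = \sum_{\tau} f[\tau]\, g[t - \tau]$. The sketch below writes that sum out as a loop and compares it with NumPy's `np.convolve`. The helper name `conv1d_full` and the toy values of `f` and `g` are assumptions chosen only for illustration.

```python
import numpy as np

def conv1d_full(f, g):
    """Discrete version of (f * g)[t] = sum over tau of f[tau] * g[t - tau]."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    out = np.zeros(len(f) + len(g) - 1)
    for t in range(len(out)):
        for tau in range(len(f)):
            # g is read at index t - tau, so it is effectively flipped as it slides over f
            if 0 <= t - tau < len(g):
                out[t] += f[tau] * g[t - tau]
    return out

f = [1.0, 2.0, 3.0]  # toy signal (made-up values)
g = [0.0, 1.0, 0.5]  # toy filter (made-up values)

manual = conv1d_full(f, g)
reference = np.convolve(f, g)  # NumPy's full discrete convolution
print(manual)
print(np.allclose(manual, reference))
```

The output is largest where the filter and the signal overlap most strongly, which is the "amount of overlap" intuition described above.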

Suppose $f$ is a 2D discrete signal and $g'$ is a filter, defined as follows:

$f = \begin{bmatrix}  1 &  2 &  1 \\ 2 &  1 &  1 \\ 1 &  1 &  1 \end{bmatrix}$

$g' = \begin{bmatrix}  1 &  0.5 \\ 0.5 &  1 \end{bmatrix}$

We slide the filter over $f$ and, at each position, multiply element-wise and sum the products, which gives the following convolutional output:

$C = \begin{bmatrix}  4 &  4 \\ 4 &  3 \end{bmatrix}$
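
The sliding computation for this example can be sketched in NumPy as follows. The helper `slide_filter` is a hypothetical name, and the sketch assumes no padding and a stride of 1. It slides the filter without flipping it (strictly speaking, a cross-correlation); because $g'$ is unchanged by a 180-degree rotation, the result is the same as the flipped convolution here.

```python
import numpy as np

def slide_filter(image, kernel):
    """Slide kernel over image (no padding, stride 1); multiply element-wise and sum."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            patch = image[r:r + kh, c:c + kw]   # the image patch under the filter
            out[r, c] = np.sum(patch * kernel)  # element-wise product, then sum
    return out

f = np.array([[1, 2, 1],
              [2, 1, 1],
              [1, 1, 1]], dtype=float)
g_prime = np.array([[1.0, 0.5],
                    [0.5, 1.0]])

C = slide_filter(f, g_prime)
print(C)  # expected to match the matrix C in the text
```

A 2x2 filter fits into a 3x3 input at 2x2 positions, which is why the output is smaller than the input.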


****

**Pooling operation to create a pooled map**

Here we formally define pooling as an operation:

$\text {Pool}(\text {ReLU}(\text {Conv}(I)))$

where

- $\text {ReLU}(x) = \text {max}(0, x)$, applied element-wise
- $\text {Pool}(X) = \text{max}_{i,j} X_{ij}$, the largest entry of its input


Given an image $I$ and filter weights $F$, the following example shows how we arrive at an output:

$I = \begin{bmatrix}  1 &  0 &  2 \\ 3 &  1 &  0 \\ 0 &  0 &  4 \end{bmatrix}$

$F = \begin{bmatrix}  1 &  0 \\ 0 &  1 \end{bmatrix}$

$\text {Conv}(I) = \begin{bmatrix}  1 &  0 &  2 \\ 3 &  1 &  0 \\ 0 &  0 &  4 \end{bmatrix} * \begin{bmatrix}  1 &  0 \\ 0 &  1 \end{bmatrix}$

$\text {Conv}(I) = \begin{bmatrix}  2 &  0 \\ 3 &  5 \end{bmatrix}$

$\text {ReLU}(\text {Conv}(I)) = \text {ReLU}(\begin{bmatrix}  2 &  0 \\ 3 &  5 \end{bmatrix})$

$\text {ReLU}(\text {Conv}(I)) = \begin{bmatrix}  2 &  0 \\ 3 &  5 \end{bmatrix}$

$\text {Pool}(\text {ReLU}(\text {Conv}(I))) = \text {Pool}(\begin{bmatrix}  2 &  0 \\ 3 &  5 \end{bmatrix})$

$\text {Pool}(\text {ReLU}(\text {Conv}(I))) = 5$
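
The same worked example can be reproduced step by step in NumPy. This is a minimal sketch: the helper names (`conv2d_valid`, `relu`, `global_max_pool`) are assumptions for illustration, the filter slides with no padding and a stride of 1, and pooling is taken over the whole activation map, as in the example above.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide kernel over image (no padding, stride 1); multiply element-wise and sum."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def relu(x):
    """Element-wise max(0, x)."""
    return np.maximum(0, x)

def global_max_pool(x):
    """Largest entry of the whole map."""
    return x.max()

I = np.array([[1, 0, 2],
              [3, 1, 0],
              [0, 0, 4]], dtype=float)
F = np.array([[1, 0],
              [0, 1]], dtype=float)

conv = conv2d_valid(I, F)            # feature map
activated = relu(conv)               # unchanged here, since no entry is negative
pooled = global_max_pool(activated)  # single scalar response

print(conv)
print(pooled)
```

Because every entry of the convolution output is already non-negative, ReLU leaves it unchanged in this particular example; with negative filter weights it would zero out the negative responses before pooling.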

****

**Constructing a CNN, in reality**

1. Map an input into multiple feature maps, based on the number of features that we would like to capture
2. Run a pooling layer to extract the maximum value of each feature map
3. Create another convolutional layer to extract a combination of these features
4. Another pooling layer to extract the maximum value of these combinations
5. Finally, an output classification

<img alt="CNN Construction" src="assets/cnn_construction.png" width="600">
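
The five steps above can be sketched as a single forward pass in NumPy. This is only an illustration under stated assumptions: the input size (12x12, one channel), the number of filters (4, then 8), the filter sizes, the 3 output classes, and all helper names are made up, and the weights are random rather than learned, so the output probabilities carry no meaning.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_layer(x, filters):
    """x: (C_in, H, W); filters: (C_out, C_in, kH, kW). No padding, stride 1, then ReLU."""
    c_out, _, kh, kw = filters.shape
    _, h, w = x.shape
    out = np.zeros((c_out, h - kh + 1, w - kw + 1))
    for k in range(c_out):
        for r in range(out.shape[1]):
            for c in range(out.shape[2]):
                # Each filter spans all input channels of the patch
                out[k, r, c] = np.sum(x[:, r:r + kh, c:c + kw] * filters[k])
    return np.maximum(0, out)

def max_pool(x, size=2):
    """Max over non-overlapping size x size windows, per channel."""
    c, h, w = x.shape
    h_out, w_out = h // size, w // size
    x = x[:, :h_out * size, :w_out * size]
    return x.reshape(c, h_out, size, w_out, size).max(axis=(2, 4))

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

image = rng.random((1, 12, 12))                # made-up one-channel input
filters1 = rng.standard_normal((4, 1, 3, 3))   # step 1: 4 feature maps
filters2 = rng.standard_normal((8, 4, 2, 2))   # step 3: 8 combinations of features

h1 = conv_layer(image, filters1)   # (4, 10, 10)
p1 = max_pool(h1)                  # step 2: (4, 5, 5)
h2 = conv_layer(p1, filters2)      # (8, 4, 4)
p2 = max_pool(h2)                  # step 4: (8, 2, 2)

features = p2.ravel()                              # flatten to a vector of length 32
W_out = rng.standard_normal((3, features.size))    # step 5: linear classifier, 3 classes
probs = softmax(W_out @ features)
print(h1.shape, p1.shape, h2.shape, p2.shape, probs.shape)
```

In practice the filter weights and the classifier weights would be learned by backpropagation, typically with a deep learning framework rather than hand-written loops.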


<hr>

# Basic code
A `minimal, reproducible example`