# Cross Correlation

Each output unit is a linear function of a localized subset of input units.


![SegmentLocal](images/no_padding_no_strides.gif "segment")


$H[x,y]=\sum_{v=-k}^{k} \sum_{u=-k}^{k} I[x+u,y+v] F[u,v]$
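
The formula above can be sketched directly in NumPy. This is a minimal illustration, not an optimized implementation: the function name `cross_correlate2d` is hypothetical, the kernel is assumed to be square with odd side length $2k+1$, and only positions where the window fits entirely inside the image are computed (no padding).

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """Valid-mode cross-correlation for a square kernel of odd size (2k+1)."""
    k = kernel.shape[0] // 2
    rows, cols = image.shape
    out = np.zeros((rows - 2 * k, cols - 2 * k))
    for x in range(k, rows - k):
        for y in range(k, cols - k):
            # Window centred on (x, y): window[u + k, v + k] is I[x + u, y + v]
            window = image[x - k:x + k + 1, y - k:y + k + 1]
            out[x - k, y - k] = np.sum(window * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)  # assumed example input
kernel = np.ones((3, 3))                          # assumed example filter
result = cross_correlate2d(image, kernel)         # shape (2, 2)
```

With an all-ones filter, each output entry is simply the sum of the 3 x 3 window it covers.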

# Convolution
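
For comparison with the cross-correlation formula above, the standard definition of discrete 2D convolution (written here with the same symbols $I$, $F$, and $k$; it is not stated in the original notes) flips the filter along both axes:

$H[x,y]=\sum_{v=-k}^{k} \sum_{u=-k}^{k} I[x-u,y-v] F[u,v]$

Equivalently, convolving with $F$ is the same as cross-correlating with $F$ rotated by 180 degrees. Many deep learning libraries implement the cross-correlation form and still call it convolution.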

Refs: [1](https://github.com/vdumoulin/conv_arithmetic) [2](https://theblog.github.io/post/convolution-in-autoregressive-neural-networks/)

# Shape of the Convolution Output

$H_{out} =\left \lfloor \frac{ H_{in} +2 \times \text{padding}[0]-\text{dilation}[0] \times(\text{kernel_size}[0]-1)-1}{\text{stride}[0]}   +1\right \rfloor$

$W_{out} =\left \lfloor \frac{ W_{in} +2\times \text{padding}[1]-\text{dilation}[1] \times(\text{kernel_size}[1]-1)-1}{\text{stride}[1]}   +1\right \rfloor$
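
The same arithmetic can be checked with a small helper. This is a sketch: the function name `conv_output_size` and the example sizes are assumptions chosen for illustration, and the helper handles one spatial axis at a time.

```python
def conv_output_size(size_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial axis, following the formula above."""
    numerator = size_in + 2 * padding - dilation * (kernel_size - 1) - 1
    # Integer floor division implements the floor in the formula
    return numerator // stride + 1

# 4 x 4 input, 3 x 3 kernel, no padding, stride 1
h_out = conv_output_size(4, kernel_size=3)
# 5 x 5 input, 3 x 3 kernel, padding 1, stride 2
h_out_strided = conv_output_size(5, kernel_size=3, stride=2, padding=1)
# 7 x 7 input, 3 x 3 kernel with dilation 2
h_out_dilated = conv_output_size(7, kernel_size=3, dilation=2)
```

Calling the helper once per axis gives both $H_{out}$ and $W_{out}$.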

# 2D Convolution as Matrix Multiplication

A 2D convolution can be written as a matrix multiplication. There are several ways to do that:

## 1) Discrete convolution
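A 1D discrete convolution can be written as multiplication by a **Toeplitz matrix**. For 2D (circular) convolution the analogous structure is a **doubly block circulant matrix**: a block circulant matrix whose blocks are themselves circulant. Circulant matrices are a special case of Toeplitz matrices.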

The following matrix is a Toeplitz matrix:


$\begin{bmatrix}
a & b & c & d & e \\
f & a & b & c & d \\
g & f & a & b & c \\
h & g & f & a & b \\
i & h & g & f & a 
\end{bmatrix}$

A general $n \times n$ Toeplitz matrix $A$ has the form:


${\displaystyle {\begin{bmatrix}a_{0}&a_{-1}&a_{-2}&\cdots &\cdots &a_{-(n-1)}\\a_{1}&a_{0}&a_{-1}&\ddots &&\vdots \\a_{2}&a_{1}&\ddots &\ddots &\ddots &\vdots \\\vdots &\ddots &\ddots &\ddots &a_{-1}&a_{-2}\\\vdots &&\ddots &a_{1}&a_{0}&a_{-1}\\a_{n-1}&\cdots &\cdots &a_{2}&a_{1}&a_{0}\end{bmatrix}}}$


If the $i,j$ element of $A$ is denoted $A_{i,j}$, then we have

${\displaystyle A_{i,j}=A_{i+1,j+1}=a_{i-j}.}$

The convolution of a kernel $k$ of length $m$ with a signal $x$ of length $n$ can then be written as the product of an $(m+n-1) \times n$ Toeplitz matrix built from $k$ with the vector $x$:

${\displaystyle y=k\ast x={\begin{bmatrix}k_{1}&0&\cdots &0&0\\k_{2}&k_{1}&&\vdots &\vdots \\k_{3}&k_{2}&\cdots &0&0\\\vdots &k_{3}&\cdots &k_{1}&0\\k_{m-1}&\vdots &\ddots &k_{2}&k_{1}\\k_{m}&k_{m-1}&&\vdots &k_{2}\\0&k_{m}&\ddots &k_{m-2}&\vdots \\0&0&\cdots &k_{m-1}&k_{m-2}\\\vdots &\vdots &&k_{m}&k_{m-1}\\0&0&0&\cdots &k_{m}\end{bmatrix}}{\begin{bmatrix}x_{1}\\x_{2}\\x_{3}\\\vdots \\x_{n}\end{bmatrix}}}$
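
A small NumPy sketch of this matrix form, compared against `np.convolve`. The helper name `convolution_matrix` and the example values of `k` and `x` are assumptions for illustration.

```python
import numpy as np

def convolution_matrix(k, n):
    """Build the (m + n - 1) x n Toeplitz matrix T such that T @ x equals k * x."""
    m = len(k)
    T = np.zeros((m + n - 1, n))
    for col in range(n):
        # Each column holds the kernel, shifted down by one row per column
        T[col:col + m, col] = k
    return T

k = np.array([1.0, 2.0, 3.0])       # assumed example kernel (m = 3)
x = np.array([4.0, 5.0, 6.0, 7.0])  # assumed example signal (n = 4)

T = convolution_matrix(k, len(x))   # shape (6, 4)
y = T @ x                           # full convolution of k and x
```

Every diagonal of `T` is constant, which is exactly the Toeplitz property $A_{i,j}=A_{i+1,j+1}$.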

Refs: [1](https://en.wikipedia.org/wiki/Toeplitz_matrix#Discrete_convolution)

## 2) Using im2col

Suppose we have a single channel 4 x 4 image, X, and its pixel values are as follows:

<img src='images/im2col_1.png'>

and our weight matrix, W, is:

$\begin{bmatrix}
1 &2 \\ 
 3& 4
\end{bmatrix}$

This means that there will be nine 2 x 2 image patches, each of which is multiplied element-wise with the matrix W, like so:
<img src='images/im2col_2.png'>


These image patches can be represented as 4-dimensional column vectors and concatenated to form a single 4 x 9 matrix, P, like so:


<img src='images/im2col_3.png'>

To perform the convolution, we first flatten the weights into a 4-dimensional row vector K (a 1 x 4 matrix), then matrix multiply K with P to get a 9-dimensional row vector (a 1 x 9 matrix), which gives us:

<img src='images/im2col_4.png'>


Then we reshape the result of K P to the correct shape, which is 3 x 3 x 1.
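
The steps above can be sketched in NumPy as follows. The helper name `im2col` and the pixel values in `X` are placeholders (the actual values appear only in the figure), and the sketch assumes stride 1, no padding, and the cross-correlation convention (no kernel flip).

```python
import numpy as np

def im2col(image, patch_h, patch_w):
    """Stack every patch_h x patch_w patch of `image` as one column."""
    rows, cols = image.shape
    out_h, out_w = rows - patch_h + 1, cols - patch_w + 1
    columns = np.empty((patch_h * patch_w, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + patch_h, j:j + patch_w]
            columns[:, i * out_w + j] = patch.ravel()
    return columns

X = np.arange(1, 17, dtype=float).reshape(4, 4)  # placeholder pixel values
W = np.array([[1.0, 2.0], [3.0, 4.0]])           # the 2 x 2 weights

P = im2col(X, 2, 2)          # shape (4, 9): one column per patch
K = W.reshape(1, -1)         # shape (1, 4): the flattened weights
out = (K @ P).reshape(3, 3)  # 1 x 9 row vector reshaped to 3 x 3
```

The appeal of this layout is that the whole convolution becomes one dense matrix product, at the cost of duplicating overlapping pixels in `P`.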

Refs: [1](https://medium.com/@_init_/an-illustrated-explanation-of-performing-2d-convolutions-using-matrix-multiplications-1e8de8cd2544)

## 3) Using Doubly Block Circulant Matrix

Let's say we have a filter $k$ of size $m \times m$ and input data $x$ of size $n \times n$.




<img src='images/input.png'>  <img src='images/k.png'>

You should unroll $k$ into a sparse matrix of size $(n-m+1)^2 \times n^2$, and unroll $x$ into a long vector of size $n^2 \times 1$.

<img src='images/conv_mult.png'>

<img src='images/conv_result.png'>


Finally, reshape the result: convert the resulting vector (which has size $(n-m+1)^2 \times 1$) into a square matrix of size $(n-m+1) \times (n-m+1)$.
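
A minimal NumPy sketch of the unrolling, assuming stride 1, no padding, and the cross-correlation convention (no kernel flip). The helper name `unrolled_kernel_matrix` and the example values are hypothetical, and the matrix is stored densely here for readability even though most of its entries are zero.

```python
import numpy as np

def unrolled_kernel_matrix(k, n):
    """Matrix of shape ((n-m+1)^2, n^2) that applies an m x m kernel to a flattened n x n input."""
    m = k.shape[0]
    out = n - m + 1
    M = np.zeros((out * out, n * n))
    for i in range(out):
        for j in range(out):
            row = i * out + j
            for u in range(m):
                for v in range(m):
                    # Output (i, j) reads input pixel (i + u, j + v)
                    M[row, (i + u) * n + (j + v)] = k[u, v]
    return M

n, m = 4, 3
x = np.arange(n * n, dtype=float).reshape(n, n)         # assumed example input
k = np.arange(1, m * m + 1, dtype=float).reshape(m, m)  # assumed example filter

M = unrolled_kernel_matrix(k, n)                         # shape (4, 16)
y = (M @ x.reshape(-1, 1)).reshape(n - m + 1, n - m + 1)
```

Each row of `M` holds the $m^2$ kernel values at the positions of the input pixels that one output pixel depends on.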

Refs: [1](https://stackoverflow.com/questions/16798888/2-d-convolution-as-a-matrix-matrix-multiplication), [2](https://dsp.stackexchange.com/questions/35373/2d-convolution-as-a-doubly-block-circulant-matrix-operating-on-a-vector)

# Convolution in RGB Images

The number of channels in the image must equal the number of channels in the filter. For example, convolving a $6 \times 6 \times 3$ image with one $3 \times 3 \times 3$ filter (no padding, stride 1) gives a $4 \times 4 \times 1$ output. We often have $k$ filters of size $3\times3\times3$, so the output is $k$ feature maps of size $4 \times 4 \times 1$, stacked into a $4 \times 4 \times k$ volume.
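
A minimal NumPy sketch of the multi-channel case. The function name `conv_multichannel`, the $6 \times 6 \times 3$ input size, the choice of $k = 5$ filters, and the random values are all assumptions for illustration; the sketch uses stride 1, no padding, and no kernel flip.

```python
import numpy as np

def conv_multichannel(image, filters):
    """image: (H, W, C); filters: (k, f, f, C). Returns (H-f+1, W-f+1, k)."""
    H, W, C = image.shape
    num_filters, f, _, filter_channels = filters.shape
    assert C == filter_channels, "image and filter channel counts must match"
    out = np.zeros((H - f + 1, W - f + 1, num_filters))
    for i in range(H - f + 1):
        for j in range(W - f + 1):
            patch = image[i:i + f, j:j + f, :]  # shape (f, f, C)
            # Sum over height, width, and channels for every filter at once
            out[i, j, :] = np.sum(filters * patch, axis=(1, 2, 3))
    return out

rng = np.random.default_rng(0)
image = rng.random((6, 6, 3))       # assumed 6 x 6 RGB input
filters = rng.random((5, 3, 3, 3))  # k = 5 filters of size 3 x 3 x 3
output = conv_multichannel(image, filters)  # shape (4, 4, 5)
```

Each filter collapses all three input channels into a single output channel, so the output depth equals the number of filters, not the number of input channels.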

<img src='images/06_03.png'/>

<img src='images/06_09.png'>

<img src='images/3_channel_conv.gif'>

Refs: [1](http://datahacker.rs/convolution-rgb-image/), [2](https://cs231n.github.io/convolutional-networks/#conv)

# Transpose Convolution

Also known as:
- Deconvolution (a misleading name, since the operation is not the inverse of a convolution)
- Upconvolution
- Fractionally strided convolution
- Backward strided convolution


No padding, no strides, transposed
<img src='images/no_padding_no_strides_transposed.gif'>


Full padding, no strides, transposed

<img src='images/full_padding_no_strides_transposed.gif'>
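
One way to read a transposed convolution: each input value scatters a scaled copy of the kernel into the output, and overlapping contributions are summed. The sketch below follows that view in NumPy, assuming no padding; the function name `conv_transpose2d` and the example values are hypothetical.

```python
import numpy as np

def conv_transpose2d(x, kernel, stride=1):
    """Transposed convolution, no padding: every input pixel scatters a scaled kernel."""
    h, w = x.shape
    kh, kw = kernel.shape
    out = np.zeros(((h - 1) * stride + kh, (w - 1) * stride + kw))
    for i in range(h):
        for j in range(w):
            r, c = i * stride, j * stride
            out[r:r + kh, c:c + kw] += x[i, j] * kernel
    return out

x = np.ones((2, 2))       # assumed 2 x 2 input
kernel = np.ones((3, 3))  # assumed 3 x 3 kernel
upsampled = conv_transpose2d(x, kernel)             # 2 x 2 input -> 4 x 4 output
upsampled_s2 = conv_transpose2d(x, kernel, stride=2)  # 2 x 2 input -> 5 x 5 output
```

The output is larger than the input, which is why this operation is commonly used for upsampling.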

# 1x1 Convolution: Network-in-Network

Let's say you have a tensor of shape $(N, C, H, W)$, where $N$ is the batch size, $C$ is the number of channels, and $H, W$ are the spatial dimensions. Suppose this tensor is fed into a conv layer with $F_1$ filters of size $1\times1\times C$, with zero padding and stride 1. Then the output of this $1\times1$ conv layer will have shape $(N, F_1, H, W)$. At each spatial position, every filter takes a dot product with the $C$ channel values, and a ReLU function is applied to the result. You can think of each filter as a single neuron with $C$ inputs, applied at every position. That's why it is called **Network-in-Network**.
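
A minimal NumPy sketch of a $1\times1$ convolution followed by a ReLU. The sizes, the random values, and the variable names are assumptions for illustration. With $F_1$ filters the channel dimension of the output becomes $F_1$ (a single filter would give one channel), while $H$ and $W$ are unchanged.

```python
import numpy as np

N, C, H, W = 2, 8, 5, 5  # assumed example sizes
F1 = 3                   # assumed number of 1x1 filters

rng = np.random.default_rng(0)
x = rng.standard_normal((N, C, H, W))
weights = rng.standard_normal((F1, C))  # each 1x1xC filter is a length-C vector
bias = np.zeros(F1)

# Dot product over the channel axis at every (n, h, w) position
z = np.einsum("fc,nchw->nfhw", weights, x) + bias[None, :, None, None]
out = np.maximum(z, 0.0)                # ReLU
```

Seen this way, the layer is a small fully connected layer over the channels, shared across all spatial positions.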


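You can use a $1\times1$ convolutional layer to reduce the number of channels $n_C$, but not the spatial dimensions $n_H, n_W$ (here $n_C = C$, $n_H = H$, $n_W = W$).

You can use a pooling layer to reduce $n_H$ and $n_W$, but not $n_C$, because standard pooling operates on each channel independently.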



Refs: [1](https://arxiv.org/abs/1312.4400), [2](https://www.youtube.com/watch?v=vcp0XvDAX68), [3](https://stats.stackexchange.com/questions/194142/what-does-1x1-convolution-mean-in-a-neural-network)

# Dilated Convolutions


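A dilated convolution uses filters with gaps between their cells; the dilation controls the size of the gaps. Dilated filters can be very useful in conjunction with 0-dilated (ordinary, gap-free) filters because they let you merge spatial information across the inputs much more aggressively with fewer layers. For example, if you stack two 3x3 CONV layers on top of each other, you can convince yourself that the neurons on the 2nd layer are a function of a 5x5 patch of the input (we would say that the effective receptive field of these neurons is 5x5). If we use dilated convolutions, this effective receptive field grows much more quickly. Note that "0-dilated" follows the convention in which dilation 0 means no gaps, whereas the output shape formula earlier in these notes counts an ordinary filter as dilation 1.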
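
The receptive-field growth can be checked with a short calculation. This is a sketch under stated assumptions: all layers have stride 1, the helper name `receptive_field` is hypothetical, and dilation is counted so that 1 means an ordinary gap-free filter (the convention of the output shape formula, not the "0-dilated" wording).

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field (per axis) of stacked stride-1 conv layers."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        # A layer with kernel size k and dilation d spans (k - 1) * d + 1 inputs
        rf += (k - 1) * d
    return rf

two_plain_layers = receptive_field([3, 3], [1, 1])
three_plain_layers = receptive_field([3, 3, 3], [1, 1, 1])
three_dilated_layers = receptive_field([3, 3, 3], [1, 2, 4])
```

Two plain 3x3 layers give the 5x5 field from the paragraph above. Three plain layers reach 7x7, while three layers with dilations 1, 2, and 4 reach 15x15 with the same number of weights.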





<img src='images/dilation.gif'>

Refs: [1](https://arxiv.org/abs/1511.07122)