# How Is Convolution Implemented in TensorFlow, Caffe, or PyTorch?
> see  
> 1. https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/making_faster.html
> 2. https://www.reddit.com/r/MLQuestions/comments/8no4xe/anyone_familiar_with_how_tensorflow_or_pytorch/
> 3. https://cs231n.github.io/convolutional-networks/#conv
> 4. [TensorFlow && Caffe conv2D, GPU version](https://www.jianshu.com/p/89667d844cac)


## 2D Convolution:
- Convolution is a mathematical operation that integrates (or, for discrete signals, sums) the product of two functions (signals), with one of the signals flipped and shifted.
- Equation: 

$$ \begin{split} y[n_1, n_2] &= \sum_{k_1 = -\infty}^{\infty} \sum_{k_2 = -\infty}^{\infty} x[k_1, k_2] h [n_1 - k_1, n_2 - k_2] \\ 
&= x[n_1, n_2] * h[n_1, n_2] \\
&= h[n_1, n_2] * x[n_1, n_2]
\end{split} $$

- Procedure:  
![Alt text|center|0X250](./files/convolution-procedure.png)
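
As a minimal sketch of the equation above, the double sum can be written directly in NumPy for finite signals, treating everything outside the arrays as zero (a "full" convolution). The function name `conv2d_direct` and the sample arrays are illustrative only and do not come from any library.

```python
import numpy as np

def conv2d_direct(x, h):
    """Full 2D convolution of x with h, written straight from the double sum."""
    H, W = x.shape
    kh, kw = h.shape
    y = np.zeros((H + kh - 1, W + kw - 1))
    for k1 in range(H):
        for k2 in range(W):
            # x[k1, k2] contributes to y[n1, n2] with weight h[n1 - k1, n2 - k2],
            # i.e. it adds a scaled copy of h placed at offset (k1, k2).
            y[k1:k1 + kh, k2:k2 + kw] += x[k1, k2] * h
    return y

x = np.array([[1.0, 2.0], [3.0, 4.0]])
h = np.array([[0.0, 1.0], [2.0, 3.0]])

y_xh = conv2d_direct(x, h)  # x * h
y_hx = conv2d_direct(h, x)  # h * x

print(y_xh)
print(np.allclose(y_xh, y_hx))  # True: convolution is commutative
```

The last line checks the commutativity stated in the equation, $x * h = h * x$.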


---

## Convolution in CNN:

- CNN vs Neuron:  
![Alt text|center](./files/convolution-in-cnn.png)


---


## [Making faster convolution via matrix multiplication](https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/making_faster.html)

Here we show a way to convert the convolution operation into a matrix multiplication. This computes faster, at the `expense of more memory usage`. The `im2col` operation transforms the input image or batch into a matrix, which we then multiply by a reshaped version of the `kernel`. Finally, we reshape the product back into an image with the `col2im` operation.

A naive implementation of convolution uses a lot of for-loops. While this is useful for learning purposes, it is not fast enough. In this section we will learn how to implement convolution in a `vectorized fashion`.

First, if we look closely, the convolution code is basically a `dot product` between the kernel filter and each local region selected by the moving window, which samples a patch of the same size as the kernel.

What would happen if we `expanded all possible windows` in memory and performed the `dot products` as a single matrix multiplication? Answer: speedups of `200x or more` (as reported in the linked article), at the expense of more `memory consumption`.  

![Alt text|center](https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/image_folder_4/Convolution_With_Im2col.png)
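
The idea of expanding every window in memory can be sketched for a single-channel image as follows. This is only an illustration under a few assumptions: it relies on `numpy.lib.stride_tricks.sliding_window_view` (available in NumPy 1.20 and later), it uses stride 1 with no padding, and it does not flip the kernel. The helper names `correlate_loops` and `correlate_matmul` are made up for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def correlate_loops(x, k):
    """Reference version: one dot product per window, using explicit loops."""
    kh, kw = k.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def correlate_matmul(x, k):
    """Same result: expand all windows, then do a single matrix product."""
    kh, kw = k.shape
    windows = sliding_window_view(x, (kh, kw))      # (out_h, out_w, kh, kw) view
    out_h, out_w = windows.shape[:2]
    # The reshape materializes every window as a row; this copy is the extra memory.
    cols = windows.reshape(out_h * out_w, kh * kw)
    return (cols @ k.ravel()).reshape(out_h, out_w)

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 7))
k = rng.standard_normal((3, 2))
print(np.allclose(correlate_loops(x, k), correlate_matmul(x, k)))
```

Each input pixel is copied into up to `kh * kw` rows of `cols`, which is where the additional memory goes.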

For example, if the input is `[227x227x3]` and it is to be convolved with `11x11x3` filters at stride `4` and padding `0`, then we would take `[11x11x3]` blocks of pixels in the input and stretch each block into a `column vector` of size  `11∗11∗3  = 363`.

With an input of size 227, stride 4, and padding 0, there are `((227-11)/4)+1 = 55` locations along both width and height, leading to an output matrix `X_col` of size `[363 x 3025]`.
Here every column is a stretched-out `receptive field` (a patch with depth), and there are `55*55 = 3025` of them in total.

To summarize how we calculate the `im2col` output sizes:

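```matlab
% ksize: kernel size, S: stride, P: zero-padding (assumed to be defined already)
[img_height, img_width, img_channels] = size(img);
newImgHeight = floor(((img_height + 2*P - ksize) / S) + 1);
newImgWidth = floor(((img_width + 2*P - ksize) / S) + 1);
% One row per kernel element, one column per receptive field
cols = single(zeros((img_channels*ksize*ksize), (newImgHeight * newImgWidth)));
```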
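
For reference, the same size calculation can be sketched in Python and checked against the `[227x227x3]` example above. The function name `im2col_shape` is hypothetical.

```python
def im2col_shape(height, width, channels, ksize, stride, pad):
    """Return (rows, cols) of the im2col matrix for a square kernel."""
    out_h = (height + 2 * pad - ksize) // stride + 1
    out_w = (width + 2 * pad - ksize) // stride + 1
    # One row per kernel element (across all channels), one column per location.
    return channels * ksize * ksize, out_h * out_w

rows, cols = im2col_shape(227, 227, 3, ksize=11, stride=4, pad=0)
print(rows, cols)  # 363 3025
```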

The weights of the CONV layer are similarly stretched out into `rows`. For example, if there are 96 filters of size `[11x11x3]` this would give a matrix `W_row` of size `[96 x 363]`, where `11x11x3=363`  

![Alt text|center](https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/image_folder_4/im2col_operation.png)

After the image and the kernel are converted, the convolution can be implemented as a simple matrix multiplication. In our case, `W_row[96 x 363]` is multiplied by `X_col[363 x 3025]`, resulting in a matrix of size `[96 x 3025]`, which needs to be reshaped back to `[55x55x96]`.

This final reshape can also be implemented as a function called `col2im`.

Note that some implementations of `im2col` produce this result transposed. In that case, the order of the matrix multiplication must be swapped.  

![Alt text|center](./files/Im2Col_cs231n.png)
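
The shapes in this example can be checked with random data. The sketch below assumes that the columns of `X_col` are ordered row by row over the 55x55 output locations; under that assumption, the `[96 x 3025]` product has to be reshaped to `(96, 55, 55)` first and then transposed, since reshaping it directly to `(55, 55, 96)` would scramble the values. It also shows the transposed layout mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)
W_row = rng.standard_normal((96, 363))    # 96 filters, each stretched into a row
X_col = rng.standard_normal((363, 3025))  # 3025 receptive fields as columns

out = W_row @ X_col                                    # (96, 3025)
out_img = out.reshape(96, 55, 55).transpose(1, 2, 0)   # (55, 55, 96)

# Transposed layout: receptive fields as rows, so the product order is swapped.
out_t = X_col.T @ W_row.T                              # (3025, 96)
out_img_t = out_t.reshape(55, 55, 96)                  # (55, 55, 96)

print(out.shape, out_img.shape, out_img_t.shape)
print(np.allclose(out_img, out_img_t))
```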


## Forward graph
To help with using `im2col` for convolution, and to derive the back-propagation, let's show the convolution with `im2col` as a graph. Here the input tensor is a single 3-channel 4x4 image, which passes through a convolution layer with `S:1 P:0 K:2 and F:1 (Output volume)`.  

![convolution  forward graph|center](https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/assets/Conv_Graph_Im2col.png)
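
The forward graph for this configuration can be sketched in a few lines of NumPy. The weights and inputs below are random placeholders, and the patches are laid out as rows (the transposed `im2col` layout), so this is one possible arrangement rather than the exact one in the figure. Because `P` is 0, no padding step is needed.

```python
import numpy as np

C, H, W = 3, 4, 4          # single 3-channel 4x4 image
K, S, P, F = 2, 1, 0, 1    # kernel size, stride, padding, number of filters

rng = np.random.default_rng(0)
x = rng.standard_normal((C, H, W))
w = rng.standard_normal((F, C, K, K))
b = rng.standard_normal(F)

out_h = (H + 2 * P - K) // S + 1
out_w = (W + 2 * P - K) // S + 1

# im2col: one row per output location, one column per kernel element.
x_col = np.stack([x[:, i * S:i * S + K, j * S:j * S + K].ravel()
                  for i in range(out_h) for j in range(out_w)])   # (9, 12)
w_col = w.reshape(F, -1)                                          # (1, 12)

# Matrix product, bias, then reshape back to an image (the col2im step).
out = (x_col @ w_col.T + b).T.reshape(F, out_h, out_w)            # (1, 3, 3)
print(x_col.shape, w_col.shape, out.shape)
```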


## Backward graph

Using the `im2col` technique, the computation graph resembles that of the `FC` layer, with the same form $f(x, \theta, \beta) = (x \cdot \theta^T) + \beta$. The difference is that now we have a bunch of reshapes, transposes, and the `im2col` block.

For the reshapes and transposes, back-propagation just needs to `invert` each operation with another reshape or transpose. The only important thing to remember is that if you use a `row-major reshape` during forward propagation, you must also use a `row-major reshape` during back-propagation.

The only point that needs attention is the `im2col backpropagation` operation. It cannot be implemented as a simple reshape, because the patches can overlap (depending on the stride), so the gradients must be summed wherever the patches intersect.  

![Alt text|center](https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/assets/Conv_Graph_Im2col_Backward.png)
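
To see why the gradients have to be summed, the sketch below scatters a matrix of ones back into a 4x4 image with a 2x2 window. The result counts how many patches cover each pixel. The helper `col2im_accumulate` is a hypothetical name for this illustration and assumes no padding.

```python
import numpy as np

def col2im_accumulate(d_col, C, H, W, K, S):
    """Scatter rows of d_col (one per patch) back to (C, H, W), summing overlaps."""
    out_h = (H - K) // S + 1
    out_w = (W - K) // S + 1
    dx = np.zeros((C, H, W))
    for idx in range(out_h * out_w):
        i, j = divmod(idx, out_w)  # patch position in the output grid
        dx[:, i * S:i * S + K, j * S:j * S + K] += d_col[idx].reshape(C, K, K)
    return dx

# Stride 1: the 9 patches overlap, so interior pixels receive several contributions.
counts_s1 = col2im_accumulate(np.ones((9, 4)), C=1, H=4, W=4, K=2, S=1)[0]
print(counts_s1)
# [[1. 2. 2. 1.]
#  [2. 4. 4. 2.]
#  [2. 4. 4. 2.]
#  [1. 2. 2. 1.]]

# Stride 2: the 4 patches do not overlap, so every pixel is written exactly once.
counts_s2 = col2im_accumulate(np.ones((4, 4)), C=1, H=4, W=4, K=2, S=2)[0]
print(np.all(counts_s2 == 1))  # True
```

Only in the non-overlapping case does the backward step reduce to a pure rearrangement of values.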




## Python Code:

### Im2col and Col2im sources in python:
This implementation receives an image as a 3-dimensional tensor `[channels, rows, cols]` and creates a 2D matrix of shape `[rows=(new_h*new_w), cols=(hh*ww*C)]`. Note that this is the transposed version of the layout in the diagram above.

```python
import numpy as np

def im2col(x, hh, ww, stride):
    """
    Args:
      x: image matrix to be translated into columns, (C,H,W)
      hh: filter height
      ww: filter width
      stride: stride
    Returns:
      col: (new_h*new_w, hh*ww*C) matrix, each row is a flattened patch (cube)
           that will be multiplied with a filter
           new_h = (H-hh) // stride + 1, new_w = (W-ww) // stride + 1
    """
    c, h, w = x.shape
    new_h = (h - hh) // stride + 1
    new_w = (w - ww) // stride + 1
    col = np.zeros([new_h * new_w, c * hh * ww])

    for i in range(new_h):
        for j in range(new_w):
            # Patch across all channels at output location (i, j)
            patch = x[..., i*stride:i*stride+hh, j*stride:j*stride+ww]
            col[i*new_w + j, :] = np.reshape(patch, -1)
    return col
```

```python
def col2im(mul, h_prime, w_prime, C):
    """
    Args:
      mul: (C*h_prime*w_prime, F) matrix; each column is reshaped to
           (h_prime, w_prime) when C == 1, or to (C, h_prime, w_prime) otherwise
      h_prime: height of the reshaped output
      w_prime: width of the reshaped output
      C: channels per column; if 1, reshape each column to 2D, otherwise to 3D
    Returns:
      if C == 1: (F, h_prime, w_prime) matrix
      otherwise: (F, C, h_prime, w_prime) matrix
    """
    F = mul.shape[1]
    if C == 1:
        out = np.zeros([F, h_prime, w_prime])
        for i in range(F):
            col = mul[:, i]
            out[i, :, :] = np.reshape(col, (h_prime, w_prime))
    else:
        out = np.zeros([F, C, h_prime, w_prime])
        for i in range(F):
            col = mul[:, i]
            out[i, :, :] = np.reshape(col, (C, h_prime, w_prime))

    return out
```

```python
def col2im_back(dim_col, h_prime, w_prime, stride, hh, ww, c):
    """
    Args:
      dim_col: gradients for im_col, (h_prime*w_prime, hh*ww*c)
      h_prime, w_prime: height and width of the feature map
      stride: stride
      hh, ww, c: size of the filters
    Returns:
      dx: gradients for x, (C,H,W)
    """
    H = (h_prime - 1) * stride + hh
    W = (w_prime - 1) * stride + ww
    dx = np.zeros([c, H, W])
    for i in range(h_prime * w_prime):
        row = dim_col[i, :]
        # Integer division: slice indices must be ints in Python 3
        h_start = (i // w_prime) * stride
        w_start = (i % w_prime) * stride
        # Overlapping patches accumulate their gradients
        dx[:, h_start:h_start+hh, w_start:w_start+ww] += np.reshape(row, (c, hh, ww))
    return dx
```

### Python example for forward propagation

```python
def conv_forward_naive(x, w, b, conv_param):
    """
    A naive implementation of the forward pass for a convolutional layer.

    The input consists of N data points, each with C channels, height H and width
    W. We convolve each input with F different filters, where each filter spans
    all C channels and has height HH and width WW.

    Input:
    - x: Input data of shape (N, C, H, W)
    - w: Filter weights of shape (F, C, HH, WW)
    - b: Biases, of shape (F,)
    - conv_param: A dictionary with the following keys:
      - 'stride': The number of pixels between adjacent receptive fields in the
        horizontal and vertical directions.
      - 'pad': The number of pixels that will be used to zero-pad the input.

    Returns a tuple of:
    - out: Output data, of shape (N, F, H', W') where H' and W' are given by
      H' = 1 + (H + 2 * pad - HH) // stride
      W' = 1 + (W + 2 * pad - WW) // stride
    - cache: (x, w, b, conv_param)
    """
    pad_num = conv_param['pad']
    stride = conv_param['stride']
    N, C, H, W = x.shape
    F, C, HH, WW = w.shape
    H_prime = (H + 2*pad_num - HH) // stride + 1
    W_prime = (W + 2*pad_num - WW) // stride + 1
    out = np.zeros([N, F, H_prime, W_prime])
    # im2col: process one image of the batch at a time
    for im_num in range(N):
        im = x[im_num, :, :, :]
        im_pad = np.pad(im, ((0, 0), (pad_num, pad_num), (pad_num, pad_num)), 'constant')
        im_col = im2col(im_pad, HH, WW, stride)    # (H'*W', C*HH*WW)
        filter_col = np.reshape(w, (F, -1))        # (F, C*HH*WW)
        mul = im_col.dot(filter_col.T) + b         # (H'*W', F)
        out[im_num, :, :, :] = col2im(mul, H_prime, W_prime, 1)
    cache = (x, w, b, conv_param)
    return out, cache
```

### Python example for backward propagation:

```python
def conv_backward_naive(dout, cache):
    """
    A naive implementation of the backward pass for a convolutional layer.

    Inputs:
    - dout: Upstream derivatives.
    - cache: A tuple of (x, w, b, conv_param) as in conv_forward_naive

    Returns a tuple of:
    - dx: Gradient with respect to x
    - dw: Gradient with respect to w
    - db: Gradient with respect to b
    """
    x, w, b, conv_param = cache
    pad_num = conv_param['pad']
    stride = conv_param['stride']
    N, C, H, W = x.shape
    F, C, HH, WW = w.shape
    H_prime = (H + 2*pad_num - HH) // stride + 1
    W_prime = (W + 2*pad_num - WW) // stride + 1

    dw = np.zeros(w.shape)
    dx = np.zeros(x.shape)
    db = np.zeros(b.shape)

    # We could calculate the bias gradient by just summing over the right
    # dimensions of dout (batch, rows, cols):
    # db = np.sum(dout, axis=(0, 2, 3))

    for i in range(N):
        im = x[i, :, :, :]
        im_pad = np.pad(im, ((0, 0), (pad_num, pad_num), (pad_num, pad_num)), 'constant')
        im_col = im2col(im_pad, HH, WW, stride)
        filter_col = np.reshape(w, (F, -1)).T      # (C*HH*WW, F)

        dout_i = dout[i, :, :, :]
        dbias_sum = np.reshape(dout_i, (F, -1))
        dbias_sum = dbias_sum.T                    # (H'*W', F)

        # bias_sum = mul + b
        db += np.sum(dbias_sum, axis=0)
        dmul = dbias_sum

        # mul = im_col * filter_col
        dfilter_col = (im_col.T).dot(dmul)         # (C*HH*WW, F)
        dim_col = dmul.dot(filter_col.T)           # (H'*W', C*HH*WW)

        dx_padded = col2im_back(dim_col, H_prime, W_prime, stride, HH, WW, C)
        # Strip the padding to recover the gradient for the original image
        dx[i, :, :, :] = dx_padded[:, pad_num:H+pad_num, pad_num:W+pad_num]
        dw += np.reshape(dfilter_col.T, (F, C, HH, WW))
    return dx, dw, db
```
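
A common way to sanity-check a backward pass such as `conv_backward_naive` is to compare it against numerical gradients. The sketch below is a self-contained, simplified stand-in rather than the code above: it handles a single image with stride 1, no padding, and no bias, and all helper names are chosen for this example only.

```python
import numpy as np

def im2col(x, k):
    """Stride-1, no-padding im2col: (C, H, W) -> (out_h*out_w, C*k*k)."""
    c, h, w = x.shape
    out_h, out_w = h - k + 1, w - k + 1
    cols = np.stack([x[:, i:i + k, j:j + k].ravel()
                     for i in range(out_h) for j in range(out_w)])
    return cols, out_h, out_w

def forward(x, w):
    n_f, c, k, _ = w.shape
    cols, out_h, out_w = im2col(x, k)
    return (cols @ w.reshape(n_f, -1).T).T.reshape(n_f, out_h, out_w)

def backward(dout, x, w):
    n_f, c, k, _ = w.shape
    cols, out_h, out_w = im2col(x, k)
    dmul = dout.reshape(n_f, -1).T              # (out_h*out_w, F)
    dw = (cols.T @ dmul).T.reshape(w.shape)     # gradient w.r.t. the filters
    dcols = dmul @ w.reshape(n_f, -1)           # (out_h*out_w, C*k*k)
    dx = np.zeros_like(x)
    for idx in range(out_h * out_w):            # sum gradients of overlapping patches
        i, j = divmod(idx, out_w)
        dx[:, i:i + k, j:j + k] += dcols[idx].reshape(c, k, k)
    return dx, dw

def numeric_grad(f, a, dout, eps=1e-6):
    """Central differences of sum(f() * dout) with respect to each entry of a."""
    grad = np.zeros_like(a)
    for idx in np.ndindex(a.shape):
        old = a[idx]
        a[idx] = old + eps
        plus = f()
        a[idx] = old - eps
        minus = f()
        a[idx] = old
        grad[idx] = np.sum((plus - minus) * dout) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5, 5))
w = rng.standard_normal((3, 2, 3, 3))
dout = rng.standard_normal((3, 3, 3))

dx, dw = backward(dout, x, w)
dx_num = numeric_grad(lambda: forward(x, w), x, dout)
dw_num = numeric_grad(lambda: forward(x, w), w, dout)
print(np.allclose(dx, dx_num, atol=1e-6), np.allclose(dw, dw_num, atol=1e-6))
```

If the accumulation over overlapping patches were replaced by a plain assignment, the `dx` comparison would be expected to fail for this stride-1 setup.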

## Smaller example:
To make things easier to follow, consider the simple example of convolving X[3x3] with W[2x2].  
![Alt text](https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/image_folder_4/simple_im2col.png)

```python
W = np.array([[1, 3], [2, 4]])
print('filter W = ', W)
X = np.array([[[1, 4, 7], [2, 5, 8], [3, 6, 9]]])
# Flip the kernel along both axes (true convolution)
W = np.flip(np.flip(W, axis=0), axis=1)
print('flipped W = ', W)
W_col = W.flatten()
print('W_col = ', W_col)
X_col = im2col(X, hh=2, ww=2, stride=1)
print('X_col = ', X_col)
conv = X_col.dot(W_col.T)
print('conv result = ', conv)
```

```
filter W =  [[1 3]
 [2 4]]
flipped W =  [[4 2]
 [3 1]]
W_col =  [4 2 3 1]
X_col =  [[1. 4. 2. 5.]
 [4. 7. 5. 8.]
 [2. 5. 3. 6.]
 [5. 8. 6. 9.]]
conv result =  [23. 53. 33. 63.]
```
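
The kernel flip in this example matters. Convolution layers in deep-learning frameworks generally skip the flip and compute a cross-correlation instead, which only changes which learned weights end up where. The self-contained sketch below rebuilds the same `X_col` with a list comprehension (rather than calling the `im2col` function above) and shows both results side by side.

```python
import numpy as np

X = np.array([[1.0, 4.0, 7.0], [2.0, 5.0, 8.0], [3.0, 6.0, 9.0]])
W = np.array([[1.0, 3.0], [2.0, 4.0]])

# One flattened 2x2 patch per row, in row-major order over output locations.
X_col = np.array([X[i:i + 2, j:j + 2].ravel()
                  for i in range(2) for j in range(2)])

conv = X_col @ np.flip(W).ravel()  # kernel flipped on both axes: convolution
xcorr = X_col @ W.ravel()          # kernel used as-is: cross-correlation

print(conv)   # [23. 53. 33. 63.]
print(xcorr)  # [37. 67. 47. 77.]
```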
