<a href="https://colab.research.google.com/github/ajayrfhp/LearningDeepLearning/blob/main/Convolution.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Conv net
- Images have rich spatial structure. Flattening an image and feeding it to an MLP wastes parameters: for an image with a million pixels, every hidden unit in the first layer needs a million weights.
- Objects can appear anywhere in an image. We can slide a patch detector across the image, collect its activations, and use them to decide whether the object is present.
- Earlier layers detect local patterns; later layers combine them into higher-level features.
- With parameter sharing, that is, reusing the same conv kernel at every position of the image, we can greatly reduce the number of parameters in the network.
- Mechanics
  
  ```
  delta = k // 2  # assumes k is odd
  h[i, j] = 0
  for l in range(-delta, delta + 1):
    for m in range(-delta, delta + 1):
      h[i, j] += w[l + delta][m + delta] * x[i + l][j + m]
  ```
- Params
  - Params in one conv filter = $k*k + 1$ (the weights plus one bias)
  - Params in n conv filters = $n * (k*k + 1)$
  - If the input has c channels (for example, color channels), each filter must be 3D with shape (c, k, k)
  - Params in n conv filters operating over c channels = $n * (c*k*k + 1)$, since each filter still has a single bias
- If k = 1, each filter has one weight per input channel and acts as a per-pixel linear layer across channels. This leads to network-in-network architectures.
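
The parameter counts above can be checked with a small helper. This is a minimal sketch in plain Python, assuming square kernels and one bias per filter; `conv_param_count` is a hypothetical name introduced here for illustration. The third result should agree with `sum(p.numel() for p in nn.Conv2d(3, 8, 3).parameters())` in PyTorch.

```python
def conv_param_count(n_filters, in_channels, kernel_size, bias=True):
    """Learnable parameters in a 2D conv layer with square kernels."""
    # Each filter spans all input channels: c * k * k weights.
    weights_per_filter = in_channels * kernel_size * kernel_size
    # One bias per filter (per output channel), not per input channel.
    return n_filters * (weights_per_filter + (1 if bias else 0))


print(conv_param_count(1, 1, 3))  # 3*3 + 1 = 10
print(conv_param_count(8, 1, 3))  # 8 * (9 + 1) = 80
print(conv_param_count(8, 3, 3))  # 8 * (27 + 1) = 224
print(conv_param_count(8, 3, 1))  # 1x1 conv: 8 * (3 + 1) = 32
```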

- Shape of output without padding is (w-k+1,h-k+1)

- Padding & Stride
   - To avoid losing information at the borders, dummy values (typically 0) can be added around the input.
   - Shape of output with padding is $(w-k+1+p, h-k+1+p)$, where p is the total padding added along each dimension. If p = k-1, input and output have the same shape (w, h).
   - With a stride of s, the output shape becomes $(\lfloor (w-k+p+s)/s \rfloor, \lfloor (h-k+p+s)/s \rfloor)$
     - If p = k-1 and both w and h are divisible by s, the output shape simplifies to $(w/s, h/s)$
     - Strides are useful for downsampling: a stride of s keeps only every s-th activation along each dimension
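
The shape rule can be written as a one-line function. This is a sketch under the assumption that `p` is the total padding along one axis (both sides combined) and that the division is floored; `conv_output_size` is a hypothetical helper, not a library function.

```python
def conv_output_size(size, k, p=0, s=1):
    """Output length along one axis for kernel size k, total padding p, stride s."""
    return (size - k + p + s) // s


print(conv_output_size(100, 5))            # no padding, stride 1 -> 96
print(conv_output_size(100, 5, p=4, s=1))  # p = k - 1 keeps the size -> 100
print(conv_output_size(100, 5, p=0, s=2))  # -> 48
print(conv_output_size(100, 5, p=4, s=2))  # p = k - 1, 100 divisible by 2 -> 50
```

For a 100 x 100 input and a 5 x 5 kernel, the second and third values correspond to `padding='same', stride=1` and `padding='valid', stride=2` in `nn.Conv2d`.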

-  Convolution as matrix multiplication
  - For input size (h, w) and kernel size k, convolution / cross-correlation can be represented as a matrix multiplication with a special Toeplitz matrix T.
  - Refer to [here](https://github.com/alisaaalehi/convolution_as_multiplication) for how convolution can be implemented as a matrix multiplication.
  - Shape of matrix T is $((h-k+1) * (w-k+1), (h*w))$.
  - Convolution can be represented as matrix multiplication of T with a flattened input vector of shape $(h*w)$
  - Cost and memory footprint of the dense matrix T is $(h-k+1) * (w-k+1) * (h*w)$ for one input channel and one output channel.
  - If we have $c_i$ input and $c_o$ output channels, it is $(h-k+1) * (w-k+1) * (h*w) * c_i * c_o$
  - Ignoring kernel size, with padding and a stride of s, the output has $(h/s) * (w/s)$ entries, so the cost simplifies to $(h^2*w^2*c_i*c_o)/s^2$
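
For small inputs the matrix T can be built explicitly. The following is a minimal NumPy sketch, assuming a square k x k kernel, no padding, and stride 1; `conv_matrix` is a hypothetical helper written for this note and is not taken from the linked repository.

```python
import numpy as np


def conv_matrix(kernel, h, w):
    """Build T so that T @ x.ravel() is the valid cross-correlation of x with kernel."""
    k = kernel.shape[0]
    out_h, out_w = h - k + 1, w - k + 1
    T = np.zeros((out_h * out_w, h * w))
    for i in range(out_h):
        for j in range(out_w):
            row = i * out_w + j  # index of output pixel (i, j) in the flattened output
            for l in range(k):
                for m in range(k):
                    # column = index of input pixel (i + l, j + m) in the flattened input
                    T[row, (i + l) * w + (j + m)] = kernel[l, m]
    return T


x = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[0.0, 1.0], [2.0, 3.0]])
T = conv_matrix(kernel, 4, 4)
y = (T @ x.ravel()).reshape(3, 3)
print(T.shape)  # (9, 16)
print(y)
```

Most entries of T are zero, which is why the dense cost above is much larger than the cost of sliding the kernel directly.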



In [None]:
!pip install d2l


In [None]:
import torch
from torch import nn
import torchvision
from d2l import torch as d2l
import numpy as np
from scipy import signal

In [None]:
def conv_2d(x, weight, bias=0, padding_type='same'):
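  """
    Cross-correlate a 2D input with a 2D kernel (the kernel is not flipped).

    Args
      x - (h, w)
      weight - (k, l)
      bias - scalar
    Returns
      convolved output - (h-k+1, w-l+1) for padding type cut
      convolved output - (h, w) for padding type same; border positions
        where the kernel does not fit are left at 0 (no real zero padding)
  """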
  h, w = x.shape[0], x.shape[1]
  k, l = weight.shape[0], weight.shape[1]
  if padding_type == 'cut':
    o = torch.zeros((h-k+1, w-l+1))
  else:
    o = torch.zeros((h, w))

  k_low = int(-k/2)
  k_high = int(k/2) + int(k%2 != 0)

  l_low = int(-l/2)
  l_high = int(l/2) + int(l%2 != 0)


  for i in range(h):
    for j in range(w):
      if i + k_low >= 0 and i + k_high <= x.shape[0] and j + l_low >= 0 and j + l_high <= x.shape[1]:
        x_hat = x[i+k_low:i+k_high, j+l_low:j+l_high]
        if padding_type == 'same':
          o[i, j] = (x_hat * weight).sum() + bias
        elif padding_type == 'cut':
          o[i+k_low, j+l_low] = (x_hat * weight).sum() + bias


  return o

In [None]:
x = torch.tensor(([
    [1, 2, 3, 4],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15]
]))

w = torch.tensor(([
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0]
]))


print(conv_2d(x, w, padding_type='cut'))
print(conv_2d(x, w, padding_type='same'))


tensor([[ 5.,  6.],
        [ 9., 10.]])
tensor([[ 0.,  0.,  0.,  0.],
        [ 0.,  5.,  6.,  0.],
        [ 0.,  9., 10.,  0.],
        [ 0.,  0.,  0.,  0.]])


In [None]:
x = torch.tensor(([
    [1, 2, 3],
    [4, 5, 6],
    [8, 9, 10],
]))

w = torch.tensor(([
    [0, 0],
    [0, 1],
]))


print(conv_2d(x, w, padding_type='cut'))
print(conv_2d(x, w, bias=1, padding_type='same'))


tensor([[ 5.,  6.],
        [ 9., 10.]])
tensor([[ 0.,  0.,  0.],
        [ 0.,  6.,  7.],
        [ 0., 10., 11.]])


In [None]:
w = torch.tensor(([
    [0.5, 0.5],
    [0.5, 0.5]
]))

print(conv_2d(x, w, padding_type='cut'))
print(conv_2d(x, w, padding_type='same'))

tensor([[ 6.,  8.],
        [13., 15.]])
tensor([[ 0.,  0.,  0.],
        [ 0.,  6.,  8.],
        [ 0., 13., 15.]])


## Horizontal kernels

In [None]:
X = torch.ones((5, 5))
X[:,1:3] = 0
k = torch.tensor([1, -1]).reshape((1, 2))

print(X)

conv_2d(X, k)

tensor([[1., 0., 0., 1., 1.],
        [1., 0., 0., 1., 1.],
        [1., 0., 0., 1., 1.],
        [1., 0., 0., 1., 1.],
        [1., 0., 0., 1., 1.]])


tensor([[ 0.,  1.,  0., -1.,  0.],
        [ 0.,  1.,  0., -1.,  0.],
        [ 0.,  1.,  0., -1.,  0.],
        [ 0.,  1.,  0., -1.,  0.],
        [ 0.,  1.,  0., -1.,  0.]])

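- Kernel k of shape (1, 2) takes the difference between horizontally adjacent pixels, so it only responds to vertical edges, that is, intensity changes along a row. It gives zero output when the intensity changes only between rows, as the transposed image below shows.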
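
The behavior of the `[1, -1]` kernel can be reproduced with plain array slicing. This is a NumPy sketch, assuming valid (no padding) cross-correlation, so the outputs are one column or one row shorter than the results printed by `conv_2d`.

```python
import numpy as np

X = np.ones((5, 5))
X[:, 1:3] = 0

# Cross-correlating each row with [1, -1] is the difference of horizontally adjacent pixels.
horizontal_diff = X[:, :-1] - X[:, 1:]        # shape (5, 4), each row is [1, 0, -1, 0]

# In the transposed image the intensity is constant within each row, so the same kernel sees nothing.
horizontal_diff_T = X.T[:, :-1] - X.T[:, 1:]  # all zeros

# The transposed kernel [[1], [-1]] differences vertically adjacent pixels instead.
vertical_diff_T = X.T[:-1, :] - X.T[1:, :]    # shape (4, 5), each column is [1, 0, -1, 0]

print(horizontal_diff)
print(horizontal_diff_T)
print(vertical_diff_T)
```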

In [None]:
print(X.T)
conv_2d(X.T, k)

tensor([[1., 1., 1., 1., 1.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]])


tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])

In [None]:
print(k.T)
conv_2d(X.T, k.T)

tensor([[ 1],
        [-1]])


tensor([[ 0.,  0.,  0.,  0.,  0.],
        [ 1.,  1.,  1.,  1.,  1.],
        [ 0.,  0.,  0.,  0.,  0.],
        [-1., -1., -1., -1., -1.],
        [ 0.,  0.,  0.,  0.,  0.]])

In [None]:
x = torch.randn((10, 3, 100, 100))

conv1 = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=5, padding='same', stride=1)

conv1(x).shape

torch.Size([10, 3, 100, 100])

In [None]:
conv2 = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=5, padding='valid', stride=2)
conv2(x).shape

torch.Size([10, 3, 48, 48])

## Implement conv2d with multiple input channels

In [None]:


def multi_input_channel_conv2d(x, weight, bias, padding_type='same'):
    """
    Args
      x - (c, h, w)
      weight - (c, k, k)
      bias - ()
    Returns
      convolved output - (h-k+1, w-k+1) for padding type cut
      convolved output - (h, w) for padding type same
    """

    h, w = x.shape[1], x.shape[2]
    c = x.shape[0]
    k = weight.shape[1]  # kernel size; weight.shape[0] is the channel count c
    o = torch.zeros((h, w))
    if padding_type == "cut":
      o = torch.zeros((h-k+1, w-k+1))
    for i in range(c):
      # sum the per-channel results; pass the bias for one channel only so it is added once
      o += conv_2d(x[i], weight[i], bias if i == 0 else 0, padding_type)
    return o

x = torch.arange(0, 18).reshape((2, 3, 3))
# when input has multiple channels, our conv kernel needs to have multiple channels.
print('\n\nInputs\n\n')
print(x[0], x[1])

weight = torch.tensor((
  [
  [0.5, 0.5, 0.5],
  [0.5, 0.5, 0.5],
  [0.5, 0.5, 0.5]
  ],
  [
  [0.5, 0.5, 0],
  [0, 0, 0],
  [0, 0, 0]
  ]
))

print("\n\n Weight \n\n")
print(weight)

print("\n\nOutput\n\n")
multi_input_channel_conv2d(x, weight, bias=0)



Inputs


tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]]) tensor([[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]])


 Weight 


tensor([[[0.5000, 0.5000, 0.5000],
         [0.5000, 0.5000, 0.5000],
         [0.5000, 0.5000, 0.5000]],

        [[0.5000, 0.5000, 0.0000],
         [0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000]]])


Output




tensor([[ 0.0000,  0.0000,  0.0000],
        [ 0.0000, 27.5000,  0.0000],
        [ 0.0000,  0.0000,  0.0000]])

In [None]:
def multi_output_channel_conv2d(x, n_out, weight, bias, padding_type):
    """
    Args
      x - (c, h, w)
      n_out - number of output channels
      weight - (n_out, c, k, k)
      bias - ()
      padding_type - padding type
    Returns
      convolved output - (n_out, h-k+1, w-k+1) for padding type cut
      convolved output - (n_out, h, w) for padding type same
    """
    return torch.stack([ multi_input_channel_conv2d(x, weight[n], bias, padding_type)  for n in range(n_out)])


x = torch.arange(0, 18).reshape((2, 3, 3))
# with multiple output channels, the weight holds one (c, k, k) kernel per output channel.
print('\n\nInputs\n\n')
print(x[0], x[1])

weight = torch.tensor(([
  [
  [0.5, 0.5, 0.5],
  [0.5, 0.5, 0.5],
  [0.5, 0.5, 0.5]
  ],
  [
  [0.5, 0.5, 0],
  [0, 0, 0],
  [0, 0, 0]
  ]],
  [[
  [2, 2, 2],
  [2, 2, 2],
  [2, 2, 2]
  ],
  [
  [0.5, 0.5, 0],
  [0, 0, 0],
  [0, 0, 0]
  ]]))

print("\n\n Weight \n\n")
print(weight)

print("\n\nOutput\n\n")
multi_output_channel_conv2d(x, 2, weight, bias=0, padding_type='same')




Inputs


tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]]) tensor([[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]])


 Weight 


tensor([[[[0.5000, 0.5000, 0.5000],
          [0.5000, 0.5000, 0.5000],
          [0.5000, 0.5000, 0.5000]],

         [[0.5000, 0.5000, 0.0000],
          [0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000]]],


        [[[2.0000, 2.0000, 2.0000],
          [2.0000, 2.0000, 2.0000],
          [2.0000, 2.0000, 2.0000]],

         [[0.5000, 0.5000, 0.0000],
          [0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000]]]])


Output




tensor([[[ 0.0000,  0.0000,  0.0000],
         [ 0.0000, 27.5000,  0.0000],
         [ 0.0000,  0.0000,  0.0000]],

        [[ 0.0000,  0.0000,  0.0000],
         [ 0.0000, 81.5000,  0.0000],
         [ 0.0000,  0.0000,  0.0000]]])

In [None]:
i = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
f = np.array([
    [10, 20],
    [30, 40]])

signal.convolve(i, np.flip(f), 'full')

array([[ 40, 110, 180,  90],
       [180, 370, 470, 210],
       [ 80, 140, 170,  60]])