<a href="https://colab.research.google.com/github/HardikPrabhu/Quick-tutorials-for-pytorch/blob/main/Convolutions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [12]:
import torch
import torch.nn as nn
import torch.nn.functional as F

# Convolutions

For continuous one-dimensional functions $f$ and $g$, their convolution $(f * g)$ is defined as:
$$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$$


* Think of the convolution of two real-valued functions like this: flip one function about the vertical axis ($g(\tau) \to g(-\tau)$), shift it by a step $t$, multiply it pointwise with the other function, and take the weighted sum (integral) across all points in the domain. The output is a function of the step $t$.



Discrete Convolution
* For discrete functions, convolution is defined as:
$$(f * g)[n] = \sum_{m} f[m]\, g[n - m]$$

This involves flipping one of the signals (the g[n - m] term represents g flipped and shifted).

Cross-Correlation (What We Actually Use)
* In practice, convolution layers compute cross-correlation instead (written here with the same $*$ symbol), which is defined as:
$$(f * g)[n] = \sum_{m} f[n + m]\, g[m]$$


Key differences:

* No flipping: cross-correlation doesn't flip the second function.
* More intuitive: the kernel is multiplied element-wise with each input window and the products are summed.
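
To make the difference concrete, here is a minimal sketch that compares the two operations on a tiny signal. The values of `f` and `g` are arbitrary choices for illustration. It relies on the fact that `F.conv1d` applies the kernel without flipping it, so flipping the kernel by hand recovers the (valid part of the) true convolution.

```python
import torch
import torch.nn.functional as F

f = torch.tensor([1., 2., 3., 4., 5.])
g = torch.tensor([1., 0., -1.])

# F.conv1d expects input (batch, channels, length) and weight (out_ch, in_ch, kernel_size).
f3 = f.view(1, 1, -1)

# Cross-correlation: the kernel is applied as-is (this is what F.conv1d computes).
xcorr = F.conv1d(f3, g.view(1, 1, -1)).flatten()

# True convolution: flip the kernel first, then cross-correlate.
conv = F.conv1d(f3, g.flip(0).view(1, 1, -1)).flatten()

print(xcorr)  # f[n] - f[n+2] at each valid position
print(conv)   # f[n+2] - f[n] at each valid position
```

For this antisymmetric kernel the two results differ only in sign; for a symmetric kernel they would be identical.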



## Convolution is a Linear Transformation

If we treat the $i$-th component of the input vector as the output of a function:
 $f(i) = x_i$

and $g(i) = w_i$ as the $i$-th weight in the kernel (the set of parameters),


then convolution is, at the end of the day, a linear transformation. The number of parameters is reduced by reusing the same set of parameters (the kernel) over and over to build a sparse weight matrix (one with lots of zeros) for the transformation.

So it can be used between layers directly, just like a regular weight matrix.

Now it's a two-step process:
 1. Create the appropriate weight matrix from the kernel
 2. Apply the linear transformation (this is what is meant by parallel processing: one matrix multiplication computes every output position at once)


 Example:
Consider the input vector $x$:

$x = [x_1, x_2, x_3, x_4]^T$

and output $y = [y_1, y_2]^T$

In a normal linear transformation we would have:

$y = Wx$, which requires a 2x4 matrix (8 parameters), and the connection is dense ($y_i$ depends on all the $x_j$'s)

## Convolution = sparse connection + parameter sharing

## Sparse Connection:



* $y_1$ depends on $(x_1, x_2, x_3)$
* $y_2$ depends on $(x_2, x_3, x_4)$


Weight matrix becomes:
$$W = \begin{bmatrix}
w_{11} & w_{12} & w_{13} & 0 \\
0 & w_{22} & w_{23} & w_{24}
\end{bmatrix}$$

## Parameter sharing:


For true convolution (parameter sharing), we use the same kernel weights in every row:
$$W = \begin{bmatrix}
w_1 & w_2 & w_3 & 0 \\
0 & w_1 & w_2 & w_3
\end{bmatrix}$$
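
As a worked expansion using only the symbols above, multiplying out $y = Wx$ with the shared-weight matrix gives, row by row:

$$
\begin{aligned}
y_1 &= w_1 x_1 + w_2 x_2 + w_3 x_3 \\
y_2 &= w_1 x_2 + w_2 x_3 + w_3 x_4
\end{aligned}
$$

Only 3 distinct parameters remain, compared with 6 for the sparse-only matrix and 8 for the dense one. Each row is the cross-correlation sum $\sum_m x_{n+m} w_m$ with the kernel shifted one position to the right.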




## PyTorch Convolution Rules

## Output Size Determination

When using the default convolution layer in PyTorch (stride 1, no padding, no dilation), the output dimension is:

$\text{out_dim} = \text{input_dim} - \text{kernel_size} + 1$

## Mathematical Foundation
The operation PyTorch computes is the cross-correlation defined earlier:

$(f * g)[n] = \sum_{m} f[n + m] \cdot g[m]$

This works as long as $f[n + m]$ is defined, that is, as long as $n + m$ stays within the valid input range. The output size follows naturally from the number of positions where the kernel can be applied without going out of bounds.
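
The formula can be checked empirically. The sketch below feeds random inputs of a few arbitrarily chosen sizes through `nn.Conv1d` with default settings and records the output length.

```python
import torch
import torch.nn as nn

results = []
for input_dim, kernel_size in [(10, 3), (10, 5), (7, 7)]:
    conv = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=kernel_size)
    x = torch.randn(1, 1, input_dim)          # (batch, channels, length)
    out_dim = conv(x).shape[-1]
    results.append((input_dim, kernel_size, out_dim))
    print(f"input_dim={input_dim}, kernel_size={kernel_size} -> out_dim={out_dim}")
```

In each case `out_dim` should equal `input_dim - kernel_size + 1`, including the edge case where the kernel is as long as the input and only one position is valid.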

## 1-D Convolution using PyTorch


In [6]:

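# We are ready to build our first convolution layer by hand
size = 3
input_dim = 10
output_dim = input_dim - size + 1  # 8 valid kernel positions
kernel = torch.rand(size)
x = torch.arange(1, input_dim + 1, dtype=torch.float)

# W has shape (output_dim, input_dim) = (8, 10)
W = torch.zeros(output_dim, input_dim)
for i in range(output_dim):
  W[i, i:i+size] = kernel  # row i holds the kernel shifted right by i
print(W)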

tensor([[0.3447, 0.5001, 0.2005, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000],
        [0.0000, 0.3447, 0.5001, 0.2005, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000],
        [0.0000, 0.0000, 0.3447, 0.5001, 0.2005, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000],
        [0.0000, 0.0000, 0.0000, 0.3447, 0.5001, 0.2005, 0.0000, 0.0000, 0.0000,
         0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.3447, 0.5001, 0.2005, 0.0000, 0.0000,
         0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.3447, 0.5001, 0.2005, 0.0000,
         0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.3447, 0.5001, 0.2005,
         0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.3447, 0.5001,
         0.2005]])


In [7]:
# finally we get
y = W@x
print(y)

tensor([1.9466, 2.9920, 4.0374, 5.0828, 6.1282, 7.1736, 8.2190, 9.2643])
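
As a sanity check, the hand-built matrix should agree with PyTorch's built-in operation. The sketch below repeats the construction with a seeded random kernel (so its numbers will differ from the output printed above) and compares `W @ x` against `F.conv1d`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
size, input_dim = 3, 10
kernel = torch.rand(size)
x = torch.arange(1, input_dim + 1, dtype=torch.float)

# Same construction as above: row i holds the kernel shifted right by i.
output_dim = input_dim - size + 1
W = torch.zeros(output_dim, input_dim)
for i in range(output_dim):
    W[i, i:i + size] = kernel
y_matrix = W @ x

# Built-in op; it expects (batch, channels, length) and (out_ch, in_ch, kernel_size).
y_builtin = F.conv1d(x.view(1, 1, -1), kernel.view(1, 1, -1)).flatten()

print(torch.allclose(y_matrix, y_builtin))
```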


Remember that neural nets process a batch of such inputs at once. With the inputs stacked as the rows of $X$:
$Y = XW^T$

In [9]:
import torch

class Conv1D(torch.nn.Module):
   def __init__(self, kernel_size, input_dim):
       super().__init__()
       self.kernel_size = kernel_size
       self.input_dim = input_dim
       self.output_dim = input_dim - kernel_size + 1

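       # Initialize kernel weights.
       # nn.Parameter registers the tensor with the module, so it appears in model.parameters().
       # A plain tensor with requires_grad=True still gets a gradient, but an optimizer
       # built from model.parameters() never sees it and never updates it.
       self.kernel = torch.nn.Parameter(torch.randn(kernel_size))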

   def forward(self, x):
       # Create sparse weight matrix W of shape (output_dim, input_dim)
       W = torch.zeros(self.output_dim, self.input_dim, device=x.device)
       for i in range(self.output_dim):
           W[i, i:i+self.kernel_size] = self.kernel

       # Apply convolution: Y = X @ W^T
       return x @ W.T


In [10]:
# Usage example
batch_size = 4
input_dim = 10
kernel_size = 3

conv_layer = Conv1D(kernel_size, input_dim)
x = torch.randn(batch_size, input_dim)
y = conv_layer(x)

print(f"Input shape: {x.shape}")   # (4, 10)
print(f"Output shape: {y.shape}")  # (4, 8)

Input shape: torch.Size([4, 10])
Output shape: torch.Size([4, 8])
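
The comment in `Conv1D.__init__` about `nn.Parameter` can be checked directly. The two toy modules below (hypothetical names, written only for this comparison) differ in how they store the kernel; only the first one exposes it through `parameters()`.

```python
import torch

class WithParameter(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Registered: shows up in .parameters(), so an optimizer can update it
        self.kernel = torch.nn.Parameter(torch.randn(3))

class WithPlainTensor(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Not registered: autograd tracks it, but .parameters() does not list it
        self.kernel = torch.randn(3, requires_grad=True)

n_registered = len(list(WithParameter().parameters()))
n_plain = len(list(WithPlainTensor().parameters()))
print(n_registered, n_plain)
```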


# Stride and Dilation

## Stride
Controls how much you move the kernel between applications:

Stride = 1 (default): move kernel 1 position at a time
```
# Input: [1,2,3,4,5,6,7,8,9,10], kernel_size=3
# Positions: [1,2,3], [2,3,4], [3,4,5], [4,5,6], [5,6,7], [6,7,8], [7,8,9], [8,9,10]


W = [[w1, w2, w3, 0,  0,  0,  0,  0,  0,  0 ],
     [0,  w1, w2, w3, 0,  0,  0,  0,  0,  0 ],
     [0,  0,  w1, w2, w3, 0,  0,  0,  0,  0 ],
     [0,  0,  0,  w1, w2, w3, 0,  0,  0,  0 ],
     [0,  0,  0,  0,  w1, w2, w3, 0,  0,  0 ],
     [0,  0,  0,  0,  0,  w1, w2, w3, 0,  0 ],
     [0,  0,  0,  0,  0,  0,  w1, w2, w3, 0 ],
     [0,  0,  0,  0,  0,  0,  0,  w1, w2, w3]]


```
Stride = 2: move kernel 2 positions at a time  

```
# Positions: [1,2,3], [3,4,5], [5,6,7], [7,8,9]
# Output size = (input_dim - kernel_size) // stride + 1


W = [[w1, w2, w3, 0,  0,  0,  0,  0,  0,  0 ],
     [0,  0,  w1, w2, w3, 0,  0,  0,  0,  0 ],
     [0,  0,  0,  0,  w1, w2, w3, 0,  0,  0 ],
     [0,  0,  0,  0,  0,  0,  w1, w2, w3, 0 ]]

```
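
The window positions listed above can be reproduced with `Tensor.unfold`, which slides a window of a given size with a given step. This is only a sketch for inspecting the positions, not how the layer itself is implemented.

```python
import torch

x = torch.arange(1, 11)          # [1, 2, ..., 10]
kernel_size = 3

for stride in (1, 2):
    # Tensor.unfold(dimension, size, step) returns every window of length `size`,
    # moving `step` elements between consecutive windows.
    windows = x.unfold(0, kernel_size, stride)
    expected = (len(x) - kernel_size) // stride + 1
    print(f"stride={stride}: {windows.shape[0]} windows (formula gives {expected})")
    print(windows.tolist())

stride1_count = x.unfold(0, kernel_size, 1).shape[0]
stride2_windows = x.unfold(0, kernel_size, 2).tolist()
```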

## Dilation
Controls the spacing within the kernel itself:


Dilation = 1 (default): kernel elements are adjacent

Dilation = 2: skip 1 element between kernel positions

```
# Positions: [1,3,5], [2,4,6], [3,5,7], [4,6,8], etc.
# Output size = input_dim - (kernel_size-1) * dilation = 10 - (3-1)*2 = 6

W = [[w1, 0,  w2, 0,  w3, 0,  0,  0,  0,  0 ],
     [0,  w1, 0,  w2, 0,  w3, 0,  0,  0,  0 ],
     [0,  0,  w1, 0,  w2, 0,  w3, 0,  0,  0 ],
     [0,  0,  0,  w1, 0,  w2, 0,  w3, 0,  0 ],
     [0,  0,  0,  0,  w1, 0,  w2, 0,  w3, 0 ],
     [0,  0,  0,  0,  0,  w1, 0,  w2, 0,  w3]]


```
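
The dilated positions and the output size can be computed in a few lines. The sketch below uses the same numbers as the example (input of 10, kernel of 3, dilation 2) and cross-checks the length against `F.conv1d`; the all-ones kernel is an arbitrary choice.

```python
import torch
import torch.nn.functional as F

input_dim, kernel_size, dilation = 10, 3, 2

# Span covered by the dilated kernel: (kernel_size - 1) * dilation + 1 = 5
effective_kernel_size = (kernel_size - 1) * dilation + 1
output_dim = input_dim - effective_kernel_size + 1

# 1-based input positions touched by each output element
positions = [[i + j * dilation + 1 for j in range(kernel_size)] for i in range(output_dim)]
print(positions)

# Cross-check the output length against the built-in op
x = torch.arange(1, input_dim + 1, dtype=torch.float).view(1, 1, -1)
w = torch.ones(1, 1, kernel_size)
y = F.conv1d(x, w, dilation=dilation)
print(y.shape[-1], output_dim)
```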

Note: the implementation below is a lazy (loop-based) example. You could instead use the indexing techniques shown in the first notebook to store the kernel values in the weight matrix.

For visualisations of the same concepts in 2-D convolution, refer to: https://github.com/vdumoulin/conv_arithmetic/tree/master

## Implementation: First do the dilation, then do the stride

Think of dilation as increasing the kernel size by inserting zeros between the kernel elements

In [None]:
class Conv1D(torch.nn.Module):
   def __init__(self, kernel_size, input_dim, stride=1, dilation=1):
       super().__init__()
       self.kernel_size = kernel_size
       self.input_dim = input_dim
       self.stride = stride
       self.dilation = dilation

       # Calculate output dimension
       effective_kernel_size = (kernel_size - 1) * dilation + 1
       self.output_dim = (input_dim - effective_kernel_size) // stride + 1

       # Initialize kernel weights
       self.kernel = torch.nn.Parameter(torch.randn(kernel_size))

   def forward(self, x):
       # x shape: (batch_size, input_dim)
       # Create sparse weight matrix W
       W = torch.zeros(self.output_dim, self.input_dim, device=x.device)

       for i in range(self.output_dim):
           start_pos = i * self.stride
           for j in range(self.kernel_size):
               col_idx = start_pos + j * self.dilation
               if col_idx < self.input_dim:
                   W[i, col_idx] = self.kernel[j]


       return x @ W.T
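
To gain some confidence in the matrix construction, it can be compared with `F.conv1d` for several stride and dilation settings. The sketch below uses a standalone helper, `conv_matrix` (a hypothetical name, not part of PyTorch), that builds `W` with the same indexing as the class above.

```python
import torch
import torch.nn.functional as F

def conv_matrix(kernel, input_dim, stride=1, dilation=1):
    """Hypothetical helper: build the sparse matrix W for a 1-D kernel."""
    kernel_size = kernel.shape[0]
    effective_kernel_size = (kernel_size - 1) * dilation + 1
    output_dim = (input_dim - effective_kernel_size) // stride + 1
    W = torch.zeros(output_dim, input_dim)
    for i in range(output_dim):
        for j in range(kernel_size):
            W[i, i * stride + j * dilation] = kernel[j]
    return W

torch.manual_seed(0)
kernel = torch.randn(3)
x = torch.randn(4, 10)                      # (batch_size, input_dim)

checks = []
for stride, dilation in [(1, 1), (2, 1), (1, 2), (2, 2), (3, 2)]:
    y_matrix = x @ conv_matrix(kernel, 10, stride, dilation).T
    # F.conv1d needs a channel axis: (batch, 1, length) in, (batch, 1, out) back
    y_builtin = F.conv1d(x.unsqueeze(1), kernel.view(1, 1, -1),
                         stride=stride, dilation=dilation).squeeze(1)
    checks.append(torch.allclose(y_matrix, y_builtin, atol=1e-5))
print(checks)
```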


# Padding


If you want the output to have the same dimension as the input, pad the input with zeros first so that

output_dim = f(input_dim + padding, stride, dilation) equals input_dim

## Note: The math for how many zeros to pad may not work out when stride and dilation are involved (read the specific implementations to see how they handle it)

In [11]:
import torch.nn.functional as F

# Symmetric padding (default)
x = torch.tensor([[1., 2., 3., 4., 5.]])
padded_sym = F.pad(x, (2, 2), value=0)  # (left_pad, right_pad)
print(f"Symmetric:  {padded_sym}")  # [0, 0, 1, 2, 3, 4, 5, 0, 0]

# Asymmetric padding
padded_asym = F.pad(x, (1, 3), value=0)  # 1 zero left, 3 zeros right
print(f"Asymmetric: {padded_asym}")     # [0, 1, 2, 3, 4, 5, 0, 0, 0]

# Only left padding
padded_left = F.pad(x, (2, 0), value=0)
print(f"Left only:  {padded_left}")     # [0, 0, 1, 2, 3, 4, 5]

# Only right padding
padded_right = F.pad(x, (0, 2), value=0)
print(f"Right only: {padded_right}")    # [1, 2, 3, 4, 5, 0, 0]

Symmetric:  tensor([[0., 0., 1., 2., 3., 4., 5., 0., 0.]])
Asymmetric: tensor([[0., 1., 2., 3., 4., 5., 0., 0., 0.]])
Left only:  tensor([[0., 0., 1., 2., 3., 4., 5.]])
Right only: tensor([[1., 2., 3., 4., 5., 0., 0.]])
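
For the stride-1 case the required padding can be worked out directly: the output shrinks by `dilation * (kernel_size - 1)`, so that many zeros in total restore the input length. The sketch below assumes stride 1 and makes one arbitrary choice, namely putting the extra zero on the right when the total is odd.

```python
import torch
import torch.nn.functional as F

x = torch.arange(1., 11.).view(1, 1, -1)     # (batch, channels, length=10)
kernel_size, dilation = 3, 2
w = torch.ones(1, 1, kernel_size)

# With stride 1, output = input + total_pad - dilation * (kernel_size - 1),
# so the output matches the input when total_pad = dilation * (kernel_size - 1).
total_pad = dilation * (kernel_size - 1)     # 4
left = total_pad // 2
right = total_pad - left                     # extra zero goes right if total_pad is odd (a choice)

y = F.conv1d(F.pad(x, (left, right)), w, dilation=dilation)
print(x.shape[-1], y.shape[-1])
```

With a stride larger than 1 the output is roughly `input_dim / stride` long, so no reasonable amount of padding keeps it equal to the input length.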


# Channels: Processing multiple signals at a time

## in_channels=2, out_channels=1

Input: Two separate sequences:

          Channel1: [a,b,c,d,e,f,g,h]
          Channel2: [i,j,k,l,m,n,o,p]

Filters:
- Channel1 filter: [w1, w2, w3]
- Channel2 filter: [w4, w5, w6]

Operation: `w1*a + w2*b + w3*c + w4*i + w5*j + w6*k = output[0]`

The only difference from the single-channel case is that we don't create two separate outputs: the two per-channel results are summed into one output.
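
The summation over input channels can be verified numerically. In this sketch the input and weights are random stand-ins for the letters and `w1..w6` above, and the first output element is recomputed by hand.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 2, 8)        # (batch, in_channels=2, length=8)
weight = torch.randn(1, 2, 3)   # (out_channels=1, in_channels=2, kernel_size=3)

y = F.conv1d(x, weight)         # shape (1, 1, 6)

# output[0] by hand: per-channel window products, summed over both channels
manual_first = (x[0, 0, :3] * weight[0, 0]).sum() + (x[0, 1, :3] * weight[0, 1]).sum()
print(y.shape, torch.allclose(y[0, 0, 0], manual_first))
```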

## Multiple Output Channels: in_channels=2, out_channels=3

Each output channel is essentially running its own complete convolution across all input channels!

Total kernels needed: 2 input × 3 output = 6 separate kernels

For Output Channel 0:

```

├── Input Channel 0 kernel: [w1_00, w2_00, w3_00]
└── Input Channel 1 kernel: [w4_00, w5_00, w6_00]
```
For Output Channel 1:
```
├── Input Channel 0 kernel: [w1_01, w2_01, w3_01]  
└── Input Channel 1 kernel: [w4_01, w5_01, w6_01]
```
For Output Channel 2:
```
├── Input Channel 0 kernel: [w1_02, w2_02, w3_02]
└── Input Channel 1 kernel: [w4_02, w5_02, w6_02]
```
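
The kernel count shows up in the shape of the layer's weight tensor. A small sketch, assuming a kernel size of 3 as in the listing above and an arbitrary input length of 8:

```python
import torch
import torch.nn as nn

conv = nn.Conv1d(in_channels=2, out_channels=3, kernel_size=3, bias=False)

# Weight layout is (out_channels, in_channels, kernel_size)
print(conv.weight.shape)
n_kernels = conv.weight.shape[0] * conv.weight.shape[1]
n_weights = conv.weight.numel()

x = torch.randn(1, 2, 8)                  # (batch, in_channels, length)
y = conv(x)                               # (batch, out_channels, length - kernel_size + 1)
print(n_kernels, n_weights, y.shape)
```

So the 6 kernels of length 3 amount to 18 learnable weights, and the output carries one sequence per output channel.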

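## Equivalence between 1-D conv with multiple channels and 2-D conv

Note: a 2-D conv over a grayscale image (a matrix) can be viewed as a 1-D conv in which each image row is an input channel. One output channel is needed per vertical position of the kernel, so a single output channel suffices only when the kernel height equals the image height.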

In [14]:
# Create a sample grayscale image (4x5)
grayscale_image = torch.tensor([
        [1.0, 2.0, 3.0, 4.0, 5.0],
        [6.0, 7.0, 8.0, 9.0, 10.0],
        [11.0, 12.0, 13.0, 14.0, 15.0],
        [16.0, 17.0, 18.0, 19.0, 20.0]
    ])

In [16]:
# Reshape for 2D conv: [batch, channels, height, width]
img_2d = grayscale_image.unsqueeze(0).unsqueeze(0)  # [1, 1, 4, 5]
print(f"2D Conv Input Shape: {img_2d.shape}")

# Create 2D convolution layer
conv2d = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, bias=False)

# Set specific weights for demonstration
weights_2d = torch.tensor([
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9]
]).unsqueeze(0).unsqueeze(0)  # [1, 1, 3, 3]

conv2d.weight.data = weights_2d
print("2D Kernel:")
print(weights_2d.squeeze())

# Apply 2D convolution
output_2d = conv2d(img_2d)
print(f"2D Conv Output Shape: {output_2d.shape}")
print("2D Conv Output:")
print(output_2d.squeeze())

2D Conv Input Shape: torch.Size([1, 1, 4, 5])
2D Kernel:
tensor([[0.1000, 0.2000, 0.3000],
        [0.4000, 0.5000, 0.6000],
        [0.7000, 0.8000, 0.9000]])
2D Conv Output Shape: torch.Size([1, 1, 2, 3])
2D Conv Output:
tensor([[41.1000, 45.6000, 50.1000],
        [63.6000, 68.1000, 72.6000]], grad_fn=<SqueezeBackward0>)


Now view the same operation as a 1-D conv with one input channel per image row (4 here) and 2 output channels, one per vertical kernel position

In [19]:
#Create 1D convolution: 4 input channels (rows), 2 output channels
conv1d = nn.Conv1d(in_channels=4, out_channels=2, kernel_size=3, bias=False)

# Set weights to match 2D kernel
# For 2 output channels, we need to simulate the 2D kernel sliding vertically
weights_1d = torch.tensor([
    # Output Channel 0: Uses rows 0,1,2 (top position of 3x3 kernel)
    [[0.1, 0.2, 0.3],  # Kernel for input channel 0 (row 0)
      [0.4, 0.5, 0.6],  # Kernel for input channel 1 (row 1)
      [0.7, 0.8, 0.9],  # Kernel for input channel 2 (row 2)
      [0.0, 0.0, 0.0]], # Kernel for input channel 3 (row 3) - not used

    # Output Channel 1: Uses rows 1,2,3 (bottom position of 3x3 kernel)
    [[0.0, 0.0, 0.0],  # Kernel for input channel 0 (row 0) - not used
      [0.1, 0.2, 0.3],  # Kernel for input channel 1 (row 1)
      [0.4, 0.5, 0.6],  # Kernel for input channel 2 (row 2)
      [0.7, 0.8, 0.9]]  # Kernel for input channel 3 (row 3)
])  # [2, 4, 3]

conv1d.weight.data = weights_1d
print("1D Kernels for 2 output channels:")
print("Output Channel 0 (simulates 3x3 kernel at top position):")
for i, kernel in enumerate(weights_1d[0]):
    print(f"  Input Channel {i} kernel: {kernel.tolist()}")
print("Output Channel 1 (simulates 3x3 kernel at bottom position):")
for i, kernel in enumerate(weights_1d[1]):
    print(f"  Input Channel {i} kernel: {kernel.tolist()}")

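# Reshape for 1D conv: [batch, channels (image rows), length (image columns)]
img_1d = grayscale_image.unsqueeze(0)  # [1, 4, 5]

# Apply 1D convolution
output_1d = conv1d(img_1d)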
print(f"1D Conv Output Shape: {output_1d.shape}")
print("1D Conv Output (2 channels):")
print(output_1d.squeeze())

1D Kernels for 2 output channels:
Output Channel 0 (simulates 3x3 kernel at top position):
  Input Channel 0 kernel: [0.10000000149011612, 0.20000000298023224, 0.30000001192092896]
  Input Channel 1 kernel: [0.4000000059604645, 0.5, 0.6000000238418579]
  Input Channel 2 kernel: [0.699999988079071, 0.800000011920929, 0.8999999761581421]
  Input Channel 3 kernel: [0.0, 0.0, 0.0]
Output Channel 1 (simulates 3x3 kernel at bottom position):
  Input Channel 0 kernel: [0.0, 0.0, 0.0]
  Input Channel 1 kernel: [0.10000000149011612, 0.20000000298023224, 0.30000001192092896]
  Input Channel 2 kernel: [0.4000000059604645, 0.5, 0.6000000238418579]
  Input Channel 3 kernel: [0.699999988079071, 0.800000011920929, 0.8999999761581421]
1D Conv Output Shape: torch.Size([1, 2, 3])
1D Conv Output (2 channels):
tensor([[41.1000, 45.6000, 50.1000],
        [63.6000, 68.1000, 72.6000]], grad_fn=<SqueezeBackward0>)
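
The hand-written weights above generalise to any image height: output channel `o` gets the 2-D kernel placed on input channels `o` to `o + k - 1` and zeros elsewhere. The sketch below builds the 1-D weights in a loop for a hypothetical 6x7 image and a random 3x3 kernel, then compares against `F.conv2d`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
H, W_img, k = 6, 7, 3                       # hypothetical image and kernel sizes
image = torch.randn(H, W_img)
kernel_2d = torch.randn(k, k)

# Reference: ordinary 2-D convolution, input [1, 1, H, W], weight [1, 1, k, k]
out_2d = F.conv2d(image.view(1, 1, H, W_img), kernel_2d.view(1, 1, k, k)).squeeze()

# 1-D view: rows are input channels, one output channel per vertical kernel position
n_out = H - k + 1
weights_1d = torch.zeros(n_out, H, k)       # [out_channels, in_channels, kernel_size]
for o in range(n_out):
    weights_1d[o, o:o + k, :] = kernel_2d   # kernel rows land on image rows o .. o+k-1

out_1d = F.conv1d(image.unsqueeze(0), weights_1d).squeeze(0)
print(out_2d.shape, out_1d.shape, torch.allclose(out_2d, out_1d, atol=1e-5))
```

Note how many of the 1-D weights are zeros: the 2-D layer stores 9 numbers, while this 1-D emulation stores `n_out * H * k` of them.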


## Note: In theory you could use 1-D conv with a varying number of channels for images, colour images, etc., but things get complicated quickly and you may not end up applying the operation as intended. It is better to stick with the intended methods and refer to the papers for any nuanced implementation.

## A note on Conv Transpose

It's simple (it's exactly what the name suggests).

Let $y = Wx$, where $W$ is the convolution matrix.



Then $W^T$ is a matrix that can be applied to $y$, producing a vector with the dimension of $x$.

$WW^T$ is a square matrix (it maps the output space to itself), but in general it is not the identity.

This is used for upscaling in encoder-decoder architectures (conv usually reduces the dimension).


Note: it's not an inverse operation. Remember that $x$ and $y$ do not have the same dimension, so $W$ is not square and cannot be inverted.


In any case, CNNs don't share parameters across layers: a transposed-conv layer learns its own kernel, which can end up anywhere during training.

In the shape calculation, the roles flip.

For example, with no dilation and stride 1:

Output_size = Input_size + Kernel_size - 1
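
The claim that conv transpose is multiplication by $W^T$ can be checked against `F.conv_transpose1d`. The sketch below assumes a single channel, stride 1, no padding, and a random kernel; it also shows that $W^T W$ is not the identity.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
input_dim, kernel_size = 10, 3
kernel = torch.randn(kernel_size)

# Convolution matrix W of shape (8, 10): row i holds the kernel shifted right by i
output_dim = input_dim - kernel_size + 1
W = torch.zeros(output_dim, input_dim)
for i in range(output_dim):
    W[i, i:i + kernel_size] = kernel

y = torch.randn(output_dim)                 # a vector living in the conv output space

up_matrix = W.T @ y                         # shape (10,)
up_builtin = F.conv_transpose1d(y.view(1, 1, -1), kernel.view(1, 1, -1)).flatten()

print(up_matrix.shape[0], y.shape[0] + kernel_size - 1)
print(torch.allclose(up_matrix, up_builtin, atol=1e-5))

# W.T is not an inverse of W: W.T @ W is (10, 10) but not the identity
is_identity = torch.allclose(W.T @ W, torch.eye(input_dim))
print(is_identity)
```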
