intuition for object detection in an image
by enumerating a few desiderata to guide our design
of a neural network architecture suitable for computer vision:

1. In the earliest layers, our network
   should respond similarly to the same patch,
   regardless of where it appears in the image. This principle is called *translation invariance* (or *translation equivariance*).
1. The earliest layers of the network should focus on local regions,
   without regard for the contents of the image in distant regions. This is the *locality* principle.
   Eventually, these local representations can be aggregated
   to make predictions at the whole image level.
1. As we proceed, deeper layers should be able to capture longer-range features of the
   image, in a way similar to higher-level vision in nature.
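
The first two principles can be checked numerically. The following is a minimal sketch, assuming a single-channel 8x8 image and a random 3x3 kernel (both chosen only for illustration) and using `torch.nn.functional.conv2d` as the sliding-window operation. It shows that shifting a patch shifts the response by the same amount, and that changing a distant pixel leaves nearby responses untouched.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Illustrative setup: a random 3x3 kernel and a random 3x3 patch.
kernel = torch.rand(1, 1, 3, 3)
patch = torch.rand(3, 3)

# Image A: all zeros except the patch, with its top-left corner at (1, 1).
image_a = torch.zeros(1, 1, 8, 8)
image_a[0, 0, 1:4, 1:4] = patch

# Image B: the same patch shifted down by 2 rows and right by 3 columns.
image_b = torch.zeros(1, 1, 8, 8)
image_b[0, 0, 3:6, 4:7] = patch

out_a = F.conv2d(image_a, kernel)  # shape (1, 1, 6, 6)
out_b = F.conv2d(image_b, kernel)

# Translation equivariance: the response map shifts by (2, 3) as well.
equivariant = torch.allclose(out_a[0, 0, :4, :3], out_b[0, 0, 2:, 3:])

# Locality: changing the far corner pixel (7, 7) only affects outputs whose
# 3x3 window covers it, which here is the single output position (5, 5).
image_c = image_a.clone()
image_c[0, 0, 7, 7] = 100.0
out_c = F.conv2d(image_c, kernel)
local = torch.equal(out_a[0, 0, :5, :5], out_c[0, 0, :5, :5])

print(equivariant, local)
```

Strictly speaking, this demonstrates equivariance: the output moves with the input instead of staying fixed.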

Let's see how this translates into mathematics.
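
As a reference for the code that follows, one common way to write the operation implemented by `corr2d` is the two-dimensional cross-correlation. The symbols here are assumptions made for illustration: $\mathbf{X}$ is an input of shape $n_h \times n_w$, $\mathbf{K}$ is a kernel of shape $h \times w$, and indices start at zero.

$$
Y_{i,j} = \sum_{a=0}^{h-1} \sum_{b=0}^{w-1} K_{a,b} \, X_{i+a,\, j+b},
\qquad 0 \le i \le n_h - h, \quad 0 \le j \le n_w - w .
$$

The output $\mathbf{Y}$ therefore has shape $(n_h - h + 1) \times (n_w - w + 1)$. The same kernel $\mathbf{K}$ is used at every position $(i, j)$, which reflects the first principle, and each output depends only on an $h \times w$ window of the input, which reflects the second.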

In [2]:
import torch
from torch import nn

In [54]:
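def corr2d(X, k):
    """Compute the 2D cross-correlation of input X with kernel k (no padding, stride 1)."""
    h, w = k.shape
    # The output shrinks by (kernel size - 1) along each dimension.
    y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))

    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            # Multiply the window at (i, j) elementwise by the kernel and sum.
            y[i, j] = (k * X[i : i + h, j : j + w]).sum()

    return y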

def test_corr2d():
    # Reference layer: a single-channel 3x3 convolution without bias.
    con = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, stride=1, bias=False)
    weight = con.weight.data[0, 0].clone().detach()

    # Forward and backward pass through corr2d.
    x = torch.rand((6, 6), requires_grad=True)
    res1 = corr2d(x, weight)
    res1.backward(torch.ones_like(res1))

    # Same input through nn.Conv2d, which expects (batch, channel, height, width).
    x2 = x.clone().detach().requires_grad_(True)
    res2 = con(x2[None, None, :, :])[0, 0]
    res2.backward(torch.ones_like(res2))

    # Use allclose: a plain `a - b < 1e-5` check also passes for large negative differences.
    assert torch.allclose(res1, res2, atol=1e-5), f"outputs differ: corr2d result is {res1} and conv result is {res2}"
    assert torch.allclose(x.grad, x2.grad, atol=1e-5), f"gradients differ: x grad is {x.grad} and x2 grad is {x2.grad}"

test_corr2d()
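
To make the behavior of `corr2d` concrete, here is a small worked example with hand-checkable numbers. The input and kernel values are chosen only for illustration, and the function is repeated so that the block runs on its own.

```python
import torch

def corr2d(X, K):
    """2D cross-correlation of a 2D input X with a 2D kernel K (no padding, stride 1)."""
    h, w = K.shape
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

# 3x3 input holding 0..8 and a 2x2 kernel holding 0..3.
X = torch.arange(9.0).reshape(3, 3)
K = torch.tensor([[0.0, 1.0], [2.0, 3.0]])

Y = corr2d(X, K)
print(Y)
# tensor([[19., 25.],
#         [37., 43.]])
```

The top-left entry, for instance, is 0·0 + 1·1 + 3·2 + 4·3 = 19, computed from the top-left 2x2 window of the input.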

In [None]:
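class Conv2D(nn.Module):
    """A 2D cross-correlation layer with a learnable kernel and a scalar bias."""

    def __init__(self, kernel_size) -> None:
        super().__init__()
        # nn.Parameter (capital P) registers the tensors as learnable; they must
        # be stored on self so that forward() can access them.
        self.weight = nn.Parameter(torch.rand(kernel_size))
        self.bias = nn.Parameter(torch.rand(1))

    def forward(self, X):
        return corr2d(X, self.weight) + self.bias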
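
As a quick sanity check of such a layer, the sketch below builds a standalone variant and runs one forward and backward pass. The class name `Conv2DSketch`, the zero-initialized bias, and the input and kernel sizes are assumptions made for illustration, not part of the code above.

```python
import torch
from torch import nn

def corr2d(X, K):
    """2D cross-correlation of a 2D input X with a 2D kernel K (no padding, stride 1)."""
    h, w = K.shape
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

class Conv2DSketch(nn.Module):
    """Hypothetical stand-in for the layer: learnable kernel plus scalar bias."""

    def __init__(self, kernel_size):
        super().__init__()
        self.weight = nn.Parameter(torch.rand(kernel_size))
        self.bias = nn.Parameter(torch.zeros(1))  # assumption: bias starts at zero

    def forward(self, X):
        return corr2d(X, self.weight) + self.bias

layer = Conv2DSketch(kernel_size=(2, 3))
X = torch.rand(6, 8)

Y = layer(X)          # output shape is (6 - 2 + 1, 8 - 3 + 1) = (5, 6)
Y.sum().backward()    # gradients flow back to both parameters

param_names = [name for name, _ in layer.named_parameters()]
print(Y.shape, param_names, layer.weight.grad.shape)
```

Because both tensors are wrapped in `nn.Parameter` and stored as attributes, they appear in `named_parameters()` and receive gradients, which is what an optimizer needs in order to update them.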