# Exercise sheet 5

*This assignment was done by the group with ID*

<YOUR_GROUP_ID> e.g. 10

<YOUR_GROUP_DAY> Tuesday or Wednesday

## 1. Coding

### a) Strided convolution (1 point)
In the lecture you learned about the convolution operation. It was also introduced as a sliding window operation. The stride, in this intuition, is the step size of the sliding window.
In this exercise, you will implement a 2d convolutional layer with an additional stride option. The strided convolution $x*k$ for a given 2d input signal $x \in \mathbb{R}^{H \times W}$, a kernel $k \in \mathbb{R}^{(2k+1) \times (2k+1)}$ and a stride $s$ is defined as follows (note that $k$ is overloaded: in the exponent and in the summation limits it denotes the half-width of the kernel):

$$(x * k)(i, j) = \sum_{m=-k}^{k} \sum_{n=-k}^{k} x(is + m, js + n) \cdot k(m, n)$$

The following figure illustrates the strided convolution with a stride of 2:

![Convolution with stride 2](https://upload.wikimedia.org/wikipedia/commons/0/04/Convolution_arithmetic_-_Padding_strides.gif)
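
To make the role of the stride concrete, here is a minimal 1d sketch of the same sliding-window idea in plain NumPy. It is neither the lecture code nor a solution to this exercise: the name `strided_conv1d` is hypothetical, the window is indexed by its start rather than its center, and positions where the window does not fit are simply skipped (no padding). All of these are assumptions made for illustration.

```python
import numpy as np

def strided_conv1d(x: np.ndarray, k: np.ndarray, stride: int = 1) -> np.ndarray:
    """Hypothetical 1d helper: slide `k` over `x` in steps of `stride`, without padding."""
    # Every start index at which the whole window still fits inside x.
    starts = range(0, len(x) - len(k) + 1, stride)
    # One output value per window position: elementwise product, then sum.
    return np.array([np.sum(x[s:s + len(k)] * k) for s in starts], dtype=x.dtype)

x = np.arange(8, dtype=np.float32)       # [0, 1, ..., 7]
k = np.ones(3, dtype=np.float32)         # sums three neighbouring values
out_s1 = strided_conv1d(x, k, stride=1)  # [3, 6, 9, 12, 15, 18]
out_s2 = strided_conv1d(x, k, stride=2)  # [3, 9, 15]
```

In this sketch, the result for stride 2 is every second element of the result for stride 1, which is one way to sanity-check a strided implementation.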


Your task is to extend the code given in the lecture to include the stride parameter. The function should have the following signature:

```python
def conv2d(self: Tensor, kernel: Tensor, stride: int = 1, padding: int = None) -> Tensor:
```

In [None]:
import numpy as np
from Tensor import Tensor

In [None]:
def conv2d(self: Tensor, kernel: Tensor, stride: int = 1, padding: int = None) -> Tensor:
    ## your code here ##
    pass

#### Tests

In [None]:
import imageio.v3 as imageio
url = 'https://upload.wikimedia.org/wikipedia/en/7/7d/Lenna_%28test_image%29.png'
lena = imageio.imread(url)

from PIL import Image

# make gray-scale image
lena_bw = Image.fromarray(lena).convert('L')
lena_np = np.array(lena_bw).astype(np.float32) / 255.0
lena_tensor = Tensor(data=lena_np)
smoothing_kernel = np.array([[1, 1, 1], [1, 1, 1], [1, 1, 1]]).astype(np.float32) / 9
smoothing_kernel = Tensor(data=smoothing_kernel)
padding = None

#### Functional tests
Your implementation should pass the following functional tests:

In [None]:
import torch
import torch.nn.functional as F
lena_torch = torch.tensor(lena_np)
smoothing_kernel_torch = torch.tensor(smoothing_kernel.data)

for stride in [1, 2, 4, 8]:
    output_torch = F.conv2d(lena_torch[None, None, ...], smoothing_kernel_torch[None, None, ...], stride=stride, padding=padding if padding is not None else smoothing_kernel.data.shape[-1] // 2)
    output = conv2d(lena_tensor, smoothing_kernel, stride, padding)
    assert output.data.shape == output_torch.squeeze().numpy().shape, f"Shape mismatch: {output.data.shape} != {output_torch.squeeze().numpy().shape}"
    np.testing.assert_allclose(output.data, output_torch.squeeze().numpy(), rtol=1e-5)
    print(f"Stride: {stride} - Correct")

#### Performance test
This test is optional. You can use it to compare different implementations of your own, or to compare your implementation against the PyTorch implementation.

In [None]:
np.random.seed(42)
data = np.random.rand(1000, 1000).astype(np.float32)
data_tensor = Tensor(data=data)
kernel = np.random.rand(3, 3).astype(np.float32)
kernel_tensor = Tensor(data=kernel)
stride = 2
padding = None

In [None]:
## Your implementation
%timeit conv2d(data_tensor, kernel_tensor, stride, padding)

In [None]:
## PyTorch implementation
%timeit F.conv2d(torch.tensor(data[None, None, ...]), torch.tensor(kernel[None, None, ...]), stride=stride, padding=padding if padding is not None else kernel.shape[-1] // 2)

### b) 2d transposed convolution (2 points)

In this exercise, you will implement a 2d transposed convolution operation $x *^T k$ for a given 2d input signal $x \in \mathbb{R}^{H \times W}$ and a kernel $k \in \mathbb{R}^{(2k+1) \times (2k+1)}$ with unit stride and no padding.

The 2d transposed convolution can be understood as a sliding window operation over the elements in the input signal $x$. The whole filter $k$ is applied to each element of the input signal $x$ by multiplying the filter with the element under consideration and adding the result to the output tensor at the corresponding position.
To implement the transposed convolution, you can initialize the output tensor with zeros and then iteratively accumulate the results into it.

The following figure illustrates the transposed convolution with an example:

![Transposed convolution](https://www.researchgate.net/profile/Md-Afif-Al-Mamun-2/publication/358607641/figure/fig4/AS:1123654414934027@1644911497422/6-Transposed-convolution-operation-6-shows-transposed-convolution-of-a-2-2-input.ppm)
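
The accumulate-into-zeros idea described above can be sketched in 1d with plain NumPy. This is an illustration under stated assumptions, not a solution to the 2d task: the name `conv_transpose1d_sketch` is hypothetical, and the sketch assumes unit stride and no padding, so the output has length `len(x) + len(k) - 1`.

```python
import numpy as np

def conv_transpose1d_sketch(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Hypothetical 1d helper: transposed convolution with unit stride, no padding."""
    # Start from zeros; each input element adds a scaled copy of the kernel.
    out = np.zeros(len(x) + len(k) - 1, dtype=x.dtype)
    for i, value in enumerate(x):
        out[i:i + len(k)] += value * k  # overlapping contributions add up
    return out

x = np.array([1.0, 2.0, 3.0], dtype=np.float32)
k = np.array([1.0, 10.0], dtype=np.float32)
out = conv_transpose1d_sketch(x, k)  # [1, 12, 23, 30]
```

For this 1d, unit-stride case the result coincides with `np.convolve(x, k)` in its default full mode, which gives a quick cross-check.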


Your task is to implement the transposed convolution operation. The function should have the following signature:

```python
def conv_transpose2d(self: Tensor, kernel: Tensor) -> Tensor:
```

In [None]:
def conv_transpose2d(self: Tensor, kernel: Tensor) -> Tensor:
    ## your code here ##
    pass

#### Tests

#### Functional tests
Your implementation should pass the following functional tests, which check both the shape and the numerical values of the output.

In [None]:
np.random.seed(42)

for input_size in [(3, 3), (4, 4), (5, 5), (6, 6)]:
    for kernel_size in [(1, 1), (2, 2), (3, 3), (4, 4)]:
        data = np.random.rand(*input_size).astype(np.float32)
        data_tensor = Tensor(data=data)
        kernel = np.random.rand(*kernel_size).astype(np.float32)
        kernel_tensor = Tensor(data=kernel)

        output_torch = F.conv_transpose2d(torch.tensor(data[None, None, ...]), torch.tensor(kernel[None, None, ...]))

        output = conv_transpose2d(data_tensor, kernel_tensor)
        assert output.data.shape == output_torch.squeeze().numpy().shape, f"Shape mismatch: {output.data.shape} != {output_torch.squeeze().numpy().shape}"
        np.testing.assert_allclose(output.data, output_torch.squeeze().numpy(), rtol=1e-5)
        print(f"Input size: {input_size}, Kernel size: {kernel_size} - Correct")

#### Performance test

In [None]:
np.random.seed(42)
data = np.random.rand(1000,1000).astype(np.float32)
data_tensor = Tensor(data=data)
kernel = np.random.rand(11,11).astype(np.float32)
kernel_tensor = Tensor(data=kernel)

In [None]:
%timeit conv_transpose2d(data_tensor, kernel_tensor)

In [None]:
%timeit F.conv_transpose2d(torch.tensor(data[None, None, ...]), torch.tensor(kernel[None, None, ...]))

--- 

## 2. Theory (2 points)

Derive the formula for the output shape of a convolutional layer. Assume the input shape is $H \times W$ and the convolutional layer has a kernel of dimension $F \times F$, a stride $S$ and a padding $P$. That is, give a formula for $H'$ and $W'$ of the output shape $H' \times W'$ in terms of $H$, $W$, $F$, $S$ and $P$, and explain why it is correct.
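
One way to check a candidate formula is to count the window positions by brute force along a single axis. The sketch below does this in plain Python. The helper name `count_window_positions` is hypothetical, and the sketch assumes that the padding $P$ is added on both sides of the axis and that only windows lying fully inside the padded input count. It does not replace the derivation asked for above.

```python
def count_window_positions(size: int, kernel: int, stride: int, padding: int) -> int:
    """Hypothetical helper: enumerate valid window start indices along one axis."""
    padded = size + 2 * padding  # assumption: padding is applied on both sides
    # A window starting at `s` covers s .. s + kernel - 1 and must end inside the padded axis.
    starts = range(0, padded - kernel + 1, stride)
    return len(starts)

# (size, kernel, stride, padding) -> number of output positions along that axis
n_same = count_window_positions(512, 3, 1, 1)    # 512
n_half = count_window_positions(512, 3, 2, 1)    # 256
n_small = count_window_positions(7, 3, 2, 0)     # 3
```

Comparing such counts with the values your closed-form expression gives for several settings of $H$, $F$, $S$ and $P$ is a quick sanity check before writing up the argument.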