# Lab 11 - Autoencoders

Dominik Gaweł

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dg7s/Machine-Learning/blob/main/hw/Understanding_Deconvolution_in_Autoencoders.ipynb)

-------------------------------

# **Homework Assignment: Understanding Deconvolution in Autoencoders**
---------------

In class, we worked with autoencoders built from multilayer perceptrons (MLPs). However, encoders are often constructed using convolutional architectures to better capture spatial patterns. In this assignment, you'll explore how the decoder can use deconvolutional (transposed convolution) layers to reverse and mirror the operations performed by the convolutional encoder.

While convolutional encoders are relatively well understood, **decoding (or upsampling) the compressed representation** using **deconvolutional layers** (also known as **transposed convolutions**) often raises questions.

This assignment is particularly relevant because deconvolution is a core component of the U-Net architecture, a prominent neural network used extensively in image segmentation tasks.

Your main objective is to deeply understand **how transposed convolution layers work**, and explain them in both words and visuals.


## **The Objective**

Understand and clearly explain how **transposed convolutions** work. Use 2D transposed convolutions and a small grid of 2D points as a working example.

You may need to do some additional reading to complete this assignment.

## **Tasks & Deliverables**

### 1. **Theory Exploration**

Using markdown cells in your Colab notebook, answer the following:

- What is a **transposed convolution**?
- How does it differ from a regular convolution?
- How does it upsample feature maps?
- What are **stride**, **padding**, and **kernel size**, and how do they influence the result in a transposed convolution?
- To earn the full two points, your explanation must be detailed enough for a reader to reproduce the upsampling process step by step.


### 2. **Manual Diagram (by your hand, not a generated image)**

Carefully plan and draw **by hand** a diagram or a set of diagrams that:

- Explain the process of using **transposed convolution**.
- Use an example of a **small input grid of 2D points** which gets expanded into a larger output grid.
- Explain how stride, padding, and the kernel shape affect the result.
- Show intermediate steps of the operation, not just input and output.

**Scan or photograph your diagram(s)**, and upload it to your **GitHub repository** for this course.

Then embed it in your Colab notebook using markdown (you can find examples on *how to do it* in previous notebooks related to this class, e.g. the one on linear regression or the one on the MLP network).


### 3. **Publish on GitHub**  
   - Place the Colab notebook in your **GitHub repository** for this course.
   - In your repository’s **README**, add a **link** to the notebook and also include an **“Open in Colab”** badge at the top of the notebook so it can be launched directly from GitHub.


## 1. Theory Exploration

- ### What is a **transposed convolution**?
  A transposed convolution reverses the spatial downsampling of a standard convolution: it restores the larger grid shape, but it is not an exact inverse and in general does not recover the original values. Instead of sliding a kernel over an input to produce a smaller output, it:

  1. **Starts** with a zero‑filled output grid whose size along each spatial dimension is
  $(X_{in}-1)\times\text{stride}+\text{dilation}\times(\text{kernel\_size}-1)+1$. `dilation`, if specified, dilates the kernel, not the input.  
  2. **Scatters** each input value: multiplies the entire K×K kernel by that scalar and adds the result into the output grid at the position determined by stride and dilation (zero inputs contribute nothing).
  3. **Accumulates** overlapping results, then adjusts the final output: `padding` trims rows and columns from every border, and `output_padding` adds extra rows and columns on one side.
---
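The size formula from step 1, together with the `padding` and `output_padding` adjustment from step 3, can be sketched as a small helper. The function name `conv_transpose_out_size` is hypothetical and used only for illustration; the arithmetic follows the output-shape formula given in the PyTorch `ConvTranspose2d` documentation and applies to one spatial dimension at a time.

```python
def conv_transpose_out_size(x_in, kernel_size, stride=1, padding=0,
                            dilation=1, output_padding=0):
    """Output length along one spatial dimension of a transposed convolution."""
    # Size of the zero-filled grid before any trimming (step 1).
    full = (x_in - 1) * stride + dilation * (kernel_size - 1) + 1
    # padding trims both borders; output_padding adds cells on one side (step 3).
    return full - 2 * padding + output_padding


print(conv_transpose_out_size(3, kernel_size=2, stride=2))             # 6
print(conv_transpose_out_size(3, kernel_size=2, stride=2, padding=1))  # 4
```

The second call uses the same settings as the worked example in section 2, where a 3×3 input becomes a 4×4 output.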

- ### How does it differ from a regular convolution?

| Aspect                    | Regular Convolution      | Transposed Convolution             |
|---------------------------|--------------------------|------------------------------------|
| **Spatial size change**   | Usually **decreases**    | Usually **increases**              |
| **Kernel operation**            | Multiplies each kernel element with the corresponding value in an input patch and sums    | Multiplies the entire kernel by each single non‑zero input scalar and adds that block to the output         |
| **Padding effect**         | Increases output spatial size | Decreases output spatial size |
| **Typical use**           | Feature extraction       | Learned upsampling / decoding      |

---
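One way to see why the operation is called "transposed", and why it restores the shape but not the values, is to write a convolution as a matrix. The sketch below is a simplified 1D analogue (an assumption made for brevity; the rest of this notebook works in 2D), and the helper names `conv_matrix`, `matvec` and `transpose` are hypothetical. A regular convolution multiplies by a matrix `C`; the transposed convolution multiplies by its transpose.

```python
def conv_matrix(kernel, n_in, stride):
    """Matrix C such that C @ x slides `kernel` over a length-n_in vector x."""
    k = len(kernel)
    n_out = (n_in - k) // stride + 1
    rows = []
    for o in range(n_out):
        row = [0] * n_in
        for t, w in enumerate(kernel):
            row[o * stride + t] = w
        rows.append(row)
    return rows


def matvec(m, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


C = conv_matrix([1, 2], n_in=4, stride=2)  # 2 x 4 matrix: maps 4 values to 2
x = [3, 2, 1, 0]
y = matvec(C, x)                  # regular convolution: length 4 -> 2
x_back = matvec(transpose(C), y)  # transposed convolution: length 2 -> 4

print(y)       # [7, 1]
print(x_back)  # [7, 14, 1, 2]  (same length as x, different values)
```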
- ### How does it upsample feature maps?

  1. **Initialize Output**  
    - Create a zero matrix of size $\big((H_{in}-1)\times\text{stride}[0]+\text{dilation}[0]\times(\text{kernel\_size}[0]-1)+1\big) \times \big((W_{in}-1)\times\text{stride}[1]+\text{dilation}[1]\times(\text{kernel\_size}[1]-1)+1\big)$.

  2. **Scatter Inputs**  
    - For each input element $v$ at $(i,j)$, compute its top‑left output index $(i\times S, j\times S)$, multiply the $K\times K$ kernel by $v$, and add that scaled kernel block into the output grid at that position.

  3. **Overlap & Accumulation**
    - Where kernel applications overlap, their outputs are summed, resulting in the upsampled feature map.

  4. **Trimming with Padding**  
    - After accumulation, remove $P$ rows from the top and bottom and $P$ columns from the left and right of the result, so each dimension shrinks by $2P$.

---
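The four steps above can be sketched in plain Python. This is a minimal illustration, not a library implementation: it assumes a single channel, a dilation of 1, and no `output_padding`, and the function name `conv_transpose2d_manual` is hypothetical.

```python
def conv_transpose2d_manual(inp, kernel, stride=1, padding=0):
    """Transposed convolution of one 2D grid with one 2D kernel (dilation 1)."""
    h_in, w_in = len(inp), len(inp[0])
    k_h, k_w = len(kernel), len(kernel[0])

    # 1. Initialize the zero-filled output grid.
    h_full = (h_in - 1) * stride + k_h
    w_full = (w_in - 1) * stride + k_w
    out = [[0] * w_full for _ in range(h_full)]

    # 2-3. Scatter each input value as a scaled kernel block; overlaps add up.
    for i in range(h_in):
        for j in range(w_in):
            v = inp[i][j]
            for a in range(k_h):
                for b in range(k_w):
                    out[i * stride + a][j * stride + b] += v * kernel[a][b]

    # 4. Trim `padding` rows and columns from every border.
    return [row[padding:w_full - padding]
            for row in out[padding:h_full - padding]]


result = conv_transpose2d_manual(
    [[3, 2, 1], [0, 5, 2], [1, 4, 7]],
    [[1, 2], [3, 4]],
    stride=2, padding=1,
)
for row in result:
    print(row)
# [12, 6, 8, 3]
# [0, 5, 10, 2]
# [0, 15, 20, 6]
# [2, 4, 8, 7]
```

The input, kernel, stride and padding are those of the worked example in section 2, and the printed rows match the values `F.conv_transpose2d` returns there.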

- ### What are **stride**, **padding**, and **kernel size**, and how do they influence the result in a transposed convolution?

  - **Kernel size (K)**  
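    - Side length of the filter (assumed square here).  
    - Each input value writes a $K\times K$ block into the output, so a larger kernel enlarges the output and, once $K > S$, makes neighbouring blocks overlap and blend adjacent inputs.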

  - **Stride (S)**  
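    - Sets how far apart consecutive kernel placements land in the output; equivalently, $S-1$ zeros are inserted between neighbouring input elements before a stride‑1 convolution.  
    - A larger S inserts more zeros, yielding a larger output with more widely spaced contributions; when $S \ge K$ the blocks no longer overlap.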

  - **Padding (P)**  
    - Specifies how many rows and columns to trim from each border of the accumulated result.  
    - Higher P trims more: each unit of P removes two rows and two columns, so the output shrinks by $2P$ per dimension.

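To make the interplay of the three parameters concrete, the sketch below counts how many input elements write into each output cell. It is a 1D simplification with dilation 1 (an assumption made to keep the output readable), and the helper name `contribution_counts` is hypothetical.

```python
def contribution_counts(n_in, kernel_size, stride, padding=0):
    """Number of input elements that write into each output cell (1D, dilation 1)."""
    n_full = (n_in - 1) * stride + kernel_size
    counts = [0] * n_full
    for i in range(n_in):
        for t in range(kernel_size):
            counts[i * stride + t] += 1
    # Padding trims cells from both ends of the accumulated result.
    return counts[padding:n_full - padding]


print(contribution_counts(3, kernel_size=2, stride=2))             # [1, 1, 1, 1, 1, 1]
print(contribution_counts(3, kernel_size=3, stride=2))             # [1, 1, 2, 1, 2, 1, 1]
print(contribution_counts(3, kernel_size=2, stride=3))             # [1, 1, 0, 1, 1, 0, 1, 1]
print(contribution_counts(3, kernel_size=3, stride=2, padding=1))  # [1, 2, 1, 2, 1]
```

With $K = S$ every cell receives exactly one contribution, with $K > S$ some cells receive two, and with $K < S$ some cells receive none and stay zero. The last call shows padding removing one cell from each end.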

## 2. Manual Diagram

Reference: [`torch.nn.ConvTranspose2d` documentation](https://docs.pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html)

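```python
import torch
import torch.nn.functional as F
```

```python
# Input feature map, shape (batch=1, channels=1, H=3, W=3)
inp = torch.tensor([[[[3., 2., 1.],
                      [0., 5., 2.],
                      [1., 4., 7.]]]])

# Kernel, shape (in_channels=1, out_channels=1, kH=2, kW=2)
kernel = torch.tensor([[[[1., 2.],
                         [3., 4.]]]])
print(inp)
```

```
tensor([[[[3., 2., 1.],
          [0., 5., 2.],
          [1., 4., 7.]]]])
```

```python
padding = 1
stride = 2
# The untrimmed result is 6x6; padding=1 trims one row/column per border, leaving 4x4
out = F.conv_transpose2d(inp, kernel, stride=stride, padding=padding)
print(out)
```

```
tensor([[[[12.,  6.,  8.,  3.],
          [ 0.,  5., 10.,  2.],
          [ 0., 15., 20.,  6.],
          [ 2.,  4.,  8.,  7.]]]])
```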

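The intermediate 6×6 grid for this example can be reconstructed without PyTorch. The sketch below relies on one observation specific to this configuration: the stride equals the kernel size, so the scaled kernel blocks tile the grid without overlapping, and the untrimmed result is the Kronecker product of the input and the kernel. This shortcut does not hold once blocks overlap.

```python
inp = [[3, 2, 1],
       [0, 5, 2],
       [1, 4, 7]]
kernel = [[1, 2],
          [3, 4]]

# Untrimmed grid: every input value v becomes the 2x2 block v * kernel.
full = [[v * w for v in inp_row for w in k_row]
        for inp_row in inp for k_row in kernel]
for row in full:
    print(row)
# [3, 6, 2, 4, 1, 2]
# [9, 12, 6, 8, 3, 4]
# [0, 0, 5, 10, 2, 4]
# [0, 0, 15, 20, 6, 8]
# [1, 2, 4, 8, 7, 14]
# [3, 4, 12, 16, 21, 28]

# padding=1: drop the first and last row and column.
trimmed = [row[1:-1] for row in full[1:-1]]
print(trimmed)
# [[12, 6, 8, 3], [0, 5, 10, 2], [0, 15, 20, 6], [2, 4, 8, 7]]
```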

![img](https://raw.githubusercontent.com/dg7s/Machine-Learning/refs/heads/main/hw/conv_transpose2d.png)