# **Homework Assignment: Understanding Deconvolution in Autoencoders**
---------------

In class, we worked with autoencoders built from multilayer perceptrons (MLPs). However, encoders are often constructed using convolutional architectures to better capture spatial patterns. In this assignment, you'll explore how the decoder can use deconvolutional (transposed convolution) layers to reverse and mirror the operations performed by the convolutional encoder.

While convolutional encoders are relatively well understood, **decoding (or upsampling) the compressed representation** using **deconvolutional layers** (also known as **transposed convolutions**) often raises questions.

This assignment is particularly relevant because deconvolution is a core component of the U-Net architecture, a prominent neural network used extensively in image segmentation tasks.

Your main objective is to deeply understand **how transposed convolution layers work**, and explain them in both words and visuals.


## **The Objective**

Understand and clearly explain how **transposed convolutions** work. Use 2D transposed convolutions and a small grid of 2D points as a working example.

You may need to do some additional reading to complete this assignment.

## **Tasks & Deliverables**

### 1. **Theory Exploration**

Using markdown cells in your Colab notebook, answer the following:

- What is a **transposed convolution**?
- How does it differ from a regular convolution?
- How does it upsample feature maps?
- What are **stride**, **padding**, and **kernel size**, and how do they influence the result in a transposed convolution?
- To earn the full two points, your explanation must be detailed enough for a reader to reproduce the upsampling process step by step.


### 2. **Manual Diagram (by your hand, not a generated image)**

Carefully plan and draw **by hand** a diagram or a set of diagrams that:

- Explain the process of using **transposed convolution**.
- Use an example of a **small input grid of 2D points** which gets expanded into a larger output grid.
- Explain how stride, padding, and the kernel shape affect the result.
- Show intermediate steps of the operation, not just input and output.

**Scan or photograph your diagram(s)**, and upload it to your **GitHub repository** for this course.

Then embed it in your Colab notebook using markdown (you can find examples on *how to do it* in previous notebooks related to this class, e.g. the one on linear regression or the one on the MLP network).


### 3. **Publish on GitHub**  
   - Place the Colab notebook in your **GitHub repository** for this course.
   - In your repository’s **README**, add a **link** to the notebook and also include an **“Open in Colab”** badge at the top of the notebook so it can be launched directly from GitHub.


# THEORY EXPLORATION ANSWER

- **What is a transposed convolution?**
    A transposed convolution (also called deconvolution or fractionally strided convolution) is a technique used to upsample a feature map, i.e., to increase its spatial resolution.
    
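    It does so by reversing the data flow of a standard convolution: instead of sliding a filter over the input and collapsing each region into a single value, it distributes each input value across a region of the output, effectively "spreading out" the information. It restores the spatial shape that a convolution would have reduced, but it does not recover the original values, so it is not a true inverse of convolution.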




- **How does it differ from a regular convolution?**
    In a regular convolution:
    A large input is compressed into a smaller output using a sliding filter (kernel), by computing dot products over local regions.
    This operation is typically many-to-one: multiple input values contribute to one output.

    In a transposed convolution:
    The process is reversed: a smaller input is expanded into a larger output.
    It’s a one-to-many operation: each input value contributes to a larger patch in the output, depending on the kernel.

- **How does it upsample feature maps?**
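    Each element of the input tensor is a single number that scales the whole kernel. The scaled copy of the kernel is then added into the output tensor at a position set by the stride: for stride $s$, the copy for input element $(i, j)$ starts at output position $(i \cdot s, j \cdot s)$. The kernel size determines how far each copy extends, and therefore how much neighbouring copies overlap; padding trims the border of the result.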

    If multiple input elements contribute to the same location in the output, their values are summed (just like in normal convolution).
    This produces a larger output tensor, thus achieving upsampling.  
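
The scatter-and-sum procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a single channel, no padding, and a hypothetical helper name `transposed_conv2d`; it is not how a deep-learning library implements the layer internally.

```python
import numpy as np

def transposed_conv2d(x, kernel, stride=1):
    """Naive 2D transposed convolution (single channel, no padding)."""
    n_rows, n_cols = x.shape
    k_rows, k_cols = kernel.shape
    # Full output size along each axis: (n - 1) * stride + k
    out = np.zeros(((n_rows - 1) * stride + k_rows,
                    (n_cols - 1) * stride + k_cols))
    for i in range(n_rows):
        for j in range(n_cols):
            top, left = i * stride, j * stride
            # Add a copy of the kernel scaled by x[i, j]; overlaps are summed.
            out[top:top + k_rows, left:left + k_cols] += x[i, j] * kernel
    return out

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])
kernel = np.ones((2, 2))
y = transposed_conv2d(x, kernel, stride=1)
print(y)
# [[ 1.  3.  2.]
#  [ 4. 10.  6.]
#  [ 3.  7.  4.]]
```

The centre value 10 is the sum 1 + 2 + 3 + 4, because all four scaled kernel copies overlap there; the corners each receive a single contribution.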

- **What are stride, padding, and kernel size, and how do they influence the result in a transposed convolution?**

**Padding**:
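In regular convolution, padding adds zeros around the input to control the output size.
In transposed convolution, padding works in the opposite direction: a padding of $p$ removes $p$ rows and $p$ columns from each side of the full output, so more padding gives a smaller output.
With input size $n$, kernel size $k$, stride $s$, and padding $p$ (no dilation or output padding), the output size along each dimension is $(n - 1)s + k - 2p$, so choosing the padding properly ensures the output has the desired dimensions.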

Example of zero padding ($p = 1$) applied to a 2×2 input, as done in a regular convolution:
$$
\begin{array}{ccc}
\begin{pmatrix}
1 & 2 \\
3 & 4
\end{pmatrix}
&
\xrightarrow{\text{padded}}
&
\begin{pmatrix}
0 & 0 & 0 & 0\\
0 & 1 & 2 & 0\\
0 & 3 & 4 & 0\\
0 & 0 & 0 & 0
\end{pmatrix}
\end{array}
$$
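
For the transposed case, the effect of padding on the output can be checked numerically. The sketch below assumes the common convention (no dilation, no output padding) in which padding crops the border of the unpadded result; the helper names `transposed_conv_output_size` and `crop_border` are made up for this illustration.

```python
import numpy as np

def transposed_conv_output_size(n, kernel_size, stride=1, padding=0):
    """Output length along one dimension (no dilation, no output padding)."""
    return (n - 1) * stride + kernel_size - 2 * padding

def crop_border(full_output, padding):
    """Drop `padding` rows and columns from every side of the unpadded result."""
    if padding == 0:
        return full_output
    return full_output[padding:-padding, padding:-padding]

# A 2x2 input with a 3x3 kernel and stride 2 gives a 5x5 result before cropping.
size_p0 = transposed_conv_output_size(2, kernel_size=3, stride=2, padding=0)
size_p1 = transposed_conv_output_size(2, kernel_size=3, stride=2, padding=1)
print(size_p0, size_p1)  # 5 3

# Stand-in for an unpadded 5x5 result, to show which cells survive padding=1.
full = np.arange(25).reshape(5, 5)
cropped = crop_border(full, padding=1)
print(cropped)
# [[ 6  7  8]
#  [11 12 13]
#  [16 17 18]]
```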


**Kernel size**:
The dimensions of the filter (e.g., 3×3).
A larger kernel allows each input value to influence a wider region in the output.
Typically, odd sizes (like 3 or 5) are used to preserve symmetry around the center. Here are two example 3×3 kernels: the first sharpens an image, and the second (a Prewitt kernel) extracts vertical edges.
  
$$
\begin{array}{cc}
\text{Sharpening kernel}
\begin{pmatrix}
 0 & -1 & 0 \\
-1 & 5 & -1 \\
 0 & -1 & 0
\end{pmatrix}
&
\text{Prewitt kernel}
\begin{pmatrix}
-1 & 0 & 1 \\
-1 & 0 & 1 \\
-1 & 0 & 1
\end{pmatrix}
\end{array}
$$
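
One way to see the role of kernel size is to feed a single input value through a transposed convolution: the output is then just the kernel scaled by that value, so the kernel's shape is the "footprint" of each input value. The snippet below is a small sketch of that idea; the input value 2.0 and the 5×5 kernel of ones are arbitrary choices for illustration.

```python
import numpy as np

prewitt_x = np.array([[-1.0, 0.0, 1.0],
                      [-1.0, 0.0, 1.0],
                      [-1.0, 0.0, 1.0]])

value = 2.0  # a 1x1 input holding a single value

# With one input value there is nothing to overlap, so the output
# is simply the kernel multiplied by that value.
footprint_3x3 = value * prewitt_x
footprint_5x5 = value * np.ones((5, 5))

print(footprint_3x3)
# [[-2.  0.  2.]
#  [-2.  0.  2.]
#  [-2.  0.  2.]]
print(footprint_3x3.shape, footprint_5x5.shape)  # (3, 3) (5, 5)
```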

**Stride**:
The number of output cells the kernel moves between consecutive input values.
In a transposed convolution, a larger stride means more spacing between the contributions of each input value, and thus a larger output. The procedure is:

- Start with an output filled with zeros and place the kernel at its top-left corner.
- Multiply every value of the kernel by the top-left value of the input and add the results to the output cells under the kernel.
- Move the kernel `stride` cells to the right and repeat with the next input value in the same row.
- When the input row is exhausted, move the kernel back to the left edge of the output, move it down by `stride` cells, and continue with the next input row.
- Repeat until every input value has been used. Wherever contributions from different input values overlap, add them.
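
To see the effect of the stride concretely, the walk described above can be written as a short NumPy loop and run with several stride values. This is a sketch under the same simplifying assumptions as the text (single channel, square kernel, no padding); `transposed_conv2d` is a hypothetical helper name, not a library function.

```python
import numpy as np

def transposed_conv2d(x, kernel, stride):
    """Walk the kernel over the output in steps of `stride` (no padding)."""
    k = kernel.shape[0]  # assumes a square kernel
    out = np.zeros(((x.shape[0] - 1) * stride + k,
                    (x.shape[1] - 1) * stride + k))
    top = 0
    for row in x:
        left = 0  # back to the left edge for each new input row
        for value in row:
            out[top:top + k, left:left + k] += value * kernel
            left += stride  # move right by `stride`
        top += stride       # move down by `stride`
    return out

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])
kernel = np.ones((3, 3))

outputs = {s: transposed_conv2d(x, kernel, s) for s in (1, 2, 3)}
for s, out in outputs.items():
    print(f"stride={s}: output shape {out.shape}")

print(outputs[2])
# [[ 1.  1.  3.  2.  2.]
#  [ 1.  1.  3.  2.  2.]
#  [ 4.  4. 10.  6.  6.]
#  [ 3.  3.  7.  4.  4.]
#  [ 3.  3.  7.  4.  4.]]
```

With stride 1 the output is 4×4 and the kernel copies overlap heavily; with stride 2 it is 5×5 and they overlap only in the middle row and column; with stride 3 it is 6×6 and the copies do not overlap at all.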