# Day 2: Architecture Deep Dive

**Goals:**
- Thoroughly understand your existing architecture
- Verify shapes, gradients, and parameter counts
- Implement skip connections and residual blocks (if missing)

**Time:** 6 hours

**Approach:** Instructions only. Write all code yourself.

---

## Setup

Import PyTorch and its nn module. Also import numpy and matplotlib. Set up your device (CUDA if available, else CPU) and print which device you're using.

```python
# Your imports and device setup
```


Now import your model modules from `src/models/`. Try to import your autoencoder, encoder, decoder, and blocks modules.

```python
# Import your model modules
```


---

# Part 1: Theory Questions

---

## Q2.1: Parameter Counting

Consider a convolutional layer: `Conv2d(in_channels=64, out_channels=128, kernel_size=3, bias=True)`

**a)** Calculate the exact number of parameters. Show your formula and arithmetic.

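**b)** If the input spatial size is 64×64, how many multiply-accumulate operations (MACs) does this layer perform on one image? Each output value requires kernel_size² × in_channels MACs; multiply that by the number of output channels and the number of output spatial positions. Note that with the default `padding=0` the output is 62×62, not 64×64, so state your padding assumption.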

**c)** If you add `groups=2`, how does this change the parameter count? What does grouped convolution do?

### Your Answer:

**a)** Formula: 

Calculation:

**b)**

**c)**
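Once you have hand-computed answers for (a) to (c), you can check them against PyTorch itself. This is a minimal sketch: `count_params` and `count_macs` are helper names introduced here (not library functions), and MACs are derived from the actual output shape so the padding assumption stays explicit. Bias additions are not counted as MACs.

```python
import torch
import torch.nn as nn


def count_params(module: nn.Module) -> int:
    """Total number of learnable values (weights + biases) in a module."""
    return sum(p.numel() for p in module.parameters())


def count_macs(conv: nn.Conv2d, in_size: int) -> int:
    """MACs for one square input image; bias additions are not counted."""
    with torch.no_grad():
        out = conv(torch.zeros(1, conv.in_channels, in_size, in_size))
    kh, kw = conv.kernel_size
    # Each output value sums over a kh x kw window of (in_channels / groups) input channels.
    return out[0].numel() * kh * kw * (conv.in_channels // conv.groups)


plain = nn.Conv2d(64, 128, kernel_size=3, bias=True)             # padding defaults to 0
grouped = nn.Conv2d(64, 128, kernel_size=3, bias=True, groups=2)
padded = nn.Conv2d(64, 128, kernel_size=3, padding=1, bias=True)

print(f"params, groups=1: {count_params(plain):,}")
print(f"params, groups=2: {count_params(grouped):,}")
print(f"MACs on 64x64, padding=0: {count_macs(plain, 64):,}")
print(f"MACs on 64x64, padding=1: {count_macs(padded, 64):,}")
```

If your numbers disagree, compare the weight shapes directly: `plain.weight.shape` versus `grouped.weight.shape` shows exactly where grouping removes parameters.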


## Q2.2: Shape Propagation

Trace the tensor shape through this sequence of layers. Start with input shape (batch=8, channels=1, height=256, width=256).

```python
Conv2d(1, 64, kernel_size=7, stride=2, padding=3)
Conv2d(64, 128, kernel_size=3, stride=2, padding=1)
Conv2d(128, 256, kernel_size=3, stride=2, padding=1)
Conv2d(256, 64, kernel_size=3, stride=2, padding=1)
```

Use the output size formula: `out = floor((in + 2×padding - kernel_size) / stride) + 1`

### Your Answer:

After layer 1: (8, ?, ?, ?)

After layer 2: 

After layer 3: 

After layer 4: 
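To check your trace, the output-size formula above can be applied mechanically. This plain-Python sketch (no PyTorch needed) encodes the four layers from the question as `(in_channels, out_channels, kernel_size, stride, padding)` tuples; `conv_out` is a helper name introduced here.

```python
def conv_out(size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """out = floor((in + 2*padding - kernel_size) / stride) + 1 for one spatial dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1


# (in_channels, out_channels, kernel_size, stride, padding) for the four layers in Q2.2
layers = [
    (1, 64, 7, 2, 3),
    (64, 128, 3, 2, 1),
    (128, 256, 3, 2, 1),
    (256, 64, 3, 2, 1),
]

shape = (8, 1, 256, 256)
shapes = []
for cin, cout, k, s, p in layers:
    assert shape[1] == cin, "channel mismatch between consecutive layers"
    shape = (shape[0], cout, conv_out(shape[2], k, s, p), conv_out(shape[3], k, s, p))
    shapes.append(shape)
    print(shape)
```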


## Q2.3: Receptive Field

Your encoder uses 4 convolutional layers, each with kernel_size=4, stride=2, padding=1.

**a)** Calculate the receptive field after each layer. The formula for layer n is:
```
RF_n = RF_{n-1} + (kernel_size - 1) × stride_product_{n-1}
```
where stride_product_{n-1} is the product of the strides of layers 1 through n-1 (taken as 1 for the first layer, since there are no previous strides).

**b)** Your 256×256 input maps to a 16×16 latent space. Each latent "pixel" should ideally "see" a 16×16 region of the input. Is your receptive field sufficient?

**c)** How could you increase the receptive field without adding more layers?

### Your Answer:

**a)** 

Layer 0 (input): RF = 1

Layer 1: RF = 

Layer 2: RF = 

Layer 3: RF = 

Layer 4: RF = 

**b)**

**c)**
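To check part (a), the recurrence translates directly into a short loop. This is a plain-Python sketch; `receptive_fields` is a helper name introduced here. Padding does not appear because it shifts where each window sits, not how wide it is.

```python
def receptive_fields(layers):
    """layers: list of (kernel_size, stride). Returns the RF after layer 0 (input), 1, 2, ..."""
    rf, stride_product = 1, 1
    fields = [rf]
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * stride_product  # RF_n = RF_{n-1} + (k - 1) * prod(previous strides)
        stride_product *= stride
        fields.append(rf)
    return fields


print(receptive_fields([(4, 2)] * 4))  # four layers with kernel_size=4, stride=2
```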


## Q2.4: Skip Connections

**a)** What information do skip connections preserve that would otherwise be lost passing through the bottleneck?

**b)** For an autoencoder used for *compression* (not denoising), are skip connections helpful or harmful? Why?

**c)** If you have skip connections, how could you quantify how much information is "bypassing" the bottleneck versus going through it?

### Your Answer:

**a)**

**b)**

**c)**


## Q2.5: Activation Functions

**a)** Why use LeakyReLU(0.2) instead of ReLU in autoencoders? What problem does it address?

**b)** What is the gradient of LeakyReLU(0.2) for x < 0? Compare to ReLU.

**c)** Why use Sigmoid at the output layer when your target is normalized to [0, 1]? What would happen if you used no activation?

### Your Answer:

**a)**

**b)**

**c)**
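For part (b), autograd can confirm the slopes you derive by hand. This sketch evaluates both activations at a few points away from 0 (where the derivative is not defined) and reads off `x.grad`; since the loss is a plain sum, each gradient entry is just the activation's local slope.

```python
import torch
import torch.nn as nn

x = torch.tensor([-3.0, -0.5, 0.5, 3.0], requires_grad=True)
grads = {}
for act in (nn.ReLU(), nn.LeakyReLU(0.2)):
    x.grad = None                      # clear the previous activation's gradient
    act(x).sum().backward()            # d(sum)/dx_i equals the activation's slope at x_i
    grads[type(act).__name__] = x.grad.clone()
    print(type(act).__name__, x.grad.tolist())
```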


---

# Part 2: Analyze Your Architecture (1.5 hours)

---

## Exercise 2.1: Instantiate and Count Parameters

**Your task:**

1. Instantiate your autoencoder model (check your `src/models/autoencoder.py` for the class name and required arguments).

2. Move it to your device.

3. Count total parameters using: `sum(p.numel() for p in model.parameters())`

4. Count trainable parameters by adding `if p.requires_grad` to the above.

5. Print both counts formatted with commas for readability.

```python
# Instantiate model and count parameters
```
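After writing your own version, you can compare it against this reference sketch. `TinyAutoencoder` is a hypothetical stand-in (four stride-2 convs down to a 16×16 latent, four transposed convs back up); substitute your own class from `src/models/autoencoder.py`. The counting lines are the ones from steps 3 to 5.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class TinyAutoencoder(nn.Module):
    """Hypothetical stand-in for your model: 256x256 -> 16x16 latent -> 256x256."""

    def __init__(self, chans=(1, 8, 16, 32, 4)):
        super().__init__()
        down, up = [], []
        for cin, cout in zip(chans[:-1], chans[1:]):
            down += [nn.Conv2d(cin, cout, 4, stride=2, padding=1), nn.LeakyReLU(0.2)]
        rev = chans[::-1]
        for cin, cout in zip(rev[:-1], rev[1:]):
            up += [nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1), nn.LeakyReLU(0.2)]
        up[-1] = nn.Sigmoid()  # replace the final activation so outputs lie in [0, 1]
        self.encoder = nn.Sequential(*down)
        self.decoder = nn.Sequential(*up)

    def forward(self, x):
        return self.decoder(self.encoder(x))


model = TinyAutoencoder().to(device)
total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Total parameters:     {total:,}")
print(f"Trainable parameters: {trainable:,}")
```

The two counts differ only if some parameters are frozen (`requires_grad=False`), for example a pretrained encoder you chose not to fine-tune.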


## Exercise 2.2: Layer-by-Layer Breakdown

**Your task:**

1. Iterate through `model.named_modules()` to get each layer.

2. For each Conv2d and ConvTranspose2d layer, print:
   - The layer name
   - The parameter count for that layer
   - Input/output channels and kernel size (access via `module.in_channels`, etc.)

3. Identify which layers have the most parameters.

```python
# Print layer-by-layer breakdown
```
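A possible shape for the breakdown, shown on a small hypothetical `nn.Sequential` model so it runs on its own; replace `model` with your autoencoder instance. Note that `nn.ConvTranspose2d` is not a subclass of `nn.Conv2d`, so both types are checked explicitly.

```python
import torch.nn as nn

# Hypothetical stand-in model; replace with your own autoencoder instance.
model = nn.Sequential(
    nn.Conv2d(1, 8, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(8, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.ConvTranspose2d(16, 8, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.ConvTranspose2d(8, 1, 4, stride=2, padding=1), nn.Sigmoid(),
)

rows = []
for name, module in model.named_modules():
    if isinstance(module, (nn.Conv2d, nn.ConvTranspose2d)):
        n_params = sum(p.numel() for p in module.parameters())
        rows.append((name, type(module).__name__, module.in_channels,
                     module.out_channels, module.kernel_size, n_params))

total = sum(r[-1] for r in rows)
# Largest layers first, with each layer's share of the conv parameters
for name, kind, cin, cout, k, n in sorted(rows, key=lambda r: r[-1], reverse=True):
    print(f"{name:<6} {kind:<16} {cin:>4} -> {cout:<4} k={k}  {n:>8,} ({100 * n / total:.1f}%)")
```

Sorting by parameter count answers step 3 directly; in convolutional autoencoders the widest layers usually dominate, because each conv's weight count scales with in_channels × out_channels.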


## Exercise 2.3: Shape Verification

**Your task:**

1. Create a random input tensor with shape (4, 1, 256, 256) on your device.

2. Pass it through your model to get the output.

3. Print both input and output shapes.

4. Assert that they match exactly. If they don't, investigate why.

5. Check the output range: if your model ends in a Sigmoid, every value should lie in [0, 1].

```python
# Shape verification
```


## Exercise 2.4: Gradient Flow Test

This is crucial: if gradients don't reach every parameter, those parameters never update and your model won't train properly.

**Your task:**

1. Put the model in training mode with `model.train()`.

2. Create an input tensor with `requires_grad=True`.

3. Forward pass through the model.

4. Compute a simple loss (e.g., sum of output).

5. Call `loss.backward()`.

6. Check every parameter with `model.named_parameters()`:
   - Is `param.grad` None? (Bad - no gradient reached this parameter)
   - Is `param.grad.abs().max()` very small (< 1e-10)? (Concerning - vanishing gradient)
   - Does `param.grad` contain NaN? (Bad - numerical instability)

7. Print any problematic layers.

```python
# Gradient flow test
```
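One way to package steps 1 to 7 as a reusable check, sketched under the assumption that any scalar loss (here the sum of the output) is acceptable. `check_gradient_flow` and `Toy` are hypothetical names; `Toy` registers a layer it never calls, so you can see what a "no gradient" report looks like before running the check on your own model.

```python
import torch
import torch.nn as nn


def check_gradient_flow(model: nn.Module, x: torch.Tensor, tiny: float = 1e-10):
    """Run one forward/backward pass and return (name, problem) pairs plus the input gradient."""
    model.train()
    model.zero_grad()
    x = x.clone().requires_grad_(True)
    loss = model(x).sum()
    loss.backward()

    problems = []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        if param.grad is None:
            problems.append((name, "no gradient"))
        elif torch.isnan(param.grad).any():
            problems.append((name, "NaN in gradient"))
        elif param.grad.abs().max() < tiny:
            problems.append((name, "vanishing gradient"))
    return problems, x.grad


class Toy(nn.Module):
    """Hypothetical model with one deliberately unused layer."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 4, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(4, 1, 3, padding=1), nn.Sigmoid(),
        )
        self.unused = nn.Conv2d(1, 1, 1)  # registered but never called in forward()

    def forward(self, x):
        return self.net(x)


torch.manual_seed(0)
problems, input_grad = check_gradient_flow(Toy(), torch.rand(2, 1, 32, 32))
for name, issue in problems:
    print(f"{name}: {issue}")
print("input gradient reached:", input_grad is not None)
```

A parameter with `grad is None` almost always means the module is never used in `forward()` or the graph is cut (e.g. by `.detach()` or `torch.no_grad()`), rather than a numerical problem.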


## Exercise 2.5: Trace Shapes Through Network

**Your task:**

This is a debugging exercise. You'll manually trace a tensor through your encoder to see shapes at each stage.

1. Create an input tensor (1, 1, 256, 256).

2. Access your encoder (might be `model.encoder` or similar).

3. Manually pass the tensor through each layer/block in the encoder, printing the shape after each one.

4. Do the same for the decoder, starting from the latent.

This helps you understand exactly what each layer does to the tensor dimensions.

```python
# Trace shapes through encoder
```


```python
# Trace shapes through decoder
```
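If your encoder and decoder are (or contain) `nn.Sequential` stacks, a small helper can do the manual tracing for you. This sketch assumes each module's `forward()` simply applies its direct children in order; if yours has extra logic (reshapes, skip bookkeeping), trace those steps by hand instead. The `encoder` and `decoder` below are hypothetical stand-ins for `model.encoder` and `model.decoder`.

```python
import torch
import torch.nn as nn

# Hypothetical encoder/decoder; substitute the ones from your own model.
encoder = nn.Sequential(
    nn.Conv2d(1, 8, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(8, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(32, 4, 4, stride=2, padding=1),
)
decoder = nn.Sequential(
    nn.ConvTranspose2d(4, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.ConvTranspose2d(16, 8, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.ConvTranspose2d(8, 1, 4, stride=2, padding=1), nn.Sigmoid(),
)


def trace(stack: nn.Module, x: torch.Tensor):
    """Apply each direct child in order, printing the shape after each one."""
    shapes = []
    for name, layer in stack.named_children():
        x = layer(x)
        shapes.append(tuple(x.shape))
        print(f"{name:>3} {type(layer).__name__:<16} -> {tuple(x.shape)}")
    return x, shapes


with torch.no_grad():
    latent, enc_shapes = trace(encoder, torch.rand(1, 1, 256, 256))
    recon, dec_shapes = trace(decoder, latent)
```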


---

# Part 3: Implement/Verify Skip Connections (1.5 hours)

---

## Exercise 2.6: Check for Existing Skip Connections

**Your task:**

1. Open your `src/models/encoder.py` and `src/models/decoder.py`.

2. Answer these questions by reading the code:
   - Does your encoder's `forward()` method return just the latent, or does it also return intermediate features?
   - Does your decoder's `forward()` method accept skip connections as an argument?
   - If skip connections exist, how are they combined? (concatenation? addition?)

Document what you find below.

### Your Findings:

**Encoder output:**

**Decoder input:**

**Skip connection method (if any):**


## Exercise 2.7: Implement Encoder with Skip Outputs

If your encoder doesn't already return skip connections, implement a version that does.

**Your task:**

Create a class `EncoderWithSkips` that:

1. Has the same layer structure as your current encoder.

2. In `forward()`, stores the output of each downsampling stage before applying the next one.

3. Returns a tuple: `(latent, [skip1, skip2, skip3, ...])`

The skip connections should be in order from highest resolution to lowest (but not including the latent itself).

Test it by passing a tensor through and printing all output shapes.

```python
# Implement EncoderWithSkips
```


```python
# Test it
```
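For comparison after you write your own, here is a minimal sketch of an encoder that returns skips. It assumes four stride-2 stages with a hypothetical channel list `(1, 8, 16, 32, 4)`; match the stages to your actual encoder. Each stage's output is stored before the next stage runs, and the last stage's output is the latent, so it is not included in the skip list.

```python
import torch
import torch.nn as nn


class EncoderWithSkips(nn.Module):
    """Sketch: stride-2 stages; the default channel list is a hypothetical example."""

    def __init__(self, channels=(1, 8, 16, 32, 4)):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(nn.Conv2d(cin, cout, 4, stride=2, padding=1), nn.LeakyReLU(0.2))
            for cin, cout in zip(channels[:-1], channels[1:])
        )

    def forward(self, x):
        skips = []
        for i, stage in enumerate(self.stages):
            x = stage(x)
            if i < len(self.stages) - 1:  # keep every stage output except the latent itself
                skips.append(x)
        return x, skips  # skips run from highest to lowest resolution


encoder = EncoderWithSkips()
with torch.no_grad():
    latent, skips = encoder(torch.rand(2, 1, 256, 256))
print("latent:", tuple(latent.shape))
for i, s in enumerate(skips, start=1):
    print(f"skip{i}:", tuple(s.shape))
```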


## Exercise 2.8: Implement Decoder with Skip Inputs

**Your task:**

Create a class `DecoderWithSkips` that:

1. Accepts `forward(latent, skips)` where skips is the list from your encoder.

2. At each upsampling stage, concatenates the upsampled features with the corresponding skip connection along the channel dimension using `torch.cat([upsampled, skip], dim=1)`.

3. Adjusts the input channels of each layer to account for the concatenated skip features.

**Important:** The skip from encoder stage n (counting from input) should connect to decoder stage n (counting from output). Make sure resolutions match!

Test by passing the encoder outputs through and verifying the final output shape matches the original input.

```python
# Implement DecoderWithSkips
```


```python
# Test the full encoder-decoder with skips
```
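A minimal sketch of the matching decoder, assuming the encoder has the hypothetical channel list `(1, 8, 16, 32, 4)` and returns its skips ordered from highest to lowest resolution. Every upsampling stage after the first receives the previous upsampled output concatenated with the skip at the same resolution, so its input channels are doubled. Random tensors stand in for real encoder outputs so the block runs on its own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DecoderWithSkips(nn.Module):
    """Sketch mirroring a hypothetical encoder with channels (1, 8, 16, 32, 4)."""

    def __init__(self, channels=(1, 8, 16, 32, 4)):
        super().__init__()
        n = len(channels) - 1
        self.ups = nn.ModuleList()
        for i in range(n, 0, -1):
            # The first stage sees only the latent; later stages see upsampled + skip channels.
            cin = channels[i] if i == n else 2 * channels[i]
            self.ups.append(nn.ConvTranspose2d(cin, channels[i - 1], 4, stride=2, padding=1))

    def forward(self, latent, skips):
        x = latent
        for j, up in enumerate(self.ups):
            if j > 0:
                skip = skips[-j]  # walk the high -> low resolution list backwards
                assert x.shape[-2:] == skip.shape[-2:], f"resolution mismatch at stage {j}"
                x = torch.cat([x, skip], dim=1)
            x = up(x)
            x = torch.sigmoid(x) if j == len(self.ups) - 1 else F.leaky_relu(x, 0.2)
        return x


decoder = DecoderWithSkips()
latent = torch.rand(2, 4, 16, 16)
skips = [torch.rand(2, 8, 128, 128), torch.rand(2, 16, 64, 64), torch.rand(2, 32, 32, 32)]
with torch.no_grad():
    out = decoder(latent, skips)
print("output:", tuple(out.shape))
```

The resolution assert turns a silent wiring mistake (pairing a skip with the wrong stage) into an immediate, readable error instead of a confusing `torch.cat` size mismatch.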


---

# Part 4: Implement/Verify Residual Blocks (1 hour)

---

## Exercise 2.9: Check for Existing Residual Blocks

**Your task:**

Check your `src/models/blocks.py` for a ResidualBlock or similar class.

If it exists, note:
- What layers are in the residual path?
- Is batch normalization used?
- What activation function is used?
- Is the skip connection a simple addition, or does it have a projection?

### Your Findings:



## Exercise 2.10: Implement a Residual Block

If you don't have one, or want to understand how they work, implement a basic residual block.

**Your task:**

Create a class `ResidualBlock(nn.Module)` that:

1. Takes `channels` as a constructor argument (input channels = output channels for a basic block).

2. Contains two conv layers with "same" padding, e.g. `kernel_size=3, padding=1`, so the spatial dimensions don't change.

3. Uses BatchNorm2d after each convolution.

4. Uses LeakyReLU(0.2) as activation.

5. In `forward()`, computes `output = activation(x + F(x))` where F is the conv-bn-activation-conv-bn path.

Test that input and output shapes match, and that gradients flow through.

```python
# Implement ResidualBlock
```


```python
# Test it
```
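A reference sketch of the block described in steps 1 to 5, assuming 3×3 kernels with `padding=1` and PyTorch's default conv bias. The test checks the shape contract and that gradients reach both the block's parameters and its input.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Basic block: output = LeakyReLU(x + F(x)), with F = conv-bn-act-conv-bn."""

    def __init__(self, channels: int):
        super().__init__()
        # kernel_size=3 with padding=1 keeps H and W unchanged ("same" padding)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x):
        residual = self.bn2(self.conv2(self.act(self.bn1(self.conv1(x)))))
        return self.act(x + residual)


block = ResidualBlock(64)
x = torch.randn(4, 64, 32, 32, requires_grad=True)
y = block(x)
y.mean().backward()
print("shapes match:", y.shape == x.shape)
print("all params have grads:", all(p.grad is not None for p in block.parameters()))
print("input has grad:", x.grad is not None)
```

A common variant passes `bias=False` to both convs, since the following BatchNorm's own shift makes a conv bias redundant; this saves `2 × channels` parameters per block.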


## Exercise 2.11: Compare Parameters

**Your task:**

1. Count parameters in a single ResidualBlock with 64 channels.

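2. If you add one ResidualBlock after each encoder/decoder stage (8 total), how many parameters does this add? Use each stage's own channel count: a block's parameter count grows roughly with channels², so the 64-channel figure applies only to 64-channel stages.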

3. What percentage increase is this compared to your base model?

```python
# Parameter comparison
```
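The count can also be done in closed form, which makes the channels² scaling visible. This plain-Python sketch assumes the block layout from Exercise 2.10 (two 3×3 convs with optional bias, two affine BatchNorm2d layers); the eight stage widths below are hypothetical placeholders for your own.

```python
def residual_block_params(channels: int, conv_bias: bool = True) -> int:
    """Two 3x3 convs (channels -> channels) plus two affine BatchNorm2d layers."""
    conv = 3 * 3 * channels * channels + (channels if conv_bias else 0)
    bn = 2 * channels  # weight (gamma) and bias (beta); running stats are buffers, not parameters
    return 2 * conv + 2 * bn


def percent_increase(added: int, base: int) -> float:
    return 100.0 * added / base


# Hypothetical stage widths; substitute the channel count at each of your 8 stages.
encoder_channels = [64, 128, 256, 512]
decoder_channels = [512, 256, 128, 64]
added = sum(residual_block_params(c) for c in encoder_channels + decoder_channels)

print(f"one 64-channel block: {residual_block_params(64):,}")
print(f"8 blocks added:       {added:,}")
```

For step 3, call `percent_increase(added, total)` with the total parameter count from Exercise 2.1. You can cross-check the formula against `sum(p.numel() for p in block.parameters())` on your own ResidualBlock.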


---

# Day 2 Checklist

- [ ] Answered all theory questions (Q2.1 - Q2.5)
- [ ] Counted total and trainable parameters
- [ ] Generated layer-by-layer parameter breakdown
- [ ] Verified input/output shapes match
- [ ] Verified gradient flow to all parameters
- [ ] Traced shapes through encoder and decoder manually
- [ ] Documented existing skip connection implementation (or lack thereof)
- [ ] Implemented/verified EncoderWithSkips
- [ ] Implemented/verified DecoderWithSkips
- [ ] Implemented/verified ResidualBlock
- [ ] Compared parameter counts with/without additions

---

## Architecture Summary

*Fill this in based on your analysis:*

**Total parameters:**

**Input shape:** (B, 1, 256, 256)

**Latent shape:**

**Compression ratio:**

**Has skip connections:** Yes / No

**Has residual blocks:** Yes / No

**Output activation:** Sigmoid / None / Other

---

## Notes and Issues

1. 

2. 

3. 