# AT82.08 Computer Vision Midterm Exam

Date & Time: Nov 29, 2025, 09:00 - 12:00

Exam Duration: 3 Hours

Total Score: 100 Points = TQ (50 points) + IQ (50 points)

## How to submit
1. Zip the downloaded folder. It should contain:
- the Jupyter notebook
- an `assets/` folder holding any images you added to the notebook.

*If the zip file is larger than 8 MB, you can split it into two zip files: one containing the Jupyter notebook and the other containing the assets.*

2. Submit the zip on `TEAL`

# Theoretical Questions

## TQ1-[5 Points] : Stochastic Gradient Descent

Consider the Stochastic Gradient Descent (SGD) algorithm:

> **Algorithm**
> 1. Initialize weights randomly $\sim \mathcal{N}(0,\sigma^2)$
> 2. Loop until convergence:
>   * Pick single data point *i*
>   * Compute gradient $\frac{\partial J_i(W)}{\partial W}$
>   * Update weights $W \leftarrow W - \eta\frac{\partial J_i(W)}{\partial W}$
> 3. Return weights
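
The steps above can be sketched on a toy problem. This is only an illustrative sketch, not part of the expected answer: the function name `sgd_linear_regression`, the squared-error loss $J_i = \frac{1}{2}(wx_i + b - y_i)^2$, the fixed step count standing in for "until convergence", and all hyperparameter values are assumptions chosen for the example.

```python
import random

def sgd_linear_regression(xs, ys, lr=0.05, epochs=400, sigma=0.1, seed=0):
    """Fit y ~ w*x + b with single-sample SGD on a squared-error loss."""
    rng = random.Random(seed)
    # Step 1: initialize the weights from N(0, sigma^2)
    w = rng.gauss(0.0, sigma)
    b = rng.gauss(0.0, sigma)
    # Step 2: a fixed number of updates stands in for "loop until convergence"
    for _ in range(epochs * len(xs)):
        i = rng.randrange(len(xs))         # pick a single data point i
        err = (w * xs[i] + b) - ys[i]      # residual for sample i
        grad_w, grad_b = err * xs[i], err  # dJ_i/dw and dJ_i/db
        w -= lr * grad_w                   # W <- W - eta * dJ_i/dW
        b -= lr * grad_b
    # Step 3: return the weights
    return w, b

# Noise-free toy data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = sgd_linear_regression(xs, ys)
print(w, b)
```

Each update uses the gradient of one sample's loss only, which is the defining feature of the algorithm as stated.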

**What are the advantages and disadvantages of this method? How can we address its drawbacks?**


Your answer here

## TQ2-[10 Points] : What are the purpose and properties of activation functions?

Your answer here

## TQ3-[5 Points] : What is the effective receptive field size of each neuron in layer `L=5`, given a `3x3` kernel in each layer?

*Please show your solution, not just the answer*

Your answer here

## TQ4-[10 Points] : We've explored the evolution of the `R-CNN family`, encompassing `R-CNN`, `Fast R-CNN`, and `Faster R-CNN`. Analyze the key differences between these models, identifying their respective limitations and the specific challenges they were developed to overcome.

Your answer here

## TQ5-[10 Points] : What is Object Tracking? Why do we need tracking? What are the elements of tracking? Briefly explain each element.

Your answer here

## TQ6-[10 Points] : What are the 3D representations that we have discussed in this course? Briefly explain each representation.

Your answer here

# Implementation Questions

## IQ1-[15 Points] : Mean Shift
As previously discussed, Mean Shift segmentation leverages Euclidean distance to cluster pixels in a feature space defined by pixel attributes. While we have utilized `RGB` color information as features, incorporating additional spatial information may enhance segmentation performance. By augmenting the feature space to include pixel coordinates `(X,Y)`, we can apply Mean Shift to this expanded representation.
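
As a rough illustration of the augmented feature space only (not the Mean Shift procedure itself), the sketch below stacks each pixel's colour and coordinates into one row. The function name `rgbxy_features` and the `spatial_weight` parameter are hypothetical, and the tiny array stands in for a real image.

```python
import numpy as np

def rgbxy_features(image, spatial_weight=1.0):
    """Stack colour and pixel coordinates into an (H*W, 5) feature matrix."""
    h, w, _ = image.shape
    ys, xs = np.mgrid[0:h, 0:w]                    # row (Y) and column (X) indices
    rgb = image.reshape(-1, 3).astype(np.float64)  # one RGB row per pixel
    # spatial_weight is an assumed knob for balancing colour against position
    xy = np.stack([xs.ravel(), ys.ravel()], axis=1) * spatial_weight
    return np.hstack([rgb, xy])                    # columns: R, G, B, X, Y

demo = np.zeros((2, 3, 3), dtype=np.uint8)         # tiny 2x3 stand-in image
demo[1, 2] = (255, 128, 0)
feats = rgbxy_features(demo)
print(feats.shape)
print(feats[-1])
```

Because colour values and coordinates live on different scales, some rescaling of one group relative to the other is usually worth considering before measuring Euclidean distance.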


#### Implement Mean Shift segmentation using the `RGBXY` space of the given image (`assets/labradors.jpg`).
- How does the performance compare to using only the RGB color space?
- Show the cluster centers. How many clusters are there?

In [None]:
# Your code here

## IQ2: GAN
In lecture 12, we learnt about and implemented a GAN in which the generator and discriminator were built from linear layers only and trained on the MNIST dataset. We observed that the quality of the generated images was poor and needed further improvement.

[DCGAN](https://arxiv.org/pdf/1511.06434.pdf) is an extension of the GAN that explicitly uses `convolutional` layers in the discriminator and `convTranspose` layers in the generator. In other words, DCGAN replaces the linear layers of the GAN with `conv.` layers in the discriminator and `convTranspose` layers in the generator.
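
To see why the two layer types suit the two networks, it can help to look at how each changes the spatial size of a feature map. The sketch below uses the standard output-size formulas for a convolution and a transposed convolution without dilation. The helper names and the kernel/stride/padding values are assumptions for illustration, not settings required by this exam.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a standard convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose_out(size, kernel, stride=1, padding=0, output_padding=0):
    """Spatial output size of a transposed convolution (no dilation)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

# Hypothetical settings: kernel 4, stride 2, padding 1
down = conv_out(28, kernel=4, stride=2, padding=1)          # discriminator-style downsampling
up = conv_transpose_out(14, kernel=4, stride=2, padding=1)  # generator-style upsampling
print(down, up)
```

With these example settings, the convolution halves the spatial size and the transposed convolution doubles it.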

### IQ2.1-[25 Points] : Implement DCGAN

Use the following guidelines to `implement the DCGAN model`:
- `4 conv. layers` in the `discriminator`.
- `4 convTranspose layers` in the `generator`.
- A latent vector `z` of size `100`, sampled from a `normal` distribution.
- Use `batchnorm` in both the generator and the discriminator.
- Use `ReLU` as the activation function in the generator for all layers except the output, which uses `Tanh`.
- Use `LeakyReLU` as the activation function in the discriminator for all layers.


Train the model
- `MNIST` dataset
- `25` epochs
- batch size of `128`

**`Report the following`**
- Plot both generator and discriminator `losses`.
- Visualize the generated images `every 5 epochs`.

In [None]:
# Your Code Here

#### Config

In [None]:
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(device)

latent_z = 100

# Hyperparameters
glr = 2e-4  # generator learning rate
dlr = 2e-4  # discriminator learning rate

NUM_EPOCHS = 25
BATCH_SIZE = 128

logging_interval = 50

In [None]:
# Your Code Here

#### Training Loop

In [None]:
import time

import torch
from torchvision.utils import make_grid

log_dict = {'train_generator_loss_per_batch': [],
            'train_discriminator_loss_per_batch': [],
            'train_discriminator_real_acc_per_batch': [],
            'train_discriminator_fake_acc_per_batch': [],
            'images_from_noise_per_epoch': []}

# Batch of latent (noise) vectors for
# evaluating / visualizing the training progress
# of the generator
# -----Your code here ------
fixed_z = # Your code here
# --------------------------

start_time = time.time()
for epoch in range(NUM_EPOCHS):

  gen.train()
  dis.train()
  for batch_idx, (features, _) in enumerate(train_loader):

    batch_size = features.size(0)

    # real images
    real_images = features.to(device)
    real_labels = torch.ones(batch_size, device=device) # real label = 1

    # generated (fake) images

    # -----Your code here ------
    z = # Your code here
    # --------------------------

    fake_images = gen(z)
    fake_labels = torch.zeros(batch_size, device=device) # fake label = 0
    flipped_fake_labels = real_labels # flipped labels for the generator step: fake label = 1 (already on device)


    # --------------------------
    # Train Discriminator
    # --------------------------

    optimizer_D.zero_grad()

    # get discriminator loss on real images
    discr_pred_real = dis(real_images).view(-1) # Nx1 -> N
    real_loss = adversarial_loss(discr_pred_real, real_labels)

    # get discriminator loss on fake images
    discr_pred_fake = dis(fake_images.detach()).view(-1)
    fake_loss = adversarial_loss(discr_pred_fake, fake_labels)

    # combined loss
    discr_loss = 0.5*(real_loss + fake_loss)

    discr_loss.backward()
    optimizer_D.step()

    # --------------------------
    # Train Generator
    # --------------------------

    optimizer_G.zero_grad()

    # get discriminator loss on fake images with flipped labels
    discr_pred_fake = dis(fake_images).view(-1)
    gener_loss = adversarial_loss(discr_pred_fake, flipped_fake_labels)
    gener_loss.backward()

    optimizer_G.step()

    # --------------------------
    # Logging
    # --------------------------
    log_dict['train_generator_loss_per_batch'].append(gener_loss.item())
    log_dict['train_discriminator_loss_per_batch'].append(discr_loss.item())

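    # Thresholding at 0 assumes the discriminator outputs raw logits
    # (e.g. when adversarial_loss is a BCE-with-logits loss)
    predicted_labels_real = torch.where(discr_pred_real.detach() > 0., 1., 0.)
    predicted_labels_fake = torch.where(discr_pred_fake.detach() > 0., 1., 0.)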

    acc_real = (predicted_labels_real == real_labels).float().mean()*100.
    acc_fake = (predicted_labels_fake == fake_labels).float().mean()*100.

    log_dict['train_discriminator_real_acc_per_batch'].append(acc_real.item())
    log_dict['train_discriminator_fake_acc_per_batch'].append(acc_fake.item())

    if not batch_idx % logging_interval:
      print('Epoch: %03d/%03d | Batch %03d/%03d | Gen/Dis Loss: %.4f/%.4f'
              % (epoch+1, NUM_EPOCHS, batch_idx,
                len(train_loader), gener_loss.item(), discr_loss.item()))

  ### Save images for evaluation
  with torch.no_grad():
    fake_images = gen(fixed_z).detach().cpu()
    log_dict['images_from_noise_per_epoch'].append(make_grid(fake_images,
                                                              padding=2,
                                                              normalize=True))
  print('Time elapsed: %.2f min' % ((time.time() - start_time)/60))

print('Total Training Time: %.2f min' % ((time.time() - start_time)/60))

In [None]:
# Your code here

#### Visualize the results

In [None]:
##########################
### VISUALIZATION
##########################

import matplotlib.pyplot as plt
import numpy as np

for i in range(0, NUM_EPOCHS, 5):
  plt.figure(figsize=(8, 8))
  plt.axis('off')
  plt.title(f'Generated images after epoch {i + 1}')  # list index i holds the images saved after epoch i + 1
  plt.imshow(np.transpose(log_dict['images_from_noise_per_epoch'][i], (1, 2, 0)))
  plt.show()


plt.figure(figsize=(8, 8))
plt.axis('off')
plt.title('Generated images after last epoch')
plt.imshow(np.transpose(log_dict['images_from_noise_per_epoch'][-1], (1, 2, 0)))
plt.show()

### IQ2.2-[10 Points]: Qualitative Comparison


Recall that in our GAN lab session, we implemented and trained a GAN on the MNIST dataset.

Conduct a comparative analysis of the image outputs generated by `GAN` and `DCGAN` models at the `25th` epoch. Determine which model produces better quality images and discuss the underlying reasons for its superior performance.

Your answer here