In [5]:
import torch
from torchinfo import summary

## What will we do in this TP?

In this TP, we will fine-tune a pre-trained model for a specific task. Fine-tuning is a transfer learning method in which the weights of an existing pre-trained model serve as the starting point for a new model. The approach is especially useful when the new task resembles the task the model was originally trained on: we can reuse the features the pre-trained model has already learned, which reduces the amount of data and compute required.

We will tackle an object detection problem using the VOC dataset, which consists of approximately 5000 images for both training and validation. The images have been resized to 224x224 pixels, and each contains a single bounding box around the object of interest. We will use a ResNet18 model pre-trained on the ImageNet dataset as a feature extractor and build a model that predicts both the label and the location of the object in the image.



In [7]:
# Load data
train = torch.load('./data/train_data.pt')
validation = torch.load('./data/validation_data.pt')
test = torch.load('./data/test_data.pt')

In [8]:
from utils import visualize_image

# Visualize a sample image
idx = 4055 # change this to visualize a different image
image, bbox, label = train[idx]
visualize_image(image, bbox, label)

SyntaxError: invalid syntax (utils.py, line 74)

Question: How is the data structured? What do the labels represent? What dimensions does an image have in the dataset?

In [14]:
mod = 29
n = 1
# Print successive powers of 2 modulo `mod`
for i in range(mod):
    print(i, n)
    n = (n * 2) % mod

0 1
1 2
2 4
3 8
4 16
5 3
6 6
7 12
8 24
9 19
10 9
11 18
12 7
13 14
14 28
15 27
16 25
17 21
18 13
19 26
20 23
21 17
22 5
23 10
24 20
25 11
26 22
27 15
28 1


In [17]:
(17 * 3 * 3) % 29

8

In [None]:
# Normalize the data
from utils import normalize_data
from torch.utils.data import DataLoader

train_dataset = normalize_data(train)
validation_dataset = normalize_data(validation)

trainloader = #TODO
validationloader = #TODO
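
One possible way to fill in the loaders is sketched below on synthetic data. The `(image, bbox, label)` layout mirrors the sample unpacked earlier in this notebook; the batch size, the four-value box layout, and the `toy_*` names are assumptions made for illustration, and the output of `normalize_data` may differ.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for the normalized datasets: 8 samples of (image, bbox, label).
images = torch.randn(8, 3, 224, 224)
bboxes = torch.rand(8, 4)               # four box values per image (assumed layout)
labels = torch.randint(0, 20, (8,))     # class indices for 20 classes
toy_dataset = TensorDataset(images, bboxes, labels)

# Shuffle the training data each epoch; keep the validation order fixed.
toy_trainloader = DataLoader(toy_dataset, batch_size=4, shuffle=True)
toy_validationloader = DataLoader(toy_dataset, batch_size=4, shuffle=False)

batch_images, batch_bboxes, batch_labels = next(iter(toy_trainloader))
print(batch_images.shape, batch_bboxes.shape, batch_labels.shape)
```

Shuffling only the training loader is the usual choice: it decorrelates consecutive batches during training while keeping validation metrics reproducible.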

In object detection tasks, we need to develop both a classifier and a regressor. The classifier's role is to identify the object's label, while the regressor's role is to estimate the object's coordinates.

Our classifier and regressor are built as follows:

`Classifier`: It consists of two blocks, each containing a sequence of a Linear layer, a ReLU activation function, and a Dropout layer. This is followed by a final fully connected layer.

`Regressor`: It consists of two blocks, each made of a Linear layer followed by a ReLU activation function, and a final block made of a Linear layer followed by a Sigmoid activation function.
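
The two heads described above could be sketched with `nn.Sequential` as follows. The input size (512) and the number of classes (20) come from this TP; the hidden width, the dropout probability, and the four-value box output are assumptions, so treat this as one plausible layout rather than the expected solution.

```python
import torch
from torch import nn

IN_FEATURES = 512   # size of the backbone output stated in the TP
NB_CLASSES = 20     # number of classes stated in the TP
HIDDEN = 256        # hypothetical hidden width, not given in the TP
P_DROP = 0.5        # hypothetical dropout probability, not given in the TP

# Two (Linear, ReLU, Dropout) blocks followed by a final fully connected layer.
classifier = nn.Sequential(
    nn.Linear(IN_FEATURES, HIDDEN), nn.ReLU(), nn.Dropout(P_DROP),
    nn.Linear(HIDDEN, HIDDEN), nn.ReLU(), nn.Dropout(P_DROP),
    nn.Linear(HIDDEN, NB_CLASSES),        # raw logits, one per class
)

# Two (Linear, ReLU) blocks followed by a (Linear, Sigmoid) block.
regressor = nn.Sequential(
    nn.Linear(IN_FEATURES, HIDDEN), nn.ReLU(),
    nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
    nn.Linear(HIDDEN, 4), nn.Sigmoid(),   # assumed: four box values, squashed into [0, 1]
)

features = torch.randn(32, IN_FEATURES)   # dummy backbone output for a batch of 32
logits, boxes = classifier(features), regressor(features)
print(logits.shape, boxes.shape)
```

The classifier returns unnormalized logits, which is what `nn.CrossEntropyLoss` expects, while the Sigmoid bounds the regressor output to the unit interval.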


Your task is to complete the definition of `ResnetObjectDetector` in the `model.py` file. Note that the output of `self.features` has shape `(batch_size, 512)` and that the number of classes is `20`.

In [None]:
from model import ResnetObjectDetector
model = ResnetObjectDetector(nb_classes=20)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Print model summary
summary(model, input_size=(32, 3, 224, 224))

After completing the model definition, write a training loop and a validation loop to train the model. Complete the functions `train_loop` and `validation_loop` in `utils.py`.
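
To make the structure of a training iteration concrete, here is a minimal sketch of a single optimization step for a model with two outputs. Everything in it is hypothetical: `ToyDetector` is a tiny stand-in for the real model, `train_step` is not the signature expected in `utils.py`, and summing a cross-entropy loss with a mean squared error on the box is only one common way to combine the two objectives.

```python
import torch
from torch import nn

class ToyDetector(nn.Module):
    """Hypothetical stand-in that returns (class logits, box coordinates)."""
    def __init__(self, nb_classes=20):
        super().__init__()
        self.features = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 32), nn.ReLU())
        self.classifier = nn.Linear(32, nb_classes)
        self.regressor = nn.Sequential(nn.Linear(32, 4), nn.Sigmoid())

    def forward(self, x):
        feats = self.features(x)
        return self.classifier(feats), self.regressor(feats)

def train_step(model, batch, optimizer, cls_loss_fn, box_loss_fn):
    """Run one optimization step; return the scalar loss and the batch accuracy."""
    images, bboxes, labels = batch
    model.train()
    optimizer.zero_grad()
    logits, pred_boxes = model(images)
    # Assumed combination: classification loss plus box regression loss, equally weighted.
    loss = cls_loss_fn(logits, labels) + box_loss_fn(pred_boxes, bboxes)
    loss.backward()
    optimizer.step()
    accuracy = (logits.argmax(dim=1) == labels).float().mean().item()
    return loss.item(), accuracy

torch.manual_seed(0)
toy_model = ToyDetector()
toy_optimizer = torch.optim.Adam(toy_model.parameters(), lr=1e-3)
toy_batch = (torch.randn(16, 3, 8, 8), torch.rand(16, 4), torch.randint(0, 20, (16,)))
loss_value, acc_value = train_step(
    toy_model, toy_batch, toy_optimizer, nn.CrossEntropyLoss(), nn.MSELoss()
)
print(f"loss={loss_value:.4f} acc={acc_value:.2f}")
```

A validation step would follow the same pattern with `model.eval()`, `torch.no_grad()`, and no call to the optimizer.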

[OPTIONAL] TensorBoard can be used to monitor the training process. If you are using VSCode, click on Launch TensorBoard Session; VSCode will then install a TensorBoard extension.

In [None]:
from utils import train_loop
from torch.utils.tensorboard import SummaryWriter
from datetime import datetime
# Train the model
now = datetime.now()
optimizer = #TODO Define optimizer
epoch = #TODO Define number of epochs
writer = SummaryWriter(f'runs/tp6-object-recognition/{now.strftime("%Y-%m-%d_%H-%M-%S")}/')
losses, val_losses, acc, val_acc = #TODO

writer.flush() # Write to disk
writer.close() # Close the writer

After the training process, you can generate plots for both the loss and accuracy curves (assuming you're not using TensorBoard). What conclusions can you draw from these visualizations?
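
If you are not using TensorBoard, the curves can be drawn with matplotlib. The sketch below assumes that `losses`, `val_losses`, `acc`, and `val_acc` are lists with one value per epoch; `plot_curves` is a hypothetical helper, and the numbers passed to it are made up purely to exercise the function.

```python
import matplotlib.pyplot as plt

def plot_curves(losses, val_losses, acc, val_acc):
    """Plot training and validation loss and accuracy, one point per epoch."""
    epochs = range(1, len(losses) + 1)
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))

    ax_loss.plot(epochs, losses, label="train")
    ax_loss.plot(epochs, val_losses, label="validation")
    ax_loss.set_xlabel("epoch")
    ax_loss.set_ylabel("loss")
    ax_loss.legend()

    ax_acc.plot(epochs, acc, label="train")
    ax_acc.plot(epochs, val_acc, label="validation")
    ax_acc.set_xlabel("epoch")
    ax_acc.set_ylabel("accuracy")
    ax_acc.legend()

    fig.tight_layout()
    return fig

# Made-up values for illustration only; replace them with the lists returned by training.
fig = plot_curves([1.9, 1.2, 0.8], [2.0, 1.5, 1.3], [0.35, 0.60, 0.75], [0.30, 0.50, 0.55])
```

Plotting the training and validation curves on the same axes makes a widening gap between them easy to spot.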

By executing the cell below, you can examine the predictions made by the model. Do you see any potential for improving the model, and if so, how would you go about it?

In [None]:
from utils import predict

# Normalize the test data
test_dataset = normalize_data(test)
# Predict on test data
model.to('cpu')  # move the model to the CPU once, before predicting
for i in range(10, 20):
    img = test_dataset[i][0]
    predict(model, img, show=True)