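import os
import sys

import torch
from torch.utils.data import DataLoader

# Make the parent directory importable so the local `model` package resolves.
sys.path.append("..")
from model import model
from model import dataset as ds

# Use the GPU when one is available, otherwise fall back to the CPU.
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
DEVICE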
train_dataset = ds.ImageDataset(
    annotations_path="../datasets/annotations/",
    images_path="../../frames/",
    channels=3
)


len(train_dataset)
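# The batch size equals the number of images, i.e. full-batch training
# when the dataset holds one sample per image.
train_loader = DataLoader(
    train_dataset,
    batch_size=len(train_dataset.images),
    shuffle=True,
    # os.cpu_count() can return None; fall back to 0 (load in the main process).
    num_workers=os.cpu_count() or 0,
)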
"""
To work with several bounding boxes per image, I want the same image repeated once per bbox
annotation. This is logical, since each repetition is a different datapoint. We are in a
supervised learning setting, so each datapoint is a pair {x_n, y_n}, with x_n the feature tensor
and y_n the target tensor (I say tensor to cover the most general structure of numbers). Under
this formulation it makes sense to have different datapoints {x_n, y_n} in which the same
feature tensor x_n maps to different target tensors.

There's nothing wrong with this; we are just defining a one-to-many mapping, which can also occur
in tabular (structured) data.

As I write this, though, another potential issue comes to mind: at prediction time, how will I
predict more than one bounding box if the regressor head of the neural network has a fixed output size of 4?

A possible idea is to create "artificial" classes (i.e. also use a classifier head). The classifier head can have a fixed
number of classes, say 5. At training time I can then associate each example with the same set of classes, that is, the
bounding box of image 1 belongs to classes 1, 2, 3, 4 and 5 at the same time, and so on for each image/bounding-box
combination. At prediction time I would then effectively get what I want.

#TODO Read: https://d2l.ai/chapter_computer-vision/anchor.html
"""
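# A minimal sketch of the "one datapoint per (image, bbox) pair" idea from the notes above.
# Everything here is an assumption made for illustration: the annotation layout (a dict from
# file name to a list of boxes), the file names, and the names `expand_annotations`,
# `example_annotations` and `example_pairs`. It does not describe how `ds.ImageDataset`
# actually stores or reads annotations.

```python
def expand_annotations(annotations):
    """Return one (image_name, bbox) pair per annotated box.

    `annotations` is a hypothetical mapping: image file name -> list of
    boxes, each box given as (x_min, y_min, x_max, y_max).
    """
    pairs = []
    for image_name, boxes in annotations.items():
        for box in boxes:
            # The same image name is repeated once for every box it contains.
            pairs.append((image_name, tuple(box)))
    return pairs


# Made-up annotations: the first frame has two boxes, the second has one.
example_annotations = {
    "frame_0001.png": [(10, 20, 50, 80), (60, 30, 120, 90)],
    "frame_0002.png": [(5, 5, 40, 40)],
}

example_pairs = expand_annotations(example_annotations)
```

# With this layout, `len(example_pairs)` counts boxes rather than images, so a dataset built
# this way would report more samples than there are image files.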
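# Note: this rebinds the name `model` from the imported module to the network instance,
# so re-running this cell fails unless the import cell is run again first.
model = model.BoxRegressor().to(DEVICE)
bbox_loss_func = torch.nn.MSELoss()  # mean squared error between predicted and target boxes
optimizator = torch.optim.Adam(model.parameters(), lr=0.01)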
train_loss = []
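
# A small pure-Python illustration of how a mean squared error loss behaves when the same
# image is paired with several target boxes, as discussed in the notes above. A single
# 4-value output cannot match two different targets at once; the output that minimizes the
# averaged squared error is the coordinate-wise mean of the targets. The boxes and the names
# `mse`, `mean_loss`, `demo_targets` and `demo_mean_box` are hypothetical, and the helper
# only mimics mean-reduced squared error without using torch.

```python
def mse(prediction, target):
    """Mean squared error between two equally long coordinate tuples."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(prediction)


def mean_loss(prediction, targets):
    """Average the MSE of one fixed prediction against several targets."""
    return sum(mse(prediction, t) for t in targets) / len(targets)


# Two made-up target boxes attached to the same image.
demo_targets = [(10.0, 20.0, 50.0, 80.0), (60.0, 30.0, 120.0, 90.0)]

# Coordinate-wise mean of the targets.
demo_mean_box = tuple(sum(coords) / len(demo_targets) for coords in zip(*demo_targets))

loss_at_mean = mean_loss(demo_mean_box, demo_targets)
loss_at_first = mean_loss(demo_targets[0], demo_targets)
loss_at_second = mean_loss(demo_targets[1], demo_targets)
```

# Here `loss_at_mean` is lower than both `loss_at_first` and `loss_at_second`, which is why
# a 4-output regressor trained this way tends toward an "average" box for multi-box images.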

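for epoch in range(5):
    model.train()  # training mode (affects layers such as dropout and batch norm)
    loss = 0  # running sum of the batch losses for this epoch
    for images, bboxes in train_loader:
        # Move the batch to the same device as the model.
        images, bboxes = images.to(DEVICE), bboxes.to(DEVICE)
        predictions = model(images)
        bbox_loss = bbox_loss_func(predictions, bboxes)
        # bbox_loss is a tensor, so this sum keeps the autograd graph alive;
        # bbox_loss.item() would give a plain float if only logging is needed.
        loss += bbox_loss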
        # ...