## Goal: prepare the data in /videos, together with labels_dataframe.csv, for fine-tuning Faster R-CNN

We first look at how the data for fine-tuning Faster R-CNN is structured.

We can go for the COCO JSON annotation format:

```
dataset/
├── train/
│   ├── images/
│   │   ├── image1.jpg
│   │   └── image2.jpg
│   └── annotations/
│       └── instances_train.json
└── val/
    ├── images/
    │   ├── image1.jpg
    │   └── image2.jpg
    └── annotations/
        └── instances_val.json
```

Sample COCO JSON annotation for one image:

```json
{
    "images": [
        {"id": 1, "file_name": "image1.jpg", "height": 480, "width": 640}
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [100, 200, 50, 50], "area": 2500, "iscrowd": 0}
    ],
    "categories": [
        {"id": 1, "name": "object_class"}
    ]
}
```

Or the Pascal VOC format:

```
dataset/
└── train/
    ├── images/
    │   ├── image1.jpg
    │   └── image2.jpg
    └── annotations/
        ├── image1.xml
        └── image2.xml
```

Sample XML annotation for one image:

```xml
<annotation>
    <folder>images</folder>
    <filename>image1.jpg</filename>
    <size>
        <width>640</width>
        <height>480</height>
    </size>
    <object>
        <name>object_class</name>
        <bndbox>
            <xmin>100</xmin>
            <ymin>200</ymin>
            <xmax>150</xmax>
            <ymax>250</ymax>
        </bndbox>
    </object>
</annotation>
```
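The two samples above describe the same box in different conventions: COCO stores `[x, y, width, height]`, while Pascal VOC stores the corner coordinates `xmin, ymin, xmax, ymax`. A minimal sketch of the conversion in both directions (the function names `coco_to_voc` and `voc_to_coco` are illustrative, not part of any library):

```python
def coco_to_voc(bbox):
    """Convert a COCO box [x, y, width, height] to (xmin, ymin, xmax, ymax)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


def voc_to_coco(xmin, ymin, xmax, ymax):
    """Convert Pascal VOC corners to a COCO box [x, y, width, height]."""
    return [xmin, ymin, xmax - xmin, ymax - ymin]


# The sample box from the COCO JSON and the sample box from the VOC XML
print(coco_to_voc([100, 200, 50, 50]))   # (100, 200, 150, 250)
print(voc_to_coco(100, 200, 150, 250))   # [100, 200, 50, 50]
```

Our CSV stores corners (XTL, YTL, XBR, YBR), so it is already close to the VOC convention and needs the second conversion for COCO.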


Right now the data looks like this:

```python
import pandas as pd

df = pd.read_csv('data/labels_dataframe.csv')
df.head(20)
```

| Unnamed: 0 | Task ID | Task Name | Job Id | Source | Frames | Absolute Frame | Relative Frame | XTL | YTL | XBR | YBR | Code | Issue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 138 | Task1 | 133 | 1690279852.mp4 | 730 | 54 | 54 | 29.87 | 506.88 | 190.69 | 554.96 | 83/2789 | |
| 1 | 138 | Task1 | 133 | 1690279852.mp4 | 730 | 55 | 55 | 65.26 | 504.87 | 225.5 | 552.95 | 83/2789 | |
| 2 | 138 | Task1 | 133 | 1690279852.mp4 | 730 | 56 | 56 | 131.98 | 503.67 | 291.63 | 551.76 | 83/2789 | |
| 3 | 138 | Task1 | 133 | 1690279852.mp4 | 730 | 57 | 57 | 198.69 | 502.48 | 357.76 | 550.57 | 83/2789 | |
| 4 | 138 | Task1 | 133 | 1690279852.mp4 | 730 | 58 | 58 | 241.62 | 498.68 | 400.1 | 546.77 | 83/2789 | |
| 5 | 138 | Task1 | 133 | 1690279852.mp4 | 730 | 59 | 59 | 302.7 | 496.68 | 460.59 | 544.77 | 83/2789 | |
| 6 | 138 | Task1 | 133 | 1690279852.mp4 | 730 | 60 | 60 | 363.79 | 494.68 | 521.09 | 542.78 | 83/2789 | |
| 7 | 138 | Task1 | 133 | 1690279852.mp4 | 730 | 61 | 61 | 424.87 | 492.68 | 581.58 | 540.78 | 83/2789 | |
| 8 | 138 | Task1 | 133 | 1690279852.mp4 | 730 | 62 | 62 | 485.95 | 490.69 | 642.08 | 538.78 | 83/2789 | |
| 9 | 138 | Task1 | 133 | 1690279852.mp4 | 730 | 63 | 63 | 547.03 | 488.69 | 702.57 | 536.78 | 83/2789 | |


The columns we need for COCO are:

images:
- ID
- File name
- height/width of the picture

annotations:
- ID
- Image_id
- category ID
- bounding box (coordinates)
- area (the surface area of the bounding box, i.e. width × height)
- iscrowd (used to indicate whether an object is part of a "crowd" or a group of objects that cannot be easily separated)

categories:
- ID
- Category class (string)
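As a rough sketch of how the fields listed above could be filled from the label rows: the hypothetical helper `rows_to_coco` below takes plain dicts (for example from `df.to_dict("records")`) and builds the three COCO sections. It assumes that the `Code` column is the class label, that `Absolute Frame` identifies the frame within `Source`, that all frames share one resolution, and that images are named `<video stem>_<frame>.jpg`. None of these are confirmed by the data, so adjust as needed.

```python
import json


def rows_to_coco(rows, width, height):
    """Build a COCO-style dict from label rows (one bounding box per row)."""
    images, annotations = [], []
    image_ids = {}      # (source, frame) -> image id
    category_ids = {}   # class code -> category id

    for row in rows:
        key = (row["Source"], int(row["Absolute Frame"]))
        if key not in image_ids:
            image_ids[key] = len(image_ids) + 1
            stem = row["Source"].rsplit(".", 1)[0]
            images.append({
                "id": image_ids[key],
                "file_name": f"{stem}_{key[1]:06d}.jpg",  # assumed naming scheme
                "height": height,
                "width": width,
            })
        # Reuse the id of a known class, otherwise assign the next free id
        cat_id = category_ids.setdefault(row["Code"], len(category_ids) + 1)

        # The CSV stores corners; COCO wants [x, y, width, height]
        x, y = float(row["XTL"]), float(row["YTL"])
        w = float(row["XBR"]) - x
        h = float(row["YBR"]) - y
        annotations.append({
            "id": len(annotations) + 1,
            "image_id": image_ids[key],
            "category_id": cat_id,
            "bbox": [x, y, w, h],
            "area": w * h,
            "iscrowd": 0,
        })

    categories = [{"id": i, "name": name} for name, i in category_ids.items()]
    return {"images": images, "annotations": annotations, "categories": categories}


# First row of the table above; 1280x720 is a placeholder resolution
rows = [{"Source": "1690279852.mp4", "Absolute Frame": 54, "XTL": 29.87,
         "YTL": 506.88, "XBR": 190.69, "YBR": 554.96, "Code": "83/2789"}]
coco = rows_to_coco(rows, width=1280, height=720)
print(json.dumps(coco, indent=2))
```

The resulting dict can be written with `json.dump` to `instances_train.json` once the real frame size is known.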

We already have all of these fields and only need to reformat them. The exception is file_name: we can obtain it by loading each video, extracting the annotated frames, saving each frame as an image, and linking that image file to the matching rows.
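One possible way to produce those image files is with OpenCV (`opencv-python`). This is a sketch under assumptions: the helper names `frame_file_name` and `extract_frames` and the naming scheme are hypothetical, and `Absolute Frame` is assumed to be a 0-based frame index into the video. Seeking by frame index can be imprecise for some codecs, so it is worth spot-checking a few saved frames against the boxes.

```python
from pathlib import Path


def frame_file_name(source, frame_index):
    """Assumed naming scheme: '<video stem>_<6-digit frame index>.jpg'."""
    return f"{Path(source).stem}_{frame_index:06d}.jpg"


def extract_frames(video_path, frame_indices, out_dir):
    """Save selected frames as JPEGs; returns {frame: (file_name, height, width)}."""
    import cv2  # imported here so the naming helper works without OpenCV

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(str(video_path))
    saved = {}
    try:
        for idx in sorted(set(frame_indices)):
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the requested frame
            ok, frame = cap.read()
            if not ok:
                continue  # frame could not be decoded; skip it
            name = frame_file_name(video_path, idx)
            cv2.imwrite(str(out_dir / name), frame)
            height, width = frame.shape[:2]
            saved[idx] = (name, height, width)
    finally:
        cap.release()
    return saved


print(frame_file_name("videos/1690279852.mp4", 54))  # 1690279852_000054.jpg
```

A call such as `extract_frames("videos/1690279852.mp4", [54, 55], "dataset/train/images")` would then also yield the height and width needed for the COCO `images` entries.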

### Let's see what we need to do in order to fine-tune Faster R-CNN

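```python
%pip install torch torchvision
import torch
from torch.utils.data import DataLoader
from torchvision import models, transforms
from torchvision.datasets import CocoDetection
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Load a Faster R-CNN model pre-trained on COCO
# (weights="DEFAULT" replaces the deprecated pretrained=True)
model = models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the box predictor head for our custom classes
num_classes = 2  # 1 class + background
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Dataset and DataLoader (CocoDetection requires the pycocotools package)
train_dataset = CocoDetection(root='dataset/train/images',
                              annFile='dataset/train/annotations/instances_train.json',
                              transform=transforms.ToTensor())

# Images differ in size and box count, so keep each batch as tuples instead of stacking
train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True, collate_fn=lambda x: tuple(zip(*x)))

# Optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9, weight_decay=0.0005)

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model.to(device)


def coco_to_target(annotations, device):
    """Convert one image's list of COCO annotations into the dict the model expects."""
    boxes = torch.as_tensor([a["bbox"] for a in annotations], dtype=torch.float32)
    boxes = boxes.reshape(len(annotations), 4)
    boxes[:, 2:] += boxes[:, :2]  # [x, y, w, h] -> [xmin, ymin, xmax, ymax]
    labels = torch.as_tensor([a["category_id"] for a in annotations], dtype=torch.int64)
    return {"boxes": boxes.to(device), "labels": labels.to(device)}


# Training Loop
for epoch in range(10):
    model.train()
    for images, annotations in train_loader:
        images = [img.to(device) for img in images]
        # CocoDetection yields raw annotation lists, not tensors
        targets = [coco_to_target(anns, device) for anns in annotations]

        loss_dict = model(images, targets)
        losses = sum(loss for loss in loss_dict.values())

        optimizer.zero_grad()
        losses.backward()
        optimizer.step()

    # Loss of the last batch in this epoch
    print(f"Epoch {epoch}, Loss: {losses.item()}")
```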


Collecting torch
  Downloading torch-2.5.1-cp310-cp310-win_amd64.whl.metadata (28 kB)
Collecting torchvision
  Downloading torchvision-0.20.1-cp310-cp310-win_amd64.whl.metadata (6.2 kB)
Collecting filelock (from torch)
  Using cached filelock-3.16.1-py3-none-any.whl.metadata (2.9 kB)
Collecting networkx (from torch)
  Using cached networkx-3.4.2-py3-none-any.whl.metadata (6.3 kB)
Collecting fsspec (from torch)
  Using cached fsspec-2024.10.0-py3-none-any.whl.metadata (11 kB)
Collecting sympy==1.13.1 (from torch)
  Using cached sympy-1.13.1-py3-none-any.whl.metadata (12 kB)
Collecting mpmath<1.4,>=1.1.0 (from sympy==1.13.1->torch)
  Using cached mpmath-1.3.0-py3-none-any.whl.metadata (8.6 kB)
Collecting pillow!=8.3.*,>=5.3.0 (from torchvision)
  Downloading pillow-11.0.0-cp310-cp310-win_amd64.whl.metadata (9.3 kB)
Downloading torch-2.5.1-cp310-cp310-win_amd64.whl (203.1 MB)

Downloading: "https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth" to C:\Users\ewald/.cache\torch\hub\checkpoints\fasterrcnn_resnet50_fpn_coco-258fb6c6.pth
100%|██████████| 160M/160M [00:19<00:00, 8.64MB/s] 


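ModuleNotFoundError: No module named 'pycocotools'

The cell fails at this point because `CocoDetection` depends on the `pycocotools` package, which is not installed yet. Installing it, for example with `%pip install pycocotools`, should resolve the import error.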

We already have some images on Kaggle: