## Fine-tuning YOLO on drone footage data for object detection

This is an example notebook showing how a small pretrained YOLO model (YOLOv5s) can be fine-tuned to improve its object-detection performance on drone footage.

The idea is to use transfer learning: we train only the final 2 layers of the network and leave the earlier convolutional weights unchanged. (Strictly speaking, YOLOv5's output layers are 1x1 convolutions in its `Detect` head, not fully-connected layers.)
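As a rough illustration of the freezing step, here is a minimal sketch on a toy model. `TinyDetector` and `freeze_all_but` are hypothetical names invented for this example, and the dummy loss merely stands in for the real YOLO loss. The idea is to turn off `requires_grad` everywhere except the head and to pass only the head's parameters to the optimizer, so one training step changes the head and leaves the backbone untouched.

```python
import torch
from torch import nn


class TinyDetector(nn.Module):
    """Hypothetical stand-in for a YOLO model: a conv backbone plus a 1x1-conv output head."""

    def __init__(self, num_outputs: int = 15):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.SiLU(),
            nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1), nn.SiLU(),
        )
        self.head = nn.Conv2d(16, num_outputs, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(x))


def freeze_all_but(model: nn.Module, trainable_prefixes: tuple) -> list:
    """Enable gradients only for parameters whose name starts with one of the prefixes."""
    trainable = []
    for name, param in model.named_parameters():
        keep = name.startswith(trainable_prefixes)
        param.requires_grad_(keep)
        if keep:
            trainable.append(param)
    return trainable


torch.manual_seed(0)
model = TinyDetector()
head_params = freeze_all_but(model, ("head.",))
optimizer = torch.optim.SGD(head_params, lr=0.1)  # the optimizer only sees the head

# Snapshot weights so we can check what the update touched.
backbone_before = [p.detach().clone() for p in model.backbone.parameters()]
head_before = model.head.weight.detach().clone()

# One training step with a dummy loss (placeholder for the real detection loss).
images = torch.randn(2, 3, 32, 32)
loss = model(images).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()

print("backbone unchanged:", all(torch.equal(a, b) for a, b in zip(backbone_before, model.backbone.parameters())))
print("head updated:", not torch.equal(head_before, model.head.weight))
```

For the real YOLOv5 model, the name prefix of the output layers would have to be looked up by printing `model.named_parameters()`. It's also worth knowing that frozen BatchNorm layers still update their running statistics while the model is in `train()` mode; putting those modules in `eval()` mode avoids this if it matters.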

#### Import libraries

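- We use PyTorch for deep learning, with torchvision for vision-specific utilities.
- NumPy and Pandas handle arrays and tabular data; Plotly and Matplotlib are used for plotting.
- Einops is used for readable tensor reshaping and rearrangement.
- kagglehub downloads the dataset from Kaggle.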

```python
import os

import einops
import kagglehub
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import torch
import torchvision

device = "cuda" if torch.cuda.is_available() else "cpu"
```

#### Load data

In this section we will:
- Download the VisDrone dataset from Kaggle.
- Create a Dataset object that samples images and their labels as transformed tensors.
- Create DataLoader objects for minibatch stochastic gradient descent.
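The Dataset/DataLoader steps above can be sketched as follows. This is a minimal sketch with synthetic data: `RandomBoxesDataset` and `detection_collate` are hypothetical names, and the random images stand in for real VisDrone frames. Because each image has a different number of boxes, the default collate function can't stack the labels, so a custom `collate_fn` concatenates them into one `(N, 6)` tensor of `[image_index, class, cx, cy, w, h]` rows. As far as I know this mirrors the target layout YOLOv5's own training loader uses, but treat that as an assumption.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class RandomBoxesDataset(Dataset):
    """Hypothetical stand-in dataset: random images with 0-3 random YOLO-format boxes each."""

    def __init__(self, n_images: int = 8, size: int = 64, num_classes: int = 10, seed: int = 0):
        g = torch.Generator().manual_seed(seed)
        self.images = torch.rand(n_images, 3, size, size, generator=g)
        self.labels = []
        for _ in range(n_images):
            k = int(torch.randint(0, 4, (1,), generator=g))
            cls = torch.randint(0, num_classes, (k, 1), generator=g).float()
            xywh = torch.rand(k, 4, generator=g) * 0.5 + 0.25  # normalised cx, cy, w, h
            self.labels.append(torch.cat([cls, xywh], dim=1))  # shape (k, 5)

    def __len__(self) -> int:
        return len(self.images)

    def __getitem__(self, i: int):
        return self.images[i], self.labels[i]


def detection_collate(batch):
    """Stack images and merge variable-length labels, prepending each box's image index."""
    images, labels = zip(*batch)
    targets = []
    for i, lb in enumerate(labels):
        idx = torch.full((lb.shape[0], 1), float(i))
        targets.append(torch.cat([idx, lb], dim=1))  # shape (k, 6)
    return torch.stack(images), torch.cat(targets, dim=0)


dataset = RandomBoxesDataset()
loader = DataLoader(dataset, batch_size=4, shuffle=True, collate_fn=detection_collate)

batch_images, batch_targets = next(iter(loader))
print(batch_images.shape, batch_targets.shape)
```

`torch.stack` requires every image in a batch to have the same size, so a real dataset would resize or letterbox the drone frames in `__getitem__` before returning them.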

```python
# Download the VisDrone dataset from Kaggle (about 2GB).
# Annoyingly, kagglehub saves files to its own cache directory (~/.cache/kagglehub
# by default; the KAGGLEHUB_CACHE environment variable changes it) rather than the
# current directory. dataset_download() returns the local path: either read the
# files from there or move them to the current directory manually.

path = kagglehub.dataset_download("kushagrapandya/visdrone-dataset")
print(path)
```

```text
C:\Users\moosa\.cache\kagglehub\datasets\kushagrapandya\visdrone-dataset\versions\1
```
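Once the files are in place, the labels need to be in YOLO's normalised `class cx cy w h` format. The sketch below assumes the raw VisDrone-DET annotation layout (`left, top, width, height, score, category, truncation, occlusion`, in pixels); if the Kaggle copy already ships YOLO-format labels, this step is unnecessary. The function names are hypothetical, and the filtering rule (drop ignored regions and the "others" category, remap categories 1 to 10 onto class ids 0 to 9) is my assumption. The image width and height would come from the image itself, e.g. via PIL's `Image.open(...).size`.

```python
from typing import List, Optional, Tuple

YoloBox = Tuple[int, float, float, float, float]  # (class, cx, cy, w, h), normalised to [0, 1]


def visdrone_to_yolo(line: str, img_w: int, img_h: int) -> Optional[YoloBox]:
    """Convert one (assumed) VisDrone-DET annotation line to a YOLO-format box, or None."""
    left, top, w, h, score, category = (int(v) for v in line.strip().split(",")[:6])
    # Assumption: skip "ignored regions" (score 0 / category 0), "others" (category 11),
    # and degenerate boxes.
    if score == 0 or category in (0, 11) or w <= 0 or h <= 0:
        return None
    cx = (left + w / 2) / img_w
    cy = (top + h / 2) / img_h
    return category - 1, cx, cy, w / img_w, h / img_h


def convert_annotation_text(text: str, img_w: int, img_h: int) -> List[YoloBox]:
    """Convert every non-empty line of an annotation file, dropping skipped rows."""
    boxes = []
    for line in text.splitlines():
        if not line.strip():
            continue
        box = visdrone_to_yolo(line, img_w, img_h)
        if box is not None:
            boxes.append(box)
    return boxes


def format_yolo_label(boxes: List[YoloBox]) -> str:
    """Render boxes as the lines of a YOLO .txt label file."""
    return "\n".join(f"{c} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}" for c, cx, cy, w, h in boxes)


# Made-up two-line sample for a 1000x500 image: the first row is an ignored region.
sample = "684,8,273,116,0,0,0,0\n100,50,20,10,1,4,0,0\n"
print(format_yolo_label(convert_annotation_text(sample, 1000, 500)))
# 3 0.110000 0.110000 0.020000 0.020000
```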


#### Load pretrained model

Let's load the pretrained YOLOv5s model from PyTorch Hub and test it out on a sample image.

```python
# Load the small YOLOv5 variant (yolov5s) with COCO-pretrained weights from PyTorch Hub.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
```

```text
Using cache found in C:\Users\moosa/.cache\torch\hub\ultralytics_yolov5_master
YOLOv5  2024-10-26 Python-3.12.5 torch-2.4.1+cu118 CUDA:0 (NVIDIA GeForce RTX 3050 Laptop GPU, 4096MiB)

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs
Adding AutoShape... 
```
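The summary reports 7225885 parameters and 0 gradients. As far as I can tell, YOLOv5's summary counts "gradients" as the number of parameters with `requires_grad=True`, which would mean the hub weights arrive fully frozen; if so, the head's parameters have to be unfrozen explicitly before fine-tuning. A small helper for checking this is sketched below; `count_parameters` is a hypothetical name, and the toy `nn.Sequential` only stands in for the real model.

```python
import torch
from torch import nn


def count_parameters(model: nn.Module) -> tuple:
    """Return (total, trainable) parameter counts, where trainable means requires_grad=True."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total, trainable


# Toy check: freeze everything, then re-enable gradients on the last layer only.
toy = nn.Sequential(nn.Conv2d(3, 16, kernel_size=3), nn.Conv2d(16, 30, kernel_size=1))
toy.requires_grad_(False)
print(count_parameters(toy))  # (958, 0)
toy[-1].requires_grad_(True)
print(count_parameters(toy))  # (958, 510)
```

Calling `count_parameters(model)` on the hub model is a quick way to confirm what the summary reports, and to verify the trainable count after the head has been unfrozen.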
