# Object Detection with YOLO

*HFT Stuttgart, 2025 Summer Term, Michael Mommert (michael.mommert@hft-stuttgart.de)*

Object detection classifies instances of objects in an image and approximately locates each instance with a bounding box. In this Notebook, we use the YOLO model to detect cars in aerial imagery of the city of Stuttgart. For a streamlined implementation, we use the [ultralytics YOLO framework](https://docs.ultralytics.com/) and fine-tune a pretrained model on our dataset. If you want more resources on training with this framework on a custom dataset, you can start [here](https://docs.ultralytics.com/yolov5/tutorials/train_custom_data/).

In [None]:
%pip install numpy \
    matplotlib \
    pandas \
    scikit-image \
    ultralytics

In [None]:
import os
import zipfile
import numpy as np
import pandas as pd
from skimage import io
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

from ultralytics import YOLO

## Dataset handling

We will use the "Cars in Stuttgart" dataset in this Notebook. Let's download and unpack the dataset.

In [None]:
# download dataset
!curl -O https://zenodo.org/records/15019408/files/cars_in_stuttgart.zip

# extract dataset zipfile
with zipfile.ZipFile('cars_in_stuttgart.zip', 'r') as zip_ref:
    zip_ref.extractall('./')

# rename dataset directory
os.rename('cars_in_stuttgart/', 'data/')

Let's read in the training image filenames and display one example image:

In [None]:
train_filenames = []
for f in sorted(os.listdir('data/train')):
    if f.endswith('.png'):  # consider only files ending in .png
        train_filenames.append(os.path.join('data/train', f))

img = io.imread(train_filenames[2])
plt.imshow(img)

For each image file, there is a corresponding label file. For the image shown above, the label file looks like this:

In [None]:
with open(train_filenames[2].replace('.png', '.txt'), 'r') as f:  # same image index as above
    for line in f.readlines():
        print(line, end='')

This file uses the YOLO convention to identify **bounding boxes**. Each line corresponds to a different bounding box and the columns have the following meanings:

1. Class label: In this case, there's only a single class, so all bounding boxes use label 0. With multiple classes, each class would have its own integer id (0, 1, 2, ...).
2. X center of the bounding box
3. Y center of the bounding box
4. Width of the bounding box
5. Height of the bounding box

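Parameters 2-5 are given in normalized coordinates, not in pixels: x and width are divided by the image width, y and height by the image height, so all values lie between 0 and 1. For instance, an x coordinate of 0.8 corresponds to pixel 80 in an image that is 100 pixels wide, or to pixel 40 in an image that is 50 pixels wide.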
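A small sketch of this conversion in both directions. `yolo_to_pixel_corners` and `pixel_corners_to_yolo` are hypothetical helpers for illustration, not part of ultralytics; the plotting cell below applies the same arithmetic inline to obtain the top-left corner that matplotlib's `Rectangle` expects.

```python
def yolo_to_pixel_corners(x_c, y_c, w, h, img_width, img_height):
    """Convert a normalized YOLO box (x_center, y_center, width, height)
    into pixel corner coordinates (x_min, y_min, x_max, y_max)."""
    # x and w are normalized by the image width, y and h by the image height
    x_min = (x_c - w / 2) * img_width
    y_min = (y_c - h / 2) * img_height
    x_max = (x_c + w / 2) * img_width
    y_max = (y_c + h / 2) * img_height
    return x_min, y_min, x_max, y_max


def pixel_corners_to_yolo(x_min, y_min, x_max, y_max, img_width, img_height):
    """Inverse conversion: pixel corners back to a normalized YOLO box."""
    x_c = (x_min + x_max) / 2 / img_width
    y_c = (y_min + y_max) / 2 / img_height
    w = (x_max - x_min) / img_width
    h = (y_max - y_min) / img_height
    return x_c, y_c, w, h


# a box centered at (0.8, 0.5) with size 0.2 x 0.4 in a 100 x 50 pixel image
print(yolo_to_pixel_corners(0.8, 0.5, 0.2, 0.4, img_width=100, img_height=50))
```

The printed corners are approximately (70, 15, 90, 35), up to floating-point rounding.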

Let's read in the bounding box information and plot the boxes on the image. Feel free to check out other image indices as well.

In [None]:
i = 2 # image index

# read in bounding boxes
bbs = pd.read_csv(train_filenames[i].replace('.png', '.txt'), sep=' ',
                  header=None, names=['class', 'x', 'y', 'w', 'h'])

# plot image
img = io.imread(train_filenames[i])
f, ax = plt.subplots(1, 1)
ax.imshow(img)

# add bounding boxes
h, w = img.shape[:2]  # image height and width in pixels
for _, bb in bbs.iterrows():
    # draw a rectangle for each bounding box; Rectangle expects the top-left corner coordinates, width and height
    ax.add_patch(Rectangle(((bb.x - bb.w/2)*w, (bb.y - bb.h/2)*h),
                           bb.w*w, bb.h*h,
                           edgecolor='red', facecolor='none'))

## Training Process

The training process is extremely convenient with the YOLO framework. To set up the training process, we have to create a **dataset configuration file** that contains the following information:
* the path to the dataset root directory
* the names of the different dataset splits (those should be directories under the root path)
* the class labels (remember that we only have a single class in this example)

We write this file using the `%%writefile` cell magic. Note that you might have to adjust the value of `path:` to match your environment.

In [None]:
%%writefile dataset.yaml

# Train/val/test sets
path: ../data # dataset root dir (Colab: use /content/data, bwJupyter: ../data)
train: train # train images (relative to 'path')
val: val # val images (relative to 'path')

# define classes
names:
    0: car
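Before training, it can be worth checking that every image in a split has a matching label file. A minimal sketch, assuming the flat layout of this dataset (images and `.txt` label files side by side in `data/train`, `data/val` and `data/test`); `find_missing_labels` is a hypothetical helper. A missing label file is not necessarily an error, since background images without cars may legitimately have none, but many missing files can point to an incomplete download.

```python
import os


def find_missing_labels(split_dir, image_ext='.png'):
    """Return the image files in `split_dir` that have no matching .txt label file.

    Assumes images and label files sit side by side in the same directory
    and share the same base name (e.g. 0001.png and 0001.txt).
    """
    missing = []
    for fname in sorted(os.listdir(split_dir)):
        if fname.endswith(image_ext):
            label_path = os.path.join(split_dir, fname[:-len(image_ext)] + '.txt')
            if not os.path.exists(label_path):
                missing.append(fname)
    return missing


# check all splits that exist in the dataset directory
for split in ['train', 'val', 'test']:
    split_dir = os.path.join('data', split)
    if os.path.isdir(split_dir):
        print(split, 'images without labels:', find_missing_labels(split_dir))
```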

Now we have to **choose an appropriate model**. The YOLO architecture comes in different [sizes](https://github.com/ultralytics/ultralytics). Here, size refers to the number of learnable parameters in the model. Typically, the more parameters a model has, the more accurate it is once fully trained, at the cost of slower training and inference.

We will use the smallest available model, `yolo11n`, with 2.6 million parameters and load a pretrained version of this model.

In [None]:
model = YOLO("yolo11n.pt")

Now we can start the actual **model training or fine-tuning process**. Naturally, this process is complex and comes with a lot of parameters that can be set. We set some of these parameters below and use the default values for the rest. For a full discussion of all available parameters, please visit the [ultralytics YOLO documentation](https://docs.ultralytics.com/usage/cfg/).

In [None]:
results = model.train(data='dataset.yaml', # define the dataset configuration file
                      task='detect', # set the task as object detection
                      epochs=15, # set the number of training epochs
                      seed=42, # set a fixed seed value for reproducibility
                      imgsz=128, # training image size in pixels (images are resized to this size)
                      batch=16, # batch size of 16 images
                      optimizer='Adam', # use the Adam optimizer
                      lr0=0.001, # initial learning rate
                      plots=True, # plot analytics during training
                      )

The training process generates a lot of output, including all parameters and settings (also the default ones we did not touch), data diagnostics (e.g., corrupt bounding boxes) and, of course, information on the training progress.

In addition, YOLO creates a lot of files in a new directory called `runs/`. Each training run gets its own subdirectory under `runs/detect/`, named `train`, `train2`, `train3`, and so on. There you can find diagnostic plots and log files, example predictions and the trained model checkpoints. You might have to adjust the path in the following code cell to pick the right training run.

You will also find a plot named `results.png` that summarizes the training progress:

In [None]:
f, ax = plt.subplots(1, 1, figsize=(12,6))
ax.imshow(io.imread("runs/detect/train/results.png"))
plt.axis('off')
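Rather than editing the run path by hand in the cell above, you could pick the most recently modified run directory automatically. A minimal sketch; `latest_run_dir` is a hypothetical helper that assumes the default `runs/detect/train*` naming. It selects by modification time rather than by name, because `train10` sorts before `train2` alphabetically.

```python
import glob
import os


def latest_run_dir(runs_root='runs/detect', prefix='train'):
    """Return the most recently modified run directory (e.g. runs/detect/train2).

    Returns None if no matching run directory exists.
    """
    candidates = [d for d in glob.glob(os.path.join(runs_root, prefix + '*'))
                  if os.path.isdir(d)]
    if not candidates:
        return None
    # the newest run is the one whose directory was modified last
    return max(candidates, key=os.path.getmtime)


print(latest_run_dir())
```

For example, `io.imread(os.path.join(latest_run_dir(), 'results.png'))` would load the summary plot of the latest run.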

There's quite a number of plots here. Let's see what we have...

* The **box loss** describes the value of the loss function for localizing bounding boxes. The plot shows the box loss separately for the train and val datasets.
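* The **cls loss** describes the value of the loss function for identifying the correct classes of the bounding boxes we found. Again, there is a separate cls loss for the train and the val dataset.
* The **dfl loss** is the distribution focal loss, which complements the box loss: it treats each bounding box edge position as a probability distribution and helps the model localize box boundaries more precisely. Again, there is one for the train dataset and one for the val dataset.
* The **precision** metric is the fraction of predicted boxes that correspond to an actual object.
* The **recall** metric is the fraction of actual objects that were detected.
* The **mean Average Precision** metrics summarize the precision-recall trade-off across confidence thresholds: mAP50 counts a prediction as correct if its intersection over union (IoU) with a ground-truth box is at least 0.5, while mAP50-95 averages the mAP over IoU thresholds from 0.5 to 0.95 in steps of 0.05.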
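To make the IoU-based matching behind these metrics more concrete, here is a simplified sketch for a single image and a single class. `box_iou` and `precision_recall` are hypothetical helpers, not part of ultralytics, and boxes are assumed to be pixel corner coordinates (x_min, y_min, x_max, y_max). The evaluation in ultralytics is more involved: it sweeps the confidence threshold to build a precision-recall curve per class and integrates it to obtain the average precision.

```python
import numpy as np


def box_iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # width and height of the overlap region (zero if the boxes do not overlap)
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(pred_boxes, pred_confs, gt_boxes, iou_thr=0.5):
    """Precision and recall for one image and one class.

    Predictions are processed in order of decreasing confidence; each one is a
    true positive if it overlaps a not-yet-matched ground-truth box with
    IoU >= iou_thr, otherwise it is a false positive.
    """
    order = np.argsort(pred_confs)[::-1]
    matched = set()
    tp = 0
    for k in order:
        # find the best still-unmatched ground-truth box for this prediction
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(gt_boxes):
            if j in matched:
                continue
            iou = box_iou(pred_boxes[k], gt)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j is not None and best_iou >= iou_thr:
            matched.add(best_j)
            tp += 1
    precision = tp / len(pred_boxes) if len(pred_boxes) else 0.0
    recall = tp / len(gt_boxes) if len(gt_boxes) else 0.0
    return precision, recall


gt = [(0, 0, 10, 10), (20, 20, 30, 30)]      # two cars
preds = [(1, 1, 11, 11), (50, 50, 60, 60)]   # one good prediction, one false alarm
confs = [0.9, 0.8]
print(precision_recall(preds, confs, gt, iou_thr=0.5))   # (0.5, 0.5)
print(precision_recall(preds, confs, gt, iou_thr=0.75))  # (0.0, 0.0)
```

The first prediction overlaps a car with an IoU of about 0.68, so it counts as correct at the 0.5 threshold used by mAP50 but not at 0.75. This is why mAP50-95, which also includes the stricter thresholds, is typically lower than mAP50.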

All plots indicate good learning progress. Let's evaluate the model on the test dataset.

## Evaluation

To evaluate our model on the test dataset, we simply create a new dataset configuration file, but provide the test dataset as our validation dataset (you might have to adjust `path:` again):

In [None]:
%%writefile dataset_test.yaml

# Train/val/test sets
path: ../data # dataset root dir (Colab: use /content/data, bwJupyter: ../data)
train: train # train images (relative to 'path')
val: test # this time we use the test dataset for evaluations (relative to 'path')

# define classes
names:
    0: car

Now we simply rerun the validation step (but this time it will utilize the test dataset).

In [None]:
test_results = model.val(data="dataset_test.yaml")

The results of the evaluation are written to files. Another way to access them is through the output of the `val` method. Let's have a look at the results.

In [None]:
f, ax = plt.subplots(2, 2, figsize=(15,15))
ax = np.ravel(ax)

for i in range(len(test_results.curves)):
    ax[i].plot(test_results.curves_results[i][0], test_results.curves_results[i][1][0])
    ax[i].set_xlabel(test_results.curves_results[i][2])
    ax[i].set_ylabel(test_results.curves_results[i][3])
    ax[i].set_title(test_results.curves[i])

In [None]:
for metric, value in test_results.results_dict.items():
    print(metric, ':', value)
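The F1-confidence curve plotted above combines precision and recall into their harmonic mean, F1 = 2PR / (P + R). A minimal sketch for computing F1 from the summary metrics; `f1_score` and `f1_from_results_dict` are hypothetical helpers, and the key names `metrics/precision(B)` and `metrics/recall(B)` are an assumption based on what recent ultralytics versions print in the cell above, so check your output if they differ.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_from_results_dict(results_dict):
    # assumed ultralytics key names; print results_dict to verify them
    p = results_dict['metrics/precision(B)']
    r = results_dict['metrics/recall(B)']
    return f1_score(p, r)
```

In this Notebook, it would be called as `f1_from_results_dict(test_results.results_dict)`. The harmonic mean is low whenever either precision or recall is low, so a model cannot reach a high F1 by trading one off completely against the other.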

## Inference

Let's perform inference on one image from our training dataset. We simply provide a list of filenames to the model, which returns a `Results` object for each image. For a full discussion of the prediction step, please review the [predict guide](https://docs.ultralytics.com/modes/predict/).

In [None]:
i = 2 # image index

# Perform inference on one image
results = model([train_filenames[i]],
                conf=0.4, # detection confidence threshold
                iou=0.6, # IoU threshold for non-maximum suppression (NMS)
               )

# extract bounding boxes and confidences
bbs = results[0].boxes.xywh.cpu().numpy()
confs = results[0].boxes.conf.cpu().numpy()

# plot image
img = io.imread(train_filenames[i])
f, ax = plt.subplots(1, 1)
ax.imshow(img)

# add bounding boxes to plot
for j in range(bbs.shape[0]):
    x_c, y_c, bw, bh = bbs[j]  # box center, width and height in pixels
    # draw a rectangle for each bounding box; Rectangle expects the top-left corner coordinates, width and height
    ax.add_patch(Rectangle((x_c - bw/2, y_c - bh/2), bw, bh,
                           edgecolor='yellow', facecolor='none'))
    # add confidence values
    ax.annotate('{:.1f}%'.format(confs[j]*100), xy=(x_c - bw*0.3, y_c), fontsize=12, color='yellow')
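The `conf` and `iou` arguments in the inference cell control the post-processing of the raw predictions: detections below the confidence threshold are discarded, and non-maximum suppression (NMS) removes boxes that overlap a higher-confidence box by more than the IoU threshold. The following simplified, single-class NumPy sketch of greedy NMS illustrates the idea. `nms` is a hypothetical helper (ultralytics uses its own batched, class-aware implementation), and boxes are assumed to be corner coordinates (x_min, y_min, x_max, y_max), as returned by `results[0].boxes.xyxy`.

```python
import numpy as np


def nms(boxes, scores, conf_thr=0.4, iou_thr=0.6):
    """Greedy non-maximum suppression for a single class.

    boxes: (N, 4) array of (x_min, y_min, x_max, y_max); scores: (N,) confidences.
    Returns the indices of the kept boxes, highest confidence first.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # 1. drop low-confidence detections
    candidates = np.where(scores >= conf_thr)[0]
    # 2. process the remaining boxes from highest to lowest confidence
    order = candidates[np.argsort(scores[candidates])[::-1]]
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # IoU between the kept box and all remaining boxes
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[best] + areas[rest] - inter)
        # 3. suppress boxes that overlap the kept box too strongly
        order = rest[iou <= iou_thr]
    return keep


example_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (40, 40, 50, 50)]
example_scores = [0.9, 0.8, 0.7, 0.3]
print(nms(example_boxes, example_scores))  # [0, 2]
```

In the example, box 1 is suppressed because it overlaps box 0 with an IoU of about 0.68, and box 3 is dropped because its confidence is below 0.4. Raising `iou_thr` keeps more overlapping boxes, which can help with densely parked cars but may also produce duplicate detections.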