# Training a YOLO Model Using the nuImages Dataset

Welcome! This notebook will guide you through converting the nuImages `v1.0-mini` dataset into the YOLO format required for training an object detection model.

We will perform the following steps:
1.  **Setup:** Install and import necessary libraries.
2.  **Configuration:** Define paths to our input data and output directories.
3.  **Helper Functions:** Define the functions that will help us convert formats.
4.  **Load & Split:** Load the `v1.0-mini` data and split it into `train` and `val` sets.
5.  **Process Data:**
    * Copy the images into new `train/images` and `val/images` folders.
    * Convert the annotations into YOLO-format `.txt` files and save them in `train/labels` and `val/labels`.
6.  **Verification:** Check our new dataset structure.
7.  **Model Training:** Use our formatted dataset to train a YOLO model.
8.  **Detection with a Pretrained Model:** Use a pretrained model to show the results of a model trained on the nuImages dataset.

## Setup and Imports
*This section outlines the tools, libraries, and methodologies necessary for processing and analyzing the dataset.*

Before you begin, make sure you have the required libraries installed. Below is the list of libraries used in this notebook.

1. **Dataset Manipulation**
- `nuimages`: This library is developed by Motional, the team behind the nuScenes and nuImages datasets. We use it to load and query the dataset so that we can convert it to the YOLO format.

2. **Progress Bars**
- `tqdm`: This library displays progress bars. Converting a large number of images can take a long time, so the progress bar helps us track how far the conversion has progressed.

3. **Data Processing and Splitting**
- `train_test_split`: This scikit-learn function splits the dataset into subsets. The ratio is configurable; this notebook uses 80% for training and 20% for validation.

- `numpy`: A core library for numerical operations in Python. It's used by our helper script (`utils.py`) to perform the mathematical conversions for bounding boxes.

4. **Core Python & System Utilities**
- `logging`: This is a built-in Python library for emitting status messages and warnings. We use it to get clean, informative output about the script's progress.

- `pathlib`: A modern, built-in Python library for handling file system paths. We use its `Path` object to easily create, join, and manage directories and file paths in a way that works on all operating systems.

- `shutil`: This is Python's "shell utilities" library. We use it for high-level file operations, specifically `shutil.copy()`, to copy the images from the original dataset into our new YOLO-formatted folders.

5. **Model Training**
- `ultralytics`: This is the official library for the YOLOv8 model. While we won't use it for the conversion part of this notebook, it is included in the requirements because it will be used in a later step to train our model on the dataset we are creating.

6. **Visualization and Verification**

- `matplotlib`: This is the most popular plotting library in Python. We will use it to display our images and comparison plots directly inside the Jupyter notebook.

- `cv2` (OpenCV): We'll use it to read images from disk and, more importantly, to draw the bounding boxes and labels onto them.

- `random`: This is a built-in Python library that we'll use to select a random image from our validation set for our sanity check.
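
Before running the import cell, it can help to confirm that every library above is actually importable. The snippet below is a minimal sketch, not part of the original workflow: `REQUIRED_MODULES` and `find_missing_modules` are hypothetical names introduced here. Note that some import names differ from their pip package names (`sklearn` is installed as `scikit-learn`, `cv2` as `opencv-python`, and `nuimages` ships with `nuscenes-devkit`).

```python
import importlib.util

# Import names used in this notebook (assumed list, based on the section above).
# The local helper files classes.py and utils.py are not included here; they
# must simply sit next to the notebook.
REQUIRED_MODULES = [
    "nuimages", "tqdm", "numpy", "sklearn",
    "cv2", "matplotlib", "ultralytics",
]

def find_missing_modules(module_names):
    """Return the names that Python's import system cannot locate."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]

missing = find_missing_modules(REQUIRED_MODULES)
if missing:
    print(f"Missing modules: {', '.join(missing)}")
else:
    print("All required modules were found.")
```

Any name reported as missing needs to be installed before continuing.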

In [None]:
import logging
from pathlib import Path
from tqdm import tqdm
import numpy as np
import shutil
from sklearn.model_selection import train_test_split

from nuimages import NuImages
from classes import simplify_nuimage_labels, NuImageSimpleCategory, NuImageSimplerCategory, NuImageSimplestCategory, LabelMappingTypes
from utils import PxyXY_to_Nxcycwh

import cv2
import matplotlib.pyplot as plt
import random

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger()


## Setting Up Paths

Here we set up the paths and settings needed to convert and split the dataset.

- `NUIM_ROOT`: The directory that holds the nuImages dataset. This notebook uses the mini version (`v1.0-mini`), which contains only 50 labeled images.

- `OUTPUT_ROOT`: The directory where the YOLO-formatted nuImages dataset will be written.

- `VAL_SPLIT_SIZE` and `RANDOM_STATE`: The fraction of samples held out for validation (0.2, i.e. 20%) and the seed that makes the split reproducible.

- `LABEL_MAPPING`: This determines how the classes of the formatted dataset are grouped. There are three options: `FAITHFUL` retains all original classes that came with the nuImages dataset, `SIMPLER` reduces them to 4 classes, and `SIMPLEST` reduces them to 2 classes.
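
Because a typo in these settings only surfaces later (for example, as an unknown label mapping during annotation writing), a quick check up front can save a run. The following is a sketch under stated assumptions: `check_config` and `VALID_LABEL_MAPPINGS` are hypothetical helpers introduced here, and the checks are limited to what this notebook describes.

```python
from pathlib import Path

# The three mapping names described above.
VALID_LABEL_MAPPINGS = {"FAITHFUL", "SIMPLER", "SIMPLEST"}

def check_config(nuim_root: Path, label_mapping: str, val_split_size: float) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not nuim_root.is_dir():
        problems.append(f"Dataset root not found: {nuim_root}")
    if label_mapping not in VALID_LABEL_MAPPINGS:
        problems.append(f"Unknown label mapping: {label_mapping!r}")
    if not 0.0 < val_split_size < 1.0:
        problems.append(f"Validation split must be between 0 and 1, got {val_split_size}")
    return problems

# Example: a misspelled mapping name is reported before any files are written.
print(check_config(Path("."), "SIMPLE", 0.2))
```

In the notebook, the same function could be called with `NUIM_ROOT`, `LABEL_MAPPING`, and the validation split size right after they are defined.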

In [None]:
NUIM_ROOT = Path("nuImagesMini")

OUTPUT_ROOT = Path("nuImagesMini_YOLO")

LABEL_MAPPING = "FAITHFUL"

VAL_SPLIT_SIZE = 0.2
RANDOM_STATE = 42

print(f"Input root:  {NUIM_ROOT.resolve()}")
print(f"Output root: {OUTPUT_ROOT.resolve()}")
print(f"Label map:   {LABEL_MAPPING}")

## Declaring Helper Functions

- `mkdir_output_dirs`: This is a simple utility function that creates the final directory structure that YOLOv8 expects. It will create `train/images`, `train/labels`, `val/images`, and `val/labels` inside our `OUTPUT_ROOT` directory.

- `get_filename_no_suffix`: A small helper that takes a nuImages annotation and returns the filename, without its extension, of the camera image it belongs to (for example, `n008-2018-08-27-11-48-54-500__CAM_FRONT__1535385057512404`).

- `convert_annotation`: This is the most important function. It takes a single nuImages annotation and does three things:

  - Reads the original bounding box in pixel coordinates `[x_min, y_min, x_max, y_max]`. For example: `[100, 200, 150, 250]`.
  - Converts it to the normalized YOLO format `[x_center, y_center, width, height]` using the `PxyXY_to_Nxcycwh` function from our `utils.py` file. In a 1600×900 image, the example box becomes approximately `[0.078, 0.25, 0.031, 0.056]`.
  - Maps the detailed nuImages category (and attribute) to a simplified class name using our `classes.py` file. The numeric class ID is looked up later, in `append_txt`.

- `append_txt`: This function takes the converted YOLO annotation and writes it as a new line in the correct `.txt` file.
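
The contents of `utils.py` are not shown in this notebook, so the following is only a minimal sketch of the arithmetic that a pixel-to-normalized conversion performs. The function name `pixel_xyxy_to_normalized_xcycwh` is hypothetical, and returning `None` for a degenerate box is an assumption rather than the documented behavior of `PxyXY_to_Nxcycwh`.

```python
def pixel_xyxy_to_normalized_xcycwh(bbox, img_width, img_height):
    """Convert [x_min, y_min, x_max, y_max] in pixels to YOLO's
    normalized (x_center, y_center, width, height)."""
    x_min, y_min, x_max, y_max = bbox
    box_w = x_max - x_min
    box_h = y_max - y_min
    if box_w <= 0 or box_h <= 0:
        return None  # assumption: skip degenerate boxes instead of raising
    x_center = (x_min + box_w / 2) / img_width
    y_center = (y_min + box_h / 2) / img_height
    return x_center, y_center, box_w / img_width, box_h / img_height

# A 50x50 pixel box in a 1600x900 image.
print(pixel_xyxy_to_normalized_xcycwh([100, 200, 150, 250], 1600, 900))
```

Every output value lies between 0 and 1, which is what makes YOLO labels independent of the image resolution.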


In [None]:
def mkdir_output_dirs(p: Path):
    """Creates the train/val output directories for YOLO."""
    logger.info(f"Creating output directories at: {p}...")

    # We only need train and val for the mini-dataset
    (p / 'train' / 'images').mkdir(parents=True, exist_ok=True)
    (p / 'train' / 'labels').mkdir(parents=True, exist_ok=True)

    (p / 'val' / 'images').mkdir(parents=True, exist_ok=True)
    (p / 'val' / 'labels').mkdir(parents=True, exist_ok=True)

def get_filename_no_suffix(annotation, nuim):
    
    # get token that connects image and annotation
    sample_data_token = annotation['sample_data_token']

    # retrieves a dictionary containing details about the image
    sample_data = nuim.get("sample_data", sample_data_token)
    return Path(sample_data['filename']).with_suffix('').name

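def convert_annotation(annotation, nuim, label_mapping):
    """Returns (class name, YOLO bbox, filename stem), or (None, None, None) if the annotation is skipped."""

    # raw bounding box in pixels: [x_min, y_min, x_max, y_max]
    xyXY = annotation['bbox']

    # the image size is read from the segmentation mask below,
    # so annotations without a mask are skipped
    if annotation['mask'] is None:
        return None, None, None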
    
    # extracts the image height and width
    height = annotation['mask']['size'][0]
    width = annotation['mask']['size'][1]

    # convert the coordinates
    yolo_bbox_coords = PxyXY_to_Nxcycwh(xyXY, width, height)
    if yolo_bbox_coords is None:
        return None, None, None

    yolo_bbox = list(yolo_bbox_coords)

    # look up raw category name and attribute
    nu_cat = nuim.get('category', annotation['category_token'])['name']

    if annotation['attribute_tokens']:
        attribute_token = annotation['attribute_tokens'][0]
        attribute = nuim.get('attribute', attribute_token)['name']
    else:
        attribute = None

    yolo_cat = simplify_nuimage_labels(nu_cat, attribute, label_mapping)

    # get filename without suffix
    filename_no_suffix = get_filename_no_suffix(annotation, nuim)

    return yolo_cat, yolo_bbox, filename_no_suffix

def append_txt(cat: str, bbox: list, filename_no_suffix: str, set_type: str, output_root: Path, label_mapping: str):

    #build the full path to the text file
    pa = output_root / set_type / 'labels'
    fi = (pa / filename_no_suffix).with_suffix('.txt')

    # look up the integer ID from an Enum class based on the strategy
    if label_mapping == "FAITHFUL":
        cat_index = NuImageSimpleCategory[cat].value
    elif label_mapping == "SIMPLER":
        cat_index = NuImageSimplerCategory[cat].value
    elif label_mapping == "SIMPLEST":
        cat_index = NuImageSimplestCategory[cat].value
    else:
        logger.error(f"Unknown value for label_mapping: {label_mapping}. Skipping this annotation.")
        return
    
    # unpack bbox
    xc, yc, w, h = bbox

    # write to file
    with open(fi, 'a') as f:
        f.write(f"{cat_index} {xc} {yc} {w} {h}\n")

## Load & Split the Data

Now that we have our helper functions, it's time to load the v1.0-mini dataset and prepare it for processing. This cell accomplishes four key steps:

1. **Create Directories**: It first calls `mkdir_output_dirs` to create the empty `train/` and `val/` folders inside your `OUTPUT_ROOT`.

2. **Load nuImages Data**: It uses the `NuImages` class to load the `v1.0-mini` dataset. We pass `lazy=True` for efficiency, so it only loads data from the JSON files as we need it, rather than all at once.

3. **Split the Dataset**: It takes the list of all samples from the mini-dataset and uses `train_test_split` to divide them into a training set (80% of the data) and a validation set (20% of the data).

4. **Create Splits Dictionary**: Finally, it creates a simple dictionary named `splits`. This just makes it easier in the following steps to loop over our `train_samples` and `val_samples` lists.
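
To make the effect of a seeded 80/20 split concrete, here is a dependency-free sketch using only the standard library. It is a stand-in for illustration, not scikit-learn's implementation, and it will not select the same samples as `train_test_split`; the name `seeded_split` and the placeholder sample names are hypothetical.

```python
import math
import random

def seeded_split(items, val_fraction, seed):
    """Shuffle a copy of items with a fixed seed, then cut it in two."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = math.ceil(len(shuffled) * val_fraction)
    # the first n_val shuffled items become validation, the rest training
    return shuffled[n_val:], shuffled[:n_val]

samples = [f"sample_{i:02d}" for i in range(50)]  # stand-ins for nuim.sample
train, val = seeded_split(samples, val_fraction=0.2, seed=42)
print(len(train), len(val))
```

With 50 samples and a validation fraction of 0.2, this yields 40 training and 10 validation samples, and re-running with the same seed reproduces the same split.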


In [None]:
# create output directories
mkdir_output_dirs(OUTPUT_ROOT)

# load the  dataset
print(f"Loading nuImages v1.0-mini from {NUIM_ROOT.resolve()}")
nuim = NuImages(dataroot=NUIM_ROOT.resolve(), version="v1.0-mini", verbose=False, lazy=True)
logger.info(f"Successfully loaded {len(nuim.sample)} samples from v1.0-mini.")

# split the samples into train val sets
train_samples, val_samples = train_test_split(
    nuim.sample,
    test_size=VAL_SPLIT_SIZE,
    random_state=RANDOM_STATE
)

logger.info(f"Split into {len(train_samples)} train and {len(val_samples)} val samples.")

# create dictionary to iterate over
splits = {
    "train": train_samples,
    "val": val_samples
}

## Process and Copy Images

With our data split into `train` and `val` sets, we can now populate our new YOLO directories. This first processing step focuses only on the images.

This code block loops through our `splits` dictionary (first for "train", then for "val"). For each sample in each set, it performs these steps:

1. **Find the Image**: It gets the record for the main "key camera" image.

2. **Get Original Path**: It reads the original file path from the nuImages metadata.

3. **Get Destination Path**: It builds the new path where the image should go.

4. **Copy File**: Finally, it uses `shutil.copy()` to copy the image from its original location to our new `train/images` or `val/images` folder.

In [None]:
for set_type, samples in splits.items():
    logger.info(f"processing {set_type} images")

    destination_dir = OUTPUT_ROOT / set_type / 'images'

    for sample in tqdm(samples, desc=f"copying {set_type} images"):
        # get sample_data token for the key camera
        sample_data_token = sample['key_camera_token']

        # get sample_data record
        sample_data = nuim.get("sample_data", sample_data_token)

        # original image path
        origin_jpg_path = NUIM_ROOT / sample_data['filename']

        # destination path
        destination_jpg_path = destination_dir / origin_jpg_path.name

        # copy
        if not origin_jpg_path.exists():
            logger.warning(f"Image not found at {origin_jpg_path}. Skipping.")
            continue

        shutil.copy(origin_jpg_path, destination_jpg_path)

print("\nImage copying complete!")

## Process and Write Annotations

With our images copied, we now need to create the matching `.txt` label files. The cell below does this in four steps:

1. **Create a Lookup Map**: First, it creates a dictionary called `token_to_split_map`, which maps each key camera image token to its split (`train` or `val`). Building this once before the loop makes each per-annotation check a fast dictionary lookup.

2. **Iterate All Annotations**: The code then loops through `nuim.object_ann`, which is the master list of every single annotation in the `v1.0-mini` dataset.

3. **Check & Skip**: For each annotation, it checks if its `sample_data_token` (the image it belongs to) is in our `token_to_split_map`. If it's not, it means the annotation is for an image we didn't include in our splits, so we simply ignore it and continue.

4. **Convert & Write**: If the annotation's token is in `token_to_split_map`, we know it's one we need.
    * It calls `convert_annotation()` to get the class name and the YOLO-formatted bounding box.
    * It then calls `append_txt()` to write that information as a new line in the correct `.txt` file.
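
One thing to watch for: `append_txt` opens each label file in append mode (`'a'`), so running the annotation cell a second time would write every box twice. A possible safeguard, sketched below, is to delete the existing label files before re-running. `clear_label_files` is a hypothetical helper that is not part of the original notebook; the demo uses a temporary directory so that it does not touch real data.

```python
import tempfile
from pathlib import Path

def clear_label_files(output_root: Path, splits=("train", "val")) -> int:
    """Delete existing .txt label files so a re-run starts from scratch.

    Returns the number of files removed.
    """
    removed = 0
    for split in splits:
        labels_dir = output_root / split / "labels"
        if not labels_dir.is_dir():
            continue
        for label_file in labels_dir.glob("*.txt"):
            label_file.unlink()
            removed += 1
    return removed

# Demo on a throwaway directory that mimics the output layout.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "train" / "labels").mkdir(parents=True)
    (demo_root / "train" / "labels" / "a.txt").write_text("0 0.5 0.5 0.1 0.1\n")
    removed_count = clear_label_files(demo_root)
    print(f"Removed {removed_count} label file(s)")
```

In the notebook, calling `clear_label_files(OUTPUT_ROOT)` just before the annotation loop would have the same effect.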

In [None]:
# create a lookup map to know which split an annotation belongs to.
# map the sample_data_token (which images are annotated) to a split ("train" or "val").
token_to_split_map = {}
for sample in train_samples:
    token_to_split_map[sample['key_camera_token']] = "train"
for sample in val_samples:
    token_to_split_map[sample['key_camera_token']] = "val"

logger.info(f"Converting and writing {len(nuim.object_ann)} total annotations")

# iterate through all object annotations in the mini-dataset
for annotation in tqdm(nuim.object_ann, desc="Converting annotations"):

    # find out which split this annotation's image belongs to
    sample_data_token = annotation['sample_data_token']
    if sample_data_token not in token_to_split_map:
        continue

    set_type = token_to_split_map[sample_data_token]

    # convert annotation
    cat, bbox, filename_no_suffix = convert_annotation(annotation, nuim, LABEL_MAPPING)

    if cat is None:
        continue

    # Write the annotation line to the correct file
    append_txt(cat, bbox, filename_no_suffix, set_type, OUTPUT_ROOT, LABEL_MAPPING)

print("\nAnnotation conversion complete")
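
Before the visual check, a quick structural check can confirm that the output folders contain what we expect. The sketch below counts images and label files per split and lists any that have no counterpart. `summarize_split` is a hypothetical helper, and it assumes the images are `.jpg` files; the demo runs on a temporary directory rather than the real output.

```python
import tempfile
from pathlib import Path

def summarize_split(output_root: Path, split: str) -> dict:
    """Count images and labels in one split and report unmatched files."""
    images_dir = output_root / split / "images"
    labels_dir = output_root / split / "labels"
    image_stems = {p.stem for p in images_dir.glob("*.jpg")}  # assumes .jpg images
    label_stems = {p.stem for p in labels_dir.glob("*.txt")}
    return {
        "images": len(image_stems),
        "labels": len(label_stems),
        "images_without_labels": sorted(image_stems - label_stems),
        "labels_without_images": sorted(label_stems - image_stems),
    }

# Demo on a throwaway directory that mimics the output layout.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    for sub in ("images", "labels"):
        (demo_root / "val" / sub).mkdir(parents=True)
    (demo_root / "val" / "images" / "frame_a.jpg").touch()
    (demo_root / "val" / "images" / "frame_b.jpg").touch()
    (demo_root / "val" / "labels" / "frame_a.txt").write_text("0 0.5 0.5 0.1 0.1\n")
    summary = summarize_split(demo_root, "val")
    print(summary)
```

An image without a label file is not necessarily an error, since it may simply have had no usable annotations, but a label file without an image usually points to a copy step that was skipped.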

## Sanity Check: Visualizing the Conversion (Side-by-Side)

After all that work, how do we know if our conversion was successful? The best way is to see it.

We will write a script to:
1.  Pick a random image from our new validation set.
2.  Load the original image.
3.  Create two copies:
    * **Image 1:** We will draw the **original nuImages annotations** on it in **BLUE**.
    * **Image 2:** We will load our **new YOLO-format .txt file** and draw those annotations on it in **RED**.
4.  Display them side-by-side.

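If our conversion worked, the red boxes on the right should line up with the blue boxes on the left. Two small differences are expected: the pixel coordinates are truncated to integers when drawing, so edges may be off by a pixel, and any annotation that `convert_annotation` skipped (for example, one without a mask) appears only in blue.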
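
Drawing the YOLO boxes requires undoing the normalization. The sketch below shows that inverse step for a single label line; the function name `normalized_xcycwh_to_pixel_xyxy` is hypothetical, the label line is made up for illustration, and the 1600×900 image size is an assumption. It rounds to the nearest pixel rather than truncating.

```python
def normalized_xcycwh_to_pixel_xyxy(box, img_width, img_height):
    """Convert normalized (x_center, y_center, width, height) back to
    pixel [x_min, y_min, x_max, y_max], rounded to the nearest pixel."""
    x_center, y_center, w_norm, h_norm = box
    box_w = w_norm * img_width
    box_h = h_norm * img_height
    x_min = x_center * img_width - box_w / 2
    y_min = y_center * img_height - box_h / 2
    return [round(x_min), round(y_min), round(x_min + box_w), round(y_min + box_h)]

# One line in YOLO label format: class_id x_center y_center width height
line = "3 0.078125 0.25 0.03125 0.0555555556"
class_id, *coords = line.split()
pixel_box = normalized_xcycwh_to_pixel_xyxy([float(c) for c in coords], 1600, 900)
print(int(class_id), pixel_box)
```

For this line the recovered box is `[100, 200, 150, 250]`, which is the kind of round trip the side-by-side plot verifies visually.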

In [None]:
logger.info("Running visual sanity check...")

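# pick the same enum that append_txt used, so class IDs map back to real names
if LABEL_MAPPING == "FAITHFUL":
    category_enum = NuImageSimpleCategory
elif LABEL_MAPPING == "SIMPLER":
    category_enum = NuImageSimplerCategory
else:
    category_enum = NuImageSimplestCategory

yolo_class_names = {cat.value: cat.name for cat in category_enum}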

random_sample = random.choice(val_samples)
sample_data_token = random_sample['key_camera_token']
sample_data = nuim.get("sample_data", sample_data_token)

yolo_img_path = str(OUTPUT_ROOT / "val" / "images" / Path(sample_data['filename']).name)
yolo_label_path = str(OUTPUT_ROOT / "val" / "labels" / Path(sample_data['filename']).with_suffix('.txt').name)


img = cv2.imread(yolo_img_path)
if img is None:
    logger.error(f"Could not read image at {yolo_img_path}")
else:
    img_height, img_width, _ = img.shape
    logger.info(f"Checking image: {Path(yolo_img_path).name}")
    
    img_original = img.copy()
    img_yolo = img.copy()

    original_anns = [ann for ann in nuim.object_ann if ann['sample_data_token'] == sample_data_token]
    
    for ann in original_anns:

        x1, y1, x2, y2 = [int(c) for c in ann['bbox']]

        cat_token = ann['category_token']
        cat_name = nuim.get('category', cat_token)['name'].split('.')[-1]

        cv2.rectangle(img_original, (x1, y1), (x2, y2), (255, 0, 0), 2) # BLUE
        cv2.putText(img_original, cat_name, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 1)

    try:
        with open(yolo_label_path, 'r') as f:
            for line in f:

                class_id, x_center, y_center, w_norm, h_norm = [float(x) for x in line.strip().split()]
                class_id = int(class_id)
 
                box_w = w_norm * img_width
                box_h = h_norm * img_height
                x_min = int((x_center * img_width) - (box_w / 2))
                y_min = int((y_center * img_height) - (box_h / 2))
                x_max = int(x_min + box_w)
                y_max = int(y_min + box_h)

                label_name = yolo_class_names.get(class_id, "UNKNOWN")

                cv2.rectangle(img_yolo, (x_min, y_min), (x_max, y_max), (0, 0, 255), 2) # RED
                cv2.putText(img_yolo, label_name, (x_min, y_min - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)

    except FileNotFoundError:
        logger.warning(f"No YOLO label file found at: {yolo_label_path}")
        
    logger.info("Displaying comparison: Left (Original), Right (YOLO)")
    
    img_original_rgb = cv2.cvtColor(img_original, cv2.COLOR_BGR2RGB)
    img_yolo_rgb = cv2.cvtColor(img_yolo, cv2.COLOR_BGR2RGB)

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(24, 12))

    ax1.imshow(img_original_rgb)
    ax1.set_title("Original nuImages Annotations (BLUE)", fontsize=16)
    ax1.axis('off')
    
    ax2.imshow(img_yolo_rgb)
    ax2.set_title("Converted YOLO Annotations (RED)", fontsize=16)
    ax2.axis('off')
    
    plt.tight_layout()
    plt.show()

## Creating the YAML File

**Auto-generate the dataset configuration file:** We will write a script to create the YAML file (`nuimages_mini.yaml`) that YOLOv8 needs to locate the images and class names.

In [None]:
logger.info("Creating dataset YAML file...")

# 1. Define the path where the YAML file will be saved
yaml_path = OUTPUT_ROOT / "nuimages_mini.yaml"

# 2. Get the class IDs and names based on the mapping you chose.
#    Keying by cat.value keeps the IDs identical to the ones that
#    append_txt wrote into the label files.
class_names = {}
if LABEL_MAPPING == "FAITHFUL":
    class_names = {cat.value: cat.name for cat in NuImageSimpleCategory}
elif LABEL_MAPPING == "SIMPLER":
    class_names = {cat.value: cat.name for cat in NuImageSimplerCategory}
elif LABEL_MAPPING == "SIMPLEST":
    class_names = {cat.value: cat.name for cat in NuImageSimplestCategory}

if not class_names:
    logger.error("Could not determine class names. Make sure LABEL_MAPPING is set.")
else:
    # Use the absolute path to the output directory so the dataset can be
    # found regardless of the current working directory.
    # .resolve() converts the relative OUTPUT_ROOT into an absolute path.
    absolute_path = OUTPUT_ROOT.resolve()
    
    # Create the YAML content using the absolute path
    yaml_content = f"""
path: {absolute_path}
train: train/images
val: val/images

# Class names
names:
"""
    # Add all class IDs and names to the file
    for class_id, name in sorted(class_names.items()):
        yaml_content += f"  {class_id}: {name}\n"

    # 3. Write the content to the file
    try:
        with open(yaml_path, 'w') as f:
            f.write(yaml_content)
        
        logger.info(f"Successfully created data configuration file at: {yaml_path}")
        print(f"\n--- Contents of {yaml_path} ---")
        print(yaml_content)
        
    except Exception as e:
        logger.error(f"Failed to write YAML file: {e}")

### Finally, we have everything we need to start training and predicting with the model. Let us now move on to the `Training` portion of the workshop.

# References

- `classes.py` and `utils.py` are taken from this repository: https://github.com/tensorturtle/yolov8-on-nuimages
- The nuImages devkit can be found in this repository: https://github.com/nutonomy/nuscenes-devkit