# Work zone detection: ROADWork to YOLO setup

## 1. Notebook goals

In this notebook we will:

1. Load the ROADWork annotations, which are stored in COCO-style JSON format.
2. Define the work zone object categories that we want to detect first.
3. Convert the COCO annotations into YOLO format (image and label folders).
4. Create a YAML configuration file to train a YOLO detector later.

This will give us a clean dataset ready for real-time work zone detection experiments.

In [2]:
# %% Imports

from pathlib import Path
import json
from dataclasses import dataclass
from collections import defaultdict
import shutil
import os
from typing import Dict, List

## 2. Paths and basic configuration

Here we point to the ROADWork root folder and the annotation files we want to use.
We also define the output structure for the YOLO dataset, which will have:

- `workzone_yolo/images/train`
- `workzone_yolo/images/val`
- `workzone_yolo/labels/train`
- `workzone_yolo/labels/val`
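The `images/` and `labels/` trees must mirror each other because YOLO loaders locate each label file from its image path. Ultralytics, for example, derives the label path by replacing the last `images` directory with `labels` and the file extension with `.txt`. The helper below is a minimal sketch of that convention, assuming POSIX-style paths. `image_to_label_path` is a hypothetical name, not a library function.

```python
from pathlib import PurePosixPath


def image_to_label_path(img_path: str) -> str:
    """Hypothetical helper: map .../images/<split>/x.jpg to .../labels/<split>/x.txt.

    Replaces the *last* '/images/' component with '/labels/' and swaps the
    file suffix for '.txt', mirroring the convention YOLO loaders rely on.
    """
    marker, replacement = "/images/", "/labels/"
    head, sep, tail = img_path.rpartition(marker)
    if not sep:
        raise ValueError(f"no '{marker}' component in {img_path!r}")
    swapped = head + replacement + tail
    return str(PurePosixPath(swapped).with_suffix(".txt"))


print(image_to_label_path("workzone_yolo/images/train/frame_0001.jpg"))
# -> workzone_yolo/labels/train/frame_0001.txt
```

Any subfolders below `images/train` therefore have to be reproduced below `labels/train` as well.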

In [3]:
# %% Paths and config

# Assuming this notebook runs from the workzone project root, which contains the `data/` folder
ROOT = Path("data").resolve()

IMG_DIR = ROOT / "images"
ANN_DIR = ROOT / "annotations"

train_ann_file = ANN_DIR / "instances_train_gps_split_with_signs.json"
val_ann_file   = ANN_DIR / "instances_val_gps_split_with_signs.json"

print("Train annotation file exists:", train_ann_file.exists())
print("Val annotation file exists:", val_ann_file.exists())
print("Images folder exists:", IMG_DIR.exists())

# YOLO output root
YOLO_ROOT = ROOT / "workzone_yolo"
IMG_TRAIN_DIR = YOLO_ROOT / "images" / "train"
IMG_VAL_DIR   = YOLO_ROOT / "images" / "val"
LBL_TRAIN_DIR = YOLO_ROOT / "labels" / "train"
LBL_VAL_DIR   = YOLO_ROOT / "labels" / "val"

for d in [IMG_TRAIN_DIR, IMG_VAL_DIR, LBL_TRAIN_DIR, LBL_VAL_DIR]:
    d.mkdir(parents=True, exist_ok=True)

YOLO_ROOT

Train annotation file exists: True
Val annotation file exists: True
Images folder exists: True


PosixPath('/data/RoadWork/workzone/data/workzone_yolo')

## 3. Load ROADWork annotations

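We now load the training and validation splits from the `*_with_signs` annotation files, whose category list includes the Temporary Traffic Control sign subtypes.
We also build helper dictionaries:

- `id2img_train` / `id2img_val`: image id to image info
- `id2cat`: category id to category name (read from the train split)
- `anns_by_image_train` / `anns_by_image_val`: image id to list of annotations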

In [4]:
# %% Load annotations and helper maps

with open(train_ann_file, "r") as f:
    train_data = json.load(f)

with open(val_ann_file, "r") as f:
    val_data = json.load(f)

print("Train images:", len(train_data["images"]))
print("Train annotations:", len(train_data["annotations"]))
print("Val images:", len(val_data["images"]))
print("Val annotations:", len(val_data["annotations"]))

id2img_train = {img["id"]: img for img in train_data["images"]}
id2img_val   = {img["id"]: img for img in val_data["images"]}
id2cat       = {c["id"]: c["name"] for c in train_data["categories"]}

print("Number of categories:", len(id2cat))
for cid, name in list(id2cat.items())[:49]:
    print(cid, name)

# group annotations by image id
def build_anns_by_image(ann_list):
    d = defaultdict(list)
    for ann in ann_list:
        d[ann["image_id"]].append(ann)
    return d

anns_by_image_train = build_anns_by_image(train_data["annotations"])
anns_by_image_val   = build_anns_by_image(val_data["annotations"])


Train images: 5318
Train annotations: 60757
Val images: 2098
Val annotations: 21261
Number of categories: 49
1 Police Officer
2 Police Vehicle
3 Cone
4 Fence
5 Drum
6 Barricade
7 Barrier
8 Work Vehicle
9 Vertical Panel
10 Tubular Marker
11 Arrow Board
12 Bike Lane
13 Work Equipment
14 Worker
15 Other Roadwork Objects
16 Temporary Traffic Control Message Board
17 Temporary Traffic Control Sign
19 Temporary Traffic Control Sign: left arrow
20 Temporary Traffic Control Sign: right arrow
21 Temporary Traffic Control Sign: up arrow
22 Temporary Traffic Control Sign: left chevron
23 Temporary Traffic Control Sign: right lane ends sign
24 Temporary Traffic Control Sign: two lane shift arrows
25 Temporary Traffic Control Sign: right chevron
26 Temporary Traffic Control Sign: lane shift arrow
27 Temporary Traffic Control Sign: up diagonal right arrow
28 Temporary Traffic Control Sign: left lane ends sign
29 Temporary Traffic Control Sign: bent left arrow
30 Temporary Traffic Control Sign: flagg
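Before choosing classes it helps to know how many boxes each category actually has, since very rare sign subtypes may be hard to learn. A minimal sketch using `collections.Counter` over the COCO `annotations` list; it only assumes the standard COCO keys `categories`, `annotations` and `category_id`, and `category_histogram` is a hypothetical helper name.

```python
from collections import Counter
from typing import Dict, List, Tuple


def category_histogram(coco: dict) -> List[Tuple[int, str, int]]:
    """Return (category_id, name, box_count) rows, most frequent first.

    Categories listed in coco["categories"] but without any boxes
    are included with a count of 0.
    """
    names: Dict[int, str] = {c["id"]: c["name"] for c in coco["categories"]}
    counts = Counter(ann["category_id"] for ann in coco["annotations"])
    rows = [(cid, name, counts.get(cid, 0)) for cid, name in names.items()]
    # sort by count (descending), then by id for a stable order
    rows.sort(key=lambda r: (-r[2], r[0]))
    return rows


toy = {
    "categories": [{"id": 3, "name": "Cone"}, {"id": 5, "name": "Drum"}, {"id": 14, "name": "Worker"}],
    "annotations": [{"category_id": 3}, {"category_id": 3}, {"category_id": 5}],
}
for cid, name, n in category_histogram(toy):
    print(f"{cid:3d} {name:30s} {n}")
```

On the real data the same call would be `category_histogram(train_data)` or `category_histogram(val_data)`.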

## 4. Choose work zone classes for the detector

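The categories below capture the core work zone structure:

- Cone
- Drum
- Barricade
- Barrier
- Vertical Panel
- Work Vehicle
- Worker
- Arrow Board
- Temporary Traffic Control Sign
- Temporary Traffic Control Message Board

In this first pass, however, the mapping cell keeps every ROADWork category except `Other Roadwork Objects` (id 15): 48 classes in total, including the fine-grained Temporary Traffic Control sign subtypes. Each kept ROADWork category id is mapped to a contiguous YOLO class index starting at zero, in the order the ids appear in `ROADWORK_TO_KEEP`.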
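To train the focused ten-class detector instead, the ids can be looked up by name so they cannot drift from the annotation file. A minimal sketch, assuming `id2cat` maps category id to name as built earlier; `CORE_CLASSES` and `build_subset_mapping` are hypothetical names. Building the mapping from a list rather than a dict literal also means a repeated entry raises an error instead of being silently merged.

```python
from typing import Dict, List, Tuple

CORE_CLASSES = [
    "Cone",
    "Drum",
    "Barricade",
    "Barrier",
    "Vertical Panel",
    "Work Vehicle",
    "Worker",
    "Arrow Board",
    "Temporary Traffic Control Sign",
    "Temporary Traffic Control Message Board",
]


def build_subset_mapping(
    id2cat: Dict[int, str], keep_names: List[str]
) -> Tuple[List[str], Dict[int, int]]:
    """Map the ROADWork ids of `keep_names` to YOLO indices 0..len(keep_names)-1.

    YOLO indices follow the order of `keep_names`. Raises ValueError on
    duplicate or unknown names instead of silently dropping them.
    """
    if len(set(keep_names)) != len(keep_names):
        raise ValueError("duplicate class names in keep_names")
    name2id = {name: cid for cid, name in id2cat.items()}
    missing = [n for n in keep_names if n not in name2id]
    if missing:
        raise ValueError(f"names not found in id2cat: {missing}")
    mapping = {name2id[name]: idx for idx, name in enumerate(keep_names)}
    return list(keep_names), mapping
```

Because the conversion function reads the global `roadwork_to_yolo` and the YAML cell reads `yolo_names`, running `yolo_names, roadwork_to_yolo = build_subset_mapping(id2cat, CORE_CLASSES)` before those cells would switch the dataset to ten classes; clear `workzone_yolo/` first so label files from the full mapping do not linger.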

In [7]:
# %% Define mapping from ROADWork category id to YOLO class index

# ROADWork category ids, as printed by the category listing above

ROADWORK_TO_KEEP = {
    1: "Police Officer",
    2: "Police Vehicle",
    3: "Cone",
    4: "Fence",
    5: "Drum",
    6: "Barricade",
    7: "Barrier",
    8: "Work Vehicle",
    9: "Vertical Panel",
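    # 11 is listed before 10 so that Arrow Board -> YOLO 9 and
    # Tubular Marker -> YOLO 10, matching the label files already written
    11: "Arrow Board",
    10: "Tubular Marker",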
    12: "Bike Lane",
    13: "Work Equipment",
    14: "Worker",
    16: "Temporary Traffic Control Message Board",
    17: "Temporary Traffic Control Sign",
    19: "Temporary Traffic Control Sign: left arrow",
    20: "Temporary Traffic Control Sign: right arrow",
    21: "Temporary Traffic Control Sign: up arrow",
    22: "Temporary Traffic Control Sign: left chevron",
    23: "Temporary Traffic Control Sign: right lane ends sign",
    24: "Temporary Traffic Control Sign: two lane shift arrows",
    25: "Temporary Traffic Control Sign: right chevron",
    26: "Temporary Traffic Control Sign: lane shift arrow",
    27: "Temporary Traffic Control Sign: up diagonal right arrow",
    28: "Temporary Traffic Control Sign: left lane ends sign",
    29: "Temporary Traffic Control Sign: bent left arrow",
    30: "Temporary Traffic Control Sign: flagger",
    31: "Temporary Traffic Control Sign: bent right arrow",
    32: "Temporary Traffic Control Sign: no left turn",
    33: "Temporary Traffic Control Sign: pedestrian: right arrow",
    34: "Temporary Traffic Control Sign: pedestrian: left arrow",
    35: "Temporary Traffic Control Sign: up diagonal left arrow",
    36: "Temporary Traffic Control Sign: pedestrian",
    37: "Temporary Traffic Control Sign: no right turn",
    38: "Temporary Traffic Control Sign: bi-directional arrow",
    39: "Temporary Traffic Control Sign: two upward diagonal arrows",
    40: "Temporary Traffic Control Sign: curved right arrow",
    41: "Temporary Traffic Control Sign: down diagonal left arrow",
    42: "Temporary Traffic Control Sign: do not enter sign",
    43: "Temporary Traffic Control Sign: worker",
    44: "Temporary Traffic Control Sign: bicycle",
    45: "Temporary Traffic Control Sign: two downward diagonal arrows",
    46: "Temporary Traffic Control Sign: curved left arrow",
    47: "Temporary Traffic Control Sign: curved left and curved right arrow",
    48: "Temporary Traffic Control Sign: work vehicle",
    49: "Temporary Traffic Control Sign: traffic signal",
    50: "Temporary Traffic Control Sign: up arrow / stop sign",
}

# Create YOLO index mapping
yolo_names: List[str] = []
roadwork_to_yolo: Dict[int, int] = {}

for road_id, name in ROADWORK_TO_KEEP.items():
    yolo_idx = len(yolo_names)
    yolo_names.append(name)
    roadwork_to_yolo[road_id] = yolo_idx

print("YOLO classes:")
for road_id, yolo_idx in roadwork_to_yolo.items():
    print(f"ROADWork {road_id:2d} -> YOLO {yolo_idx:2d}  ({id2cat[road_id]})")

print("\nNames list:", yolo_names)


YOLO classes:
ROADWork  1 -> YOLO  0  (Police Officer)
ROADWork  2 -> YOLO  1  (Police Vehicle)
ROADWork  3 -> YOLO  2  (Cone)
ROADWork  4 -> YOLO  3  (Fence)
ROADWork  5 -> YOLO  4  (Drum)
ROADWork  6 -> YOLO  5  (Barricade)
ROADWork  7 -> YOLO  6  (Barrier)
ROADWork  8 -> YOLO  7  (Work Vehicle)
ROADWork  9 -> YOLO  8  (Vertical Panel)
ROADWork 11 -> YOLO  9  (Arrow Board)
ROADWork 10 -> YOLO 10  (Tubular Marker)
ROADWork 12 -> YOLO 11  (Bike Lane)
ROADWork 13 -> YOLO 12  (Work Equipment)
ROADWork 14 -> YOLO 13  (Worker)
ROADWork 16 -> YOLO 14  (Temporary Traffic Control Message Board)
ROADWork 17 -> YOLO 15  (Temporary Traffic Control Sign)
ROADWork 19 -> YOLO 16  (Temporary Traffic Control Sign: left arrow)
ROADWork 20 -> YOLO 17  (Temporary Traffic Control Sign: right arrow)
ROADWork 21 -> YOLO 18  (Temporary Traffic Control Sign: up arrow)
ROADWork 22 -> YOLO 19  (Temporary Traffic Control Sign: left chevron)
ROADWork 23 -> YOLO 20  (Temporary Traffic Control Sign: right lane end
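Because `ROADWORK_TO_KEEP` hard-codes a name next to every id, a typo or an id shift in a newer annotation release would go unnoticed. A minimal sketch of a consistency check against the names read from the JSON; `find_name_mismatches` is a hypothetical helper.

```python
from typing import Dict, Tuple


def find_name_mismatches(
    keep: Dict[int, str], id2cat: Dict[int, str]
) -> Dict[int, Tuple[str, str]]:
    """Return {category_id: (hard_coded_name, name_in_json)} for every disagreement.

    Ids absent from the JSON are reported with "<missing>" as the JSON name.
    """
    return {
        cid: (name, id2cat.get(cid, "<missing>"))
        for cid, name in keep.items()
        if id2cat.get(cid) != name
    }
```

In this notebook, `find_name_mismatches(ROADWORK_TO_KEEP, id2cat)` should return an empty dict when the hard-coded names agree with the annotation file.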

## 5. COCO to YOLO conversion

For each image in the train and val splits we will:

1. Collect all annotations for that image.
2. Keep only those whose category id is in our selected mapping.
3. Convert the COCO bbox `[x, y, width, height]` (top-left corner and size, in pixels) into YOLO format  
   with normalized `(x_center, y_center, width, height)`, dropping zero-sized boxes.
4. Skip the image entirely if none of its boxes survive.
5. Write one `.txt` file per image in `labels/train` or `labels/val`.
6. Copy the image into `images/train` or `images/val`.

Label format for YOLO:

`class_idx x_center y_center width height`

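All coordinates are normalized: `x_center` and `width` are divided by the image width, `y_center` and `height` by the image height, so every value lies in [0, 1] for boxes inside the image.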
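Written out, with an image of size $W \times H$ and a COCO box $[x, y, w, h]$, the conversion is shown below, followed by a worked example for a hypothetical 1920×1080 image and box `[100, 200, 50, 80]` (illustrative numbers, not taken from ROADWork):

$$
x_c = \frac{x + w/2}{W}, \qquad y_c = \frac{y + h/2}{H}, \qquad w_n = \frac{w}{W}, \qquad h_n = \frac{h}{H}
$$

$$
x_c = \frac{125}{1920} \approx 0.065104, \qquad y_c = \frac{240}{1080} \approx 0.222222, \qquad w_n = \frac{50}{1920} \approx 0.026042, \qquad h_n = \frac{80}{1080} \approx 0.074074
$$

With the six-decimal formatting used in the conversion cell, a Cone (YOLO class 2 in the full mapping above) with this box becomes the line `2 0.065104 0.222222 0.026042 0.074074`.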


In [8]:
# %% Helper: convert COCO bbox to YOLO format

def coco_to_yolo_bbox(bbox, img_width, img_height):
    """
    bbox: [x_min, y_min, w, h] in absolute pixels
    returns: (x_center, y_center, w_norm, h_norm) in [0, 1]
    """
    x, y, w, h = bbox
    x_center = x + w / 2.0
    y_center = y + h / 2.0

    x_center_norm = x_center / img_width
    y_center_norm = y_center / img_height
    w_norm = w / img_width
    h_norm = h / img_height

    return x_center_norm, y_center_norm, w_norm, h_norm
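`coco_to_yolo_bbox` assumes every box lies inside the image. If an annotation spills past an edge, the normalized values can fall outside [0, 1], and YOLO label checkers (Ultralytics' dataset verification, for instance) may flag such lines as corrupt. A minimal sketch of a clipping variant, assuming you prefer to crop boxes to the image rather than drop them; `coco_to_yolo_bbox_clipped` is a hypothetical name and is not used by the conversion cells in this notebook.

```python
from typing import Optional, Sequence, Tuple


def coco_to_yolo_bbox_clipped(
    bbox: Sequence[float], img_width: float, img_height: float
) -> Optional[Tuple[float, float, float, float]]:
    """Clip a COCO [x_min, y_min, w, h] box to the image, then normalize.

    Returns (x_center, y_center, w, h) in [0, 1], or None if no part
    of the box lies inside the image.
    """
    x, y, w, h = bbox
    # clip both corners to the image rectangle
    x1 = min(max(x, 0.0), img_width)
    y1 = min(max(y, 0.0), img_height)
    x2 = min(max(x + w, 0.0), img_width)
    y2 = min(max(y + h, 0.0), img_height)
    cw, ch = x2 - x1, y2 - y1
    if cw <= 0 or ch <= 0:
        return None
    return (
        (x1 + cw / 2.0) / img_width,
        (y1 + ch / 2.0) / img_height,
        cw / img_width,
        ch / img_height,
    )
```

To use it, call it in place of `coco_to_yolo_bbox` inside the conversion loop and skip the annotation when it returns `None`; the box counts printed by the conversion cell could then differ from the ones shown in this notebook.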


In [9]:
# %% Conversion function for a split

def convert_split_coco_to_yolo(
    images,
    anns_by_image,
    id2img,  # not used below; kept so both call sites stay unchanged
    img_src_dir: Path,
    img_out_dir: Path,
    lbl_out_dir: Path,
    split_name: str,
):
    num_images = 0
    num_labels = 0

    for img_info in images:
        img_id = img_info["id"]
        file_name = img_info["file_name"]
        width = img_info["width"]
        height = img_info["height"]

        anns = anns_by_image.get(img_id, [])
        yolo_lines = []

        for ann in anns:
            cat_id = ann["category_id"]
            if cat_id not in roadwork_to_yolo:
                continue

            yolo_cls = roadwork_to_yolo[cat_id]
            x_c, y_c, w_n, h_n = coco_to_yolo_bbox(ann["bbox"], width, height)

            # skip degenerate (zero- or negative-size) boxes
            if w_n <= 0 or h_n <= 0:
                continue

            line = f"{yolo_cls} {x_c:.6f} {y_c:.6f} {w_n:.6f} {h_n:.6f}"
            yolo_lines.append(line)

        # If nothing to label, skip this image
        if not yolo_lines:
            continue

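        # write label file, mirroring the image's relative path so that the
        # images/ -> labels/ path swap used by YOLO loaders finds it
        lbl_path = lbl_out_dir / Path(file_name).with_suffix(".txt")
        lbl_path.parent.mkdir(parents=True, exist_ok=True)
        with open(lbl_path, "w") as f:
            f.write("\n".join(yolo_lines) + "\n")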

        # copy image
        src_img_path = img_src_dir / file_name
        dst_img_path = img_out_dir / file_name
        dst_img_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_img_path, dst_img_path)

        num_images += 1
        num_labels += len(yolo_lines)

    print(f"[{split_name}] images with labels:", num_images)
    print(f"[{split_name}] total boxes written:", num_labels)


In [10]:
# %% Run conversion for train and val splits

convert_split_coco_to_yolo(
    images=train_data["images"],
    anns_by_image=anns_by_image_train,
    id2img=id2img_train,
    img_src_dir=IMG_DIR,
    img_out_dir=IMG_TRAIN_DIR,
    lbl_out_dir=LBL_TRAIN_DIR,
    split_name="train",
)

convert_split_coco_to_yolo(
    images=val_data["images"],
    anns_by_image=anns_by_image_val,
    id2img=id2img_val,
    img_src_dir=IMG_DIR,
    img_out_dir=IMG_VAL_DIR,
    lbl_out_dir=LBL_VAL_DIR,
    split_name="val",
)


[train] images with labels: 5316
[train] total boxes written: 60673
[val] images with labels: 2098
[val] total boxes written: 21238
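A quick way to catch mapping or normalization mistakes before training is to re-read the written label files and check every line. A minimal sketch, assuming the one-box-per-line format described above and a known number of classes; `check_label_text` is a hypothetical helper.

```python
from typing import List


def check_label_text(text: str, num_classes: int, name: str = "<label>") -> List[str]:
    """Return the problems found in one YOLO label file's text (empty list if clean)."""
    problems: List[str] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        where = f"{name}:{lineno}"
        parts = line.split()
        if len(parts) != 5:
            problems.append(f"{where}: expected 5 fields, got {len(parts)}")
            continue
        try:
            cls = int(parts[0])
            coords = [float(p) for p in parts[1:]]
        except ValueError:
            problems.append(f"{where}: non-numeric field")
            continue
        if not 0 <= cls < num_classes:
            problems.append(f"{where}: class {cls} out of range")
        if any(c < 0.0 or c > 1.0 for c in coords):
            problems.append(f"{where}: coordinate outside [0, 1]")
        if coords[2] <= 0.0 or coords[3] <= 0.0:
            problems.append(f"{where}: non-positive width or height")
    return problems
```

On this dataset it could be run as `[p for f in LBL_TRAIN_DIR.rglob("*.txt") for p in check_label_text(f.read_text(), len(yolo_names), f.name)]`, and likewise for `LBL_VAL_DIR`; an empty list means every line passed.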


## 6. Create YOLO data YAML

We now write a `workzone_yolo.yaml` file that YOLO will use.
It specifies:

- the dataset root
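- train and val image folders, relative to the root (YOLO loaders such as Ultralytics find the matching label folders by swapping `images` for `labels` in the path)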
- class names

In [14]:
# %% Write YOLO data YAML
import yaml

data_yaml = {
    "path": str(YOLO_ROOT),
    "train": "images/train",
    "val": "images/val",
    "names": {i: name for i, name in enumerate(yolo_names)},
}

yaml_path = YOLO_ROOT / "workzone_yolo.yaml"
with open(yaml_path, "w") as f:
    yaml.dump(data_yaml, f)

print("Wrote YAML to:", yaml_path)
print(yaml_path.read_text())

Wrote YAML to: /data/RoadWork/workzone/data/workzone_yolo/workzone_yolo.yaml
names:
  0: Police Officer
  1: Police Vehicle
  2: Cone
  3: Fence
  4: Drum
  5: Barricade
  6: Barrier
  7: Work Vehicle
  8: Vertical Panel
  9: Arrow Board
  10: Tubular Marker
  11: Bike Lane
  12: Work Equipment
  13: Worker
  14: Temporary Traffic Control Message Board
  15: Temporary Traffic Control Sign
  16: 'Temporary Traffic Control Sign: left arrow'
  17: 'Temporary Traffic Control Sign: right arrow'
  18: 'Temporary Traffic Control Sign: up arrow'
  19: 'Temporary Traffic Control Sign: left chevron'
  20: 'Temporary Traffic Control Sign: right lane ends sign'
  21: 'Temporary Traffic Control Sign: two lane shift arrows'
  22: 'Temporary Traffic Control Sign: right chevron'
  23: 'Temporary Traffic Control Sign: lane shift arrow'
  24: 'Temporary Traffic Control Sign: up diagonal right arrow'
  25: 'Temporary Traffic Control Sign: left lane ends sign'
  26: 'Temporary Traffic Control Sign: bent l