# Step 1: Convert Raw Dataset to YOLO Format

This step converts the dataset from its original COCO format to the YOLO format required for training.

### Key Actions:
1. The COCO dataset annotations are loaded from `datasets/raw/probe_labels.json`.
2. `category_id = 1` is set on every annotation, so that `convert_coco` (called with `cls91to80=False`) maps each box to YOLO class index `0`. The updated annotations are written back to the same file.
3. The `convert_coco` function from the Ultralytics library transforms the COCO annotations into YOLO-compatible label files, which are saved in the `datasets/yolo_temp` directory.


In [1]:
import os
import random
import shutil
import json
from ultralytics.data.converter import convert_coco

In [2]:
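# Load the COCO annotations
with open("datasets/raw/probe_labels.json", encoding="utf-8") as file:
    data = json.load(file)

annotations = data["annotations"]

# Assign every annotation to category 1 (the single "probe" class)
for annotation in annotations:
    annotation["category_id"] = 1

# Write the updated annotations back to the same file
with open("datasets/raw/probe_labels.json", "w", encoding="utf-8") as file:
    json.dump(data, file, indent=4)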

In [None]:
convert_coco(
    labels_dir="datasets/raw/",
    save_dir="datasets/yolo_temp",
    cls91to80=False
)
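
To make the conversion concrete, here is a minimal sketch of the arithmetic behind a single label line. It is not the Ultralytics implementation: the function names are hypothetical, and it assumes the converter derives the class index as `category_id - 1` when `cls91to80=False`. COCO stores a box as `[x_min, y_min, width, height]` in pixels, while YOLO expects a normalized `x_center y_center width height`.

```python
def coco_to_yolo_bbox(bbox, img_width, img_height):
    """Convert a COCO [x_min, y_min, width, height] pixel box to a
    YOLO (x_center, y_center, width, height) box normalized to 0-1."""
    x_min, y_min, box_w, box_h = bbox
    x_center = (x_min + box_w / 2) / img_width
    y_center = (y_min + box_h / 2) / img_height
    return (x_center, y_center, box_w / img_width, box_h / img_height)


def yolo_label_line(category_id, bbox, img_width, img_height):
    """Build one line of a YOLO label file (hypothetical helper)."""
    class_index = category_id - 1  # assumption: category 1 becomes class 0
    values = coco_to_yolo_bbox(bbox, img_width, img_height)
    return f"{class_index} " + " ".join(f"{value:.6f}" for value in values)


# A 200x100 box whose top-left corner is at (100, 50) in a 640x480 image
print(yolo_label_line(1, [100, 50, 200, 100], 640, 480))
# 0 0.312500 0.208333 0.312500 0.208333
```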

# Step 2: Train and Validation Split

This step creates separate training and validation datasets by splitting the images and labels into two groups based on a specified ratio (80% training and 20% validation in this case).

### Key Actions:
1. Source directories for images and labels are defined.
2. Destination directories for training and validation sets are created.
3. The images are shuffled with a fixed random seed (`seed=42`), so the split is random but reproducible.
4. The dataset is split into training and validation subsets.
5. Images and corresponding label files are copied to their respective folders.
6. The temporary YOLO folder (`datasets/yolo_temp`) is cleaned up after the split is complete.
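
The shuffle-and-split logic in the steps above can be sketched as one small function. This is an illustrative helper, not part of the notebook: `split_train_val` is a hypothetical name, and it sorts the input first because a seeded shuffle only gives the same result when the starting order is the same.

```python
import random


def split_train_val(items, train_ratio=0.8, seed=42):
    """Return (train, val) lists from a seeded, order-independent shuffle."""
    ordered = sorted(items)  # fix the starting order before shuffling
    random.Random(seed).shuffle(ordered)  # local RNG, global state untouched
    train_count = int(len(ordered) * train_ratio)
    return ordered[:train_count], ordered[train_count:]


# 308 placeholder file names, matching the size of the dataset used here
names = [f"img_{i:03d}.jpg" for i in range(308)]
train, val = split_train_val(names)
print(len(train), len(val))  # 246 62
```

Using a local `random.Random(seed)` instance keeps the split independent of any other code that touches the global random state.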

In [4]:
images_source_folder = "datasets/raw/probe_images"
labels_source_folder = "datasets/yolo_temp/labels/probe_labels"

images_train_folder = "datasets/yolo/images/train"
images_val_folder = "datasets/yolo/images/val"

labels_train_folder = "datasets/yolo/labels/train"
labels_val_folder = "datasets/yolo/labels/val"

split_ratio = 0.8

In [5]:
os.makedirs(images_train_folder, exist_ok=True)
os.makedirs(images_val_folder, exist_ok=True)
os.makedirs(labels_train_folder, exist_ok=True)
os.makedirs(labels_val_folder, exist_ok=True)

In [6]:
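# Sort first: os.listdir returns names in arbitrary order, and the seeded
# shuffle below is only reproducible when it starts from the same order.
images = sorted(img for img in os.listdir(images_source_folder) if img.endswith(".jpg"))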

In [7]:
random.seed(42)
random.shuffle(images)

In [8]:
train_count = int(len(images) * split_ratio)
train_images = images[:train_count]
val_images = images[train_count:]

print(train_count)
print(len(train_images))
print(len(val_images))

246
246
62


In [9]:
for image in train_images:
    # Swap only the extension; str.replace("jpg", "txt") would also
    # rewrite "jpg" anywhere else in the file name.
    label = os.path.splitext(image)[0] + ".txt"

    shutil.copy(f"{images_source_folder}/{image}", f"{images_train_folder}/{image}")
    shutil.copy(f"{labels_source_folder}/{label}", f"{labels_train_folder}/{label}")


for image in val_images:
    label = os.path.splitext(image)[0] + ".txt"

    shutil.copy(f"{images_source_folder}/{image}", f"{images_val_folder}/{image}")
    shutil.copy(f"{labels_source_folder}/{label}", f"{labels_val_folder}/{label}")
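
When deriving a label file name from an image file name, changing only the extension is safer than a substring replacement. The short sketch below shows the difference; `label_name_for` and the file name `jpg_probe_001.jpg` are hypothetical and used only for illustration.

```python
from pathlib import Path


def label_name_for(image_name):
    """Return the label file name that pairs with an image file name."""
    return Path(image_name).with_suffix(".txt").name


tricky = "jpg_probe_001.jpg"
print(tricky.replace("jpg", "txt"))  # txt_probe_001.txt (wrong stem)
print(label_name_for(tricky))        # jpg_probe_001.txt
```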

In [10]:
shutil.rmtree("datasets/yolo_temp")

In [11]:
print(len(os.listdir(f"{images_train_folder}")))
print(len(os.listdir(f"{labels_train_folder}")))
print(len(os.listdir(f"{images_val_folder}")))
print(len(os.listdir(f"{labels_val_folder}")))

246
246
62
62
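
Matching counts show that nothing was dropped, but they do not prove that every image has a label with the same name. A stricter check, sketched here with a hypothetical helper `unpaired_stems`, compares the file stems in each pair of folders. It assumes `.jpg` images and `.txt` labels, as in this notebook.

```python
from pathlib import Path


def unpaired_stems(images_dir, labels_dir):
    """Return (images without a label, labels without an image) as sorted stems."""
    image_stems = {path.stem for path in Path(images_dir).glob("*.jpg")}
    label_stems = {path.stem for path in Path(labels_dir).glob("*.txt")}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)


for split in ("train", "val"):
    images_dir = Path("datasets/yolo/images") / split
    labels_dir = Path("datasets/yolo/labels") / split
    if images_dir.is_dir() and labels_dir.is_dir():
        # Two empty lists mean every image has a matching label file
        print(split, unpaired_stems(images_dir, labels_dir))
```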


# Step 3: Create `data.yaml` File

`data.yaml` is the dataset configuration file that YOLO requires for training. It specifies the dataset paths and the class names.

### Content of `data.yaml`:
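- **`path:`** The root directory for the dataset. A relative value such as `yolo` is resolved by Ultralytics against its configured datasets directory, so that setting is assumed to point at this project's `datasets` folder.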
- **`train:`** Path to the training images folder, relative to the root directory.
- **`val:`** Path to the validation images folder, relative to the root directory.
- **`names:`** A mapping of class indices to their names (e.g., `0: probe`).

The file is created in `datasets/yolo/data.yaml`.
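
Outside Jupyter, where the `%%writefile` magic is unavailable, the same content can be produced from Python. The following is a minimal sketch using plain string formatting; `build_data_yaml` is a hypothetical helper, and it does not quote or escape special YAML characters, so it only suits simple class names like the one used here.

```python
def build_data_yaml(root, train, val, names):
    """Return the text of a minimal YOLO dataset config."""
    lines = [f"path: {root}", f"train: {train}", f"val: {val}", "names:"]
    for index in sorted(names):
        lines.append(f"  {index}: {names[index]}")
    return "\n".join(lines) + "\n"


yaml_text = build_data_yaml("yolo", "images/train", "images/val", {0: "probe"})
print(yaml_text)

# To write the file (left commented out so the sketch has no side effects):
# from pathlib import Path
# Path("datasets/yolo/data.yaml").write_text(yaml_text, encoding="utf-8")
```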


In [12]:
%%writefile datasets/yolo/data.yaml

path: yolo # dataset root directory
train: images/train # training images (relative to 'path')
val: images/val # validation images (relative to 'path')
names:
    0: probe

Overwriting datasets/yolo/data.yaml


# Step 4: Train YOLO 11 Nano

### Key Actions:
1. A pretrained YOLO 11 Nano model (`yolo11n.pt`) is loaded.
2. Training is initiated using the following configurations:
   - Dataset: `datasets/yolo/data.yaml`
   - Number of epochs: 20
   - Image size: 640
   - Device: GPU (`cuda`)
   - Experiment name: `probe_train`
   - Random seed: 42
   - Training plots: enabled

### Output:
After training, the model checkpoints and results are saved in the directory `runs/detect/probe_train`. This includes metrics, plots, and the final trained model ready for inference.

In [None]:
from ultralytics import YOLO

model = YOLO("yolo11n.pt")  # Load a pretrained model
results = model.train(
    data="datasets/yolo/data.yaml", 
    epochs=20, 
    imgsz=640, 
    device="cuda",
    name="probe_train",
    seed=42,
    plots=True
)
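
Once training finishes, the per-epoch metrics can be inspected without reopening the plots. The sketch below assumes that the run directory contains a `results.csv` file and that it has a column named `metrics/mAP50-95(B)`; both are assumptions about the installed Ultralytics version, so check the file header first. `best_epoch` is a hypothetical helper.

```python
import csv
from pathlib import Path


def best_epoch(csv_path, metric="metrics/mAP50-95(B)"):
    """Return (epoch, value) for the row with the highest value of `metric`."""
    with open(csv_path, newline="", encoding="utf-8") as file:
        # Some versions pad headers and values with spaces, so strip both
        rows = [
            {key.strip(): value.strip() for key, value in row.items()}
            for row in csv.DictReader(file)
        ]
    best = max(rows, key=lambda row: float(row[metric]))
    return int(float(best["epoch"])), float(best[metric])


results_csv = Path("runs/detect/probe_train/results.csv")
if results_csv.exists():
    print(best_epoch(results_csv))
```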

# Step 5: Validation

This step validates the trained model on the validation dataset to evaluate its performance.

### Key Actions:
1. Load the best-performing weights from training (`runs/detect/probe_train/weights/best.pt`).
2. Perform validation using the validation dataset defined in `data.yaml`.
3. Configure validation parameters:
   - Image size: 640
   - Device: GPU (`cuda`)
   - Experiment name: `probe_val`
   - Random seed: 42

### Output:
Validation results, including performance metrics, are saved in the directory `runs/detect/probe_val`.


In [None]:
from ultralytics import YOLO

model = YOLO("runs/detect/probe_train/weights/best.pt")

validation_results = model.val(
    data="datasets/yolo/data.yaml", 
    imgsz=640,
    device="cuda",
    name="probe_val",
    seed=42
)
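
The detection metrics reported by validation, such as mAP50 and mAP50-95, rest on the intersection over union (IoU) between predicted and ground-truth boxes: mAP50 counts a prediction as a match at IoU of at least 0.5, and mAP50-95 averages over thresholds from 0.50 to 0.95. As a rough illustration of that building block, here is a plain-Python IoU for two boxes in `(x1, y1, x2, y2)` form. It is a sketch for intuition, not the code Ultralytics runs.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


prediction = (0, 0, 2, 2)
ground_truth = (1, 1, 3, 3)
iou = box_iou(prediction, ground_truth)
print(iou >= 0.5)  # False: overlap of 1 over a union of 7 is below the threshold
```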

# Step 6: Inference

This step runs the trained model on a folder of images. The validation images are reused here as an example source; a folder of new images or videos can be substituted.

### Key Actions:
1. Load the trained model weights (`runs/detect/probe_train/weights/best.pt`).
2. Define the directory containing images and videos for inference (`datasets/yolo/images/val`).
3. Run inference on that source.
4. Create the output directory `runs/detect/probe_inference` and save each annotated result there.

### Output:
Each result is saved as an image annotated with the detected objects in the directory `runs/detect/probe_inference`.


In [None]:
from ultralytics import YOLO
import os

model = YOLO("runs/detect/probe_train/weights/best.pt")

# Define path to directory containing images and videos for inference
source = "datasets/yolo/images/val"

# Run inference on the source
results = model(
    source,
    # stream=True # generator of Results objects
)

os.makedirs("runs/detect/probe_inference", exist_ok=True)

for index, result in enumerate(results):
    boxes = result.boxes  # Boxes object for bounding box outputs
    result.save(filename=f"runs/detect/probe_inference/result_{index}.jpg")  # save to disk
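
Index-based names such as `result_0.jpg` make it hard to trace an output back to its input. One option, sketched below, is to name each output after its source image. `output_path_for` is a hypothetical helper, `probe_017.jpg` is a made-up file name, and the commented usage assumes each result exposes the source image path as `result.path`.

```python
from pathlib import Path


def output_path_for(source_path, output_dir="runs/detect/probe_inference"):
    """Return an output path that keeps the stem of the source image."""
    return Path(output_dir) / f"{Path(source_path).stem}_pred.jpg"


print(output_path_for("datasets/yolo/images/val/probe_017.jpg"))

# Possible use inside the loop above:
# result.save(filename=str(output_path_for(result.path)))
```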