# Project Notes (Sanitized for Git)

This repository contains a **sanitized** version of the Gracity Insects YOLOv8 Classification notebooks.
All tenant-specific identifiers (bucket names, namespaces, OCIDs, local absolute paths) have been replaced by placeholders.

**Author:** Cristina Varas Menadas  
**Last updated:** 2026-02-19

> To run these notebooks, set the configuration values in the first "Configuration" section of each notebook.


# Gracity Insects - 03. Train YOLOv8n Classification Model

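This notebook trains `yolov8n-cls`, the nano variant of the YOLOv8 classification models, starting from the pretrained `yolov8n-cls.pt` checkpoint.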


## Configuration

Update these variables for your tenancy/project.

- **Bucket**: `<BUCKET_NAME>`
- **Dataset prefix** (images): `<PROJECT_PREFIX>/v1/raw/datasets/insects_kaggle_v1/`
- **Labels prefix** (metadata/manifests): `<PROJECT_PREFIX>/v1/labels/insects_kaggle_v1/`
- **Runs prefix** (artifacts): `<PROJECT_PREFIX>/yolo/runs/insects_kaggle_v1/`

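We intentionally keep **`test/` as the validation split** for this starter project (to match your current bucket structure). There is therefore no separate held-out test set, so read the accuracy reported during training as validation accuracy.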
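As a rough illustration of how these placeholders fit together, the sketch below derives the three prefixes from a single project prefix and dataset name. `StorageConfig` and its field names are hypothetical helpers, not part of the notebooks; the placeholder values are kept as-is and must be replaced with your own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageConfig:
    """Hypothetical container for the Object Storage settings listed above."""

    bucket: str = "<BUCKET_NAME>"            # placeholder, replace with your bucket
    project_prefix: str = "<PROJECT_PREFIX>"  # placeholder, replace with your prefix
    dataset_name: str = "insects_kaggle_v1"

    @property
    def dataset_prefix(self) -> str:
        """Prefix under which the raw images are stored."""
        return f"{self.project_prefix}/v1/raw/datasets/{self.dataset_name}/"

    @property
    def labels_prefix(self) -> str:
        """Prefix for metadata and manifests."""
        return f"{self.project_prefix}/v1/labels/{self.dataset_name}/"

    @property
    def runs_prefix(self) -> str:
        """Prefix for training artifacts."""
        return f"{self.project_prefix}/yolo/runs/{self.dataset_name}/"


cfg = StorageConfig()
print(cfg.dataset_prefix)
print(cfg.labels_prefix)
print(cfg.runs_prefix)
```

Deriving the prefixes from one place keeps the three paths consistent when the project prefix or dataset name changes.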

## 3.1 Imports

In [None]:
from __future__ import annotations

import time
from pathlib import Path

from ultralytics import YOLO

## 3.2 Dataset paths (local)

In [None]:
DATASET_ROOT: str = "<LOCAL_PATH> Gracity/gracity-insects-yolo-cls/outputs/dataset"  # <-- update
dataset_root = Path(DATASET_ROOT)
train_dir = dataset_root / "train"
test_dir = dataset_root / "test"  # used as validation
assert train_dir.exists(), train_dir
assert test_dir.exists(), test_dir
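Checking that the folders exist does not confirm that they contain images. A minimal sketch of a sanity check follows, assuming the usual Ultralytics classification layout of one subfolder per class inside each split. `count_images_per_class` and the `IMAGE_SUFFIXES` set are illustrative names and choices, not part of the notebook.

```python
from __future__ import annotations

from pathlib import Path

# Assumed set of image extensions; extend it if your dataset uses others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def count_images_per_class(split_dir: Path) -> dict[str, int]:
    """Return {class_name: image_count} for every class subfolder of split_dir."""
    counts: dict[str, int] = {}
    for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

Calling `count_images_per_class(train_dir)` and `count_images_per_class(test_dir)` before training makes empty or missing classes visible early, and comparing the two sets of keys shows whether both splits cover the same classes.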

## 3.3 Training config

In [None]:
MODEL_NAME: str = "yolov8n-cls.pt"
IMGSZ: int = 224
EPOCHS: int = 30
BATCH: int = 64
DEVICE: str = "0"  # set 'cpu' if no GPU
PROJECT_DIR: str = "./runs"
RUN_NAME: str = f"insects_kaggle_cls_{int(time.time())}"
SEED: int = 42
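Instead of editing `DEVICE` by hand, the value could be chosen at runtime. The sketch below is one possible approach: `pick_device` is a hypothetical helper that falls back to `"cpu"` when PyTorch is missing or reports no CUDA device.

```python
def pick_device(preferred: str = "0") -> str:
    """Return `preferred` when CUDA is usable, otherwise "cpu"."""
    try:
        import torch  # installed alongside ultralytics in a normal setup
    except ImportError:
        return "cpu"
    return preferred if torch.cuda.is_available() else "cpu"


print("Selected device:", pick_device())
```

With this helper, `DEVICE = pick_device("0")` would replace the hard-coded string, at the cost of hiding a misconfigured GPU behind a silent CPU fallback.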

## 3.4 Train

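Ultralytics classification datasets are laid out as `train/` and `val/` folders, each with one subfolder per class. Because our validation images live in `test/`, we create a local `val/` symlink that points to `test/`.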

In [None]:
val_alias = dataset_root / "val"
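if not val_alias.exists():
    # Link to the absolute path so the alias stays valid when DATASET_ROOT is relative.
    val_alias.symlink_to(test_dir.resolve(), target_is_directory=True)
    print("Created symlink:", val_alias, "->", test_dir.resolve())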

model = YOLO(MODEL_NAME)

results = model.train(
    data=str(dataset_root),
    imgsz=IMGSZ,
    epochs=EPOCHS,
    batch=BATCH,
    device=DEVICE,
    project=PROJECT_DIR,
    name=RUN_NAME,
    seed=SEED,
)
results

## 3.5 Artifacts

In [None]:
run_dir = Path(PROJECT_DIR) / RUN_NAME
print("Run dir:", run_dir)
print("Weights:", list((run_dir / "weights").glob("*")))
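To connect the local run directory to the **Runs prefix** from the Configuration section, the sketch below maps each local artifact to an object name. The layout `<runs prefix>/<run name>/<relative path>` is an assumption, and `plan_artifact_uploads` is a hypothetical helper; it only builds the list and does not upload anything.

```python
from __future__ import annotations

from pathlib import Path


def plan_artifact_uploads(run_dir: Path, runs_prefix: str) -> list[tuple[Path, str]]:
    """Pair every file under run_dir with an object name under runs_prefix."""
    prefix = runs_prefix.rstrip("/")
    plan: list[tuple[Path, str]] = []
    for path in sorted(run_dir.rglob("*")):
        if path.is_file():
            # Keep the run's internal folder structure in the object name.
            relative = path.relative_to(run_dir).as_posix()
            plan.append((path, f"{prefix}/{run_dir.name}/{relative}"))
    return plan
```

For example, `plan_artifact_uploads(run_dir, "<PROJECT_PREFIX>/yolo/runs/insects_kaggle_v1/")` would pair `weights/best.pt` with an object name ending in `<RUN_NAME>/weights/best.pt`, which can then be passed to whichever upload client you use.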