# Evaluating Your Own Custom Dataset with ibbi.Evaluator
This notebook demonstrates how to evaluate your **own custom dataset** using the `ibbi.Evaluator`.

A common use case, as highlighted by our reviewer, is an ecologist who has their own set of annotated images. This tutorial shows you what data format the `ibbi.Evaluator` expects and how to run an evaluation on it.

### The Required Data Structure

The `ibbi.Evaluator` expects an **iterable (like a list) of dictionaries**. Each dictionary in the list represents one image and its annotations.

The dataset as a whole **must** conform to the following structure:

```python
[
    {
        "image": <PIL.Image.Image object>,
        "objects": {
            "category": ["species_label_1", "species_label_2"],
            "bbox": [
                [x1, y1, w1, h1],  # BBox 1 in [x_min, y_min, width, height] format
                [x2, y2, w2, h2]   # BBox 2 in [x_min, y_min, width, height] format
            ]
        }
    },
    {
        "image": <PIL.Image.Image object_2>,
        "objects": { ... } # Annotations for the second image
    }
    # ... and so on for all images in your dataset
]
```

**Key Points:**

  * `"image"`: Must be a `PIL.Image.Image` object. You can load this from a file path using `Image.open(your_path)`.
  * `"objects"`: Must be a dictionary.
  * `"category"`: A `list` of strings, where each string is the species name for an object.
  * `"bbox"`: A `list` of lists. Each inner list must be in `[x_min, y_min, width, height]` format.
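
Annotation tools often export boxes in other conventions, for example corner coordinates `[x_min, y_min, x_max, y_max]` (Pascal VOC style) or normalized centers `[x_center, y_center, width, height]` (YOLO style). The following is a minimal sketch of converting both into the `[x_min, y_min, width, height]` format described above. The helper names `corners_to_xywh` and `yolo_to_xywh` are hypothetical and not part of `ibbi`.

```python
def corners_to_xywh(box):
    """Convert [x_min, y_min, x_max, y_max] to [x_min, y_min, width, height]."""
    x_min, y_min, x_max, y_max = box
    return [x_min, y_min, x_max - x_min, y_max - y_min]


def yolo_to_xywh(box, image_width, image_height):
    """Convert normalized [x_center, y_center, width, height] to absolute pixel xywh."""
    x_center, y_center, width, height = box
    abs_w = width * image_width
    abs_h = height * image_height
    # Shift from the box center to its top-left corner
    return [x_center * image_width - abs_w / 2, y_center * image_height - abs_h / 2, abs_w, abs_h]


corner_result = corners_to_xywh([100, 150, 150, 230])
yolo_result = yolo_to_xywh([0.5, 0.5, 0.25, 0.5], 640, 480)
print(corner_result)  # [100, 150, 50, 80]
print(yolo_result)    # [240.0, 120.0, 160.0, 240.0]
```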

How you load your data (from CSV, JSON, or XML files, for example) is up to you, but it **must** be formatted into this list of dictionaries before being passed to the `Evaluator`.
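
As one possible illustration, the sketch below groups a CSV with one row per bounding box into the required list of dictionaries. The column names (`image_path`, `species`, `x_min`, `y_min`, `width`, `height`) and the function `rows_to_dataset` are assumptions made for this example, not an `ibbi` convention. The `image_loader` parameter is also hypothetical: it defaults to `PIL.Image.open`, and the demo swaps in a stand-in so the block runs without image files on disk.

```python
import csv
import io
from collections import defaultdict


def rows_to_dataset(csv_file, image_loader=None):
    """Group one-row-per-box CSV annotations into the list-of-dictionaries format.

    Assumed (hypothetical) columns: image_path, species, x_min, y_min, width, height.
    """
    if image_loader is None:
        # Imported lazily so the grouping logic can be exercised without Pillow
        from PIL import Image

        image_loader = Image.open

    grouped = defaultdict(lambda: {"category": [], "bbox": []})
    for row in csv.DictReader(csv_file):
        objects = grouped[row["image_path"]]
        objects["category"].append(row["species"])
        objects["bbox"].append([float(row[key]) for key in ("x_min", "y_min", "width", "height")])

    # Dictionaries preserve insertion order, so images appear in first-seen order
    return [{"image": image_loader(path), "objects": objects} for path, objects in grouped.items()]


# Demo: an in-memory CSV and a stand-in loader that simply returns the path
demo_csv = io.StringIO(
    "image_path,species,x_min,y_min,width,height\n"
    "a.jpg,Ips_acuminatus,100,150,50,80\n"
    "b.jpg,Hylurgus_ligniperda,200,250,40,60\n"
    "b.jpg,Tomicus_destruens,400,100,55,55\n"
)
demo_dataset = rows_to_dataset(demo_csv, image_loader=lambda path: path)
print(len(demo_dataset), demo_dataset[1]["objects"]["category"])
```

With real files, you would open the CSV with `open(your_csv_path, newline="")` and call `rows_to_dataset(f)` without the `image_loader` argument, so that each `"image"` value is a `PIL.Image.Image` object.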

-----

### 1. Create a Mock Custom Dataset

To demonstrate, we will create an artificial dataset from scratch that matches this structure. We'll generate a few `PIL.Image` objects, draw some "beetles" (rectangles) on them, and store them in our required list format.



In [None]:
import numpy as np
from PIL import Image, ImageDraw

import ibbi

# --- 1. Create an artificial dataset ---
print("Creating a mock custom dataset in the required format...")

# This will be our final dataset, a list of dictionaries
my_custom_dataset = []

# Real species names used as labels below (a reference list; the variable itself is not reused)
species_list = ["Ips_acuminatus", "Hylurgus_ligniperda", "Tomicus_destruens", "Dryocoetes_autographus"]

# --- Image 1 (One Beetle) ---
img1 = Image.new("RGB", (640, 480), color="#334433")  # A dark green image
draw1 = ImageDraw.Draw(img1)
label1 = "Ips_acuminatus"
bbox1 = [100, 150, 50, 80]  # [x, y, w, h]
# Draw the rectangle: [x1, y1, x2, y2]
draw1.rectangle([bbox1[0], bbox1[1], bbox1[0] + bbox1[2], bbox1[1] + bbox1[3]], outline="yellow", width=3)
draw1.text((bbox1[0], bbox1[1] - 15), label1, fill="yellow")

# Add to our dataset
my_custom_dataset.append({"image": img1, "objects": {"category": [label1], "bbox": [bbox1]}})

# --- Image 2 (Two Beetles) ---
img2 = Image.new("RGB", (640, 480), color="#554433")  # A dark brown image
draw2 = ImageDraw.Draw(img2)
# Object 1
label2a = "Hylurgus_ligniperda"
bbox2a = [200, 250, 40, 60]
draw2.rectangle([bbox2a[0], bbox2a[1], bbox2a[0] + bbox2a[2], bbox2a[1] + bbox2a[3]], outline="cyan", width=3)
draw2.text((bbox2a[0], bbox2a[1] - 15), label2a, fill="cyan")
# Object 2
label2b = "Tomicus_destruens"
bbox2b = [400, 100, 55, 55]
draw2.rectangle([bbox2b[0], bbox2b[1], bbox2b[0] + bbox2b[2], bbox2b[1] + bbox2b[3]], outline="magenta", width=3)
draw2.text((bbox2b[0], bbox2b[1] - 15), label2b, fill="magenta")

# Add to our dataset
my_custom_dataset.append({"image": img2, "objects": {"category": [label2a, label2b], "bbox": [bbox2a, bbox2b]}})

# --- Image 3 (One Beetle, different species) ---
img3 = Image.new("RGB", (640, 480), color="#444444")  # A gray image
draw3 = ImageDraw.Draw(img3)
label3 = "Dryocoetes_autographus"
bbox3 = [300, 200, 70, 40]
draw3.rectangle([bbox3[0], bbox3[1], bbox3[0] + bbox3[2], bbox3[1] + bbox3[3]], outline="red", width=3)
draw3.text((bbox3[0], bbox3[1] - 15), label3, fill="red")

# Add to our dataset
my_custom_dataset.append({"image": img3, "objects": {"category": [label3], "bbox": [bbox3]}})

print(f"Mock dataset created with {len(my_custom_dataset)} images.")
print("\nShowing the first mock image:")
my_custom_dataset[0]["image"].show()  # This will open the image in a new window

print("\nVerifying data structure for the first image:")
print(my_custom_dataset[0])
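
Before handing a dataset to the evaluator, it can help to run a few sanity checks on each entry. The sketch below is one way to do that. The function `check_record` is hypothetical, and the checks (matching label and box counts, positive sizes, boxes inside the image) are assumptions about what a well-formed entry looks like, not rules that `ibbi` is known to enforce. The demo uses a stand-in object with a `size` attribute in place of a real `PIL.Image.Image`.

```python
from types import SimpleNamespace


def check_record(record):
    """Return a list of problems found in one dataset entry (empty if none)."""
    objects = record.get("objects")
    if not isinstance(objects, dict):
        return ["'objects' must be a dictionary"]

    problems = []
    categories = objects.get("category", [])
    boxes = objects.get("bbox", [])
    if len(categories) != len(boxes):
        problems.append(f"{len(categories)} labels but {len(boxes)} boxes")

    img_w, img_h = record["image"].size  # PIL images expose (width, height)
    for i, box in enumerate(boxes):
        if len(box) != 4:
            problems.append(f"box {i} does not have 4 values")
            continue
        x, y, w, h = box
        if w <= 0 or h <= 0:
            problems.append(f"box {i} has non-positive width or height")
        elif x < 0 or y < 0 or x + w > img_w or y + h > img_h:
            problems.append(f"box {i} extends outside the image")
    return problems


# Stand-in for a 640x480 PIL image so the demo needs no image data
fake_image = SimpleNamespace(size=(640, 480))
good = {"image": fake_image, "objects": {"category": ["Ips_acuminatus"], "bbox": [[100, 150, 50, 80]]}}
bad = {"image": fake_image, "objects": {"category": ["Ips_acuminatus"], "bbox": [[600, 150, 50, 80]]}}
print(check_record(good))  # []
print(check_record(bad))   # ['box 0 extends outside the image']
```

In the notebook itself, the same function could be applied to every entry of `my_custom_dataset`, since real PIL images also provide `.size`.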

### 2. Create a Model and Evaluator

This step is the same as before. We load a pre-trained `ibbi` model and initialize the `Evaluator` with it.

In [None]:
# Let's evaluate the multi-class YOLOv8 model.
model = ibbi.create_model("yolov8x_bb_multi_class_detect_model", pretrained=True)
evaluator = ibbi.Evaluator(model)

### 3. Unified Object Classification Performance (on Custom Data)

Now, we pass our `my_custom_dataset` list directly to the evaluator's methods.

In [None]:
print("\n--- Evaluating Object Classification Performance on Custom Dataset ---")
# We can test across multiple IoU thresholds for the mAP calculation.
iou_thresholds = np.arange(0.5, 1.0, 0.05)

# Pass our custom list directly to the evaluator
performance = evaluator.object_classification(my_custom_dataset, iou_thresholds=iou_thresholds)

# --- Display Key Metrics ---
print("\n--- Key Object-Classification Performance Metrics (Custom Dataset) ---")
# As its name suggests, 'per_class_AP_at_last_iou' should hold the per-class AP
# at the last IoU threshold in the list (0.95 here) for our mock species
print(f"Overall mAP: {performance['mAP']:.4f}")
print("Per-class AP scores:")
print(performance["per_class_AP_at_last_iou"])
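
To make the role of the IoU thresholds concrete, here is a minimal sketch of the standard intersection-over-union calculation for two boxes in `[x_min, y_min, width, height]` format. It illustrates the general definition only and is not taken from `ibbi`'s internals; the function name `iou_xywh` and the example boxes are made up for this demonstration.

```python
def iou_xywh(box_a, box_b):
    """Intersection over union of two [x_min, y_min, width, height] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap along each axis, clamped at zero when the boxes do not intersect
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


ground_truth = [100, 150, 50, 80]
prediction = [110, 160, 50, 80]  # same size, shifted by 10 px in x and y
score = iou_xywh(ground_truth, prediction)

# Thresholds 0.50, 0.55, ..., 0.95, built without floating-point accumulation
thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]
matched = [t for t in thresholds if score >= t]
print(f"IoU = {score:.3f}")  # IoU = 0.538
print(matched)               # [0.5]
```

A 10-pixel shift is enough to make this prediction count as a match only at the loosest threshold, which is why averaging over stricter thresholds lowers the reported score for imprecise localization.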

### 4. Embedding Quality (on Custom Data)

We can also evaluate the embedding quality using our custom dataset.

In [None]:
print("\n--- Evaluating Embedding Quality on Custom Dataset ---")
# This will perform UMAP, HDBSCAN, and then calculate clustering metrics.
# We'll use the 'object' level to get embeddings for each bounding box.
embedding_performance = evaluator.embeddings(my_custom_dataset, evaluation_level="object", min_cluster_size=2)

# --- Display Key Embedding Metrics ---
print("\n--- Key Embedding Performance Metrics (Custom Dataset) ---")
print("\nInternal Validation (how good are the clusters?):")
print(embedding_performance["internal_cluster_validation"])
print("\nExternal Validation (do clusters match true labels?):")
print(embedding_performance["external_cluster_validation"])
print("\nMantel Correlation (do embedding distances match phylogeny?):")
# Note: The Mantel test requires at least 3 unique species with valid data.
# Our mock dataset has 4 species, so it should run.
print(embedding_performance.get("mantel_correlation", "Not calculated (requires >= 3 species)"))
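
For readers unfamiliar with the Mantel test, the sketch below shows the general idea: correlate the off-diagonal entries of two distance matrices, then estimate a p-value by repeatedly shuffling the row and column order of one matrix. This is a generic, pure-Python illustration and not `ibbi`'s implementation. The function names and both toy matrices are invented for the example, and the assumption that the two matrices are embedding distances and phylogenetic distances between species is ours.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def upper_triangle(matrix, order):
    """Flatten the strict upper triangle after reordering rows and columns."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_test(dist_a, dist_b, permutations=999, seed=0):
    """Return (correlation, permutation p-value) for two square distance matrices."""
    identity = list(range(len(dist_a)))
    flat_a = upper_triangle(dist_a, identity)
    observed = pearson(flat_a, upper_triangle(dist_b, identity))
    rng = random.Random(seed)
    extreme = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute species labels of the second matrix
        if abs(pearson(flat_a, upper_triangle(dist_b, order))) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (permutations + 1)


# Toy 4-species matrices (made-up values, symmetric with zero diagonal)
embedding_dist = [[0, 1, 4, 5], [1, 0, 3, 4], [4, 3, 0, 2], [5, 4, 2, 0]]
phylo_dist = [[0, 2, 6, 8], [2, 0, 5, 7], [6, 5, 0, 3], [8, 7, 3, 0]]
r, p = mantel_test(embedding_dist, phylo_dist)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

With only four species there are just 24 possible orderings, so the p-value cannot become very small. This is one reason results from a tiny mock dataset like the one above should be read as a smoke test, not as evidence about the model.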