# 📊 YOLOv11 Model Evaluation Pipeline

This notebook evaluates the performance of a trained YOLO model against a test dataset.

**What this notebook does:**
1.  Loads your trained weights (`.pt` file).
2.  Runs validation on the test set defined in `TestData.yaml`.
3.  Calculates key metrics: **Precision, Recall, F1 Score, and mAP**.
4.  Visualizes the **Confusion Matrix** to show detection errors.

## 1. Prerequisites & Dataset Structure

Ensure your Google Drive is mounted and your files are organized.

**1. Directory Structure:**
The evaluation script needs to find the `TestData.yaml` and the images it points to.
```text
/content/drive/MyDrive/datasets/
│
├── TestData.yaml          <-- Config file
├── best_xl.pt             <-- Your trained model (move it here or update path)
└── test/                  <-- (Or 'val') Images to test against
    ├── images/
    └── labels/
```

In [None]:
# @title 2. Install Dependencies & Mount Drive
%pip install ultralytics -q

import torch
import os
import glob
from ultralytics import YOLO
from google.colab import drive
from IPython.display import Image, display

# Mount Google Drive
drive.mount('/content/drive')

# Check GPU
print(f"✅ GPU Available: {torch.cuda.is_available()}")

In [None]:
# @title 3. Configuration
# --- USER INPUTS ---
# 1. Base Directory
WORKING_DIR = '/content/drive/MyDrive/datasets'

# 2. Model Name/Path
# If you just finished training, the path might be inside 'runs/detect/...'
# Otherwise, copy your 'best.pt' to the WORKING_DIR and define it here.
MODEL_WEIGHTS = 'best_xl.pt'

# 3. Dataset Config
DATA_YAML = 'TestData.yaml'

# --- SETUP ---
if os.path.exists(WORKING_DIR):
    os.chdir(WORKING_DIR)
    print(f"✅ Working Directory set to: {os.getcwd()}")

    if not os.path.exists(MODEL_WEIGHTS):
        print(f"⚠️ WARNING: Model file '{MODEL_WEIGHTS}' not found in current directory.")
        print("   Please upload your .pt file or update the MODEL_WEIGHTS path.")
    else:
        print(f"✅ Found model: {MODEL_WEIGHTS}")
else:
    print(f"❌ Error: Path not found: {WORKING_DIR}")
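
The setup cell above confirms only that the weights file exists. A minimal sketch of a fuller pre-flight check follows, assuming the layout shown in Section 1. `find_missing` is a hypothetical helper, and the entries passed to it are just the example names used in this notebook, so adjust them if your layout differs.

```python
import os

def find_missing(base_dir, required):
    """Return the entries in `required` that do not exist under `base_dir`."""
    return [rel for rel in required
            if not os.path.exists(os.path.join(base_dir, rel))]

# Example names taken from this notebook; change them to match your Drive.
missing = find_missing('/content/drive/MyDrive/datasets',
                       ['TestData.yaml', 'best_xl.pt', 'test/images', 'test/labels'])
if missing:
    print("Missing entries:", missing)
else:
    print("All expected files and folders are present.")
```

Running this before the evaluation cell should surface a misplaced YAML or label folder early, rather than partway through validation.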

## 4. Run Evaluation

We now call `model.val()`, which compares the model's predictions against the ground-truth labels in your dataset.

**Understanding the Metrics:**
* **Precision:** Of all the boxes the model predicted, what fraction are correct? (Low precision = many False Positives/Hallucinations.)
* **Recall:** Of all the actual objects, what fraction did the model find? (Low recall = many Missed Detections.)
* **mAP50:** Mean Average Precision at an IoU (overlap) threshold of 0.5, averaged over classes. The report also prints **mAP50-95**, which averages over IoU thresholds from 0.5 to 0.95 and is stricter about box placement.
* **F1 Score:** The harmonic mean of Precision and Recall, so it is high only when both are high.
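
To make these definitions concrete, here is a small worked sketch in plain Python. The helper names `iou` and `precision_recall_f1` and the toy numbers are made up for illustration; this is not how Ultralytics computes its metrics internally, and mAP itself (which averages precision over recall levels and classes) is not reproduced here.

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Two 10x10 boxes offset by 5 px: intersection 50, union 150, so IoU = 1/3.
# That is below 0.5, so this prediction would not count as a match at mAP50.
print(f"IoU: {iou((0, 0, 10, 10), (5, 0, 15, 10)):.3f}")

# Toy counts: 8 correct detections, 2 false positives, 4 missed objects.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(f"Precision={p:.3f}  Recall={r:.3f}  F1={f1:.3f}")
```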

In [None]:
# @title Run Validation & Calculate Metrics
if os.path.exists(MODEL_WEIGHTS):
    # 1. Load the model
    print(f"🔄 Loading model: {MODEL_WEIGHTS}...")
    model = YOLO(MODEL_WEIGHTS)

    # 2. Run Validation
    # split='test' evaluates the 'test' entry of the YAML; if your YAML defines only 'val', set split='val' instead
    print("🚀 Running validation (this may take a moment)...")
    metrics = model.val(data=DATA_YAML, split='test', verbose=True)

    # 3. Extract Metrics (Replicating logic from Eval.ipynb)
    # YOLOv11 stores results in the 'box' attribute
    map50 = metrics.box.map50
    map50_95 = metrics.box.map
    precision = metrics.box.mp  # Mean Precision
    recall = metrics.box.mr     # Mean Recall

    # Calculate F1 Score manually
    # Formula: 2 * (P * R) / (P + R)
    if (precision + recall) > 0:
        f1_score = 2 * (precision * recall) / (precision + recall)
    else:
        f1_score = 0.0

    # 4. Print Summary
    print("\n" + "="*40)
    print("🏆 FINAL EVALUATION REPORT")
    print("="*40)
    print(f"Precision:   {precision:.4f}")
    print(f"Recall:      {recall:.4f}")
    print(f"F1 Score:    {f1_score:.4f}")
    print(f"mAP @ 0.5:   {map50:.4f}")
    print(f"mAP @ 0.5:0.95: {map50_95:.4f}")
    print("="*40)

else:
    print("❌ Cannot run evaluation: Model file not found.")
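
The report above is built from precision and recall averaged over all classes. A sketch of a per-class breakdown follows. `per_class_f1` is a hypothetical helper that works on plain sequences; the toy values are for illustration only and are not real results.

```python
def per_class_f1(names, class_ids, precisions, recalls):
    """Map class name -> F1, given parallel per-class precision and recall values."""
    report = {}
    for cid, p, r in zip(class_ids, precisions, recalls):
        p, r = float(p), float(r)
        f1 = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
        report[names[int(cid)]] = f1
    return report

# Toy inputs (assumed class names and scores, not output of a real run)
toy = per_class_f1({0: 'elephant', 1: 'zebra'}, [0, 1], [0.9, 0.5], [0.6, 0.5])
for name, f1 in toy.items():
    print(f"{name:<10} F1 = {f1:.3f}")
```

With real results, the call would presumably be `per_class_f1(metrics.names, metrics.box.ap_class_index, metrics.box.p, metrics.box.r)`. This assumes those attributes hold the class names, evaluated class ids, and per-class precision and recall in your installed Ultralytics version, so verify them before relying on the output. Note that the mean of per-class F1 values generally differs from the F1 computed from mean precision and mean recall.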

## 5. Visualizations

YOLO automatically generates several plots to help analyze performance.

* **Confusion Matrix:** Shows where the model is confused (e.g., is it mistaking an Elephant for the background?).
* **PR_curve:** Precision-Recall Curve. Plots precision against recall as the confidence threshold varies.
* **F1_curve:** Shows how the F1 score changes with different confidence thresholds.

In [None]:
# @title Display Evaluation Plots
# Find the most recent validation folder created by YOLO
# usually runs/detect/val, val2, val3 etc.
output_dir = 'runs/detect'
if os.path.exists(output_dir):
    val_folders = sorted([f for f in os.listdir(output_dir) if 'val' in f],
                         key=lambda x: os.path.getmtime(os.path.join(output_dir, x)))

    if val_folders:
        latest_val = os.path.join(output_dir, val_folders[-1])
        print(f"📂 Displaying plots from: {latest_val}")

        # List of standard YOLO plot names
        plots = ['confusion_matrix.png', 'F1_curve.png', 'PR_curve.png', 'labels.jpg']

        for plot_name in plots:
            path = os.path.join(latest_val, plot_name)
            if os.path.exists(path):
                print(f"\n--- {plot_name} ---")
                display(Image(filename=path, width=600))
    else:
        print("⚠️ No validation folders found in runs/detect/")
else:
    print("⚠️ 'runs/detect' folder not found.")
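
The folder lookup above accepts any entry whose name contains `val`, including plain files. A slightly stricter sketch is shown here; `latest_run_dir` is a hypothetical helper, and the `val` prefix is assumed from the folder names mentioned in the cell above.

```python
import os

def latest_run_dir(root, prefix='val'):
    """Return the most recently modified sub-folder of `root` whose name
    starts with `prefix`, or None if there is no such folder."""
    if not os.path.isdir(root):
        return None
    candidates = [
        os.path.join(root, name)
        for name in os.listdir(root)
        if name.startswith(prefix) and os.path.isdir(os.path.join(root, name))
    ]
    return max(candidates, key=os.path.getmtime) if candidates else None

# Prints None when no matching folder exists yet.
print(latest_run_dir('runs/detect'))
```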

In [None]:
# @title 6. Test on a Single Random Image
# Run this to visually verify a detection on one image from your dataset

import random
import cv2  # OpenCV is installed as an Ultralytics dependency

# Get list of images
test_images_path = os.path.join(WORKING_DIR, 'test/images') # Update if your images are elsewhere
if not os.path.exists(test_images_path):
    test_images_path = os.path.join(WORKING_DIR, 'val/images')

if os.path.exists(test_images_path):
    images = glob.glob(os.path.join(test_images_path, '*.jpg')) + glob.glob(os.path.join(test_images_path, '*.png'))

    if images:
        # Pick random image
        random_img = random.choice(images)
        print(f"🔎 Detecting on: {os.path.basename(random_img)}")

        # Predict (uses `model` loaded in the evaluation cell above)
        results = model.predict(random_img, conf=0.5)

        # Show result
        for r in results:
            im_array = r.plot()  # annotated image as a BGR numpy array
            # cv2.imencode expects BGR input, so no color conversion is needed before encoding
            display(Image(data=cv2.imencode('.jpg', im_array)[1].tobytes(), width=600))
    else:
        print("No images found to test.")
else:
    print(f"Image folder not found: {test_images_path}")
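
The cell above collects only lowercase `.jpg` and `.png` files. A sketch of a more permissive lookup follows, assuming you also want `.jpeg`, `.bmp`, and uppercase extensions. `list_images` and its default extension set are illustrative choices, not something Ultralytics requires.

```python
import os

IMAGE_EXTS = {'.jpg', '.jpeg', '.png', '.bmp'}  # assumed set; extend as needed

def list_images(folder, exts=IMAGE_EXTS):
    """Return sorted paths of files in `folder` whose extension
    (compared case-insensitively) is in `exts`."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if os.path.splitext(name)[1].lower() in exts
        and os.path.isfile(os.path.join(folder, name))
    )
```

In the cell above, `images = list_images(test_images_path)` could stand in for the two `glob.glob` calls.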