# YOLO Object Detection Evaluation for Peatland Navigation

## Overview
This notebook evaluates the YOLO-based object detection model trained for peatland navigation. It measures how well the model detects and localizes key navigational elements in peatland environments, using a held-out test split.

## Evaluation Framework

### 1. Detection Performance Metrics
- **Mean Average Precision (mAP)**
  - Averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05
  - Per-class performance analysis
  - Precision-Recall characteristics
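
To make the IoU thresholds concrete, here is a minimal sketch of how the overlap between one predicted box and one ground-truth box could be computed and compared against the ten thresholds behind mAP50-95. The function name `iou_xyxy` and the two example boxes are illustrative assumptions, not part of this notebook or of the Ultralytics API.

```python
def iou_xyxy(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# The ten thresholds behind "mAP50-95": 0.50, 0.55, ..., 0.95
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

pred = (10, 10, 110, 110)    # hypothetical predicted box
truth = (20, 20, 120, 120)   # hypothetical ground-truth box

overlap = iou_xyxy(pred, truth)
passed = [t for t in IOU_THRESHOLDS if overlap >= t]
print(f"IoU = {overlap:.3f}; counts as a match at thresholds {passed}")
```

A prediction shifted by 10 pixels on a 100-pixel box already fails the stricter thresholds, which is why mAP50-95 is usually lower than mAP50.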

### 2. Testing Methodology
- Dedicated test set validation
- Confidence threshold analysis
- Non-Maximum Suppression assessment
- Real-world scenario testing
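
As a rough illustration of what a Non-Maximum Suppression assessment varies, the sketch below implements a simple greedy, class-agnostic NMS in plain Python and shows how the number of surviving boxes changes with the IoU threshold. It is a simplified stand-in written for this explanation, not the routine Ultralytics runs internally; all names and boxes are hypothetical.

```python
def iou_xyxy(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(detections, iou_threshold=0.5):
    """Keep the highest-scoring box, drop boxes overlapping it, repeat.

    detections: list of (box, score) tuples; classes are ignored in this sketch.
    """
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        # Discard every box whose overlap with the kept box reaches the threshold
        remaining = [d for d in remaining if iou_xyxy(best[0], d[0]) < iou_threshold]
    return kept


detections = [
    ((10, 10, 110, 110), 0.90),
    ((12, 12, 112, 112), 0.75),    # near-duplicate of the first box
    ((200, 200, 260, 260), 0.60),  # separate object
]

survivors = {thr: len(greedy_nms(detections, thr)) for thr in (0.3, 0.5, 0.95)}
print(survivors)
```

A very high threshold lets the near-duplicate through, which would show up as an extra false positive in the metrics; a very low one risks merging distinct neighbouring objects.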

### 3. Analysis Components
- Statistical performance metrics
- Visual result inspection
- Error pattern analysis
- Performance bottleneck identification

## Output Organization
- Structured metric logging
- Visualization generation
- Detailed performance reports
- Error analysis documentation

## Significance
This evaluation indicates how reliable the model is for autonomous navigation in peatland environments, where missed or poorly localized objects directly affect safe and efficient operation.

## 1. Imports

In [1]:
# Object detection framework
from ultralytics import YOLO

# File and path handling
from pathlib import Path

## 2. Evaluation Configuration

In [5]:
# Model identification
RUN_NAME = "finetuned"     # Name of the training run
PROJECT_DIR = "./training/metrics/detection"       # Base metrics directory

# Path configuration
MODEL_PATH = Path(PROJECT_DIR) / RUN_NAME / "weights/best.pt"     # Best model weights
DATA_YAML_PATH = Path("./data/processed/finetuning_v1/data.yaml")    # Dataset config

# Display configuration
print(f"Loading model from: {MODEL_PATH}")
print(f"Using dataset configuration: {DATA_YAML_PATH}")

Loading model from: training/metrics/detection/finetuned/weights/best.pt
Using dataset configuration: data/processed/finetuning_v1/data.yaml
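
Because both paths are relative to the working directory, a quick existence check before loading can catch a wrong launch directory early. The following is a minimal sketch using only `pathlib`; the helper name `missing_paths` is an assumption introduced here, and the paths are copied from the configuration cell.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the subset of the given paths that do not exist on disk."""
    return [Path(p) for p in paths if not Path(p).exists()]


# Paths copied from the configuration cell
required = [
    Path("./training/metrics/detection") / "finetuned" / "weights/best.pt",
    Path("./data/processed/finetuning_v1/data.yaml"),
]

missing = missing_paths(required)
if missing:
    print("Missing files:", ", ".join(str(p) for p in missing))
else:
    print("All required files found.")
```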


## 3. Model Loading

In [6]:
# Initialize YOLO model with trained weights
model = YOLO(MODEL_PATH)

print("Trained model loaded successfully.")

Trained model loaded successfully.


## 4. Model Evaluation Process

### Evaluation Protocol

1. **Test Set Validation**
   - Independent test set assessment
   - Unbiased performance measurement
   - Real-world scenario simulation
   - Batch processing for efficiency

2. **Performance Metrics**
   - **Mean Average Precision (mAP)**
     * IoU thresholds from 0.50 to 0.95 in steps of 0.05
     * Standard COCO evaluation protocol
     * Class-specific performance analysis
   
   - **Precision Metrics**
     * Per-class precision curves
     * Confidence threshold analysis
     * False positive analysis
   
   - **Recall Metrics**
     * Per-class recall curves
     * Miss rate analysis
     * Scale-based performance

3. **Results Documentation**
   - Automated metric logging
   - Performance visualization
   - Statistical analysis
   - Error pattern documentation
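
The precision, recall, and miss-rate quantities listed above all derive from counts of true positives, false positives, and false negatives at a fixed confidence threshold. A minimal sketch, with a hypothetical helper name and made-up counts:

```python
def detection_rates(tp, fp, fn):
    """Precision, recall, and miss rate from detection counts.

    Precision and recall fall back to 0.0 when their denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "miss_rate": 1.0 - recall,  # share of ground-truth objects not detected
    }


# Hypothetical counts: 6 correct detections, 2 spurious ones, 3 missed objects
rates = detection_rates(tp=6, fp=2, fn=3)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

For a navigation task, the miss rate is often the more safety-relevant number, since it counts obstacles the model never reported.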

### Generated Artifacts

1. **Metric Files**
   - Detailed CSV reports
   - Performance summaries
   - Class-wise statistics

2. **Visualizations**
   - Precision-Recall curves
   - Confusion matrices
   - Example detections
   - Error case analysis

3. **Analysis Tools**
   - Confidence analysis
   - Scale sensitivity study
   - Occlusion impact assessment

Together, these artifacts inform the decision on whether the model is reliable enough to deploy.
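
The validation log in this notebook does not break results down by object size, so a scale sensitivity study would need its own grouping step. One possible sketch buckets boxes by pixel area using the COCO size convention (boundaries at 32² and 96² pixels); the helper name and the example box sizes are assumptions for illustration only.

```python
from collections import Counter


def size_bucket(width, height):
    """Assign a box to a COCO-style size bucket based on its pixel area."""
    area = width * height
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


# Hypothetical ground-truth box sizes as (width, height) in pixels
boxes_wh = [(20, 25), (50, 60), (120, 150), (30, 30)]

counts = Counter(size_bucket(w, h) for w, h in boxes_wh)
print(dict(counts))
```

Computing recall separately for each bucket would then show whether small, distant objects are missed more often than large, nearby ones.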

In [7]:
# Execute model validation on test set
metrics = model.val(
    data=str(DATA_YAML_PATH),          # Dataset configuration
    split='test',                      # Use test set for evaluation
    name=f'{RUN_NAME}_test_evaluation' # Output directory name
)

# Display evaluation results
print("\n--- Evaluation Complete ---")

# Print key metrics
print(f"mAP50-95: {metrics.box.map}")    # Mean AP averaged over IoU thresholds 0.50-0.95
print(f"mAP50: {metrics.box.map50}")     # Mean AP at IoU=0.50

# Indicate results location
print(f"\nDetailed metrics and plots saved in 'metrics/detection/{RUN_NAME}_test_evaluation'")

Ultralytics 8.3.173 🚀 Python-3.11.8 torch-2.7.1 CPU (Apple M2 Pro)
YOLO11m summary (fused): 125 layers, 20,032,345 parameters, 0 gradients, 67.7 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 152.9±16.9 MB/s, size: 767.0 KB)
val: Scanning /Users/stahlma/Desktop/01_Studium/09_Vision_Project/peatland_navigation/data/processed/finetuning_v1/labels/test... 5 images, 0 backgrounds, 0 corrupt: 100%|██████████| 5/5 [00:00<00:00, 734.66it/s]
val: New cache created: /Users/stahlma/Desktop/01_Studium/09_Vision_Project/peatland_navigation/data/processed/finetuning_v1/labels/test.cache
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 1/1 [00:00<00:00,  1.47it/s]
                   all          5          9      0.821      0.592      0.796      0.567
                 bench          3          3      0.494      0.333      0.665       0.52
                  cone          2          2          1      0.943      0.995       0.73
                  sign          4          4      0.969        0.5      0.728      0.453
Speed: 0.7ms preprocess, 113.8ms inference, 0.0ms loss, 1.5ms postprocess per image
Results saved to runs/detect/finetuned_test_evaluation

--- Evaluation Complete ---
mAP50-95: 0.5673932811019154
mAP50: 0.7961111111111112

Detailed metrics and plots saved in 'metrics/detection/finetuned_test_evaluation'

### Results Analysis and Interpretation

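The validation run reports precision, recall, mAP50, and mAP50-95 for each class and for all classes combined. On this test split (5 images, 9 instances), the model reaches an mAP50-95 of 0.567 and an mAP50 of 0.796. With only 2 to 4 instances per class, these figures carry high variance and should be read as indicative rather than conclusive when assessing deployment readiness.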
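
A quick arithmetic check on the validation log shows how the `all` row relates to the per-class rows: it matches the unweighted mean over the three classes, to within rounding of the printed values. The numbers below are copied from the log; the variable names are introduced here for illustration.

```python
# Per-class values copied from the validation log
per_class = {
    "bench": {"P": 0.494, "R": 0.333, "mAP50": 0.665, "mAP50-95": 0.52},
    "cone":  {"P": 1.0,   "R": 0.943, "mAP50": 0.995, "mAP50-95": 0.73},
    "sign":  {"P": 0.969, "R": 0.5,   "mAP50": 0.728, "mAP50-95": 0.453},
}

# The "all" row from the same log
reported_all = {"P": 0.821, "R": 0.592, "mAP50": 0.796, "mAP50-95": 0.567}

class_means = {}
for name, reported in reported_all.items():
    mean = sum(row[name] for row in per_class.values()) / len(per_class)
    class_means[name] = mean
    print(f"{name}: mean of classes = {mean:.3f}, reported = {reported}")
```

Since every class counts equally in this average, the two `cone` instances influence the overall score as much as the four `sign` instances.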

#### Key Performance Indicators

1. **mAP50-95 (Primary Metric)**
   - **Definition**: Mean Average Precision averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05
   - **Significance**: 
     * Comprehensive measure of detection quality
     * Accounts for localization accuracy
     * Standard COCO evaluation metric
   - **Interpretation**:
     * Higher values indicate better overall performance
     * Averages over both lenient (IoU 0.50) and strict (IoU 0.95) overlap requirements
     * More stringent than single-threshold metrics

2. **mAP50 (Secondary Metric)**
   - **Definition**: Average Precision at an IoU threshold of 0.50, averaged over classes
   - **Significance**:
     * Standard benchmark in object detection
     * More lenient than mAP50-95
     * Useful for applications with looser localization requirements
   - **Interpretation**:
     * Higher values indicate good general detection
     * Does not penalize localization errors once a box reaches IoU 0.50
     * Suitable for initial model assessment
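
The relationship between the two metrics can be sketched with a toy single-class example. The code below computes AP as the area under the precision envelope of a ranked detection list, then averages it over the ten IoU thresholds. It is a simplified illustration under stated assumptions: each detection keeps a fixed match to one ground-truth box, the IoU values are made up, and the interpolation scheme is not necessarily the one Ultralytics uses.

```python
def average_precision(is_true_positive, num_ground_truth):
    """AP as area under the precision envelope.

    is_true_positive: flags for detections sorted by descending confidence.
    """
    if num_ground_truth == 0:
        return 0.0
    tp = fp = 0
    recalls, precisions = [], []
    for hit in is_true_positive:
        if hit:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Envelope: each precision becomes the maximum precision at equal or higher recall
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Hypothetical IoU of each detection with its matched ground truth,
# already sorted by descending confidence; 4 ground-truth objects in total
matched_ious = [0.92, 0.81, 0.30, 0.62]
num_ground_truth = 4

thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]
ap_per_threshold = {
    thr: average_precision([iou >= thr for iou in matched_ious], num_ground_truth)
    for thr in thresholds
}

ap50 = ap_per_threshold[0.5]
map50_95 = sum(ap_per_threshold.values()) / len(ap_per_threshold)
print(f"AP50 = {ap50:.4f}, AP50-95 = {map50_95:.4f}")
```

In this toy case the same detections score noticeably lower once the stricter thresholds are averaged in, mirroring the gap between mAP50 and mAP50-95 in the validation log.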

#### Additional Analysis Components

1. **Precision-Recall Curves**
   - Show detection performance across confidence thresholds
   - Identify optimal confidence settings
   - Reveal trade-offs between precision and recall

2. **Confusion Matrices**
   - Highlight class-specific performance
   - Identify common misclassifications
   - Guide potential model improvements

3. **Example Detections**
   - Visual validation of model behavior
   - Error case documentation
   - Performance verification
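
One common way to pick a confidence setting from the precision-recall trade-off is to sweep candidate thresholds and keep the one with the highest F1 score. The sketch below does this for a handful of made-up detections; the helper name, the data, and the simplification that true-positive flags do not change with the threshold are all assumptions for illustration.

```python
def best_f1_threshold(scored_hits, num_ground_truth, thresholds):
    """Return (best_f1, threshold) over the candidate confidence thresholds.

    scored_hits: list of (confidence, is_true_positive) tuples.
    """
    best = (0.0, None)
    for thr in thresholds:
        kept = [hit for conf, hit in scored_hits if conf >= thr]
        tp = sum(kept)
        fp = len(kept) - tp
        fn = num_ground_truth - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best[0]:
            best = (f1, thr)
    return best


# Hypothetical detections: (confidence, matched a ground-truth object?)
scored_hits = [
    (0.95, True), (0.85, True), (0.70, False),
    (0.60, True), (0.40, False), (0.30, False),
]

best_f1, best_thr = best_f1_threshold(
    scored_hits, num_ground_truth=4, thresholds=[0.1, 0.3, 0.5, 0.7, 0.9]
)
print(f"Best F1 = {best_f1:.3f} at confidence >= {best_thr}")
```

F1 weighs precision and recall equally; for navigation, a threshold that favours recall may be preferable if missed obstacles are costlier than false alarms.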

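For this run, the validation log reports the output directory as `runs/detect/finetuned_test_evaluation`, so the plots and metric files are found there rather than under the `metrics/detection/...` path printed by the final message of the evaluation cell. This is likely because `model.val` was called with a `name` but without a `project` argument.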