# Semantic Segmentation Model Evaluation

This notebook performs comprehensive evaluation of the trained semantic segmentation model for peatland terrain classification. It provides detailed metrics and visualizations to assess model performance.

Key Components:
1. Quantitative Evaluation:
   - Pixel-wise accuracy
   - Per-class IoU (Intersection over Union)
   - Mean IoU across classes

2. Qualitative Analysis:
   - Side-by-side visualization of predictions
   - Color-coded class representation
   - Sample-wise inspection

The evaluation uses the held-out test set to ensure unbiased assessment of model performance.

## 1. Required Libraries

Import necessary packages for:
- Deep learning (PyTorch)
- Image processing
- Metrics computation
- Visualization
- Data handling

In [None]:
# Standard library imports
import os
from pathlib import Path

# Deep learning and numerical computing
import torch
from torch.utils.data import Dataset, DataLoader
import segmentation_models_pytorch as smp
from torchmetrics import JaccardIndex

# Image processing and data handling
from PIL import Image
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Data augmentation
import albumentations as A
from albumentations.pytorch import ToTensorV2

# Progress tracking
from tqdm import tqdm


## 2. Dataset Implementation

The evaluation dataset provides:

1. Data Loading:
   - Test set images and masks
   - Proper format conversion
   - Memory-efficient processing

2. Image Processing:
   - RGB image handling
   - Label mask loading
   - Transformation application

3. Evaluation Features:
   - Consistent ordering
   - Batch processing support
   - Ground truth alignment

In [None]:
class PeatlandDataset(Dataset):
    """Custom PyTorch Dataset for evaluating peatland segmentation models.
    
    This dataset handles the loading and processing of test images and their
    corresponding segmentation masks for model evaluation.
    
    Args:
        images_dir (str or Path): Directory containing input RGB images
        masks_dir (str or Path): Directory containing segmentation masks
        transform (callable, optional): Transformations to apply to image-mask pairs
    """
    def __init__(self, images_dir, masks_dir, transform=None):
        # Initialize paths and transforms
        self.images_dir = Path(images_dir)
        self.masks_dir = Path(masks_dir)
        self.transform = transform
        
        # Ensure consistent file ordering
        self.image_filenames = sorted(os.listdir(self.images_dir))

    def __len__(self):
        """Returns the total number of test samples."""
        return len(self.image_filenames)

    def __getitem__(self, idx):
        """Retrieves a test image-mask pair by index.
        
        Args:
            idx (int): Index of the image-mask pair to retrieve
            
        Returns:
            tuple: (image, mask) where image is a transformed RGB tensor and
                  mask is a long tensor with class labels
        """
        # Get file paths
        img_name = self.image_filenames[idx]
        img_path = self.images_dir / img_name
        mask_path = self.masks_dir / img_name
        
        # Load and preprocess data
        image = np.array(Image.open(img_path).convert("RGB"))
        mask = np.array(Image.open(mask_path))
        
        # Apply transformations if specified
        if self.transform:
            augmented = self.transform(image=image, mask=mask)
            image = augmented['image']
            mask = augmented['mask']
            
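        # Ensure proper tensor type for evaluation; torch.as_tensor also covers the
        # no-transform case, where the mask is still a NumPy array
        mask = torch.as_tensor(mask).long()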
        return image, mask
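
The dataset looks up each mask by reusing the image filename, so a missing or extra file only surfaces as an error partway through evaluation. A minimal sketch of an up-front check, assuming a flat directory layout; the helper name `find_unpaired_files` and the demo filenames are hypothetical and not part of the notebook:

```python
import tempfile
from pathlib import Path


def find_unpaired_files(images_dir, masks_dir):
    """Return (images_without_mask, masks_without_image) as sorted lists of names."""
    image_names = {p.name for p in Path(images_dir).iterdir() if p.is_file()}
    mask_names = {p.name for p in Path(masks_dir).iterdir() if p.is_file()}
    return sorted(image_names - mask_names), sorted(mask_names - image_names)


# Demo on a throwaway directory tree (hypothetical filenames).
with tempfile.TemporaryDirectory() as tmp:
    images_dir = Path(tmp) / "images"
    masks_dir = Path(tmp) / "masks"
    images_dir.mkdir()
    masks_dir.mkdir()
    for name in ["a.png", "b.png"]:
        (images_dir / name).touch()
    (masks_dir / "a.png").touch()

    missing_masks, missing_images = find_unpaired_files(images_dir, masks_dir)
    print(missing_masks)   # images with no matching mask
    print(missing_images)  # masks with no matching image
```

Running a check like this once before building the `DataLoader` keeps file-layout problems separate from model problems.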

## 3. Evaluation Transforms

Configure image transformations for consistent evaluation:

1. Image Sizing:
   - Fixed dimensions: 480x640 (height x width)
   - Direct resize to a 4:3 output; inputs with a different aspect ratio are stretched
   - No random augmentations

2. Normalization:
   - Scale pixel values to the [0,1] range (`max_pixel_value=255.0`)
   - Subtract the ImageNet mean and divide by the ImageNet std
   - Channel-wise processing

3. Tensor Conversion:
   - PyTorch format
   - Channel-first ordering (HWC to CHW)

In [None]:
# Define image dimensions for evaluation
IMG_HEIGHT = 480
IMG_WIDTH = 640

# ImageNet normalization parameters
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

# Evaluation transform pipeline
eval_transform = A.Compose([
    A.Resize(height=IMG_HEIGHT, width=IMG_WIDTH),           # Consistent sizing
    A.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD,       # Standardization
               max_pixel_value=255.0),
    ToTensorV2(),                                          # Convert to PyTorch tensor
])
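
As a rough illustration of what the normalization step computes, the NumPy sketch below applies `(pixel / 255 - mean) / std` per channel and then moves channels first, which is the layout `ToTensorV2` produces. It is a re-implementation for intuition under that assumption, not Albumentations' actual code; the `normalize_hwc` and `denormalize_chw` helpers are hypothetical.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_hwc(image_uint8):
    """Scale an HxWx3 uint8 image to [0, 1], standardize, and return CxHxW."""
    scaled = image_uint8.astype(np.float32) / 255.0
    standardized = (scaled - IMAGENET_MEAN) / IMAGENET_STD
    return standardized.transpose(2, 0, 1)


def denormalize_chw(tensor_chw):
    """Invert normalize_hwc, returning an HxWx3 float image in [0, 1]."""
    hwc = tensor_chw.transpose(1, 2, 0)
    return np.clip(hwc * IMAGENET_STD + IMAGENET_MEAN, 0.0, 1.0)


# A tiny 2x2 image: one white pixel, the rest black.
image = np.zeros((2, 2, 3), dtype=np.uint8)
image[0, 0] = 255

normalized = normalize_hwc(image)
restored = denormalize_chw(normalized)

print(normalized.shape)                                  # channels-first layout
print(np.allclose(restored, image / 255.0, atol=1e-5))   # round trip recovers the input
```

The inverse step is the same operation the visualization code applies before plotting an image.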

## 4. Evaluation Configuration

Setup evaluation parameters:

1. Model Selection:
   - Specific training run
   - Model checkpoint path
   - Batch processing settings

2. Hardware Configuration:
   - Automatic device selection
   - GPU acceleration when available
   - Memory optimization

3. Path Management:
   - Metrics directory structure
   - Results organization
   - Output path handling

In [None]:
# Model identification and paths
RUN_NAME = "2025-08-03_15-56-59"  # Timestamp of the training run to evaluate
METRICS_DIR = Path("./metrics") / RUN_NAME
MODEL_PATH = METRICS_DIR / "peatland_segmentation_model.pth"

# Evaluation parameters
BATCH_SIZE = 4  # Batch size for efficient processing

# Hardware configuration
if torch.cuda.is_available():
    DEVICE = "cuda"      # Use GPU if available
elif torch.backends.mps.is_available():
    DEVICE = "mps"       # Use Apple Silicon if available
else:
    DEVICE = "cpu"       # Fall back to CPU

# Display configuration
print(f"Evaluating model from run: {RUN_NAME}")
print(f"Model path: {MODEL_PATH}")
print(f"Using device: {DEVICE}")

Evaluating model from run: 2025-08-03_15-56-59
Model path: metrics/2025-08-03_15-56-59/peatland_segmentation_model.pth
Using device: mps
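
Because the checkpoint path is assembled from `RUN_NAME`, a typo in the timestamp would only fail later, at load time. A small sketch of failing early; `resolve_checkpoint` is a hypothetical helper, and the demo builds a temporary directory rather than touching the real `./metrics` folder:

```python
import tempfile
from pathlib import Path


def resolve_checkpoint(metrics_root, run_name, filename="peatland_segmentation_model.pth"):
    """Build the checkpoint path for a run and raise early if it is missing."""
    path = Path(metrics_root) / run_name / filename
    if not path.is_file():
        raise FileNotFoundError(f"No checkpoint found at {path}")
    return path


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in run directory containing an empty placeholder checkpoint.
    run_dir = Path(tmp) / "2025-08-03_15-56-59"
    run_dir.mkdir()
    (run_dir / "peatland_segmentation_model.pth").touch()

    found = resolve_checkpoint(tmp, "2025-08-03_15-56-59")
    print(found.name)

    try:
        resolve_checkpoint(tmp, "no-such-run")
        missing_detected = False
    except FileNotFoundError:
        missing_detected = True
    print(missing_detected)
```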


## 5. Test Dataset Preparation

Initialize the test dataset:

1. Data Organization:
   - Dedicated test set directory
   - Separate image and mask folders
   - Structured file hierarchy

2. DataLoader Configuration:
   - Sequential processing
   - Memory pinning for GPU
   - Efficient batch handling

3. Testing Pipeline:
   - Transform application
   - Batch creation
   - Device optimization

In [None]:
# Define test data paths
BASE_PROCESSED_DIR = Path("../data/processed/segmentation")
TEST_IMG_DIR = BASE_PROCESSED_DIR / "test" / "images"
TEST_MASK_DIR = BASE_PROCESSED_DIR / "test" / "masks"

# Initialize test dataset with transforms
test_dataset = PeatlandDataset(
    images_dir=TEST_IMG_DIR,
    masks_dir=TEST_MASK_DIR,
    transform=eval_transform
)

# Create test dataloader
test_loader = DataLoader(
    dataset=test_dataset,
    batch_size=BATCH_SIZE,
    shuffle=False,             # Sequential processing for testing
    num_workers=0,            # Single process data loading
    pin_memory=(DEVICE == "cuda")  # Pin memory for faster GPU transfer
)

print(f"Loaded {len(test_dataset)} samples for testing.")

Loaded 204 samples for testing.
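
With `shuffle=False` and the default `drop_last=False`, the loader yields ceil(N / batch_size) batches, and the last one may be smaller. A quick arithmetic check using the sample count printed above and `BATCH_SIZE = 4`; the `batch_sizes` helper is hypothetical:

```python
import math


def batch_sizes(num_samples, batch_size):
    """Sizes of the batches a sequential loader yields when the last partial batch is kept."""
    full, remainder = divmod(num_samples, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])


sizes = batch_sizes(204, 4)
print(len(sizes), math.ceil(204 / 4))   # both count the batches
print(batch_sizes(10, 4))               # a case with a smaller final batch
```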


## 6. Model Initialization

Load the trained model:

1. Architecture Setup:
   - U-Net with ResNet34 backbone
   - Multi-class segmentation head
   - No ImageNet encoder weights requested; the trained checkpoint supplies all weights

2. State Loading:
   - Checkpoint restoration
   - Device placement
   - Evaluation mode

3. Model Configuration:
   - RGB input channels
   - Five-class output
   - Memory optimization

In [None]:
# Initialize model architecture
model = smp.Unet(
    encoder_name="resnet34",     # Encoder backbone
    encoder_weights=None,        # Skip ImageNet weights; the checkpoint supplies them
    in_channels=3,              # RGB input
    classes=5,                  # Number of segmentation classes
).to(DEVICE)                    # Move to appropriate device

# Load trained model weights
model.load_state_dict(torch.load(MODEL_PATH, map_location=torch.device(DEVICE)))

# Set model to evaluation mode
model.eval()

print("Model loaded successfully.")

Model loaded successfully.


## 7. Model Evaluation

Execute comprehensive evaluation:

1. Metric Computation:
   - Per-class IoU
   - Mean IoU (mIoU)
   - Pixel-wise accuracy

2. Class Handling:
   - Five-class segmentation
   - Unweighted mean over classes for mIoU
   - IGNORE is scored as a regular class (no `ignore_index` is set)

3. Batch Processing:
   - Memory-efficient inference
   - Accumulated metrics
   - Progress tracking
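
To make the metric definitions concrete: for each class, IoU = TP / (TP + FP + FN), and pixel accuracy is the fraction of pixels whose predicted label equals the target. The NumPy sketch below computes both from a confusion matrix on a made-up 2x3 example with three classes. It illustrates the definitions only and is not how `torchmetrics` is implemented internally; all function names are hypothetical.

```python
import numpy as np


def confusion_matrix(preds, targets, num_classes):
    """Rows are target classes, columns are predicted classes."""
    flat = targets.ravel() * num_classes + preds.ravel()
    counts = np.bincount(flat, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def iou_per_class(cm):
    """IoU = TP / (TP + FP + FN); NaN for classes absent from both preds and targets."""
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    denom = tp + fp + fn
    return np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)


# Made-up labels for a 2x3 "image" with classes 0, 1, 2.
targets = np.array([[0, 0, 1],
                    [1, 2, 2]])
preds = np.array([[0, 1, 1],
                  [1, 2, 0]])

cm = confusion_matrix(preds, targets, num_classes=3)
ious = iou_per_class(cm)
pixel_acc = np.trace(cm) / cm.sum()

print(ious)          # per-class IoU
print(ious.mean())   # unweighted mean IoU
print(pixel_acc)     # fraction of correctly labeled pixels
```

In this toy case class 0 has one true positive, one false positive, and one false negative, so its IoU is 1/3 even though two thirds of all pixels are labeled correctly.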

In [None]:
# Define evaluation parameters
NUM_CLASSES = 5
CLASS_NAMES = ["PATH", "NATURAL_GROUND", "TREE", "VEGETATION", "IGNORE"]

# Initialize metric calculators
jaccard = JaccardIndex(
    task="multiclass",
    num_classes=NUM_CLASSES,
    average=None
).to(DEVICE)

# Initialize accuracy tracking
total_correct_pixels = 0
total_pixels = 0

# Evaluation loop
with torch.no_grad():
    for images, masks in tqdm(test_loader, desc="Evaluating on Test Set"):
        # Move data to device
        images, masks = images.to(DEVICE), masks.to(DEVICE)
        
        # Generate predictions
        outputs = model(images)
        preds = torch.argmax(outputs, dim=1)
        
        # Update metrics
        jaccard.update(preds, masks)
        total_correct_pixels += (preds == masks).sum().item()
        total_pixels += torch.numel(masks)

# Calculate final metrics
pixel_accuracy = (total_correct_pixels / total_pixels) * 100
iou_per_class = jaccard.compute()
mean_iou = iou_per_class.mean()

# Display results
print("\n--- Evaluation Complete ---")
print(f"Overall Pixel Accuracy: {pixel_accuracy:.2f}%")
print(f"Mean IoU (mIoU): {mean_iou:.4f}")
print("\nIoU per Class:")
for i, class_name in enumerate(CLASS_NAMES):
    print(f"  - {class_name}: {iou_per_class[i]:.4f}")

Evaluating on Test Set: 100%|██████████| 51/51 [00:24<00:00,  2.11it/s]



--- Evaluation Complete ---
Overall Pixel Accuracy: 88.13%
Mean IoU (mIoU): 0.7111

IoU per Class:
  - PATH: 0.6828
  - NATURAL_GROUND: 0.7905
  - TREE: 0.4184
  - VEGETATION: 0.7965
  - IGNORE: 0.8675
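
Since IGNORE is scored like any other class in this run, it is worth seeing how much it moves the headline number. The sketch below averages the per-class IoUs printed above with and without IGNORE. Note the assumption: dropping a value from the average only approximates re-running the metric with an `ignore_index`, which would also change the pixel counts for the remaining classes. The `mean_iou` helper is hypothetical.

```python
# Per-class IoU values copied from the evaluation output above.
reported_iou = {
    "PATH": 0.6828,
    "NATURAL_GROUND": 0.7905,
    "TREE": 0.4184,
    "VEGETATION": 0.7965,
    "IGNORE": 0.8675,
}


def mean_iou(iou_by_class, exclude=()):
    """Unweighted mean of per-class IoU, optionally skipping some classes."""
    kept = [value for name, value in iou_by_class.items() if name not in exclude]
    return sum(kept) / len(kept)


print(f"All classes:    {mean_iou(reported_iou):.4f}")
print(f"Without IGNORE: {mean_iou(reported_iou, exclude=('IGNORE',)):.4f}")
```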


## 8. Metrics Storage

Save evaluation results:

1. Metric Organization:
   - Overall accuracy
   - Mean IoU
   - Per-class performance

2. Data Formatting:
   - Structured DataFrame
   - Clear metric naming
   - Consistent value types

3. File Management:
   - CSV format storage
   - Run-specific directory
   - Organized hierarchy

In [None]:
# Prepare metrics data structure
metrics_data = {
    'Metric': ['Pixel Accuracy', 'Mean IoU'] + [f'IoU_{name}' for name in CLASS_NAMES],
    'Value': [pixel_accuracy, mean_iou.item()] + iou_per_class.cpu().numpy().tolist()
}

# Create organized DataFrame
metrics_df = pd.DataFrame(metrics_data)

# Save metrics to CSV
output_csv_path = METRICS_DIR / "test_set_evaluation.csv"
metrics_df.to_csv(output_csv_path, index=False)

print(f"\nEvaluation metrics saved to: {output_csv_path}")


Evaluation metrics saved to: metrics/2025-08-03_15-56-59/test_set_evaluation.csv
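
The saved file is in long format, with one `Metric` name and one `Value` per row, so it can be read back without pandas. A minimal sketch using the standard `csv` module; `load_metrics` is a hypothetical helper, and the demo writes a small stand-in file with two rows taken from the results above:

```python
import csv
import tempfile
from pathlib import Path


def load_metrics(csv_path):
    """Read a Metric/Value CSV into a {metric_name: float} dictionary."""
    with open(csv_path, newline="") as handle:
        return {row["Metric"]: float(row["Value"]) for row in csv.DictReader(handle)}


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in file with the same two columns the notebook writes.
    demo_path = Path(tmp) / "test_set_evaluation.csv"
    demo_path.write_text("Metric,Value\nMean IoU,0.7111\nIoU_TREE,0.4184\n")

    metrics = load_metrics(demo_path)
    print(metrics["Mean IoU"], metrics["IoU_TREE"])
```

When comparing runs, keep in mind that `Pixel Accuracy` is stored as a percentage while the IoU rows are fractions between 0 and 1.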


## 9. Results Visualization

Generate qualitative evaluation visualizations:

1. Sample Selection:
   - First `num_samples` test set images (deterministic order, not random)
   - Multi-sample comparison
   - Coverage limited to the start of the sorted file list

2. Visualization Components:
   - Original RGB image
   - Ground truth segmentation
   - Model predictions

3. Output Generation:
   - Color-coded classes
   - Side-by-side comparison
   - High-resolution figures

4. Color Scheme:
   - Path: Purple
   - Natural Ground: Light Purple
   - Tree: Light Blue
   - Vegetation: Yellow
   - Ignore: Black

In [None]:
def visualize_predictions(dataset, model, device, num_samples=5):
    """Generates and saves visualization of model predictions.
    
    Args:
        dataset: The test dataset containing image-mask pairs
        model: The trained segmentation model
        device: Device to run inference on
        num_samples: Number of samples to visualize
    """
    # Create visualization directory
    vis_dir = METRICS_DIR / "visualizations"
    vis_dir.mkdir(parents=True, exist_ok=True)
    
    # Define class color map (RGB order, as displayed by matplotlib's imshow)
    color_map = np.array([
        [60, 16, 152],     # Path (Purple)
        [132, 41, 246],    # Natural Ground (Light Purple)
        [110, 193, 228],   # Tree (Light Blue)
        [254, 221, 58],    # Vegetation (Yellow)
        [0, 0, 0]          # Ignore (Black)
    ], dtype=np.uint8)

    # Generate predictions and visualizations
    model.eval()
    with torch.no_grad():
        for i in range(num_samples):
            # Load and process sample
            image_tensor, gt_mask = dataset[i]
            input_tensor = image_tensor.unsqueeze(0).to(device)
            
            # Generate prediction
            output = model(input_tensor)
            pred_mask = torch.argmax(output, dim=1).squeeze(0).cpu().numpy()
            
            # Prepare visualization data
            display_image = image_tensor.permute(1, 2, 0).cpu().numpy()
            display_image = (display_image * IMAGENET_STD) + IMAGENET_MEAN
            display_image = np.clip(display_image, 0, 1)
            
            # Apply color mapping
            gt_mask_color = color_map[gt_mask.cpu().numpy()]
            pred_mask_color = color_map[pred_mask]

            # Create comparison plot
            fig, ax = plt.subplots(1, 3, figsize=(20, 6))
            fig.suptitle(f"Sample {i}", fontsize=16)
            
            # Original image
            ax[0].imshow(display_image)
            ax[0].set_title("Original Image")
            ax[0].axis('off')
            
            # Ground truth
            ax[1].imshow(gt_mask_color)
            ax[1].set_title("Ground Truth Mask")
            ax[1].axis('off')
            
            # Model prediction
            ax[2].imshow(pred_mask_color)
            ax[2].set_title("Model Prediction")
            ax[2].axis('off')
            
            # Save visualization
            plt.savefig(vis_dir / f"sample_{i}_comparison.png")
            plt.close()

    print(f"Saved {num_samples} visualization samples to: {vis_dir}")

# Generate visualizations
visualize_predictions(test_dataset, model, DEVICE)

Saved 5 visualization samples to: metrics/2025-08-03_15-56-59/visualizations
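
The color mapping in `visualize_predictions` relies on NumPy integer-array indexing: indexing a `(num_classes, 3)` palette with an `(H, W)` array of class IDs yields an `(H, W, 3)` image. A small standalone sketch with a made-up 2x2 mask; the palette values are copied from the notebook and read here as RGB, which is how `imshow` interprets them:

```python
import numpy as np

# Same palette as the notebook, one color triple per class ID.
color_map = np.array([
    [60, 16, 152],     # 0: Path
    [132, 41, 246],    # 1: Natural Ground
    [110, 193, 228],   # 2: Tree
    [254, 221, 58],    # 3: Vegetation
    [0, 0, 0],         # 4: Ignore
], dtype=np.uint8)

# Hypothetical 2x2 mask of class IDs.
mask = np.array([[0, 2],
                 [3, 4]])

colored = color_map[mask]   # each class ID is replaced by its palette row
print(colored.shape)        # height, width, color channels
print(colored[0, 1])        # color of the pixel labeled Tree
```

A mask value outside 0 to 4 would raise an `IndexError` here, which is a quick way to spot unexpected labels in the ground truth.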
