In [1]:
!pip install roboflow

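import os
from roboflow import Roboflow

# Read the API key from the environment instead of hard-coding it in the notebook.
# The variable name ROBOFLOW_API_KEY is an assumption; use whatever name you export.
rf = Roboflow(api_key=os.environ["ROBOFLOW_API_KEY"])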
project = rf.workspace("machinelearning-u4gmu").project("cracks-3ii36-jivvr")
version = project.version(1)
dataset = version.download("yolov12")
                

loading Roboflow workspace...
loading Roboflow project...


Downloading Dataset Version Zip in cracks-1 to yolov12:: 100%|██████████| 300888/300888 [06:05<00:00, 823.90it/s] 





Extracting Dataset Version Zip to cracks-1 in yolov12:: 100%|██████████| 10750/10750 [00:14<00:00, 750.03it/s] 


In [2]:
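import os
from roboflow import Roboflow

# roboflow was already installed in the first cell, so it is not reinstalled here.
# The variable name ROBOFLOW_API_KEY is an assumption; use whatever name you export.
rf = Roboflow(api_key=os.environ["ROBOFLOW_API_KEY"])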
project = rf.workspace("machinelearning-u4gmu").project("drywall-join-detect-ioimg")
version = project.version(2)
dataset = version.download("yolov12")
                

loading Roboflow workspace...
loading Roboflow project...


Downloading Dataset Version Zip in Drywall-Join-Detect-2 to yolov12:: 100%|██████████| 69930/69930 [01:09<00:00, 1004.78it/s]





Extracting Dataset Version Zip to Drywall-Join-Detect-2 in yolov12:: 100%|██████████| 4983/4983 [00:04<00:00, 1024.35it/s]


# Two-Stage Pipeline: YOLOv8 Detection + SAM Segmentation

This notebook implements a complete pipeline with two main parts:
- **Part A: Training Phase** - Train YOLOv8 on merged datasets (one-time process)
- **Part B: Inference Pipeline** - Use trained model + SAM for prompt-based segmentation (user experience)

## Part A: Training Phase (One-Time Process)

### Step 1: Install Required Dependencies

In [None]:
# shutil ships with Python's standard library, so it is not installed via pip
!pip install ultralytics segment-anything opencv-python matplotlib pyyaml

### Step 2: Merge Datasets

We'll merge the two datasets (cracks-1 and Drywall-Join-Detect-2) into a unified dataset with two classes.

In [None]:
import os
import shutil
import yaml
from pathlib import Path

# Define paths
cracks_dataset = 'cracks-1'
drywall_dataset = 'Drywall-Join-Detect-2'
merged_dataset = 'merged_dataset'

# Create merged dataset directory structure
for split in ['train', 'valid', 'test']:
    os.makedirs(f'{merged_dataset}/{split}/images', exist_ok=True)
    os.makedirs(f'{merged_dataset}/{split}/labels', exist_ok=True)

# Function to copy and relabel dataset
def merge_dataset(source_dir, target_dir, class_id):
    """
    Copy images and labels from source to target, updating class IDs
    source_dir: Path to source dataset (e.g., 'cracks-1')
    target_dir: Path to merged dataset (e.g., 'merged_dataset')
    class_id: New class ID (0 for cracks, 1 for drywall-join)
    """
    for split in ['train', 'valid', 'test']:
        # Copy images
        src_images = f'{source_dir}/{split}/images'
        dst_images = f'{target_dir}/{split}/images'
        
        if os.path.exists(src_images):
            for img_file in os.listdir(src_images):
                src_path = os.path.join(src_images, img_file)
                dst_path = os.path.join(dst_images, img_file)
                if os.path.isfile(src_path):
                    shutil.copy2(src_path, dst_path)
        
        # Copy and update labels
        src_labels = f'{source_dir}/{split}/labels'
        dst_labels = f'{target_dir}/{split}/labels'
        
        if os.path.exists(src_labels):
            for label_file in os.listdir(src_labels):
                src_path = os.path.join(src_labels, label_file)
                dst_path = os.path.join(dst_labels, label_file)
                
                if os.path.isfile(src_path):
                    # Read and update class IDs
                    with open(src_path, 'r') as f:
                        lines = f.readlines()
                    
                    updated_lines = []
                    for line in lines:
                        parts = line.strip().split()
                        if parts:
                            # Update class ID (first element)
                            parts[0] = str(class_id)
                            updated_lines.append(' '.join(parts) + '\n')
                    
                    # Write updated labels
                    with open(dst_path, 'w') as f:
                        f.writelines(updated_lines)

# Merge datasets
# Class 0: cracks
# Class 1: drywall-join
print("Merging cracks dataset (class 0)...")
merge_dataset(cracks_dataset, merged_dataset, class_id=0)

print("Merging drywall-join dataset (class 1)...")
merge_dataset(drywall_dataset, merged_dataset, class_id=1)

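# Create data.yaml for merged dataset
# An absolute 'path' root keeps Ultralytics from resolving the splits against its
# default datasets directory; the split entries are relative to that root.
data_config = {
    'path': os.path.abspath(merged_dataset),
    'train': 'train/images',
    'val': 'valid/images',
    'test': 'test/images',
    'nc': 2,
    'names': ['crack', 'drywall-join']
}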

with open(f'{merged_dataset}/data.yaml', 'w') as f:
    yaml.dump(data_config, f, default_flow_style=False)

print("\nDataset merge complete!")
print(f"Merged dataset location: {merged_dataset}")
print(f"Classes: {data_config['names']}")
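
After a merge like this, it can be worth confirming that the label files carry the expected class IDs. The following is a minimal sketch rather than part of the original notebook: `count_class_ids` is a hypothetical helper, and it assumes the YOLO text layout in which the first token on each line is the class ID.

```python
from collections import Counter
from pathlib import Path


def count_class_ids(labels_dir):
    """Count how often each class ID appears across YOLO label files."""
    counts = Counter()
    for label_file in Path(labels_dir).glob("*.txt"):
        for line in label_file.read_text().splitlines():
            parts = line.split()
            if parts:  # skip blank lines
                counts[int(parts[0])] += 1
    return counts


# Example (assumes the merge cell above has been run):
# for split in ["train", "valid", "test"]:
#     print(split, dict(count_class_ids(f"merged_dataset/{split}/labels")))
```

If one split shows only a single class, that is a hint to check the source dataset's split folders before training.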

### Step 3: Train YOLOv8 Model

Train the YOLOv8 model on the merged dataset. This creates our specialized object detector.

In [None]:
from ultralytics import YOLO

# Load a pre-trained YOLOv8 model
model = YOLO('yolov8n.pt')  # Using nano model for faster training, can use yolov8s.pt, yolov8m.pt for better accuracy

# Train the model
results = model.train(
    data=f'{merged_dataset}/data.yaml',
    epochs=100,  # Adjust based on your needs
    imgsz=640,
    batch=16,
    name='crack_drywall_detector',
    patience=20,  # Early stopping patience
    save=True,
    device=0  # Use GPU if available, otherwise set to 'cpu'
)

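print("\nTraining complete!")
# Note: if a run with this name already exists, Ultralytics appends a numeric
# suffix to the folder name, so check the training log for the actual path.
print("Best model saved at: runs/detect/crack_drywall_detector/weights/best.pt")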

### Step 4: Validate the Trained Model

Check the performance of the trained model.

In [None]:
# Load the best trained model
best_model = YOLO('runs/detect/crack_drywall_detector/weights/best.pt')

# Validate the model
metrics = best_model.val()

print("\nValidation Results:")
print(f"mAP50: {metrics.box.map50}")
print(f"mAP50-95: {metrics.box.map}")

---

## Part B: Inference Pipeline (User Experience)

### Step 5: Setup SAM Model

Download and initialize the Segment Anything Model (SAM).

In [None]:
import torch
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor
import matplotlib.pyplot as plt

# Download SAM checkpoint (run once)
# You can download from: https://github.com/facebookresearch/segment-anything#model-checkpoints
# For this example, we'll use sam_vit_h_4b8939.pth (ViT-H SAM model)

# Initialize SAM
sam_checkpoint = "sam_vit_h_4b8939.pth"  # Update path if needed
model_type = "vit_h"

device = "cuda" if torch.cuda.is_available() else "cpu"

sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
sam.to(device=device)
sam_predictor = SamPredictor(sam)

print(f"SAM model loaded on {device}")
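
The cell above assumes the checkpoint file is already on disk. One way to make that step repeatable is sketched below; `ensure_checkpoint` is a hypothetical helper, and the URL is the ViT-H link published in the segment-anything README. The file is several gigabytes, so the download can take a while.

```python
import os
import urllib.request

# ViT-H checkpoint URL as listed in the segment-anything README.
SAM_VIT_H_URL = "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth"


def ensure_checkpoint(path, url=SAM_VIT_H_URL):
    """Return `path`, downloading the file from `url` first if it is missing."""
    if not os.path.isfile(path):
        print(f"Downloading {url} -> {path}")
        urllib.request.urlretrieve(url, path)
    return path


# Example: sam_checkpoint = ensure_checkpoint("sam_vit_h_4b8939.pth")
```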

### Step 6: Implement Prompt Mapping System

Map user prompts to target classes.

In [None]:
class PromptMapper:
    """
    Maps user text prompts to specific class names that YOLO understands
    """
    def __init__(self):
        # Define mapping rules for each class
        self.mappings = {
            'crack': [
                'crack', 'cracks', 'fracture', 'fissure', 'split',
                'segment crack', 'detect crack', 'find crack',
                'show crack', 'identify crack'
            ],
            'drywall-join': [
                'drywall', 'joint', 'join', 'tape', 'taping area',
                'seam', 'drywall seam', 'drywall joint', 'joint/tape',
                'segment taping area', 'segment joint', 'segment drywall',
                'detect joint', 'find tape', 'show seam'
            ]
        }
        
        # Flatten for easy lookup
        self.prompt_to_class = {}
        for class_name, prompts in self.mappings.items():
            for prompt in prompts:
                self.prompt_to_class[prompt.lower()] = class_name
    
    def map_prompt(self, user_prompt):
        """
        Convert user prompt to target class
        
        Args:
            user_prompt (str): User's text input
            
        Returns:
            str: Target class name ('crack' or 'drywall-join') or None if no match
        """
        user_prompt = user_prompt.lower().strip()
        
        # Direct match
        if user_prompt in self.prompt_to_class:
            return self.prompt_to_class[user_prompt]
        
        # Partial match - check if any keyword is in the prompt
        for keyword, class_name in self.prompt_to_class.items():
            if keyword in user_prompt:
                return class_name
        
        return None
    
    def get_class_id(self, class_name):
        """
        Get numeric class ID for a class name
        
        Args:
            class_name (str): Class name ('crack' or 'drywall-join')
            
        Returns:
            int: Class ID (0 for crack, 1 for drywall-join)
        """
        class_mapping = {
            'crack': 0,
            'drywall-join': 1
        }
        return class_mapping.get(class_name)

# Initialize the prompt mapper
prompt_mapper = PromptMapper()

# Test the mapper
test_prompts = [
    "segment taping area",
    "segment joint/tape",
    "segment drywall seam",
    "find cracks",
    "detect crack"
]

print("Prompt Mapping Examples:")
for prompt in test_prompts:
    target_class = prompt_mapper.map_prompt(prompt)
    class_id = prompt_mapper.get_class_id(target_class)
    print(f"  '{prompt}' -> {target_class} (ID: {class_id})")
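
The partial-match step in `map_prompt` returns the first keyword it finds in insertion order, so a prompt that mentions keywords from both classes resolves to whichever keyword was registered first. One possible alternative, shown here only as a sketch, is to prefer the longest keyword contained in the prompt. `map_prompt_longest` and the abbreviated `KEYWORDS` table are hypothetical and not part of the notebook's `PromptMapper`.

```python
# Abbreviated keyword table for illustration; PromptMapper holds the full list.
KEYWORDS = {
    "crack": "crack",
    "fracture": "crack",
    "joint": "drywall-join",
    "drywall seam": "drywall-join",
    "taping area": "drywall-join",
}


def map_prompt_longest(user_prompt, keywords=KEYWORDS):
    """Return the class of the longest keyword contained in the prompt, or None."""
    prompt = user_prompt.lower().strip()
    matches = [kw for kw in keywords if kw in prompt]
    if not matches:
        return None
    return keywords[max(matches, key=len)]


print(map_prompt_longest("segment drywall seam near the crack"))
print(map_prompt_longest("paint the wall"))
```

Longest-match is still a heuristic: it reduces accidental hits from short substrings but does not resolve prompts that are truly ambiguous.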

### Step 7: Complete Inference Pipeline

Implement the full pipeline: Prompt → Detection → Filtering → Segmentation

In [None]:
class TwoStagePipeline:
    """
    Complete pipeline combining YOLOv8 detection and SAM segmentation
    """
    def __init__(self, yolo_model_path, sam_predictor, prompt_mapper):
        """
        Initialize the pipeline
        
        Args:
            yolo_model_path (str): Path to trained YOLO model weights
            sam_predictor: Initialized SAM predictor
            prompt_mapper: Initialized PromptMapper instance
        """
        self.yolo_model = YOLO(yolo_model_path)
        self.sam_predictor = sam_predictor
        self.prompt_mapper = prompt_mapper
        
    def process_image(self, image_path, user_prompt, conf_threshold=0.25):
        """
        Main inference pipeline
        
        Args:
            image_path (str): Path to input image
            user_prompt (str): User's text prompt
            conf_threshold (float): Confidence threshold for YOLO detections
            
        Returns:
            dict: Contains masks, boxes, and visualization
        """
        # Step 1: Map prompt to target class
        target_class_name = self.prompt_mapper.map_prompt(user_prompt)
        if target_class_name is None:
            print(f"Warning: Could not map prompt '{user_prompt}' to a known class")
            return None
        
        target_class_id = self.prompt_mapper.get_class_id(target_class_name)
        print(f"Prompt: '{user_prompt}' → Target class: {target_class_name} (ID: {target_class_id})")
        
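        # Step 2: Load and prepare image
        image = cv2.imread(image_path)
        if image is None:
            # cv2.imread returns None (no exception) for missing or unreadable files
            print(f"Warning: Could not read image '{image_path}'")
            return None
        image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)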
        
        # Step 3: Run YOLO detection
        results = self.yolo_model(image_path, conf=conf_threshold)[0]
        
        # Step 4: Filter detections by target class
        boxes = []
        scores = []
        
        for box in results.boxes:
            class_id = int(box.cls[0])
            if class_id == target_class_id:
                # Convert to [x1, y1, x2, y2] format
                xyxy = box.xyxy[0].cpu().numpy()
                boxes.append(xyxy)
                scores.append(float(box.conf[0]))
        
        print(f"Found {len(boxes)} {target_class_name} detections")
        
        if len(boxes) == 0:
            print(f"No {target_class_name} detected in the image")
            return {
                'image': image_rgb,
                'boxes': [],
                'masks': [],
                'final_mask': np.zeros(image_rgb.shape[:2], dtype=bool),
                'target_class': target_class_name,
                'scores': []
            }
        
        # Step 5: Use SAM for segmentation with filtered boxes
        self.sam_predictor.set_image(image_rgb)
        
        all_masks = []
        for box in boxes:
            # SAM expects boxes in xyxy format
            mask, _, _ = self.sam_predictor.predict(
                box=box,
                multimask_output=False
            )
            all_masks.append(mask[0])
        
        # Step 6: Merge all masks into final binary mask
        final_mask = np.zeros(image_rgb.shape[:2], dtype=bool)
        for mask in all_masks:
            final_mask = final_mask | mask
        
        print(f"Generated {len(all_masks)} segmentation masks")
        
        return {
            'image': image_rgb,
            'boxes': boxes,
            'masks': all_masks,
            'final_mask': final_mask,
            'target_class': target_class_name,
            'scores': scores
        }
    
    def visualize_results(self, results, figsize=(15, 5)):
        """
        Visualize the pipeline results
        
        Args:
            results (dict): Output from process_image
            figsize (tuple): Figure size for visualization
        """
        if results is None:
            print("No results to visualize")
            return
        
        fig, axes = plt.subplots(1, 3, figsize=figsize)
        
        # Original image with bounding boxes
        ax = axes[0]
        ax.imshow(results['image'])
        for i, box in enumerate(results['boxes']):
            x1, y1, x2, y2 = box
            rect = plt.Rectangle((x1, y1), x2-x1, y2-y1, 
                                fill=False, edgecolor='red', linewidth=2)
            ax.add_patch(rect)
            ax.text(x1, y1-5, f"{results['target_class']} {results['scores'][i]:.2f}", 
                   color='red', fontsize=10, weight='bold',
                   bbox=dict(facecolor='white', alpha=0.7))
        ax.set_title('YOLO Detections (Filtered)')
        ax.axis('off')
        
        # Individual masks
        ax = axes[1]
        ax.imshow(results['image'])
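        for mask in results['masks']:
            # RGBA overlay: transparent everywhere except the mask, so the
            # underlying image is not darkened by each additional layer
            overlay = np.zeros((*mask.shape, 4), dtype=float)
            overlay[mask] = [1.0, 0.0, 0.0, 0.5]  # Semi-transparent red
            ax.imshow(overlay)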
        ax.set_title('SAM Segmentation Masks')
        ax.axis('off')
        
        # Final merged mask
        ax = axes[2]
        ax.imshow(results['image'])
        # RGBA overlay so only the masked pixels are tinted
        overlay_final = np.zeros((*results['final_mask'].shape, 4), dtype=float)
        overlay_final[results['final_mask']] = [0.0, 1.0, 0.0, 0.6]  # Semi-transparent green
        ax.imshow(overlay_final)
        ax.set_title('Final Merged Mask')
        ax.axis('off')
        
        plt.tight_layout()
        plt.show()

# Initialize the complete pipeline
pipeline = TwoStagePipeline(
    yolo_model_path='runs/detect/crack_drywall_detector/weights/best.pt',
    sam_predictor=sam_predictor,
    prompt_mapper=prompt_mapper
)

print("Pipeline initialized successfully!")
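
The filtering stage of `process_image` can be tried out without loading either model. The sketch below uses plain tuples as a simplified stand-in for the YOLO results object; `filter_detections` is a hypothetical helper. In the notebook the confidence cut is applied by YOLO itself through the `conf` argument, whereas here it is written out explicitly.

```python
def filter_detections(detections, target_class_id, conf_threshold=0.25):
    """Keep boxes of one class at or above a confidence threshold.

    `detections` is a list of (class_id, confidence, (x1, y1, x2, y2)) tuples.
    Returns a list of (confidence, box) pairs.
    """
    return [
        (conf, box)
        for class_id, conf, box in detections
        if class_id == target_class_id and conf >= conf_threshold
    ]


# Made-up detections for illustration.
detections = [
    (0, 0.91, (10, 20, 110, 60)),     # crack
    (1, 0.80, (200, 40, 260, 400)),   # drywall-join
    (0, 0.12, (300, 300, 340, 320)),  # low-confidence crack
]
kept = filter_detections(detections, target_class_id=0, conf_threshold=0.3)
print(kept)
```

Only the boxes that survive this step are passed to SAM, so a missed or filtered-out detection produces no mask at all.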

### Step 8: Test the Pipeline

Run the pipeline with different prompts on test images.

In [None]:
# Example 1: Segment taping area
test_image_path = 'path/to/your/test/image.jpg'  # Update with actual path
user_prompt = "segment taping area"

results = pipeline.process_image(test_image_path, user_prompt, conf_threshold=0.3)
pipeline.visualize_results(results)

In [None]:
# Example 2: Find cracks
user_prompt = "find cracks"

results = pipeline.process_image(test_image_path, user_prompt, conf_threshold=0.3)
pipeline.visualize_results(results)

### Step 9: Batch Processing

Process multiple images with the same or different prompts.
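
When gathering images for a batch run, note that glob patterns such as `*.jpg` are case-sensitive on POSIX systems, so a file named `IMG_001.JPG` would be skipped. A minimal sketch of a case-insensitive listing follows; `list_images` and `IMAGE_SUFFIXES` are hypothetical names and not part of the notebook.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_images(image_folder):
    """Return sorted image paths in a folder, matching suffixes case-insensitively."""
    return sorted(
        p for p in Path(image_folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


# Example: image_paths = [str(p) for p in list_images("merged_dataset/test/images")]
```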

In [None]:
import glob

def batch_process(image_folder, prompts, output_folder='results', conf_threshold=0.3):
    """
    Process multiple images with different prompts
    
    Args:
        image_folder (str): Folder containing images
        prompts (list): List of prompts to try on each image
        output_folder (str): Where to save results
        conf_threshold (float): Confidence threshold
    """
    os.makedirs(output_folder, exist_ok=True)
    
    # Get all images
    image_extensions = ['*.jpg', '*.jpeg', '*.png']
    image_paths = []
    for ext in image_extensions:
        image_paths.extend(glob.glob(os.path.join(image_folder, ext)))
    
    print(f"Found {len(image_paths)} images")
    
    for img_path in image_paths:
        img_name = os.path.basename(img_path)
        print(f"\nProcessing: {img_name}")
        
        for prompt in prompts:
            results = pipeline.process_image(img_path, prompt, conf_threshold)
            
            if results and results['final_mask'].any():
                # Save the final mask
                output_name = f"{os.path.splitext(img_name)[0]}_{results['target_class']}_mask.png"
                output_path = os.path.join(output_folder, output_name)
                
                # Convert mask to image
                mask_img = (results['final_mask'] * 255).astype(np.uint8)
                cv2.imwrite(output_path, mask_img)
                print(f"  Saved: {output_name}")

# Example batch processing
# Uncomment and update paths to use
# batch_process(
#     image_folder='merged_dataset/test/images',
#     prompts=['segment taping area', 'find cracks'],
#     output_folder='pipeline_results'
# )

### Step 10: Export Final Mask Function

Utility function to save masks in different formats.

In [None]:
def export_mask(results, output_path, format='binary'):
    """
    Export segmentation mask in various formats
    
    Args:
        results (dict): Pipeline results
        output_path (str): Output file path
        format (str): 'binary', 'overlay', or 'colored'
    """
    if results is None or not results['final_mask'].any():
        print("No mask to export")
        return
    
    if format == 'binary':
        # Binary mask (0 or 255)
        mask_img = (results['final_mask'] * 255).astype(np.uint8)
        cv2.imwrite(output_path, mask_img)
        
    elif format == 'overlay':
        # Original image with colored overlay
        overlay = results['image'].copy()
        colored_mask = np.zeros_like(overlay)
        colored_mask[results['final_mask']] = [0, 255, 0]  # Green
        overlay = cv2.addWeighted(overlay, 0.7, colored_mask, 0.3, 0)
        cv2.imwrite(output_path, cv2.cvtColor(overlay, cv2.COLOR_RGB2BGR))
        
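    elif format == 'colored':
        # Just the colored mask
        colored_mask = np.zeros_like(results['image'])
        colored_mask[results['final_mask']] = [0, 255, 0]
        cv2.imwrite(output_path, cv2.cvtColor(colored_mask, cv2.COLOR_RGB2BGR))
    
    else:
        # Fail loudly instead of reporting an export that never happened
        raise ValueError(f"Unknown format '{format}'; expected 'binary', 'overlay', or 'colored'")
    
    print(f"Mask exported to: {output_path}")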

# Example usage
# results = pipeline.process_image('test.jpg', 'segment taping area')
# export_mask(results, 'output_binary.png', format='binary')
# export_mask(results, 'output_overlay.png', format='overlay')
# export_mask(results, 'output_colored.png', format='colored')

---

## Summary and Usage Guide

### Pipeline Architecture

**Part A - Training Phase (One-Time)**:
1. ✅ Download datasets from Roboflow (cracks-1, Drywall-Join-Detect-2)
2. ✅ Merge datasets with unified class labels (crack=0, drywall-join=1)
3. ✅ Train YOLOv8 model on merged dataset
4. ✅ Save best weights (`best.pt`)

**Part B - Inference Pipeline (Real-Time)**:
1. ✅ User provides image + text prompt
2. ✅ Prompt mapper translates text → target class
3. ✅ YOLO detects all objects → filters by target class
4. ✅ Filtered boxes → SAM generates pixel-level masks
5. ✅ Merge all masks → return final binary mask

### Quick Start

```python
# After training, use the pipeline:
results = pipeline.process_image(
    image_path='your_image.jpg',
    user_prompt='segment taping area',  # or 'find cracks'
    conf_threshold=0.3
)

# Visualize
pipeline.visualize_results(results)

# Export
export_mask(results, 'output.png', format='overlay')
```

### Key Features

- **Flexible Prompting**: Multiple ways to describe the same object (e.g., "taping area", "drywall joint", "seam")
- **Two-Class Detection**: Handles both cracks and drywall joints
- **Pixel-Level Segmentation**: SAM turns each detected box into a pixel-level mask
- **Batch Processing**: Process multiple images efficiently
- **Multiple Export Formats**: Binary, overlay, or colored masks

### Important Notes

1. **SAM Model**: Download SAM checkpoint from [here](https://github.com/facebookresearch/segment-anything#model-checkpoints)
2. **GPU Recommended**: Both training and inference benefit from GPU acceleration
3. **Confidence Threshold**: Adjust `conf_threshold` based on your precision/recall needs
4. **Custom Prompts**: Add new prompt mappings in `PromptMapper` class
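
To make the confidence-threshold trade-off in note 3 concrete, the sketch below computes precision and recall at two thresholds. The detections are made-up numbers for illustration, and `precision_recall` is a hypothetical helper rather than part of the pipeline.

```python
def precision_recall(scored, num_ground_truth, threshold):
    """Precision and recall for detections kept at or above `threshold`.

    `scored` is a list of (confidence, is_true_positive) pairs.
    """
    kept = [is_tp for conf, is_tp in scored if conf >= threshold]
    true_positives = sum(kept)
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / num_ground_truth if num_ground_truth else 0.0
    return precision, recall


# Made-up detections: (confidence, matched a ground-truth object?)
scored = [(0.9, True), (0.7, True), (0.5, False), (0.4, True), (0.3, False)]
for threshold in (0.25, 0.6):
    p, r = precision_recall(scored, num_ground_truth=4, threshold=threshold)
    print(f"conf >= {threshold:.2f}: precision={p:.2f}, recall={r:.2f}")
```

In this toy data, raising the threshold removes the false positives but also drops a correct detection, which is the usual shape of the trade-off.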

---

## 📊 Model Evaluation: mIoU & Dice Score

This section evaluates the segmentation quality using two key metrics:
- **mIoU (mean Intersection over Union)**: Measures overlap between predicted and ground truth masks
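- **Dice Score**: Twice the overlap divided by the combined size of the predicted mask P and the ground truth mask G (2×|P∩G| / (|P|+|G|)); equivalent to the F1 score, the harmonic mean of pixel-level precision and recall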
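
Written out for a predicted mask $P$ and a ground-truth mask $G$, both treated as sets of pixels, the two metrics take the form below. The symbols $P$, $G$, and $N$ (the number of evaluated instances or images) are notation introduced here for illustration.

$$
\mathrm{IoU}(P, G) = \frac{|P \cap G|}{|P \cup G|}, \qquad
\mathrm{Dice}(P, G) = \frac{2\,|P \cap G|}{|P| + |G|}, \qquad
\mathrm{mIoU} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{IoU}(P_i, G_i)
$$

Because $|P| + |G| = |P \cup G| + |P \cap G|$, the two scores for a single pair of masks are tied together:

$$
\mathrm{Dice} = \frac{2\,\mathrm{IoU}}{1 + \mathrm{IoU}}
$$

For example, with $|P| = |G| = 2$ and $|P \cap G| = 1$, the union has 3 pixels, so $\mathrm{IoU} = 1/3$ and $\mathrm{Dice} = 2/4 = 1/2$, which agrees with $2 \cdot \tfrac{1}{3} / (1 + \tfrac{1}{3}) = 1/2$.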

We'll evaluate using **two different prompt types**:
1. **Bounding Box Prompts**: Direct ground truth boxes → SAM (ideal scenario)
2. **Text Prompts**: Natural language → YOLO → SAM (realistic user scenario)

### Step 11: Metric Calculation Functions

In [None]:
def calculate_iou(pred_mask, gt_mask):
    """
    Calculate Intersection over Union (IoU)
    
    Args:
        pred_mask (np.array): Predicted binary mask
        gt_mask (np.array): Ground truth binary mask
        
    Returns:
        float: IoU score (0 to 1)
    """
    intersection = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    
    if union == 0:
        return 1.0 if intersection == 0 else 0.0
    
    return intersection / union


def calculate_dice(pred_mask, gt_mask):
    """
    Calculate Dice coefficient (F1 score for segmentation)
    
    Args:
        pred_mask (np.array): Predicted binary mask
        gt_mask (np.array): Ground truth binary mask
        
    Returns:
        float: Dice score (0 to 1)
    """
    intersection = np.logical_and(pred_mask, gt_mask).sum()
    pred_sum = pred_mask.sum()
    gt_sum = gt_mask.sum()
    
    if (pred_sum + gt_sum) == 0:
        return 1.0 if intersection == 0 else 0.0
    
    return (2.0 * intersection) / (pred_sum + gt_sum)


def yolo_to_bbox(label_line, img_width, img_height):
    """
    Convert YOLO format (normalized) to pixel bounding box coordinates
    
    Args:
        label_line (str): YOLO format line (class x_center y_center width height)
        img_width (int): Image width in pixels
        img_height (int): Image height in pixels
        
    Returns:
        tuple: (class_id, x1, y1, x2, y2) in pixel coordinates
    """
    parts = label_line.strip().split()
    class_id = int(parts[0])
    x_center = float(parts[1]) * img_width
    y_center = float(parts[2]) * img_height
    width = float(parts[3]) * img_width
    height = float(parts[4]) * img_height
    
    x1 = max(0, int(x_center - width / 2))
    y1 = max(0, int(y_center - height / 2))
    x2 = min(img_width, int(x_center + width / 2))
    y2 = min(img_height, int(y_center + height / 2))
    
    return class_id, x1, y1, x2, y2


def create_mask_from_bbox(bbox, img_shape):
    """
    Create a binary mask from bounding box (used as ground truth approximation)
    
    Args:
        bbox (tuple): (x1, y1, x2, y2) in pixels
        img_shape (tuple): (height, width)
        
    Returns:
        np.array: Binary mask
    """
    mask = np.zeros(img_shape, dtype=bool)
    x1, y1, x2, y2 = bbox
    mask[y1:y2, x1:x2] = True
    return mask

print("✓ Metric calculation functions loaded!")

### Step 12: Evaluation Method 1 - SAM with Bounding Box Prompts

**How it works:**
1. Load ground truth bounding boxes from YOLO labels
2. Use these boxes directly as prompts for SAM
3. Compare SAM's predicted masks with ground truth boxes
4. Calculate mIoU and Dice scores

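This represents the **best-case scenario** for the detection stage: SAM receives perfect bounding box prompts. Because the dataset labels are boxes rather than pixel masks, the ground truth here is the filled box, so the scores measure how much of each box SAM's mask covers rather than true segmentation accuracy.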
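
Using a filled box as ground truth penalizes thin structures even when the mask is ideal. The sketch below illustrates this with a made-up case: a one-pixel-wide diagonal crack inside a 10×10 box, with masks represented as sets of pixel coordinates. `iou_and_dice` is a hypothetical helper written for this illustration.

```python
def iou_and_dice(pred_pixels, gt_pixels):
    """IoU and Dice for two masks given as sets of (row, col) pixel coordinates."""
    intersection = len(pred_pixels & gt_pixels)
    union = len(pred_pixels | gt_pixels)
    total = len(pred_pixels) + len(gt_pixels)
    iou = intersection / union if union else 1.0
    dice = 2 * intersection / total if total else 1.0
    return iou, dice


n = 10
# "Ground truth" as used in this evaluation: every pixel inside an n x n box.
box_mask = {(r, c) for r in range(n) for c in range(n)}
# A perfectly segmented one-pixel-wide diagonal crack inside that box.
crack_mask = {(i, i) for i in range(n)}

iou, dice = iou_and_dice(crack_mask, box_mask)
print(f"IoU={iou:.3f}, Dice={dice:.3f}")
```

A low score against box labels can therefore reflect the shape of the object rather than a poor mask, which matters most for the crack class.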

In [None]:
def evaluate_sam_with_bbox_prompts(test_images_dir, test_labels_dir, sam_predictor, max_samples=None):
    """
    Evaluate SAM using ground truth bounding boxes as prompts
    
    Args:
        test_images_dir (str): Path to test images folder
        test_labels_dir (str): Path to test labels folder
        sam_predictor: Initialized SAM predictor
        max_samples (int): Max images to evaluate (None = all)
        
    Returns:
        dict: Evaluation metrics
    """
    print("\n" + "="*70)
    print("EVALUATION METHOD 1: SAM with Ground Truth Bounding Box Prompts")
    print("="*70)
    
    # Get all test images
    image_files = sorted([f for f in os.listdir(test_images_dir) 
                         if f.lower().endswith(('.jpg', '.jpeg', '.png'))])
    
    if max_samples:
        image_files = image_files[:max_samples]
    
    # Storage for metrics
    all_iou_scores = []
    all_dice_scores = []
    class_metrics = {
        0: {'iou': [], 'dice': [], 'name': 'crack'},
        1: {'iou': [], 'dice': [], 'name': 'drywall-join'}
    }
    
    processed_count = 0
    
    for idx, img_file in enumerate(image_files):
        # Load image
        img_path = os.path.join(test_images_dir, img_file)
        image = cv2.imread(img_path)
        if image is None:
            continue
            
        image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        img_height, img_width = image.shape[:2]
        
        # Load corresponding label
        label_file = os.path.splitext(img_file)[0] + '.txt'
        label_path = os.path.join(test_labels_dir, label_file)
        
        if not os.path.exists(label_path):
            continue
        
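        # Read all bounding boxes from label file, skipping blank lines
        with open(label_path, 'r') as f:
            lines = [line for line in f if line.strip()]
        
        if not lines:
            continue
        
        # Set image for SAM only once labels are known to exist,
        # since computing the image embedding is the expensive step
        sam_predictor.set_image(image_rgb)
        
        processed_count += 1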
        
        for line in lines:
            if not line.strip():
                continue
            
            # Parse YOLO format to get bbox
            class_id, x1, y1, x2, y2 = yolo_to_bbox(line, img_width, img_height)
            
            # Create ground truth mask from bbox
            gt_mask = create_mask_from_bbox((x1, y1, x2, y2), (img_height, img_width))
            
            # Use SAM with ground truth bbox as prompt
            bbox_array = np.array([x1, y1, x2, y2])
            try:
                pred_mask, _, _ = sam_predictor.predict(
                    box=bbox_array,
                    multimask_output=False
                )
                pred_mask = pred_mask[0]
                
                # Calculate metrics
                iou = calculate_iou(pred_mask, gt_mask)
                dice = calculate_dice(pred_mask, gt_mask)
                
                all_iou_scores.append(iou)
                all_dice_scores.append(dice)
                class_metrics[class_id]['iou'].append(iou)
                class_metrics[class_id]['dice'].append(dice)
                
            except Exception as e:
                print(f"Error processing {img_file}: {e}")
                continue
        
        # Progress update
        if (idx + 1) % 10 == 0:
            print(f"  Processed {idx + 1}/{len(image_files)} images...")
    
    # Calculate results
    results = {
        'method': 'BBox Prompts',
        'total_images': processed_count,
        'total_instances': len(all_iou_scores),
        'mean_iou': np.mean(all_iou_scores) if all_iou_scores else 0.0,
        'mean_dice': np.mean(all_dice_scores) if all_dice_scores else 0.0,
        'std_iou': np.std(all_iou_scores) if all_iou_scores else 0.0,
        'std_dice': np.std(all_dice_scores) if all_dice_scores else 0.0,
    }
    
    # Per-class metrics
    for class_id in [0, 1]:
        class_name = class_metrics[class_id]['name']
        results[f'{class_name}_iou'] = np.mean(class_metrics[class_id]['iou']) if class_metrics[class_id]['iou'] else 0.0
        results[f'{class_name}_dice'] = np.mean(class_metrics[class_id]['dice']) if class_metrics[class_id]['dice'] else 0.0
        results[f'{class_name}_count'] = len(class_metrics[class_id]['iou'])
    
    # Print results
    print("\n" + "="*70)
    print("RESULTS:")
    print("="*70)
    print(f"Images Processed: {results['total_images']}")
    print(f"Total Instances: {results['total_instances']}")
    print(f"\n📊 Overall Metrics:")
    print(f"  Mean IoU:  {results['mean_iou']:.4f} (±{results['std_iou']:.4f})")
    print(f"  Mean Dice: {results['mean_dice']:.4f} (±{results['std_dice']:.4f})")
    print(f"\n📊 Per-Class Metrics:")
    print(f"  Crack (n={results['crack_count']}):")
    print(f"    IoU:  {results['crack_iou']:.4f}")
    print(f"    Dice: {results['crack_dice']:.4f}")
    print(f"  Drywall-Join (n={results['drywall-join_count']}):")
    print(f"    IoU:  {results['drywall-join_iou']:.4f}")
    print(f"    Dice: {results['drywall-join_dice']:.4f}")
    print("="*70)
    
    return results

# Run evaluation with bounding box prompts
bbox_results = evaluate_sam_with_bbox_prompts(
    test_images_dir='merged_dataset/test/images',
    test_labels_dir='merged_dataset/test/labels',
    sam_predictor=sam_predictor,
    max_samples=50  # Set to None to evaluate all test images
)

### Step 13: Evaluation Method 2 - Complete Pipeline with Text Prompts

**How it works:**
1. User provides natural language text prompt (e.g., "find cracks")
2. Prompt mapper converts text → target class
3. YOLO model detects objects and filters by target class
4. SAM uses YOLO's predicted boxes to generate masks
5. Compare final masks with the ground truth boxes, rasterized as filled masks
6. Calculate mIoU and Dice scores

This represents the **real-world scenario** - end-to-end pipeline performance.
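
One design choice affects how these end-to-end numbers read: whether an image where the pipeline produces no mask is left out of the average or counted as a score of zero. The sketch below shows the difference on made-up scores; `mean_scores` is a hypothetical helper, and `None` marks an image with no prediction.

```python
def mean_scores(per_image_scores):
    """Summarise per-image IoU where None marks an image with no prediction.

    Returns (mean over images with a prediction, mean with misses counted as 0).
    """
    detected = [s for s in per_image_scores if s is not None]
    mean_detected = sum(detected) / len(detected) if detected else 0.0
    mean_all = sum(detected) / len(per_image_scores) if per_image_scores else 0.0
    return mean_detected, mean_all


# Made-up scores for four images that contain the target class;
# the pipeline produced no mask for two of them.
scores = [0.8, None, 0.6, None]
mean_detected, mean_all = mean_scores(scores)
print(f"mean over detected: {mean_detected:.2f}, mean with misses as 0: {mean_all:.2f}")
```

Reporting both values, or the detection rate alongside the mean, makes it clear how much of the score comes from segmentation quality and how much from detector recall.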

In [None]:
def evaluate_pipeline_with_text_prompts(test_images_dir, test_labels_dir, pipeline, 
                                        prompts_per_class, conf_threshold=0.25, max_samples=None):
    """
    Evaluate complete pipeline (YOLO + SAM) using text prompts
    
    Args:
        test_images_dir (str): Path to test images folder
        test_labels_dir (str): Path to test labels folder
        pipeline: Initialized TwoStagePipeline instance
        prompts_per_class (dict): Dictionary mapping class_id to list of text prompts
        conf_threshold (float): YOLO confidence threshold
        max_samples (int): Max images to evaluate (None = all)
        
    Returns:
        dict: Evaluation metrics
    """
    print("\n" + "="*70)
    print("EVALUATION METHOD 2: Complete Pipeline with Text Prompts")
    print("="*70)
    
    # Get all test images
    image_files = sorted([f for f in os.listdir(test_images_dir) 
                         if f.lower().endswith(('.jpg', '.jpeg', '.png'))])
    
    if max_samples:
        image_files = image_files[:max_samples]
    
    # Storage for metrics per prompt
    prompt_results = {}
    
    # Process each class and its prompts
    for class_id, prompts in prompts_per_class.items():
        class_name = 'crack' if class_id == 0 else 'drywall-join'
        
        for prompt in prompts:
            print(f"\n📝 Testing prompt: '{prompt}' (class: {class_name})")
            
            prompt_iou = []
            prompt_dice = []
            processed_count = 0
            matched_instances = 0
            
            for idx, img_file in enumerate(image_files):
                # Load image
                img_path = os.path.join(test_images_dir, img_file)
                image = cv2.imread(img_path)
                if image is None:
                    continue
                    
                image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
                img_height, img_width = image.shape[:2]
                
                # Load ground truth labels
                label_file = os.path.splitext(img_file)[0] + '.txt'
                label_path = os.path.join(test_labels_dir, label_file)
                
                if not os.path.exists(label_path):
                    continue
                
                # Read ground truth boxes for this class
                gt_boxes_for_class = []
                with open(label_path, 'r') as f:
                    for line in f:
                        if not line.strip():
                            continue
                        gt_class_id, x1, y1, x2, y2 = yolo_to_bbox(line, img_width, img_height)
                        if gt_class_id == class_id:
                            gt_boxes_for_class.append((x1, y1, x2, y2))
                
                if not gt_boxes_for_class:
                    continue  # No instances of this class in image
                
                processed_count += 1
                
                # Run pipeline with text prompt
                try:
                    results = pipeline.process_image(img_path, prompt, conf_threshold)
                    
                    if results is None or not results['final_mask'].any():
                        # Missed detection: score it as zero so misses lower the mean
                        prompt_iou.append(0.0)
                        prompt_dice.append(0.0)
                        continue
                    
                    # Create combined ground truth mask for this class
                    gt_combined_mask = np.zeros((img_height, img_width), dtype=bool)
                    for bbox in gt_boxes_for_class:
                        gt_mask = create_mask_from_bbox(bbox, (img_height, img_width))
                        gt_combined_mask = np.logical_or(gt_combined_mask, gt_mask)
                    
                    # Calculate metrics against combined mask
                    iou = calculate_iou(results['final_mask'], gt_combined_mask)
                    dice = calculate_dice(results['final_mask'], gt_combined_mask)
                    
                    prompt_iou.append(iou)
                    prompt_dice.append(dice)
                    matched_instances += len(gt_boxes_for_class)
                    
                except Exception as e:
                    print(f"    Error on {img_file}: {e}")
                    continue
                
                # Progress update
                if (idx + 1) % 20 == 0:
                    print(f"    Processed {idx + 1}/{len(image_files)} images...")
            
            # Store results for this prompt
            prompt_results[prompt] = {
                'class_id': class_id,
                'class_name': class_name,
                'images_processed': processed_count,
                'instances_matched': matched_instances,
                'mean_iou': np.mean(prompt_iou) if prompt_iou else 0.0,
                'mean_dice': np.mean(prompt_dice) if prompt_dice else 0.0,
                'std_iou': np.std(prompt_iou) if prompt_iou else 0.0,
                'std_dice': np.std(prompt_dice) if prompt_dice else 0.0,
            }
            
            print(f"    ✓ Results: IoU={prompt_results[prompt]['mean_iou']:.4f}, "
                  f"Dice={prompt_results[prompt]['mean_dice']:.4f} "
                  f"({processed_count} images, {matched_instances} instances)")
    
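    # Calculate overall metrics over every prompt that had at least one evaluable image
    # (filtering on score > 0 would silently drop prompts that failed completely)
    all_iou = [r['mean_iou'] for r in prompt_results.values() if r['images_processed'] > 0]
    all_dice = [r['mean_dice'] for r in prompt_results.values() if r['images_processed'] > 0]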
    
    overall_results = {
        'method': 'Text Prompts (Pipeline)',
        'num_prompts': len(prompt_results),
        'overall_mean_iou': np.mean(all_iou) if all_iou else 0.0,
        'overall_mean_dice': np.mean(all_dice) if all_dice else 0.0,
        'prompt_details': prompt_results
    }
    
    # Print summary
    print("\n" + "="*70)
    print("RESULTS SUMMARY:")
    print("="*70)
    print(f"Total Prompts Tested: {len(prompt_results)}")
    print(f"\n📊 Overall Metrics (averaged across all prompts):")
    print(f"  Mean IoU:  {overall_results['overall_mean_iou']:.4f}")
    print(f"  Mean Dice: {overall_results['overall_mean_dice']:.4f}")
    print(f"\n📊 Per-Prompt Results:")
    for prompt, res in prompt_results.items():
        print(f"\n  '{prompt}' ({res['class_name']}):")
        print(f"    Images: {res['images_processed']}, Instances: {res['instances_matched']}")
        print(f"    IoU:  {res['mean_iou']:.4f} (±{res['std_iou']:.4f})")
        print(f"    Dice: {res['mean_dice']:.4f} (±{res['std_dice']:.4f})")
    print("="*70)
    
    return overall_results

# Define test prompts for each class
test_prompts = {
    0: ['find cracks', 'detect crack', 'segment crack'],  # Class 0: crack
    1: ['segment taping area', 'segment joint', 'detect drywall']  # Class 1: drywall-join
}

# Run evaluation with text prompts
text_results = evaluate_pipeline_with_text_prompts(
    test_images_dir='merged_dataset/test/images',
    test_labels_dir='merged_dataset/test/labels',
    pipeline=pipeline,
    prompts_per_class=test_prompts,
    conf_threshold=0.3,
    max_samples=50  # Set to None to evaluate all test images
)

### Step 14: Comparison and Visualization

Compare both evaluation methods side-by-side.

In [None]:
import pandas as pd

def compare_evaluation_methods(bbox_results, text_results):
    """
    Create a comparison table and visualization for both evaluation methods
    
    Args:
        bbox_results (dict): Results from bounding box prompt evaluation
        text_results (dict): Results from text prompt evaluation
    """
    print("\n" + "="*70)
    print("COMPARATIVE ANALYSIS: BBox Prompts vs Text Prompts")
    print("="*70)
    
    # Create comparison dataframe
    comparison_data = {
        'Method': [
            'SAM with BBox Prompts',
            'Pipeline with Text Prompts'
        ],
        'Mean IoU': [
            bbox_results['mean_iou'],
            text_results['overall_mean_iou']
        ],
        'Mean Dice': [
            bbox_results['mean_dice'],
            text_results['overall_mean_dice']
        ],
        'Instances': [
            bbox_results['total_instances'],
            sum([r['instances_matched'] for r in text_results['prompt_details'].values()])
        ]
    }
    
    df_comparison = pd.DataFrame(comparison_data)
    
    print("\n📊 Overall Comparison:")
    print(df_comparison.to_string(index=False))
    
    # Performance difference (relative to Method 1; guard against a zero baseline)
    iou_diff = bbox_results['mean_iou'] - text_results['overall_mean_iou']
    dice_diff = bbox_results['mean_dice'] - text_results['overall_mean_dice']
    iou_pct = iou_diff / bbox_results['mean_iou'] * 100 if bbox_results['mean_iou'] else float('nan')
    dice_pct = dice_diff / bbox_results['mean_dice'] * 100 if bbox_results['mean_dice'] else float('nan')
    
    print("\n📉 Performance Gap:")
    print(f"  IoU difference:  {iou_diff:+.4f} ({iou_pct:+.2f}%)")
    print(f"  Dice difference: {dice_diff:+.4f} ({dice_pct:+.2f}%)")
    
    # Visualization
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # IoU comparison
    ax = axes[0]
    methods = ['BBox\nPrompts', 'Text\nPrompts']
    iou_scores = [bbox_results['mean_iou'], text_results['overall_mean_iou']]
    bars = ax.bar(methods, iou_scores, color=['#2ecc71', '#3498db'], alpha=0.8, edgecolor='black')
    ax.set_ylabel('Mean IoU Score', fontsize=12, fontweight='bold')
    ax.set_title('Mean IoU Comparison', fontsize=14, fontweight='bold')
    ax.set_ylim([0, 1.0])
    ax.grid(axis='y', alpha=0.3)
    
    # Add value labels on bars
    for bar, score in zip(bars, iou_scores):
        height = bar.get_height()
        ax.text(bar.get_x() + bar.get_width()/2., height,
                f'{score:.4f}',
                ha='center', va='bottom', fontweight='bold', fontsize=11)
    
    # Dice comparison
    ax = axes[1]
    dice_scores = [bbox_results['mean_dice'], text_results['overall_mean_dice']]
    bars = ax.bar(methods, dice_scores, color=['#e74c3c', '#9b59b6'], alpha=0.8, edgecolor='black')
    ax.set_ylabel('Mean Dice Score', fontsize=12, fontweight='bold')
    ax.set_title('Mean Dice Comparison', fontsize=14, fontweight='bold')
    ax.set_ylim([0, 1.0])
    ax.grid(axis='y', alpha=0.3)
    
    # Add value labels on bars
    for bar, score in zip(bars, dice_scores):
        height = bar.get_height()
        ax.text(bar.get_x() + bar.get_width()/2., height,
                f'{score:.4f}',
                ha='center', va='bottom', fontweight='bold', fontsize=11)
    
    plt.tight_layout()
    plt.show()
    
    print("\n" + "="*70)
    print("KEY INSIGHTS:")
    print("="*70)
    print("1. BBox Prompts: Upper bound performance (SAM with perfect inputs)")
    print("2. Text Prompts: Real-world performance (includes YOLO detection errors)")
    print("3. Gap indicates YOLO detection quality impact on final segmentation")
    print("="*70)

# Run comparison
compare_evaluation_methods(bbox_results, text_results)

### Step 15: Visual Sample Comparison

Show side-by-side predictions from both methods on sample images.

In [None]:
def visualize_sample_predictions(image_path, label_path, sam_predictor, pipeline, 
                                 text_prompt, conf_threshold=0.3):
    """
    Visualize predictions from both methods on a sample image
    
    Args:
        image_path (str): Path to test image
        label_path (str): Path to corresponding label file
        sam_predictor: SAM predictor instance
        pipeline: TwoStagePipeline instance
        text_prompt (str): Text prompt to use
        conf_threshold (float): YOLO confidence threshold
    """
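    # Load image (cv2.imread returns None instead of raising on a bad path)
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    img_height, img_width = image.shape[:2]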
    
    # Load ground truth
    gt_boxes = []
    with open(label_path, 'r') as f:
        for line in f:
            if line.strip():
                class_id, x1, y1, x2, y2 = yolo_to_bbox(line, img_width, img_height)
                gt_boxes.append((class_id, x1, y1, x2, y2))
    
    # Method 1: SAM with GT BBox prompts
    sam_predictor.set_image(image_rgb)
    bbox_masks = []
    for class_id, x1, y1, x2, y2 in gt_boxes:
        bbox_array = np.array([x1, y1, x2, y2])
        mask, _, _ = sam_predictor.predict(box=bbox_array, multimask_output=False)
        bbox_masks.append(mask[0])
    
    bbox_final_mask = np.zeros((img_height, img_width), dtype=bool)
    for mask in bbox_masks:
        bbox_final_mask = np.logical_or(bbox_final_mask, mask)
    
    # Method 2: Pipeline with text prompt
    text_results = pipeline.process_image(image_path, text_prompt, conf_threshold)
    text_final_mask = text_results['final_mask'] if text_results else np.zeros((img_height, img_width), dtype=bool)
    
    # Ground truth combined mask
    gt_combined_mask = np.zeros((img_height, img_width), dtype=bool)
    for class_id, x1, y1, x2, y2 in gt_boxes:
        gt_mask = create_mask_from_bbox((x1, y1, x2, y2), (img_height, img_width))
        gt_combined_mask = np.logical_or(gt_combined_mask, gt_mask)
    
    # Calculate metrics
    bbox_iou = calculate_iou(bbox_final_mask, gt_combined_mask)
    bbox_dice = calculate_dice(bbox_final_mask, gt_combined_mask)
    text_iou = calculate_iou(text_final_mask, gt_combined_mask)
    text_dice = calculate_dice(text_final_mask, gt_combined_mask)
    
    # Visualization
    fig, axes = plt.subplots(2, 3, figsize=(18, 12))
    
    # Row 1: Method 1 - BBox Prompts
    # Original with GT boxes
    ax = axes[0, 0]
    ax.imshow(image_rgb)
    for class_id, x1, y1, x2, y2 in gt_boxes:
        rect = plt.Rectangle((x1, y1), x2-x1, y2-y1, fill=False, edgecolor='lime', linewidth=3)
        ax.add_patch(rect)
    ax.set_title('Ground Truth Boxes', fontsize=12, fontweight='bold')
    ax.axis('off')
    
    # SAM output with bbox prompts
    ax = axes[0, 1]
    ax.imshow(image_rgb)
    colored_mask = np.zeros_like(image_rgb)
    colored_mask[bbox_final_mask] = [255, 0, 0]
    ax.imshow(colored_mask, alpha=0.5)
    ax.set_title('Method 1: SAM + BBox Prompts', fontsize=12, fontweight='bold')
    ax.axis('off')
    
    # Metrics for method 1
    ax = axes[0, 2]
    ax.text(0.5, 0.6, f'IoU: {bbox_iou:.4f}', ha='center', va='center', fontsize=16, fontweight='bold')
    ax.text(0.5, 0.4, f'Dice: {bbox_dice:.4f}', ha='center', va='center', fontsize=16, fontweight='bold')
    ax.set_title('Method 1 Metrics', fontsize=12, fontweight='bold')
    ax.axis('off')
    
    # Row 2: Method 2 - Text Prompts
    # Original with YOLO detections
    ax = axes[1, 0]
    ax.imshow(image_rgb)
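    # len() works for both lists and NumPy arrays; a bare truth test fails on arrays
    yolo_boxes = text_results['boxes'] if text_results else None
    if yolo_boxes is not None and len(yolo_boxes) > 0:
        for box in yolo_boxes: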
            x1, y1, x2, y2 = box
            rect = plt.Rectangle((x1, y1), x2-x1, y2-y1, fill=False, edgecolor='yellow', linewidth=3)
            ax.add_patch(rect)
    ax.set_title(f'YOLO Detections\n(prompt: "{text_prompt}")', fontsize=12, fontweight='bold')
    ax.axis('off')
    
    # SAM output with text prompts
    ax = axes[1, 1]
    ax.imshow(image_rgb)
    colored_mask = np.zeros_like(image_rgb)
    colored_mask[text_final_mask] = [0, 255, 0]
    ax.imshow(colored_mask, alpha=0.5)
    ax.set_title('Method 2: Pipeline + Text Prompts', fontsize=12, fontweight='bold')
    ax.axis('off')
    
    # Metrics for method 2
    ax = axes[1, 2]
    ax.text(0.5, 0.6, f'IoU: {text_iou:.4f}', ha='center', va='center', fontsize=16, fontweight='bold')
    ax.text(0.5, 0.4, f'Dice: {text_dice:.4f}', ha='center', va='center', fontsize=16, fontweight='bold')
    ax.set_title('Method 2 Metrics', fontsize=12, fontweight='bold')
    ax.axis('off')
    
    plt.tight_layout()
    plt.show()
    
    print(f"Sample: {os.path.basename(image_path)}")
    print(f"Method 1 (BBox):  IoU={bbox_iou:.4f}, Dice={bbox_dice:.4f}")
    print(f"Method 2 (Text):  IoU={text_iou:.4f}, Dice={text_dice:.4f}")
    print(f"Difference:       IoU={bbox_iou-text_iou:+.4f}, Dice={bbox_dice-text_dice:+.4f}")

# Example: visualize a sample image (update the paths, then uncomment to run)
# sample_image = 'merged_dataset/test/images/sample.jpg'
# sample_label = 'merged_dataset/test/labels/sample.txt'
#
# visualize_sample_predictions(
#     image_path=sample_image,
#     label_path=sample_label,
#     sam_predictor=sam_predictor,
#     pipeline=pipeline,
#     text_prompt='find cracks',  # or 'segment taping area'
#     conf_threshold=0.3
# )
print("Visual comparison function ready! Uncomment and update paths to use.")

---

## 📋 Evaluation Summary & Guide

### What We're Evaluating

We measure segmentation quality using two metrics:

1. **mIoU (mean Intersection over Union)**
   - Formula: `Intersection / Union`
   - Range: 0 to 1 (higher is better)
   - Measures: How much predicted and ground truth masks overlap
   - Interpretation:
     - > 0.9: Excellent
     - 0.7-0.9: Good
     - 0.5-0.7: Moderate
     - < 0.5: Poor

2. **Dice Score (F1 Score)**
   - Formula: `2 × Intersection / (Predicted + Ground Truth)`
   - Range: 0 to 1 (higher is better)
   - Measures: Harmonic mean of precision and recall
   - Never lower than IoU for the same pair of masks, so it penalizes false positives/negatives less harshly than IoU
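
To make the two formulas concrete, here is a minimal sketch on toy masks. The helper `iou_and_dice` is illustrative and written in plain Python over nested lists so it runs without the notebook's dependencies; the notebook's own `calculate_iou` and `calculate_dice` from Step 11 operate on NumPy arrays and may differ in detail.

```python
def iou_and_dice(pred, truth):
    """Return (IoU, Dice) for two equally sized binary masks given as nested lists."""
    intersection = pred_area = truth_area = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            p, t = bool(p), bool(t)
            intersection += p and t
            pred_area += p
            truth_area += t
    union = pred_area + truth_area - intersection
    if union == 0:
        # Both masks empty: treated as a perfect match here (a convention, not a rule)
        return 1.0, 1.0
    return intersection / union, 2 * intersection / (pred_area + truth_area)


# Toy 4x4 masks: the prediction covers the two left columns,
# the truth covers the two middle columns, so they share one column.
pred = [[1, 1, 0, 0]] * 4
truth = [[0, 1, 1, 0]] * 4

iou, dice = iou_and_dice(pred, truth)
print(f"IoU={iou:.4f}, Dice={dice:.4f}")

# For a single pair of masks the two metrics are tied together:
print(f"2*IoU/(1+IoU)={2 * iou / (1 + iou):.4f}")
```

The identity in the last line holds per mask pair only; means taken over many images can drift apart, which is why both numbers are reported.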

### Two Evaluation Methods

#### Method 1: SAM with Bounding Box Prompts ✅
**Purpose**: Measure SAM's segmentation quality with perfect inputs

**Process**:
1. Load ground truth bounding boxes from YOLO labels
2. Feed boxes directly to SAM
3. Compare SAM masks with ground truth masks rasterized from those boxes
4. Calculate mIoU and Dice

**This tells us**: Upper bound performance - how good SAM is when given perfect bounding boxes
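
The evaluation code builds its reference masks from YOLO box labels, so each ground truth instance is a filled rectangle. A minimal sketch of that conversion follows, assuming detection-format label lines (`class cx cy w h`, normalized to the image size). The names `label_to_pixel_box` and `box_to_mask` are illustrative stand-ins, not the notebook's `yolo_to_bbox` and `create_mask_from_bbox`, whose exact behavior may differ.

```python
def label_to_pixel_box(line, img_width, img_height):
    """Convert one 'class cx cy w h' label line (normalized) to a pixel box."""
    fields = line.split()
    class_id = int(fields[0])
    cx, cy, w, h = (float(v) for v in fields[1:5])
    x1 = int(round((cx - w / 2) * img_width))
    y1 = int(round((cy - h / 2) * img_height))
    x2 = int(round((cx + w / 2) * img_width))
    y2 = int(round((cy + h / 2) * img_height))
    # Clamp to the image so slightly out-of-range labels cannot index outside it
    x1, x2 = max(0, x1), min(img_width, x2)
    y1, y2 = max(0, y1), min(img_height, y2)
    return class_id, x1, y1, x2, y2


def box_to_mask(box, img_width, img_height):
    """Rasterize a pixel box (x1, y1, x2, y2) into a nested-list binary mask."""
    x1, y1, x2, y2 = box
    return [[x1 <= x < x2 and y1 <= y < y2 for x in range(img_width)]
            for y in range(img_height)]


# A box centered in an 8x8 image, half the width and a quarter of the height
class_id, *box = label_to_pixel_box("0 0.5 0.5 0.5 0.25", img_width=8, img_height=8)
mask = box_to_mask(box, img_width=8, img_height=8)
print(class_id, box, sum(map(sum, mask)))
```

Because the reference is a rectangle, a tight SAM mask of a thin structure such as a crack may score a modest IoU even when the segmentation looks right, which is worth keeping in mind when reading the Method 1 numbers.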

---

#### Method 2: Complete Pipeline with Text Prompts 🔄
**Purpose**: Measure real-world end-to-end performance

**Process**:
1. User provides text prompt (e.g., "find cracks")
2. Prompt mapper converts to class
3. YOLO detects and filters objects
4. SAM segments using YOLO's boxes
5. Compare final masks with ground truth
6. Calculate mIoU and Dice

**This tells us**: Real-world performance including YOLO detection errors
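
Step 2 of this process can be pictured as a keyword lookup. The sketch below is a hypothetical stand-in for the notebook's prompt mapper: the keyword lists and the name `map_prompt_to_class` are assumptions chosen to cover the test prompts used in Step 13, and the real mapper may follow different rules.

```python
# Hypothetical keyword table; class ids follow the notebook (0 = crack, 1 = drywall-join)
CLASS_KEYWORDS = {
    0: ("crack",),
    1: ("drywall", "joint", "taping", "seam"),
}


def map_prompt_to_class(prompt):
    """Return the class id whose keyword appears in the prompt, or None if none match."""
    text = prompt.lower()
    for class_id, keywords in CLASS_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return class_id
    return None


for prompt in ["find cracks", "segment taping area", "detect drywall", "find windows"]:
    print(f"{prompt!r} -> {map_prompt_to_class(prompt)}")
```

Returning `None` for an unmatched prompt lets the caller report that no class matched instead of silently segmenting the wrong thing.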

---

### How to Run the Evaluation

**Prerequisites** (must be run first):
- Complete Part A (Training) to get trained YOLO model
- Complete Part B (Steps 5-7) to initialize SAM and the pipeline

**Then run**:
1. **Step 11**: Load metric calculation functions
2. **Step 12**: Run Method 1 evaluation (BBox prompts)
3. **Step 13**: Run Method 2 evaluation (Text prompts)
4. **Step 14**: Compare both methods
5. **Step 15**: Visualize sample predictions (optional)

**Adjust these parameters**:
- `max_samples`: Number of test images (None = all, 50 = faster testing)
- `conf_threshold`: YOLO confidence (0.3 is balanced, higher = fewer detections)
- `test_prompts`: Add/modify text prompts to test
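
One way to choose `conf_threshold` is to sweep a few values and compare the resulting mean IoU. The sketch below assumes a callable `evaluate(conf_threshold)` that returns a single score; in the notebook it could wrap `evaluate_pipeline_with_text_prompts` and read `overall_mean_iou` from the result. The `toy_evaluate` function is a made-up curve so the example runs on its own, not a measurement.

```python
def sweep_thresholds(evaluate, thresholds):
    """Run evaluate() at each threshold and return (best_threshold, scores)."""
    scores = {t: evaluate(t) for t in thresholds}
    best = max(scores, key=scores.get)
    return best, scores


def toy_evaluate(conf_threshold):
    # Toy curve that peaks at 0.3; a placeholder for a real evaluation run
    return 1.0 - abs(conf_threshold - 0.3)


best, scores = sweep_thresholds(toy_evaluate, [0.2, 0.3, 0.4, 0.5])
for threshold, score in scores.items():
    print(f"conf={threshold:.1f}  score={score:.2f}")
print("best threshold:", best)
```

Tuning on the test split leaks into the reported score, so a validation split is the safer place to run such a sweep.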

---

### Interpreting Results

**Performance Gap Analysis**:
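- Gap = (Method 1 score - Method 2 score) / Method 1 score, expressed as a percentage (as printed in Step 14)
- **Small gap** (< 5%): YOLO detections are nearly as useful to SAM as the ground truth boxes
- **Medium gap** (5-15%): Some detection errors are affecting segmentation
- **Large gap** (> 15%): YOLO needs improvement, or the confidence threshold should be lowered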
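
The bands above can be applied mechanically. This sketch computes the relative gap the same way the Step 14 comparison prints it (difference divided by the Method 1 score) and maps it to the three bands. The function name `classify_gap` and the input scores are illustrative.

```python
def classify_gap(bbox_score, text_score):
    """Return (relative_gap_percent, band) for a Method 1 vs Method 2 score pair."""
    if bbox_score <= 0:
        raise ValueError("Method 1 score must be positive to compute a relative gap")
    gap_pct = (bbox_score - text_score) / bbox_score * 100
    if gap_pct < 5:
        band = "small"
    elif gap_pct <= 15:
        band = "medium"
    else:
        band = "large"
    return gap_pct, band


# Hypothetical scores, chosen only to exercise the function
gap_pct, band = classify_gap(bbox_score=0.80, text_score=0.70)
print(f"gap = {gap_pct:.1f}% ({band})")
```

A negative gap is possible when the pipeline happens to score higher than the box-prompt baseline; it falls into the "small" band here.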

**Per-Class Analysis**:
- Compare crack vs drywall-join performance
- Identifies which class is harder to detect/segment
- May need class-specific threshold tuning
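
The per-prompt results returned by `evaluate_pipeline_with_text_prompts` already carry a `class_name`, so a per-class view is a small aggregation over `text_results['prompt_details']`. A minimal sketch, with placeholder numbers standing in for real results and `summarize_by_class` as an illustrative helper name:

```python
from collections import defaultdict


def summarize_by_class(prompt_details):
    """Average mean_iou and mean_dice over the prompts of each class."""
    grouped = defaultdict(list)
    for result in prompt_details.values():
        grouped[result["class_name"]].append(result)
    return {
        name: {
            "mean_iou": sum(r["mean_iou"] for r in results) / len(results),
            "mean_dice": sum(r["mean_dice"] for r in results) / len(results),
            "num_prompts": len(results),
        }
        for name, results in grouped.items()
    }


# Placeholder values for illustration only; substitute text_results['prompt_details']
example_details = {
    "find cracks": {"class_name": "crack", "mean_iou": 0.5, "mean_dice": 0.6},
    "detect crack": {"class_name": "crack", "mean_iou": 0.7, "mean_dice": 0.8},
    "segment joint": {"class_name": "drywall-join", "mean_iou": 0.4, "mean_dice": 0.5},
}

per_class = summarize_by_class(example_details)
for name, stats in per_class.items():
    print(f"{name}: IoU={stats['mean_iou']:.2f}, Dice={stats['mean_dice']:.2f} "
          f"({stats['num_prompts']} prompts)")
```

This averages prompt-level means, so each prompt counts equally regardless of how many images it was evaluated on.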

---

### Expected Results

**Typical Performance Ranges**:
- **Method 1 (BBox)**: IoU 0.85-0.95, Dice 0.90-0.97
  - SAM is very good at segmentation with good boxes
  
- **Method 2 (Text)**: IoU 0.70-0.85, Dice 0.80-0.92
  - Depends heavily on YOLO detection quality
  - Text prompt quality also matters

**If your scores are low**:
1. Check if YOLO model trained properly
2. Verify SAM model loaded correctly
3. Try adjusting confidence threshold
4. Ensure test data has good annotations
5. Check if prompt mapping works correctly
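
For item 4, a quick scan of the label files catches common annotation problems before they distort the metrics. The sketch below assumes detection-format labels (`class cx cy w h`, normalized) and the two class ids used in this notebook; `check_label_lines` is an illustrative helper, not part of the pipeline.

```python
def check_label_lines(lines, valid_class_ids=(0, 1)):
    """Return a list of (line_number, problem) for malformed YOLO label lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # blank lines are ignored, as in the evaluation loop
        if len(fields) != 5:
            problems.append((number, f"expected 5 fields, found {len(fields)}"))
            continue
        try:
            class_id = int(fields[0])
            coords = [float(v) for v in fields[1:]]
        except ValueError:
            problems.append((number, "non-numeric value"))
            continue
        if class_id not in valid_class_ids:
            problems.append((number, f"unknown class id {class_id}"))
        if not all(0.0 <= v <= 1.0 for v in coords):
            problems.append((number, "coordinate outside [0, 1]"))
        elif coords[2] == 0 or coords[3] == 0:
            problems.append((number, "zero-size box"))
    return problems


sample = [
    "0 0.5 0.5 0.2 0.1",    # valid
    "2 0.5 0.5 0.2 0.1",    # class id not in the merged dataset
    "1 0.5 1.4 0.2 0.1",    # y-center outside the image
    "1 0.5 0.5 0.2",        # missing a field
]
for number, problem in check_label_lines(sample):
    print(f"line {number}: {problem}")
```

Polygon (segmentation) labels carry more than five fields per line and would need a different check.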

### Quick Start: Complete Workflow

```python
# ============================================================
# COMPLETE EXECUTION ORDER
# ============================================================

# --- PART A: ONE-TIME TRAINING (Run these in order) ---
# Step 1: Install dependencies
# Step 2: Merge datasets
# Step 3: Train YOLO model (takes time!)
# Step 4: Validate model

# --- PART B: INFERENCE SETUP (Run these in order) ---
# Step 5: Load SAM model
# Step 6: Initialize prompt mapper
# Step 7: Create pipeline

# --- EVALUATION (Run these in order) ---
# Step 11: Load metric functions
# Step 12: Evaluate with BBox prompts
# Step 13: Evaluate with Text prompts
# Step 14: Compare results
# Step 15: (Optional) Visualize samples

# ============================================================
# ROUGH RANGES TO EXPECT (same as "Expected Results" above)
# ============================================================
"""
Method 1 (BBox Prompts):
  Mean IoU:  0.85-0.95
  Mean Dice: 0.90-0.97
  
Method 2 (Text Prompts):
  Mean IoU:  0.70-0.85
  Mean Dice: 0.80-0.92
  
Performance Gap: 5-15% (due to YOLO detection errors)
"""
```