# Nucleus Mask Explainability v0.1

This notebook segments nuclei in cervical cancer cell images with Cellpose. It reads the preprocessed images from the augmented dataset and writes one nucleus mask per image for downstream explainability analysis.

## Project Overview
- **Dataset**: Cervical cancer cell images from 5 categories (Dyskeratotic, Koilocytotic, Metaplastic, Parabasal, Superficial-Intermediate)
- **Goal**: Generate nucleus masks using Cellpose for downstream explainability analysis
- **Output**: One segmentation mask per image, saved to a "Nucleus Masks" folder inside each category, plus per-image quality metrics (nucleus count, coverage, area, circularity)

In [14]:
# Import required libraries
import os
import sys
import numpy as np
import pandas as pd
import cv2
from pathlib import Path
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm.notebook import tqdm
import warnings
import json
from datetime import datetime
import logging
from typing import List, Tuple, Dict, Optional
from dataclasses import dataclass

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore', category=UserWarning)
warnings.filterwarnings('ignore', category=FutureWarning)

print("✅ Basic libraries imported successfully")
print(f"📊 NumPy version: {np.__version__}")
print(f"🖼️ OpenCV version: {cv2.__version__}")
print(f"📈 Matplotlib version: {plt.matplotlib.__version__}")

✅ Basic libraries imported successfully
📊 NumPy version: 1.26.4
🖼️ OpenCV version: 4.12.0
📈 Matplotlib version: 3.8.3


In [6]:
# Import Cellpose and specialized libraries for segmentation
try:
    from cellpose import models, core, utils, plot
    from cellpose.io import imread
    print("✅ Cellpose imported successfully")
    
    # Check if GPU is available for Cellpose
    use_GPU = core.use_gpu()
    print(f"🔥 GPU available for Cellpose: {use_GPU}")
    if use_GPU:
        print("   Using GPU acceleration for segmentation")
    else:
        print("   Using CPU for segmentation")
        
except ImportError as e:
    print("❌ Error importing Cellpose:", str(e))
    print("Please install Cellpose using: pip install cellpose")

# Additional specialized imports
from skimage import measure, morphology, segmentation, filters
from skimage.metrics import structural_similarity as ssim
from scipy import ndimage
from natsort import natsorted
import time

print("✅ All specialized libraries imported successfully")



Welcome to CellposeSAM, cellpose v
cellpose version: 	4.0.6 
platform:       	win32 
python version: 	3.12.10 
torch version:  	2.2.2+cpu! The neural network component of
CPSAM is much larger than in previous versions and CPU excution is slow. 
We encourage users to use GPU/MPS if available. 


✅ Cellpose imported successfully


2025-09-25 15:55:17,540 - INFO - Neither TORCH CUDA nor MPS version not installed/working.


🔥 GPU available for Cellpose: False
   Using CPU for segmentation
✅ All specialized libraries imported successfully


## Configuration and Data Structures

Let's define the configuration parameters and data structures that will be used throughout the segmentation process.

In [7]:
@dataclass
class SegmentationConfig:
    """Configuration class for nucleus segmentation parameters."""
    
    # Dataset paths
    dataset_root: str = r"c:\Meet\Projects\Project_8_Phoenix_Cervical Cancer Image Classification\Project-Phoenix\Dataset\Augmented Dataset - Limited Enhancement"
    input_folder: str = "NLM_CLAHE"
    output_folder: str = "Nucleus Masks"
    
    # Cell categories
    cell_categories: Optional[List[str]] = None  # Filled in by __post_init__ when left as None
    
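    # Cellpose parameters
    # Note: the Cellpose 4.0.6 log output in this notebook shows the 'cpsam' weights
    # being loaded, so this name may not select a separate nuclei model.
    model_type: str = "nuclei"  # Options: 'cyto', 'nuclei', 'cyto2', 'CP'
    diameter: Optional[float] = None  # Let Cellpose estimate
    channels: Optional[List[int]] = None  # [0, 0] for grayscale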
    flow_threshold: float = 0.4  # Default flow threshold
    cellprob_threshold: float = 0.0  # Default cell probability threshold
    
    # Image processing parameters
    min_nucleus_size: int = 50  # Minimum nucleus area in pixels
    max_nucleus_size: int = 5000  # Maximum nucleus area in pixels
    
    # Visualization parameters
    figsize: Tuple[int, int] = (15, 10)
    dpi: int = 100
    
    def __post_init__(self):
        """Initialize derived attributes after dataclass creation."""
        if self.cell_categories is None:
            self.cell_categories = [
                "im_Dyskeratotic",
                "im_Koilocytotic", 
                "im_Metaplastic",
                "im_Parabasal",
                "im_Superficial-Intermediate"
            ]
        
        if self.channels is None:
            self.channels = [0, 0]  # Grayscale processing

# Create configuration instance
config = SegmentationConfig()

print("✅ Configuration initialized")
print(f"📂 Dataset root: {config.dataset_root}")
print(f"🏷️ Cell categories: {len(config.cell_categories)}")
print(f"🔬 Model type: {config.model_type}")
print(f"📏 Diameter estimation: {'Auto' if config.diameter is None else config.diameter}")
print(f"🌊 Flow threshold: {config.flow_threshold}")
print(f"📊 Cell probability threshold: {config.cellprob_threshold}")

✅ Configuration initialized
📂 Dataset root: c:\Meet\Projects\Project_8_Phoenix_Cervical Cancer Image Classification\Project-Phoenix\Dataset\Augmented Dataset - Limited Enhancement
🏷️ Cell categories: 5
🔬 Model type: nuclei
📏 Diameter estimation: Auto
🌊 Flow threshold: 0.4
📊 Cell probability threshold: 0.0
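
The configuration above sets list-valued fields to `None` and fills them in inside `__post_init__`. An alternative idiom is `dataclasses.field(default_factory=...)`, which gives every instance its own list without a post-init hook. The following is a minimal sketch under that assumption; `DemoConfig` is a hypothetical stand-in, not the notebook's `SegmentationConfig`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DemoConfig:
    """Hypothetical, trimmed-down stand-in for SegmentationConfig."""

    # default_factory builds a fresh list per instance, so instances never share state
    channels: List[int] = field(default_factory=lambda: [0, 0])
    diameter: Optional[float] = None  # None means "let the model estimate"
    flow_threshold: float = 0.4


default_cfg = DemoConfig()
strict_cfg = DemoConfig(flow_threshold=0.6)  # override one parameter for an experiment

# Mutating one instance's list leaves the other untouched
default_cfg.channels.append(1)
print(default_cfg.channels, strict_cfg.channels)
```

Overriding a single field at construction time, as `strict_cfg` does, is a convenient way to compare parameter settings without editing the class defaults.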


## Nucleus Segmentation Functions

Let's implement the nucleus segmentation functions: preprocessing, Cellpose inference with a fallback to an empty mask on failure, size-based filtering of the detected nuclei, and per-image quality metrics.
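
The size-filtering step can be illustrated in isolation. The following is a NumPy-only sketch of the idea, not the class's implementation: `filter_labels_by_area` is a hypothetical helper, and the tiny `demo` mask is made-up data. It keeps labels whose pixel count lies within a range and renumbers the survivors consecutively through a lookup table, so nuclei that touch each other stay separate.

```python
import numpy as np


def filter_labels_by_area(mask: np.ndarray, min_area: int, max_area: int) -> np.ndarray:
    """Keep labels with min_area <= pixel count <= max_area and renumber them 1..K."""
    areas = np.bincount(mask.ravel())            # areas[i] = number of pixels with label i
    keep = (areas >= min_area) & (areas <= max_area)
    keep[0] = False                              # label 0 is background, never a nucleus
    lut = np.zeros(areas.size, dtype=np.uint16)  # old label -> new label (0 = dropped)
    lut[keep] = np.arange(1, keep.sum() + 1)
    return lut[mask]                             # apply the lookup table per pixel


# Made-up label image: two touching objects, one single-pixel speck, one separate object
demo = np.zeros((6, 8), dtype=np.uint16)
demo[0:2, 0:2] = 1   # area 4
demo[0:3, 2:5] = 2   # area 9, adjacent to label 1
demo[4, 7] = 3       # area 1, should be removed
demo[3:6, 0:3] = 4   # area 9

filtered = filter_labels_by_area(demo, min_area=4, max_area=20)
print(filtered)
```

Because the renumbering works on label values rather than on connected components of the binary mask, labels 1 and 2 remain distinct even though they share a border.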

In [8]:
class NucleusSegmentator:
    """
    A robust nucleus segmentation class using Cellpose for cervical cancer cell images.
    
    This class provides comprehensive nucleus segmentation capabilities including:
    - Cellpose-based segmentation with optimized parameters
    - Post-processing for nucleus size filtering
    - Quality metrics calculation
    - Visualization capabilities
    """
    
    def __init__(self, config: SegmentationConfig):
        """
        Initialize the nucleus segmentator with configuration.
        
        Args:
            config: SegmentationConfig object containing all parameters
        """
        self.config = config
        self.model = None
        self._initialize_model()
    
    def _initialize_model(self):
        """Initialize the Cellpose model."""
        try:
            # Use the correct Cellpose API for model initialization
            self.model = models.CellposeModel(gpu=core.use_gpu(), model_type=self.config.model_type)
            logger.info(f"Cellpose model '{self.config.model_type}' initialized successfully")
        except Exception as e:
            logger.error(f"Failed to initialize Cellpose model: {str(e)}")
            raise
    
    def preprocess_image(self, image: np.ndarray) -> np.ndarray:
        """
        Preprocess the input image for optimal segmentation.
        
        Args:
            image: Input image as numpy array
            
        Returns:
            Preprocessed image
        """
        # Convert to grayscale if needed
        if len(image.shape) == 3:
            image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        
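        # Ensure image is in proper data type and range
        if image.dtype != np.uint8:
            if image.max() <= 1.0:
                # Values in [0, 1]: scale up to the 8-bit range
                image = (image * 255).astype(np.uint8)
            else:
                # Clip before casting so out-of-range values saturate instead of wrapping around
                image = np.clip(image, 0, 255).astype(np.uint8)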
        
        # Optional: Apply mild denoising if needed
        # image = cv2.bilateralFilter(image, 5, 50, 50)
        
        return image
    
    def segment_nucleus(self, image: np.ndarray, return_flows: bool = False) -> Tuple[np.ndarray, Optional[np.ndarray]]:
        """
        Perform nucleus segmentation on a single image.
        
        Args:
            image: Input image as numpy array
            return_flows: Whether to return flow field information
            
        Returns:
            Tuple of (mask, flows) where flows is None if return_flows=False
        """
        try:
            # Preprocess image
            processed_image = self.preprocess_image(image)
            
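            # Run Cellpose segmentation. Index the result instead of unpacking a fixed
            # number of values: CellposeModel.eval is documented to return
            # (masks, flows, styles), so a four-way unpack would raise and be
            # swallowed by the except block below, yielding an empty mask.
            outputs = self.model.eval(
                processed_image,
                diameter=self.config.diameter,
                channels=self.config.channels,
                flow_threshold=self.config.flow_threshold,
                cellprob_threshold=self.config.cellprob_threshold
            )
            masks, flows = outputs[0], outputs[1]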
            
            # Post-process masks
            masks = self.post_process_masks(masks)
            
            if return_flows:
                return masks, flows
            else:
                return masks, None
                
        except Exception as e:
            logger.error(f"Segmentation failed: {str(e)}")
            # Return empty mask in case of failure
            return np.zeros_like(image[:,:,0] if len(image.shape)==3 else image, dtype=np.uint16), None
    
    def post_process_masks(self, masks: np.ndarray) -> np.ndarray:
        """
        Post-process segmentation masks to filter nuclei by size and clean up artifacts.
        
        Args:
            masks: Raw segmentation masks from Cellpose
            
        Returns:
            Cleaned and filtered masks
        """
        if masks.max() == 0:
            return masks
        
        # Get region properties
        props = measure.regionprops(masks)
        
        # Filter regions by area
        valid_labels = []
        for prop in props:
            area = prop.area
            if self.config.min_nucleus_size <= area <= self.config.max_nucleus_size:
                valid_labels.append(prop.label)
        
        # Create new mask with only valid nuclei
        filtered_mask = np.zeros_like(masks)
        for label in valid_labels:
            filtered_mask[masks == label] = label
        
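        # Relabel to ensure consecutive numbering. relabel_sequential keeps touching
        # nuclei separate, whereas measure.label(filtered_mask > 0) would merge them.
        filtered_mask, _, _ = segmentation.relabel_sequential(filtered_mask)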
        
        return filtered_mask.astype(np.uint16)
    
    def calculate_segmentation_metrics(self, image: np.ndarray, mask: np.ndarray) -> Dict[str, float]:
        """
        Calculate quality metrics for segmentation results.
        
        Args:
            image: Original image
            mask: Segmentation mask
            
        Returns:
            Dictionary of quality metrics
        """
        metrics = {}
        
        # Basic counting metrics
        num_nuclei = len(np.unique(mask)) - 1  # Exclude background
        metrics['num_nuclei'] = float(num_nuclei)
        
        # Coverage metrics
        total_pixels = image.size if len(image.shape) == 2 else image.shape[0] * image.shape[1]
        nucleus_pixels = np.sum(mask > 0)
        metrics['nucleus_coverage'] = float(nucleus_pixels / total_pixels)
        
        # Size metrics
        if num_nuclei > 0:
            props = measure.regionprops(mask)
            areas = [prop.area for prop in props]
            metrics['mean_nucleus_area'] = float(np.mean(areas))
            metrics['std_nucleus_area'] = float(np.std(areas))
            metrics['min_nucleus_area'] = float(np.min(areas))
            metrics['max_nucleus_area'] = float(np.max(areas))
            
            # Circularity metrics
            circularities = []
            for prop in props:
                perimeter = prop.perimeter
                area = prop.area
                if perimeter > 0:
                    circularity = 4 * np.pi * area / (perimeter ** 2)
                    circularities.append(circularity)
            
            if circularities:
                metrics['mean_circularity'] = float(np.mean(circularities))
                metrics['std_circularity'] = float(np.std(circularities))
        else:
            metrics.update({
                'mean_nucleus_area': 0.0,
                'std_nucleus_area': 0.0,
                'min_nucleus_area': 0.0,
                'max_nucleus_area': 0.0,
                'mean_circularity': 0.0,
                'std_circularity': 0.0
            })
        
        return metrics

# Initialize the segmentator
segmentator = NucleusSegmentator(config)
print("✅ Nucleus segmentator initialized successfully")
print(f"🤖 Using model: {config.model_type}")
print(f"⚡ GPU acceleration: {core.use_gpu()}")

2025-09-25 15:55:17,616 - INFO - Neither TORCH CUDA nor MPS version not installed/working.
2025-09-25 15:55:17,619 - INFO - >>>> using CPU
2025-09-25 15:55:17,621 - INFO - >>>> using CPU


2025-09-25 15:55:20,762 - INFO - >>>> loading model C:\Users\meetb\.cellpose\models\cpsam
2025-09-25 15:55:23,629 - INFO - Cellpose model 'nuclei' initialized successfully
2025-09-25 15:55:23,634 - INFO - Neither TORCH CUDA nor MPS version not installed/working.


✅ Nucleus segmentator initialized successfully
🤖 Using model: nuclei
⚡ GPU acceleration: False
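
The circularity metric computed in `calculate_segmentation_metrics` is $4\pi A / P^2$, where $A$ is the area and $P$ the perimeter. A short worked example with ideal shapes helps calibrate the values; this sketch uses exact geometric areas and perimeters (an assumption, since pixel-based perimeter estimates on real masks can deviate from these, especially for small objects).

```python
import math


def circularity(area: float, perimeter: float) -> float:
    """Return 4*pi*A / P^2: 1.0 for a perfect circle, smaller for less compact shapes."""
    return 4 * math.pi * area / perimeter ** 2


radius = 10.0
side = 10.0

circle_value = circularity(math.pi * radius ** 2, 2 * math.pi * radius)  # ideal circle
square_value = circularity(side ** 2, 4 * side)                          # ideal square

print(f"circle: {circle_value:.3f}, square: {square_value:.3f}")
```

For the ideal circle the expression reduces to 1, and for the square it reduces to $\pi/4 \approx 0.785$, so mean circularity values well below that range suggest elongated or irregular nuclei.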


## Batch Processing Functions

Now let's implement the batch processing functionality to handle all images in the dataset and create the necessary output folders.
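
The folder convention used by the batch processor maps each input image to a mask in a sibling folder of the same category. The following sketch shows that mapping with `pathlib` only; `mask_path_for` is a hypothetical helper, and the example file name `001_01.bmp` and the `dataset` root are invented for illustration.

```python
from pathlib import PurePosixPath


def mask_path_for(image_path: str,
                  input_folder: str = "NLM_CLAHE",
                  output_folder: str = "Nucleus Masks") -> PurePosixPath:
    """Map <root>/<category>/<input_folder>/<name>.bmp to
    <root>/<category>/<output_folder>/<name>_mask.png."""
    path = PurePosixPath(image_path)
    if path.parent.name != input_folder:
        raise ValueError(f"Expected image inside a '{input_folder}' folder: {image_path}")
    category_dir = path.parent.parent            # e.g. <root>/im_Parabasal
    return category_dir / output_folder / f"{path.stem}_mask.png"


# Invented example path; real file names in the dataset may differ
example = mask_path_for("dataset/im_Parabasal/NLM_CLAHE/001_01.bmp")
print(example)
```

Saving masks as PNG rather than BMP keeps the label values lossless and leaves the input folder untouched.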

In [9]:
class DatasetProcessor:
    """
    Handles batch processing of the entire dataset for nucleus segmentation.
    
    This class manages:
    - Dataset directory traversal
    - Output folder creation
    - Batch processing with progress tracking
    - Results aggregation and reporting
    """
    
    def __init__(self, segmentator: NucleusSegmentator, config: SegmentationConfig):
        """
        Initialize the dataset processor.
        
        Args:
            segmentator: NucleusSegmentator instance
            config: SegmentationConfig object
        """
        self.segmentator = segmentator
        self.config = config
        self.processing_results = []
        
    def get_image_paths(self, category: str) -> List[Path]:
        """
        Get all image paths for a specific cell category.
        
        Args:
            category: Cell category name
            
        Returns:
            List of image file paths
        """
        input_dir = Path(self.config.dataset_root) / category / self.config.input_folder
        if not input_dir.exists():
            logger.warning(f"Input directory does not exist: {input_dir}")
            return []
        
        # Get all .bmp files and sort them naturally
        image_paths = list(input_dir.glob("*.bmp"))
        return natsorted(image_paths)
    
    def create_output_directory(self, category: str) -> Path:
        """
        Create output directory for nucleus masks.
        
        Args:
            category: Cell category name
            
        Returns:
            Path to output directory
        """
        output_dir = Path(self.config.dataset_root) / category / self.config.output_folder
        output_dir.mkdir(parents=True, exist_ok=True)
        logger.info(f"Output directory ready: {output_dir}")
        return output_dir
    
    def process_single_image(self, image_path: Path, output_path: Path) -> Dict:
        """
        Process a single image and save the nucleus mask.
        
        Args:
            image_path: Path to input image
            output_path: Path to save output mask
            
        Returns:
            Dictionary with processing results and metrics
        """
        result = {
            'image_path': str(image_path),
            'output_path': str(output_path),
            'success': False,
            'processing_time': 0.0,
            'error_message': None,
            'metrics': {}
        }
        
        try:
            start_time = time.time()
            
            # Load image
            image = cv2.imread(str(image_path), cv2.IMREAD_GRAYSCALE)
            if image is None:
                raise ValueError(f"Failed to load image: {image_path}")
            
            # Perform segmentation
            mask, _ = self.segmentator.segment_nucleus(image)
            
            # Calculate metrics
            metrics = self.segmentator.calculate_segmentation_metrics(image, mask)
            result['metrics'] = metrics
            
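            # Save mask as a 16-bit PNG. cv2.imwrite returns False instead of raising
            # on failure, so check the result. Read the mask back with
            # cv2.IMREAD_UNCHANGED to preserve the 16-bit labels.
            if not cv2.imwrite(str(output_path), mask):
                raise IOError(f"Failed to write mask: {output_path}")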
            
            # Record success
            result['success'] = True
            result['processing_time'] = time.time() - start_time
            
            logger.debug(f"Processed {image_path.name}: {metrics['num_nuclei']} nuclei found")
            
        except Exception as e:
            result['error_message'] = str(e)
            logger.error(f"Failed to process {image_path}: {str(e)}")
        
        return result
    
    def process_category(self, category: str, limit: Optional[int] = None) -> List[Dict]:
        """
        Process all images in a specific category.
        
        Args:
            category: Cell category name
            limit: Optional limit on number of images to process (for testing)
            
        Returns:
            List of processing results
        """
        logger.info(f"Processing category: {category}")
        
        # Get image paths
        image_paths = self.get_image_paths(category)
        if not image_paths:
            logger.warning(f"No images found for category {category}")
            return []
        
        # Apply limit if specified
        if limit is not None:
            image_paths = image_paths[:limit]
            logger.info(f"Limited to first {limit} images for testing")
        
        # Create output directory
        output_dir = self.create_output_directory(category)
        
        # Process each image
        category_results = []
        
        with tqdm(image_paths, desc=f"Processing {category}", unit="images") as pbar:
            for image_path in pbar:
                # Create output filename (same name but different extension)
                output_filename = image_path.stem + "_mask.png"
                output_path = output_dir / output_filename
                
                # Process image
                result = self.process_single_image(image_path, output_path)
                result['category'] = category
                category_results.append(result)
                
                # Update progress bar with metrics
                if result['success']:
                    num_nuclei = result['metrics'].get('num_nuclei', 0)
                    pbar.set_postfix({
                        'nuclei': int(num_nuclei),
                        'coverage': f"{result['metrics'].get('nucleus_coverage', 0):.2%}"
                    })
        
        # Log summary
        successful_count = sum(1 for r in category_results if r['success'])
        logger.info(f"Category {category}: {successful_count}/{len(category_results)} images processed successfully")
        
        return category_results
    
    def process_all_categories(self, limit_per_category: Optional[int] = None) -> List[Dict]:
        """
        Process all categories in the dataset.
        
        Args:
            limit_per_category: Optional limit on images per category (for testing)
            
        Returns:
            List of all processing results
        """
        all_results = []
        
        logger.info(f"Starting batch processing of {len(self.config.cell_categories)} categories")
        
        for category in self.config.cell_categories:
            category_results = self.process_category(category, limit=limit_per_category)
            all_results.extend(category_results)
        
        self.processing_results = all_results
        self._log_overall_summary()
        
        return all_results
    
    def _log_overall_summary(self):
        """Log overall processing summary."""
        if not self.processing_results:
            return
        
        total_images = len(self.processing_results)
        successful_images = sum(1 for r in self.processing_results if r['success'])
        total_nuclei = sum(r['metrics'].get('num_nuclei', 0) for r in self.processing_results if r['success'])
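        successful_times = [r['processing_time'] for r in self.processing_results if r['success']]
        # Guard against an empty list, for which np.mean would return NaN with a warning
        avg_processing_time = float(np.mean(successful_times)) if successful_times else 0.0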
        
        logger.info("="*60)
        logger.info("PROCESSING SUMMARY")
        logger.info("="*60)
        logger.info(f"Total images processed: {total_images}")
        logger.info(f"Successful segmentations: {successful_images}")
        logger.info(f"Success rate: {successful_images/total_images:.1%}")
        logger.info(f"Total nuclei detected: {int(total_nuclei)}")
        logger.info(f"Average processing time: {avg_processing_time:.2f}s")
        logger.info("="*60)
    
    def save_results_summary(self, output_path: str):
        """
        Save processing results to a CSV file.
        
        Args:
            output_path: Path to save the results CSV
        """
        if not self.processing_results:
            logger.warning("No results to save")
            return
        
        # Prepare data for DataFrame
        df_data = []
        for result in self.processing_results:
            row = {
                'category': result.get('category', ''),
                'image_path': result['image_path'],
                'output_path': result['output_path'],
                'success': result['success'],
                'processing_time': result['processing_time'],
                'error_message': result['error_message'] or '',
            }
            # Add metrics
            row.update(result['metrics'])
            df_data.append(row)
        
        # Create and save DataFrame
        df = pd.DataFrame(df_data)
        df.to_csv(output_path, index=False)
        logger.info(f"Results summary saved to: {output_path}")

# Initialize dataset processor
processor = DatasetProcessor(segmentator, config)
print("✅ Dataset processor initialized successfully")
print(f"📊 Ready to process {len(config.cell_categories)} cell categories")

✅ Dataset processor initialized successfully
📊 Ready to process 5 cell categories
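
Once results have been collected, a per-category breakdown is often more informative than the overall totals. The sketch below aggregates a list of result dictionaries shaped like those returned by `process_single_image` (keys `category`, `success`, `metrics`). `summarize_by_category` is a hypothetical helper, and `fake_results` is synthetic data made up for illustration, not output from this notebook.

```python
from collections import defaultdict
from typing import Dict, List


def summarize_by_category(results: List[dict]) -> Dict[str, dict]:
    """Count images, successful segmentations, and detected nuclei per category."""
    totals = defaultdict(lambda: {"images": 0, "successes": 0, "nuclei": 0})
    for result in results:
        entry = totals[result["category"]]
        entry["images"] += 1
        if result["success"]:
            entry["successes"] += 1
            entry["nuclei"] += int(result["metrics"].get("num_nuclei", 0))
    return dict(totals)


# Synthetic results for illustration only
fake_results = [
    {"category": "im_Parabasal", "success": True, "metrics": {"num_nuclei": 1.0}},
    {"category": "im_Parabasal", "success": False, "metrics": {}},
    {"category": "im_Metaplastic", "success": True, "metrics": {"num_nuclei": 2.0}},
]

summary = summarize_by_category(fake_results)
for category, stats in summary.items():
    print(category, stats)
```

A category with many images but few detected nuclei is a useful signal that thresholds or size limits may need adjusting for that cell type.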


## Visualization Functions

Let's create visualization functions to assess the quality of nucleus segmentation: per-image panels, multi-sample grids, and metric distributions by category.
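
The overlay panels blend the grayscale image with the colorized mask at weights 0.7 and 0.3. The arithmetic behind that blend can be sketched in plain NumPy; `blend_overlay` is a hypothetical helper written for illustration, and the 2×2 arrays are made-up data.

```python
import numpy as np


def blend_overlay(gray: np.ndarray, color_mask: np.ndarray,
                  base_weight: float = 0.7) -> np.ndarray:
    """Blend a grayscale image with an RGB mask: base_weight*image + (1-base_weight)*mask."""
    base = np.stack([gray, gray, gray], axis=-1).astype(np.float32)  # gray -> 3 channels
    blended = base_weight * base + (1.0 - base_weight) * color_mask.astype(np.float32)
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)


gray = np.full((2, 2), 100, dtype=np.uint8)       # uniform gray image
color_mask = np.zeros((2, 2, 3), dtype=np.uint8)  # black everywhere except one pixel
color_mask[0, 0] = (200, 0, 0)                    # one "nucleus" pixel, colored red

overlay = blend_overlay(gray, color_mask)
print(overlay[0, 0], overlay[1, 1])
```

Note that background pixels are also darkened, since they are blended with the mask's black background. Blending only where the mask is non-zero would keep the background at its original brightness.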

In [10]:
class SegmentationVisualizer:
    """
    Provides comprehensive visualization capabilities for nucleus segmentation results.
    
    This class creates various visualization types:
    - Side-by-side original and mask comparisons
    - Overlaid segmentation results
    - Statistical summaries
    - Quality assessment plots
    """
    
    def __init__(self, config: SegmentationConfig):
        """
        Initialize the visualizer.
        
        Args:
            config: SegmentationConfig object
        """
        self.config = config
        plt.style.use('default')  # Ensure consistent plotting style
        
    def create_colorized_mask(self, mask: np.ndarray, alpha: float = 0.6) -> np.ndarray:
        """
        Create a colorized version of the segmentation mask.
        
        Args:
            mask: Segmentation mask
            alpha: Transparency for overlay (currently unused; blending weights are set where the overlay is built)
            
        Returns:
            Colorized mask as RGB image
        """
        # Create colormap for different nuclei
        colored_mask = np.zeros((*mask.shape, 3), dtype=np.uint8)
        
        if mask.max() > 0:
            # Use a colormap to assign different colors to different nuclei
            unique_labels = np.unique(mask)[1:]  # Exclude background
            colors = plt.cm.tab20(np.linspace(0, 1, len(unique_labels)))[:, :3] * 255
            
            for i, label in enumerate(unique_labels):
                colored_mask[mask == label] = colors[i % len(colors)]
        
        return colored_mask
    
    def visualize_single_result(self, image: np.ndarray, mask: np.ndarray, 
                               title: str = "", metrics: Optional[Dict] = None,
                               save_path: Optional[str] = None) -> plt.Figure:
        """
        Create a comprehensive visualization of a single segmentation result.
        
        Args:
            image: Original image
            mask: Segmentation mask
            title: Plot title
            metrics: Optional metrics dictionary
            save_path: Optional path to save the figure
            
        Returns:
            Matplotlib figure object
        """
        fig, axes = plt.subplots(2, 2, figsize=self.config.figsize, dpi=self.config.dpi)
        fig.suptitle(title, fontsize=16, fontweight='bold')
        
        # Original image
        axes[0, 0].imshow(image, cmap='gray')
        axes[0, 0].set_title('Original Image')
        axes[0, 0].axis('off')
        
        # Segmentation mask
        axes[0, 1].imshow(mask, cmap='nipy_spectral')
        axes[0, 1].set_title(f'Nucleus Mask ({np.max(mask)} nuclei)')
        axes[0, 1].axis('off')
        
        # Overlay
        colored_mask = self.create_colorized_mask(mask)
        if len(image.shape) == 2:
            overlay_base = np.stack([image, image, image], axis=2)
        else:
            overlay_base = image
        
        overlay = cv2.addWeighted(overlay_base.astype(np.uint8), 0.7, 
                                 colored_mask.astype(np.uint8), 0.3, 0)
        axes[1, 0].imshow(overlay)
        axes[1, 0].set_title('Overlay')
        axes[1, 0].axis('off')
        
        # Metrics summary
        axes[1, 1].axis('off')
        if metrics:
            metrics_text = self._format_metrics_text(metrics)
            axes[1, 1].text(0.1, 0.9, metrics_text, transform=axes[1, 1].transAxes,
                           fontsize=10, verticalalignment='top', fontfamily='monospace')
        else:
            axes[1, 1].text(0.5, 0.5, 'No metrics available', 
                           transform=axes[1, 1].transAxes, ha='center', va='center')
        
        plt.tight_layout()
        
        if save_path:
            plt.savefig(save_path, dpi=self.config.dpi, bbox_inches='tight')
        
        return fig
    
    def _format_metrics_text(self, metrics: Dict) -> str:
        """Format metrics dictionary into readable text."""
        lines = ['Segmentation Metrics:', '─' * 20]
        
        # Basic metrics
        lines.append(f"Nuclei count: {int(metrics.get('num_nuclei', 0))}")
        lines.append(f"Coverage: {metrics.get('nucleus_coverage', 0):.1%}")
        
        # Size metrics
        if metrics.get('num_nuclei', 0) > 0:
            lines.append('')
            lines.append('Size Statistics:')
            lines.append(f"Mean area: {metrics.get('mean_nucleus_area', 0):.1f}px")
            lines.append(f"Std area: {metrics.get('std_nucleus_area', 0):.1f}px")
            lines.append(f"Min area: {metrics.get('min_nucleus_area', 0):.1f}px")
            lines.append(f"Max area: {metrics.get('max_nucleus_area', 0):.1f}px")
            
            # Shape metrics
            lines.append('')
            lines.append('Shape Statistics:')
            lines.append(f"Mean circularity: {metrics.get('mean_circularity', 0):.2f}")
            lines.append(f"Std circularity: {metrics.get('std_circularity', 0):.2f}")
        
        return '\n'.join(lines)
    
    def compare_multiple_samples(self, samples: List[Tuple[np.ndarray, np.ndarray, str]], 
                                cols: int = 3, save_path: Optional[str] = None) -> plt.Figure:
        """
        Compare multiple segmentation samples in a grid layout.
        
        Args:
            samples: List of (image, mask, title) tuples
            cols: Number of columns in the grid
            save_path: Optional path to save the figure
            
        Returns:
            Matplotlib figure object
        """
        n_samples = len(samples)
        rows = (n_samples + cols - 1) // cols
        
        # squeeze=False always returns a 2-D axes array, so no reshaping is needed
        fig, axes = plt.subplots(rows * 2, cols, 
                               figsize=(cols * 5, rows * 6), 
                               dpi=self.config.dpi,
                               squeeze=False)
        
        for idx, (image, mask, title) in enumerate(samples):
            col = idx % cols
            row_base = (idx // cols) * 2
            
            # Original image
            axes[row_base, col].imshow(image, cmap='gray')
            axes[row_base, col].set_title(f'{title} - Original')
            axes[row_base, col].axis('off')
            
            # Mask overlay
            colored_mask = self.create_colorized_mask(mask)
            if len(image.shape) == 2:
                overlay_base = np.stack([image, image, image], axis=2)
            else:
                overlay_base = image
                
            overlay = cv2.addWeighted(overlay_base.astype(np.uint8), 0.7, 
                                     colored_mask.astype(np.uint8), 0.3, 0)
            axes[row_base + 1, col].imshow(overlay)
            axes[row_base + 1, col].set_title(f'{title} - Segmentation ({np.max(mask)} nuclei)')
            axes[row_base + 1, col].axis('off')
        
        # Hide unused subplots
        for idx in range(n_samples, rows * cols):
            col = idx % cols
            row_base = (idx // cols) * 2
            if row_base < rows * 2:
                axes[row_base, col].axis('off')
                axes[row_base + 1, col].axis('off')
        
        plt.tight_layout()
        
        if save_path:
            plt.savefig(save_path, dpi=self.config.dpi, bbox_inches='tight')
        
        return fig
    
    def plot_metrics_distribution(self, results: List[Dict], save_path: Optional[str] = None) -> plt.Figure:
        """
        Plot distribution of segmentation metrics across all processed images.
        
        Args:
            results: List of processing results with metrics
            save_path: Optional path to save the figure
            
        Returns:
            Matplotlib figure object
        """
        # Filter successful results and extract metrics
        successful_results = [r for r in results if r['success'] and r['metrics']]
        
        if not successful_results:
            fig, ax = plt.subplots(figsize=self.config.figsize)
            ax.text(0.5, 0.5, 'No successful results to plot', 
                   ha='center', va='center', transform=ax.transAxes)
            return fig
        
        # Extract metrics
        metrics_df = pd.DataFrame([r['metrics'] for r in successful_results])
        categories = [r['category'] for r in successful_results]
        metrics_df['category'] = categories
        
        # Create subplots
        fig, axes = plt.subplots(2, 3, figsize=(18, 12), dpi=self.config.dpi)
        fig.suptitle('Nucleus Segmentation Metrics Distribution', fontsize=16, fontweight='bold')
        
        # Plot distributions
        metrics_to_plot = [
            ('num_nuclei', 'Number of Nuclei'),
            ('nucleus_coverage', 'Nucleus Coverage'),
            ('mean_nucleus_area', 'Mean Nucleus Area (px)'),
            ('mean_circularity', 'Mean Circularity'),
            ('processing_time', 'Processing Time (s)')
        ]
        
        for idx, (metric, title) in enumerate(metrics_to_plot):
            if idx >= 6:
                break
                
            row, col = idx // 3, idx % 3
            ax = axes[row, col]
            
            if metric == 'processing_time':
                # Use processing time from results
                data = [r['processing_time'] for r in successful_results]
                category_data = categories
            else:
                data = metrics_df[metric].values
                category_data = metrics_df['category'].values
            
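            # Group values by category, skipping categories with no data
            category_data_dict = {}
            for cat in self.config.cell_categories:
                values = [d for d, c in zip(data, category_data) if c == cat]
                if values:
                    category_data_dict[cat.replace('im_', '')] = values
            
            if category_data_dict:
                # Pass plain lists rather than dict views.
                # `labels` matches Matplotlib 3.8; newer releases name this parameter `tick_labels`.
                ax.boxplot(list(category_data_dict.values()),
                           labels=list(category_data_dict.keys()))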
                ax.set_title(title)
                ax.tick_params(axis='x', rotation=45)
            else:
                ax.text(0.5, 0.5, 'No data', ha='center', va='center', transform=ax.transAxes)
                ax.set_title(title)
        
        # Overall statistics in the last subplot
        ax = axes[1, 2]
        ax.axis('off')
        
        # Calculate overall stats
        total_images = len(successful_results)
        total_nuclei = sum(r['metrics']['num_nuclei'] for r in successful_results)
        avg_coverage = np.mean([r['metrics']['nucleus_coverage'] for r in successful_results])
        avg_time = np.mean([r['processing_time'] for r in successful_results])
        
        # Build the text line by line so source indentation does not leak into the figure
        stats_text = "\n".join([
            "Overall Statistics",
            "─" * 17,
            f"Total Images: {total_images}",
            f"Total Nuclei: {int(total_nuclei)}",
            f"Avg Coverage: {avg_coverage:.1%}",
            f"Avg Time: {avg_time:.2f}s",
            "",
            f"Success Rate: {len(successful_results)/len(results):.1%}",
        ])
        
        ax.text(0.1, 0.9, stats_text, transform=ax.transAxes,
               fontsize=11, verticalalignment='top', fontfamily='monospace')
        
        # Operate on this figure explicitly instead of whatever pyplot considers current
        fig.tight_layout()
        
        if save_path:
            fig.savefig(save_path, dpi=self.config.dpi, bbox_inches='tight')
        
        return fig

# Initialize visualizer
visualizer = SegmentationVisualizer(config)
print("✅ Segmentation visualizer initialized successfully")
print("📊 Ready to create comprehensive visualizations")

✅ Segmentation visualizer initialized successfully
📊 Ready to create comprehensive visualizations


## Full Dataset Processing

Once the test is successful, you can process the entire dataset using the following commands:

In [12]:
print("🚀 Starting full dataset processing...")
all_results = processor.process_all_categories()

# Save results
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
results_path = f"nucleus_segmentation_results_{timestamp}.csv"
processor.save_results_summary(results_path)

print(f"💾 Results saved to: {results_path}")

# print("⚠️ Full dataset processing is commented out for safety.")
# print("📝 Uncomment the above code to process all images in the dataset.")
# print("⏱️ Expected processing time: Several hours depending on your system.")
# print("💽 This will create 'Nucleus Masks' folders in each cell category directory.")

2025-09-25 16:17:58,380 - INFO - Starting batch processing of 5 categories
2025-09-25 16:17:58,382 - INFO - Processing category: im_Dyskeratotic
2025-09-25 16:17:58,435 - INFO - Output directory created: c:\Meet\Projects\Project_8_Phoenix_Cervical Cancer Image Classification\Project-Phoenix\Dataset\Augmented Dataset - Limited Enhancement\im_Dyskeratotic\Nucleus Masks


🚀 Starting full dataset processing...


Processing im_Dyskeratotic:   0%|          | 0/813 [00:00<?, ?images/s]

2025-09-25 16:51:07,255 - ERROR - Segmentation failed: not enough values to unpack (expected 4, got 3)


KeyboardInterrupt: 
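
The `not enough values to unpack (expected 4, got 3)` error in the log above is consistent with a call site that unpacks four return values from an `eval()` call that returned three. To my understanding, `models.CellposeModel.eval` returns `(masks, flows, styles)`, while the older `models.Cellpose` wrapper also returned estimated diameters. Which class this notebook's segmenter uses is not shown in this section, so that diagnosis is an assumption. A minimal sketch of a defensive unpacking helper follows; the name `unpack_eval_output` is hypothetical.

```python
from typing import Any, Optional, Sequence, Tuple


def unpack_eval_output(result: Sequence[Any]) -> Tuple[Any, Any, Any, Optional[Any]]:
    """Normalize a 3- or 4-element eval() result to (masks, flows, styles, diams).

    Assumption: a 3-element result is (masks, flows, styles) and a
    4-element result appends the estimated diameters.
    """
    if len(result) == 4:
        masks, flows, styles, diams = result
    elif len(result) == 3:
        masks, flows, styles = result
        diams = None  # no diameter estimate was returned
    else:
        raise ValueError(f"Unexpected eval() output length: {len(result)}")
    return masks, flows, styles, diams
```

Wrapping the model call as `unpack_eval_output(model.eval(...))` would let the rest of the pipeline keep a single four-value signature regardless of which model class produced the result.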

## Usage Instructions and Documentation

This notebook provides a complete nucleus segmentation pipeline for cervical cancer cell images. Here's how to use it:

### 🚀 Quick Start Guide

1. **Environment Setup**: All required libraries are already imported and configured
2. **Single Image Test**: Run the test cell above to verify segmentation on one image
3. **Full Processing**: Run the full processing cell to handle all images
4. **Results**: Check the created `Nucleus Masks` folders in each cell category directory

### 📁 Directory Structure

The system expects this directory structure:
```
Dataset Root/
├── im_Dyskeratotic/
│   ├── NLM_CLAHE/           # Input images (.bmp files)
│   └── Nucleus Masks/       # Generated masks (.png files)
├── im_Koilocytotic/
│   ├── NLM_CLAHE/
│   └── Nucleus Masks/
... (and so on for all 5 categories)
```
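
The layout above can be walked with `pathlib`. The sketch below pairs each input image with the path where its mask would be written. The helper name `iter_image_mask_pairs` is hypothetical, and the convention that a mask keeps the image's file stem with a `.png` extension is an assumption, since the notebook's naming rule is not shown here.

```python
from pathlib import Path
from typing import Iterator, Tuple

INPUT_SUBDIR = "NLM_CLAHE"      # input folder name from the layout above
MASK_SUBDIR = "Nucleus Masks"   # output folder name from the layout above


def iter_image_mask_pairs(dataset_root: Path, category: str) -> Iterator[Tuple[Path, Path]]:
    """Yield (input .bmp path, expected mask .png path) for one category."""
    input_dir = dataset_root / category / INPUT_SUBDIR
    mask_dir = dataset_root / category / MASK_SUBDIR
    for image_path in sorted(input_dir.glob("*.bmp")):
        # Assumption: the mask keeps the image stem and swaps the extension.
        yield image_path, mask_dir / f"{image_path.stem}.png"
```

For example, `list(iter_image_mask_pairs(Path("Dataset Root"), "im_Dyskeratotic"))` would list every `.bmp` file in that category alongside its expected mask location, which is also a quick way to find images that have no mask yet.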

### 🔧 Configuration Options

Key parameters you can modify in the `SegmentationConfig` class:

- **`model_type`**: Cellpose model to use ('nuclei', 'cyto', 'cyto2')
- **`diameter`**: Expected nucleus diameter (None for auto-estimation)
- **`flow_threshold`**: Flow error threshold (0.4 default)
- **`cellprob_threshold`**: Cell probability threshold (0.0 default)
- **`min_nucleus_size`**: Minimum nucleus area in pixels (50 default)
- **`max_nucleus_size`**: Maximum nucleus area in pixels (5000 default)
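
As an illustration of how these parameters fit together, here is a minimal sketch of a configuration object carrying only the fields listed above. `SegmentationConfigSketch` is a hypothetical stand-in, not the notebook's `SegmentationConfig` (which also carries plotting and dataset fields), and the `'nuclei'` default for `model_type` is an assumption.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SegmentationConfigSketch:
    """Hypothetical stand-in holding only the parameters listed above."""
    model_type: str = "nuclei"          # assumed default; 'cyto' and 'cyto2' are the other listed options
    diameter: Optional[float] = None    # None requests auto-estimation
    flow_threshold: float = 0.4
    cellprob_threshold: float = 0.0
    min_nucleus_size: int = 50          # pixels
    max_nucleus_size: int = 5000        # pixels

    def __post_init__(self) -> None:
        # Reject an empty size window early instead of silently filtering everything out
        if self.min_nucleus_size >= self.max_nucleus_size:
            raise ValueError("min_nucleus_size must be smaller than max_nucleus_size")


default_config = SegmentationConfigSketch()
# Derive a stricter variant without mutating the defaults
strict_config = replace(default_config, flow_threshold=0.3, min_nucleus_size=100)
```

Keeping the object frozen and deriving variants with `replace` makes it easier to record exactly which settings produced a given batch of masks.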

### 📊 Output Files

For each processed image, the system generates:

1. **Nucleus Mask**: PNG file with segmented nuclei (each nucleus has a unique integer ID)
2. **CSV Results**: Summary file with metrics for all processed images including:
   - Number of nuclei detected
   - Nucleus coverage percentage
   - Area and circularity statistics
   - Processing times
   - Success/failure status
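
Because each nucleus carries its own integer ID, the count, coverage, and mean area can be read directly off a label mask. The sketch below shows one way to do that with NumPy. The metric names match the keys used by the plotting code, but how the notebook itself computes them is not shown here, so treat this as an assumption. The demo uses `uint16` because an 8-bit image can hold at most 255 distinct non-zero IDs.

```python
import numpy as np


def summarize_label_mask(masks: np.ndarray) -> dict:
    """Compute simple metrics from a label mask (0 = background, k > 0 = nucleus k)."""
    labels, areas = np.unique(masks[masks > 0], return_counts=True)
    return {
        "num_nuclei": int(labels.size),
        # Fraction of image pixels that belong to any nucleus
        "nucleus_coverage": float(areas.sum() / masks.size),
        "mean_nucleus_area": float(areas.mean()) if areas.size else 0.0,
    }


# Tiny synthetic mask: nucleus 1 covers 2 pixels, nucleus 2 covers 6 pixels
demo = np.zeros((4, 5), dtype=np.uint16)
demo[0, 0:2] = 1
demo[2:4, 2:5] = 2
summary = summarize_label_mask(demo)
```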

### 🎯 Key Features

- **Robust Segmentation**: Uses a pretrained Cellpose model for nucleus segmentation
- **Quality Control**: Size-based filtering to remove artifacts and oversegmented regions
- **Comprehensive Metrics**: Detailed statistics on segmentation quality
- **Visualization Tools**: Side-by-side comparisons and overlay visualizations
- **Batch Processing**: Efficiently handles thousands of images with progress tracking
- **Error Handling**: Graceful failure handling with detailed logging
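
The size-based quality control mentioned above can be sketched as follows: drop every labeled region whose area falls outside a pixel window, then renumber the survivors consecutively. The function name `filter_nuclei_by_area` is hypothetical and the notebook's own filter may differ; the default bounds mirror the documented `min_nucleus_size` and `max_nucleus_size` values.

```python
import numpy as np


def filter_nuclei_by_area(masks: np.ndarray, min_area: int = 50, max_area: int = 5000) -> np.ndarray:
    """Keep only labeled regions whose pixel area lies in [min_area, max_area]."""
    labels, areas = np.unique(masks, return_counts=True)
    keep = labels[(labels > 0) & (areas >= min_area) & (areas <= max_area)]
    filtered = np.zeros_like(masks)
    # Relabel survivors as 1..N so IDs stay consecutive
    for new_id, old_id in enumerate(keep, start=1):
        filtered[masks == old_id] = new_id
    return filtered
```

Regions below the lower bound are usually specks or debris, while regions above the upper bound tend to be merged nuclei or background, so both ends of the window matter.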

### 📈 Quality Assessment

The system provides multiple ways to assess segmentation quality:

1. **Visual Inspection**: Overlay visualizations show original images with detected nuclei
2. **Statistical Metrics**: Coverage, size, and shape statistics for quality control
3. **Comparative Analysis**: Box plots showing distributions across cell categories
4. **Processing Reports**: Success rates and error logs for troubleshooting
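
For the processing reports, a per-category success rate is often more informative than a single overall figure. The sketch below assumes every result dictionary carries `category` and `success` keys, as the successful results consumed by the plotting code do. Whether failed results also record their category is an assumption, and the helper name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def success_rate_by_category(results: List[dict]) -> Dict[str, float]:
    """Return the fraction of successful results for each category."""
    totals: Dict[str, int] = defaultdict(int)
    successes: Dict[str, int] = defaultdict(int)
    for result in results:
        totals[result["category"]] += 1
        if result.get("success"):
            successes[result["category"]] += 1
    return {category: successes[category] / totals[category] for category in totals}


# Illustrative records only, not real pipeline output
demo_results = [
    {"category": "im_Dyskeratotic", "success": True},
    {"category": "im_Dyskeratotic", "success": False},
    {"category": "im_Koilocytotic", "success": True},
]
rates = success_rate_by_category(demo_results)
```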

### ⚙️ Performance Considerations

- **CPU vs GPU**: System automatically detects GPU availability (CPU fallback included)
- **Memory Usage**: Images processed individually to minimize memory requirements
- **Processing Time**: Expect ~1-5 seconds per image on CPU, faster with GPU
- **Disk Space**: Mask images are typically smaller than originals (lossless PNG compression)

### 🔬 Scientific Applications

This nucleus segmentation pipeline enables:

1. **Explainable AI**: Understanding which cellular regions drive model decisions
2. **Morphometric Analysis**: Quantitative analysis of nucleus size, shape, and distribution
3. **Disease Characterization**: Statistical comparison of nucleus features across cell types
4. **Quality Control**: Automated detection of segmentation failures and outliers
5. **Standardization**: Consistent nucleus detection across large datasets

### 📚 References and Methods

- **Cellpose**: Stringer, C. et al. "Cellpose: a generalist algorithm for cellular segmentation." Nature Methods (2021)
- **Nucleus Detection**: Optimized for cervical cancer cell morphology
- **Post-processing**: Size filtering and morphological operations for quality improvement