# Install Required Packages

This cell installs all necessary libraries for the ensemble approach:
- PyTorch for deep learning
- MONAI for medical imaging (Swin-UNETR, MedNeXt)
- nnU-Net v2 for automatic segmentation
- Standard ML libraries (numpy, opencv, sklearn, etc.)

**Note:** Adjust CUDA version based on your GPU
**Estimated time:** 5-10 minutes
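
The CUDA tag is the last path segment of the wheel index URL used in the install command. As a minimal sketch, that URL could be derived from a CUDA version string as shown here. `torch_index_url` is a hypothetical helper, and it assumes the `cu<major><minor>` pattern seen in the command (`cu118`). It only formats a string, so check the PyTorch install page to confirm that wheels exist for the tag you need.

```python
def torch_index_url(cuda_version: str) -> str:
    """Build a PyTorch wheel index URL from a CUDA version such as '11.8'.

    Hypothetical helper: it only formats the string and does not check
    that PyTorch actually publishes wheels for that CUDA version.
    """
    major, minor = cuda_version.strip().split(".")[:2]
    return f"https://download.pytorch.org/whl/cu{int(major)}{int(minor)}"


print(torch_index_url("11.8"))
```

The printed URL can then be passed to `pip install ... --index-url`.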

In [3]:
# Install PyTorch (CUDA 11.8 - adjust for your GPU)
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install MONAI with all dependencies
!pip install "monai[all]==1.3.0"

# Install nnU-Net v2
!pip install nnunetv2

# Install other dependencies
!pip install opencv-python scikit-learn pandas matplotlib seaborn tqdm
!pip install SimpleITK nibabel pydicom albumentations

Looking in indexes: https://download.pytorch.org/whl/cu118
Collecting monai==1.3.0 (from monai[all]==1.3.0)
  Downloading monai-1.3.0-202310121228-py3-none-any.whl.metadata (10 kB)
Collecting clearml>=1.10.0rc0 (from monai[all]==1.3.0)
  Downloading clearml-2.1.3-py2.py3-none-any.whl.metadata (17 kB)
Collecting cucim>=23.2.0 (from monai[all]==1.3.0)
  Downloading cucim-23.10.0-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (43 kB)
Collecting fire (from monai[all]==1.3.0)
  Downloading fire-0.7.1-py3-none-any.whl.metadata (5.8 kB)
Collecting imagecodecs (from monai[all]==1.3.0)
  Downloading imagecodecs-2026.1.14-cp311-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (20 kB)
Collecting itk>=5.2 (from monai[all]==1.3.0)
  Downloading itk-5.4.5-cp311-abi3-manylinux_2_28_x86_64.whl.metadata (22 kB)
Collecting lmdb (from monai[all]==1.3.0)
  

Import Libraries

In [10]:
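# Unpinned: resolves to the latest MONAI release (1.5.2 in the output below),
# not the 1.3.0 pinned in the install cell above.
!pip install monai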


Collecting monai
  Downloading monai-1.5.2-py3-none-any.whl.metadata (13 kB)
Downloading monai-1.5.2-py3-none-any.whl (2.7 MB)
Installing collected packages: monai
Successfully installed monai-1.5.2


# Import Required Libraries

Import all necessary libraries for:
- Deep learning (PyTorch, MONAI)
- Data processing (numpy, opencv)
- Visualization (matplotlib, seaborn)
- File handling (pathlib, json)

**No errors should appear if installation was successful**
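
A clean import does not show which versions were actually installed, and the install cells above mix a pinned and an unpinned MONAI install. The sketch below reports installed versions using only the standard library. `installed_versions` is a hypothetical helper, and the distribution names are the ones from the `pip` commands above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names taken from the pip commands in the install cell.
wanted = ["torch", "monai", "nnunetv2", "opencv-python"]
for name, version in installed_versions(wanted).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```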

In [11]:
# Standard libraries
import os
import sys
import json
import warnings
from pathlib import Path
from tqdm import tqdm
import shutil

# Data processing
import numpy as np
import pandas as pd
import cv2
from sklearn.model_selection import KFold

# Deep learning
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR

# MONAI
from monai.networks.nets import SwinUNETR, UNet
from monai.losses import DiceLoss
from monai.metrics import DiceMetric
from monai.transforms import (
    Compose, RandRotate, RandFlip, RandZoom
)

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Suppress warnings
warnings.filterwarnings('ignore')

print("✓ All libraries imported successfully!")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

2026-02-07 19:41:18.677086: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1770493278.838241      55 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1770493278.883908      55 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1770493279.291445      55 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1770493279.291494      55 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1770493279.291496      55 computation_placer.cc:177] computation placer alr

✓ All libraries imported successfully!
PyTorch version: 2.8.0+cu126
CUDA available: True
GPU: Tesla T4


Set Random Seeds

# Set Random Seeds for Reproducibility

Ensures reproducible results across multiple runs by setting:
- Python random seed
- NumPy random seed  
- PyTorch random seed (CPU & GPU)

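**Important:** With the same seed, library versions, and hardware, repeated runs should give matching results. Some CUDA operations can remain nondeterministic even with these settings.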

In [13]:
def set_seed(seed=42):
    """Set random seeds for reproducibility"""
    import random
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    
    print(f"✓ Random seed set to {seed}")

set_seed(42)

✓ Random seed set to 42
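
The effect of a seed can be checked with a small, standard-library-only sketch. `draw` is a hypothetical helper. It uses Python's `random` module only, but NumPy and PyTorch generators follow the same principle.

```python
import random


def draw(seed, n=5):
    """Return n pseudo-random integers from a generator seeded with `seed`."""
    rng = random.Random(seed)  # independent generator, leaves global state alone
    return [rng.randint(0, 999) for _ in range(n)]


same_a = draw(42)
same_b = draw(42)
other = draw(43)
print(same_a == same_b)  # identical seed, identical sequence
print(same_a == other)   # different seed, a different sequence is expected
```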


# Create Project Directory Structure

Creates all necessary folders for:
- Raw data (KSSD2025 images and masks)
- Preprocessed data (nnU-Net and MONAI formats)
- Model checkpoints (saved weights)
- Results (predictions, visualizations)

**Run this before any training**

In [14]:
def create_directories():
    """Create all necessary directories"""
    
    directories = [
        # Data directories
        "data/KSSD2025/images",
        "data/KSSD2025/masks",
        "data/nnUNet_raw",
        "data/nnUNet_preprocessed",
        "data/nnUNet_results",
        "data/MONAI_data",
        
        # Results directories
        "results/nnunet",
        "results/swin_unetr",
        "results/mednext",
        "results/ensemble",
        "results/visualizations",
        
        # Model checkpoints
        "checkpoints/nnunet",
        "checkpoints/swin_unetr",
        "checkpoints/mednext",
    ]
    
    for directory in directories:
        Path(directory).mkdir(parents=True, exist_ok=True)
    
    print("✓ Directory structure created successfully!")
    print(f"  Total directories: {len(directories)}")

create_directories()

✓ Directory structure created successfully!
  Total directories: 14


In [19]:
import os

BASE_PATH = "/kaggle/input/kssd2025-kidney-stone-segmentation-dataset"

for root, dirs, files in os.walk(BASE_PATH):
    print(root)
    print("  Dirs:", dirs)
    print("  Files:", files[:5])
    break


/kaggle/input/kssd2025-kidney-stone-segmentation-dataset
  Dirs: ['data']
  Files: []


# KSSD2025 Dataset Preprocessor

This class handles:
- Creating 5-fold cross-validation splits
- Converting data to nnU-Net format (for nnU-Net model)
- Converting data to MONAI format (for Swin-UNETR and MedNeXt)
- Maintaining consistent splits across the MONAI models (nnU-Net generates its own folds unless a `splits_final.json` is supplied)

**Key Features:**
- Automatic fold splitting
- Format conversion for different frameworks
- Preserves data integrity
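
To use identical folds in every framework, one option is to build the split once from case identifiers and hand the same split to each model. The sketch below is pure Python and makes several assumptions: `kfold_case_splits` is a hypothetical helper, the case IDs follow the `KSSD2025_0000` pattern used in the preprocessing cell, and the shuffle does not reproduce scikit-learn's `KFold` permutation. The returned list of `{"train": [...], "val": [...]}` dicts follows the layout nnU-Net v2 reads from `splits_final.json` in the dataset's preprocessed folder.

```python
import json
import random


def kfold_case_splits(case_ids, n_folds=5, seed=42):
    """Split case identifiers into n_folds train/val partitions.

    Returns a list of {"train": [...], "val": [...]} dicts.
    """
    ids = sorted(case_ids)
    random.Random(seed).shuffle(ids)  # not the same permutation as sklearn's KFold
    # The first (len % n_folds) folds get one extra validation case.
    base, extra = divmod(len(ids), n_folds)
    splits, start = [], 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        val = ids[start:start + size]
        val_set = set(val)
        train = [c for c in ids if c not in val_set]
        splits.append({"train": sorted(train), "val": sorted(val)})
        start += size
    return splits


case_ids = [f"KSSD2025_{i:04d}" for i in range(838)]
splits = kfold_case_splits(case_ids)
for fold, split in enumerate(splits):
    print(f"Fold {fold}: {len(split['train'])} train, {len(split['val'])} val")

# Serialized form; where to save it is left to the caller.
splits_json = json.dumps(splits, indent=4)
```

With 838 cases and 5 folds, three folds hold 168 validation cases and two hold 167.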

In [43]:
# DEBUG: Check what's actually in the image and label folders
from collections import Counter
from pathlib import Path

img_dir = Path("/kaggle/input/kssd2025-kidney-stone-segmentation-dataset/data/image")
label_dir = Path("/kaggle/input/kssd2025-kidney-stone-segmentation-dataset/data/label")

def summarize_dir(directory):
    """Print existence, file count, first 10 names, and per-extension counts."""
    print(f"Path: {directory}")
    print(f"Exists: {directory.exists()}")
    
    if not directory.exists():
        return
    
    all_files = list(directory.iterdir())
    print(f"Total files: {len(all_files)}")
    
    if all_files:
        print("\nFirst 10 files:")
        for f in all_files[:10]:
            print(f"  - {f.name} (extension: {f.suffix})")
        
        # Count by extension
        extensions = Counter(f.suffix for f in all_files)
        print("\nFile extensions found:")
        for ext, count in extensions.items():
            print(f"  {ext}: {count} files")

print("=== CHECKING IMAGE DIRECTORY ===")
summarize_dir(img_dir)

print("\n=== CHECKING LABEL DIRECTORY ===")
summarize_dir(label_dir)

=== CHECKING IMAGE DIRECTORY ===
Path: /kaggle/input/kssd2025-kidney-stone-segmentation-dataset/data/image
Exists: True
Total files: 838

First 10 files:
  - 659.tif (extension: .tif)
  - 274.tif (extension: .tif)
  - 315.tif (extension: .tif)
  - 919.tif (extension: .tif)
  - 948.tif (extension: .tif)
  - 683.tif (extension: .tif)
  - 130.tif (extension: .tif)
  - 508.tif (extension: .tif)
  - 1136.tif (extension: .tif)
  - 1052.tif (extension: .tif)

File extensions found:
  .tif: 838 files

=== CHECKING LABEL DIRECTORY ===
Path: /kaggle/input/kssd2025-kidney-stone-segmentation-dataset/data/label
Exists: True
Total files: 838

First 10 files:
  - 659.tif (extension: .tif)
  - 274.tif (extension: .tif)
  - 315.tif (extension: .tif)
  - 919.tif (extension: .tif)
  - 948.tif (extension: .tif)
  - 683.tif (extension: .tif)
  - 130.tif (extension: .tif)
  - 508.tif (extension: .tif)
  - 1136.tif (extension: .tif)
  - 1052.tif (extension: .tif)

File extensions found:
  .tif: 838 files
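
The listing shows 838 `.tif` files in each folder and matching names among the first ten entries, but equal counts alone do not prove that every image has a mask. The sketch below compares the two folders by filename stem. `pairing_report` is a hypothetical helper and the input lists are toy data.

```python
def pairing_report(image_names, mask_names):
    """Compare two filename collections by stem and report what is unpaired."""
    def stems(names):
        return {name.rsplit(".", 1)[0] for name in names}

    image_stems, mask_stems = stems(image_names), stems(mask_names)
    return {
        "paired": len(image_stems & mask_stems),
        "images_without_mask": sorted(image_stems - mask_stems),
        "masks_without_image": sorted(mask_stems - image_stems),
    }


# Toy input; in the notebook these could be [p.name for p in img_dir.iterdir()]
# and [p.name for p in label_dir.iterdir()].
report = pairing_report(
    ["659.tif", "274.tif", "315.tif"],
    ["659.tif", "274.tif", "900.tif"],
)
print(report)
```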


In [44]:
import os
import json
import shutil
from pathlib import Path
from sklearn.model_selection import KFold
import numpy as np
import subprocess

# =========================================================
# PART 1: PREPROCESSING
# =========================================================

class KSSD2025Preprocessor:
    def __init__(self, raw_data_path, output_base_path, n_folds=5):
        self.raw_path = Path(raw_data_path)
        self.output_path = Path(output_base_path)
        self.n_folds = n_folds
        
        self.nnunet_path = self.output_path / "nnUNet_raw"
        self.monai_path = self.output_path / "MONAI_data"
        
        # Store images and masks after first finding them
        self.images = None
        self.masks = None
        
        print(f"✓ Preprocessor initialized")
        print(f"  Raw data: {self.raw_path}")
    
    def _find_images_and_masks(self):
        """Find images and masks only once and cache them"""
        if self.images is not None and self.masks is not None:
            return self.images, self.masks
            
        img_dir = self.raw_path / "image"
        mask_dir = self.raw_path / "label"
        
        print(f"\n=== Searching for images and masks ===")
        print(f"Image dir: {img_dir}")
        print(f"Mask dir: {mask_dir}")
        
        # Search for .tif files (the actual format in your dataset)
        self.images = sorted(list(img_dir.glob("*.tif")))
        self.masks = sorted(list(mask_dir.glob("*.tif")))
        
        print(f"✓ Found {len(self.images)} images and {len(self.masks)} masks")
        
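        if len(self.images) == 0:
            raise ValueError(f"No images found in {img_dir}")
        
        # Pairing below relies on sorted order, so every image needs a same-named mask
        if [p.name for p in self.images] != [p.name for p in self.masks]:
            raise ValueError("Image and mask filenames do not match one-to-one")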
        
        return self.images, self.masks
    
    def prepare_nnunet_format(self, dataset_id=500, dataset_name="KSSD2025"):
        print("\n=== Preparing nnU-Net Format ===")
        
        images, masks = self._find_images_and_masks()
        
        dataset_folder = self.nnunet_path / f"Dataset{dataset_id:03d}_{dataset_name}"
        imagesTr = dataset_folder / "imagesTr"
        labelsTr = dataset_folder / "labelsTr"
        
        if dataset_folder.exists():
            print(f"Removing existing dataset folder...")
            shutil.rmtree(dataset_folder)
        
        imagesTr.mkdir(parents=True, exist_ok=True)
        labelsTr.mkdir(parents=True, exist_ok=True)
        
        print(f"Converting {len(images)} files...")
        
        for idx, (img_path, mask_path) in enumerate(zip(images, masks)):
            case_id = f"{dataset_name}_{idx:04d}"
            
            # Copy with .tif extension (keep original format)
            shutil.copy(img_path, imagesTr / f"{case_id}_0000.tif")
            shutil.copy(mask_path, labelsTr / f"{case_id}.tif")
            
            if (idx + 1) % 200 == 0:
                print(f"  Processed {idx + 1}/{len(images)}")
        
        print(f"✓ Copied all {len(images)} files")
        
        # CORRECT dataset.json format for nnU-Net v2 with .tif files
        dataset_json = {
            "channel_names": {
                "0": "grayscale"
            },
            "labels": {
                "background": 0,
                "kidney_stone": 1
            },
            "numTraining": len(images),
            "file_ending": ".tif",
            "overwrite_image_reader_writer": "NaturalImage2DIO"  # Works for TIFF too
        }
        
        with open(dataset_folder / "dataset.json", "w") as f:
            json.dump(dataset_json, f, indent=4)
        
        print(f"✓ dataset.json created")
        print(f"✓ nnU-Net format ready: {len(images)} samples")
        return dataset_folder
    
    def prepare_monai_format(self):
        print("\n=== Preparing MONAI Format ===")
        
        # Reuse cached images and masks
        images, masks = self._find_images_and_masks()
        
        print(f"Creating splits for {len(images)} samples...")
        
        if len(images) == 0:
            raise ValueError("No images available for MONAI format!")
        
        self.monai_path.mkdir(parents=True, exist_ok=True)
        
        # Create k-fold splits
        kfold = KFold(n_splits=self.n_folds, shuffle=True, random_state=42)
        indices = np.arange(len(images))
        
        fold_splits = {}
        for fold, (train_idx, val_idx) in enumerate(kfold.split(indices)):
            fold_splits[f"fold_{fold}"] = {"train": [], "val": []}
            
            for idx in train_idx:
                fold_splits[f"fold_{fold}"]["train"].append({
                    "image": str(images[idx]),
                    "label": str(masks[idx])
                })
            
            for idx in val_idx:
                fold_splits[f"fold_{fold}"]["val"].append({
                    "image": str(images[idx]),
                    "label": str(masks[idx])
                })
            
            print(f"  Fold {fold}: {len(train_idx)} train, {len(val_idx)} val")
        
        with open(self.monai_path / "fold_splits.json", "w") as f:
            json.dump(fold_splits, f, indent=4)
        
        print(f"✓ MONAI format ready: {self.n_folds} folds, {len(images)} total samples")
        return self.monai_path

# =========================================================
# PART 2: nnU-Net TRAINER
# =========================================================

class nnUNetTrainer:
    def __init__(self, dataset_id=500, dataset_name="KSSD2025", base_path="/kaggle/working"):
        self.dataset_id = dataset_id
        self.dataset_name = dataset_name
        self.base_path = Path(base_path)
        
        self.nnunet_raw = self.base_path / "nnUNet_raw"
        self.nnunet_preprocessed = self.base_path / "nnUNet_preprocessed"
        self.nnunet_results = self.base_path / "nnUNet_results"
        
        self.nnunet_preprocessed.mkdir(parents=True, exist_ok=True)
        self.nnunet_results.mkdir(parents=True, exist_ok=True)
        
        os.environ['nnUNet_raw'] = str(self.nnunet_raw)
        os.environ['nnUNet_preprocessed'] = str(self.nnunet_preprocessed)
        os.environ['nnUNet_results'] = str(self.nnunet_results)
        
        print(f"✓ nnU-Net environment configured")
        print(f"  Raw: {self.nnunet_raw}")
        print(f"  Preprocessed: {self.nnunet_preprocessed}")
        print(f"  Results: {self.nnunet_results}")
    
    def plan_and_preprocess(self):
        print(f"\n=== nnU-Net Planning and Preprocessing ===")
        
        # WITHOUT verification flag (avoids the StopIteration error)
        cmd = [
            "nnUNetv2_plan_and_preprocess",
            "-d", str(self.dataset_id)
        ]
        
        print(f"Running: {' '.join(cmd)}")
        print("⏳ This may take 5-15 minutes...")
        subprocess.run(cmd, check=True)
        print("✓ Preprocessing completed!")
    
    def train_fold(self, fold=0, configuration="2d"):
        print(f"\n=== Training Fold {fold} ===")
        
        cmd = [
            "nnUNetv2_train",
            str(self.dataset_id),
            configuration,
            str(fold)
        ]
        
        print(f"Running: {' '.join(cmd)}")
        subprocess.run(cmd, check=True)
        print(f"✓ Fold {fold} completed!")
    
    def train_all_folds(self, n_folds=5, configuration="2d"):
        print(f"\n{'='*60}")
        print(f"⚠️  WARNING: Training {n_folds} folds")
        print(f"⚠️  Estimated time: 24-48 hours")
        print(f"{'='*60}\n")
        
        for fold in range(n_folds):
            self.train_fold(fold, configuration)
        
        print(f"\n{'='*60}")
        print(f"✓ All {n_folds} folds trained successfully!")
        print(f"{'='*60}")

# =========================================================
# RUN EVERYTHING
# =========================================================

print("="*60)
print("STEP 1: PREPROCESSING")
print("="*60)

preprocessor = KSSD2025Preprocessor(
    raw_data_path="/kaggle/input/kssd2025-kidney-stone-segmentation-dataset/data",
    output_base_path="/kaggle/working",
    n_folds=5
)

nnunet_folder = preprocessor.prepare_nnunet_format(dataset_id=500, dataset_name="KSSD2025")
monai_folder = preprocessor.prepare_monai_format()

print("\n" + "="*60)
print("STEP 2: nnU-Net SETUP")
print("="*60)

nnunet_trainer = nnUNetTrainer(dataset_id=500, base_path="/kaggle/working")
nnunet_trainer.plan_and_preprocess()

print("\n" + "="*60)
print("✓✓✓ ALL SETUP COMPLETED SUCCESSFULLY! ✓✓✓")
print("="*60)
print(f"nnU-Net data: {nnunet_folder}")
print(f"MONAI data: {monai_folder}")
print(f"Total samples: {len(preprocessor.images)} (.tif files)")
print("\nReady to train!")
print("To start training (24-48 hours):")
print("  nnunet_trainer.train_all_folds(n_folds=5)")
print("="*60)

STEP 1: PREPROCESSING
✓ Preprocessor initialized
  Raw data: /kaggle/input/kssd2025-kidney-stone-segmentation-dataset/data

=== Preparing nnU-Net Format ===

=== Searching for images and masks ===
Image dir: /kaggle/input/kssd2025-kidney-stone-segmentation-dataset/data/image
Mask dir: /kaggle/input/kssd2025-kidney-stone-segmentation-dataset/data/label
✓ Found 838 images and 838 masks
Removing existing dataset folder...
Converting 838 files...
  Processed 200/838
  Processed 400/838
  Processed 600/838
  Processed 800/838
✓ Copied all 838 files
✓ dataset.json created
✓ nnU-Net format ready: 838 samples

=== Preparing MONAI Format ===
Creating splits for 838 samples...
  Fold 0: 670 train, 168 val
  Fold 1: 670 train, 168 val
  Fold 2: 670 train, 168 val
  Fold 3: 671 train, 167 val
  Fold 4: 671 train, 167 val
✓ MONAI format ready: 5 folds, 838 total samples

STEP 2: nnU-Net SETUP
✓ nnU-Net environment configured
  Raw: /kaggle/working/nnUNet_raw
  Preprocessed: /kaggle/working/nnUNet_p

100%|██████████| 838/838 [00:17<00:00, 48.75it/s] 


Experiment planning...

############################
INFO: You are using the old nnU-Net default planner. We have updated our recommendations. Please consider using those instead! Read more here: https://github.com/MIC-DKFZ/nnUNet/blob/master/documentation/resenc_presets.md
############################

2D U-Net configuration:
{'data_identifier': 'nnUNetPlans_2d', 'preprocessor_name': 'DefaultPreprocessor', 'batch_size': 14, 'patch_size': (np.int64(448), np.int64(512)), 'median_image_size_in_voxels': array([416., 512.]), 'spacing': array([1., 1.]), 'normalization_schemes': ['ZScoreNormalization'], 'use_mask_for_norm': [False], 'resampling_fn_data': 'resample_data_or_seg_to_shape', 'resampling_fn_seg': 'resample_data_or_seg_to_shape', 'resampling_fn_data_kwargs': {'is_seg': False, 'order': 3, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_seg_kwargs': {'is_seg': True, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_probabilities': 'resample_data_or_seg_to_sh

100%|██████████| 838/838 [01:30<00:00,  9.26it/s]


Configuration: 3d_fullres...
INFO: Configuration 3d_fullres not found in plans file nnUNetPlans.json of dataset Dataset500_KSSD2025. Skipping.
Configuration: 3d_lowres...
INFO: Configuration 3d_lowres not found in plans file nnUNetPlans.json of dataset Dataset500_KSSD2025. Skipping.
✓ Preprocessing completed!

✓✓✓ ALL SETUP COMPLETED SUCCESSFULLY! ✓✓✓
nnU-Net data: /kaggle/working/nnUNet_raw/Dataset500_KSSD2025
MONAI data: /kaggle/working/MONAI_data
Total samples: 838 (.tif files)

Ready to train!
To start training (24-48 hours):
  nnunet_trainer.train_all_folds(n_folds=5)
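
The preprocessing cell renames every file to the nnU-Net layout: a `Dataset<ID>_<Name>` folder, images with a four-digit channel suffix, and labels named by the bare case ID. The sketch below isolates that naming so it can be checked on its own. Both helper names are hypothetical, and the format strings mirror the ones in `prepare_nnunet_format`.

```python
def nnunet_dataset_folder(dataset_id, dataset_name):
    """Folder name inside nnUNet_raw, for example Dataset500_KSSD2025."""
    return f"Dataset{dataset_id:03d}_{dataset_name}"


def nnunet_case_files(dataset_name, idx, channel=0, ending=".tif"):
    """Return (image filename, label filename) for one case.

    Images carry a 4-digit channel suffix; labels use the bare case ID.
    """
    case_id = f"{dataset_name}_{idx:04d}"
    return f"{case_id}_{channel:04d}{ending}", f"{case_id}{ending}"


print(nnunet_dataset_folder(500, "KSSD2025"))
print(nnunet_case_files("KSSD2025", 0))
print(nnunet_case_files("KSSD2025", 837))
```

Because the source paths are sorted as strings, index 0 is the lexicographically first filename rather than the numerically smallest one, so keep a mapping if the original names matter later.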


# nnU-Net Training Wrapper

nnU-Net is a self-configuring framework that:
- Automatically determines preprocessing
- Automatically selects architecture
- Automatically tunes hyperparameters

This wrapper simplifies:
- Environment setup
- Training all 5 folds
- Running predictions
- Ensemble predictions

**nnU-Net Advantages:**
- Strong, widely reproduced results on medical segmentation benchmarks
- No manual hyperparameter tuning
- Robust to different datasets

In [45]:
import os
import subprocess
from pathlib import Path

class nnUNetTrainer:
    """
    Wrapper for nnU-Net v2 training
    
    nnU-Net is a self-configuring segmentation method that automatically:
    - Configures preprocessing
    - Selects network architecture  
    - Tunes hyperparameters
    
    We just need to provide the data and let it run!
    """
    
    def __init__(self, dataset_id=500, dataset_name="KSSD2025"):
        """
        Initialize nnU-Net trainer
        
        Args:
            dataset_id: Unique dataset identifier (default: 500)
            dataset_name: Dataset name (default: KSSD2025)
        """
        self.dataset_id = dataset_id
        self.dataset_name = dataset_name
        
        # Set nnU-Net environment variables
        self.setup_environment()
    
    def setup_environment(self):
        """
        Set up nnU-Net environment variables
        
        nnU-Net requires 3 paths:
        - nnUNet_raw: Raw dataset location
        - nnUNet_preprocessed: Where preprocessed data goes
        - nnUNet_results: Where trained models are saved
        """
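        # Relative to the current working directory. The preprocessing cell above
        # wrote the dataset under /kaggle/working/nnUNet_raw, so make sure this
        # resolves to the same location before running the nnU-Net commands.
        base_path = Path("data")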
        
        os.environ['nnUNet_raw'] = str(base_path / "nnUNet_raw")
        os.environ['nnUNet_preprocessed'] = str(base_path / "nnUNet_preprocessed")
        os.environ['nnUNet_results'] = str(base_path / "nnUNet_results")
        
        print("✓ nnU-Net environment variables set:")
        print(f"  nnUNet_raw: {os.environ['nnUNet_raw']}")
        print(f"  nnUNet_preprocessed: {os.environ['nnUNet_preprocessed']}")
        print(f"  nnUNet_results: {os.environ['nnUNet_results']}")
    
    def plan_and_preprocess(self):
        """
        Run nnU-Net preprocessing pipeline
        
        This step:
        - Analyzes dataset properties
        - Determines optimal preprocessing
        - Configures network architecture
        - Prepares data for training
        
        **Run this ONCE before training**
        **Time:** ~10-20 minutes
        """
        print("\n=== nnU-Net Planning and Preprocessing ===")
        
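        # Note: the earlier setup cell dropped --verify_dataset_integrity to
        # avoid a StopIteration error; remove the flag here if that error recurs.
        cmd = [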
            "nnUNetv2_plan_and_preprocess",
            "-d", str(self.dataset_id),
            "--verify_dataset_integrity"
        ]
        
        print(f"Running: {' '.join(cmd)}")
        subprocess.run(cmd, check=True)
        print("✓ Preprocessing completed!")
    
    def train_fold(self, fold=0, configuration="2d", trainer="nnUNetTrainer"):
        """
        Train a single fold
        
        Args:
            fold: Fold number (0-4)
            configuration: nnU-Net configuration name such as '2d' or '3d_fullres' (use '2d' for this dataset)
            trainer: nnU-Net trainer variant (default is fine)
            
        **Time per fold:** ~4-6 hours on RTX 3060
        """
        print(f"\n=== Training nnU-Net Fold {fold} ===")
        
        cmd = [
            "nnUNetv2_train",
            str(self.dataset_id),
            configuration,
            str(fold),
            "-tr", trainer,
            "--npz"  # Save probability maps
        ]
        
        print(f"Running: {' '.join(cmd)}")
        subprocess.run(cmd, check=True)
        print(f"✓ Fold {fold} training completed!")
    
    def train_all_folds(self, n_folds=5, configuration="2d"):
        """
        Train all 5 folds sequentially
        
        Args:
            n_folds: Number of folds (default: 5)
            configuration: nnU-Net configuration name ('2d' for this dataset)
            
        **Total time:** ~20-30 hours (let it run overnight)
        """
        print(f"\n{'='*60}")
        print(f"Training nnU-Net - All {n_folds} Folds")
        print(f"{'='*60}\n")
        
        for fold in range(n_folds):
            self.train_fold(fold=fold, configuration=configuration)
        
        print(f"\n{'='*60}")
        print("✓ All folds training completed!")
        print(f"{'='*60}\n")
    
    def predict(self, input_folder, output_folder, fold=0, configuration="2d"):
        """
        Run prediction on test data using single fold
        
        Args:
            input_folder: Folder with test images
            output_folder: Where to save predictions
            fold: Which fold's model to use
            configuration: nnU-Net configuration name ('2d' for this dataset)
        """
        print(f"\n=== Running Prediction (Fold {fold}) ===")
        
        cmd = [
            "nnUNetv2_predict",
            "-i", str(input_folder),
            "-o", str(output_folder),
            "-d", str(self.dataset_id),
            "-c", configuration,
            "-f", str(fold),
            "--save_probabilities"
        ]
        
        print(f"Running: {' '.join(cmd)}")
        subprocess.run(cmd, check=True)
        print("✓ Prediction completed!")
    
    def predict_ensemble(self, input_folder, output_folder, 
                        n_folds=5, configuration="2d"):
        """
        Run ensemble prediction across all folds
        
        nnU-Net automatically averages predictions from all folds
        
        Args:
            input_folder: Folder with test images
            output_folder: Where to save predictions
            n_folds: Number of folds to ensemble
            configuration: nnU-Net configuration name ('2d' for this dataset)
        """
        print(f"\n=== Running nnU-Net Ensemble Prediction ===")
        
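        # Each fold must be its own argv token; a single "0 1 2 3 4" string
        # would reach the CLI as one value that cannot be parsed as a fold.
        folds = [str(i) for i in range(n_folds)]
        
        cmd = [
            "nnUNetv2_predict",
            "-i", str(input_folder),
            "-o", str(output_folder),
            "-d", str(self.dataset_id),
            "-c", configuration,
            "-f", *folds,
            "--save_probabilities"
        ]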
        
        print(f"Running: {' '.join(cmd)}")
        subprocess.run(cmd, check=True)
        print("✓ Ensemble prediction completed!")

print("✓ nnUNetTrainer class defined")

✓ nnUNetTrainer class defined
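
When a command is handed to `subprocess.run` as a list, each element reaches the program as exactly one argument. A flag that takes several values, such as `-f` with multiple folds, therefore needs one list element per value. The sketch below builds such a command without running it. `build_predict_cmd` is a hypothetical helper, `imagesTs` and `preds` are placeholder folder names, and only flags already used by the wrapper class appear.

```python
def build_predict_cmd(input_folder, output_folder, dataset_id, folds, configuration="2d"):
    """Assemble an nnUNetv2_predict command as an argv list.

    Uses only the flags that already appear in the wrapper class.
    """
    return [
        "nnUNetv2_predict",
        "-i", str(input_folder),
        "-o", str(output_folder),
        "-d", str(dataset_id),
        "-c", configuration,
        "-f", *[str(f) for f in folds],  # one token per fold
        "--save_probabilities",
    ]


cmd = build_predict_cmd("imagesTs", "preds", 500, folds=range(5))
print(cmd)
```

Printing the list rather than the space-joined string makes a merged token such as `'0 1 2 3 4'` easy to spot.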



from pathlib import Path

print("=== DEBUGGING DATASET LOCATION ===\n")

# Check common Kaggle paths
paths_to_check = [
    "/kaggle/input",
    "/kaggle/input/kssd2025-kidney-stone-segmentation-dataset",
    "/kaggle/input/kssd2025-kidney-stone-segmentation-dataset/data",
]

for path_str in paths_to_check:
    path = Path(path_str)
    print(f"\n{'='*60}")
    print(f"Checking: {path}")
    print(f"Exists: {path.exists()}")
    
    if path.exists():
        items = list(path.iterdir())
        print(f"Number of items: {len(items)}")
        print(f"\nContents (first 20):")
        
        for item in sorted(items)[:20]:
            if item.is_dir():
                subitem_count = len(list(item.iterdir()))
                print(f"  📁 {item.name}/ ({subitem_count} items)")
                
                # Show subdirectory contents
                if subitem_count < 50:
                    for subitem in sorted(item.iterdir())[:10]:
                        print(f"     - {subitem.name}")
            else:
                size_mb = item.stat().st_size / (1024*1024)
                print(f"  📄 {item.name} ({size_mb:.2f} MB)")
    print(f"{'='*60}")

# Also check what was created
print("\n\n=== CHECKING OUTPUT ===")
output_path = Path("/kaggle/working/nnUNet_raw")
if output_path.exists():
    print(f"\n{output_path} exists!")
    for item in output_path.rglob("*"):
        if item.is_file():
            print(f"  {item}")
else:
    print(f"\n{output_path} does NOT exist")

# Train nnU-Net Model

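**⚠️ WARNING: This cell can take 24-48 hours to complete.** The step estimates below sum to about 20-30 hours on an RTX 3060, so the wider range presumably leaves headroom for slower GPUs.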

Steps:
1. Plan and preprocess (~15 min)
2. Train fold 0 (~4-6 hours)
3. Train fold 1 (~4-6 hours)
4. Train fold 2 (~4-6 hours)
5. Train fold 3 (~4-6 hours)
6. Train fold 4 (~4-6 hours)
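
Adding up the step estimates above gives the expected total for a sequential run. The sketch below does that arithmetic; `total_hours` is a hypothetical helper and its defaults are the figures from the list.

```python
def total_hours(n_folds=5, per_fold=(4, 6), preprocess_minutes=15):
    """Return (low, high) total hours for preprocessing plus sequential fold training."""
    low = preprocess_minutes / 60 + n_folds * per_fold[0]
    high = preprocess_minutes / 60 + n_folds * per_fold[1]
    return low, high


low, high = total_hours()
print(f"Estimated total: {low:.2f} to {high:.2f} hours")
```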

**Recommendations:**
- Run overnight or over weekend
- Monitor GPU temperature
- Use `nohup` or `tmux` when launching from a terminal, so training survives a dropped session
- Check results in: `data/nnUNet_results/`

**You can skip this and use pretrained weights if available**
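
A run this long may be interrupted, so it helps to know which folds are already finished before restarting. The sketch below lists the folds that have no final checkpoint yet. `folds_to_train` is a hypothetical helper, and the folder layout and the `checkpoint_final.pth` filename are assumptions about nnU-Net v2 defaults for the `2d` configuration; verify them against your own results folder.

```python
from pathlib import Path


def folds_to_train(results_root, dataset_folder, n_folds=5,
                   run_name="nnUNetTrainer__nnUNetPlans__2d"):
    """List folds that have no checkpoint_final.pth yet.

    Assumed layout (nnU-Net v2 defaults):
    <results_root>/<dataset_folder>/<run_name>/fold_<k>/checkpoint_final.pth
    """
    run_dir = Path(results_root) / dataset_folder / run_name
    return [
        fold for fold in range(n_folds)
        if not (run_dir / f"fold_{fold}" / "checkpoint_final.pth").exists()
    ]


print(folds_to_train("data/nnUNet_results", "Dataset500_KSSD2025"))
```

The returned list could then drive a loop over `nnunet_trainer.train_fold(fold)` instead of retraining every fold.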

In [46]:
# Initialize nnU-Net trainer
nnunet_trainer = nnUNetTrainer(dataset_id=500, dataset_name="KSSD2025")

# Step 1: Plan and preprocess (run once)
nnunet_trainer.plan_and_preprocess()

# Step 2: Train all 5 folds
# ⚠️ This takes 24-48 hours!
nnunet_trainer.train_all_folds(n_folds=5, configuration="2d")

print("\n" + "="*60)
print("✓ nnU-Net training completed!")
print("  Check results in: data/nnUNet_results/")
print("="*60)

✓ nnU-Net environment variables set:
  nnUNet_raw: data/nnUNet_raw
  nnUNet_preprocessed: data/nnUNet_preprocessed
  nnUNet_results: data/nnUNet_results

=== nnU-Net Planning and Preprocessing ===
Running: nnUNetv2_plan_and_preprocess -d 500 --verify_dataset_integrity
Fingerprint extraction...
Dataset500_KSSD2025


Traceback (most recent call last):
  File "/usr/local/bin/nnUNetv2_plan_and_preprocess", line 8, in <module>
    sys.exit(plan_and_preprocess_entry())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/nnunetv2/experiment_planning/plan_and_preprocess_entrypoints.py", line 180, in plan_and_preprocess_entry
    extract_fingerprints(args.d, args.fpe, args.npfp, args.verify_dataset_integrity, args.clean, args.verbose)
  File "/usr/local/lib/python3.12/dist-packages/nnunetv2/experiment_planning/plan_and_preprocess_api.py", line 47, in extract_fingerprints
    extract_fingerprint_dataset(d, fingerprint_extractor_class, num_processes, check_dataset_integrity, clean,
  File "/usr/local/lib/python3.12/dist-packages/nnunetv2/experiment_planning/plan_and_preprocess_api.py", line 30, in extract_fingerprint_dataset
    verify_dataset_integrity(join(nnUNet_raw, dataset_name), num_processes)
  File "/usr/local/lib/python3.12/dist-packages/nnunetv2/experiment_plan

CalledProcessError: Command '['nnUNetv2_plan_and_preprocess', '-d', '500', '--verify_dataset_integrity']' returned non-zero exit status 1.
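
The traceback above is cut off before the actual exception, so the cause of the failure is not visible here. Two differences from the earlier, successful setup cell are worth ruling out: this run adds `--verify_dataset_integrity`, and it points `nnUNet_raw` at the relative path `data/nnUNet_raw` instead of `/kaggle/working/nnUNet_raw`. The sketch below checks which raw folders actually contain the dataset. `find_raw_dataset` is a hypothetical helper, and it inspects folder names only, not file contents.

```python
import os
from pathlib import Path


def find_raw_dataset(dataset_id, raw_root=None):
    """Return dataset folders matching Dataset<ID>_* under a raw-data root.

    raw_root defaults to the nnUNet_raw environment variable. An empty
    list means no matching dataset folder exists at that location.
    """
    root_str = raw_root if raw_root is not None else os.environ.get("nnUNet_raw")
    if not root_str:
        return []
    root = Path(root_str)
    if not root.is_dir():
        return []
    return sorted(p for p in root.glob(f"Dataset{dataset_id:03d}_*") if p.is_dir())


# The two locations used by the cells above.
for candidate in ["data/nnUNet_raw", "/kaggle/working/nnUNet_raw"]:
    matches = find_raw_dataset(500, candidate)
    print(f"{Path(candidate).resolve()}: {[m.name for m in matches]}")
```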