# Install Required Packages

This cell installs all necessary libraries for the ensemble approach:
- PyTorch for deep learning
- MONAI for medical imaging (Swin-UNETR, MedNeXt)
- nnU-Net v2 for automatic segmentation
- Standard ML libraries (numpy, opencv, sklearn, etc.)

**Note:** The command below installs CUDA 11.8 wheels (`cu118`); change the index URL to match the CUDA version your GPU driver supports.

**Estimated time:** 5-10 minutes

In [3]:
# Install PyTorch (CUDA 11.8 - adjust for your GPU)
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install MONAI with all dependencies
!pip install "monai[all]==1.3.0"

# Install nnU-Net v2
!pip install nnunetv2

# Install other dependencies
!pip install opencv-python scikit-learn pandas matplotlib seaborn tqdm
!pip install SimpleITK nibabel pydicom albumentations

Looking in indexes: https://download.pytorch.org/whl/cu118
Collecting monai==1.3.0 (from monai[all]==1.3.0)
  Downloading monai-1.3.0-202310121228-py3-none-any.whl.metadata (10 kB)
Collecting clearml>=1.10.0rc0 (from monai[all]==1.3.0)
  Downloading clearml-2.1.3-py2.py3-none-any.whl.metadata (17 kB)
Collecting cucim>=23.2.0 (from monai[all]==1.3.0)
  Downloading cucim-23.10.0-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (43 kB)
Collecting fire (from monai[all]==1.3.0)
  Downloading fire-0.7.1-py3-none-any.whl.metadata (5.8 kB)
Collecting imagecodecs (from monai[all]==1.3.0)
  Downloading imagecodecs-2026.1.14-cp311-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (20 kB)
Collecting itk>=5.2 (from monai[all]==1.3.0)
  Downloading itk-5.4.5-cp311-abi3-manylinux_2_28_x86_64.whl.metadata (22 kB)
Collecting lmdb (from monai[all]==1.3.0)
  


In [10]:
!pip install monai


Collecting monai
  Downloading monai-1.5.2-py3-none-any.whl.metadata (13 kB)
Downloading monai-1.5.2-py3-none-any.whl (2.7 MB)
Installing collected packages: monai
Successfully installed monai-1.5.2
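
The first install cell pins `monai[all]==1.3.0`, while the unpinned `!pip install monai` above resolved to 1.5.2, so the version in use may not be the pinned one. A minimal sketch for checking what is installed, using only the standard library. The package list is an assumption taken from the pip commands in this notebook, and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names taken from the pip commands above (assumed list).
PACKAGES = ["torch", "monai", "nnunetv2", "opencv-python", "scikit-learn"]

for name, version in installed_versions(PACKAGES).items():
    print(f"{name:>15}: {version or 'not installed'}")
```

If the reported MONAI version matters for the models used later, pinning it in every install cell keeps the two cells from disagreeing.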


# Import Required Libraries

Import all necessary libraries for:
- Deep learning (PyTorch, MONAI)
- Data processing (numpy, opencv)
- Visualization (matplotlib, seaborn)
- File handling (pathlib, json)

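**The cell should finish without an `ImportError` if installation was successful.** Log lines about cuFFT, cuDNN, or cuBLAS factories already being registered come from XLA at startup and do not mean an import failed.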

In [11]:
# Standard libraries
import os
import sys
import json
import warnings
from pathlib import Path
from tqdm import tqdm
import shutil

# Data processing
import numpy as np
import pandas as pd
import cv2
from sklearn.model_selection import KFold

# Deep learning
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR

# MONAI
from monai.networks.nets import SwinUNETR, UNet
from monai.losses import DiceLoss
from monai.metrics import DiceMetric
from monai.transforms import (
    Compose, RandRotate, RandFlip, RandZoom
)

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Suppress warnings
warnings.filterwarnings('ignore')

print("✓ All libraries imported successfully!")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

2026-02-07 19:41:18.677086: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1770493278.838241      55 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1770493278.883908      55 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1770493279.291445      55 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1770493279.291494      55 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1770493279.291496      55 computation_placer.cc:177] computation placer alr

✓ All libraries imported successfully!
PyTorch version: 2.8.0+cu126
CUDA available: True
GPU: Tesla T4



# Set Random Seeds for Reproducibility

Ensures reproducible results across multiple runs by setting:
- Python random seed
- NumPy random seed  
- PyTorch random seed (CPU & GPU)

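**Important:** The same seed reproduces the same random draws on the same hardware and library versions. Some CUDA operations can still be nondeterministic, so small run-to-run differences remain possible.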

In [13]:
def set_seed(seed=42):
    """Set random seeds for reproducibility"""
    import random
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    
    print(f"✓ Random seed set to {seed}")

set_seed(42)

✓ Random seed set to 42
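
Seeding alone does not make every GPU kernel deterministic. The sketch below is a stricter variant of `set_seed`, not the notebook's own function. It additionally calls `torch.use_deterministic_algorithms` and sets `CUBLAS_WORKSPACE_CONFIG`, which PyTorch's reproducibility notes describe for deterministic cuBLAS behaviour. The name `set_seed_strict` is hypothetical, and the NumPy and PyTorch imports are guarded only so that the sketch also runs where those libraries are missing.

```python
import os
import random


def set_seed_strict(seed=42):
    """Seed Python, NumPy and PyTorch, and request deterministic kernels."""
    # Must be set before the first CUDA call to have any effect.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
        # warn_only=True reports nondeterministic ops instead of raising.
        torch.use_deterministic_algorithms(True, warn_only=True)
    except ImportError:
        pass


set_seed_strict(42)
first_draw = random.random()
set_seed_strict(42)
print("Same draw after reseeding:", first_draw == random.random())
```

Deterministic kernels can be slower, so this is a trade-off rather than a default to apply everywhere.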


# Create Project Directory Structure

Creates all necessary folders for:
- Raw data (KSSD2025 images and masks)
- Preprocessed data (nnU-Net and MONAI formats)
- Model checkpoints (saved weights)
- Results (predictions, visualizations)

**Run this before any training**

In [14]:
def create_directories():
    """Create all necessary directories"""
    
    directories = [
        # Data directories
        "data/KSSD2025/images",
        "data/KSSD2025/masks",
        "data/nnUNet_raw",
        "data/nnUNet_preprocessed",
        "data/nnUNet_results",
        "data/MONAI_data",
        
        # Results directories
        "results/nnunet",
        "results/swin_unetr",
        "results/mednext",
        "results/ensemble",
        "results/visualizations",
        
        # Model checkpoints
        "checkpoints/nnunet",
        "checkpoints/swin_unetr",
        "checkpoints/mednext",
    ]
    
    for directory in directories:
        Path(directory).mkdir(parents=True, exist_ok=True)
    
    print("✓ Directory structure created successfully!")
    print(f"  Total directories: {len(directories)}")

create_directories()

✓ Directory structure created successfully!
  Total directories: 14
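
nnU-Net v2 finds its data through three environment variables: `nnUNet_raw`, `nnUNet_preprocessed`, and `nnUNet_results`. The sketch below points them at the folders created above. It assumes the notebook's `data/` layout, and `configure_nnunet_env` is a hypothetical helper name. The variables should be set before `nnunetv2` is imported or its command-line tools are run.

```python
import os
from pathlib import Path


def configure_nnunet_env(base_dir="data"):
    """Point nnU-Net v2's three path variables at folders under base_dir."""
    base = Path(base_dir).resolve()
    paths = {
        "nnUNet_raw": base / "nnUNet_raw",
        "nnUNet_preprocessed": base / "nnUNet_preprocessed",
        "nnUNet_results": base / "nnUNet_results",
    }
    for variable, folder in paths.items():
        folder.mkdir(parents=True, exist_ok=True)
        os.environ[variable] = str(folder)  # nnU-Net reads these as plain strings
    return paths


for variable, folder in configure_nnunet_env("data").items():
    print(f"{variable} = {folder}")
```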


In [19]:
import os

BASE_PATH = "/kaggle/input/kssd2025-kidney-stone-segmentation-dataset"

for root, dirs, files in os.walk(BASE_PATH):
    print(root)
    print("  Dirs:", dirs)
    print("  Files:", files[:5])
    break


/kaggle/input/kssd2025-kidney-stone-segmentation-dataset
  Dirs: ['data']
  Files: []


# KSSD2025 Dataset Preprocessor

This class handles:
- Creating 5-fold cross-validation splits
- Converting data to nnU-Net format (for nnU-Net model)
- Converting data to MONAI format (for Swin-UNETR and MedNeXt)
- Maintaining consistent splits across all models

**Key Features:**
- Automatic fold splitting
- Format conversion for different frameworks
- Preserves data integrity
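
To illustrate what "consistent splits" means, here is a standard-library stand-in for the fold assignment. It is not the class's implementation, which uses scikit-learn's `KFold`, and the items it places in each fold will differ from `KFold`'s. It assumes unique case names; `make_folds` is a hypothetical name.

```python
import random


def make_folds(names, n_folds=5, seed=42):
    """Shuffle names once, then deal them into n_folds disjoint validation sets."""
    if len(names) < n_folds:
        raise ValueError("need at least one item per fold")
    shuffled = sorted(names)  # sort first so the input order does not matter
    random.Random(seed).shuffle(shuffled)
    folds = []
    for fold in range(n_folds):
        val = shuffled[fold::n_folds]  # every n_folds-th item, offset by fold
        val_set = set(val)
        train = [name for name in shuffled if name not in val_set]
        folds.append({"fold": fold, "train": train, "val": val})
    return folds


names = [f"case_{i:03d}" for i in range(10)]
for split in make_folds(names, n_folds=5):
    print(split["fold"], len(split["train"]), "train |", len(split["val"]), "val")
```

Because the split depends only on the sorted names and the seed, every model that calls it with the same arguments trains and validates on the same cases.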

In [24]:
import os

DATA_PATH = "/kaggle/input/kssd2025-kidney-stone-segmentation-dataset/data"

print("Folders in data/:", os.listdir(DATA_PATH))

print("\nChecking images folder:")
images_path = os.path.join(DATA_PATH, "images")
if os.path.exists(images_path):
    print("Found images folder with files:", os.listdir(images_path)[:5])
else:
    print("❌ No images folder found!")

print("\nChecking masks folder:")
masks_path = os.path.join(DATA_PATH, "masks")
if os.path.exists(masks_path):
    print("Found masks folder with files:", os.listdir(masks_path)[:5])
else:
    print("❌ No masks folder found!")


Folders in data/: ['label', 'image']

Checking images folder:
❌ No images folder found!

Checking masks folder:
❌ No masks folder found!


In [25]:
# =========================================================
# Imports
# =========================================================
import json
import shutil
import warnings
from pathlib import Path
from tqdm import tqdm
import cv2
import numpy as np
from sklearn.model_selection import KFold

warnings.filterwarnings("ignore")

# =========================================================
# Preprocessor Class
# =========================================================
class KSSD2025Preprocessor:
    """
    Preprocessor for KSSD2025 dataset (Kaggle version)
    Converts data for nnU-Net, Swin-UNETR, and MONAI
    """

    def __init__(self, raw_data_path, output_base_path, n_folds=5):
        self.raw_data_path = Path(raw_data_path)
        self.output_base_path = Path(output_base_path)
        self.n_folds = n_folds

        self.nnunet_path = self.output_base_path / "nnUNet_raw"
        self.monai_path = self.output_base_path / "MONAI_data"

        self.nnunet_path.mkdir(parents=True, exist_ok=True)
        self.monai_path.mkdir(parents=True, exist_ok=True)

        print("✓ Preprocessor initialized")
        print(f"  Raw data: {self.raw_data_path}")
        print(f"  Output: {self.output_base_path}")
        print(f"  Folds: {self.n_folds}")

    # K-FOLD SPLIT
    def create_fold_splits(self, image_list, seed=42):
        if len(image_list) < self.n_folds:
            raise ValueError(
                f"❌ Number of images ({len(image_list)}) "
                f"is smaller than n_folds ({self.n_folds})"
            )

        kf = KFold(n_splits=self.n_folds, shuffle=True, random_state=seed)
        splits = []

        for fold, (train_idx, val_idx) in enumerate(kf.split(image_list)):
            splits.append({
                "fold": fold,
                "train": [image_list[i] for i in train_idx],
                "val": [image_list[i] for i in val_idx]
            })
            print(
                f"  Fold {fold}: {len(train_idx)} train | {len(val_idx)} val"
            )

        return splits

    # nnU-Net FORMAT
    def prepare_nnunet_format(self, dataset_id=500, dataset_name="KSSD2025"):
        print("\n=== Preparing nnU-Net Format ===")

        images_dir = self.raw_data_path / "image"  # NOTE: changed from 'images'
        masks_dir = self.raw_data_path / "label"   # NOTE: changed from 'masks'

        images = sorted(list(images_dir.glob("*.jpg")) + list(images_dir.glob("*.png")))
        masks = sorted(list(masks_dir.glob("*.jpg")) + list(masks_dir.glob("*.png")))

        print(f"Found: {len(images)} images, {len(masks)} masks")

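        if len(images) == 0 or len(masks) == 0:
            raise ValueError("❌ Dataset is EMPTY for nnU-Net")

        # zip() below would silently truncate to the shorter list
        if len(images) != len(masks):
            raise ValueError(
                f"❌ Images/masks count mismatch: {len(images)} vs {len(masks)}"
            )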

        dataset_folder = self.nnunet_path / f"Dataset{dataset_id:03d}_{dataset_name}"
        imagesTr = dataset_folder / "imagesTr"
        labelsTr = dataset_folder / "labelsTr"

        imagesTr.mkdir(parents=True, exist_ok=True)
        labelsTr.mkdir(parents=True, exist_ok=True)

        for idx, (img_path, mask_path) in enumerate(
            tqdm(zip(images, masks), total=len(images), desc="Converting")
        ):
            case_id = f"{dataset_name}_{idx:04d}"

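            img = cv2.imread(str(img_path), cv2.IMREAD_GRAYSCALE)
            mask = cv2.imread(str(mask_path), cv2.IMREAD_GRAYSCALE)
            if img is None or mask is None:
                raise ValueError(f"❌ Could not read {img_path} or {mask_path}")
            # Binarize: stone pixels -> 1, background -> 0
            mask = (mask > 127).astype(np.uint8)

            # Write lossless PNG: nnU-Net v2 ships no reader for .npy files
            cv2.imwrite(str(imagesTr / f"{case_id}_0000.png"), img)
            cv2.imwrite(str(labelsTr / f"{case_id}.png"), mask)

        dataset_json = {
            "channel_names": {"0": "CT"},
            "labels": {"background": 0, "kidney_stone": 1},
            "numTraining": len(images),
            "file_ending": ".png",
            "name": dataset_name
        }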

        with open(dataset_folder / "dataset.json", "w") as f:
            json.dump(dataset_json, f, indent=4)

        print(f"✓ nnU-Net data prepared at: {dataset_folder}")
        return dataset_folder

    # MONAI FORMAT
    def prepare_monai_format(self):
        print("\n=== Preparing MONAI Format ===")

        images_dir = self.raw_data_path / "image"  # NOTE: changed from 'images'
        masks_dir = self.raw_data_path / "label"   # NOTE: changed from 'masks'

        images = sorted(list(images_dir.glob("*.jpg")) + list(images_dir.glob("*.png")))
        masks = sorted(list(masks_dir.glob("*.jpg")) + list(masks_dir.glob("*.png")))

        print(f"Found {len(images)} images, {len(masks)} masks")

        if len(images) == 0 or len(masks) == 0:
            raise ValueError("❌ Dataset is EMPTY for MONAI")

        assert len(images) == len(masks), "❌ Images–Masks mismatch"

        image_names = [img.stem for img in images]
        fold_splits = self.create_fold_splits(image_names)

        for fold_info in fold_splits:
            fold = fold_info["fold"]
            fold_dir = self.monai_path / f"fold_{fold}"

            (fold_dir / "images/train").mkdir(parents=True, exist_ok=True)
            (fold_dir / "images/val").mkdir(parents=True, exist_ok=True)
            (fold_dir / "masks/train").mkdir(parents=True, exist_ok=True)
            (fold_dir / "masks/val").mkdir(parents=True, exist_ok=True)

            for name in fold_info["train"]:
                for ext in ["jpg", "png"]:
                    if (images_dir / f"{name}.{ext}").exists():
                        shutil.copy(images_dir / f"{name}.{ext}", fold_dir / "images/train")
                        shutil.copy(masks_dir / f"{name}.{ext}", fold_dir / "masks/train")
                        break

            for name in fold_info["val"]:
                for ext in ["jpg", "png"]:
                    if (images_dir / f"{name}.{ext}").exists():
                        shutil.copy(images_dir / f"{name}.{ext}", fold_dir / "images/val")
                        shutil.copy(masks_dir / f"{name}.{ext}", fold_dir / "masks/val")
                        break

        with open(self.monai_path / "fold_splits.json", "w") as f:
            json.dump(fold_splits, f, indent=4)

        print(f"✓ MONAI data prepared at: {self.monai_path}")
        return self.monai_path

print("✓ KSSD2025Preprocessor class defined")

# =========================================================
# RUN PREPROCESSING
# =========================================================
preprocessor = KSSD2025Preprocessor(
    raw_data_path="/kaggle/input/kssd2025-kidney-stone-segmentation-dataset/data",
    output_base_path="/kaggle/working",
    n_folds=5
)

nnunet_folder = preprocessor.prepare_nnunet_format(
    dataset_id=500,
    dataset_name="KSSD2025"
)

monai_folder = preprocessor.prepare_monai_format()

print("\n" + "="*60)
print("✓ Data preprocessing completed!")
print("="*60)
print(f"nnU-Net data: {nnunet_folder}")
print(f"MONAI data: {monai_folder}")
print("="*60)


✓ KSSD2025Preprocessor class defined
✓ Preprocessor initialized
  Raw data: /kaggle/input/kssd2025-kidney-stone-segmentation-dataset/data
  Output: /kaggle/working
  Folds: 5

=== Preparing nnU-Net Format ===
Found: 0 images, 0 masks


ValueError: ❌ Dataset is EMPTY for nnU-Net
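
"Found: 0 images, 0 masks" means that neither `*.jpg` nor `*.png` matched anything directly inside `image/` and `label/`. Possible causes include a different file extension, an upper-case suffix (glob patterns are case-sensitive on Linux), or files nested one level deeper. The sketch below tallies what is actually on disk so the glob patterns can be adjusted. `count_extensions` is a hypothetical helper, and the path is the one used in the cell above.

```python
from collections import Counter
from pathlib import Path


def count_extensions(root):
    """Count files under root by (parent folder name, lowercase suffix)."""
    counts = Counter()
    root = Path(root)
    if not root.is_dir():
        return counts  # nothing to scan
    for path in root.rglob("*"):  # recursive, so nested folders are included
        if path.is_file():
            counts[(path.parent.name, path.suffix.lower())] += 1
    return counts


DATA_PATH = "/kaggle/input/kssd2025-kidney-stone-segmentation-dataset/data"
for (folder, suffix), n in sorted(count_extensions(DATA_PATH).items()):
    print(f"{folder}/*{suffix or '(no suffix)'}: {n}")
```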

# Execute Data Preprocessing

**IMPORTANT:** Set `raw_data_path` to the folder that contains the dataset's `image/` and `label/` subfolders.

This cell:
1. Initializes the preprocessor
2. Creates nnU-Net format (for Model 1)
3. Creates MONAI format with 5-fold splits (for Models 2 & 3)

**Estimated time:** 2-5 minutes for 838 images

**Run only once:** results are saved to disk

In [21]:
# Initialize preprocessor
# ⚠️ CHANGE THIS PATH to your KSSD2025 location
preprocessor = KSSD2025Preprocessor(
    raw_data_path="data/KSSD2025",  # Your KSSD2025 folder
    output_base_path="data",
    n_folds=5
)

# Prepare for nnU-Net
nnunet_folder = preprocessor.prepare_nnunet_format(
    dataset_id=500, 
    dataset_name="KSSD2025"
)

# Prepare for MONAI (Swin-UNETR and MedNeXt)
monai_folder = preprocessor.prepare_monai_format()

print("\n" + "="*60)
print("✓ Data preprocessing completed!")
print("="*60)
print(f"nnU-Net data: {nnunet_folder}")
print(f"MONAI data: {monai_folder}")
print("="*60)

✓ Preprocessor initialized
  Raw data: data/KSSD2025
  Output: data
  Folds: 5

=== Preparing nnU-Net Format ===
Found: 0 images, 0 masks


ValueError: ❌ Dataset is EMPTY for nnU-Net
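
Once the files are found, pairing deserves a check as well. `prepare_nnunet_format` zips two independently sorted lists, so one extra or differently named file would shift every later image onto the wrong mask. A minimal sketch that pairs by file stem instead, assuming each mask shares its image's stem (the same assumption `prepare_monai_format` makes when it copies masks by name). `pair_by_stem` and `IMAGE_SUFFIXES` are hypothetical names.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed; extend to match the dataset


def pair_by_stem(images_dir, masks_dir, suffixes=IMAGE_SUFFIXES):
    """Return sorted (image, mask) path pairs that share a file stem."""
    def index(folder):
        return {
            p.stem: p
            for p in Path(folder).iterdir()
            if p.is_file() and p.suffix.lower() in suffixes
        }

    images, masks = index(images_dir), index(masks_dir)
    unmatched = sorted(set(images) ^ set(masks))  # stems present on one side only
    if unmatched:
        raise ValueError(f"{len(unmatched)} unpaired files, e.g. {unmatched[:5]}")
    return [(images[stem], masks[stem]) for stem in sorted(images)]
```

The returned pairs could replace `zip(images, masks)` in the conversion loop, so a missing mask fails loudly instead of corrupting the labels.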