Cell 1 — Install nnU-Net v2 + utilities

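What this does: installs nnU-Net v2 plus nibabel, SimpleITK and acvl-utils, which later cells use to read and write medical images (json ships with Python and needs no install).
Why: every later cell depends on these packages. Versions are not pinned, so pin them if you need a fully reproducible environment.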

In [1]:
!pip -q install nnunetv2 nibabel SimpleITK acvl-utils



In [2]:
import nnunetv2
print("nnUNet v2 import OK:", nnunetv2.__version__ if hasattr(nnunetv2, "__version__") else "version unknown")


nnUNet v2 import OK: version unknown
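The installed nnunetv2 package exposes no `__version__` attribute, which is why the check above prints "version unknown". Here is a minimal sketch that reads installed versions from package metadata instead. It uses the standard-library `importlib.metadata`, and `installed_version` is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str) -> str:
    """Return the installed version of a distribution, or 'not installed'."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return "not installed"


for pkg in ("nnunetv2", "nibabel", "SimpleITK", "acvl-utils"):
    print(f"{pkg}: {installed_version(pkg)}")
```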


Install Kaggle CLI

In [3]:
!pip -q install kaggle


In [4]:
from google.colab import files
uploaded = files.upload()  # upload kaggle.json; assigning the result keeps the API token out of the cell output


Saving kaggle.json to kaggle.json



In [5]:
import os
from pathlib import Path

kaggle_dir = Path.home() / ".kaggle"
kaggle_dir.mkdir(parents=True, exist_ok=True)

# Move uploaded kaggle.json
src = Path("/content/kaggle.json")
dst = kaggle_dir / "kaggle.json"
assert src.exists(), "kaggle.json not found in /content. Upload it first."
dst.write_bytes(src.read_bytes())

# Fix permissions
os.chmod(dst, 0o600)

print("✅ Kaggle API token installed at:", dst)


✅ Kaggle API token installed at: /root/.kaggle/kaggle.json


Download the dataset ZIP from Kaggle

What we’re doing: download the dataset ZIP into /content/downloads using its Kaggle slug (owner/dataset-name).

In [6]:
import subprocess
from pathlib import Path

kaggle_slug = "murillobouzon/kssd2025-kidney-stone-segmentation-dataset"
download_dir = Path("/content/downloads")
download_dir.mkdir(parents=True, exist_ok=True)

subprocess.run(
    ["kaggle", "datasets", "download", "-d", kaggle_slug, "-p", str(download_dir)],
    check=True
)

print("✅ Download complete. Files:", list(download_dir.iterdir()))


✅ Download complete. Files: [PosixPath('/content/downloads/kssd2025-kidney-stone-segmentation-dataset.zip')]
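An interrupted download can leave a truncated ZIP that fails only at extraction time. Here is a small sketch, using only the standard library, that checks the archive before unzipping. `zip_is_intact` is a hypothetical helper name:

```python
import zipfile
from pathlib import Path


def zip_is_intact(path: Path) -> bool:
    """True if the file opens as a ZIP and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(path) as z:
            # testzip() returns the name of the first corrupt member, or None
            return z.testzip() is None
    except zipfile.BadZipFile:
        return False


download_dir = Path("/content/downloads")
if download_dir.exists():
    for zp in sorted(download_dir.glob("*.zip")):
        print(zp.name, "OK" if zip_is_intact(zp) else "CORRUPT: re-download")
```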


Unzip the dataset

In [7]:
import zipfile
from pathlib import Path

download_dir = Path("/content/downloads")
zips = sorted(download_dir.glob("*.zip"))
assert len(zips) >= 1, "No zip found in /content/downloads"

zip_path = zips[0]
extract_dir = download_dir / "extracted"
extract_dir.mkdir(parents=True, exist_ok=True)

with zipfile.ZipFile(zip_path, "r") as z:
    z.extractall(extract_dir)

print("✅ Extracted to:", extract_dir)


✅ Extracted to: /content/downloads/extracted


Inspect what Kaggle gave you (so mapping is correct)

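What we’re doing: print the folder tree and look for NIfTI files.
Why: Kaggle datasets often use their own folder names (images/masks/etc.), so we check the layout before mapping. Here the tree shows 2D .tif files (data/image/*.tif), so the next cell converts them to .nii.gz before building the nnU-Net dataset.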

In [8]:
from pathlib import Path

extract_dir = Path("/content/downloads/extracted")

# print every path up to `depth` levels below root (this lists all files, so the output is long)
def list_tree(root: Path, depth=4):
    root = root.resolve()
    for p in sorted(root.rglob("*")):
        rel = p.relative_to(root)
        if len(rel.parts) <= depth:
            item_type = " (file)" if p.is_file() else " (dir)"
            print(f"{'  ' * (len(rel.parts) - 1)}{rel}{item_type}")

print("=== Folder tree (depth 4, with file types) ===")
list_tree(extract_dir, depth=4)

nii = sorted(list((extract_dir / "data").rglob("*.nii")) + list((extract_dir / "data").rglob("*.nii.gz")))
print("\nFound NIfTI files:", len(nii))
print("Examples:")
for p in nii[:20]:
    print(" -", p)

=== Folder tree (depth 4, with file types) ===
data (dir)
  data/image (dir)
    data/image/1.tif (file)
    data/image/10.tif (file)
    data/image/1000.tif (file)
    data/image/1001.tif (file)
    data/image/1002.tif (file)
    data/image/1003.tif (file)
    data/image/1012.tif (file)
    data/image/1013.tif (file)
    data/image/1014.tif (file)
    data/image/1015.tif (file)
    data/image/1020.tif (file)
    data/image/1021.tif (file)
    data/image/1022.tif (file)
    data/image/1023.tif (file)
    data/image/1024.tif (file)
    data/image/1025.tif (file)
    data/image/1026.tif (file)
    data/image/1027.tif (file)
    data/image/1028.tif (file)
    data/image/1029.tif (file)
    data/image/1030.tif (file)
    data/image/1031.tif (file)
    data/image/1032.tif (file)
    data/image/1033.tif (file)
    data/image/1034.tif (file)
    data/image/1035.tif (file)
    data/image/1036.tif (file)
    data/image/1037.tif (file)
    data/image/1038.tif (file)
    data/image/1039.tif (file
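The full tree printout runs to hundreds of lines. Here is a sketch that summarizes the layout instead by counting files per folder and extension. It assumes the extraction path used above, and `summarize_tree` is a hypothetical helper name:

```python
from collections import Counter
from pathlib import Path


def summarize_tree(root: Path) -> dict:
    """Count files per (folder relative to root, lower-cased suffix)."""
    counts = Counter()
    for p in root.rglob("*"):
        if p.is_file():
            counts[(p.parent.relative_to(root).as_posix(), p.suffix.lower())] += 1
    return dict(counts)


extract_dir = Path("/content/downloads/extracted")
if extract_dir.exists():
    for (folder, suffix), n in sorted(summarize_tree(extract_dir).items()):
        print(f"{folder:40s} {suffix or '(none)':8s} {n}")
```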

Build nnU-Net raw dataset folder (Dataset501_KSSD)

This cell converts each 2D .tif slice into a single-slice .nii.gz file. It then separates images from labels by folder keywords (mask, label, gt, etc.) and pairs them by filename.

What we’re doing: create:

/content/nnUNet/nnUNet_raw/Dataset501_KSSD/imagesTr

/content/nnUNet/nnUNet_raw/Dataset501_KSSD/labelsTr

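It then symlinks (or copies) the converted files into them using nnU-Net's naming scheme: `{case}_0000.nii.gz` for images and `{case}.nii.gz` for labels.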

In [27]:
import os, shutil
from pathlib import Path
import nibabel as nib
import numpy as np  # image data manipulation

# tifffile is not part of the install cell above; install it on demand (common practice in Colab)
try:
    import tifffile
except ImportError:
    print("tifffile not found, installing...")
    !pip install -q tifffile
    import tifffile

# nnU-Net raw root
nnunet_raw = Path("/content/nnUNet/nnUNet_raw")
ds_name = "Dataset501_KSSD"
target = nnunet_raw / ds_name
imagesTr = target / "imagesTr"
labelsTr = target / "labelsTr"
imagesTr.mkdir(parents=True, exist_ok=True)
labelsTr.mkdir(parents=True, exist_ok=True)

extracted_data_dir = Path("/content/downloads/extracted/data") # Point to the 'data' folder

# Create a temporary directory for converted NIfTI files
converted_nii_dir = extracted_data_dir.parent / "converted_nii"
converted_nii_dir.mkdir(parents=True, exist_ok=True)
converted_images_dir = converted_nii_dir / "image"
converted_labels_dir = converted_nii_dir / "label"
converted_images_dir.mkdir(parents=True, exist_ok=True)
converted_labels_dir.mkdir(parents=True, exist_ok=True)

# Find all .tif files
all_tif_images = sorted(list(extracted_data_dir.rglob("image/*.tif")))
all_tif_labels = sorted(list(extracted_data_dir.rglob("label/*.tif")))

assert all_tif_images, "No .tif images found. Check extraction path."
assert all_tif_labels, "No .tif labels found. Check extraction path."

# Convert .tif files to .nii.gz
print(f"Converting {len(all_tif_images)} images from .tif to .nii.gz...")
for tif_path in all_tif_images:
    img_data = tifffile.imread(tif_path)
    # (H, W) -> (H, W, 1). SimpleITK returns arrays in (z, y, x) order, so nibabel's
    # third axis becomes the first axis nnU-Net sees: the singleton slice axis it
    # expects for 2D data.
    if img_data.ndim == 2:
        img_data = img_data[:, :, np.newaxis]
    # Keep float32/int16 as-is; cast other dtypes (e.g. uint8) to float32
    if img_data.dtype != np.float32 and img_data.dtype != np.int16:
        img_data = img_data.astype(np.float32)

    nii_img = nib.Nifti1Image(img_data, affine=np.eye(4)) # identity affine for now, could be improved if metadata available
    nib.save(nii_img, converted_images_dir / f"{tif_path.stem}.nii.gz")

print(f"Converting {len(all_tif_labels)} labels from .tif to .nii.gz...")
for tif_path in all_tif_labels:
    lbl_data = tifffile.imread(tif_path)
    if lbl_data.ndim == 2:
        lbl_data = lbl_data[:, :, np.newaxis]  # same axis layout as the images
    # Binary task: map every non-zero mask value to 1 and keep 0 as background
    lbl_data = (lbl_data > 0).astype(np.uint8)

    nii_lbl = nib.Nifti1Image(lbl_data, affine=np.eye(4))
    nib.save(nii_lbl, converted_labels_dir / f"{tif_path.stem}.nii.gz")

print("Conversion complete.")

# Now, nnU-Net dataset creation logic using the converted NIfTI files
# We will search within the 'converted_nii_dir' for the .nii.gz files
all_nii_converted = sorted(list(converted_nii_dir.rglob("*.nii.gz")))
assert all_nii_converted, "No .nii.gz found after conversion. Something went wrong."

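# Heuristic: labels live in folders whose name contains one of these words
LABEL_HINTS = ("label", "labels", "mask", "masks", "gt", "groundtruth", "seg", "segment", "annotation", "annotations")

def is_label_path(p: Path) -> bool:
    # Check only the parent folder name; matching the whole absolute path could
    # hit unrelated directories whose names happen to contain "gt" or "seg"
    low = p.parent.name.lower()
    return any(h in low for h in LABEL_HINTS)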

# split candidates using the converted files
label_files = [p for p in all_nii_converted if is_label_path(p)]
image_files = [p for p in all_nii_converted if p not in label_files]

print("Candidate images (converted NIfTI):", len(image_files))
print("Candidate labels (converted NIfTI):", len(label_files))

# Helper: try to derive a case id from filename (remove extensions)
def stem_niigz(p: Path) -> str:
    name = p.name
    if name.endswith(".nii.gz"):
        return name[:-7]
    if name.endswith(".nii"): # Although we are converting to .nii.gz, keep for robustness
        return name[:-4]
    return p.stem # Fallback, though should not be needed here

# Build dictionaries by “case key”
img_map = {}
for p in image_files:
    s = stem_niigz(p)
    if s.endswith("_0000"):
        key = s[:-5]
    else:
        key = s
    img_map.setdefault(key, []).append(p)

lbl_map = {}
for p in label_files:
    key = stem_niigz(p)
    if key.endswith("_0000"):
        key = key[:-5]
    lbl_map.setdefault(key, []).append(p)

# Pairing by intersection
keys = sorted(set(img_map.keys()) & set(lbl_map.keys()))
print("Paired cases:", len(keys))

assert len(keys) > 0, (
    "Could not auto-pair images and labels after conversion.\n"
    "Check that labels sit in a folder whose name contains a hint such as 'label' or 'mask',\n"
    "and that each image and its label share the same filename stem (e.g. image/1.tif and label/1.tif)."
)

# Choose one file per key (if several match, take the first; multi-modality data would need one _000X image per channel instead)
def pick_one(m):
    # prefer .nii.gz, which all converted files are now
    m = sorted(m, key=lambda p: (0 if p.name.endswith(".nii.gz") else 1, str(p)))
    return m[0]

use_symlinks = True  # set False to copy instead

def place(src: Path, dst: Path):
    if dst.exists() or dst.is_symlink():
        dst.unlink()
    dst.parent.mkdir(parents=True, exist_ok=True)
    if use_symlinks:
        os.symlink(src, dst)
    else:
        shutil.copy2(src, dst)

# Write into nnU-Net format
for key in keys:
    img_src = pick_one(img_map[key])
    lbl_src = pick_one(lbl_map[key])

    img_dst = imagesTr / f"{key}_0000.nii.gz"
    lbl_dst = labelsTr / f"{key}.nii.gz"

    place(img_src, img_dst)
    place(lbl_src, lbl_dst)

print("✅ Written nnU-Net raw structure at:", target)
print("imagesTr count:", len(list(imagesTr.glob('*.nii.gz'))))
print("labelsTr count:", len(list(labelsTr.glob('*.nii.gz'))))

Converting 838 images from .tif to .nii.gz...
Converting 838 labels from .tif to .nii.gz...
Conversion complete.
Candidate images (converted NIfTI): 838
Candidate labels (converted NIfTI): 838
Paired cases: 838
✅ Written nnU-Net raw structure at: /content/nnUNet/nnUNet_raw/Dataset501_KSSD
imagesTr count: 838
labelsTr count: 838
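`--verify_dataset_integrity` also checks geometry and label values later. A quick check right after conversion catches problems earlier, for example an RGB TIFF that would become a 3-slice volume. Here is a minimal sketch, assuming the converted files sit under `converted_images_dir` / `converted_labels_dir` from the cell above with identical filenames. `check_pair` and `check_all` are hypothetical helper names:

```python
from pathlib import Path

import numpy as np


def check_pair(img: np.ndarray, lbl: np.ndarray) -> list[str]:
    """Return problems for one image/label pair (empty list if it looks fine)."""
    problems = []
    if img.shape != lbl.shape:
        problems.append(f"shape mismatch: image {img.shape} vs label {lbl.shape}")
    if img.ndim != 3 or img.shape[2] != 1:
        problems.append(f"expected (H, W, 1) array, got {img.shape}")
    extra = set(np.unique(lbl).tolist()) - {0, 1}
    if extra:
        problems.append(f"unexpected label values: {sorted(extra)}")
    return problems


def check_all(images_dir: Path, labels_dir: Path) -> dict[str, list[str]]:
    """Run check_pair over every converted image; nibabel is imported lazily."""
    import nibabel as nib

    report = {}
    for img_path in sorted(images_dir.glob("*.nii.gz")):
        lbl_path = labels_dir / img_path.name
        if not lbl_path.exists():
            report[img_path.name] = ["label file missing"]
            continue
        img = np.asanyarray(nib.load(img_path).dataobj)
        lbl = np.asanyarray(nib.load(lbl_path).dataobj)
        problems = check_pair(img, lbl)
        if problems:
            report[img_path.name] = problems
    return report


# Usage in the notebook:
# print(check_all(converted_images_dir, converted_labels_dir) or "all pairs OK")
```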


In [10]:
from pathlib import Path

target = Path("/content/nnUNet/nnUNet_raw/Dataset501_KSSD")
imagesTr = target / "imagesTr"
labelsTr = target / "labelsTr"

imgs = sorted(imagesTr.glob("*.nii.gz"))
lbls = sorted(labelsTr.glob("*.nii.gz"))

img_keys = {p.name.replace("_0000.nii.gz", "") for p in imgs if p.name.endswith("_0000.nii.gz")}
lbl_keys = {p.name.replace(".nii.gz", "") for p in lbls}

missing_lbl = sorted(img_keys - lbl_keys)
missing_img = sorted(lbl_keys - img_keys)

print("Total images:", len(imgs))
print("Total labels:", len(lbls))
print("Missing labels for images:", len(missing_lbl))
print("Missing images for labels:", len(missing_img))

assert len(missing_lbl) == 0, f"Images without labels (examples): {missing_lbl[:10]}"
assert len(missing_img) == 0, f"Labels without images (examples): {missing_img[:10]}"

print("✅ Pairing is perfect.")


Total images: 838
Total labels: 838
Missing labels for images: 0
Missing images for labels: 0
✅ Pairing is perfect.


Set nnU-Net environment variables (nnUNetv2_* commands run with ! inherit them from os.environ)

In [11]:
import os
from pathlib import Path

os.environ["nnUNet_raw"] = "/content/nnUNet/nnUNet_raw"
os.environ["nnUNet_preprocessed"] = "/content/nnUNet/nnUNet_preprocessed"
os.environ["nnUNet_results"] = "/content/nnUNet/nnUNet_results"

for k in ["nnUNet_raw","nnUNet_preprocessed","nnUNet_results"]:
    Path(os.environ[k]).mkdir(parents=True, exist_ok=True)
    print(k, "=", os.environ[k])


nnUNet_raw = /content/nnUNet/nnUNet_raw
nnUNet_preprocessed = /content/nnUNet/nnUNet_preprocessed
nnUNet_results = /content/nnUNet/nnUNet_results


Plan + preprocess (run once; if the raw data changes later, add --clean so the cached dataset fingerprint is recomputed)

In [28]:
!nnUNetv2_plan_and_preprocess -d 501 --verify_dataset_integrity

Fingerprint extraction...
Dataset501_KSSD
Using <class 'nnunetv2.imageio.simpleitk_reader_writer.SimpleITKIO'> as reader/writer

####################
verify_dataset_integrity Done. 
If you didn't see any error messages then your dataset is most likely OK!
####################

Experiment planning...

############################
INFO: You are using the old nnU-Net default planner. We have updated our recommendations. Please consider using those instead! Read more here: https://github.com/MIC-DKFZ/nnUNet/blob/master/documentation/resenc_presets.md
############################

Dropping 3d_lowres config because the image size difference to 3d_fullres is too small. 3d_fullres: [512. 416.   1.], 3d_lowres: [512, 416, 1]
2D U-Net configuration:
{'data_identifier': 'nnUNetPlans_2d', 'preprocessor_name': 'DefaultPreprocessor', 'batch_size': 2535, 'patch_size': (np.int64(448), np.int64(1)), 'median_image_size_in_voxels': array([416.,   1.]), 'spacing': array([1., 1.]), 'normalization_schemes':
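In the 2D configuration printed above, the median image size is [416, 1] and the patch size is (448, 1). nnU-Net's 2D configuration uses the first axis as the slice axis, so with a 3d_fullres shape of [512, 416, 1] the singleton sits on the wrong axis. As a result, every "2D image" becomes a 416 x 1 strip. Two points matter here. First, the singleton must come first in the array nnU-Net reads. Second, `nnUNetv2_plan_and_preprocess` reuses an existing `dataset_fingerprint.json` unless `--clean` is passed, so re-running it after fixing the conversion can still report the old shapes. Here is a minimal sketch that flags such a plan by reading `nnUNetPlans.json`. It assumes nnU-Net v2's `plans["configurations"]["2d"]["patch_size"]` layout, and `degenerate_2d_plan` is a hypothetical helper name:

```python
import json
import os
from pathlib import Path


def degenerate_2d_plan(plans: dict) -> bool:
    """True if any axis of the 2d patch size is 1, i.e. slices are 1-pixel strips."""
    patch = plans["configurations"]["2d"]["patch_size"]
    return any(int(s) <= 1 for s in patch)


pre_root = Path(os.environ.get("nnUNet_preprocessed", "/content/nnUNet/nnUNet_preprocessed"))
plans_path = pre_root / "Dataset501_KSSD" / "nnUNetPlans.json"
if plans_path.exists():
    plans = json.loads(plans_path.read_text())
    cfg = plans["configurations"]["2d"]
    print("2d patch_size:", cfg["patch_size"], "batch_size:", cfg["batch_size"])
    if degenerate_2d_plan(plans):
        print("Degenerate 2D plan: fix the conversion, then re-run planning with --clean")
```

After fixing the data, the re-run would be `!nnUNetv2_plan_and_preprocess -d 501 --verify_dataset_integrity --clean`.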

## Generate dataset.json

### Subtask:
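Generate the `dataset.json` file for the nnU-Net dataset. nnUNetv2_plan_and_preprocess reads it, so it must exist in nnUNet_raw/Dataset501_KSSD before planning. In this notebook the cell appears after the first planning cell, but it actually ran before it (In [13]).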


In [13]:
import json
from pathlib import Path
import os

dataset_name = "Dataset501_KSSD"
dataset_dir = Path(os.environ["nnUNet_raw"]) / dataset_name

dataset_json_content = {
    "name": "KidneyStoneSegmentationDataset",
    "description": "Kidney Stone Segmentation Dataset from Kaggle KSSD2025",
    "reference": "https://www.kaggle.com/datasets/murillobouzon/kssd2025-kidney-stone-segmentation-dataset",
    "licence": "Apache 2.0", # Placeholder: confirm the actual license on the Kaggle dataset page
    "release": "1.0",
    "numTraining": len(list((dataset_dir / "imagesTr").glob("*.nii.gz"))),
    "numTest": 0, # No separate test set; cross-validation folds come from splits_final.json
    "channel_names": { "0": "CT" }, # "CT" selects CTNormalization; if the TIFFs hold 8-bit display values rather than Hounsfield units, another name (z-score normalization) may fit better
    "labels": { "background": 0, "kidney_stone": 1 }, # 0 = background, 1 = kidney stone (masks were binarized)
    "file_ending": ".nii.gz"
}

with open(dataset_dir / "dataset.json", "w") as f:
    json.dump(dataset_json_content, f, indent=4)

print("✅ dataset.json generated:", dataset_dir / "dataset.json")

# Verify the content
with open(dataset_dir / "dataset.json", "r") as f:
    print(f.read())

✅ dataset.json generated: /content/nnUNet/nnUNet_raw/Dataset501_KSSD/dataset.json
{
    "name": "KidneyStoneSegmentationDataset",
    "description": "Kidney Stone Segmentation Dataset from Kaggle KSSD2025",
    "reference": "https://www.kaggle.com/datasets/murillobouzon/kssd2025-kidney-stone-segmentation-dataset",
    "licence": "Apache 2.0",
    "release": "1.0",
    "numTraining": 838,
    "numTest": 0,
    "channel_names": {
        "0": "CT"
    },
    "labels": {
        "background": 0,
        "kidney_stone": 1
    },
    "file_ending": ".nii.gz"
}
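nnU-Net v2 requires `channel_names`, `labels`, `numTraining` and `file_ending` in dataset.json. It also requires label values that are consecutive integers starting with background = 0. Here is a sketch of a pre-planning sanity check. It assumes plain integer labels (no region-based training), and `validate_dataset_json` is a hypothetical helper name:

```python
import json
import os
from pathlib import Path

REQUIRED_KEYS = ("channel_names", "labels", "numTraining", "file_ending")


def validate_dataset_json(ds: dict, n_images: int) -> list[str]:
    """Return a list of problems (empty if none) for a dataset.json dict."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in ds]
    labels = ds.get("labels", {})
    if labels.get("background") != 0:
        problems.append("'background' must map to 0")
    values = sorted(labels.values())
    if values != list(range(len(values))):
        problems.append(f"label values must be consecutive integers from 0, got {values}")
    if ds.get("numTraining") != n_images:
        problems.append(f"numTraining={ds.get('numTraining')} but imagesTr holds {n_images} images")
    return problems


raw_ds = Path(os.environ.get("nnUNet_raw", "/content/nnUNet/nnUNet_raw")) / "Dataset501_KSSD"
if (raw_ds / "dataset.json").exists():
    ds = json.loads((raw_ds / "dataset.json").read_text())
    n = len(list((raw_ds / "imagesTr").glob("*_0000.nii.gz")))
    print(validate_dataset_json(ds, n) or "dataset.json looks consistent")
```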


In [14]:
from pathlib import Path
import os

pre_root = Path(os.environ["nnUNet_preprocessed"])
print("nnUNet_preprocessed =", pre_root)

# List all Dataset501_* folders
ds501 = sorted([p for p in pre_root.glob("Dataset501_*") if p.is_dir()])
print("\nFound Dataset501_* folders:")
for p in ds501:
    print(" -", p.name)

# Search for splits_final.json anywhere under nnUNet_preprocessed
splits_files = sorted(pre_root.rglob("splits_final.json"))
print("\nFound splits_final.json files:")
for s in splits_files:
    print(" -", s)

if splits_files:
    # pick the one that belongs to Dataset501_*
    chosen = None
    for s in splits_files:
        if "Dataset501_" in str(s):
            chosen = s
            break
    if chosen is None:
        chosen = splits_files[0]

    print("\n✅ Using splits at:", chosen)
else:
    print("\n❌ No splits_final.json found anywhere under nnUNet_preprocessed.")


nnUNet_preprocessed = /content/nnUNet/nnUNet_preprocessed

Found Dataset501_* folders:

Found splits_final.json files:

❌ No splits_final.json found anywhere under nnUNet_preprocessed.


In [15]:
!nnUNetv2_plan_and_preprocess -d 501 --verify_dataset_integrity

Fingerprint extraction...
Dataset501_KSSD
Using <class 'nnunetv2.imageio.simpleitk_reader_writer.SimpleITKIO'> as reader/writer

####################
verify_dataset_integrity Done. 
If you didn't see any error messages then your dataset is most likely OK!
####################

Using <class 'nnunetv2.imageio.simpleitk_reader_writer.SimpleITKIO'> as reader/writer
100% 838/838 [00:07<00:00, 105.28it/s]
Experiment planning...

############################
INFO: You are using the old nnU-Net default planner. We have updated our recommendations. Please consider using those instead! Read more here: https://github.com/MIC-DKFZ/nnUNet/blob/master/documentation/resenc_presets.md
############################

Dropping 3d_lowres config because the image size difference to 3d_fullres is too small. 3d_fullres: [512. 416.   1.], 3d_lowres: [512, 416, 1]
2D U-Net configuration:
{'data_identifier': 'nnUNetPlans_2d', 'preprocessor_name': 'DefaultPreprocessor', 'batch_size': 2535, 'patch_size': (np.int64

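## Generate `splits_final.json`

Preprocessing does not create this file. If none exists, nnU-Net writes its own random 5-fold split on the first nnUNetv2_train run. Creating it here fixes the folds up front so they can be inspected and reused across runs.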

In [16]:
import json
from pathlib import Path
import os
from sklearn.model_selection import KFold

# Retrieve nnU-Net paths
nnunet_raw = Path(os.environ["nnUNet_raw"])
ds_name = "Dataset501_KSSD"
target_raw = nnunet_raw / ds_name

nnunet_preprocessed = Path(os.environ["nnUNet_preprocessed"])
ds_pre_path = nnunet_preprocessed / ds_name

# Load the list of all case keys from the raw data structure
# We assume the imagesTr folder contains files like `case_id_0000.nii.gz`
image_files = sorted(list((target_raw / "imagesTr").glob("*.nii.gz")))
case_ids = sorted([p.name.replace("_0000.nii.gz", "") for p in image_files])

print(f"Found {len(case_ids)} cases for splitting.")

# Generate 5-fold cross-validation splits
kf = KFold(n_splits=5, shuffle=True, random_state=42)
splits = []

for train_idx, val_idx in kf.split(case_ids):
    train_cases = [case_ids[i] for i in train_idx]
    val_cases = [case_ids[i] for i in val_idx]
    splits.append({"train": train_cases, "val": val_cases})

# Save splits_final.json
splits_path = ds_pre_path / "splits_final.json"

with open(splits_path, "w") as f:
    json.dump(splits, f, indent=4)

print("✅ splits_final.json generated at:", splits_path)

# Verify content
with open(splits_path, "r") as f:
    print(f.read())

Found 838 cases for splitting.
✅ splits_final.json generated at: /content/nnUNet/nnUNet_preprocessed/Dataset501_KSSD/splits_final.json
[
    {
        "train": [
            "1",
            "10",
            "1001",
            "1002",
            "1012",
            "1014",
            "1015",
            "1021",
            "1022",
            "1023",
            "1024",
            "1025",
            "1026",
            "1027",
            "1028",
            "1029",
            "1030",
            "1031",
            "1032",
            "1034",
            "1035",
            "1036",
            "1037",
            "1038",
            "1042",
            "1044",
            "1045",
            "1046",
            "1047",
            "1048",
            "1050",
            "1051",
            "1052",
            "1053",
            "1054",
            "1055",
            "1056",
            "1057",
            "1058",
            "1060",
            "1061",
            "1062",
   

In [17]:
from pathlib import Path
import os

pre_root = Path(os.environ["nnUNet_preprocessed"])
ds_name = "Dataset501_KSSD"
ds_pre_path = pre_root / ds_name

print(f"Contents of {ds_pre_path}:")
if ds_pre_path.exists():
    for item in sorted(ds_pre_path.iterdir()):
        if item.is_dir():
            print(f"  [DIR] {item.name}/")
        else:
            print(f"  [FILE] {item.name}")
else:
    print(f"Directory {ds_pre_path} does not exist.")

Contents of /content/nnUNet/nnUNet_preprocessed/Dataset501_KSSD:
  [FILE] dataset.json
  [FILE] dataset_fingerprint.json
  [DIR] gt_segmentations/
  [FILE] nnUNetPlans.json
  [DIR] nnUNetPlans_2d/
  [DIR] nnUNetPlans_3d_fullres/
  [FILE] splits_final.json


In [18]:
from pathlib import Path
import os

pre_root = Path(os.environ["nnUNet_preprocessed"])
print("nnUNet_preprocessed =", pre_root)

# Prefer Dataset501_KSSD if it exists, otherwise pick the only Dataset501_* folder
preferred = pre_root / "Dataset501_KSSD"
if preferred.exists():
    ds_pre = preferred
else:
    candidates = sorted([p for p in pre_root.glob("Dataset501_*") if p.is_dir()])
    assert len(candidates) >= 1, "No Dataset501_* folder found. Preprocess likely failed."
    # If multiple exist, pick the most recently modified folder
    ds_pre = max(candidates, key=lambda p: p.stat().st_mtime)

splits_path = ds_pre / "splits_final.json"
print("Using preprocessed dataset folder:", ds_pre)
print("Checking splits:", splits_path)

assert splits_path.exists(), "splits_final.json still missing. Run the splits cell above, or let nnUNetv2_train create it on its first run."
print("✅ splits_final.json found. You can proceed to training/CV now.")

nnUNet_preprocessed = /content/nnUNet/nnUNet_preprocessed
Using preprocessed dataset folder: /content/nnUNet/nnUNet_preprocessed/Dataset501_KSSD
Checking splits: /content/nnUNet/nnUNet_preprocessed/Dataset501_KSSD/splits_final.json
✅ splits_final.json found. You can proceed to training/CV now.


Show what’s inside the preprocessed folder

In [19]:
from pathlib import Path
import os

ds_pre = Path(os.environ["nnUNet_preprocessed"]) / "Dataset501_KSSD"
if not ds_pre.exists():
    # fallback to whichever Dataset501_* was chosen last time
    candidates = sorted([p for p in Path(os.environ["nnUNet_preprocessed"]).glob("Dataset501_*") if p.is_dir()])
    ds_pre = max(candidates, key=lambda p: p.stat().st_mtime)

print("Listing:", ds_pre)
for p in sorted(ds_pre.iterdir()):
    print(p.name)


Listing: /content/nnUNet/nnUNet_preprocessed/Dataset501_KSSD
dataset.json
dataset_fingerprint.json
gt_segmentations
nnUNetPlans.json
nnUNetPlans_2d
nnUNetPlans_3d_fullres
splits_final.json


In [20]:
import json
from pathlib import Path
import os

splits_path = Path(os.environ["nnUNet_preprocessed"]) / "Dataset501_KSSD" / "splits_final.json"
splits = json.loads(splits_path.read_text())

print("Number of folds:", len(splits))
for i, sp in enumerate(splits):
    print(f"Fold {i}: train={len(sp['train'])}, val={len(sp['val'])}")


Number of folds: 5
Fold 0: train=670, val=168
Fold 1: train=670, val=168
Fold 2: train=670, val=168
Fold 3: train=671, val=167
Fold 4: train=671, val=167
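Fold sizes alone do not prove the folds are valid. Here is a stricter sketch, assuming the custom splits_final.json format written above (a list of `{"train": [...], "val": [...]}` dicts). It checks that train and val never overlap, that each fold covers the whole dataset, and that each case is validated exactly once. `check_splits` is a hypothetical helper name:

```python
import json
import os
from pathlib import Path


def check_splits(splits: list[dict], all_cases: list[str]) -> list[str]:
    """Return problems with a k-fold split list (empty if it is a clean partition)."""
    problems = []
    all_set = set(all_cases)
    val_seen = []
    for i, sp in enumerate(splits):
        tr, va = set(sp["train"]), set(sp["val"])
        if tr & va:
            problems.append(f"fold {i}: {len(tr & va)} cases in both train and val")
        if tr | va != all_set:
            problems.append(f"fold {i}: train + val does not cover exactly the dataset")
        val_seen.extend(sp["val"])
    if sorted(val_seen) != sorted(all_cases):
        problems.append("each case should appear in exactly one validation fold")
    return problems


pre = Path(os.environ.get("nnUNet_preprocessed", "/content/nnUNet/nnUNet_preprocessed")) / "Dataset501_KSSD"
raw_images = Path(os.environ.get("nnUNet_raw", "/content/nnUNet/nnUNet_raw")) / "Dataset501_KSSD" / "imagesTr"
if (pre / "splits_final.json").exists() and raw_images.exists():
    splits = json.loads((pre / "splits_final.json").read_text())
    cases = sorted(p.name[: -len("_0000.nii.gz")] for p in raw_images.glob("*_0000.nii.gz"))
    print(check_splits(splits, cases) or "folds are disjoint and validate every case once")
```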


In [21]:
!nvidia-smi


Tue Dec 23 13:53:42 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A100-SXM4-80GB          Off |   00000000:00:05.0 Off |                    0 |
| N/A   36C    P0             55W /  400W |       0MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                

In [22]:
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


In [23]:
import os
os.environ["nnUNet_results"] = "/content/drive/MyDrive/nnUNet_results"
print("nnUNet_results =", os.environ["nnUNet_results"])


nnUNet_results = /content/drive/MyDrive/nnUNet_results


In [24]:
import json, os
from pathlib import Path
splits = json.loads((Path(os.environ["nnUNet_preprocessed"]) / "Dataset501_KSSD" / "splits_final.json").read_text())
[(i, len(s["train"]), len(s["val"])) for i, s in enumerate(splits)]


[(0, 670, 168), (1, 670, 168), (2, 670, 168), (3, 671, 167), (4, 671, 167)]

In [25]:
from pathlib import Path
import os

splits_path = Path(os.environ["nnUNet_preprocessed"]) / "Dataset501_KSSD" / "splits_final.json"
print("splits:", splits_path)
assert splits_path.exists(), "splits_final.json missing. Run the splits cell above before training."
print("✅ Splits ready. Do NOT delete/regenerate this file.")


splits: /content/nnUNet/nnUNet_preprocessed/Dataset501_KSSD/splits_final.json
✅ Splits ready. Do NOT delete/regenerate this file.


In [34]:
from pathlib import Path
import os
from collections import Counter

folder = Path(os.environ["nnUNet_preprocessed"]) / "Dataset501_KSSD" / "nnUNetPlans_2d"
files = [p.name for p in folder.iterdir() if p.is_file()]

# show top 50 filenames
print("Example files:")
for n in sorted(files)[:50]:
    print(" ", n)

# count file endings (last suffix or .nii.gz-style)
def ending(name: str) -> str:
    if name.endswith(".nii.gz"):
        return ".nii.gz"
    if name.endswith(".npz"):
        return ".npz"
    if name.endswith(".pkl"):
        return ".pkl"
    if name.endswith(".npy"):
        return ".npy"
    if name.endswith(".pt"):
        return ".pt"
    # last suffix fallback
    return Path(name).suffix

cnt = Counter(ending(n) for n in files)
print("\nEndings found:", cnt)
print("Total files:", len(files))


Example files:
  1.b2nd
  1.pkl
  10.b2nd
  10.pkl
  1000.b2nd
  1000.pkl
  1000_seg.b2nd
  1001.b2nd
  1001.pkl
  1001_seg.b2nd
  1002.b2nd
  1002.pkl
  1002_seg.b2nd
  1003.b2nd
  1003.pkl
  1003_seg.b2nd
  1012.b2nd
  1012.pkl
  1012_seg.b2nd
  1013.b2nd
  1013.pkl
  1013_seg.b2nd
  1014.b2nd
  1014.pkl
  1014_seg.b2nd
  1015.b2nd
  1015.pkl
  1015_seg.b2nd
  1020.b2nd
  1020.pkl
  1020_seg.b2nd
  1021.b2nd
  1021.pkl
  1021_seg.b2nd
  1022.b2nd
  1022.pkl
  1022_seg.b2nd
  1023.b2nd
  1023.pkl
  1023_seg.b2nd
  1024.b2nd
  1024.pkl
  1024_seg.b2nd
  1025.b2nd
  1025.pkl
  1025_seg.b2nd
  1026.b2nd
  1026.pkl
  1026_seg.b2nd
  1027.b2nd

Endings found: Counter({'.b2nd': 1676, '.pkl': 838})
Total files: 2514
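The listing above suggests that each case in nnUNetPlans_2d has three files: `{case}.b2nd`, `{case}_seg.b2nd` and `{case}.pkl` (1676 .b2nd = 2 x 838, plus 838 .pkl). Here is a sketch that checks this per case rather than by total counts, which can be handy after re-running preprocessing. The three-file layout is an assumption taken from this listing, and `incomplete_cases` is a hypothetical helper name:

```python
import os
from collections import defaultdict
from pathlib import Path

EXPECTED_PARTS = {".b2nd", "_seg.b2nd", ".pkl"}


def incomplete_cases(filenames) -> dict[str, list[str]]:
    """Map each case id to its missing parts; complete cases are omitted."""
    seen = defaultdict(set)
    for name in filenames:
        # Check "_seg.b2nd" before ".b2nd" so segmentations are not read as images
        for part in ("_seg.b2nd", ".b2nd", ".pkl"):
            if name.endswith(part):
                seen[name[: -len(part)]].add(part)
                break
    return {case: sorted(EXPECTED_PARTS - parts) for case, parts in seen.items() if parts != EXPECTED_PARTS}


folder = Path(os.environ.get("nnUNet_preprocessed", "/content/nnUNet/nnUNet_preprocessed")) / "Dataset501_KSSD" / "nnUNetPlans_2d"
if folder.exists():
    bad = incomplete_cases(p.name for p in folder.iterdir() if p.is_file())
    print(f"{len(bad)} incomplete cases", list(bad.items())[:5])
```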


In [35]:
import shutil
from pathlib import Path
import os

folder = Path(os.environ["nnUNet_preprocessed"]) / "Dataset501_KSSD" / "nnUNetPlans_2d"
assert folder.exists(), "nnUNetPlans_2d folder not found"

print("Deleting:", folder)
shutil.rmtree(folder)
print("✅ Deleted nnUNetPlans_2d cache")


Deleting: /content/nnUNet/nnUNet_preprocessed/Dataset501_KSSD/nnUNetPlans_2d
✅ Deleted nnUNetPlans_2d cache


In [36]:
!nnUNetv2_plan_and_preprocess -d 501 --verify_dataset_integrity


Fingerprint extraction...
Dataset501_KSSD
Using <class 'nnunetv2.imageio.simpleitk_reader_writer.SimpleITKIO'> as reader/writer

####################
verify_dataset_integrity Done. 
If you didn't see any error messages then your dataset is most likely OK!
####################

Experiment planning...

############################
INFO: You are using the old nnU-Net default planner. We have updated our recommendations. Please consider using those instead! Read more here: https://github.com/MIC-DKFZ/nnUNet/blob/master/documentation/resenc_presets.md
############################

2D U-Net configuration:
{'data_identifier': 'nnUNetPlans_2d', 'preprocessor_name': 'DefaultPreprocessor', 'batch_size': 14, 'patch_size': (np.int64(512), np.int64(448)), 'median_image_size_in_voxels': array([512., 416.]), 'spacing': array([1., 1.]), 'normalization_schemes': ['CTNormalization'], 'use_mask_for_norm': [False], 'resampling_fn_data': 'resample_data_or_seg_to_shape', 'resampling_fn_seg': 'resample_data_

Train nnU-Net (5 folds; fold 0 is shown, run folds 1 to 4 the same way)

In [None]:
!nnUNetv2_train 501 2d 0


############################
INFO: You are using the old nnU-Net default plans. We have updated our recommendations. Please consider using those instead! Read more here: https://github.com/MIC-DKFZ/nnUNet/blob/master/documentation/resenc_presets.md
############################

Using device: cuda:0

#######################################################################
Please cite the following paper when using nnU-Net:
Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods, 18(2), 203-211.
#######################################################################

2025-12-23 14:12:59.180202: Using torch.compile...
2025-12-23 14:13:00.786543: do_dummy_2d_data_aug: False
2025-12-23 14:13:00.791765: Using splits from existing split file: /content/nnUNet/nnUNet_preprocessed/Dataset501_KSSD/splits_final.json
2025-12-23 14:13:00.794445: The split file contains 
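The cell above trains only fold 0. Here is a sketch that runs nnUNetv2_train for folds 0 to 4 in sequence from Python. `train_commands` and `run_all` are hypothetical helper names, and `dry_run=True` only prints the commands. Each fold is a full training run, so on Colab it may be safer to launch one fold per session. nnUNetv2_train's `--c` flag continues a fold from its latest checkpoint, and those checkpoints survive in the Drive-backed nnUNet_results folder.

```python
import subprocess


def train_commands(dataset_id: str, configuration: str, folds) -> list[list[str]]:
    """Build one nnUNetv2_train command per fold."""
    return [["nnUNetv2_train", dataset_id, configuration, str(f)] for f in folds]


def run_all(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; unless dry_run, execute it and stop on the first failure."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # Child processes inherit os.environ, including the nnUNet_* paths set earlier
            subprocess.run(cmd, check=True)


run_all(train_commands("501", "2d", range(5)), dry_run=True)
```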