# Fundus Preprocessing Pipeline — Explanation

## What this script does
Takes a dataset in `dataset/<class_name>/<image>` format, **preprocesses fundus images**, saves 224×224 outputs to `preprocessed224_best/<class_name>/...`, and writes a `manifest.csv` with per-image status and label. It runs **multi-threaded** for speed.

---

## Inputs & knobs
- **Paths:** `input_dir="dataset"`, `output_dir="preprocessed224_best"`
- **Output size:** `size=224` (change here to produce e.g., 384)
- **Workers:** `workers=8` (threads for parallel processing)
- **Format:** `ext="jpg"` (can be `png` etc.)
- **Manifest:** `manifest.csv` in the output root
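
To make the knobs concrete, here is a minimal sketch of how one input path maps to its output path. It mirrors the `(out_root / rel).with_suffix(...)` expression used in `process_one`. The helper name `output_path_for` and the file `cataract/img_001.png` are made up for illustration, and `PurePosixPath` is used only so the result looks the same on every OS.

```python
from pathlib import PurePosixPath

input_dir = PurePosixPath("dataset")
output_dir = PurePosixPath("preprocessed224_best")
ext = "jpg"

def output_path_for(in_path: PurePosixPath) -> PurePosixPath:
    """Mirror the class-folder layout under output_dir and swap the extension."""
    rel = in_path.relative_to(input_dir)            # e.g. cataract/img_001.png
    return (output_dir / rel).with_suffix(f".{ext}")

# Hypothetical input file, for illustration only.
example = PurePosixPath("dataset/cataract/img_001.png")
print(output_path_for(example))  # preprocessed224_best/cataract/img_001.jpg
```

Because only the suffix changes, two inputs that share a stem in the same folder (say `img.png` and `img.jpg`) would map to the same output file.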

---

## File I/O helpers (robust to unicode paths)
- `imread_any(path)`: reads the file's bytes with `np.fromfile`, then decodes with `cv2.imdecode` (`cv2.imread` can fail on non-ASCII paths, notably on Windows)
- `imwrite_any(path, img, ext)`: encodes via `cv2.imencode`, then writes the bytes with `tofile` (defaults: JPEG quality 95, PNG compression 3)
- Input files are picked up by extension (case-insensitive): `.jpg, .jpeg, .png, .bmp, .tif, .tiff, .webp`
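
The byte-transport half of these helpers can be tried without OpenCV. The sketch below is an illustration under assumptions, not the script's code: it writes a stand-in byte array to a non-ASCII file name with `ndarray.tofile` and reads it back with `np.fromfile`. The file name and payload are invented; in the script, `cv2.imencode` produces the bytes before writing and `cv2.imdecode` consumes them after reading.

```python
import tempfile
from pathlib import Path

import numpy as np

# Stand-in for an encoded image buffer (in the script this comes from cv2.imencode).
payload = np.arange(16, dtype=np.uint8)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "眼底_fundus_é.bin"            # non-ASCII name, illustrative only
    payload.tofile(str(path))                         # write raw bytes
    loaded = np.fromfile(str(path), dtype=np.uint8)   # read raw bytes back

round_trip_ok = bool(np.array_equal(payload, loaded))
print(round_trip_ok)
```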

---

## Core preprocessing steps (fundus-specific)
All steps occur in **`preprocess_fundus(img_bgr, size)`** in this order:

1. **ROI crop (`fundus_roi_crop`)**  
   - Threshold low intensities (`gray > 8`) to get a circular mask  
   - Morphological close (7×7) to fill holes  
   - **`robust_bbox_from_mask`**: bounding box with small margin → crop tightly to fundus

2. **Illumination/shade correction (`shade_correction`)**  
   - Divide the image by a heavily blurred copy of itself (Gaussian, σ=40) to flatten vignetting and uneven illumination, then rescale so the background level sits at 128

3. **Color constancy (`shades_of_gray_cc`, p=6)**  
   - Scales each channel so that its Minkowski p-norm (p=6) matches the mean of the three channel norms, which reduces color cast

4. **Local contrast on green (`clahe_on_green`)**  
   - CLAHE on the **green channel** (vessels/lesions are prominent in G)

5. **Optional global contrast (`optional_l_channel_clahe`)**  
   - CLAHE on **L (lightness)** in LAB space for overall contrast. Despite the name, `preprocess_fundus` always applies this step; remove the call to skip it.

6. **Adaptive gamma (`adaptive_gamma`, target=0.42)**  
   - Computes the median gray level and picks the gamma exponent that moves it to the target midtone (exponent clamped to [0.2, 5.0])

7. **Sharpen (`unsharp`, σ=1.0, amount=0.5)**  
   - Unsharp masking for detail enhancement

8. **Square letterbox + resize (`letterbox_square`)**  
   - Pads to square with black borders (no aspect distortion) and resizes to `size`×`size` (**224×224** here) with bicubic interpolation

> Each step returns `uint8` BGR for OpenCV.
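
Of the steps above, color constancy is probably the least self-explanatory, so here is a small NumPy-only sketch of the gain computation on a synthetic image with a deliberate color cast. It follows the same idea as `shades_of_gray_cc` (per-channel Minkowski norm with p=6), but the helper name `channel_gains` and the test image are assumptions made for illustration.

```python
import numpy as np

def channel_gains(img: np.ndarray, p: int = 6, eps: float = 1e-6) -> np.ndarray:
    """Per-channel gains derived from the Minkowski p-norm of each channel."""
    x = img.astype(np.float64)
    norms = np.power(x, p).mean(axis=(0, 1)) ** (1.0 / p)  # one norm per channel
    return norms.mean() / (norms + eps)                    # gain > 1 boosts a weak channel

# Synthetic 4x4 image with a strong cast: the last channel is twice as bright.
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[..., 0] = 60
img[..., 1] = 60
img[..., 2] = 120

gains = channel_gains(img)
balanced = img * gains          # float result; the script clips and casts to uint8
print(gains.round(3))
print(balanced[0, 0].round(2))
```

After scaling, all three channels land on roughly the same level, which is the "gray" assumption the method relies on.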

---

## Parallel processing
- Collects all image files via `rglob`
- Uses `ThreadPoolExecutor(workers)` to **process images concurrently**
- Per-file function: **`process_one`**
  - Skips if output already exists
  - Reads → `preprocess_fundus` → writes → returns status (`ok`, `skipped`, `read_error`, `write_error`, or `error:<msg>`)
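
The submit-then-collect pattern can be seen in isolation with a toy worker. This is a sketch under assumptions: the file names are invented and the worker only simulates the `ok` / `skipped` / `error:<msg>` outcomes instead of touching images.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_one(name: str, already_done: set) -> dict:
    """Toy stand-in for the real worker: always returns a status row, never raises."""
    if name in already_done:
        return {"in": name, "status": "skipped"}
    try:
        if name.endswith(".bad"):
            raise ValueError("cannot decode")
        return {"in": name, "status": "ok"}
    except Exception as e:
        return {"in": name, "status": f"error:{e}"}

files = ["a.jpg", "b.jpg", "c.bad", "d.jpg"]   # hypothetical file names
already_done = {"b.jpg"}

rows = []
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(process_one, f, already_done) for f in files]
    for fut in as_completed(futures):          # yields in completion order
        rows.append(fut.result())

summary = Counter(row["status"] for row in rows)
print(summary)
```

Rows arrive in completion order rather than submission order, so the manifest's row order can differ between runs; sort by `in` if a stable order matters.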

---

## Manifest & labels
- Aggregates results into a **DataFrame (`df`)**
- Infers **label** from the first path component under `input_dir` (the class folder); files sitting directly in `input_dir` get the label `unknown`
- Saves `manifest.csv` with columns: `in`, `out`, `status`, `label`
- Prints a **status summary** at the end
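
The label rule is easy to check on a few paths. The sketch below reuses the same `relative_to` / `parts` logic as the script; the helper name `infer_label` and the example paths are hypothetical, and `PurePosixPath` keeps the behavior identical across operating systems.

```python
from pathlib import PurePosixPath

in_root = PurePosixPath("dataset")

def infer_label(in_path: str) -> str:
    """Class folder = first component under in_root; 'unknown' for files directly in in_root."""
    rel = PurePosixPath(in_path).relative_to(in_root)
    return rel.parts[0] if len(rel.parts) > 1 else "unknown"

# Hypothetical paths, for illustration only.
examples = [
    "dataset/glaucoma/img_01.jpg",
    "dataset/normal/batch2/img_07.png",
    "dataset/stray.jpg",
]
labels = [infer_label(p) for p in examples]
print(labels)  # ['glaucoma', 'normal', 'unknown']
```

Note that nested subfolders do not change the label: only the top-level folder under `input_dir` counts.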

---

## Things you can tune quickly
- **`size`** (e.g., 384) and **`output_dir`**
- CLAHE params: `clip`, `tile`
- **Gamma target** (`adaptive_gamma(target=...)`)
- **Shade σ** (`shade_correction(sigma=...)`)
- **Unsharp** (`sigma`, `amount`)
- **Margin** in `robust_bbox_from_mask(margin_ratio=...)`
- **`workers`**: thread count; raise or lower it to match your CPU cores and disk throughput
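
When tuning the gamma target, it helps to see how the exponent responds. The sketch below works through the relationship the target is meant to express: find the exponent `e` such that `median ** e == target`, then clamp it. The function name `gamma_exponent` is an assumption for illustration; the clamp range and the default target are taken from the script.

```python
import math

def gamma_exponent(median: float, target: float = 0.42,
                   lo: float = 0.2, hi: float = 5.0) -> float:
    """Exponent e with median ** e == target, clamped to [lo, hi]."""
    median = min(max(median, 1e-4), 0.999)       # keep log() well defined
    e = math.log(target) / math.log(median)
    return min(max(e, lo), hi)

for median in (0.20, 0.42, 0.70):
    e = gamma_exponent(median)
    print(f"median={median:.2f}  exponent={e:.3f}  mapped median={median ** e:.3f}")
```

An exponent below 1 brightens a dark image and an exponent above 1 darkens a bright one. Raising `target` lowers the exponent for any given median, so outputs get brighter overall.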



In [7]:
from pathlib import Path
from concurrent.futures import ThreadPoolExecutor, as_completed
import cv2
import numpy as np
import pandas as pd
from tqdm import tqdm

# ====== SET YOUR PATHS HERE ======
input_dir  = r"dataset"        # folder with class folders
output_dir = r"preprocessed224_best"
size       = 224
ext        = "jpg"
workers    = 8
manifest   = "manifest.csv"
# =================================

# --- helpers ---
def imread_any(path: Path):
    arr = np.fromfile(str(path), dtype=np.uint8)
    return cv2.imdecode(arr, cv2.IMREAD_COLOR)

def imwrite_any(path: Path, img_bgr: np.ndarray, ext="jpg", jpg_quality=95, png_compress=3):
    path = path.with_suffix(f".{ext.lower()}")
    if ext.lower() in ("jpg", "jpeg"):
        ok, enc = cv2.imencode(".jpg", img_bgr, [int(cv2.IMWRITE_JPEG_QUALITY), jpg_quality])
    elif ext.lower() == "png":
        ok, enc = cv2.imencode(".png", img_bgr, [int(cv2.IMWRITE_PNG_COMPRESSION), png_compress])
    else:
        ok, enc = cv2.imencode(f".{ext}", img_bgr)
    if not ok:
        return False
    enc.tofile(str(path))
    return True

def robust_bbox_from_mask(mask: np.ndarray, margin_ratio=0.02):
    coords = cv2.findNonZero(mask)
    if coords is None:
        h, w = mask.shape[:2]
        return 0, 0, w, h
    x, y, w, h = cv2.boundingRect(coords)
    m = int(max(h, w) * margin_ratio)
    x = max(0, x - m); y = max(0, y - m)
    return x, y, min(mask.shape[1] - x, w + 2*m), min(mask.shape[0] - y, h + 2*m)

def fundus_roi_crop(bgr: np.ndarray):
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 8, 255, cv2.THRESH_BINARY)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((7,7), np.uint8))
    x, y, w, h = robust_bbox_from_mask(mask)
    return bgr[y:y+h, x:x+w]

def shade_correction(bgr: np.ndarray, sigma=40):
    I = bgr.astype(np.float32) + 1.0
    bg = cv2.GaussianBlur(I, (0,0), sigmaX=sigma, sigmaY=sigma)
    corrected = I / (bg + 1e-6) * 128.0
    return np.clip(corrected, 0, 255).astype(np.uint8)

def shades_of_gray_cc(bgr: np.ndarray, p=6, eps=1e-6):
    I = bgr.astype(np.float32)
    Rp = np.power(I[:,:,2], p).mean() ** (1.0/p)
    Gp = np.power(I[:,:,1], p).mean() ** (1.0/p)
    Bp = np.power(I[:,:,0], p).mean() ** (1.0/p)
    scale = (Rp + Gp + Bp) / 3.0
    R = I[:,:,2] * (scale / (Rp + eps))
    G = I[:,:,1] * (scale / (Gp + eps))
    B = I[:,:,0] * (scale / (Bp + eps))
    out = np.stack([B, G, R], axis=2)
    return np.clip(out, 0, 255).astype(np.uint8)

def clahe_on_green(bgr: np.ndarray, clip=2.0, tile=(8,8)):
    b, g, r = cv2.split(bgr)
    clahe = cv2.createCLAHE(clipLimit=clip, tileGridSize=tile)
    g2 = clahe.apply(g)
    return cv2.merge([b, g2, r])

def optional_l_channel_clahe(bgr: np.ndarray, clip=1.5, tile=(8,8)):
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    L, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip, tileGridSize=tile)
    L2 = clahe.apply(L)
    lab2 = cv2.merge([L2, a, b])
    return cv2.cvtColor(lab2, cv2.COLOR_LAB2BGR)

def adaptive_gamma(bgr: np.ndarray, target=0.42):
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    med = np.median(gray) / 255.0
    med = np.clip(med, 1e-4, 0.999)
    # Exponent that maps the median to the target: med ** gamma == target.
    gamma = np.clip(np.log(target) / np.log(med), 0.2, 5.0)
    lut = np.arange(256, dtype=np.float32) / 255.0
    lut = np.power(lut, gamma)
    lut = np.clip(lut * 255.0, 0, 255).astype(np.uint8)
    return cv2.LUT(bgr, lut)

def unsharp(bgr: np.ndarray, sigma=1.0, amount=0.5):
    blurred = cv2.GaussianBlur(bgr, (0,0), sigmaX=sigma, sigmaY=sigma)
    sharp = cv2.addWeighted(bgr, 1 + amount, blurred, -amount, 0)
    return np.clip(sharp, 0, 255).astype(np.uint8)

def letterbox_square(bgr: np.ndarray, size=224, color=(0,0,0)):
    h, w = bgr.shape[:2]
    dim = max(h, w)
    top = (dim - h) // 2
    bottom = dim - h - top
    left = (dim - w) // 2
    right = dim - w - left
    padded = cv2.copyMakeBorder(bgr, top, bottom, left, right,
                                cv2.BORDER_CONSTANT, value=color)
    return cv2.resize(padded, (size, size), interpolation=cv2.INTER_CUBIC)

def preprocess_fundus(img_bgr: np.ndarray, size=224):
    img = fundus_roi_crop(img_bgr)
    img = shade_correction(img, sigma=40)
    img = shades_of_gray_cc(img, p=6)
    img = clahe_on_green(img, clip=2.0, tile=(8,8))
    img = optional_l_channel_clahe(img, clip=1.5, tile=(8,8))
    img = adaptive_gamma(img, target=0.42)
    img = unsharp(img, sigma=1.0, amount=0.5)
    img = letterbox_square(img, size=size, color=(0,0,0))
    return img

def process_one(in_path: Path, out_root: Path, rel: Path):
    out_path = (out_root / rel).with_suffix(f".{ext}")
    out_path.parent.mkdir(parents=True, exist_ok=True)
    if out_path.exists():
        return {"in": str(in_path), "out": str(out_path), "status": "skipped"}
    bgr = imread_any(in_path)
    if bgr is None:
        return {"in": str(in_path), "out": str(out_path), "status": "read_error"}
    try:
        proc = preprocess_fundus(bgr, size=size)
        ok = imwrite_any(out_path, proc, ext=ext)
        return {"in": str(in_path), "out": str(out_path), "status": "ok" if ok else "write_error"}
    except Exception as e:
        return {"in": str(in_path), "out": str(out_path), "status": f"error:{e}"}

# --- run preprocessing ---
in_root = Path(input_dir)
out_root = Path(output_dir)
out_root.mkdir(parents=True, exist_ok=True)

exts = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff", ".webp"}
files = [p for p in in_root.rglob("*") if p.suffix.lower() in exts]

rows = []
with ThreadPoolExecutor(max_workers=workers) as ex_pool:
    futures = []
    for p in files:
        rel = p.relative_to(in_root)
        futures.append(ex_pool.submit(process_one, p, out_root, rel))
    for fut in tqdm(as_completed(futures), total=len(futures), desc="Preprocessing (fundus best)"):
        rows.append(fut.result())

df = pd.DataFrame(rows)
labels = []
for r in df["in"]:
    rel = Path(r).relative_to(in_root)
    labels.append(rel.parts[0] if len(rel.parts) > 1 else "unknown")
df["label"] = labels
df.to_csv(out_root / manifest, index=False, encoding="utf-8")

print("Summary:\n", df["status"].value_counts())
print(f"\nSaved images & manifest to: {out_root}")


Preprocessing (fundus best): 100%|█████████████████████████████████████████████████| 4728/4728 [04:08<00:00, 19.01it/s]


Summary:
 status
ok    4728
Name: count, dtype: int64

Saved images & manifest to: preprocessed224_best


## Preprocessing 299

In [1]:
from pathlib import Path
from concurrent.futures import ThreadPoolExecutor, as_completed
import cv2
import numpy as np
import pandas as pd
from tqdm import tqdm

# ====== SET YOUR PATHS HERE (InceptionV3) ======
input_dir  = r"dataset"                 # folder with class folders
output_dir = r"preprocessed299_inception"
size       = 299                        # <-- InceptionV3 wants 299x299
ext        = "jpg"
workers    = 8
manifest   = "manifest.csv"
# ===============================================

# --- helpers ---
def imread_any(path: Path):
    arr = np.fromfile(str(path), dtype=np.uint8)
    return cv2.imdecode(arr, cv2.IMREAD_COLOR)

def imwrite_any(path: Path, img_bgr: np.ndarray, ext="jpg", jpg_quality=95, png_compress=3):
    path = path.with_suffix(f".{ext.lower()}")
    if ext.lower() in ("jpg", "jpeg"):
        ok, enc = cv2.imencode(".jpg", img_bgr, [int(cv2.IMWRITE_JPEG_QUALITY), jpg_quality])
    elif ext.lower() == "png":
        ok, enc = cv2.imencode(".png", img_bgr, [int(cv2.IMWRITE_PNG_COMPRESSION), png_compress])
    else:
        ok, enc = cv2.imencode(f".{ext}", img_bgr)
    if not ok:
        return False
    enc.tofile(str(path))
    return True

def robust_bbox_from_mask(mask: np.ndarray, margin_ratio=0.02):
    coords = cv2.findNonZero(mask)
    if coords is None:
        h, w = mask.shape[:2]
        return 0, 0, w, h
    x, y, w, h = cv2.boundingRect(coords)
    m = int(max(h, w) * margin_ratio)
    x = max(0, x - m); y = max(0, y - m)
    return x, y, min(mask.shape[1] - x, w + 2*m), min(mask.shape[0] - y, h + 2*m)

def fundus_roi_crop(bgr: np.ndarray):
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 8, 255, cv2.THRESH_BINARY)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((7,7), np.uint8))
    x, y, w, h = robust_bbox_from_mask(mask)
    return bgr[y:y+h, x:x+w]

def shade_correction(bgr: np.ndarray, sigma=40):
    I = bgr.astype(np.float32) + 1.0
    bg = cv2.GaussianBlur(I, (0,0), sigmaX=sigma, sigmaY=sigma)
    corrected = I / (bg + 1e-6) * 128.0
    return np.clip(corrected, 0, 255).astype(np.uint8)

def shades_of_gray_cc(bgr: np.ndarray, p=6, eps=1e-6):
    I = bgr.astype(np.float32)
    Rp = np.power(I[:,:,2], p).mean() ** (1.0/p)
    Gp = np.power(I[:,:,1], p).mean() ** (1.0/p)
    Bp = np.power(I[:,:,0], p).mean() ** (1.0/p)
    scale = (Rp + Gp + Bp) / 3.0
    R = I[:,:,2] * (scale / (Rp + eps))
    G = I[:,:,1] * (scale / (Gp + eps))
    B = I[:,:,0] * (scale / (Bp + eps))
    out = np.stack([B, G, R], axis=2)
    return np.clip(out, 0, 255).astype(np.uint8)

def clahe_on_green(bgr: np.ndarray, clip=2.0, tile=(8,8)):
    b, g, r = cv2.split(bgr)
    clahe = cv2.createCLAHE(clipLimit=clip, tileGridSize=tile)
    g2 = clahe.apply(g)
    return cv2.merge([b, g2, r])

def optional_l_channel_clahe(bgr: np.ndarray, clip=1.5, tile=(8,8)):
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    L, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip, tileGridSize=tile)
    L2 = clahe.apply(L)
    lab2 = cv2.merge([L2, a, b])
    return cv2.cvtColor(lab2, cv2.COLOR_LAB2BGR)

def adaptive_gamma(bgr: np.ndarray, target=0.42):
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    med = np.median(gray) / 255.0
    med = np.clip(med, 1e-4, 0.999)
    # Exponent that maps the median to the target: med ** gamma == target.
    gamma = np.clip(np.log(target) / np.log(med), 0.2, 5.0)
    lut = np.arange(256, dtype=np.float32) / 255.0
    lut = np.power(lut, gamma)
    lut = np.clip(lut * 255.0, 0, 255).astype(np.uint8)
    return cv2.LUT(bgr, lut)

def unsharp(bgr: np.ndarray, sigma=1.0, amount=0.5):
    blurred = cv2.GaussianBlur(bgr, (0,0), sigmaX=sigma, sigmaY=sigma)
    sharp = cv2.addWeighted(bgr, 1 + amount, blurred, -amount, 0)
    return np.clip(sharp, 0, 255).astype(np.uint8)

def letterbox_square(bgr: np.ndarray, size=299, color=(0,0,0)):
    h, w = bgr.shape[:2]
    dim = max(h, w)
    top = (dim - h) // 2
    bottom = dim - h - top
    left = (dim - w) // 2
    right = dim - w - left
    padded = cv2.copyMakeBorder(bgr, top, bottom, left, right,
                                cv2.BORDER_CONSTANT, value=color)
    return cv2.resize(padded, (size, size), interpolation=cv2.INTER_CUBIC)

def preprocess_fundus(img_bgr: np.ndarray, size=299):
    img = fundus_roi_crop(img_bgr)
    img = shade_correction(img, sigma=40)
    img = shades_of_gray_cc(img, p=6)
    img = clahe_on_green(img, clip=2.0, tile=(8,8))
    img = optional_l_channel_clahe(img, clip=1.5, tile=(8,8))
    img = adaptive_gamma(img, target=0.42)
    img = unsharp(img, sigma=1.0, amount=0.5)
    img = letterbox_square(img, size=size, color=(0,0,0))  # -> 299x299
    return img

def process_one(in_path: Path, out_root: Path, rel: Path):
    out_path = (out_root / rel).with_suffix(f".{ext}")
    out_path.parent.mkdir(parents=True, exist_ok=True)
    if out_path.exists():
        return {"in": str(in_path), "out": str(out_path), "status": "skipped"}
    bgr = imread_any(in_path)
    if bgr is None:
        return {"in": str(in_path), "out": str(out_path), "status": "read_error"}
    try:
        proc = preprocess_fundus(bgr, size=size)
        ok = imwrite_any(out_path, proc, ext=ext)
        return {"in": str(in_path), "out": str(out_path), "status": "ok" if ok else "write_error"}
    except Exception as e:
        return {"in": str(in_path), "out": str(out_path), "status": f"error:{e}"}

# --- run preprocessing ---
in_root = Path(input_dir)
out_root = Path(output_dir)
out_root.mkdir(parents=True, exist_ok=True)

exts = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff", ".webp"}
files = [p for p in in_root.rglob("*") if p.suffix.lower() in exts]

rows = []
with ThreadPoolExecutor(max_workers=workers) as ex_pool:
    futures = []
    for p in files:
        rel = p.relative_to(in_root)
        futures.append(ex_pool.submit(process_one, p, out_root, rel))
    for fut in tqdm(as_completed(futures), total=len(futures), desc="Preprocessing (fundus → 299)"):
        rows.append(fut.result())

df = pd.DataFrame(rows)
labels = []
for r in df["in"]:
    rel = Path(r).relative_to(in_root)
    labels.append(rel.parts[0] if len(rel.parts) > 1 else "unknown")
df["label"] = labels
df.to_csv(out_root / manifest, index=False, encoding="utf-8")

print("Summary:\n", df["status"].value_counts())
print(f"\nSaved images & manifest to: {out_root}")


Preprocessing (fundus → 299): 100%|████████████████████████████████████████████████| 4728/4728 [04:44<00:00, 16.62it/s]

Summary:
 status
ok    4728
Name: count, dtype: int64

Saved images & manifest to: preprocessed299_inception





## Preprocessing 384

In [1]:
from pathlib import Path
from concurrent.futures import ThreadPoolExecutor, as_completed
import cv2
import numpy as np
import pandas as pd
from tqdm import tqdm

# ====== SET YOUR PATHS HERE (384 px) ======
input_dir  = r"dataset"                 # folder with class folders
output_dir = r"preprocessed384_best"
size       = 384                        # <-- target size
ext        = "jpg"
workers    = 8
manifest   = "manifest.csv"
# ==========================================

# --- helpers ---
def imread_any(path: Path):
    arr = np.fromfile(str(path), dtype=np.uint8)
    return cv2.imdecode(arr, cv2.IMREAD_COLOR)

def imwrite_any(path: Path, img_bgr: np.ndarray, ext="jpg", jpg_quality=95, png_compress=3):
    path = path.with_suffix(f".{ext.lower()}")
    if ext.lower() in ("jpg", "jpeg"):
        ok, enc = cv2.imencode(".jpg", img_bgr, [int(cv2.IMWRITE_JPEG_QUALITY), jpg_quality])
    elif ext.lower() == "png":
        ok, enc = cv2.imencode(".png", img_bgr, [int(cv2.IMWRITE_PNG_COMPRESSION), png_compress])
    else:
        ok, enc = cv2.imencode(f".{ext}", img_bgr)
    if not ok:
        return False
    enc.tofile(str(path))
    return True

def robust_bbox_from_mask(mask: np.ndarray, margin_ratio=0.02):
    coords = cv2.findNonZero(mask)
    if coords is None:
        h, w = mask.shape[:2]
        return 0, 0, w, h
    x, y, w, h = cv2.boundingRect(coords)
    m = int(max(h, w) * margin_ratio)
    x = max(0, x - m); y = max(0, y - m)
    return x, y, min(mask.shape[1] - x, w + 2*m), min(mask.shape[0] - y, h + 2*m)

def fundus_roi_crop(bgr: np.ndarray):
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 8, 255, cv2.THRESH_BINARY)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((7,7), np.uint8))
    x, y, w, h = robust_bbox_from_mask(mask)
    return bgr[y:y+h, x:x+w]

def shade_correction(bgr: np.ndarray, sigma=40):
    I = bgr.astype(np.float32) + 1.0
    bg = cv2.GaussianBlur(I, (0,0), sigmaX=sigma, sigmaY=sigma)
    corrected = I / (bg + 1e-6) * 128.0
    return np.clip(corrected, 0, 255).astype(np.uint8)

def shades_of_gray_cc(bgr: np.ndarray, p=6, eps=1e-6):
    I = bgr.astype(np.float32)
    Rp = np.power(I[:,:,2], p).mean() ** (1.0/p)
    Gp = np.power(I[:,:,1], p).mean() ** (1.0/p)
    Bp = np.power(I[:,:,0], p).mean() ** (1.0/p)
    scale = (Rp + Gp + Bp) / 3.0
    R = I[:,:,2] * (scale / (Rp + eps))
    G = I[:,:,1] * (scale / (Gp + eps))
    B = I[:,:,0] * (scale / (Bp + eps))
    out = np.stack([B, G, R], axis=2)
    return np.clip(out, 0, 255).astype(np.uint8)

def clahe_on_green(bgr: np.ndarray, clip=2.0, tile=(8,8)):
    b, g, r = cv2.split(bgr)
    clahe = cv2.createCLAHE(clipLimit=clip, tileGridSize=tile)
    g2 = clahe.apply(g)
    return cv2.merge([b, g2, r])

def optional_l_channel_clahe(bgr: np.ndarray, clip=1.5, tile=(8,8)):
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    L, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip, tileGridSize=tile)
    L2 = clahe.apply(L)
    lab2 = cv2.merge([L2, a, b])
    return cv2.cvtColor(lab2, cv2.COLOR_LAB2BGR)

def adaptive_gamma(bgr: np.ndarray, target=0.42):
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    med = np.median(gray) / 255.0
    med = np.clip(med, 1e-4, 0.999)
    # Exponent that maps the median to the target: med ** gamma == target.
    gamma = np.clip(np.log(target) / np.log(med), 0.2, 5.0)
    lut = np.arange(256, dtype=np.float32) / 255.0
    lut = np.power(lut, gamma)
    lut = np.clip(lut * 255.0, 0, 255).astype(np.uint8)
    return cv2.LUT(bgr, lut)

def unsharp(bgr: np.ndarray, sigma=1.0, amount=0.5):
    blurred = cv2.GaussianBlur(bgr, (0,0), sigmaX=sigma, sigmaY=sigma)
    sharp = cv2.addWeighted(bgr, 1 + amount, blurred, -amount, 0)
    return np.clip(sharp, 0, 255).astype(np.uint8)

def letterbox_square(bgr: np.ndarray, size=384, color=(0,0,0)):
    h, w = bgr.shape[:2]
    dim = max(h, w)
    top = (dim - h) // 2
    bottom = dim - h - top
    left = (dim - w) // 2
    right = dim - w - left
    padded = cv2.copyMakeBorder(bgr, top, bottom, left, right,
                                cv2.BORDER_CONSTANT, value=color)
    # If downscaling a lot, INTER_AREA can be nicer; INTER_CUBIC is fine generally.
    return cv2.resize(padded, (size, size), interpolation=cv2.INTER_CUBIC)

def preprocess_fundus(img_bgr: np.ndarray, size=384):
    img = fundus_roi_crop(img_bgr)
    img = shade_correction(img, sigma=40)
    img = shades_of_gray_cc(img, p=6)
    img = clahe_on_green(img, clip=2.0, tile=(8,8))
    img = optional_l_channel_clahe(img, clip=1.5, tile=(8,8))
    img = adaptive_gamma(img, target=0.42)
    img = unsharp(img, sigma=1.0, amount=0.5)
    img = letterbox_square(img, size=size, color=(0,0,0))  # -> 384x384
    return img

def process_one(in_path: Path, out_root: Path, rel: Path):
    out_path = (out_root / rel).with_suffix(f".{ext}")
    out_path.parent.mkdir(parents=True, exist_ok=True)
    if out_path.exists():
        return {"in": str(in_path), "out": str(out_path), "status": "skipped"}
    bgr = imread_any(in_path)
    if bgr is None:
        return {"in": str(in_path), "out": str(out_path), "status": "read_error"}
    try:
        proc = preprocess_fundus(bgr, size=size)
        ok = imwrite_any(out_path, proc, ext=ext)
        return {"in": str(in_path), "out": str(out_path), "status": "ok" if ok else "write_error"}
    except Exception as e:
        return {"in": str(in_path), "out": str(out_path), "status": f"error:{e}"}

# --- run preprocessing ---
in_root = Path(input_dir)
out_root = Path(output_dir)
out_root.mkdir(parents=True, exist_ok=True)

exts = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff", ".webp"}
files = [p for p in in_root.rglob("*") if p.suffix.lower() in exts]

rows = []
with ThreadPoolExecutor(max_workers=workers) as ex_pool:
    futures = []
    for p in files:
        rel = p.relative_to(in_root)
        futures.append(ex_pool.submit(process_one, p, out_root, rel))
    for fut in tqdm(as_completed(futures), total=len(futures), desc="Preprocessing (fundus → 384)"):
        rows.append(fut.result())

df = pd.DataFrame(rows)
labels = []
for r in df["in"]:
    rel = Path(r).relative_to(in_root)
    labels.append(rel.parts[0] if len(rel.parts) > 1 else "unknown")
df["label"] = labels
df.to_csv(out_root / manifest, index=False, encoding="utf-8")

print("Summary:\n", df["status"].value_counts())
print(f"\nSaved images & manifest to: {out_root}")


Preprocessing (fundus → 384): 100%|████████████████████████████████████████████████| 4728/4728 [04:05<00:00, 19.28it/s]

Summary:
 status
ok    4728
Name: count, dtype: int64

Saved images & manifest to: preprocessed384_best



