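# Building a Lung-Nodule Patch Dataset from LIDC-IDRI

## Overview
This notebook extracts representative 2D patches from the LIDC-IDRI (Lung Image Database Consortium and Image Database Resource Initiative) dataset for machine-learning research on lung nodule analysis.

## Key Features
- **Multi-radiologist consensus**: combines the annotations of several radiologists using a configurable consensus threshold
- **Slice selection**: picks representative slices based on nodule area and proximity to the largest-area slice
- **Standardized CT window**: applies a lung window (level -600 HU, width 1500 HU, i.e. -1350 to 150 HU) for consistent visualization
- **Flexible output**: writes individual patch images plus a metadata CSV file
- **Robust fallback**: handles the edge case of an empty consensus mask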
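
The lung window in the feature list is a linear mapping from Hounsfield units to 8-bit gray levels. The sketch below works through that arithmetic for single values. The function name `window_hu_scalar` is made up for this illustration; it is a scalar stand-in for the notebook's array-based `window_hu`, not a replacement for it.

```python
def window_hu_scalar(hu: float, wl: float = -600.0, ww: float = 1500.0) -> int:
    """Map one HU value to an 8-bit gray level using a window level and width."""
    lo, hi = wl - ww / 2.0, wl + ww / 2.0          # -1350 and 150 for the lung window
    clipped = min(max(hu, lo), hi)                 # clamp the value to the window
    return int((clipped - lo) / (hi - lo) * 255)   # rescale to 0..255 and truncate

# Values below the window map to 0, values above it map to 255.
for hu in (-2000, -1350, -600, 0, 150, 400):
    print(hu, window_hu_scalar(hu))
```

Because the same window is applied to every patch, a given gray level always corresponds to the same HU value, which would not hold with per-image min/max normalization.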

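## Dataset Information
- **Source**: the public LIDC-IDRI dataset
- 1018 scans from 1010 patients
- A scan may contain several nodules, each annotated by up to 4 radiologists
- **Annotations**: each radiologist rates 9 semantic attributes per nodule
- **Output format**: PNG images (nodule patches and masks) plus a CSV metadata file with the aggregated annotation scores. The CSV columns appear in the example output tables later in this notebook.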
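
Since up to four radiologists rate each attribute, the ratings have to be collapsed into one score per nodule before they go into the CSV. The sketch below shows the two options the notebook offers (round the mean, or truncate it) on made-up ratings. The helper name `aggregate_ratings` is hypothetical and uses only the standard library.

```python
def aggregate_ratings(ratings, agg="round"):
    """Collapse several readers' integer ratings for one attribute into one score."""
    if not ratings:
        return None                      # no reader rated this attribute
    m = sum(ratings) / len(ratings)
    if agg == "round":
        return round(m)                  # nearest integer, ties go to the even value
    if agg == "mean":
        return int(m)                    # truncates toward zero
    raise ValueError("agg must be 'round' or 'mean'")

ratings = [2, 3, 3, 3]                   # made-up ratings from four readers, mean 2.75
print(aggregate_ratings(ratings, "round"))
print(aggregate_ratings(ratings, "mean"))
```

The two modes can differ by one point on the same input, so it is worth recording which one produced a given CSV.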

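## Usage
- Pass a patient ID to process a single patient
- Loop over all patient IDs to extract the full dataset
- Adjust the parameters (consensus level, area threshold, etc.) as needed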
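
To get a feel for the consensus level before changing it, the toy example below thresholds the per-pixel agreement fraction of four made-up reader masks. It illustrates the idea only, under the assumption that a pixel is kept when the fraction of readers marking it is at least the consensus level; it does not call pylidc.

```python
import numpy as np

# Four made-up reader masks over a strip of five pixels (1 = inside the nodule).
masks = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
], dtype=bool)

agreement = masks.mean(axis=0)           # fraction of readers marking each pixel

for clevel in (0.25, 0.5, 1.0):
    consensus_mask = agreement >= clevel
    print(clevel, consensus_mask.astype(int), int(consensus_mask.sum()))
```

Lower levels give larger masks; a level of 1.0 keeps only pixels that every reader marked, which is how a consensus mask can end up empty for small or ambiguous nodules.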

---

In [1]:
import os
from typing import Dict, List, Tuple, Any
import numpy as np
import pandas as pd
import pylidc as pl
from pylidc.utils import consensus
from tqdm import tqdm

from PIL import Image

In [2]:
# ===============================================================================
# 1) Utility functions
# ===============================================================================

# Annotation fields of the LIDC-IDRI dataset (9 semantic attributes)
ANNOT_FIELDS = [
    'subtlety', 'internalStructure', 'calcification',
    'sphericity', 'margin', 'lobulation', 'spiculation',
    'texture', 'malignancy',
]

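def _inplane_area_mm2(scan) -> float:
    """
    Compute the area of one pixel in the axial plane (mm^2).

    Args:
        scan: pylidc Scan object

    Returns:
        float: area of one pixel in mm^2
    """
    ps = scan.pixel_spacing  # pixel spacing as reported by pylidc
    try:
        # Spacing given as a pair (row spacing, column spacing)
        sx, sy = float(ps[0]), float(ps[1])
    except (TypeError, IndexError):
        # Spacing given as a single scalar: assume square pixels
        sx = sy = float(ps)
    return sx * sy  # mm^2 per pixel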

def _aggregate_ann_fields(anns, agg="round") -> Dict[str, Any]:
    """
    Aggregate several radiologists' ratings for the same nodule.

    The fields are: 'subtlety', 'internalStructure', 'calcification',
    'sphericity', 'margin', 'lobulation', 'spiculation',
    'texture', 'malignancy'.

    Args:
        anns: list of pylidc Annotation objects from multiple radiologists
        agg: aggregation method, "round" (mean rounded to the nearest integer)
             or "mean" (mean truncated to an integer)

    Returns:
        Dict[str, Any]: aggregated integer score per field, or None for a
        field that no radiologist rated
    """
    if agg not in ("round", "mean"):
        raise ValueError("agg must be 'round' or 'mean'")
    out = {}
    for f in ANNOT_FIELDS:
        vals = [getattr(a, f) for a in anns if getattr(a, f) is not None]
        if not vals:
            out[f] = None
        elif agg == "round":
            out[f] = int(np.rint(np.mean(vals)))
        else:  # agg == "mean"
            out[f] = int(np.mean(vals))
    return out

def get_final_indices(kept, neigh, mode="intersect"):
    """
    Combine the area-filtered slices with the neighbors of the largest-area slice.

    Args:
        kept: indices of slices that pass the area threshold
        neigh: indices of slices around the largest-area slice
        mode: "union" (keep everything) or "intersect" (keep only the overlap)

    Returns:
        np.ndarray: final selected slice indices
    """
    if mode == "union":
        return np.unique(np.concatenate([kept, np.array(neigh, dtype=int)]))
    elif mode == "intersect":
        return np.intersect1d(kept, neigh)
    else:
        raise ValueError("mode must be 'union' or 'intersect'")

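def window_hu(img_hu, wl=-600, ww=1500):
    """
    Apply a CT window (lung window) to convert HU values to a 0-255 grayscale image.

    Standard lung window: WL=-600, WW=1500 (range: -1350 to 150 HU)

    Args:
        img_hu: CT image in Hounsfield units
        wl: window level (center)
        ww: window width

    Returns:
        np.ndarray: 8-bit grayscale image (0-255)
    """
    lo, hi = wl - ww/2.0, wl + ww/2.0
    x = np.clip(img_hu, lo, hi)      # clamp to the window
    x = (x - lo) / (hi - lo)         # rescale to [0, 1]
    return (x * 255).astype(np.uint8)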

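def _select_repr_slices_from_cmask(
    cmask: np.ndarray,
    scan,
    min_area_mm2: float = 50.0,   # minimum nodule area per slice (tunable)
    drop_ends: bool = False,      # whether to drop the first and last slice
    n_neighbors: int = 2,         # slices on each side of the largest-area slice
    mode: str = "intersect"       # combine the two criteria by intersection or union
) -> Tuple[np.ndarray, np.ndarray]:
    """
    Select representative slices from a consensus volume mask:
    1. Optionally remove the first and last slice (boundary artifacts)
    2. Area filter (drop slices with area < min_area_mm2)
    3. Combine the result with the largest-area slice ± n_neighbors,
       by intersection or union depending on `mode`

    Args:
        cmask: 3D consensus mask (H, W, K)
        scan: pylidc Scan object, used for the pixel spacing
        min_area_mm2: minimum area threshold in mm^2
        drop_ends: whether to exclude the first and last slice
        n_neighbors: number of slices above and below the largest-area slice
        mode: how to combine the area and neighborhood criteria,
              "intersect" or "union"

    Returns:
        Tuple[np.ndarray, np.ndarray]: (selected slice indices, their areas)
    """
    assert cmask.ndim == 3, "cmask is expected to have shape (H, W, K)"
    H, W, K = cmask.shape

    # Nodule area of each slice in mm^2
    px_area = _inplane_area_mm2(scan)
    areas = np.array([(cmask[:, :, k].sum()) * px_area for k in range(K)])

    # Candidate slice indices
    idx = np.arange(K)
    if drop_ends and K >= 3:
        idx = idx[1:-1]  # drop the first and last slice

    # Filter by the area threshold
    kept = idx[areas[idx] >= float(min_area_mm2)]
    if kept.size == 0:
        # Fallback: keep at least the largest-area slice
        kept = np.array([int(np.argmax(areas))])

    # Largest-area slice and its neighbors
    k0 = int(np.argmax(areas))
    neigh = [k0 + d for d in range(-n_neighbors, n_neighbors + 1)
             if 0 <= k0 + d < K]  # stay inside the valid index range

    # Combine both criteria: "union" keeps every neighbor, while "intersect"
    # keeps only the neighbors that also pass the area filter
    kept_final = get_final_indices(kept, neigh, mode=mode)

    return kept_final, areas[kept_final]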

# ===============================================================================
# 2) Core function: extract patches for a single nodule
# ===============================================================================

def extract_patches_for_nodule(
    scan,
    anns: List[pl.Annotation],
    clevel: float = 0.5,
    pad: List[Tuple[int, int]] = [(25, 25), (25, 25), (0, 0)],
    drop_ends: bool = False,
    min_area_mm2: float = 50.0,
    n_neighbors: int = 2,
    mode: str = "intersect",
) -> Dict[str, Any]:
    """
    Extract representative 2D patches from the 3D annotations of one nodule.

    Pipeline:
    1. Build a consensus mask from the radiologists' annotations
    2. Apply a fallback strategy if the consensus mask is empty
    3. Select representative slices based on area and spatial criteria
    4. Extract a 2D image/mask patch for every selected slice

    Args:
        scan: pylidc Scan object
        anns: list of Annotation objects from multiple radiologists
        clevel: consensus level (0.5 = at least half of the radiologists agree)
        pad: padding around the nodule bounding box, one (before, after)
             pair per volume axis
        drop_ends: whether to exclude the first and last slice of the nodule
        min_area_mm2: minimum area threshold for slice selection
        n_neighbors: number of slices on each side of the largest-area slice
        mode: slice selection mode ("intersect" or "union")

    Returns:
        A dictionary with:
        - 'bbox': consensus bounding box
        - 'patches': list of patch dicts with img, mask, indices, and area
        - 'ann_agg': annotation scores aggregated over the radiologists
    """
    # Load the full CT volume
    vol = scan.to_volume()  # shape (H, W, K), values in HU

    # Build the consensus mask (with fallbacks)
    cmask, cbbox, masks = consensus(anns, clevel=clevel, pad=pad)

    if cmask.sum() == 0:
        # Fallback 1: lower the consensus threshold
        for cl in (0.25, 0.1):
            cmask_try, _, _ = consensus(anns, clevel=cl, pad=pad)
            if cmask_try.sum() > 0:
                cmask = cmask_try
                break

        # Fallback 2: use the union of all individual masks
        if cmask.sum() == 0 and len(masks) > 0:
            cmask = np.any(np.stack(masks, axis=0), axis=0).astype(bool)

    # Extract the sub-volume with the consensus bounding box
    subvol = vol[cbbox]

    # Select representative slices
    kept_k, kept_areas = _select_repr_slices_from_cmask(
        cmask, scan, min_area_mm2=min_area_mm2, n_neighbors=n_neighbors, mode=mode,
        drop_ends=drop_ends
    )

    # Build a 2D patch for every selected slice
    patches = []
    for k, k_area in zip(kept_k, kept_areas):
        img2d = subvol[:, :, int(k)]  # 2D CT slice
        m2d = cmask[:, :, int(k)].astype(bool)  # 2D binary mask

        patches.append({
            "img": img2d,                         # 2D patch in HU
            "mask": m2d,                          # 2D boolean mask
            "k_local": int(k),                    # slice index inside the bounding box
            "k_global": int(cbbox[2].start + k),  # slice index in the original volume
            "area": k_area,                       # slice area in mm^2
        })

    # Aggregate the radiologists' ratings
    ann_agg = _aggregate_ann_fields(anns)

    return {"bbox": cbbox, "patches": patches, "ann_agg": ann_agg}

# ===============================================================================
# 3) Top-level functions: patient-level batch processing
# ===============================================================================

def extract_patient_level_patches(
    patient_id: str = None,
    clevel: float = 0.5,
    pad: List[Tuple[int, int]] = [(25, 25), (25, 25), (0, 0)],
    drop_ends: bool = False,
    min_area_mm2: float = 50.0,
    n_neighbors: int = 2,
    mode: str = "intersect"
):
    """
    Extract patches for every nodule of the given patient.

    Pipeline:
    1. Query the LIDC-IDRI database for the patient's scans
    2. Cluster the annotations by spatial proximity (nodule grouping)
    3. Extract representative patches for each nodule
    4. Yield structured results for downstream processing

    Args:
        patient_id: a patient ID (e.g. "LIDC-IDRI-0001"), or None for all patients
        clevel: consensus level for multi-radiologist agreement
        pad: bounding-box padding
        drop_ends: whether to exclude the first and last slice of each nodule
        min_area_mm2: minimum area threshold for slice filtering
        n_neighbors: number of slices on each side of the largest-area slice
        mode: slice selection mode ("intersect" or "union")

    Yields:
        A dictionary with patient metadata and the extracted patches
    """
    q = pl.query(pl.Scan)
    if patient_id:
        q = q.filter(pl.Scan.patient_id == patient_id)

    for scan in q:  # each scan is one CT exam of a patient
        nodule_groups = scan.cluster_annotations()  # group annotations by spatial proximity

        for n_idx, anns in enumerate(nodule_groups):  # one cluster per nodule
            res = extract_patches_for_nodule(
                scan, anns,
                clevel=clevel, pad=pad,
                min_area_mm2=min_area_mm2, n_neighbors=n_neighbors,
                mode=mode, drop_ends=drop_ends
            )

            yield {
                "patient_id": scan.patient_id,
                "scan_id": scan.id,
                "nodule_index": int(n_idx),
                "bbox": res["bbox"],
                "ann_summary": res["ann_agg"],     # aggregated annotation scores
                "patches": res["patches"],         # representative slice patches
            }

def save_patches_and_metadata(
    out_dir: str,
    patient_id: str,
    metadata_csv: str,
    clevel: float = 0.5,
    pad: List[Tuple[int, int]] = [(25, 25), (25, 25), (0, 0)],
    drop_ends: bool = False,
    min_area_mm2: float = 50.0,
    n_neighbors: int = 2,
    mode: str = "intersect",
):
    """
    Extract patches, save them as images, and write a metadata CSV file.

    Output structure:
    - One PNG file per patch image and one per mask
    - A CSV metadata file with patch information and annotation scores

    Args:
        out_dir: output directory for the patch images
        patient_id: target patient ID (LIDC-IDRI-xxxx format)
        metadata_csv: path of the output CSV metadata file
        clevel: consensus level for multi-radiologist agreement
        pad: bounding-box padding
        drop_ends: whether to exclude the first and last slice of each nodule
        min_area_mm2: minimum area threshold (mm^2)
        n_neighbors: number of slices on each side of the largest-area slice
        mode: slice selection mode ("intersect" or "union")

    Returns:
        pd.DataFrame: the generated metadata, one row per patch
    """
    os.makedirs(out_dir, exist_ok=True)
    records = []

    for item in extract_patient_level_patches(
        patient_id=patient_id,
        clevel=clevel,
        pad=pad,
        min_area_mm2=min_area_mm2,
        n_neighbors=n_neighbors,
        mode=mode,
        drop_ends=drop_ends
    ):
        sid = item["scan_id"]
        pid = item["patient_id"]
        nid = item["nodule_index"]

        for p in item["patches"]:
            k_global = p["k_global"]
            img = p["img"]
            mask = p["mask"]

            # Build the file paths.
            # Note: the file name omits scan_id, so two scans of the same
            # patient could in principle produce the same name.
            img_path = os.path.join(out_dir, f"{pid}_n{nid}_k{k_global}_img.png")
            mask_path = os.path.join(out_dir, f"{pid}_n{nid}_k{k_global}_mask.png")

            # Apply the lung window: clip to [-1350, 150] HU and rescale to 0-255
            img_windowed = window_hu(img)

            # Save the images (a fixed window instead of per-image normalization
            # keeps gray levels comparable across patches)
            Image.fromarray(img_windowed).save(img_path)
            Image.fromarray((mask.astype(np.uint8) * 255)).save(mask_path)

            # In-plane bounding box: (axis 0 start, axis 1 start, axis 0 stop,
            # axis 1 stop), stored as (x_min, y_min, x_max, y_max)
            nodule_bbox = item["bbox"]
            nodule_bbox = (
                nodule_bbox[0].start, nodule_bbox[1].start,
                nodule_bbox[0].stop, nodule_bbox[1].stop
            )

            # Build the metadata record
            record = {
                "scan_id": sid,
                "patient_id": pid,
                "nodule_index": nid,
                "k_global": k_global,
                "img_path": img_path,
                "mask_path": mask_path,
                "area_mm2": p["area"],
                "nodule_bbox_xmin": nodule_bbox[0],
                "nodule_bbox_ymin": nodule_bbox[1],
                "nodule_bbox_xmax": nodule_bbox[2],
                "nodule_bbox_ymax": nodule_bbox[3],
            }
            # Add the aggregated annotation scores
            record.update({f"ann_{k}": v for k, v in item["ann_summary"].items()})

            records.append(record)

    # Save the metadata to CSV
    df = pd.DataFrame(records)
    df.to_csv(metadata_csv, index=False)
    print(f"Metadata saved to {metadata_csv}, total patches: {len(df)}")

    return df
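
The difference between the `"intersect"` and `"union"` modes is easiest to see on a made-up per-slice area profile. The sketch below is a simplified stand-in for the slice selection above: it takes areas directly instead of a mask, ignores `drop_ends`, and the function name `select_slices` is invented for this illustration.

```python
import numpy as np

def select_slices(areas, min_area=50.0, n_neighbors=2, mode="intersect"):
    """Toy slice selection on a 1D list of per-slice areas (mm^2)."""
    areas = np.asarray(areas, dtype=float)
    kept = np.flatnonzero(areas >= min_area)   # slices passing the area filter
    k0 = int(np.argmax(areas))                 # largest-area slice
    if kept.size == 0:
        kept = np.array([k0])                  # fallback: keep the largest slice
    lo = max(0, k0 - n_neighbors)
    hi = min(len(areas), k0 + n_neighbors + 1)
    neigh = np.arange(lo, hi)                  # neighbors, kept inside valid range
    if mode == "union":
        return np.union1d(kept, neigh)
    if mode == "intersect":
        return np.intersect1d(kept, neigh)
    raise ValueError("mode must be 'union' or 'intersect'")

areas = [10, 40, 45, 160, 190, 150, 80, 60, 20]    # made-up areas, peak at index 4
print(select_slices(areas, mode="intersect"))      # neighbors that also pass the filter
print(select_slices(areas, mode="union"))          # neighbors plus all passing slices
```

With these numbers, slice 2 is a neighbor of the peak but falls below 50 mm^2, so it is dropped in `"intersect"` mode and kept in `"union"` mode; slice 7 passes the filter but lies outside the neighborhood, so it appears only in `"union"` mode.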

In [3]:
# ===============================================================================
# Example 1: process a single patient (LIDC-IDRI-0078)
# ===============================================================================

out_dir = "./lidc_patches_0078"
os.makedirs(out_dir, exist_ok=True)
target_pid = "LIDC-IDRI-0078"  # example patient ID

save_patches_and_metadata(
    out_dir=out_dir,
    metadata_csv=os.path.join(out_dir, "patches_metadata.csv"),
    clevel=0.5,                    # 50% consensus threshold
    pad=[(25, 25), (25, 25), (0, 0)],  # 25-pixel in-plane padding; none along z
    min_area_mm2=50.0,             # minimum nodule area per slice: 50 mm^2
    n_neighbors=2,                 # ±2 slices around the largest-area slice
    mode="intersect",              # intersection of the area and neighborhood criteria
    patient_id=target_pid,
    drop_ends=False                # keep the first and last slice of each nodule
)

Loading dicom files ... This may take a moment.
Loading dicom files ... This may take a moment.
Loading dicom files ... This may take a moment.
Loading dicom files ... This may take a moment.
Loading dicom files ... This may take a moment.
Loading dicom files ... This may take a moment.
Loading dicom files ... This may take a moment.
Metadata saved to ./lidc_patches_0078\patches_metadata.csv, total patches: 16


Unnamed: 0,scan_id,patient_id,nodule_index,k_global,img_path,mask_path,area_mm2,nodule_bbox_xmin,nodule_bbox_ymin,nodule_bbox_xmax,nodule_bbox_ymax,ann_subtlety,ann_internalStructure,ann_calcification,ann_sphericity,ann_margin,ann_lobulation,ann_spiculation,ann_texture,ann_malignancy
0,1,LIDC-IDRI-0078,0,24,./lidc_patches_0078\LIDC-IDRI-0078_n0_k24_img.png,./lidc_patches_0078\LIDC-IDRI-0078_n0_k24_mask...,156.325,269,281,353,375,4,1,6,4,3,2,2,5,4
1,1,LIDC-IDRI-0078,0,25,./lidc_patches_0078\LIDC-IDRI-0078_n0_k25_img.png,./lidc_patches_0078\LIDC-IDRI-0078_n0_k25_mask...,184.21,269,281,353,375,4,1,6,4,3,2,2,5,4
2,1,LIDC-IDRI-0078,0,26,./lidc_patches_0078\LIDC-IDRI-0078_n0_k26_img.png,./lidc_patches_0078\LIDC-IDRI-0078_n0_k26_mask...,191.3925,269,281,353,375,4,1,6,4,3,2,2,5,4
3,1,LIDC-IDRI-0078,0,27,./lidc_patches_0078\LIDC-IDRI-0078_n0_k27_img.png,./lidc_patches_0078\LIDC-IDRI-0078_n0_k27_mask...,147.875,269,281,353,375,4,1,6,4,3,2,2,5,4
4,1,LIDC-IDRI-0078,0,28,./lidc_patches_0078\LIDC-IDRI-0078_n0_k28_img.png,./lidc_patches_0078\LIDC-IDRI-0078_n0_k28_mask...,81.12,269,281,353,375,4,1,6,4,3,2,2,5,4
5,1,LIDC-IDRI-0078,1,44,./lidc_patches_0078\LIDC-IDRI-0078_n1_k44_img.png,./lidc_patches_0078\LIDC-IDRI-0078_n1_k44_mask...,87.88,121,324,211,401,5,1,6,4,3,3,2,4,4
6,1,LIDC-IDRI-0078,1,45,./lidc_patches_0078\LIDC-IDRI-0078_n1_k45_img.png,./lidc_patches_0078\LIDC-IDRI-0078_n1_k45_mask...,182.0975,121,324,211,401,5,1,6,4,3,3,2,4,4
7,1,LIDC-IDRI-0078,1,46,./lidc_patches_0078\LIDC-IDRI-0078_n1_k46_img.png,./lidc_patches_0078\LIDC-IDRI-0078_n1_k46_mask...,183.365,121,324,211,401,5,1,6,4,3,3,2,4,4
8,1,LIDC-IDRI-0078,1,47,./lidc_patches_0078\LIDC-IDRI-0078_n1_k47_img.png,./lidc_patches_0078\LIDC-IDRI-0078_n1_k47_mask...,148.2975,121,324,211,401,5,1,6,4,3,3,2,4,4
9,1,LIDC-IDRI-0078,1,48,./lidc_patches_0078\LIDC-IDRI-0078_n1_k48_img.png,./lidc_patches_0078\LIDC-IDRI-0078_n1_k48_mask...,125.905,121,324,211,401,5,1,6,4,3,3,2,4,4
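
For downstream use, a common first step is to reduce the table to one row per nodule. The sketch below picks the largest-area patch for each `(scan_id, nodule_index)` pair using only the standard library. The sample string reuses a few rows and columns from the table above; reading from an in-memory string rather than the real CSV file is a simplification made for this example.

```python
import csv
import io

# A few columns and rows copied from the table above; the real CSV has more.
sample = """scan_id,patient_id,nodule_index,k_global,area_mm2
1,LIDC-IDRI-0078,0,24,156.325
1,LIDC-IDRI-0078,0,25,184.21
1,LIDC-IDRI-0078,0,26,191.3925
1,LIDC-IDRI-0078,1,45,182.0975
1,LIDC-IDRI-0078,1,46,183.365
"""

largest = {}  # (scan_id, nodule_index) -> row with the largest area seen so far
for row in csv.DictReader(io.StringIO(sample)):
    key = (row["scan_id"], row["nodule_index"])
    best = largest.get(key)
    if best is None or float(row["area_mm2"]) > float(best["area_mm2"]):
        largest[key] = row

for key, row in sorted(largest.items()):
    print(key, row["k_global"], row["area_mm2"])
```

Keying on `scan_id` as well as `nodule_index` matters because nodule indices restart at 0 for every scan.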


In [4]:
# ===============================================================================
# Example 2: process another patient (LIDC-IDRI-0151)
# ===============================================================================

out_dir = "./lidc_patches_0151"
os.makedirs(out_dir, exist_ok=True)
target_pid = "LIDC-IDRI-0151"  # another example patient

save_patches_and_metadata(
    out_dir=out_dir,
    metadata_csv=os.path.join(out_dir, "patches_metadata.csv"),
    clevel=0.5,
    pad=[(25, 25), (25, 25), (0, 0)],
    min_area_mm2=50.0,
    n_neighbors=2,
    mode="intersect",
    patient_id=target_pid,
    drop_ends=False
)

Loading dicom files ... This may take a moment.
Loading dicom files ... This may take a moment.
Loading dicom files ... This may take a moment.
Metadata saved to ./lidc_patches_0151\patches_metadata.csv, total patches: 3


Unnamed: 0,scan_id,patient_id,nodule_index,k_global,img_path,mask_path,area_mm2,nodule_bbox_xmin,nodule_bbox_ymin,nodule_bbox_xmax,nodule_bbox_ymax,ann_subtlety,ann_internalStructure,ann_calcification,ann_sphericity,ann_margin,ann_lobulation,ann_spiculation,ann_texture,ann_malignancy
0,10,LIDC-IDRI-0151,0,74,./lidc_patches_0151\LIDC-IDRI-0151_n0_k74_img.png,./lidc_patches_0151\LIDC-IDRI-0151_n0_k74_mask...,50.777985,283,381,350,452,5,1,6,4,4,3,3,5,4
1,153,LIDC-IDRI-0151,0,71,./lidc_patches_0151\LIDC-IDRI-0151_n0_k71_img.png,./lidc_patches_0151\LIDC-IDRI-0151_n0_k71_mask...,90.560329,265,370,339,446,5,1,6,3,3,3,2,4,4
2,153,LIDC-IDRI-0151,0,72,./lidc_patches_0151\LIDC-IDRI-0151_n0_k72_img.png,./lidc_patches_0151\LIDC-IDRI-0151_n0_k72_mask...,96.572633,265,370,339,446,5,1,6,3,3,3,2,4,4


In [5]:
# ===============================================================================
# Scan metadata: build an overview table of all scans in the database
# ===============================================================================

# Query every scan in the LIDC-IDRI database
scans_all = pl.query(pl.Scan)
scans_all = scans_all.all()

# Build a metadata DataFrame with one row per scan
scan_metainfo = pd.DataFrame(scans_all, columns=['Scan_obj'])
scan_metainfo['id'] = scan_metainfo['Scan_obj'].apply(lambda x: x.id)
scan_metainfo['patient_id'] = scan_metainfo['Scan_obj'].apply(lambda x: x.patient_id)
scan_metainfo['num_annotations'] = scan_metainfo['Scan_obj'].apply(lambda x: len(x.annotations))

scan_metainfo

Unnamed: 0,Scan_obj,id,patient_id,num_annotations
0,"Scan(id=1,patient_id=LIDC-IDRI-0078)",1,LIDC-IDRI-0078,13
1,"Scan(id=2,patient_id=LIDC-IDRI-0069)",2,LIDC-IDRI-0069,9
2,"Scan(id=3,patient_id=LIDC-IDRI-0079)",3,LIDC-IDRI-0079,4
3,"Scan(id=4,patient_id=LIDC-IDRI-0101)",4,LIDC-IDRI-0101,2
4,"Scan(id=5,patient_id=LIDC-IDRI-0110)",5,LIDC-IDRI-0110,6
...,...,...,...,...
1013,"Scan(id=1014,patient_id=LIDC-IDRI-0641)",1014,LIDC-IDRI-0641,21
1014,"Scan(id=1015,patient_id=LIDC-IDRI-0640)",1015,LIDC-IDRI-0640,7
1015,"Scan(id=1016,patient_id=LIDC-IDRI-0639)",1016,LIDC-IDRI-0639,5
1016,"Scan(id=1017,patient_id=LIDC-IDRI-0638)",1017,LIDC-IDRI-0638,3


In [6]:
# ===============================================================================
# Batch processing: extract patches for a list of patients
# ===============================================================================

metadata_df_all = []
directory = "./lidc_patches_all"
os.makedirs(directory, exist_ok=True)

# Patient IDs to process
# patient_list = scan_metainfo.patient_id.unique()  # uncomment to process every patient
patient_list = ["LIDC-IDRI-0078", "LIDC-IDRI-0151", "LIDC-IDRI-0115", "LIDC-IDRI-0054"]  # small subset for testing

# Process the patients one by one with a progress bar
for patient in tqdm(patient_list, desc="Processing patients"):
    out_dir = os.path.join(directory, patient)
    os.makedirs(out_dir, exist_ok=True)
    target_pid = patient

    # Extract patches for the current patient
    metadata_df_patient = save_patches_and_metadata(
        out_dir=out_dir,
        metadata_csv=os.path.join(out_dir, "patches_metadata.csv"),
        clevel=0.5,
        pad=[(25, 25), (25, 25), (0, 0)],
        drop_ends=False,
        min_area_mm2=50.0,
        n_neighbors=2,
        mode="intersect",
        patient_id=target_pid
    )

    # Collect the metadata of every patient
    metadata_df_all.append(metadata_df_patient)

# Merge all patient metadata into a single DataFrame
metadata_df = pd.concat(metadata_df_all, ignore_index=True)
metadata_df.to_csv(os.path.join(directory, "all_patches_metadata.csv"), index=False)

print("Batch processing complete!")
print(f"Output directory: {directory}")
print(f"Total patches extracted: {len(metadata_df)}")
print(f"Patients processed: {len(patient_list)}")

Processing patients:   0%|          | 0/4 [00:00<?, ?it/s]

Loading dicom files ... This may take a moment.
Loading dicom files ... This may take a moment.
Loading dicom files ... This may take a moment.
Loading dicom files ... This may take a moment.
Loading dicom files ... This may take a moment.
Loading dicom files ... This may take a moment.
Loading dicom files ... This may take a moment.


Processing patients:  25%|██▌       | 1/4 [00:03<00:11,  3.67s/it]

Metadata saved to ./lidc_patches_all\LIDC-IDRI-0078\patches_metadata.csv, total patches: 16
Loading dicom files ... This may take a moment.
Loading dicom files ... This may take a moment.
Loading dicom files ... This may take a moment.


Processing patients:  50%|█████     | 2/4 [00:05<00:05,  2.78s/it]

Metadata saved to ./lidc_patches_all\LIDC-IDRI-0151\patches_metadata.csv, total patches: 3
Loading dicom files ... This may take a moment.
Loading dicom files ... This may take a moment.


Processing patients:  75%|███████▌  | 3/4 [00:07<00:02,  2.23s/it]

Metadata saved to ./lidc_patches_all\LIDC-IDRI-0115\patches_metadata.csv, total patches: 5
Loading dicom files ... This may take a moment.
Loading dicom files ... This may take a moment.


Processing patients: 100%|██████████| 4/4 [00:09<00:00,  2.35s/it]

Metadata saved to ./lidc_patches_all\LIDC-IDRI-0054\patches_metadata.csv, total patches: 5
Batch processing complete!
Output directory: ./lidc_patches_all
Total patches extracted: 29
Patients processed: 4



