Tumor-anchored deep feature random forests for out-of-distribution detection in lung cancer segmentation
Research codebase for post-hoc out-of-distribution detection in 3D CT lung tumor segmentation. The repository accompanies our work on using tumor-anchored deep features from pretrained-then-finetuned segmentation backbones, together with lightweight random forests and limited outlier exposure, to detect unsafe inputs at scan level.
Modern segmentation models can remain highly accurate on in-distribution data while failing confidently on clinically mismatched scans. RF-Deep is designed to detect those failures without modifying the underlying segmentation architecture. Instead of relying only on logits or architecture-specific uncertainty heads, RF-Deep extracts hierarchical deep features from regions of interest anchored to predicted tumor segmentations and uses them for downstream OOD detection.
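As a rough illustration of what "anchored to predicted tumor segmentations" can mean in practice, the sketch below crops a fixed-size region centered on the centroid of a predicted mask. It is a minimal NumPy example under stated assumptions, not the crop protocol implemented in this repository: the function name `tumor_anchored_crop`, the default `roi_size`, and the fallback to the volume center for empty predictions are all hypothetical.

```python
import numpy as np


def tumor_anchored_crop(volume, pred_mask, roi_size=(32, 32, 32)):
    """Crop a fixed-size region centered on the predicted tumor mask.

    Hypothetical helper: if the mask is empty, the crop is centered on the
    volume instead (an assumption made for this sketch).
    """
    volume = np.asarray(volume)
    pred_mask = np.asarray(pred_mask).astype(bool)

    if pred_mask.any():
        # Centroid of all predicted tumor voxels, rounded to voxel indices.
        center = np.round(np.argwhere(pred_mask).mean(axis=0)).astype(int)
    else:
        center = np.array(volume.shape) // 2

    slices = []
    for c, size, dim in zip(center, roi_size, volume.shape):
        size = min(size, dim)  # never request more voxels than the axis has
        # Shift the window so it stays fully inside the volume.
        start = int(np.clip(c - size // 2, 0, dim - size))
        slices.append(slice(start, start + size))
    return volume[tuple(slices)]
```

The same slicing could be applied to a backbone feature map rather than the image itself, after rescaling the centroid to the feature map's resolution.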
Figure: Representative in-distribution and out-of-distribution examples from the paper, illustrating how uncertainty maps can appear concentrated, diffuse, or misaligned across different deployment scenarios.
In the paper, RF-Deep is evaluated on 2,056 CT scans spanning in-distribution lung cancer, near-OOD chest CT datasets, and far-OOD abdominal datasets. It achieves strong near-OOD detection and near-perfect far-OOD detection while remaining simple, lightweight, and architecture-agnostic.
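To give a sense of how lightweight the detection stage can be, the following sketch trains a scikit-learn random forest to separate in-distribution feature vectors from a small set of exposed outliers and scores held-out scans by the predicted outlier probability. It is an illustrative sketch only: the function names, the `n_estimators` value, and the assumption that each scan is already summarized as one feature vector are hypothetical and are not taken from `ood_rfdeep.py`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score


def fit_ood_forest(id_feats, exposed_ood_feats, n_estimators=200, seed=0):
    """Fit a binary forest: label 0 = in-distribution, 1 = exposed outlier."""
    X = np.vstack([id_feats, exposed_ood_feats])
    y = np.concatenate([np.zeros(len(id_feats)), np.ones(len(exposed_ood_feats))])
    forest = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
    return forest.fit(X, y)


def ood_scores(forest, feats):
    """Scan-level OOD score: predicted probability of the outlier class."""
    return forest.predict_proba(feats)[:, 1]


def evaluate_auroc(forest, id_test, ood_test):
    """AUROC for separating held-out ID scans from held-out OOD scans."""
    scores = ood_scores(forest, np.vstack([id_test, ood_test]))
    labels = np.concatenate([np.zeros(len(id_test)), np.ones(len(ood_test))])
    return roc_auc_score(labels, scores)
```

In this sketch the outlier-exposure set is deliberately small; the held-out OOD scans passed to `evaluate_auroc` should come from data not seen during fitting.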
- Post-hoc OOD detection for segmentation, without changing the segmentation network
- Tumor-anchored feature extraction from predicted regions of interest
- Support for RF-Deep, radiomics, Mahalanobis, and logit-based baselines
- Metadata-aware and scanner-aware analysis utilities
- Figure-generation and analysis scripts used for the paper
- ood_rfdeep.py: main RF-Deep experiment entrypoint
- extract_features.py: deep-feature extraction from segmentation backbones
- ood_maha.py: Mahalanobis deep-feature baseline (with optional ReAct/ASH transforms)
- logit_baselines.py: logit-derived OOD baselines and analysis
- roi_logit_baselines.py: ROI-restricted logit baselines using the same crop protocol as RF-Deep
- ood_metadata_holdout.py: metadata-stratified holdout evaluation
- segmentation_inference.py: segmentation inference utility for supported backbones
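For readers unfamiliar with the Mahalanobis baseline listed above, the sketch below shows the general idea: fit a single Gaussian to in-distribution feature vectors and score each new scan by its squared Mahalanobis distance to that Gaussian. This is a generic NumPy illustration, not the code in `ood_maha.py`; the function names and the `ridge` regularizer are assumptions, and the optional ReAct/ASH transforms are omitted.

```python
import numpy as np


def fit_gaussian(id_features, ridge=1e-6):
    """Estimate the mean and precision matrix of in-distribution features.

    A small ridge term (assumed value) keeps the covariance invertible.
    """
    id_features = np.asarray(id_features, dtype=float)
    mean = id_features.mean(axis=0)
    cov = np.cov(id_features, rowvar=False)
    cov = cov + ridge * np.eye(id_features.shape[1])
    return mean, np.linalg.inv(cov)


def mahalanobis_score(features, mean, precision):
    """Squared Mahalanobis distance per row; larger means more OOD-like."""
    diff = np.asarray(features, dtype=float) - mean
    return np.einsum("ij,jk,ik->i", diff, precision, diff)
```

Unlike a random forest trained with outlier exposure, this score uses only in-distribution data at fit time.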
This codebase targets Python 3.9.
pip install -r requirements.txt
python -m scripts.smoke_check
Segmentation backbone weights used in the paper are released separately. Place them under models/finetuned_weights/ and models/pretrained_weights/, or override the locations with the FINETUNED_WEIGHTS_ROOT and PRETRAINED_WEIGHTS_ROOT environment variables.
Download here: MSKBox RF-Deep
(If the above link does not work at any point, please open an issue on GitHub and it will be addressed promptly.)
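The environment-variable overrides for the weight directories described above could be resolved along the following lines. This is a hedged sketch using only the standard library; the helper name `resolve_weights_root` is hypothetical, and the repository's own resolution logic may differ.

```python
import os
from pathlib import Path


def resolve_weights_root(env_var, default):
    """Return the directory named by env_var if set and non-empty, else default."""
    override = os.environ.get(env_var, "").strip()
    return Path(override).expanduser() if override else Path(default)


# Defaults mirror the directories named in this README.
finetuned_root = resolve_weights_root("FINETUNED_WEIGHTS_ROOT", "models/finetuned_weights")
pretrained_root = resolve_weights_root("PRETRAINED_WEIGHTS_ROOT", "models/pretrained_weights")
```

With this pattern, setting `FINETUNED_WEIGHTS_ROOT=/path/to/weights` in the shell before launching a script redirects the lookup without editing any code.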
RF-Deep is evaluated on public CT collections; this repository redistributes none of them. Obtain each from its original source:
- NSCLC-Radiomics — TCIA
- NSCLC-Radiogenomics (LRAD) — TCIA
- RSNA-STR Pulmonary Embolism Detection (RSNA PE) — Kaggle
- MIDRC COVID-19 negative CT (MIDRC C19-) — TCIA
- MIDRC COVID-19 positive CT (MIDRC C19+) — TCIA
- KiTS — kits-challenge.org
- PancreasCT — medicaldecathlon.com
- Breast cancer CT — institutional internal dataset, not publicly redistributed
Using the public datasets listed above together with the released checkpoints, users can reproduce the main RF-Deep workflow: segmentation inference, deep-feature extraction, OOD evaluation, and most analysis scripts. Results that depend on the internal breast cancer CT dataset cannot be reproduced from public data alone, since that cohort is not publicly available.
To reproduce the paper workflow, the main steps are feature extraction, optional baseline generation, RF-Deep evaluation, and figure or analysis generation. Most scripts expect data to be indexed through JSON manifests in jsons/.
RF-Deep operates on hierarchical features extracted from the 3D Swin Transformer backbone of the segmentation model.
python extract_features.py --model smit
Feature caches are typically written as .pkl files under pickle_data/.
To compare RF-Deep against radiomics or logit-based uncertainty baselines:
python logit_baselines.py global --metric maxlogit
Train and evaluate RF-Deep on the ID and OOD datasets:
python ood_rfdeep.py --method lodo --model-name smit --img-size 128 --train-size 20
Datasets are expected under data/ by default, but that path is intentionally ignored because it may be machine-specific or a symlink. Shared code resolves paths through project_paths.py, and dataset roots can be overridden with environment variables when needed. Metadata required for scanner analysis and PyCERR-based radiomics lives in metadata_info/.
Dataset manifests under jsons/ are machine-specific and not redistributed; generate them locally with python -m scripts.make_json after obtaining the datasets. See jsons/README.md for the manifest schema.
- PROJECT_LAYOUT.md: canonical directory layout and output policy
- CODE_REFERENCE.md: module-by-module reference for the shared codebase
- AGENTS.md: orientation file for agentic AI tools (Claude Code, Codex, Cursor, etc.)
- models/README.md: model architectures, feature-extraction expectations, and weight directories
- paper_figures/README.md: figure-generation entrypoints used for paper assets
- scripts/README.md: reusable operational and analysis scripts
- jsons/README.md: dataset manifest conventions
- metadata_info/README.md: synced metadata inputs for scanner, holdout, and radiomics-support analysis
- results/README.md: generated analysis outputs
- radiomics_features/README.md: generated radiomics CSV outputs
- pickle_data/README.md: cached deep-feature pickle outputs
- excelrecords/README.md: generated segmentation metric CSV outputs
We sincerely thank the authors of Swin UNETR and SMIT for open-sourcing their code and models. We also thank the maintainers of these repositories, among others: PyTorch, MONAI, PyCERR, DeepMind Surface Distance, NiBabel, and scikit-learn. Finally, AI-driven coding assistants were used in the development of parallelization scripts, code cleanup, and relevant technical documentation.
@article{rangnekar2025tumor,
title={Tumor-anchored deep feature random forests for out-of-distribution detection in lung cancer segmentation},
author={Rangnekar, Aneesh and Veeraraghavan, Harini},
journal={arXiv preprint arXiv:2512.08216},
year={2025}
}
