# TumorImagingBench: Getting Started

Welcome to the TumorImagingBench tutorials! This notebook gives an overview of the framework and quick-start instructions.

## What is TumorImagingBench?

TumorImagingBench is a framework for evaluating and comparing foundation model feature extractors for radiomics in medical imaging. It provides:

- **Unified Model Interface**: All models inherit from `BaseModel` with a consistent API
- **Dynamic Model Registration**: Add models at runtime without modifying core code
- **Flexible Feature Extraction**: Dataset-specific extractors with configurable paths
- **GPU-Accelerated Inference**: CUDA support for efficient feature extraction
- **Extensible Architecture**: Easy to add new models and datasets

## Installation

```bash
cd /path/to/TumorImagingBench
uv sync
```

## Quick Start: Check Available Models

In [1]:
import sys
sys.path.insert(0, '/path/to/TumorImagingBench/src')  # adjust to your local checkout

from tumorimagingbench.models import get_available_extractors, get_extractor

# List all available models
available = get_available_extractors()
print(f"Available models ({len(available)}):")
for name in available:
    print(f"  - {name}")

In the future `np.bool` will be defined as the corresponding NumPy scalar.


✓ Registered extractor: CTClipVitExtractor
✓ Registered extractor: CTFMExtractor
✓ Registered extractor: FMCIBExtractor
✓ Registered extractor: MerlinExtractor
✓ Registered extractor: ModelsGenExtractor
✓ Registered extractor: PASTAExtractor
✓ Registered extractor: SUPREMExtractor
✓ Registered extractor: VISTA3DExtractor
✓ Registered extractor: VocoExtractor
✓ Registered extractor: DummyResNetExtractor
Available models (10):
  - CTClipVitExtractor
  - CTFMExtractor
  - FMCIBExtractor
  - MerlinExtractor
  - ModelsGenExtractor
  - PASTAExtractor
  - SUPREMExtractor
  - VISTA3DExtractor
  - VocoExtractor
  - DummyResNetExtractor


## Quick Start: Load and Test a Model

In [2]:
import torch

# Get the DummyResNetExtractor (a simple example model)
DummyResNet = get_extractor('DummyResNetExtractor')

# Instantiate the model
model = DummyResNet()
print(f"✓ Model instantiated: {model.__class__.__name__}")

# Load pre-trained weights
model.load()
print("✓ Weights loaded")

# Move to GPU if available
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)
print(f"✓ Model moved to {device}")

required package for reader ITKReader is not installed, or the version doesn't match requirement.


✓ Model instantiated: DummyResNetExtractor
✓ Weights loaded
✓ Model moved to cuda


## Quick Start: Extract Features

In [5]:
import numpy as np

# Create dummy input (batch_size=2, channels=1, height=48, width=48, depth=48)
dummy_input = torch.randn(2, 1, 48, 48, 48, device=device)
print(f"Input shape: {dummy_input.shape}")

# Extract features
features = model.forward(dummy_input)
print(f"Output shape: {features.shape}")
print(f"Output type: {type(features)}")
print(f"Output dtype: {features.dtype}")
print("\nFeature statistics:")
print(f"  Mean: {features.mean():.6f}")
print(f"  Std: {features.std():.6f}")
print(f"  Min: {features.min():.6f}")
print(f"  Max: {features.max():.6f}")

Input shape: torch.Size([2, 1, 48, 48, 48])
Output shape: torch.Size([2, 512])
Output type: <class 'torch.Tensor'>
Output dtype: torch.float32

Feature statistics:
  Mean: 0.825305
  Std: 0.122632
  Min: 0.457164
  Max: 1.180106


## Framework Architecture

### Component Overview

```
TumorImagingBench/
├── src/tumorimagingbench/
│   ├── models/                 # Foundation model extractors
│   │   ├── base.py             # BaseModel abstract class
│   │   ├── __init__.py         # Model registry
│   │   ├── dummy_resnet.py     # Example model
│   │   └── [other models]      # 10+ foundation models
│   │
│   └── evaluation/             # Feature extraction pipeline
│       ├── base_feature_extractor.py  # Core extraction logic
│       ├── dummy_dataset_feature_extractor.py  # Example dataset
│       └── [dataset extractors]       # One per dataset
│
└── tutorials/                  # Documentation and examples
    ├── 00_getting_started.ipynb
    ├── 01_model_integration.ipynb
    ├── 02_feature_extractor_guide.ipynb
    └── 03_api_reference.ipynb
```

### Model Registry System

All models are registered in the `AVAILABLE_EXTRACTORS` dictionary:

```python
AVAILABLE_EXTRACTORS = {
    'DummyResNetExtractor': DummyResNetExtractor,
    'CTClipVitExtractor': CTClipVitExtractor,
    'FMCIBExtractor': FMCIBExtractor,
    # ... more models
}
```
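
The registry functions used in this tutorial (`register_extractor`, `get_available_extractors`, `get_extractor`) can be pictured as thin wrappers around such a dictionary. The following is a minimal sketch under that assumption, not the framework's actual implementation; `_REGISTRY` and `ToyExtractor` are hypothetical names.

```python
from typing import Dict, List

# Hypothetical stand-in for AVAILABLE_EXTRACTORS: maps a name to a model class.
_REGISTRY: Dict[str, type] = {}


def register_extractor(name: str, cls: type) -> None:
    """Add a model class to the registry under the given name."""
    if not isinstance(cls, type):
        raise TypeError(f"Expected a class, got {type(cls).__name__}")
    _REGISTRY[name] = cls


def get_available_extractors() -> List[str]:
    """Return registered names in registration order."""
    return list(_REGISTRY)


def get_extractor(name: str) -> type:
    """Look up a model class, failing with a readable message if it is unknown."""
    if name not in _REGISTRY:
        raise KeyError(f"Unknown extractor {name!r}; available: {get_available_extractors()}")
    return _REGISTRY[name]


class ToyExtractor:  # hypothetical model class used only for this demo
    pass


register_extractor("ToyExtractor", ToyExtractor)
model_cls = get_extractor("ToyExtractor")
```

Because the lookup returns the class rather than an instance, callers decide when to instantiate the model and load its weights, which matches the `get_extractor(...)` followed by `DummyResNet()` pattern in the quick start.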

### Feature Extraction Pipeline

1. **Dataset Loading**: `get_split_data()` returns a pandas DataFrame
2. **Row Processing**: `preprocess_row()` validates each sample
3. **Model Preprocessing**: `model.preprocess()` loads the NIfTI image and extracts a patch
4. **Feature Extraction**: `model.forward()` extracts features on GPU
5. **Parallel Processing**: Multiprocessing across models
6. **Feature Saving**: Results saved as pickle files
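
The six steps above can be sketched in plain Python. This is an illustration under stated assumptions rather than the framework's code: `ToyModel`, `REQUIRED_KEYS`, and `extract_features` are hypothetical names, the toy model performs no image I/O, and models are processed sequentially instead of with multiprocessing to keep the example short.

```python
import pickle
import tempfile
from pathlib import Path

REQUIRED_KEYS = ("image_path", "coordX", "coordY", "coordZ")  # assumed schema


class ToyModel:
    """Hypothetical stand-in for a BaseModel subclass."""

    def preprocess(self, row):
        # A real model would load the image and crop a patch around the centroid.
        return [row["coordX"], row["coordY"], row["coordZ"]]

    def forward(self, batch):
        # A real model would run a network; here each sample maps to two numbers.
        return [[sum(sample), max(sample)] for sample in batch]


def preprocess_row(row):
    """Step 2: reject rows that lack a required field."""
    missing = [key for key in REQUIRED_KEYS if key not in row]
    if missing:
        raise ValueError(f"Row is missing fields: {missing}")
    return row


def extract_features(rows, models, output_path):
    """Steps 3-6: preprocess each row, run each model, and pickle the results."""
    results = {}
    for name, model in models.items():
        features = []
        for row in rows:
            sample = model.preprocess(preprocess_row(row))
            features.append(model.forward([sample])[0])
        results[name] = features
    with open(output_path, "wb") as handle:
        pickle.dump(results, handle)
    return results


# Step 1 would normally come from get_split_data(); a list of dicts stands in here.
rows = [{"image_path": "scan.nii.gz", "coordX": 1.0, "coordY": 2.0, "coordZ": 3.0}]
with tempfile.TemporaryDirectory() as tmp_dir:
    output_path = Path(tmp_dir) / "features.pkl"
    results = extract_features(rows, {"ToyModel": ToyModel()}, output_path)
    with open(output_path, "rb") as handle:
        reloaded = pickle.load(handle)
```

Validating each row before any model runs keeps a malformed sample from surfacing as an obscure error deep inside preprocessing.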

## Key Concepts

### BaseModel Interface

All models must implement three abstract methods:

1. **`load(weights_path=None)`** - Load pre-trained weights
   - Called after `__init__()`
   - Should set model to eval mode
   - Optional parameter for custom weights

2. **`preprocess(x)`** - Convert input dict to tensor
   - Input: `{'image_path': str, 'coordX': float, 'coordY': float, 'coordZ': float}`
   - Output: `torch.Tensor` of shape `(1, H, W, D)` or `(C, H, W, D)`
   - Handles: NIfTI loading, orientation, resampling, cropping, normalization

3. **`forward(x)`** - Extract features
   - Input: `torch.Tensor` of shape `(batch_size, channels, height, width, depth)`
   - Output: `numpy.ndarray` of shape `(batch_size, feature_dim)` (note that the `DummyResNetExtractor` quick start above printed a `torch.Tensor`)
   - Must be on CPU
   - No gradients computed
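
The contract above can be expressed with Python's `abc` module. The sketch below is a simplified illustration, not the actual `BaseModel` source: `BaseModelSketch` and `IncompleteModel` are hypothetical names, and the real class presumably also builds on PyTorch, which is omitted here so the example runs without extra dependencies.

```python
from abc import ABC, abstractmethod


class BaseModelSketch(ABC):
    """Hypothetical, dependency-free outline of the three required methods."""

    @abstractmethod
    def load(self, weights_path=None):
        """Load pre-trained weights and switch to evaluation mode."""

    @abstractmethod
    def preprocess(self, x):
        """Turn an input dict (image path plus centroid) into a model input."""

    @abstractmethod
    def forward(self, x):
        """Map a batch of inputs to a batch of feature vectors."""


class IncompleteModel(BaseModelSketch):
    """Implements only two of the three methods, so it stays abstract."""

    def load(self, weights_path=None):
        pass

    def preprocess(self, x):
        return x


try:
    IncompleteModel()
    instantiated = True
except TypeError:
    # Python refuses to instantiate a class with unimplemented abstract methods.
    instantiated = False
```

Declaring the methods abstract means a forgotten `forward()` surfaces at instantiation time rather than midway through an extraction run.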

### Input Data Format

Models expect physical (mm) coordinates for the region of interest:

In [6]:
# Example input format (what get_split_data() should return)
example_row = {
    'image_path': '/path/to/ct_scan.nii.gz',  # NIFTI file
    'coordX': 100.5,                           # X centroid in mm
    'coordY': 150.3,                           # Y centroid in mm
    'coordZ': 200.1,                           # Z centroid in mm
    'label': 0,                                # Optional: label for downstream tasks
}

print("Example input format:")
for key, value in example_row.items():
    print(f"  {key}: {value}")

Example input format:
  image_path: /path/to/ct_scan.nii.gz
  coordX: 100.5
  coordY: 150.3
  coordZ: 200.1
  label: 0
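
Before a patch can be cropped around such a centroid, the physical coordinate generally has to be mapped to a voxel index. The following minimal sketch shows that arithmetic under simplifying assumptions: the image axes are aligned with the physical axes (identity direction matrix), and the `origin_mm` and `spacing_mm` values are made up for illustration. `physical_to_voxel` is a hypothetical helper, not part of TumorImagingBench; in the framework this kind of work belongs to `preprocess()`.

```python
def physical_to_voxel(coord_mm, origin_mm, spacing_mm):
    """Convert a physical point (mm) to the nearest voxel index, axis by axis."""
    return tuple(
        round((c - o) / s) for c, o, s in zip(coord_mm, origin_mm, spacing_mm)
    )


centroid_mm = (100.5, 150.3, 200.1)  # coordX, coordY, coordZ from the example row
origin_mm = (-50.0, -50.0, 0.0)      # assumed physical position of voxel (0, 0, 0)
spacing_mm = (0.5, 0.5, 2.0)         # assumed voxel size along each axis

voxel_index = physical_to_voxel(centroid_mm, origin_mm, spacing_mm)
print(voxel_index)  # (301, 401, 100)
```

Storing centroids in millimetres keeps annotations valid even if the image is later resampled to a different voxel spacing.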


## Common Tasks

### Task 1: List Available Models

In [7]:
from tumorimagingbench.models import get_available_extractors

models = get_available_extractors()
print(f"Available models: {models}")

Available models: ['CTFMExtractor', 'FMCIBExtractor', 'MerlinExtractor', 'ModelsGenExtractor', 'PASTAExtractor', 'SUPREMExtractor', 'VISTA3DExtractor', 'VocoExtractor', 'DummyResNetExtractor']


### Task 2: Extract Features for Specific Models and Preloaded Datasets

In [8]:
# Using command line
# python nsclc_radiomics_feature_extractor.py \
#   --output features/nsclc.pkl \
#   --models DummyResNetExtractor CTClipVitExtractor
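
A command-line interface of this shape could be parsed with `argparse`. The sketch below is an assumption about how such a script might read its flags, not the actual parser in `nsclc_radiomics_feature_extractor.py`; only the `--output` and `--models` flags come from the command above, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser accepting an output path and one or more model names."""
    parser = argparse.ArgumentParser(description="Extract features for selected models.")
    parser.add_argument("--output", required=True, help="Path of the pickle file to write.")
    parser.add_argument(
        "--models",
        nargs="+",  # one or more extractor names separated by spaces
        default=None,
        help="Extractor names to run; this sketch treats None as 'run all'.",
    )
    return parser


# Parse the same arguments as the example command instead of reading sys.argv.
args = build_parser().parse_args(
    ["--output", "features/nsclc.pkl", "--models", "DummyResNetExtractor", "CTClipVitExtractor"]
)
```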

### Task 3: Register a Custom Model at Runtime

In [9]:
import torch
import torch.nn as nn
from tumorimagingbench.models import BaseModel, get_available_extractors, register_extractor

# Define your custom model
class MyCustomModel(BaseModel):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(48*48*48, 512)
    
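    def load(self, weights_path=None):
        # No pre-trained weights in this toy example; just switch to eval mode
        self.eval()

    def preprocess(self, x):
        # Placeholder - in reality would use get_transforms()
        return torch.randn(1, 1, 48, 48, 48)

    def forward(self, x):
        # Flatten each volume and project it to 512 features without tracking gradients
        with torch.no_grad():
            x = x.reshape(x.shape[0], -1)
            x = self.linear(x)
        return x.cpu().numpy()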

# Register it
register_extractor('MyCustomModel', MyCustomModel)

# Now it's available
print(f"Available models: {get_available_extractors()}")

✓ Registered extractor: MyCustomModel
Available models: ['CTFMExtractor', 'FMCIBExtractor', 'MerlinExtractor', 'ModelsGenExtractor', 'PASTAExtractor', 'SUPREMExtractor', 'VISTA3DExtractor', 'VocoExtractor', 'DummyResNetExtractor', 'MyCustomModel']


## Next Steps

1. **To integrate a new model**: Go to [01_model_integration.ipynb](./01_model_integration.ipynb)
2. **To add a new dataset**: Go to [02_feature_extractor_guide.ipynb](./02_feature_extractor_guide.ipynb)
3. **For API details**: Check [03_api_reference.ipynb](./03_api_reference.ipynb)

## Documentation Structure

- **00_getting_started.ipynb** (this file) - Overview and quick start
- **01_model_integration.ipynb** - How to add a new foundation model
- **02_feature_extractor_guide.ipynb** - How to add a new dataset
- **03_api_reference.ipynb** - Complete API documentation

## Key Directories

- **Models**: `src/tumorimagingbench/models/`
- **Feature Extractors**: `src/tumorimagingbench/evaluation/`
- **Tutorials**: `tutorials/` (you are here)
- **Examples**: `tutorials/examples/`