# ezBIDS CLI Demo: UK Biobank Example DICOMs

This notebook demonstrates converting DICOM data to BIDS format using `ezbids-cli`.

We use publicly available example DICOM datasets from UK Biobank:
- T1-weighted structural (MPRAGE)
- T2 FLAIR structural
- Resting-state fMRI
- Diffusion MRI (DWI)
- Susceptibility-weighted imaging (SWI)

## Setup

Install required packages:

In [None]:
# Install ezbids-cli if not already installed
# !pip install ezbids-cli

# For development, install from local source.
# %pip (rather than !pip) installs into the environment of the running kernel.
%pip install -e .. --quiet

In [None]:
import shutil
import zipfile
from pathlib import Path
from urllib.request import urlretrieve

# Create working directories
WORK_DIR = Path("demo_data")
DICOM_DIR = WORK_DIR / "dicoms"
BIDS_DIR = WORK_DIR / "bids_output"

WORK_DIR.mkdir(exist_ok=True)
DICOM_DIR.mkdir(exist_ok=True)

## Download UK Biobank Example DICOMs

UK Biobank provides publicly accessible example DICOM datasets for various MRI sequences.

See: https://biobank.ndph.ox.ac.uk/ukb/label.cgi?id=507

In [None]:
# UK Biobank example DICOM URLs
UKB_EXAMPLES = {
    "t1": "https://biobank.ndph.ox.ac.uk/ukb/ukb/examples/eg_brain_t1.zip",
    "t2flair": "https://biobank.ndph.ox.ac.uk/ukb/ukb/examples/eg_brain_t2flair.zip",
    "rest_fmri": "https://biobank.ndph.ox.ac.uk/ukb/ukb/examples/eg_brain_rest.zip",
    "dwi": "https://biobank.ndph.ox.ac.uk/ukb/ukb/examples/eg_brain_mbdif.zip",
    "swi": "https://biobank.ndph.ox.ac.uk/ukb/ukb/examples/eg_brain_suswt.zip",
}

def download_and_extract(name: str, url: str, target_dir: Path) -> Path:
    """Download and extract a zip file."""
    zip_path = target_dir / f"{name}.zip"
    extract_dir = target_dir / name
    
    if extract_dir.exists():
        print(f"  {name}: already exists, skipping download")
        return extract_dir
    
    print(f"  {name}: downloading...")
    urlretrieve(url, zip_path)
    
    print(f"  {name}: extracting...")
    with zipfile.ZipFile(zip_path, 'r') as zf:
        zf.extractall(extract_dir)
    
    # Clean up zip file
    zip_path.unlink()
    
    return extract_dir
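
One caveat with the helper above: it skips work whenever `extract_dir` exists, so an extraction that is interrupted halfway would be treated as complete on the next run. A minimal sketch of one way around this is to extract into a temporary sibling directory and rename it only after success. The function name `extract_atomically` and the `.partial` suffix are illustrative choices, not part of `ezbids-cli`, and the demo uses a locally created archive rather than a real download.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def extract_atomically(zip_path: Path, extract_dir: Path) -> Path:
    """Extract zip_path so that extract_dir only appears once extraction succeeds."""
    if extract_dir.exists():
        return extract_dir  # A previous run completed; nothing to do.

    # Hypothetical naming convention for the in-progress directory.
    partial_dir = extract_dir.with_name(extract_dir.name + ".partial")
    if partial_dir.exists():
        shutil.rmtree(partial_dir)  # Discard leftovers from an interrupted run.

    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(partial_dir)

    # Publish the finished directory in a single rename.
    partial_dir.rename(extract_dir)
    return extract_dir


# Self-contained demonstration with a small archive created on the fly.
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    archive = tmp_path / "example.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("series/slice_001", b"placeholder bytes")

    out_dir = extract_atomically(archive, tmp_path / "example")
    extracted_names = sorted(p.name for p in out_dir.rglob("*") if p.is_file())
    print(extracted_names)
```

The rename is only cheap and all-or-nothing when both directories live on the same filesystem, which holds here because they share a parent.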

In [None]:
# Download selected example datasets
# For a quick demo, we'll just use T1 and T2 FLAIR
# Uncomment the others as needed (note: the fMRI dataset is large, roughly 500 MB)

print("Downloading UK Biobank example DICOMs...")

datasets_to_download = ["t1", "t2flair"]  # Quick demo
# datasets_to_download = ["t1", "t2flair", "dwi"]  # Include DWI
# datasets_to_download = list(UKB_EXAMPLES.keys())  # All datasets

for name in datasets_to_download:
    download_and_extract(name, UKB_EXAMPLES[name], DICOM_DIR)

print("\nDownload complete!")

In [None]:
# Show what was downloaded
print("Downloaded DICOM directories:")
for d in sorted(DICOM_DIR.iterdir()):
    if d.is_dir():
        # Count all files, since DICOMs do not always have a .dcm extension
        file_count = sum(1 for p in d.rglob("*") if p.is_file())
        print(f"  {d.name}/ ({file_count} files)")
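
File extensions are not a reliable signal for DICOM data. A rough alternative, sketched below, is to look for the `DICM` marker that DICOM Part 10 files place directly after a 128-byte preamble. The helper names `looks_like_dicom` and `count_dicom_files` are illustrative, and the check is a heuristic: some older files omit the preamble and would be missed.

```python
import tempfile
from pathlib import Path


def looks_like_dicom(path: Path) -> bool:
    """Return True if the file has the 'DICM' marker after the 128-byte preamble."""
    try:
        with path.open("rb") as fh:
            fh.seek(128)
            return fh.read(4) == b"DICM"
    except OSError:
        return False  # Unreadable files are treated as non-DICOM.


def count_dicom_files(root: Path) -> int:
    """Count files under root that pass the marker check, regardless of extension."""
    return sum(1 for p in root.rglob("*") if p.is_file() and looks_like_dicom(p))


# Demonstration on synthetic files (not real DICOM data).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "with_marker").write_bytes(b"\x00" * 128 + b"DICM" + b"rest")
    (root / "notes.txt").write_bytes(b"not a DICOM file")
    demo_count = count_dicom_files(root)
    print(demo_count)
```

With real data, `count_dicom_files(DICOM_DIR / "t1")` would give a per-dataset count that ignores stray non-DICOM files.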

## Explore the Schema

ezBIDS CLI uses `bidsschematools` as the source of truth for BIDS compliance.
Let's explore what the schema provides:

In [None]:
from ezbids_cli.schema import (
    get_bids_version,
    get_entity_order,
    get_required_entities,
    validate_suffix_for_datatype,
)

print(f"BIDS Version: {get_bids_version()}")
print(f"\nEntity order (first 10): {get_entity_order()[:10]}")

In [None]:
# Check required entities for different datatypes
examples = [
    ("anat", "T1w"),
    ("anat", "FLAIR"),
    ("anat", "MEGRE"),
    ("func", "bold"),
    ("dwi", "dwi"),
    ("fmap", "epi"),
]

print("Required entities by datatype/suffix:")
print("-" * 40)
for datatype, suffix in examples:
    required = get_required_entities(datatype, suffix)
    print(f"{datatype}/{suffix}: {required or '(none)'}")

In [None]:
# Validate some datatype/suffix combinations
print("Validating datatype/suffix combinations:")
print("-" * 40)

test_cases = [
    ("anat", "T1w"),      # Valid
    ("anat", "bold"),     # Invalid - bold is func
    ("func", "bold"),     # Valid
    ("func", "T1w"),      # Invalid - T1w is anat
    ("dwi", "dwi"),       # Valid
    ("anat", "FLAIR"),    # Valid
]

for datatype, suffix in test_cases:
    is_valid, error = validate_suffix_for_datatype(datatype, suffix)
    status = "✓" if is_valid else "✗"
    msg = "" if is_valid else f" ({error})"
    print(f"{status} {datatype}/{suffix}{msg}")
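
To make the role of entity order concrete, here is a small standalone sketch of how a BIDS filename can be assembled from key-value entities. It does not use `ezbids-cli` or `bidsschematools`: `ENTITY_ORDER` is a hard-coded subset of the BIDS entity order chosen for illustration, and `build_bids_filename` is a hypothetical helper that assumes short entity keys such as `sub` and `task`.

```python
from typing import Mapping

# Hard-coded subset of the BIDS entity order, for illustration only.
# A real tool would read the full, versioned list from the schema.
ENTITY_ORDER = ["sub", "ses", "task", "acq", "ce", "rec", "dir", "run", "echo"]


def build_bids_filename(
    entities: Mapping[str, str], suffix: str, extension: str = ".nii.gz"
) -> str:
    """Join entities in canonical order, then append the suffix and extension."""
    unknown = set(entities) - set(ENTITY_ORDER)
    if unknown:
        raise ValueError(f"Entities not covered by this sketch: {sorted(unknown)}")

    # Iterate over the canonical order so input ordering does not matter.
    parts = [f"{key}-{entities[key]}" for key in ENTITY_ORDER if key in entities]
    return "_".join(parts + [suffix]) + extension


example_name = build_bids_filename({"run": "1", "task": "rest", "sub": "01"}, "bold")
print(example_name)  # sub-01_task-rest_run-1_bold.nii.gz
```

Sorting by a single canonical list is what lets different tools produce identical names for the same acquisition, which is why the order is worth taking from the schema rather than maintaining by hand.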

## Convert DICOMs to BIDS

Now let's convert the UK Biobank example DICOMs to BIDS format.

In [None]:
# Use the CLI to convert
!ezbids convert {DICOM_DIR} --output-dir {BIDS_DIR}

In [None]:
# Show the resulting BIDS structure
def show_tree(path: Path, prefix: str = "", max_depth: int = 3, current_depth: int = 0):
    """Display directory tree."""
    if current_depth >= max_depth:
        return
    
    items = sorted(path.iterdir())
    for i, item in enumerate(items):
        is_last = i == len(items) - 1
        connector = "└── " if is_last else "├── "
        print(f"{prefix}{connector}{item.name}")
        
        if item.is_dir():
            extension = "    " if is_last else "│   "
            show_tree(item, prefix + extension, max_depth, current_depth + 1)

print("BIDS output structure:")
print("=" * 40)
if BIDS_DIR.exists():
    for dataset_dir in BIDS_DIR.iterdir():
        if dataset_dir.is_dir():
            print(f"\n{dataset_dir.name}/")
            show_tree(dataset_dir)
else:
    print("No BIDS output yet - run the conversion cell above")

## Programmatic API

You can also use ezBIDS CLI programmatically:

In [None]:
from ezbids_cli.core.analyzer import Analyzer
from ezbids_cli.convert.converter import BIDSConverter

# Analyze the data
analyzer = Analyzer(DICOM_DIR, work_dir=WORK_DIR / "work")
analysis = analyzer.analyze()

print(f"Found {len(analysis.get('objects', []))} objects")
print(f"Found {len(analysis.get('subjects', []))} subjects")

In [None]:
# Show detected acquisitions
print("Detected acquisitions:")
print("-" * 60)

for obj in analysis.get("objects", []):
    obj_type = obj.get("_type", "unknown")
    entities = obj.get("_entities", {})
    message = obj.get("_message", "")
    
    # Build entity string
    entity_str = ", ".join(f"{k}={v}" for k, v in entities.items() if k not in ["subject", "session"])
    
    print(f"  {obj_type}")
    if entity_str:
        print(f"    entities: {entity_str}")
    if message:
        print(f"    message: {message}")
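
For larger datasets, a per-type tally is often easier to scan than the full listing. The sketch below assumes only the structure used above (an `objects` list whose items may carry a `_type` key). The `mock_analysis` dictionary is invented stand-in data, and the `datatype/suffix` form of its `_type` values is an assumption, not confirmed `ezbids-cli` output.

```python
from collections import Counter

# Invented stand-in for the analyzer result; values are placeholders.
mock_analysis = {
    "objects": [
        {"_type": "anat/T1w", "_entities": {"subject": "01"}},
        {"_type": "anat/FLAIR", "_entities": {"subject": "01"}},
        {"_type": "anat/T1w", "_entities": {"subject": "02"}},
        {"_entities": {"subject": "02"}},  # No "_type": counted as "unknown".
    ]
}


def count_by_type(analysis: dict) -> Counter:
    """Tally objects by their "_type", using "unknown" when the key is absent."""
    return Counter(obj.get("_type", "unknown") for obj in analysis.get("objects", []))


type_counts = count_by_type(mock_analysis)
for obj_type, n in sorted(type_counts.items()):
    print(f"{obj_type}: {n}")
```

Calling `count_by_type(analysis)` on the real result should work the same way, provided the keys match what the cell above reads.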

## Validate the BIDS Dataset

If you have the BIDS validator installed, you can validate the output:

In [None]:
# Check if bids-validator is available
import shutil

if shutil.which("bids-validator"):
    print("Running BIDS validator...")
    !bids-validator {BIDS_DIR}/* --verbose
else:
    print("bids-validator not found. Install with: npm install -g bids-validator")
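
Outside a notebook, where the `!` shell escape is unavailable, the same check can be sketched with `subprocess`. The helpers `run_validator` and `validate_all` are illustrative names. They assume, as the cell above does, that each subdirectory of the output folder is one dataset and that the `bids-validator` executable accepts a dataset path followed by `--verbose`.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Dict, Optional, Tuple


def run_validator(
    dataset_dir: Path, executable: str = "bids-validator"
) -> Optional[Tuple[int, str]]:
    """Run the validator on one dataset; return None if the executable is missing."""
    exe = shutil.which(executable)
    if exe is None:
        return None
    result = subprocess.run(
        [exe, str(dataset_dir), "--verbose"],
        capture_output=True,
        text=True,
        check=False,  # Inspect the return code instead of raising.
    )
    return result.returncode, result.stdout


def validate_all(bids_root: Path) -> Dict[str, Optional[Tuple[int, str]]]:
    """Validate every subdirectory of bids_root, keyed by directory name."""
    results: Dict[str, Optional[Tuple[int, str]]] = {}
    if not bids_root.exists():
        return results
    for dataset_dir in sorted(p for p in bids_root.iterdir() if p.is_dir()):
        results[dataset_dir.name] = run_validator(dataset_dir)
    return results
```

Iterating in Python instead of relying on the shell glob `{BIDS_DIR}/*` skips stray files in the output folder and keeps the per-dataset return codes available for later inspection, for example via `validate_all(BIDS_DIR)`.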

## Cleanup

Remove downloaded data when done:

In [None]:
# Uncomment to clean up
# shutil.rmtree(WORK_DIR)
# print("Cleaned up demo data")