# ezBIDS CLI Demo: UK Biobank Example DICOMs

This notebook demonstrates converting DICOM data to BIDS format using `ezbids-cli`.

We'll showcase:
- **Multi-modality conversion**: T1w, FLAIR, T2*w (SWI), resting-state fMRI, and DWI
- **Automatic detection**: How ezbids identifies datatypes, suffixes, and entities
- **Two-stage workflow**: Analyze data, review, then apply conversion
- **Configuration files**: Reusable settings for consistent conversions
- **Validation**: BIDS compliance checking

Data source: [UK Biobank Example DICOMs](https://biobank.ndph.ox.ac.uk/ukb/label.cgi?id=507)

## 1. Setup

Install dependencies and configure working directories.

In [None]:
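# Install ezbids-cli (choose one option; the local-source install is active by default)
# Option 1: from PyPI
# %pip install ezbids-cli
# Option 2: from local source (development)
%pip install -e .. --quiet

# Install visualization dependencies
# (%pip installs into the environment of the running kernel)
%pip install nibabel matplotlib --quiet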

In [None]:
import json
import shutil
import zipfile
from pathlib import Path
from urllib.request import urlretrieve

import matplotlib.pyplot as plt
import nibabel as nib
import numpy as np

# Working directories
WORK_DIR = Path("demo_data")
DICOM_DIR = WORK_DIR / "dicoms"
BIDS_DIR = WORK_DIR / "bids_output"
ANALYSIS_DIR = WORK_DIR / "analysis"

# BIDS_DIR is intentionally not created here; the conversion step creates it
for d in [WORK_DIR, DICOM_DIR, ANALYSIS_DIR]:
    d.mkdir(parents=True, exist_ok=True)

print(f"Working directory: {WORK_DIR.absolute()}")

## 2. Download UK Biobank Example DICOMs

UK Biobank provides publicly accessible example DICOM datasets. We'll download five modalities to demonstrate multi-modal conversion:

| Modality | BIDS Type | Description |
|----------|-----------|-------------|
| T1 | anat/T1w | Structural MPRAGE |
| T2 FLAIR | anat/FLAIR | Fluid-attenuated inversion recovery |
| SWI | anat/T2starw | Susceptibility-weighted imaging |
| Resting fMRI | func/bold | Resting-state functional MRI |
| DWI | dwi/dwi | Diffusion-weighted imaging |
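
The "BIDS Type" column can double as a checklist once conversion has run. The sketch below is not part of ezbids-cli; `EXPECTED_SUFFIXES`, `bids_suffix`, and `missing_suffixes` are hypothetical helper names. It relies only on the BIDS naming convention that the suffix is the last underscore-separated label before the file extension.

```python
from pathlib import Path

# Suffixes taken from the "BIDS Type" column of the table above.
EXPECTED_SUFFIXES = {"T1w", "FLAIR", "T2starw", "bold", "dwi"}


def bids_suffix(filename: str) -> str:
    """Return the BIDS suffix of a filename such as sub-01_task-rest_bold.nii.gz."""
    # Drop everything from the first dot onward (.nii.gz, .json, ...).
    stem = Path(filename).name.split(".", 1)[0]
    # The suffix is the last underscore-separated label.
    return stem.rsplit("_", 1)[-1]


def missing_suffixes(filenames, expected=EXPECTED_SUFFIXES):
    """Return the expected suffixes that none of the filenames provide."""
    found = {bids_suffix(name) for name in filenames}
    return sorted(expected - found)
```

After a conversion, `missing_suffixes(p.name for p in BIDS_DIR.rglob("*.nii.gz"))` would list any modality from the table that did not make it into the output.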

In [None]:
# UK Biobank example DICOM URLs
UKB_EXAMPLES = {
    "t1": "https://biobank.ndph.ox.ac.uk/ukb/ukb/examples/eg_brain_t1.zip",
    "t2flair": "https://biobank.ndph.ox.ac.uk/ukb/ukb/examples/eg_brain_t2flair.zip",
    "rest_fmri": "https://biobank.ndph.ox.ac.uk/ukb/ukb/examples/eg_brain_rest.zip",
    "dwi": "https://biobank.ndph.ox.ac.uk/ukb/ukb/examples/eg_brain_mbdif.zip",
    "swi": "https://biobank.ndph.ox.ac.uk/ukb/ukb/examples/eg_brain_suswt.zip",
}

def download_and_extract(name: str, url: str, target_dir: Path) -> Path:
    """Download and extract a zip file."""
    zip_path = target_dir / f"{name}.zip"
    extract_dir = target_dir / name
    
    if extract_dir.exists():
        print(f"  {name}: already exists, skipping")
        return extract_dir
    
    print(f"  {name}: downloading...", end=" ", flush=True)
    urlretrieve(url, zip_path)
    
    print("extracting...", end=" ", flush=True)
    with zipfile.ZipFile(zip_path, 'r') as zf:
        zf.extractall(extract_dir)
    zip_path.unlink()
    
    file_count = sum(1 for f in extract_dir.rglob("*") if f.is_file())
    print(f"done ({file_count} files)")
    return extract_dir

In [None]:
# Download all five modalities
datasets_to_download = ["t1", "t2flair", "swi", "rest_fmri", "dwi"]

print("Downloading UK Biobank example DICOMs...\n")
for name in datasets_to_download:
    download_and_extract(name, UKB_EXAMPLES[name], DICOM_DIR)

print("\nDownload complete!")

In [None]:
# Show what we downloaded
print("Downloaded DICOM directories:")
print("=" * 40)
for d in sorted(DICOM_DIR.iterdir()):
    if d.is_dir():
        file_count = sum(1 for f in d.rglob("*") if f.is_file())
        print(f"  {d.name}/ ({file_count} files)")

## 3. One-Command Conversion

The simplest way to use ezbids-cli is a single command that runs the whole pipeline:

```bash
ezbids convert <input_dir> --output-dir <output_dir>
```

This will:
1. Discover all DICOM files
2. Convert to NIfTI using dcm2niix
3. Identify datatypes and suffixes (T1w, FLAIR, bold, dwi, etc.)
4. Extract BIDS entities (task, direction, run, etc.)
5. Organize into BIDS directory structure
6. Validate the result
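
To make steps 3 to 5 concrete, here is a minimal sketch of how a datatype, a suffix, and a set of entities combine into a BIDS path. It illustrates the BIDS naming convention only and is not the ezbids-cli implementation; `bids_relpath` and `ENTITY_ORDER` are hypothetical names, and `ENTITY_ORDER` covers just a subset of the entities that BIDS defines.

```python
from pathlib import PurePosixPath

# Subset of BIDS entities in the order they must appear in a filename
# (sub and ses come first and are handled separately below).
ENTITY_ORDER = ["task", "acq", "ce", "rec", "dir", "run", "echo"]


def bids_relpath(subject, datatype, suffix, entities=None, session=None, ext=".nii.gz"):
    """Build the dataset-relative path for one acquisition."""
    entities = entities or {}
    labels = [f"sub-{subject}"]
    folders = [f"sub-{subject}"]
    if session:
        labels.append(f"ses-{session}")
        folders.append(f"ses-{session}")
    # Append optional entities in BIDS order, skipping any that are unset.
    for key in ENTITY_ORDER:
        if entities.get(key):
            labels.append(f"{key}-{entities[key]}")
    filename = "_".join(labels + [suffix]) + ext
    return PurePosixPath(*folders, datatype, filename)


print(bids_relpath("01", "func", "bold", {"task": "rest"}))
# sub-01/func/sub-01_task-rest_bold.nii.gz
```

Entities are emitted in a fixed order regardless of how the dictionary was built, which is what keeps filenames consistent across subjects.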

In [None]:
# Clean previous output if it exists
if BIDS_DIR.exists():
    shutil.rmtree(BIDS_DIR)

# Run the conversion
!ezbids convert {DICOM_DIR} --output-dir {BIDS_DIR}

In [None]:
# Display the resulting BIDS structure
def show_tree(path: Path, prefix: str = "", max_depth: int = 4, current_depth: int = 0):
    """Display directory tree."""
    if current_depth >= max_depth:
        return
    
    items = sorted(path.iterdir())
    for i, item in enumerate(items):
        is_last = i == len(items) - 1
        connector = "\u2514\u2500\u2500 " if is_last else "\u251c\u2500\u2500 "
        print(f"{prefix}{connector}{item.name}")
        
        if item.is_dir():
            extension = "    " if is_last else "\u2502   "
            show_tree(item, prefix + extension, max_depth, current_depth + 1)

print("BIDS Output Structure")
print("=" * 50)
if BIDS_DIR.exists():
    show_tree(BIDS_DIR)
else:
    print("No output yet - run the conversion cell above")

### Visualize the Converted Data

Let's view the converted NIfTI files:

In [None]:
# Find all NIfTI files in the BIDS output
nifti_files = sorted(BIDS_DIR.rglob("*.nii.gz"))

print("Converted NIfTI files:")
print("-" * 60)
for f in nifti_files:
    # Show path relative to BIDS dir
    rel_path = f.relative_to(BIDS_DIR)
    # Get image dimensions
    img = nib.load(f)
    shape = img.shape
    print(f"  {rel_path}  {shape}")

In [None]:
def plot_slices(nifti_path, title=None, vol_idx=0):
    """Plot orthogonal slices from a NIfTI file."""
    img = nib.load(nifti_path)
    data = img.get_fdata()
    
    # Handle 4D data by selecting a volume
    if data.ndim == 4:
        data = data[..., vol_idx]
        title_suffix = f" (vol {vol_idx})"
    else:
        title_suffix = ""
    
    # Get middle slices
    mid_x, mid_y, mid_z = [s // 2 for s in data.shape[:3]]
    
    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    
    axes[0].imshow(np.rot90(data[mid_x, :, :]), cmap='gray')
    axes[0].set_title('Sagittal')
    axes[0].axis('off')
    
    axes[1].imshow(np.rot90(data[:, mid_y, :]), cmap='gray')
    axes[1].set_title('Coronal')
    axes[1].axis('off')
    
    axes[2].imshow(np.rot90(data[:, :, mid_z]), cmap='gray')
    axes[2].set_title('Axial')
    axes[2].axis('off')
    
    if title:
        fig.suptitle(f"{title}{title_suffix}", fontsize=12)
    
    plt.tight_layout()
    plt.show()

In [None]:
# View the T1w anatomical scan
t1_files = [f for f in nifti_files if "T1w" in f.name]
if t1_files:
    plot_slices(t1_files[0], title=t1_files[0].name)

In [None]:
# View the FLAIR scan
flair_files = [f for f in nifti_files if "FLAIR" in f.name]
if flair_files:
    plot_slices(flair_files[0], title=flair_files[0].name)

In [None]:
# View the SWI scan (T2*-weighted)
swi_files = [f for f in nifti_files if "T2starw" in f.name]
if swi_files:
    plot_slices(swi_files[0], title=swi_files[0].name)

In [None]:
# View the resting-state fMRI (first volume of 4D)
bold_files = [f for f in nifti_files if "bold" in f.name]
if bold_files:
    plot_slices(bold_files[0], title=bold_files[0].name, vol_idx=0)

In [None]:
# View the DWI scan (first volume of 4D)
dwi_files = [f for f in nifti_files if f.name.endswith("_dwi.nii.gz")]
if dwi_files:
    plot_slices(dwi_files[0], title=dwi_files[0].name, vol_idx=0)

## 4. Understanding What Was Detected

Let's use the programmatic API to inspect what ezbids detected for each acquisition: the assigned datatype and suffix, the extracted entities, and any messages it attached.

In [None]:
from ezbids_cli.core.analyzer import Analyzer

# Run analysis (without conversion) to inspect what was detected
analyzer = Analyzer(DICOM_DIR, ANALYSIS_DIR)
analysis = analyzer.analyze()

print("Analysis Summary")
print("=" * 50)
print(f"Subjects found: {len(analysis.get('subjects', []))}")
print(f"Acquisitions found: {len(analysis.get('objects', []))}")

In [None]:
# Show detailed detection results for each acquisition
print("Detected Acquisitions")
print("=" * 70)

for i, obj in enumerate(analysis.get("objects", []), 1):
    # No default here, so a missing datatype or suffix falls through to "unidentified"
    datatype = obj.get("datatype")
    suffix = obj.get("suffix")
    bids_type = f"{datatype}/{suffix}" if datatype and suffix else "unidentified"
    entities = obj.get("entities", {})
    series_desc = obj.get("SeriesDescription", "N/A")
    
    print(f"\n[{i}] {bids_type}")
    print(f"    Series: {series_desc}")
    
    # Show extracted entities
    entity_items = [(k, v) for k, v in entities.items() if k not in ["subject", "session"] and v]
    if entity_items:
        entity_str = ", ".join(f"{k}={v}" for k, v in entity_items)
        print(f"    Entities: {entity_str}")
    
    # Show any messages or warnings
    if obj.get("message"):
        print(f"    Note: {obj['message']}")

In [None]:
# Examine the JSON sidecar for one acquisition
# This shows the metadata extracted from DICOM headers

# Find a JSON sidecar file
json_files = sorted(BIDS_DIR.rglob("*.json"))
# Filter to acquisition sidecars (not dataset_description.json, etc.)
sidecar_files = [f for f in json_files if f.parent.name in ["anat", "func", "dwi", "fmap", "perf"]]

if sidecar_files:
    example_sidecar = sidecar_files[0]
    print(f"Example JSON sidecar: {example_sidecar.name}")
    print("=" * 50)
    
    with open(example_sidecar) as f:
        sidecar_data = json.load(f)
    
    # Show key fields
    key_fields = [
        "Modality", "MagneticFieldStrength", "Manufacturer", "ManufacturersModelName",
        "RepetitionTime", "EchoTime", "FlipAngle", "SliceThickness",
        "PhaseEncodingDirection", "EffectiveEchoSpacing",
    ]
    
    for field in key_fields:
        if field in sidecar_data:
            print(f"  {field}: {sidecar_data[field]}")

## 5. Two-Stage Workflow

For more control, you can separate analysis from conversion:

1. **Analyze**: Detect and classify acquisitions, save to JSON
2. **Review**: Inspect/modify the analysis (manually or via TUI)
3. **Apply**: Convert using the reviewed analysis

This is useful when you want to:
- Review what will be converted before committing
- Manually correct misidentified acquisitions
- Exclude certain scans from conversion
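
As an illustration of the review step, the sketch below marks acquisitions for exclusion by series description. It assumes the analysis is a dict with an `objects` list whose items carry `SeriesDescription` and accept an `_exclude` flag; those field names are taken from other cells in this notebook and should be checked against the actual analysis file. `exclude_matching` is a hypothetical helper, not part of ezbids-cli.

```python
import re


def exclude_matching(analysis: dict, pattern: str) -> int:
    """Flag every object whose SeriesDescription matches `pattern`.

    Returns the number of objects that were flagged.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    flagged = 0
    for obj in analysis.get("objects", []):
        description = obj.get("SeriesDescription") or ""
        if regex.search(description):
            obj["_exclude"] = True  # assumed flag name, see lead-in
            flagged += 1
    return flagged


# Toy analysis used only to demonstrate the helper.
toy = {"objects": [{"SeriesDescription": "localizer"}, {"SeriesDescription": "T1_MPRAGE"}]}
print(exclude_matching(toy, r"localizer|scout"))  # 1
```

The dict is modified in place, so it would still need to be written back to disk with `json.dump` before the apply stage can see the change.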

In [None]:
# Clean previous analysis output to avoid duplicates
if ANALYSIS_DIR.exists():
    shutil.rmtree(ANALYSIS_DIR)
ANALYSIS_DIR.mkdir(exist_ok=True)

# Stage 1: Analyze and save results
!ezbids analyze {DICOM_DIR} --output-dir {ANALYSIS_DIR}

In [None]:
# Show the analysis output files
print("Analysis output files:")
print("-" * 40)
for f in sorted(ANALYSIS_DIR.iterdir()):
    if f.is_file():
        size_kb = f.stat().st_size / 1024
        print(f"  {f.name} ({size_kb:.1f} KB)")

In [None]:
# Load and inspect the analysis JSON
analysis_file = ANALYSIS_DIR / "ezBIDS_core.json"

if analysis_file.exists():
    with open(analysis_file) as f:
        saved_analysis = json.load(f)
    
    print("Analysis file structure:")
    print("-" * 40)
    for key in saved_analysis.keys():
        value = saved_analysis[key]
        if isinstance(value, list):
            print(f"  {key}: [{len(value)} items]")
        elif isinstance(value, dict):
            print(f"  {key}: {{...}}")
        else:
            print(f"  {key}: {value}")

In [None]:
# You could modify the analysis here programmatically
# For example, to exclude an acquisition or change its type:
#
# saved_analysis["objects"][0]["_exclude"] = True
# saved_analysis["objects"][1]["_datatype"] = "anat"
# saved_analysis["objects"][1]["_suffix"] = "T2w"
#
# with open(analysis_file, "w") as f:
#     json.dump(saved_analysis, f, indent=2)

print("Analysis ready for review.")
print("\nTo use the interactive TUI reviewer, run:")
print(f"  ezbids review {analysis_file}")

In [None]:
# Stage 2: Apply the analysis to create BIDS output
BIDS_DIR_STAGED = WORK_DIR / "bids_staged"
if BIDS_DIR_STAGED.exists():
    shutil.rmtree(BIDS_DIR_STAGED)

!ezbids apply {analysis_file} {BIDS_DIR_STAGED}

In [None]:
# Show the staged output so it can be compared with the one-command result above
print("Staged BIDS Output")
print("=" * 50)
if BIDS_DIR_STAGED.exists():
    show_tree(BIDS_DIR_STAGED)

## 6. Configuration-Based Conversion

For reproducible conversions across multiple datasets, you can use a YAML configuration file.

This is useful for:
- Applying the same settings to multiple subjects/sessions
- Sharing conversion settings with collaborators
- Documenting exactly how data was converted
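
Configuration rules of this kind usually pair a regular expression on the series description with a target datatype and suffix. The sketch below shows one plausible reading of ordered rules, where the first matching rule wins. The matching semantics (case-insensitive `re.search`, first match wins) are assumptions for illustration, and `RULES` and `classify` are hypothetical names; ezbids-cli may behave differently.

```python
import re

# Hypothetical rules, tried from top to bottom.
RULES = [
    {"pattern": r"T1", "datatype": "anat", "suffix": "T1w"},
    {"pattern": r"FLAIR", "datatype": "anat", "suffix": "FLAIR"},
    {"pattern": r"rest", "datatype": "func", "suffix": "bold"},
]


def classify(series_description, rules=RULES):
    """Return (datatype, suffix) from the first matching rule, or None."""
    for rule in rules:
        if re.search(rule["pattern"], series_description, flags=re.IGNORECASE):
            return rule["datatype"], rule["suffix"]
    return None


print(classify("T2_FLAIR_BRAIN"))  # ('anat', 'FLAIR')
print(classify("localizer"))       # None
```

Under first-match semantics, rule order matters: a description containing both "T1" and "FLAIR" would be labelled `T1w` here, so more specific patterns belong nearer the top.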

In [None]:
# Generate a starter config file from the DICOM directory
!ezbids init-config {DICOM_DIR} --output {WORK_DIR}/my_config.yaml

In [None]:
# View the generated config
config_file = WORK_DIR / "my_config.yaml"
if config_file.exists():
    print("Generated configuration:")
    print("=" * 50)
    print(config_file.read_text())

In [None]:
# Example: Create a custom config
custom_config = """
version: "1.0"

dataset:
  name: "UK Biobank Demo"
  bids_version: "1.9.0"
  authors:
    - "ezBIDS CLI Demo"

# Series matching rules (applied in order)
series:
  # T1-weighted structural
  - match:
      series_description: ".*T1.*MPRAGE.*"
    datatype: anat
    suffix: T1w
  
  # FLAIR
  - match:
      series_description: ".*FLAIR.*"
    datatype: anat
    suffix: FLAIR
  
  # Resting-state fMRI
  - match:
      series_description: ".*rest.*fMRI.*"
    datatype: func
    suffix: bold
    entities:
      task: rest
  
  # DWI
  - match:
      series_description: ".*dMRI.*"
    datatype: dwi
    suffix: dwi

output:
  link_mode: hardlink  # hardlink, symlink, or copy
  validate: true
"""

custom_config_path = WORK_DIR / "custom_config.yaml"
custom_config_path.write_text(custom_config)
print("Custom config written to:", custom_config_path)

In [None]:
# Convert using the custom config
BIDS_DIR_CONFIG = WORK_DIR / "bids_from_config"
if BIDS_DIR_CONFIG.exists():
    shutil.rmtree(BIDS_DIR_CONFIG)

!ezbids convert {DICOM_DIR} --config {custom_config_path} --output-dir {BIDS_DIR_CONFIG}

## 7. Validation

ezBIDS CLI includes the official `bids-validator` Python package for checking BIDS compliance.
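
Before running the full validator, a quick sanity check can catch the most basic problems. The sketch below is not a substitute for the validator and is not part of ezbids-cli; `quick_bids_check` is a hypothetical helper. It checks only that `dataset_description.json` exists, parses as JSON, and contains the two fields BIDS requires (`Name` and `BIDSVersion`), and that at least one `sub-*` directory is present.

```python
import json
from pathlib import Path

# Fields that BIDS requires in dataset_description.json.
REQUIRED_FIELDS = ("Name", "BIDSVersion")


def quick_bids_check(dataset_dir) -> list:
    """Return a list of human-readable problems (empty if none were found)."""
    root = Path(dataset_dir)
    desc_path = root / "dataset_description.json"
    if not desc_path.is_file():
        return ["dataset_description.json is missing"]
    try:
        description = json.loads(desc_path.read_text())
    except json.JSONDecodeError as err:
        return [f"dataset_description.json is not valid JSON: {err}"]

    problems = []
    for field in REQUIRED_FIELDS:
        if field not in description:
            problems.append(f"dataset_description.json lacks required field {field!r}")
    if not any(p.is_dir() for p in root.glob("sub-*")):
        problems.append("no sub-* directories found")
    return problems
```

Calling `quick_bids_check(BIDS_DIR / "dataset")` on the converted output should return an empty list if these basics are in place.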

In [None]:
from ezbids_cli.validation.validator import validate_dataset, print_validation_result

# The BIDS dataset is in the 'dataset' subdirectory
bids_dataset_dir = BIDS_DIR / "dataset"

print("Validating BIDS Dataset")
print("=" * 50)
print(f"Path: {bids_dataset_dir}\n")

result = validate_dataset(bids_dataset_dir)
print_validation_result(result, verbose=True)

In [None]:
# You can also validate from the command line, pointing at the same
# 'dataset' subdirectory used above:
# !ezbids validate {bids_dataset_dir}

## 8. Inspect Dataset Metadata

Let's look at the dataset-level files that were generated.
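
BIDS tabular files such as `participants.tsv` are tab-separated with a header row, and missing values are written as `n/a`. As a minimal sketch using only the standard library, the hypothetical helper `read_tsv` below turns such text into a list of row dictionaries, which is easier to inspect than the raw file contents.

```python
import csv
import io


def read_tsv(text: str) -> list:
    """Parse tab-separated text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


# Toy content used only to demonstrate the helper.
rows = read_tsv("participant_id\tage\nsub-01\tn/a\n")
print(rows)  # [{'participant_id': 'sub-01', 'age': 'n/a'}]
```

All values are returned as strings, so `n/a` entries need explicit handling before any numeric conversion.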

In [None]:
# dataset_description.json
bids_dataset_dir = BIDS_DIR / "dataset"
desc_file = bids_dataset_dir / "dataset_description.json"
if desc_file.exists():
    print("dataset_description.json")
    print("=" * 40)
    with open(desc_file) as f:
        print(json.dumps(json.load(f), indent=2))
else:
    print(f"File not found: {desc_file}")

In [None]:
# participants.tsv
participants_file = bids_dataset_dir / "participants.tsv"
if participants_file.exists():
    print("participants.tsv")
    print("=" * 40)
    print(participants_file.read_text())
else:
    print(f"File not found: {participants_file}")

## 9. Cleanup

Remove downloaded data and outputs when done.

In [None]:
# Uncomment to clean up all demo data
# shutil.rmtree(WORK_DIR)
# print("Cleaned up demo data")

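# Show disk usage (skipped if the directory was removed or `du` is unavailable)
import subprocess

if WORK_DIR.exists() and shutil.which("du"):
    du_result = subprocess.run(["du", "-sh", str(WORK_DIR)], capture_output=True, text=True)
    print(f"Demo data size: {du_result.stdout.strip()}")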