# QSIPrep Basics: Diffusion MRI Preprocessing

This notebook demonstrates how to use the QSIPrep runner for diffusion MRI preprocessing.

## Overview

QSIPrep configures pipelines for processing diffusion-weighted MRI (dMRI) data. It performs:
- Head motion correction
- Susceptibility distortion correction
- Eddy current correction
- Spatial normalization
- Gradient direction correction

## Prerequisites

- Docker installed and running
- BIDS-formatted diffusion data
- FreeSurfer license file (required)
- Sufficient disk space for working directory
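
The prerequisites above can be checked before launching a long run. The sketch below is not part of voxelops: `preflight` is a hypothetical helper, and the 50 GB free-space threshold is an arbitrary placeholder rather than a QSIPrep requirement. It only confirms that a `docker` executable is on `PATH`, not that the daemon is running.

```python
import shutil
from pathlib import Path


def preflight(bids_dir, fs_license, work_dir, min_free_gb=50.0):
    """Return a list of problems; an empty list means every check passed."""
    problems = []

    # The Docker CLI must be on PATH (this does not prove the daemon is up)
    if shutil.which("docker") is None:
        problems.append("docker executable not found on PATH")

    # A BIDS dataset carries dataset_description.json at its root
    if not (bids_dir / "dataset_description.json").is_file():
        problems.append(f"no dataset_description.json in {bids_dir}")

    if not fs_license.is_file():
        problems.append(f"FreeSurfer license not found: {fs_license}")

    # Measure free space on the nearest existing parent of the work directory
    probe = work_dir
    while not probe.exists() and probe != probe.parent:
        probe = probe.parent
    free_gb = shutil.disk_usage(probe).free / 1e9
    if free_gb < min_free_gb:
        problems.append(
            f"only {free_gb:.0f} GB free at {probe}, want {min_free_gb:.0f} GB"
        )

    return problems


# Example paths matching the ones used later in this notebook
for problem in preflight(
    Path("/data/bids"),
    Path("/opt/freesurfer/license.txt"),
    Path("/data/work/qsiprep"),
):
    print(f"- {problem}")
```

Returning a list instead of raising on the first failure lets all problems be reported at once.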

## Setup

In [1]:
import json
from pathlib import Path

from voxelops import (
    QSIPrepDefaults,
    QSIPrepInputs,
    run_qsiprep,
)

## Define Paths

In [None]:
# Input paths -- update these to match your data
bids_dir = Path("/data/bids/")
participant = "01"
fs_license = Path("/opt/freesurfer/license.txt")

# Output paths (optional)
output_dir = Path("/data/derivatives/qsiprep/")
work_dir = Path("/data/work/qsiprep/")

## Basic Usage

### Option 1: Use Default Configuration

In [None]:
# Create inputs
inputs = QSIPrepInputs(
    bids_dir=bids_dir,
    participant=participant,
    output_dir=output_dir,
    work_dir=work_dir,
)

# Run with defaults (8 cores, 16GB RAM)
result = run_qsiprep(
    inputs,
    fs_license=fs_license,  # Required
)

print(f"Success: {result['success']}")
print(f"Duration: {result['duration_human']}")

### Option 2: Override Resources with Keyword Arguments

In [None]:
# Run with more resources
result = run_qsiprep(
    inputs,
    fs_license=fs_license,
    nprocs=16,       # Use 16 cores
    mem_mb=32000,    # Allocate 32GB RAM
)

print(f"Success: {result['success']}")
print(f"Used {result['config'].nprocs} cores and {result['config'].mem_mb}MB RAM")

### Option 3: Custom Configuration

In [None]:
# Create custom brain bank configuration
config = QSIPrepDefaults(
    nprocs=24,
    mem_mb=64000,
    output_resolution=1.5,  # 1.5mm isotropic
    anatomical_template=["MNI152NLin2009cAsym", "T1w"],  # Multiple spaces
    longitudinal=True,  # Longitudinal processing
    skip_bids_validation=False,  # Always validate BIDS
    fs_license=fs_license,
    docker_image="pennlinc/qsiprep:latest",  # Pin a version tag for reproducible runs
)

# Run with custom config
result = run_qsiprep(inputs, config)

print(f"Success: {result['success']}")

## Configuration Priority

Parameters can be specified in several ways. When the same parameter is set in more than one place, the highest-priority source wins:

1. **Keyword arguments** (highest priority)
2. **Environment variables** (if enabled)
3. **Config object**
4. **Defaults** (lowest priority)

In [None]:
# Config says 16 cores, but keyword arg overrides to 32
config = QSIPrepDefaults(
    nprocs=16,
    mem_mb=32000,
    fs_license=fs_license,
)

result = run_qsiprep(
    inputs,
    config,
    nprocs=32,  # This wins!
)

print(f"Actually used {result['config'].nprocs} cores")

## Inspect Execution Record

In [None]:
# Execution metadata
print("Execution Details:")
print(f"  Tool: {result['tool']}")
print(f"  Participant: {result['participant']}")
print(f"  Start: {result['start_time']}")
print(f"  End: {result['end_time']}")
print(f"  Duration: {result['duration_human']} ({result['duration_seconds']:.1f}s)")
print(f"  Success: {result['success']}")
print(f"  Exit code: {result['exit_code']}")

# Configuration used
print("\nConfiguration:")
config_used = result["config"]
print(f"  Cores: {config_used.nprocs}")
print(f"  Memory: {config_used.mem_mb}MB")
print(f"  Output resolution: {config_used.output_resolution}mm")
print(f"  Anatomical template: {config_used.anatomical_template}")
print(f"  Longitudinal: {config_used.longitudinal}")
print(f"  Docker image: {config_used.docker_image}")

## Check Expected Outputs

The `html_report` and `participant_dir` paths used in the following sections come from an `outputs` object describing where this participant's results are expected.
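
This notebook never defines the `outputs` object that the report and file-listing cells rely on. A minimal stand-in is sketched below, assuming QSIPrep writes `sub-<label>/` and `sub-<label>.html` directly under the output directory (some releases nest them under a `qsiprep/` subfolder instead). `ExpectedOutputs` and `expected_outputs` are hypothetical names. voxelops may provide its own equivalent, in which case that should be preferred.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ExpectedOutputs:
    """Hypothetical container for the two paths the later cells use."""

    participant_dir: Path
    html_report: Path


def expected_outputs(output_dir, participant):
    """Derive expected output paths from the output directory and label."""
    # Accept both "01" and "sub-01"
    label = participant.removeprefix("sub-")
    return ExpectedOutputs(
        participant_dir=output_dir / f"sub-{label}",
        html_report=output_dir / f"sub-{label}.html",  # Assumed report location
    )


# Same values as the "Define Paths" cell
outputs = expected_outputs(Path("/data/derivatives/qsiprep"), "01")
print(outputs.participant_dir)
print(outputs.html_report)
```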

## View HTML Report

QSIPrep generates a comprehensive HTML quality control report:

In [None]:
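from IPython.display import IFrame, display

# Display HTML report in notebook; display() is needed because a bare
# expression inside an if-block is not rendered
if outputs.html_report.exists():
    display(IFrame(src=str(outputs.html_report), width=900, height=600))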
else:
    print(f"HTML report not found at: {outputs.html_report}")

## List Output Files

In [None]:
# List all preprocessed files
if outputs.participant_dir.exists():
    print(f"Files in {outputs.participant_dir}:\n")

    for f in sorted(outputs.participant_dir.rglob("*")):
        if f.is_file():
            rel_path = f.relative_to(outputs.participant_dir)
            size_mb = f.stat().st_size / (1024 * 1024)
            print(f"  {rel_path} ({size_mb:.1f} MB)")
else:
    print("Participant directory not found")

## Error Handling

In [None]:
from voxelops.exceptions import (
    FreeSurferLicenseError,
    InputValidationError,
    ProcedureExecutionError,
)

result = None  # Avoid reading a stale record left over from an earlier cell

try:
    result = run_qsiprep(
        inputs,
        fs_license=fs_license,
        nprocs=16,
    )
    print(f"Success: {result['success']}")

except InputValidationError as e:
    print(f"Input validation failed: {e}")
    print("  - Check that BIDS directory exists")
    print("  - Check that participant exists in BIDS directory")

except FreeSurferLicenseError as e:
    print(f"FreeSurfer license error: {e}")
    print(
        "  - Obtain license from https://surfer.nmr.mgh.harvard.edu/registration.html"
    )

except ProcedureExecutionError as e:
    print(f"Execution failed: {e}")
    print("  - Check the log file and stderr output")
    # `result` is only set if run_qsiprep returned before the failure
    if result is not None:
        print(f"  - Log file: {result.get('log_file')}")
        if "stderr" in result:
            print(f"\nStderr (last 500 chars):\n{result['stderr'][-500:]}")

except Exception as e:
    print(f"Unexpected error: {e}")

## Batch Processing

In [None]:
import time

# Get list of participants from BIDS directory
participant_dirs = sorted(bids_dir.glob("sub-*"))
participants = [d.name.replace("sub-", "") for d in participant_dirs if d.is_dir()]

print(f"Found {len(participants)} participants: {participants}\n")

# Process each participant
results = []
config = QSIPrepDefaults(
    nprocs=16,
    mem_mb=32000,
    fs_license=fs_license,
)

for participant in participants:
    print(f"Processing participant {participant}...")

    inputs = QSIPrepInputs(
        bids_dir=bids_dir,
        participant=participant,
    )

    try:
        start = time.time()
        result = run_qsiprep(inputs, config)
        elapsed = time.time() - start

        results.append(result)
        print(f"  ✓ Success in {elapsed/60:.1f} minutes\n")

    except Exception as e:
        print(f"  ✗ Failed: {e}\n")
        results.append(
            {
                "participant": participant,
                "success": False,
                "error": str(e),
            }
        )

# Summary
successful = sum(1 for r in results if r.get("success"))
total_time = sum(
    r.get("duration_seconds", 0) for r in results if "duration_seconds" in r
)

print(f"\n{'='*60}")
print(f"Processed {len(results)} participants:")
print(f"  ✓ Successful: {successful}")
print(f"  ✗ Failed: {len(results) - successful}")
print(f"  Total time: {total_time/3600:.1f} hours")
if results:  # Avoid dividing by zero when no participants were found
    print(f"  Average time: {total_time/len(results)/60:.1f} minutes per participant")

## Save Execution Records to Database

The execution records are plain Python dicts, which makes them easy to serialize and store in any database.

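
One simple way to persist the records is one JSON file per participant. The sketch below assumes each record has a `participant` key, as the records above do. `save_record` is a hypothetical helper, not a voxelops function, and the example record is made up for illustration. `default=str` is used because a record may hold values that `json` cannot encode directly, such as `Path` objects or the config object.

```python
import json
import tempfile
from pathlib import Path


def save_record(record, records_dir):
    """Write one record to <records_dir>/<participant>.json; return the path."""
    records_dir.mkdir(parents=True, exist_ok=True)
    path = records_dir / f"{record['participant']}.json"
    # default=str stringifies values that json cannot encode natively
    path.write_text(json.dumps(record, indent=2, default=str))
    return path


# Illustrative record only; a real one would come from run_qsiprep
example_record = {
    "participant": "01",
    "success": True,
    "command": ["docker", "run", "--rm", "pennlinc/qsiprep:latest"],
    "output_dir": Path("/data/derivatives/qsiprep"),
}

demo_dir = Path(tempfile.mkdtemp()) / "records"
saved = save_record(example_record, demo_dir)
print(saved.name)
```

Calling `save_record(r, records_dir)` for each entry in `results` would produce files such as `01.json`, which is the layout the next section reads from.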

## Reproduce a Previous Run

Since the exact Docker command is stored, you can reproduce any run:

In [None]:
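import shlex

# Assumption: records were saved as <participant>.json in this directory;
# update the path to match where you store them
records_dir = Path("/data/records/qsiprep/")

# Load a previous execution record
previous_record = json.loads((records_dir / "01.json").read_text())

# The exact command is stored as a list of arguments
cmd = previous_record["command"]
print("To reproduce this exact run, execute:")
print()
print(shlex.join(cmd))  # Quotes arguments that contain spaces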

## Next Steps

After QSIPrep preprocessing:

1. Review the HTML QC report
2. Check the preprocessed images
3. Proceed to reconstruction with QSIRecon (see `03_qsirecon_basics.ipynb`)

## Tips

- **Resource allocation**: QSIPrep is memory-intensive. Allocate at least 16GB RAM.
- **Working directory**: Use fast local storage (SSD) for the work directory.
- **Longitudinal processing**: Enable for subjects with multiple sessions.
- **BIDS validation**: Always validate your BIDS dataset before running.
- **FreeSurfer license**: Required even if not running FreeSurfer reconstruction.
- **Output spaces**: Choose spaces based on downstream analysis needs.
- **Docker version**: Pin a specific image version for reproducibility.
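
The Docker version tip can be enforced with a small check before a run. This is a sketch only: `image_is_pinned` is a hypothetical helper, and the version tag in the usage lines is an example, not a recommendation. It treats a reference as pinned when it carries a digest or any tag other than `latest`.

```python
def image_is_pinned(image):
    """Return True if the image reference has a digest or a non-'latest' tag."""
    # Digest form, e.g. name@sha256:<hash>, always identifies one image
    if "@" in image:
        return True
    # A colon before the last slash belongs to a registry port, not a tag
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # No tag: Docker falls back to 'latest'
    tag = last_segment.rsplit(":", 1)[1]
    return tag != "latest"


print(image_is_pinned("pennlinc/qsiprep:latest"))
print(image_is_pinned("pennlinc/qsiprep:1.0.0"))  # Example tag
```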