# Running PANDORA Imaging Regressions on UK Biobank RAP

This notebook demonstrates how to run `fsl_glm` on UKB-RAP using:

- A Docker image (`pandora-regression.tar.gz`)
- The DNAnexus Swiss Army Knife (SAK) app

The workflow runs GLMs on PANDORA imaging sub-modalities (e.g., `tfMRI_cope2`, `warpfield_jacobian`) using FSL's `fsl_glm`, and handles the PANDORA-specific details:
- Extracting `globals.tar` and sub-modality tar files
- Building `design.mat`, `design.con`, and `subjects.txt`
- Checking for missing required parameters and mismatched lengths
- Validating inputs for `fsl_glm`
- Running `fsl_glm` on the specified inputs
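
To make the design-building step more concrete, the following is a minimal sketch of how `design.mat` and `subjects.txt` could be derived from a CSV. It is not the actual logic of `pandora_regression.sh`: the helper name `build_design`, the made-up data, and the use of FSL's VEST text layout (`/NumWaves`, `/NumPoints`, `/Matrix`) are assumptions for illustration. It also ignores matching against `subjectIDs_union.sample`, missing values, and quoted CSV fields.

```bash
#!/usr/bin/env bash
# Hypothetical helper; NOT the actual logic of pandora_regression.sh.
# Usage: build_design CSV SUBJECT_COL VAR_COLS [OUTDIR]
build_design() {
  local csv="$1" subject_col="$2" var_cols="$3" outdir="${4:-.}"
  local n_waves n_points

  # Number of regressors = number of comma-separated column indices.
  n_waves=$(awk -F',' '{ print NF }' <<<"$var_cols")
  # Number of subjects = non-empty rows after the header.
  n_points=$(awk 'NR > 1 && NF { n++ } END { print n + 0 }' "$csv")

  # subjects.txt: one participant ID per row, in CSV order.
  awk -F',' -v c="$subject_col" 'NR > 1 && NF { print $c }' "$csv" \
    >"$outdir/subjects.txt"

  # design.mat: VEST header, then one row of regressor values per subject.
  {
    echo "/NumWaves $n_waves"
    echo "/NumPoints $n_points"
    echo "/Matrix"
    awk -F',' -v cols="$var_cols" '
      BEGIN { n = split(cols, idx, ",") }
      NR > 1 && NF {
        line = $(idx[1])
        for (i = 2; i <= n; i++) line = line " " $(idx[i])
        print line
      }' "$csv"
  } >"$outdir/design.mat"
}

# Usage with made-up data in a temporary directory.
workdir="$(mktemp -d)"
printf 'eid,age,sex,bmi\n1000001,63,0,24.1\n1000002,58,1,27.9\n' >"$workdir/cohort.csv"
build_design "$workdir/cohort.csv" 1 2,3,4 "$workdir"
cat "$workdir/design.mat"
```

The column indices are treated as 1-based here, mirroring `--subject-col 1` and `--var-cols 2,3,4` in the examples in this notebook.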


## Prerequisites

You should have:
- Uploaded `pandora-regression.tar.gz` to your RAP project
- Access to the PANDORA files: `globals.tar`, a sub-modality tar (e.g. `tfMRI_cope2.tar`, `warpfield_jacobian.tar`), and `subjectIDs_union.sample`
- A CSV file with a participant ID (`eid`) column and one or more columns holding the regressors of interest. This file can be created in several ways; the following resources may help:
    - [Accessing phenotype data](https://dnanexus.gitbook.io/uk-biobank-rap/working-on-the-research-analysis-platform/accessing-data/accessing-phenotypic-data)
    - [Extracting phenotypic data](https://community.ukbiobank.ac.uk/hc/en-gb/articles/26205157055261-Extracting-phenotypic-data)
    - [FMRIB's documentation](https://pages.fmrib.ox.ac.uk/pandora/web/#:~:text=highly%20non%2DGaussian.-Extracting%20non%2Dimaging%20variables%20to%20regress%20into%20the%20PANDORA%20imaging%20data,-These%20instructions%20describe)

You can run this in a RAP instance terminal or locally with the `dx` toolkit installed.
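
Before choosing `--subject-col` and `--var-cols`, it can help to list the CSV header together with column positions. The sketch below uses made-up data and a hypothetical helper named `list_columns`; the column names are placeholders, and the 1-based numbering is an assumption based on the `--subject-col 1` / `--var-cols 2,3,4` values used in the examples in this notebook.

```bash
#!/usr/bin/env bash
# Made-up example data; the column names and values are placeholders.
csv="$(mktemp)"
cat >"$csv" <<'EOF'
eid,age,sex,bmi
1000001,63,0,24.1
1000002,58,1,27.9
EOF

# list_columns FILE: print "index<TAB>name" for each header field (1-based).
# Assumes a plain comma-separated header without quoted commas.
list_columns() {
  head -n 1 "$1" | tr ',' '\n' | awk '{ printf "%d\t%s\n", NR, $0 }'
}

list_columns "$csv"
```

With this layout, `eid` is column 1 and the three candidate regressors are columns 2 to 4.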

## Defining inputs

Assumed files in your RAP project:
- `Bulk/Brain MRI/PANDORA/globals.tar`
- `Bulk/Brain MRI/PANDORA/tfMRI_cope2.tar`
- `Bulk/Brain MRI/PANDORA/subjectIDs_union.sample`
- `path/to/cohort_variables.csv`

Adapt paths to match your project layout.
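
One way to keep the paths in a single place is to define them as shell variables and build the `-iin` arguments from them. This is only a sketch: the variable names (`PANDORA_DIR`, `MODALITY`, `COHORT_CSV`, `iin_args`) are hypothetical, and the paths are the assumed ones listed above.

```bash
#!/usr/bin/env bash
# Hypothetical variable names; adapt the values to your project layout.
PANDORA_DIR="Bulk/Brain MRI/PANDORA"      # folder holding the PANDORA files
MODALITY="tfMRI_cope2"                    # sub-modality to analyse
COHORT_CSV="path/to/cohort_variables.csv" # CSV with eid and regressor columns

# One -iin argument per input file; an array keeps "Brain MRI" as one word.
iin_args=(
  "-iin=${PANDORA_DIR}/globals.tar"
  "-iin=${PANDORA_DIR}/${MODALITY}.tar"
  "-iin=${PANDORA_DIR}/subjectIDs_union.sample"
  "-iin=${COHORT_CSV}"
)

printf '%s\n' "${iin_args[@]}"
```

The array can then be expanded as `"${iin_args[@]}"` in a `dx run swiss-army-knife` call, so switching sub-modality only requires changing `MODALITY`.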

## Running a single PANDORA regression with Swiss Army Knife

Run a GLM on a PANDORA modality (here `tfMRI_cope2`) using `pandora_regression.sh`.

Key options:
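- `--csv` CSV file containing the eid column and the regressors
- `--subject-col` index of the participant (eid) column
- `--var-cols` comma-separated indices of the regressor columns
- `--pi` PANDORA sub-modality (e.g. `tfMRI_cope2`, `warpfield_jacobian`)
- `--pm` mode: `voxel`, `ICA1K`, `ICA10K`
- `--subject-ids` PANDORA subject ID file (`subjectIDs_union.sample`)
- `--globals-tar` PANDORA globals archive (`globals.tar`)
- `--modality-tar` tar file of the chosen sub-modality (e.g. `tfMRI_cope2.tar`)
- `--contrast` contrast vector(s). Separate multiple rows with `;`
- `--confounds` confound set (`all` or `small`)
- `--name` label for output files
- `--out` output directory inside the container (`/tmp`)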

Swiss Army Knife will:
1. Load the Docker image (`-iimage_file`).
2. Mount listed inputs (`-iin`) under the job workspace.
3. Execute the command inside the container.

```bash
# Example: regression on tfMRI_cope2 with one regressor of interest and treating the other two regressors as confounds
dx run swiss-army-knife \
  --priority high \
  --instance-type mem3_ssd3_x4 \
  --destination "/PANDORA/outputs/tfMRI_cope2/" \
  -imount_inputs=true \
  -iimage_file="pandora-regression.tar.gz" \
  -iin="Bulk/Brain MRI/PANDORA/globals.tar" \
  -iin="Bulk/Brain MRI/PANDORA/tfMRI_cope2.tar" \
  -iin="Bulk/Brain MRI/PANDORA/subjectIDs_union.sample" \
  -iin="path/to/cohort_variables.csv" \
  -icmd="pandora_regression.sh \
        --csv cohort_variables.csv \
        --subject-col 1 \
        --var-cols 2,3,4 \
        --pi tfMRI_cope2 \
        --pm voxel \
        --subject-ids subjectIDs_union.sample \
        --globals-tar globals.tar \
        --modality-tar tfMRI_cope2.tar \
        --contrast '1 0 0' \
        --confounds all \
        --name demo \
        --out /tmp"
```

## Using multiple contrasts

`--contrast` can include multiple rows separated by `;`.

If `--var-cols 2,3,4` defines three regressors, each contrast row must contain one value per regressor. For example:
- `"1 0 0"` tests for a positive association with the first regressor, treating the other two as confounds
- `"-1 0 0"` tests for a negative association with the first regressor, treating the other two as confounds
- `"1 0 0; -1 0 0"` runs both contrasts

Each row is written to `design.con`.
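
The rule that every contrast row needs one value per regressor can be sketched as a small check. This is an illustration rather than the script's own code: the helper name `write_contrasts` is hypothetical, and the output layout assumes FSL's VEST text format (`/NumWaves`, `/NumContrasts`, `/Matrix`).

```bash
#!/usr/bin/env bash
# Hypothetical helper; NOT the actual logic of pandora_regression.sh.
# Usage: write_contrasts "CONTRASTS" "VAR_COLS" [OUTFILE]
write_contrasts() {
  local contrast="$1" var_cols="$2" out="${3:-design.con}"
  local -a cols=() raw_rows=() vals=() clean_rows=()
  local row

  IFS=',' read -ra cols <<<"$var_cols"      # one regressor per column index
  IFS=';' read -ra raw_rows <<<"$contrast"  # one contrast per ';'-separated row

  for row in "${raw_rows[@]}"; do
    read -ra vals <<<"$row"                 # split on whitespace, trimming the row
    if (( ${#vals[@]} != ${#cols[@]} )); then
      echo "contrast '$row' has ${#vals[@]} values, expected ${#cols[@]}" >&2
      return 1
    fi
    clean_rows+=("${vals[*]}")              # re-join with single spaces
  done

  {
    echo "/NumWaves ${#cols[@]}"
    echo "/NumContrasts ${#clean_rows[@]}"
    echo "/Matrix"
    printf '%s\n' "${clean_rows[@]}"
  } >"$out"
}

# Usage: two contrasts over three regressors.
con_file="$(mktemp)"
write_contrasts '1 0 0; -1 0 0' '2,3,4' "$con_file" && cat "$con_file"
```

A row such as `'1 0'` with `--var-cols 2,3,4` would be rejected by this check, since it supplies two values for three regressors.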

```bash
# Example with multiple contrasts
dx run swiss-army-knife \
  --priority high \
  --instance-type mem3_ssd3_x4 \
  --destination "/PANDORA/outputs/tfMRI_cope2/" \
  -imount_inputs=true \
  -iimage_file="pandora-regression.tar.gz" \
  -iin="Bulk/Brain MRI/PANDORA/globals.tar" \
  -iin="Bulk/Brain MRI/PANDORA/tfMRI_cope2.tar" \
  -iin="Bulk/Brain MRI/PANDORA/subjectIDs_union.sample" \
  -iin="path/to/cohort_variables.csv" \
  -icmd="pandora_regression.sh \
        --csv cohort_variables.csv \
        --subject-col 1 \
        --var-cols 2,3,4 \
        --pi tfMRI_cope2 \
        --pm voxel \
        --subject-ids subjectIDs_union.sample \
        --globals-tar globals.tar \
        --modality-tar tfMRI_cope2.tar \
        --contrast '1 0 0; -1 0 0' \
        --confounds all \
        --name demo_multi \
        --out /tmp"
```

## Controlling output maps (T, P, F, PF)

By default, T-statistic maps are written.

Optional flags:
- `--out-p` write P maps
- `--out-f` write F maps
- `--out-pf` write PF maps
- `--no-out-t` disable T maps

Example producing only P and F maps:
```bash
--no-out-t --out-p --out-f
```
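
To show where these flags sit in a full call, the sketch below appends them to the same `pandora_regression.sh` arguments used in the single-regression example. The variable names (`map_flags`, `cmd`), the `--name demo_pf` label, and the wrapper function `submit_regression` are hypothetical; the wrapper is defined but not called, so nothing is submitted when the snippet runs.

```bash
#!/usr/bin/env bash
# Optional output flags: P and F maps only, no T maps.
map_flags=(--no-out-t --out-p --out-f)

# Same arguments as the single-regression example, plus the map flags.
cmd="pandora_regression.sh \
--csv cohort_variables.csv \
--subject-col 1 \
--var-cols 2,3,4 \
--pi tfMRI_cope2 \
--pm voxel \
--subject-ids subjectIDs_union.sample \
--globals-tar globals.tar \
--modality-tar tfMRI_cope2.tar \
--contrast '1 0 0' \
--confounds all \
--name demo_pf \
--out /tmp ${map_flags[*]}"

echo "$cmd"

# Hypothetical wrapper: call submit_regression to launch the job on RAP.
submit_regression() {
  dx run swiss-army-knife \
    --instance-type mem3_ssd3_x4 \
    --destination "/PANDORA/outputs/tfMRI_cope2/" \
    -imount_inputs=true \
    -iimage_file="pandora-regression.tar.gz" \
    -iin="Bulk/Brain MRI/PANDORA/globals.tar" \
    -iin="Bulk/Brain MRI/PANDORA/tfMRI_cope2.tar" \
    -iin="Bulk/Brain MRI/PANDORA/subjectIDs_union.sample" \
    -iin="path/to/cohort_variables.csv" \
    -icmd="$cmd"
}
```

Keeping the flags in an array makes it easy to switch output maps between runs without editing the rest of the command.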


## Notes

- The list of confounds in the `all` and `small` sets can be found after extracting `globals.tar`.
- FMRIB recommends the following instance types: `mem3_ssd3_x4`, `mem3_ssd3_x8`, `mem3_ssd2_v2_x16` (32 / 64 / 122 GB RAM, respectively).
- Run time depends on the sub-modality size and the instance type. For a single regressor, smaller `<submodality>.tar` files can take ~10 minutes and larger files ~1 hour.
