# 1. What is this?

The objective is to use context clues, file metadata, and deep interrogation to identify the broad array of components of an imaging dataset, their structures, uses, and derivatives. In some cases, information can be found simply by looking at file names in a directory; in others, it requires identifying the proper tools for extracting metadata from binary file formats.

Using a Unix terminal (not a file explorer), crawl through a...

* BIDS Input Directory
* C-PAC Minimal Preprocessed Output Directory
  
... to determine, for each file, the...

* **Contents**. The level of detail should indicate understanding rather than be exhaustive. We do not need to see every bit of every file, rather, a conceptual description of their contents.
* **Purpose**. For example, what is its role in analysis or what information does it uniquely provide?
* **Relationship(s) to other files**.


## Setup

For this challenge I'm using a Jupyter notebook with the [bash kernel](https://github.com/takluyver/bash_kernel) to record my steps and take notes. (I'm using the bash kernel rather than a Python kernel since this is supposed to be a command-line exercise.)

Before starting the challenge, I downloaded the file bundle from [here](https://osf.io/syqvc/files/osfstorage) and unzipped the data archives using `unzip`.

I'll store any intermediate outputs in a local directory, relative to the challenge root directory.

In [1]:
```bash
root="/Users/clane/Projects/CMI-Onboarding"

ls -lht $root
```

```
total 16
drwxr-xr-x  4 clane  staff   128B Oct  4 16:26 notebooks
drwxr-xr-x  9 clane  staff   288B Oct  4 15:41 outputs
-rw-r--r--  1 clane  staff   396B Oct  4 09:51 comments.md
drwxr-xr-x  6 clane  staff   192B Oct  3 09:49 data
-rw-r--r--  1 clane  staff   228B Oct  3 09:13 README.md
```


In [2]:
```bash
outdir="${root}/outputs"

if [[ ! -d $outdir ]]; then
    mkdir $outdir
fi

echo "output dir:" $outdir
```

```
output dir: /Users/clane/Projects/CMI-Onboarding/outputs
```


## Overview

Let's start by seeing what was included in the challenge file bundle. I'm using the Unix `tree` tool to get an overview of all the contents.

In [3]:
```bash
cd ${root}/data
```

In [4]:
```bash
tree --filelimit 10
```

```
.
├── C-PAC_Derivatives
│   ├── abcd-options
│   │   └── sub-NDARAD481FXF_ses-1
│   │       └── Derivatives
│   │           ├── anat  [96 entries exceeds filelimit, not opening dir]
│   │           └── func  [32 entries exceeds filelimit, not opening dir]
│   └── preproc
│       ├── cpac_data_config_idx-14_2022-08-25T15-15-35Z.yml
│       ├── cpac_pipeline_config_2022-08-25T15-15-35Z.yml
│       ├── cpac_pipeline_config_2022-08-25T15-15-35Z_min.yml
│       ├── log
│       │   ├── cpac_individual_timing_cpac_preproc.csv
│       │   └── pipeline_cpac_preproc
│       │       └── sub-0025429_ses-1
│       │           ├── callback.log
│       │           ├── callback.log.html
│       │           ├── callback.log.resource_overusage.txt
│       │           ├── pypeline.log
│       │           ├── sub-0025429_ses-1_expectedOutputs.yml
│       │           └── subject_info_sub-0025429_ses-1.pkl
│       ├── output
│       │   └── cpac_cpac_preproc
│       │       └── sub-0025429_ses-1
│       │
```


## Raw data inspection

At a high level, we have two directories: `Raw_Data` and `C-PAC_Derivatives`.

`Raw_Data` seems to contain (unsurprisingly) raw data for two MRI subjects in [Brain Imaging Data Structure (BIDS) format](https://bids-specification.readthedocs.io/en/stable/).

The two subjects belong to two different studies (or possibly two sites within the same study): `HNU_1` and `Site-SI`.

After a bit of googling, I found [this reference](http://fcon_1000.projects.nitrc.org/indi/CoRR/html/hnu_1.html) to the Hangzhou Normal University (HNU_1) dataset, which contains a subject 0025429.
Likewise, I found [this reference](http://fcon_1000.projects.nitrc.org/indi/cmi_healthy_brain_network/downloads/downloads_MRI_R1_1.html) via [here](http://fcon_1000.projects.nitrc.org/indi/cmi_healthy_brain_network/) to the Healthy Brain Network R1 release, which contains a subject NDARAD481FXF tested at the Staten Island (SI) site.

Based on file names and the BIDS conventions, we can tell that each subject directory contains a T1-weighted structural MR image (`anat/*_T1w.nii.gz`) and one BOLD resting-state run (`func/*_task-rest*_bold.nii.gz`), each in gzipped NIfTI format.

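Since so much of this exercise leans on file names, here is a minimal sketch of how a BIDS-style name breaks down: underscore-separated `key-value` entities, then a suffix, then the extension. `parse_bids_name` is my own hypothetical helper, not part of any library; the pybids package offers a complete, spec-aware parser for real work.

```python
from pathlib import Path


def parse_bids_name(path: str) -> dict:
    """Split a BIDS-style file name into key-value entities, suffix, and extension."""
    name = Path(path).name
    # Everything after the first "." is the extension, so ".nii.gz" stays whole
    stem, dot, ext = name.partition(".")
    parts = stem.split("_")
    entities, other = {}, []
    for part in parts[:-1]:
        key, sep, value = part.partition("-")
        if sep:
            entities[key] = value
        else:
            other.append(part)  # token that is not a key-value pair
    return {"entities": entities, "suffix": parts[-1], "extension": dot + ext, "other": other}


for f in [
    "Raw_Data/HNU_1/sub-0025429/ses-1/func/sub-0025429_ses-1_task-rest_run-1_bold.nii.gz",
    "Raw_Data/Site-SI/sub-NDARAD481FXF/fmap/sub-NDARAD481FXF_phasediff.json",
]:
    print(parse_bids_name(f))
```

The same helper also works on C-PAC derivative names. For a name like `sub-NDARAD481FXF_1_desc-brain_T1w.nii.gz`, it would put the bare `1` into `other`, which is a quick way to spot tokens that don't follow the key-value pattern.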
We can look more closely at the files using the FreeSurfer utility `mri_info`, which reads the NIfTI file header.

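The header `mri_info` reads is a fixed 348-byte block, so the key fields can be decoded with just the standard library. This is a sketch based on the NIfTI-1 layout (`dim` at byte 40, `datatype` at 70, `pixdim` at 76). It ignores header extensions and the NIfTI-2 format. Note that the numbers `mri_info` prints in parentheses, `FLOAT (3)` and `SHORT (4)`, appear to be FreeSurfer's own type codes. In NIfTI-1 itself, float32 is code 16 and int16 is code 4.

```python
import gzip
import struct

# NIfTI-1 datatype codes (subset of the standard table)
NIFTI_DTYPES = {2: "uint8", 4: "int16", 8: "int32", 16: "float32",
                64: "float64", 256: "int8", 512: "uint16"}


def parse_nifti1_header(raw: bytes) -> dict:
    """Decode shape, voxel size, TR and datatype from a 348-byte NIfTI-1 header."""
    # sizeof_hdr (first int32) is always 348; its byte order reveals the file's endianness
    endian = "<" if struct.unpack_from("<i", raw, 0)[0] == 348 else ">"
    if struct.unpack_from(endian + "i", raw, 0)[0] != 348:
        raise ValueError("not a NIfTI-1 header")
    dim = struct.unpack_from(endian + "8h", raw, 40)      # dim[0] = number of dimensions
    datatype = struct.unpack_from(endian + "h", raw, 70)[0]
    pixdim = struct.unpack_from(endian + "8f", raw, 76)   # pixdim[1:4] = voxel size
    ndim = dim[0]
    return {
        "shape": tuple(dim[1:ndim + 1]),
        "voxel_size": tuple(round(p, 6) for p in pixdim[1:4]),
        # pixdim[4] is the time step, in the units given by the xyzt_units field
        "tr": pixdim[4] if ndim >= 4 else None,
        "datatype": NIFTI_DTYPES.get(datatype, f"code {datatype}"),
    }


def read_nifti1_header(path: str) -> dict:
    """Read only the header bytes, decompressing on the fly for .nii.gz files."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        return parse_nifti1_header(f.read(348))
```

For example, `read_nifti1_header("Raw_Data/HNU_1/sub-0025429/ses-1/func/sub-0025429_ses-1_task-rest_run-1_bold.nii.gz")` should report the same shape and voxel sizes as `mri_info`. Only the first 348 decompressed bytes are read, so this is quick even on large series.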
### Structural

The structural images each have (X, Y, Z) dimensions (176, 256, 256) with approximately 1 mm isotropic (1 mm^3) voxels. The data are 32-bit floating point values.

In [None]:
```bash
mri_info Raw_Data/HNU_1/sub-0025429/ses-1/anat/sub-0025429_ses-1_run-1_T1w.nii.gz | head -n 14
```

In [5]:
```bash
mri_info Raw_Data/Site-SI/sub-NDARAD481FXF/anat/sub-NDARAD481FXF_T1w.nii.gz | head -n 14
```

```
Volume information for Raw_Data/Site-SI/sub-NDARAD481FXF/anat/sub-NDARAD481FXF_T1w.nii.gz
          type: nii
    dimensions: 176 x 256 x 256
   voxel sizes: 0.999977, 1.000000, 1.000000
          type: FLOAT (3)
           fov: 175.996
           dof: 1
        xstart: -88.0, xend: 88.0
        ystart: -128.0, yend: 128.0
        zstart: -128.0, zend: 128.0
            TR: 0.00 msec, TE: 0.00 msec, TI: 0.00 msec, flip angle: 0.00 degrees
       nframes: 1
       PhEncDir: UNKNOWN
       FieldStrength: 0.000000
```


### Functional

The functional volume series for `HNU_1/sub-0025429` has (X, Y, Z, T) dimensions (64, 64, 43, 300) with approximately 3.4 mm isotropic voxels and a 2 sec TR.

The series for `Site-SI/sub-NDARAD481FXF` is longer and higher resolution. The dimensions are (78, 78, 54, 420) with approximately 2.5 mm isotropic voxels and a 1.45 sec TR.

The data in both images are 16-bit signed integer (short) values.

In [6]:
```bash
mri_info Raw_Data/HNU_1/sub-0025429/ses-1/func/sub-0025429_ses-1_task-rest_run-1_bold.nii.gz | head -n 14
```

```
Volume information for Raw_Data/HNU_1/sub-0025429/ses-1/func/sub-0025429_ses-1_task-rest_run-1_bold.nii.gz
          type: nii
    dimensions: 64 x 64 x 43 x 300
   voxel sizes: 3.437500, 3.437500, 3.400000
          type: SHORT (4)
           fov: 220.000
           dof: 1
        xstart: -110.0, xend: 110.0
        ystart: -110.0, yend: 110.0
        zstart: -73.1, zend: 73.1
            TR: 2000.00 msec, TE: 0.00 msec, TI: 0.00 msec, flip angle: 0.00 degrees
       nframes: 300
       PhEncDir: UNKNOWN
       FieldStrength: 0.000000
```


In [7]:
```bash
mri_info Raw_Data/Site-SI/sub-NDARAD481FXF/func/sub-NDARAD481FXF_task-rest_bold.nii.gz | head -n 14
```

```
Volume information for Raw_Data/Site-SI/sub-NDARAD481FXF/func/sub-NDARAD481FXF_task-rest_bold.nii.gz
          type: nii
    dimensions: 78 x 78 x 54 x 420
   voxel sizes: 2.461539, 2.461539, 2.500000
          type: SHORT (4)
           fov: 192.000
           dof: 1
        xstart: -96.0, xend: 96.0
        ystart: -96.0, yend: 96.0
        zstart: -67.5, zend: 67.5
            TR: 1450.00 msec, TE: 0.00 msec, TI: 0.00 msec, flip angle: 0.00 degrees
       nframes: 420
       PhEncDir: UNKNOWN
       FieldStrength: 0.000000
```

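A quick arithmetic check makes the voxel-size numbers more concrete and gives each run's total scan time. The values are copied by hand from the `mri_info` output above. The HNU run works out to exactly 10:00, which matches the `AcquisitionDuration: '10:00'` listed in the C-PAC data config.

```python
from math import prod

# Voxel sizes (mm), frame counts and TRs (s) copied from the mri_info output above
runs = {
    "HNU_1/sub-0025429": ((3.4375, 3.4375, 3.4), 300, 2.0),
    "Site-SI/sub-NDARAD481FXF": ((2.461539, 2.461539, 2.5), 420, 1.45),
}


def summarize(voxel_mm, nframes, tr_s):
    """Return voxel volume (mm^3) and total scan time as an M:SS string."""
    minutes, seconds = divmod(round(nframes * tr_s), 60)
    return prod(voxel_mm), f"{minutes}:{seconds:02d}"


for name, (voxel_mm, nframes, tr_s) in runs.items():
    volume, duration = summarize(voxel_mm, nframes, tr_s)
    print(f"{name}: {volume:.1f} mm^3 per voxel, {duration} of data")
```

So a "3.4 mm" voxel actually occupies about 40 mm^3, roughly 2.7 times the volume of the ~15 mm^3 Site-SI voxels.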

We can report the min, max, and mean values of each series using the FSL utility `fslstats` (`-R` prints the minimum and maximum, `-m` the mean).

In [8]:
```bash
fslstats Raw_Data/HNU_1/sub-0025429/ses-1/func/sub-0025429_ses-1_task-rest_run-1_bold.nii.gz -R -m
```

```
0.000000 3364.000000 349.502490
```


In [9]:
```bash
fslstats Raw_Data/Site-SI/sub-NDARAD481FXF/func/sub-NDARAD481FXF_task-rest_bold.nii.gz -R -m
```

```
0.000000 2293.000000 216.797673
```


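Both means sit far below the maxima. The likely reason is that `-m` averages over every voxel, including the zero-valued background outside the head. `fslstats -M` restricts the mean to nonzero voxels. Here is a rough NumPy analogue on a toy array that illustrates the difference. It is a sketch, not a reimplementation of FSL.

```python
import numpy as np


def fslstats_like(data: np.ndarray) -> dict:
    """Rough analogue of `fslstats -R -m -M` for an in-memory array."""
    nonzero = data[data != 0]
    return {
        "min": float(data.min()),                  # -R (first value)
        "max": float(data.max()),                  # -R (second value)
        "mean_all": float(data.mean()),            # -m: every voxel, background included
        "mean_nonzero": float(nonzero.mean()) if nonzero.size else 0.0,  # -M
    }


# Toy 4x4x4 "image": a 2x2x2 block of signal (800) surrounded by zero background
toy = np.zeros((4, 4, 4))
toy[1:3, 1:3, 1:3] = 800
print(fslstats_like(toy))
```

With nibabel installed, `fslstats_like(nib.load(path).get_fdata())` should give numbers comparable to `fslstats` on the real series. FSL's exact handling of 4D data may differ, though.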
Finally, we can visualize the raw functional data using the FSL `slices` utility.

In [10]:
```bash
slices \
    Raw_Data/HNU_1/sub-0025429/ses-1/func/sub-0025429_ses-1_task-rest_run-1_bold.nii.gz \
    -o ${outdir}/sub-0025429_ses-1_task-rest_run-1_bold_slices.gif

slices \
    Raw_Data/Site-SI/sub-NDARAD481FXF/func/sub-NDARAD481FXF_task-rest_bold.nii.gz \
    -o ${outdir}/sub-NDARAD481FXF_task-rest_bold_slices.gif
```



**sub-0025429_ses-1_task-rest_run-1_bold_slices**

![sub-0025429_ses-1_task-rest_run-1_bold_slices](../outputs/sub-0025429_ses-1_task-rest_run-1_bold_slices.gif)

**sub-NDARAD481FXF_task-rest_bold_slices**

![sub-NDARAD481FXF_task-rest_bold_slices](../outputs/sub-NDARAD481FXF_task-rest_bold_slices.gif)

Note that the second image is larger. This is due to the higher-resolution acquisition, which has more voxels along every axis (78 x 78 x 54 vs. 64 x 64 x 43).

### Functional metadata

In addition to the functional volume series, `sub-NDARAD481FXF/func` contains a JSON file with metadata for the run. The metadata lists the scanner manufacturer, sequence parameters, type of head coil used, slice timing parameters, etc.

In [11]:
```bash
cat Raw_Data/Site-SI/sub-NDARAD481FXF/func/sub-NDARAD481FXF_task-rest_bold.json
```

```
{
    "InPlanePhaseEncodingDirectionDICOM": "COL", 
    "ImageComments": "Unaliased_MB3_PE3", 
    "ProcedureStepDescription": "CMI_HBN", 
    "DeviceSerialNumber": "26121", 
    "AcquisitionMatrixPE": 78, 
    "ImageOrientationPatientDICOM": [
        1, 
        0, 
        0, 
        0, 
        0.96363, 
        -0.267238
    ], 
    "EffectiveEchoSpacing": 0.000550001, 
    "TotalReadoutTime": 0.04235, 
    "ManufacturersModelName": "Avanto", 
    "ProtocolName": "Resting_State_2.5mm", 
    "BandwidthPerPixelPhaseEncode": 23.31, 
    "RepetitionTime": 1.45, 
    "MagneticFieldStrength": 1.5, 
    "PhaseEncodingSteps": 78, 
    "MRAcquisitionType": "2D", 
    "SliceThickness": 2.5, 
    "DwellTime": 2.7e-06, 
    "TxRefAmp": 244.945, 
    "MultibandAccelerationFactor": 3, 
    "DerivedVendorReportedEchoSpacing": 0.000550001, 
    "SAR": 0.0384271, 
    "PixelBandwidth": 2374, 
    "ScanningSequence": "EP", 
    "Manufacturer": "Siemens", 
    "ConversionSoftware": "dcm2niix", 
```

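Several of the sidecar fields are not independent, and checking how they relate helped me see why they matter for distortion correction. BIDS gives `TotalReadoutTime = EffectiveEchoSpacing * (ReconMatrixPE - 1)`, and the phase-encode bandwidth per pixel is the inverse of `EffectiveEchoSpacing * ReconMatrixPE`. This sketch assumes `ReconMatrixPE` equals `AcquisitionMatrixPE` (78) for this run, since the former isn't shown above.

```python
import json

# Subset of sub-NDARAD481FXF_task-rest_bold.json, copied from the output above
sidecar = json.loads("""
{"EffectiveEchoSpacing": 0.000550001, "TotalReadoutTime": 0.04235,
 "AcquisitionMatrixPE": 78, "BandwidthPerPixelPhaseEncode": 23.31}
""")

esp = sidecar["EffectiveEchoSpacing"]
n_pe = sidecar["AcquisitionMatrixPE"]  # assumption: equal to ReconMatrixPE here

# BIDS definition: time from the center of the first to the center of the last echo
trt = esp * (n_pe - 1)
# Bandwidth per pixel along the phase-encode axis (Hz)
bw_pe = 1.0 / (esp * n_pe)

print(f"TotalReadoutTime: {trt:.5f} s (reported {sidecar['TotalReadoutTime']})")
print(f"BandwidthPerPixelPhaseEncode: {bw_pe:.2f} Hz (reported {sidecar['BandwidthPerPixelPhaseEncode']})")
```

Both derived values agree with the reported ones. The low phase-encode bandwidth (~23 Hz per pixel) means a field offset of only ~23 Hz shifts signal by a whole voxel along the phase-encode axis.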
### Fieldmap

Subject `sub-NDARAD481FXF` also has an `fmap` directory containing images that can be used to construct a [field map](https://lcni.uoregon.edu/kb-articles/kb-0003).

In turn, the field map can be used to correct for inhomogeneities in the primary (B0) magnetic field, which cause signal dropout and displacement. (Note that lost signal cannot be recovered. But displaced signal can be somewhat "put back", and correcting distortion can improve functional -> structural registration.)

Some details on how field map data are represented in BIDS are available [here](https://bids-specification.readthedocs.io/en/stable/04-modality-specific-files/01-magnetic-resonance-imaging-data.html#fieldmap-data).

In [12]:
```bash
ls -lht Raw_Data/Site-SI/sub-NDARAD481FXF/fmap
```

```
total 1440
-rw-r--r--@ 1 clane  staff   1.6K Aug 19 16:44 sub-NDARAD481FXF_magnitude2.json
-rw-r--r--@ 1 clane  staff   215K Aug 19 16:44 sub-NDARAD481FXF_magnitude2.nii.gz
-rw-r--r--@ 1 clane  staff   267K Aug 19 16:44 sub-NDARAD481FXF_phasediff.nii.gz
-rw-r--r--@ 1 clane  staff   1.6K Aug 19 16:44 sub-NDARAD481FXF_magnitude1.json
-rw-r--r--@ 1 clane  staff   223K Aug 19 16:44 sub-NDARAD481FXF_magnitude1.nii.gz
-rw-r--r--@ 1 clane  staff   1.6K Aug 19 16:44 sub-NDARAD481FXF_phasediff.json
```


We can look at the field map images as well using `slices`.

In [13]:
```bash
for f in Raw_Data/Site-SI/sub-NDARAD481FXF/fmap/*.nii.gz; do
    # get basename of form sub-NDARAD481FXF_magnitude1
    basename=${f##*/}
    basename=${basename%.nii.gz}
    slices $f -o ${outdir}/${basename}_slices.gif
done
```

| **sub-NDARAD481FXF_magnitude1** | **sub-NDARAD481FXF_magnitude2** | **sub-NDARAD481FXF_phasediff** |
| --- | --- | --- |
| ![sub-NDARAD481FXF_magnitude1_slices](../outputs/sub-NDARAD481FXF_magnitude1_slices.gif) | ![sub-NDARAD481FXF_magnitude2_slices](../outputs/sub-NDARAD481FXF_magnitude2_slices.gif) | ![sub-NDARAD481FXF_phasediff_slices](../outputs/sub-NDARAD481FXF_phasediff_slices.gif) |

The first two images are "magnitude" images.

The image on the right is the "phase difference" image. Note the bright values near the frontal sinuses and in the anterior temporal lobes. These areas typically suffer the most fMRI signal dropout.

> **TODO**: review what exactly the magnitude and phasediff images are, and how they're used to correct distortion.
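As a partial answer, here is my current understanding of the standard BIDS "phasediff" case. The two magnitude images come from two gradient echoes, and the phase difference image is the phase accumulated between those echoes. The sidecar lists the two echo times as `EchoTime1` and `EchoTime2`. Dividing the phase (in radians) by `2π · (EchoTime2 − EchoTime1)` gives the off-resonance in Hz. Dividing that by the BOLD run's `BandwidthPerPixelPhaseEncode` approximates the EPI displacement in voxels. The echo-time difference and phase value below are hypothetical placeholders. Real Siemens phase images also need rescaling to radians (and often unwrapping) first, which FSL's `fsl_prepare_fieldmap` handles.

```python
import math


def phasediff_to_hz(phase_rad: float, delta_te_s: float) -> float:
    """Off-resonance frequency (Hz) from a phase difference between two echoes."""
    return phase_rad / (2 * math.pi * delta_te_s)


def pe_shift_voxels(offres_hz: float, bw_per_pixel_pe_hz: float) -> float:
    """Approximate EPI displacement along the phase-encode axis, in voxels."""
    return offres_hz / bw_per_pixel_pe_hz


delta_te = 0.00246    # HYPOTHETICAL EchoTime2 - EchoTime1 (s); read the real values from the phasediff JSON
phase = math.pi / 4   # HYPOTHETICAL phase difference in one voxel (radians)

offres = phasediff_to_hz(phase, delta_te)
shift = pe_shift_voxels(offres, 23.31)  # BandwidthPerPixelPhaseEncode from the bold JSON
print(f"{offres:.1f} Hz off-resonance -> {shift:.2f} voxels ({shift * 2.461539:.1f} mm)")
```

Under these made-up inputs, a modest 45-degree phase difference already moves signal by about two voxels. That is consistent with the bright phasediff regions near the sinuses being where distortion is worst.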

## C-PAC Derivatives

Now I'll look at the `C-PAC_Derivatives` directory.

In [14]:
```bash
tree -L 3 C-PAC_Derivatives
```

```
C-PAC_Derivatives
├── abcd-options
│   └── sub-NDARAD481FXF_ses-1
│       └── Derivatives
└── preproc
    ├── cpac_data_config_idx-14_2022-08-25T15-15-35Z.yml
    ├── cpac_pipeline_config_2022-08-25T15-15-35Z.yml
    ├── cpac_pipeline_config_2022-08-25T15-15-35Z_min.yml
    ├── log
    │   ├── cpac_individual_timing_cpac_preproc.csv
    │   └── pipeline_cpac_preproc
    ├── output
    │   └── cpac_cpac_preproc
    └── working
        ├── cpac_sub-0025429_ses-1
        └── fcp-indi

11 directories, 4 files
```


I'll start with the `preproc` subdirectory, which follows the [C-PAC output directory format](https://fcp-indi.github.io/docs/latest/user/output_dir).

### Configs

#### Data config

The file `cpac_data_config_idx-14_2022-08-25T15-15-35Z.yml` is a [CPAC data configuration YAML file](https://fcp-indi.github.io/docs/latest/user/subject_list_config). It contains paths to the original raw data on [AWS S3](https://registry.opendata.aws/fcp-indi/), details on scanning parameters, etc.


(Nb: I got the `head` + `tail` command from [here](https://stackoverflow.com/a/8624829))

In [15]:
```bash
(head -n 15; echo "..."; tail -n 3) < C-PAC_Derivatives/preproc/cpac_data_config_idx-14_2022-08-25T15-15-35Z.yml
```

```
- anat: s3://fcp-indi/data/Projects/CORR/RawDataBIDS/HNU_1/sub-0025429/ses-1/anat/sub-0025429_ses-1_run-1_T1w.nii.gz
  creds_path: null
  func:
    rest_run-1:
      scan: s3://fcp-indi/data/Projects/CORR/RawDataBIDS/HNU_1/sub-0025429/ses-1/func/sub-0025429_ses-1_task-rest_run-1_bold.nii.gz
      scan_parameters:
        AcquisitionDuration: '10:00'
        AcquisitionMatrix: 64x64
        EchoTime: 0.03
        FieldofViewDimensions: 220x220
        FieldofViewShape: Rectangle
        FlipAngle: 90
        Instructions: '''Relax and remain still with your eyes open. Do not fall asleep
          and do not think about anything in particular.'''
        MagneicFieldSrengh: 3
...
  site_id: HNU_1
  subject_id: sub-0025429
  unique_id: ses-1
```


#### Pipeline config

The file `cpac_pipeline_config_2022-08-25T15-15-35Z_min.yml` is a CPAC pipeline config file. The config should be for the default ["preproc"](https://fcp-indi.github.io/docs/latest/user/pipelines/preconfig#preproc-default-without-derivatives) pipeline, based on the directory naming. The "preproc" pipeline follows the same pre-processing steps as the default pipeline, but leaves out computation of [derivatives](https://fcp-indi.github.io/docs/latest/user/pipelines/derivatives) such as Voxel-mirrored Homotopic Connectivity (VMHC) and network centrality.

> **TODO**: What are these derivatives?

We can check that the config is as we expect by downloading the base preproc config for the latest release and comparing the two files. We'll use `git diff` to get a nice diff output.

In [16]:
```bash
rm ${outdir}/pipeline_config_preproc.yml* 2>/dev/null

wget -P ${outdir}/ \
    https://raw.githubusercontent.com/FCP-INDI/C-PAC/9453398fefcdc83bea2b82727c52b5d140a69989/CPAC/resources/configs/pipeline_config_preproc.yml
```

```
--2022-10-04 16:27:09--  https://raw.githubusercontent.com/FCP-INDI/C-PAC/9453398fefcdc83bea2b82727c52b5d140a69989/CPAC/resources/configs/pipeline_config_preproc.yml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7620 (7.4K) [text/plain]
Saving to: ‘/Users/clane/Projects/CMI-Onboarding/outputs/pipeline_config_preproc.yml’


2022-10-04 16:27:09 (20.4 MB/s) - ‘/Users/clane/Projects/CMI-Onboarding/outputs/pipeline_config_preproc.yml’ saved [7620/7620]
```



In [17]:
```bash
git diff --no-index ${outdir}/pipeline_config_preproc.yml \
    C-PAC_Derivatives/preproc/cpac_pipeline_config_2022-08-25T15-15-35Z_min.yml
```

```diff
diff --git a/Users/clane/Projects/CMI-Onboarding/outputs/pipeline_config_preproc.yml b/C-PAC_Derivatives/preproc/cpac_pipeline_config_2022-08-25T15-15-35Z_min.yml
index 9a465d2..b46253a 100644
--- a/Users/clane/Projects/CMI-Onboarding/outputs/pipeline_config_preproc.yml
+++ b/C-PAC_Derivatives/preproc/cpac_pipeline_config_2022-08-25T15-15-35Z_min.yml
@@ -1,7 +1,7 @@
 %YAML 1.1
 ---
 # CPAC Pipeline Configuration YAML file
-# Version 1.8.4
+# Version 1.8.4.dev
 #
 # http://fcp-indi.github.io for more info.
 #
@@ -9,21 +9,47 @@
 
 FROM: default
 
-
 pipeline_setup: 
+
   # Name for this pipeline configuration - useful for identification.
   pipeline_name: cpac_preproc
 
+  output_directory: 
+
+    # Directory where C-PAC should write out processed data, logs, and crash reports.
+    # - If running in a con
```


The differences between the base and modified configs are modest, although some of them, e.g. in `symmetric_registration` and `segmentation`, give me pause. I would be more at ease if the only differences were run-specific (e.g. paths, resource requirements).

My guess is that a work-in-progress update to the config (v1.8.4.dev), not yet on GitHub, was used to process these data.

#### Full pipeline config

The file `cpac_pipeline_config_2022-08-25T15-15-35Z.yml` appears to be a more complete config. I guess it includes options that are implicit defaults in the minimal config. Indeed, `git diff` shows that almost all of the differences between the files are additions.

In [18]:
```bash
git diff --no-index --stat \
    C-PAC_Derivatives/preproc/cpac_pipeline_config_2022-08-25T15-15-35Z_min.yml \
    C-PAC_Derivatives/preproc/cpac_pipeline_config_2022-08-25T15-15-35Z.yml
```

```
 ... cpac_pipeline_config_2022-08-25T15-15-35Z.yml} | 1481 +++++++++++++++++++-
 1 file changed, 1480 insertions(+), 1 deletion(-)
```


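My working theory for how a minimal config with `FROM: default` relates to the full one is a recursive merge: start from the base preconfig, then overlay every key the minimal file sets. The sketch below illustrates that idea only. I have not checked that this is how C-PAC actually resolves configs, and the base values are hypothetical (only the key names echo the configs above).

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Recursively overlay `override` onto a copy of `base`; override wins on conflicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# HYPOTHETICAL values; only the key names echo the configs above
base = {"pipeline_setup": {"pipeline_name": "default", "output_directory": {"path": "/output"}},
        "segmentation": {"run": True}}
minimal = {"FROM": "default", "pipeline_setup": {"pipeline_name": "cpac_preproc"}}

# Drop the inheritance marker itself before merging
overrides = {k: v for k, v in minimal.items() if k != "FROM"}
full = deep_merge(base, overrides)
print(full)
```

If this theory holds, the full file is the minimal file with every inherited default written out explicitly. That would explain why the `--stat` diff is almost entirely insertions.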

### Logs

Next we'll look at the `preproc/log` directory.

In [19]:
```bash
tree C-PAC_Derivatives/preproc/log
```

```
C-PAC_Derivatives/preproc/log
├── cpac_individual_timing_cpac_preproc.csv
└── pipeline_cpac_preproc
    └── sub-0025429_ses-1
        ├── callback.log
        ├── callback.log.html
        ├── callback.log.resource_overusage.txt
        ├── pypeline.log
        ├── sub-0025429_ses-1_expectedOutputs.yml
        └── subject_info_sub-0025429_ses-1.pkl

2 directories, 7 files
```


The `cpac_individual_timing_cpac_preproc.csv` file is a CSV containing summary timing and status for the processing run.

In [20]:
```bash
cat C-PAC_Derivatives/preproc/log/cpac_individual_timing_cpac_preproc.csv
```

```
Pipeline,Cores_Per_Subject,Simultaneous_Subjects,Number_of_Subjects,Start_Time,End_Time,Elapsed_Time_(minutes),Status
cpac_preproc,30,1,1,2022-08-25_15:15:36,2022-08-25_16:12:56,57,Complete
```

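As a sanity check on `Elapsed_Time_(minutes)`, the start and end timestamps can be parsed with the standard library. The CSV text is pasted from the output above. For the real file you would pass `open(path, newline="")` to `csv.DictReader` instead.

```python
import csv
import io
from datetime import datetime

# Contents of cpac_individual_timing_cpac_preproc.csv, copied from the output above
text = """Pipeline,Cores_Per_Subject,Simultaneous_Subjects,Number_of_Subjects,Start_Time,End_Time,Elapsed_Time_(minutes),Status
cpac_preproc,30,1,1,2022-08-25_15:15:36,2022-08-25_16:12:56,57,Complete
"""

FMT = "%Y-%m-%d_%H:%M:%S"  # timestamp layout used in the CSV
for row in csv.DictReader(io.StringIO(text)):
    start = datetime.strptime(row["Start_Time"], FMT)
    end = datetime.strptime(row["End_Time"], FMT)
    minutes = (end - start).total_seconds() / 60
    print(f"{row['Pipeline']}: {minutes:.1f} min computed vs {row['Elapsed_Time_(minutes)']} reported")
```

The run took 57 min 20 s, so the reported `57` looks like whole minutes.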

Inside `log/pipeline_cpac_preproc/sub-0025429_ses-1` there is more detailed logging. The primary C-PAC log is `pypeline.log`, which contains logging events from [Nipype](https://nipype.readthedocs.io/en/latest/index.html).

In [21]:
```bash
(head -n 30; echo "..."; tail -n 30) < C-PAC_Derivatives/preproc/log/pipeline_cpac_preproc/sub-0025429_ses-1/pypeline.log
```

```
220825-15:15:37,82 nipype.workflow INFO:
	 
    Run command: run /home/ubuntu /output participant --save_working_dir --skip_bids_validator --n_cpus 30 --mem_gb 45 --data_config_file /configs/data_config.yml --preconfig preproc --num_ants_threads 15 --participant_ndx 14

    C-PAC version: 1.8.4

    Copyright (C) 2022  C-PAC Developers.
    
    This program comes with ABSOLUTELY NO WARRANTY. This is free software,
    and you are welcome to redistribute it under certain conditions. For
    details, see https://fcp-indi.github.io/docs/v1.8.4/license or the COPYING and
    COPYING.LESSER files included in the source code.

    Setting maximum number of cores per participant to 30
    Setting number of participants at once to 1
    Setting OMP_NUM_THREADS to 1
    Setting MKL_NUM_THREADS to 1
    Setting ANTS/ITK thread usage to 15
    Maximum potential number of cores that might be used during this run: 30


220825-15:22:17,865 nipype.workflow INFO:
	 Connecting pipeline blocks:
	 - ana
```

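The run command at the top of `pypeline.log` ties this log to other files. `--participant_ndx 14` matches `idx-14` in the data config file name, and `--n_cpus 30` matches `Cores_Per_Subject` in the timing CSV. The small parser below pulls the `--flag value` pairs out. `parse_flags` is a hypothetical helper that treats a flag followed directly by another flag as a boolean switch.

```python
import shlex


def parse_flags(command: str) -> dict:
    """Map --flag tokens to their value, or True when the next token is another flag."""
    tokens = shlex.split(command)
    flags = {}
    for i, tok in enumerate(tokens):
        if tok.startswith("--"):
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            flags[tok[2:]] = nxt if nxt is not None and not nxt.startswith("--") else True
    return flags


# Run command copied from pypeline.log above
run_command = ("run /home/ubuntu /output participant --save_working_dir --skip_bids_validator "
               "--n_cpus 30 --mem_gb 45 --data_config_file /configs/data_config.yml "
               "--preconfig preproc --num_ants_threads 15 --participant_ndx 14")
for key, value in parse_flags(run_command).items():
    print(f"{key}: {value}")
```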
The `callback.log` is a [JSON lines](https://jsonlines.org/) file containing timing and resource information for each processing step in the pipeline.

In [22]:
```bash
tail -n 2 C-PAC_Derivatives/preproc/log/pipeline_cpac_preproc/sub-0025429_ses-1/callback.log
```

```
{"id": "cpac_sub-0025429_ses-1.carpet_seg_220.carpet_plot", "hash": "a8e140638ddeb11b90549c6ea64bdaae", "start": "2022-08-25T20:12:50.019635", "finish": "2022-08-25T20:12:52.652067", "runtime_threads": 2, "runtime_memory_gb": 0.8076591494140625, "estimated_memory_gb": 4.0, "num_threads": 1}
{"id": "cpac_sub-0025429_ses-1.sinker_space-template_desc-preproc-1_bold_182", "hash": "34df9b23661ed143c3c4432921e74193", "start": "2022-08-25T20:12:51.955055", "finish": "2022-08-25T20:12:52.161116", "runtime_threads": 0, "runtime_memory_gb": 0.2379608154296875, "estimated_memory_gb": 2.0, "num_threads": 1}
```


We can read it quickly using [pandas](https://pandas.pydata.org/) and check (what I assume is) the maximum memory used by a processing step.

> **TODO**: find out what the columns actually mean.

(Nb: I took some hints from [here](https://trstringer.com/python-in-shell-script/) for how to run python in bash.)

In [23]:
```bash
callback_log="C-PAC_Derivatives/preproc/log/pipeline_cpac_preproc/sub-0025429_ses-1/callback.log"

cmd=$(cat <<EOF
import pandas as pd
pd.set_option('display.max_columns', None)
df = pd.read_json("${callback_log}", lines=True)
print(df.tail(2))
print("Shape:", df.shape)
print(f'Max mem: {df["runtime_memory_gb"].max():.3f}')
EOF
)

python -c "$cmd"
```

```
                                                     id  \
1015  cpac_sub-0025429_ses-1.carpet_seg_220.carpet_plot   
1016  cpac_sub-0025429_ses-1.sinker_space-template_d...   

                                  hash                       start  \
1015  a8e140638ddeb11b90549c6ea64bdaae  2022-08-25T20:12:50.019635   
1016  34df9b23661ed143c3c4432921e74193  2022-08-25T20:12:51.955055   

                          finish  runtime_threads  runtime_memory_gb  \
1015  2022-08-25T20:12:52.652067              2.0           0.807659   
1016  2022-08-25T20:12:52.161116              0.0           0.237961   

      estimated_memory_gb  num_threads  
1015                  4.0          1.0  
1016                  2.0          1.0  
Shape: (1017, 8)
Max mem: 4.549
```


The `callback.log.html` file presents some of the same data in HTML format.

![callback html](../outputs/callback_screenshot.png)

> **TODO**: what does this visualization mean?

The `callback.log.resource_overusage.txt` shows which processing steps used more resources than expected. There were 4 total (out of 1017).

In [24]:
```bash
head -n 10 C-PAC_Derivatives/preproc/log/pipeline_cpac_preproc/sub-0025429_ses-1/callback.log.resource_overusage.txt
```

```
The following nodes used excessive resources:
---------------------------------------------

cpac_sub-0025429_ses-1
  .nuisance_regressors_Regressor-2_136
  .Functional_2mm_flirt
      **memory_gb**
        runtime > estimated
        3.080265044921875 > 1.8138461538461539
```



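The overusage report seems to boil down to comparing `runtime_memory_gb` against `estimated_memory_gb` for each node in `callback.log`. Here is a stdlib-only sketch of that comparison that also computes each node's wall-clock duration. I'm assuming memory is the only criterion. The real report may check other resources, such as threads, as well.

```python
import json
from datetime import datetime


def load_callback(lines):
    """Parse callback.log JSON lines and add each node's wall-clock duration in seconds."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        if "start" in rec and "finish" in rec:
            elapsed = datetime.fromisoformat(rec["finish"]) - datetime.fromisoformat(rec["start"])
            rec["duration_s"] = elapsed.total_seconds()
        records.append(rec)
    return records


def memory_overuse(records):
    """Ids of nodes whose measured memory exceeded the estimate."""
    return [r["id"] for r in records
            if r.get("runtime_memory_gb", 0) > r.get("estimated_memory_gb", float("inf"))]


# Abbreviated copies of the two callback.log lines shown above (hash and thread fields omitted)
sample = [
    '{"id": "cpac_sub-0025429_ses-1.carpet_seg_220.carpet_plot", "start": "2022-08-25T20:12:50.019635", '
    '"finish": "2022-08-25T20:12:52.652067", "runtime_memory_gb": 0.8076591494140625, "estimated_memory_gb": 4.0}',
    '{"id": "cpac_sub-0025429_ses-1.sinker_space-template_desc-preproc-1_bold_182", "start": "2022-08-25T20:12:51.955055", '
    '"finish": "2022-08-25T20:12:52.161116", "runtime_memory_gb": 0.2379608154296875, "estimated_memory_gb": 2.0}',
]
records = load_callback(sample)
for r in records:
    print(r["id"], f"{r['duration_s']:.2f} s")
print("over estimate:", memory_overuse(records))
```

On the full log, `load_callback(open(callback_log))` should flag nodes like `Functional_2mm_flirt` if the comparison really is this simple.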
The `sub-0025429_ses-1_expectedOutputs.yml` file lists glob patterns for expected outputs. At the end of processing, C-PAC checks whether each expected output was indeed generated.

In [25]:
```bash
head -n 5 C-PAC_Derivatives/preproc/log/pipeline_cpac_preproc/sub-0025429_ses-1/sub-0025429_ses-1_expectedOutputs.yml
```

```
anat:
- desc-brain*_T1w
- desc-preproc*_T1w
- desc-reorient*_T1w
- from-T1w*_to-template*_mode-image*_desc-linear*_xfm
```


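The expected-output check can be approximated with shell-style globbing. This sketch assumes each pattern may match anywhere in a file name, so it wraps it in `*...*`. C-PAC's actual matching rule may differ. `missing_outputs` is a hypothetical helper, and the file names are copied from the `output/.../anat` listing further down.

```python
from fnmatch import fnmatchcase


def missing_outputs(expected: dict, produced: dict) -> dict:
    """Return, per datatype, the expected patterns that match no produced file name."""
    missing = {}
    for datatype, patterns in expected.items():
        names = produced.get(datatype, [])
        absent = [p for p in patterns if not any(fnmatchcase(n, f"*{p}*") for n in names)]
        if absent:
            missing[datatype] = absent
    return missing


expected = {"anat": ["desc-brain*_T1w", "desc-preproc*_T1w", "desc-reorient*_T1w",
                     "from-T1w*_to-template*_mode-image*_desc-linear*_xfm"]}
produced = {"anat": ["sub-0025429_ses-1_desc-brain_T1w.nii.gz",
                     "sub-0025429_ses-1_desc-preproc_T1w.nii.gz",
                     "sub-0025429_ses-1_desc-reorient_T1w.nii.gz",
                     "sub-0025429_ses-1_from-T1w_to-template_mode-image_desc-linear_xfm.nii.gz"]}
print(missing_outputs(expected, produced))
```

In a real check, `produced` would be built from `os.listdir` on each output subdirectory, and the patterns would come from loading the full YAML file.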
### Working

The `working` directory contains scratch folders for each processing step and a copy of the S3 data in their original (deeply nested) directory structure.

In [26]:
```bash
tree --filelimit 20 C-PAC_Derivatives/preproc/working
```

```
C-PAC_Derivatives/preproc/working
├── cpac_sub-0025429_ses-1  [141 entries exceeds filelimit, not opening dir]
└── fcp-indi
    └── data
        └── Projects
            └── CORR
                └── RawDataBIDS
                    └── HNU_1
                        └── sub-0025429
                            └── ses-1
                                ├── anat
                                │   └── sub-0025429_ses-1_run-1_T1w.nii.gz
                                └── func
                                    └── sub-0025429_ses-1_task-rest_run-1_bold.nii.gz

11 directories, 2 files
```



### Outputs

The `preproc/output` directory contains final outputs for the anatomical (`anat`) and functional (`func`) processing pipelines.

Specific outputs for `anat` include:

- brain-extracted structural T1 image
- registration transforms between the subject's anatomical and standard template space
- binary and probabilistic tissue segmentation masks

Specific outputs for `func` include:

- motion and SNR quality control data and figures
- registration transforms between subject functional and subject anatomical space
- estimated participant motion parameters
- motion corrected and nuisance regressed functional volume series

In [27]:
```bash
tree C-PAC_Derivatives/preproc/output
```

```
C-PAC_Derivatives/preproc/output
└── cpac_cpac_preproc
    └── sub-0025429_ses-1
        ├── anat
        │   ├── sub-0025429_ses-1_desc-brain_T1w.json
        │   ├── sub-0025429_ses-1_desc-brain_T1w.nii.gz
        │   ├── sub-0025429_ses-1_desc-preproc_T1w.json
        │   ├── sub-0025429_ses-1_desc-preproc_T1w.nii.gz
        │   ├── sub-0025429_ses-1_desc-reorient_T1w.json
        │   ├── sub-0025429_ses-1_desc-reorient_T1w.nii.gz
        │   ├── sub-0025429_ses-1_from-T1w_to-template_mode-image_desc-linear_xfm.json
        │   ├── sub-0025429_ses-1_from-T1w_to-template_mode-image_desc-linear_xfm.nii.gz
        │   ├── sub-0025429_ses-1_from-T1w_to-template_mode-image_desc-nonlinear_xfm.json
        │   ├── sub-0025429_ses-1_from-T1w_to-template_mode-image_desc-nonlinear_xfm.nii.gz
        │   ├── sub-0025429_ses-1_from-T1w_to-template_mode-image_xfm.json
        │   ├── sub-0025429_ses-1_from-T1w_to-template_mode-image_xfm.nii.gz
        │   ├── sub-0025429_ses-1_from-template_to-T
```

### `abcd-options`

The `abcd-options` directory under `C-PAC_Derivatives` contains just the final outputs, i.e. derivatives, for HBN/sub-NDARAD481FXF_ses-1 using the [`abcd-options` pipeline](https://github.com/FCP-INDI/C-PAC/blob/main/CPAC/resources/configs/pipeline_config_abcd-options.yml), adapted from the [ABCD study](https://www.biorxiv.org/content/10.1101/2021.07.09.451638v1.full).

A major difference between `abcd-options` and `preproc` is the inclusion of surface-based processing. As a result, `abcd-options` derivatives include:

- reconstructed meshes for the subject's pial and white matter surfaces
- surface metrics such as thickness and curvature
- [cortical surface parcellations](https://surfer.nmr.mgh.harvard.edu/fswiki/CorticalParcellation)
- functional data mapped to HCP 32k "fsLR" surface space

In [28]:
```bash
tree C-PAC_Derivatives/abcd-options
```

```
C-PAC_Derivatives/abcd-options
└── sub-NDARAD481FXF_ses-1
    └── Derivatives
        ├── anat
        │   ├── sub-NDARAD481FXF_1_atlas-DesikanKilliany_space-fsLR_den-164k_dlabel.json
        │   ├── sub-NDARAD481FXF_1_atlas-DesikanKilliany_space-fsLR_den-164k_dlabel.nii
        │   ├── sub-NDARAD481FXF_1_atlas-DesikanKilliany_space-fsLR_den-32k_dlabel.json
        │   ├── sub-NDARAD481FXF_1_atlas-DesikanKilliany_space-fsLR_den-32k_dlabel.nii
        │   ├── sub-NDARAD481FXF_1_atlas-Destrieux_space-fsLR_den-164k_dlabel.json
        │   ├── sub-NDARAD481FXF_1_atlas-Destrieux_space-fsLR_den-164k_dlabel.nii
        │   ├── sub-NDARAD481FXF_1_atlas-Destrieux_space-fsLR_den-32k_dlabel.json
        │   ├── sub-NDARAD481FXF_1_atlas-Destrieux_space-fsLR_den-32k_dlabel.nii
        │   ├── sub-NDARAD481FXF_1_desc-brain_T1w.json
        │   ├── sub-NDARAD481FXF_1_desc-brain_T1w.nii.gz
        │   ├── sub-NDARAD481FXF_1_desc-preproc_T1w.json
        │   ├── sub-NDARAD481FXF_1_desc-preproc_T1w.nii.
```