# Subbundles Part 1: Data

**Subbundle** - a subgroup of streamlines with a set of common properties

Part 1: Download subject diffusion imaging data from dataset and run tractometry

##### <span style="color:red">**NOTE: Need to think about how to store results of experiments**</span>

Currently:

- Rerunning will overwrite files

- Does not store any metadata about what was run

- Saves all artifacts and images

- Does not output `tck` files
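One possible way to address the first two points is to write each run into its own timestamped directory alongside a small metadata file. The sketch below is only an illustration: `create_run_dir`, the `run-<timestamp>` layout, and `run_metadata.json` are hypothetical names that are not part of `utils.py` or pyAFQ.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def create_run_dir(results_home, dataset_name, subjects, params=None):
    """Create a unique, timestamped run directory and record what was run.

    Hypothetical helper: the directory layout and file name are assumptions.
    """
    stamp = datetime.now(timezone.utc).strftime('%Y%m%dT%H%M%S%fZ')
    run_dir = Path(results_home) / dataset_name / f'run-{stamp}'
    # exist_ok=False so an existing run is never silently overwritten
    run_dir.mkdir(parents=True, exist_ok=False)

    metadata = {
        'dataset_name': dataset_name,
        'subjects': list(subjects),
        'params': params or {},
        'created_utc': stamp,
    }
    (run_dir / 'run_metadata.json').write_text(json.dumps(metadata, indent=2))
    return run_dir


# Usage, with a temporary directory standing in for the real results location
with tempfile.TemporaryDirectory() as tmp:
    run_dir = create_run_dir(tmp, 'HCP_retest', ['103818'], {'n_points': 50})
    saved = json.loads((run_dir / 'run_metadata.json').read_text())
    print(saved['dataset_name'], saved['subjects'])
```

Keeping the metadata next to the artifacts means a results directory can be interpreted later without rerunning the notebook.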

#####  <span style="color:red">NOTE: several important variables are defined in `utils.py`</span>

- `dataset_names` - these are the supported datasets

- `dataset_homes` - where datasets are stored under `AFQ_data`

- `dataset_subjects` - which subjects to include for each dataset

- `bundle_names` - which bundles to include

##### <span style="color:red">NOTE: assumes [MRtrix](https://mrtrix.readthedocs.io/en/latest/index.html) [`mrinfo`](https://mrtrix.readthedocs.io/en/latest/reference/commands/mrinfo.html#mrinfo) and [AWS-CLI](https://docs.aws.amazon.com/cli/index.html) [`aws`](https://docs.aws.amazon.com/cli/latest/reference/#aws) are installed</span>
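A quick way to verify this assumption before running the rest of the notebook is to look the commands up on `PATH`. This is a minimal sketch using only the standard library; `find_missing_tools` is a hypothetical helper name, not something defined in `utils.py`.

```python
import shutil


def find_missing_tools(tools=('mrinfo', 'aws')):
    """Return the names of command-line tools that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing_tools = find_missing_tools()
if missing_tools:
    print('missing command-line tools:', ', '.join(missing_tools))
else:
    print('all required command-line tools found')
```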

In [None]:
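from utils import *

# os, np, and display are used below; import them explicitly
# rather than relying on `from utils import *`
import os
import os.path as op

import numpy as np
import pandas as pd
import nibabel as nib

from AFQ import api
import AFQ.data as afd

import matplotlib.pyplot as plt
import plotly
from IPython.display import Image, display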

## Datasets

0. Simulated data (*optional*)

1. High Angular Resolution Diffusion MRI (`HARDI`) [<sup>1</sup>](https://figshare.com/articles/dataset/pyAFQ_Stanford_HARDI_tractography_and_mapping/3409882) [<sup>2</sup>](https://searchworks.stanford.edu/view/yx282xq2090)

2. Human Connectome Project ([`HCP`](http://www.humanconnectomeproject.org)) [<sup>1</sup>](https://www.humanconnectome.org/study/hcp-young-adult/document/1200-subjects-data-release)

3. Human Connectome Project Retest (`HCP_retest`)

In [None]:
compare_test_retest = True

dataset_name = 'HCP_retest'
subjects = get_subjects_medium(dataset_name)
print(dataset_name, subjects)

Confirm DWI data exists for each subject in the dataset

In [None]:
dwi_files = {}

for subject in subjects:
    subject_id = f'sub-{subject}'
    session_home = get_subject_session_home(get_dmriprep_home(dataset_name), subject_id)
    dwi_files[subject] = get_subject_session_dwi_file(dataset_name, session_home, subject_id)

# max_colwidth=None shows full paths (-1 is deprecated in recent pandas)
with pd.option_context('display.max_colwidth', None):
    display(pd.Series(dwi_files).to_frame('dwi file'))

retrieve_data = False

if (not os.path.isdir(get_dmriprep_home(dataset_name)) or
    not np.all([os.path.exists(dwi_file) for dwi_file in dwi_files.values()])):
    retrieve_data = True

print('retrieve data:', retrieve_data)

if retrieve_data:
    fetch_data(dataset_name, subjects)
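The cell above only reports whether any data is missing. When debugging a partial download it can help to know which subjects are affected. The following is a small sketch of that idea; `find_missing_files` is a hypothetical helper, and the temporary files merely stand in for real DWI files.

```python
import os
import tempfile


def find_missing_files(files_by_subject):
    """Return the subjects whose expected file does not exist on disk."""
    return [subject for subject, path in files_by_subject.items()
            if not os.path.exists(path)]


# Usage, with temporary files standing in for real DWI files
with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, 'sub-A_dwi.nii.gz')
    open(present, 'w').close()
    example_files = {'A': present, 'B': os.path.join(tmp, 'sub-B_dwi.nii.gz')}
    missing_subjects = find_missing_files(example_files)
    print('subjects missing DWI data:', missing_subjects)
```

In the notebook, the same function could be called as `find_missing_files(dwi_files)`.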

### 1. HARDI

#### Single subject single session

<span style="color:blue">**TODO: Download additional HARDI subjects?**</span>

- If so, we need to create a fetcher and extend/overwrite `organize_stanford_data` to make the data BIDS-compliant

### 2. HCP

- <span style="color:red">**Question: Are there certain subjects we have been using for quality control?**</span>

- <span style="color:red">**Question: Should we look for subjects with certain properties?**</span>

  For example:

  - test-retest?

  - twins?

  - demographics?

#### HCP Single subject single session

https://www.humanconnectome.org/study/hcp-young-adult/document/1200-subjects-data-release

- ##### <span style="color:red">NOTE: Subject 100307 selected as exemplar subject from [HCP 1200 Subjects Data Release Reference Manual](https://www.humanconnectome.org/storage/app/media/documentation/s1200/HCP_S1200_Release_Reference_Manual.pdf)</span>

- ##### <span style="color:red">NOTE: Subject 103818 selected as exemplar test-retest subject from [Appendix 5: MR Data acquisition information for an exemplar subject](https://www.humanconnectome.org/storage/app/media/documentation/s1200/HCP_S1200_Release_Appendix_V.pdf); we have processed data available on S3</span>

  - <span style="color:red">**Question: Which subject is 103818's twin?**</span>

#### HCP Retest Data

> Retest datasets are available in the separate WU-Minn HCP Retest Data project

> The retest data are released as a separate project to fully distinguish it from the first visit. Retest subjects retain the same Subject ID numbers as in the 1200 Subjects HCP project

https://db.humanconnectome.org/data/projects/HCP_Retest

> 45 Subjects with 3T Retest MRI and Behavioral Measures (46 subjects)

> 46 subjects (all monozygotic twins, 21 twin pairs + 4 MZ twins without retest of co-twin) have 3T HCP protocol Retest data available.
> - 36 Retest subjects have fully complete retest data for all 3T modalities: structural, rfMRI, tfMRI, and dfMRI

> 7T data is available on 22 Retest subjects. 

**[Subjects with Retest MR data](https://db.humanconnectome.org/app/template/SubjectDashboard.vm?project=HCP_Retest&subjectGroupName=Subjects%20with%20retest%20MR%20data)**

### Validate DWI

In [None]:
subject_dwi_niis = []

for subject in subjects:
    subject_dwi_niis.append(nib.load(dwi_files[subject]))

#### Check header

<span style="color:blue">**TODO: Confirm the [reference space](https://nipy.org/nibabel/coordinate_systems.html#the-scanner-subject-reference-space) is RAS+**</span>

- HARDI and HCP hemispheres are reversed; could be related to reference space
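One possible check is the voxel axis orientation implied by each image's affine. nibabel provides `nib.aff2axcodes(dwi_nii.affine)`, which returns `('R', 'A', 'S')` for an image whose voxel axes are stored in RAS+ order. The sketch below is a simplified, dependency-free stand-in for that call, not nibabel's implementation: it assumes each voxel axis is dominated by a single world axis, and the two example affines are made up for illustration.

```python
def axis_codes(affine):
    """Approximate anatomical axis codes for a voxel-to-world affine.

    For each voxel axis (column of the affine), pick the world axis with
    the largest absolute component and use its sign to choose the label.
    """
    labels = (('L', 'R'), ('P', 'A'), ('I', 'S'))
    codes = []
    for col in range(3):
        column = [affine[row][col] for row in range(3)]
        world_axis = max(range(3), key=lambda row: abs(column[row]))
        negative, positive = labels[world_axis]
        codes.append(positive if column[world_axis] > 0 else negative)
    return tuple(codes)


# Illustrative affines only (not taken from HARDI or HCP headers)
ras_affine = [[1.25, 0, 0, -90], [0, 1.25, 0, -126], [0, 0, 1.25, -72], [0, 0, 0, 1]]
las_affine = [[-1.25, 0, 0, 90], [0, 1.25, 0, -126], [0, 0, 1.25, -72], [0, 0, 0, 1]]

print(axis_codes(ras_affine))
print(axis_codes(las_affine))
```

If the two datasets report different first-axis codes (`'R'` versus `'L'`), that would be consistent with the left/right flip noted above, though it would not by itself prove the cause.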

In [None]:
check_header = False

if check_header:
    for subject, dwi_nii in zip(subjects, subject_dwi_niis):
        # !mrinfo {dwi_files[subject]}
        print(dwi_nii.header)

#### Display DWI Slice

In [None]:
display_dwi = False

if display_dwi:
    for subject, dwi_nii in zip(subjects, subject_dwi_niis):
        if compare_test_retest:
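            # by definition the same subject should exist in the HCP dataset
            hcp_subject_id = f'sub-{subject}'
            hcp_session_home = get_subject_session_home(get_dmriprep_home('HCP'), hcp_subject_id)
            # pass the dataset name first, matching the call used to build dwi_files
            hcp_dwi_file = get_subject_session_dwi_file('HCP', hcp_session_home, hcp_subject_id)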
            hcp_dwi_nii = nib.load(hcp_dwi_file)
            display_dwi_slice(f'HCP {subject}', hcp_dwi_nii)

        display_dwi_slice(f'{dataset_name} {subject}', dwi_nii)

## Generate Tract Profiles

1. Single individual from HARDI dataset

2. Multiple individuals from HARDI dataset

3. Multiple individuals from HCP dataset


<span style="color:blue">**TODO: Run and compare deterministic and probabilistic tractography**</span>

<span style="color:blue">**TODO: Custom Bundles (SLF Subbundles, Callosum Bundle)**</span>

In [None]:
myafq = get_afq(dataset_name)

display(myafq.data_frame)

##### <span style="color:red">NOTE: Number of individuals depends on the number of subjects in `bids_path`</span>

### Segmentation

Confirm AFQ derivatives exist for each subject in the dataset

##### <span style="color:red">NOTE: Processed HCP data is available from S3</span>

https://s3.console.aws.amazon.com/s3/buckets/profile-hcp-west?region=us-west-2&prefix=hcp_reliability/

In [None]:
# check if there are any subjects in our subject list that are not already in afq derivatives
# (set difference; `>` would test for a proper superset instead)
segment = bool(set(subjects) - set(myafq.subjects))

print('segment:', segment)

if not segment: 
    results_dirs = {}

    for subject in subjects:
        iloc = get_iloc(myafq, subject)
        results_dirs[subject] = myafq.data_frame.iloc[iloc]['results_dir']

    with pd.option_context('display.max_colwidth', None):
        display(pd.Series(results_dirs).to_frame('results directory'))

    # NOTE: this logic is incorrect -- checking that the afq derivatives results dirs exist is insufficient;
    # must also check whether they contain files
    segment = (not np.all([os.path.isdir(results_dir) for results_dir in results_dirs.values()]))

print('segment:', segment)

if segment:
    if dataset_name == 'HARDI':
        myafq.export_all()
    else:
        for subject in subjects:
            subject_base_dir = op.join(get_afq_home(dataset_name), f'sub-{subject}')

            # Create afq derivatives and subject directories if they do not exist
            os.makedirs(subject_base_dir, exist_ok=True)

            hcp_s3_url = get_hcp_s3_url(dataset_name, subject)

            # fetch hcp data from s3
            !aws s3 cp {hcp_s3_url} {subject_base_dir} --recursive
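
The NOTE in the cell above points out that an existing results directory may still be empty. A stricter check could require that the directory contains at least one file at any depth, as in the sketch below. `dir_has_files` is a hypothetical helper, and `profiles.csv` is only a placeholder file name; which files a complete AFQ run should contain is not established here.

```python
import os
import tempfile


def dir_has_files(path):
    """Return True if path is a directory containing at least one file, at any depth."""
    if not os.path.isdir(path):
        return False
    return any(filenames for _, _, filenames in os.walk(path))


# Usage, with temporary directories standing in for AFQ results directories
with tempfile.TemporaryDirectory() as tmp:
    empty_dir = os.path.join(tmp, 'sub-empty')
    full_dir = os.path.join(tmp, 'sub-full', 'ses-01')
    os.makedirs(empty_dir)
    os.makedirs(full_dir)
    open(os.path.join(full_dir, 'profiles.csv'), 'w').close()

    checks = {
        'empty': dir_has_files(empty_dir),
        'full': dir_has_files(os.path.join(tmp, 'sub-full')),
        'absent': dir_has_files(os.path.join(tmp, 'sub-absent')),
    }
    print(checks)
```

In the notebook this could replace `os.path.isdir(results_dir)` in the `segment` computation, although a check for specific expected outputs would be stronger still.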

### Visualization

Optional quality control check.

#### Bundles

<span style="color:blue">**TODO: Left and Right hemispheres appear flipped between HARDI and HCP**</span>

In [None]:
show_bundles = False

if show_bundles:
    # build the figures once rather than once per subject
    bundle_figures = myafq.viz_bundles(export=True, n_points=50)

    for subject in subjects:
        iloc = get_iloc(myafq, subject)
        plotly.io.show(bundle_figures[iloc])

#### Tract Profiles

##### <span style="color:blue">TODO: `plot_tract_profiles` failing with `KeyError` for `HCP` and `HCP_retest`</span>

In [None]:
show_tract_profiles = False

if show_tract_profiles:
    myafq.plot_tract_profiles()
    display(Image(filename=myafq.data_frame['tract_profiles_viz'][0][0] + '.png', width=500))