WASAA 2023 - Resources for projects 1 and 2
--

These resources are common to the two projects:
- "Classification of patients with Autism Spectrum Disorder using spontaneous brain activity" 
- "Networks in brain pathologies using spontaneous brain activity"

You can run experiments on two datasets: ABIDE and ADHD.

ADHD is probably the easier one to work with, as it contains only 40 subjects.

ADHD dataset
--

To download the ADHD dataset, use the nilearn fetcher ([documentation](https://nilearn.github.io/stable/modules/generated/nilearn.datasets.fetch_adhd.html)), as done in the next cell. A few comments:
- The fetcher first checks whether the dataset has already been downloaded, and skips the download if the files are detected.
- Only two subjects are downloaded here to keep the download short, but there are 40 subjects in total.
- The `data_dir` argument can be changed. There are only two relevant options: the network drive (leave `data_dir = './'`) or the drive of the local machine you are working on (`data_dir = '/users/local/'`). The local drive is faster and has much more space, but you will have to download the data again if you switch to another machine.
- The official page of the dataset is [here](http://fcon_1000.projects.nitrc.org/indi/adhd200/index.html).
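
If you move between machines, a small helper can make the choice of `data_dir` explicit. This is only a sketch: the function name `pick_data_dir` is hypothetical, and it assumes that `/users/local/` is the local drive described above and `./` the network drive.

```python
import os

def pick_data_dir(local_dir="/users/local/", fallback="./"):
    """Return local_dir if it exists and is writable, otherwise fallback."""
    if os.path.isdir(local_dir) and os.access(local_dir, os.W_OK):
        return local_dir
    return fallback

# Use the local drive when available, the network drive otherwise
data_dir = pick_data_dir()
print(data_dir)
```

The resulting `data_dir` can then be passed to the fetcher in place of the hard-coded path.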

In [6]:
from nilearn.datasets import fetch_adhd

# Download only two of the 40 subjects to keep the download short
n_subjects = 2
dataset = fetch_adhd(n_subjects=n_subjects, data_dir='./')

In [10]:
dataset.keys()

dict_keys(['func', 'confounds', 'phenotypic', 'description'])

'func' and 'confounds' are lists of files; each element of a list corresponds to one subject. 'func' is the functional MRI data file, and 'confounds' is a text file containing an array of nuisance variables that need to be regressed out, as explained in [this tutorial](https://nilearn.github.io/stable/auto_examples/03_connectivity/plot_signal_extraction.html#sphx-glr-auto-examples-03-connectivity-plot-signal-extraction-py).
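
To make "regressed out" concrete, here is a minimal NumPy sketch of confound regression by ordinary least squares, run on synthetic data. It only illustrates the idea: the helper name `regress_out` and all the array sizes are made up for the example, and in practice the nilearn masker takes care of this step when you give it the confounds file.

```python
import numpy as np

def regress_out(signals, confounds):
    """Remove the part of each signal that is linearly explained by the confounds.

    signals:   array of shape (n_timepoints, n_regions)
    confounds: array of shape (n_timepoints, n_confounds)
    """
    # Add an intercept column so that the mean is removed as well
    design = np.column_stack([np.ones(len(confounds)), confounds])
    # Least-squares fit of every region's time series on the design matrix
    beta, *_ = np.linalg.lstsq(design, signals, rcond=None)
    # The residuals are the cleaned signals
    return signals - design @ beta

rng = np.random.default_rng(0)
n_timepoints, n_regions, n_confounds = 120, 5, 3
confounds = rng.standard_normal((n_timepoints, n_confounds))
neural = rng.standard_normal((n_timepoints, n_regions))
# Synthetic signals contaminated by a linear mixture of the confounds
signals = neural + confounds @ rng.standard_normal((n_confounds, n_regions))

cleaned = regress_out(signals, confounds)
# Largest remaining cross-product with a confound (numerically close to zero)
print(np.abs(confounds.T @ cleaned).max())
```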

In [11]:
print(dataset['description'])

ADHD 200


Notes
-----
Part of the 1000 Functional Connectome Project. Phenotypic
information includes: diagnostic status, dimensional ADHD symptom measures,
age, sex, intelligence quotient (IQ) and lifetime medication status.
Preliminary quality control assessments (usable vs. questionable) based upon
visual timeseries inspection are included for all resting state fMRI scans.

Includes preprocessed data from 40 participants.

Project was coordinated by Michael P. Milham.

Content
-------
    :'func': Nifti images of the resting-state data
    :'phenotypic': Explanations of preprocessing steps
    :'confounds': CSV files containing the nuisance variables

References
----------
For more information about this dataset's structure:
http://fcon_1000.projects.nitrc.org/indi/adhd200/index.html

Licence: usage is unrestricted for non-commercial research purposes.



In [12]:
dataset['confounds']

['./adhd/data/0010042/0010042_regressors.csv',
 './adhd/data/0010064/0010064_regressors.csv']

In [13]:
dataset['func']

['./adhd/data/0010042/0010042_rest_tshift_RPI_voreg_mni.nii.gz',
 './adhd/data/0010064/0010064_rest_tshift_RPI_voreg_mni.nii.gz']

Finally, 'phenotypic' gives other information on the subjects. For example, here is the phenotypic information for the first subject:

In [28]:
dataset['phenotypic'][0]

(b'"21"', 10042, b'"rest_1"', 0.0559, 0, 0.2365, 0.0922, 0., 2.2915, 1.0089, b'"NYU"', b'NA', b'"data_set"', 10.65, b'"M"', b'"0.91"', b'NA', b'108', b'100', b'115', b'2', 0, 1, b'NA', b'NA', b'NA', b'""', b'""', b'""', b'NA', b'NA', b'NA', b'NA', b'NA', b'NA', b'NA', b'NA', b'NA', b'NA', b'NA', b'59', b'56', b'65', b'NA', b'"pass"', b'""', b'"pass"', b'""', b'""', b'""', b'NA', b'NA', b'NA', b'NA', b'NA', b'NA', b'"pass"', b'NA', b'NA', b'NA', b'NA', b'NA', b'NA', b'""', b'""')

The field names below tell you what each of these values corresponds to:

In [33]:
dataset['phenotypic'].dtype

dtype([('f0', 'S4'), ('Subject', '<i8'), ('RestScan', 'S8'), ('MeanFD', '<f8'), ('NumFD_greater_than_020', '<i8'), ('rootMeanSquareFD', '<f8'), ('FDquartiletop14thFD', '<f8'), ('PercentFD_greater_than_020', '<f8'), ('MeanDVARS', '<f8'), ('MeanFD_Jenkinson', '<f8'), ('site', 'S12'), ('sibling_id', 'S7'), ('data_set', 'S10'), ('age', '<f8'), ('sex', 'S3'), ('handedness', 'S6'), ('full_2_iq', 'S3'), ('full_4_iq', 'S3'), ('viq', 'S3'), ('piq', 'S3'), ('iq_measure', 'S2'), ('tdc', '<i8'), ('adhd', '<i8'), ('adhd_inattentive', 'S2'), ('adhd_combined', 'S2'), ('adhd_subthreshold', 'S2'), ('diagnosis_using_cdis', 'S12'), ('notes', 'S24'), ('sess_1_anat_2', 'S6'), ('oppositional', 'S2'), ('cog_inatt', 'S2'), ('hyperac', 'S2'), ('anxious_shy', 'S2'), ('perfectionism', 'S2'), ('social_problems', 'S2'), ('psychosomatic', 'S2'), ('conn_adhd', 'S2'), ('restless_impulsive', 'S2'), ('emot_lability', 'S2'), ('conn_gi_tot', 'S2'), ('dsm_iv_inatt', 'S2'), ('dsm_iv_h_i', 'S2'), ('dsm_iv_tot', 'S2'), ('stu

We are probably only interested in `age`, `sex`, `handedness`, `iq_measure`, `tdc`, `adhd`, `adhd_inattentive`, `adhd_combined`, `adhd_subthreshold` and `diagnosis_using_cdis`. You will have to extract the corresponding fields from the phenotypic record of each subject.
The other values correspond to specific preprocessing steps and can be ignored.

The coding of some of the fields is described in [this file](http://fcon_1000.projects.nitrc.org/indi/adhd200/general/ADHD-200_PhenotypicKey.pdf)  (Key explaining the values used to code site, gender, handedness, diagnosis, ADHD measure, IQ measure, medication status and quality control in each sample's phenotypic.csv file.) 
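
The field extraction described above can be sketched as follows. The array here is a tiny synthetic stand-in for `dataset['phenotypic']` with only a few of the real fields: the first row copies values of the subject printed earlier, the second row is invented. The helper `decode` is hypothetical. This assumes that `phenotypic` is a NumPy structured array, as in the output shown above; if your nilearn version returns a pandas DataFrame instead, select the columns by name directly.

```python
import numpy as np

# Synthetic stand-in for dataset['phenotypic'] (second row is invented)
phenotypic = np.array(
    [(10042, 10.65, b'"M"', 0, 1),
     (99999, 12.30, b'"F"', 1, 0)],
    dtype=[("Subject", "<i8"), ("age", "<f8"), ("sex", "S3"),
           ("tdc", "<i8"), ("adhd", "<i8")],
)

def decode(value):
    """Turn a bytes value such as b'"M"' into 'M'; leave numbers untouched."""
    if isinstance(value, bytes):
        return value.decode().strip('"')
    return value

fields = ["age", "sex", "tdc", "adhd"]
# Indexing a structured array by field name returns that field for all subjects
columns = {name: [decode(v) for v in phenotypic[name]] for name in fields}

# Binary label vector for classification: 1 = ADHD, 0 = typically developing
labels = np.asarray(phenotypic["adhd"])

print(columns["sex"])
print(labels)
```

Fields stored as bytes keep missing values as the string `NA` after decoding, so they need to be handled before converting such a field to numbers.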

ABIDE dataset
--
Unfortunately, automatic fetching of the ABIDE data does not seem to work at the moment (probably because an S3 server is down).

Next steps
--

- Parcellate the data: use a masker to extract one time series per region of the parcellation. You can also pass the confounds CSV file to the masker to regress out the confounds while masking.
- Estimate a functional connectivity matrix for each subject.
- Project 1: extract the upper triangular values of this matrix as a vector (using [triu](https://numpy.org/doc/stable/reference/generated/numpy.triu.html)), and use supervised learning (e.g. KNN) to try to differentiate subjects based on phenotype (e.g. ADHD diagnosis or other variables).
- Project 2: use the matrix to compute graph metrics for each subject with the package bctpy (installable with pip), and try to relate those graph metrics to phenotype using simple statistics (regression, correlation).
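
As a starting point for Project 1, the steps from time series to classification can be sketched on synthetic data with NumPy only. Everything here is an assumption made for illustration: the data are random, `np.corrcoef` stands in for a proper connectivity estimator such as nilearn's `ConnectivityMeasure`, and the hand-written KNN (binary labels, leave-one-out evaluation) stands in for a library classifier such as scikit-learn's `KNeighborsClassifier`. The function names are hypothetical.

```python
import numpy as np

def connectivity_vector(time_series):
    """Correlation matrix of one subject, flattened to its upper triangle.

    time_series: array of shape (n_timepoints, n_regions)
    """
    corr = np.corrcoef(time_series, rowvar=False)  # (n_regions, n_regions)
    # k=1 skips the diagonal, which is always 1 and carries no information
    rows, cols = np.triu_indices_from(corr, k=1)
    return corr[rows, cols]

def knn_predict(train_X, train_y, test_X, k=3):
    """Majority vote among the k nearest training subjects (labels 0 or 1)."""
    # Euclidean distance between every test and every training subject
    distances = np.linalg.norm(train_X[None, :, :] - test_X[:, None, :], axis=2)
    nearest = np.argsort(distances, axis=1)[:, :k]
    votes = train_y[nearest]  # shape (n_test, k)
    return (votes.mean(axis=1) > 0.5).astype(int)

def leave_one_out_accuracy(X, y, k=3):
    """Predict each subject from all the others and return the hit rate."""
    hits = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        prediction = knn_predict(X[keep], y[keep], X[i:i + 1], k=k)[0]
        hits += int(prediction == y[i])
    return hits / len(y)

# Synthetic data: 20 "subjects", 200 time points, 6 regions.
# For label 1, region 1 also contains the signal of region 0.
rng = np.random.default_rng(0)
y = np.array([0, 1] * 10)
features = []
for label in y:
    ts = rng.standard_normal((200, 6))
    if label == 1:
        ts[:, 1] += 2.0 * ts[:, 0]
    features.append(connectivity_vector(ts))
X = np.array(features)  # one row of 15 connectivity values per subject

accuracy = leave_one_out_accuracy(X, y, k=3)
print(X.shape, accuracy)
```

Note that `np.triu` itself returns a full matrix with the lower triangle set to zero; `np.triu_indices_from` is used here because it directly gives the positions of the upper-triangular entries, which is what a feature vector needs.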

Of course, the project is left open for you to explore. Feel free to test other ideas as well, and to check out papers that have been published on the same dataset. And feel free to reach out by email.

Good luck!