Can We Diagnose Mental Disorders in Children? – A Large-Scale Assessment of Machine Learning on Structural Neuroimaging of 6916 Children in the ABCD Study

Goal: Explore predictability of various psychiatric diagnoses based on neuroimaging features in subjects from the ABCD study.

Getting started

Copy the following files into data/raw/:

From the baseline release of the ABCD study:
```
abcd_ksad01.txt
abcd_ksad501.txt
acspsw03.txt
btsv01.txt
```
A table with FreeSurfer features (you need to run FreeSurfer on the sMRI data of the ABCD Study):
```
abcd_freesurfer.csv
```
A table with processed sociodemographic features. These features are the same as in the ABCD Neurocognitive Prediction Challenge. The respective R code can be found on the challenge website (https://sibis.sri.com/abcd-np-challenge/).
```
sociodem_bl.csv
```
Run python src/runnable/make_dataset.py to process and combine these data into one dataframe. Use the following options:
```
--select-one-child-per-family: Whether to randomly select only one child per family
--seed: Random number seed for selecting one child per family
```
In our article, a seed of 77 was used.

Running the experiments

To fit and obtain training, validation, and test set predictions by the OVR logistic regression, CCE logistic regression, and CCE Bayesian optimized XGBoost models on the processed dataset, run python src/runnable/run_unpermuted.py. Use the following options:
```
--seed: Random number seed (int)
--k: Number of cross validation folds (int, default 5)
--n: Number of successive k-fold CV runs (int)
```

To fit and obtain predictions on random permutations of the processed dataset, run python src/runnable/run_permuted.py using the following options:

--seed: Random number seed (int)
--k: Number of cross validation folds (int, default 5)
--n: Number of successive k-fold CV runs (int)
--num_permutations: Number of random permutations (int)

Note: Running these experiments will take extended amounts of time (about 20 hours for a single repeat of 5-fold cross validation on a fast machine). Consider parallelizing computations on several machines by using different seeds.

In our article, a seed of 77 was used.

Evaluation and visualization

All raw predictions are saved to results/.

Replication of published results

We have provided a table (data/splits.csv) with the subject IDs in the training, validation, and test sets of each fold in our repeated cross validation scheme. You may use it to reproduce the results of our article Can We Predict Mental Disorders in Children? A Large-Scale Assessment of Machine Learning on Structural Neuroimaging of 6916 Children in the ABCD Study.

Project based on the cookiecutter data science project template. #cookiecutterdatascience

Name		Name	Last commit message	Last commit date
Latest commit History 49 Commits
data		data
docs		docs
notebooks		notebooks
reports		reports
src		src
.gitignore		.gitignore
CITATION.cff		CITATION.cff
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
requirements.yaml		requirements.yaml
setup.py		setup.py
test_environment.py		test_environment.py
tox.ini		tox.ini

License

ai-med/abcd_study

Folders and files

Latest commit

History

Repository files navigation

Can We Diagnose Mental Disorders in Children? – A Large-Scale Assessment of Machine Learning on Structural Neuroimaging of 6916 Children in the ABCD Study

Getting started

Running the experiments

Evaluation and visualization

Replication of published results

About

Resources

License

Stars

Watchers

Forks

Languages