Project for CMEPDA course to predict the age of the healthy subjects from brain data using regression models.

SimLoss/brain_age_predictor

brain_age_predictor

This repository contains a project for the Computing Methods for Experimental Physics and Data Analysis (CMEPDA) course.

The aim is to design and implement a regression model that predicts the age of healthy subjects from brain features extracted from T1-weighted MRI images. Data are taken from the well-known ABIDE dataset, which includes subjects affected by Autism Spectrum Disorder (ASD) and healthy control subjects (CTR).

The code allows the user to:

  • visualize and explore ABIDE data;
  • harmonize data by site;
  • train different regression models;
  • compare two alternative approaches to the problem.

The repository is structured as follows:

brain_age_predictor/
├── docs
├── LICENSE
├── dataset/
├── brain_age_predictor/
│   ├── images/
│   ├── metrics/
│   ├── best_estimator/
│   ├── preprocess.py
│   ├── brain_age_pred.py
│   ├── brain_age_site.py
│   ├── grid_CV.py
│   ├── __init__.py
│   ├── variability.py
│   ├── DDNregressor.py
│   └── predict_helper.py
│   
├── README.md
├── requirements.txt
└── tests
    ├── test.py
    └── __init__.py

Data

Data from ABIDE (Autism Brain Imaging Data Exchange) are stored in .csv files inside the brain_age_predictor/dataset folder and are handled with Pandas. The dataset contains 419 brain morphological features (volumes, thickness, area, etc.) of different segmented brain areas (obtained via FreeSurfer software) belonging to 915 male subjects (451 cases, 464 controls), with mean ages of 17.47 ± 0.36 and 17.38 ± 0.40 respectively. The age distribution of subjects, although similar between the CTR and ASD groups, is quite skewed, as shown below:

The age distribution also changes considerably across sites, as shown in the following boxplot:

Site harmonization

On top of these differences, another important confounding factor is the effect of the different acquisition sites on the features. To mitigate this effect, the state-of-the-art harmonization tool neuroHarmonize, implemented by Pomponio et al., has been used.

neuroHarmonize corrects differences introduced by multi-site image acquisition while preserving specified covariates, so harmonization can be safely performed without affecting the age-related biological variability of the dataset. This is particularly important because different sites have different age distributions. The analysis has been conducted on both 'unharmonized' and 'harmonized' data.
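The idea of removing site effects while preserving age can be illustrated with a deliberately simplified sketch. This is not the neuroHarmonize (ComBat-based) algorithm: it only removes an additive per-site offset, estimated jointly with a linear age term, and the function name `remove_site_offsets` and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def remove_site_offsets(features, age, site):
    """Remove additive per-site offsets while keeping a linear age effect.

    Simplified illustration only: neuroHarmonize also rescales variances
    and uses empirical Bayes estimates, which are not reproduced here.
    """
    features = np.asarray(features, dtype=float)
    age = np.asarray(age, dtype=float)
    site_names, site_idx = np.unique(np.asarray(site), return_inverse=True)
    one_hot = np.eye(len(site_names))[site_idx]        # (n_subjects, n_sites)
    design = np.column_stack([age, one_hot])           # age slope + site intercepts
    coef, *_ = np.linalg.lstsq(design, features, rcond=None)
    site_intercepts = coef[1:]                         # (n_sites, n_features)
    # Re-add the average intercept so the overall feature level is kept.
    grand_intercept = one_hot.mean(axis=0) @ site_intercepts
    return features - one_hot @ site_intercepts + grand_intercept


# Synthetic example: two sites with different age ranges and opposite offsets.
rng = np.random.default_rng(0)
age = np.concatenate([rng.uniform(6, 18, 50), rng.uniform(15, 40, 50)])
site = np.array(["A"] * 50 + ["B"] * 50)
offset = np.where(site == "A", 5.0, -5.0)
features = (2.0 * age + offset + rng.normal(scale=0.1, size=100))[:, None]

adjusted = remove_site_offsets(features, age, site)
raw_slope = np.polyfit(age, features[:, 0], 1)[0]
adjusted_slope = np.polyfit(age, adjusted[:, 0], 1)[0]
print(f"age slope before: {raw_slope:.2f}, after: {adjusted_slope:.2f}")
```

Because the two synthetic sites cover different age ranges, the site offset biases the apparent age slope in the raw data. Estimating the offset together with the age term is what lets the age trend survive the correction.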

Analysis

Method

Models have been trained using only control subjects (CTR) and then evaluated separately on the CTR set and on the cases set (ASD). The differences are shown through residual plots in the results available in the /images folder. Because they are very poorly represented (<4%), subjects older than 40 years have been excluded from the present study, similarly to other studies in the field.(1)
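A minimal sketch of this scheme on synthetic data is given below. The 80/20 split of the controls, the single synthetic feature and the ordinary least-squares model are assumptions made for illustration; they are not taken from the repository code.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
age = rng.uniform(6, 60, n)
is_ctr = rng.random(n) < 0.5                      # True = control, False = ASD
# One synthetic feature correlated with age, plus a column of ones for the bias.
X = np.column_stack([np.ones(n), 0.5 * age + rng.normal(scale=1.0, size=n)])

# Discard the poorly represented subjects older than 40 years.
keep = age <= 40
age, is_ctr, X = age[keep], is_ctr[keep], X[keep]

# Train on controls only (hypothetical 80/20 split); keep ASD for evaluation.
ctr_idx = np.flatnonzero(is_ctr)
asd_idx = np.flatnonzero(~is_ctr)
rng.shuffle(ctr_idx)
n_train = int(0.8 * len(ctr_idx))
train_idx, ctr_test_idx = ctr_idx[:n_train], ctr_idx[n_train:]

coef, *_ = np.linalg.lstsq(X[train_idx], age[train_idx], rcond=None)


def mae(idx):
    """Mean absolute error of the fitted model on the given subjects."""
    return float(np.mean(np.abs(X[idx] @ coef - age[idx])))


mae_ctr, mae_asd = mae(ctr_test_idx), mae(asd_idx)
print(f"MAE on held-out CTR: {mae_ctr:.2f}, MAE on ASD: {mae_asd:.2f}")
```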

Pipelines

Two different pipelines have been followed, based on a Leave-One-Site-Out approach:

  • 1) Data have first been separated into train/test sets, using one provenance site as test and the others as train, and then cross-validated with KFold CV.(2)(3)
  • 2) Data have been processed without discrimination based on site and validated through a regular GridSearch CV. Both scikit-learn's models and a custom neural network have been used.
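The leave-one-site-out splitting described in the first item can be sketched with scikit-learn as follows. The synthetic data, the site labels, the Ridge model and its parameter grid are assumptions made for illustration, not the models or settings used in this repository.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, KFold, LeaveOneGroupOut

rng = np.random.default_rng(0)

# Synthetic stand-in for the ABIDE features: 90 subjects, 5 features, 3 sites.
n_subjects, n_features = 90, 5
X = rng.normal(size=(n_subjects, n_features))
age = 20 + 3 * X[:, 0] + rng.normal(scale=0.5, size=n_subjects)
site = np.repeat(["SITE_A", "SITE_B", "SITE_C"], n_subjects // 3)

mae_per_site = {}
# Each iteration holds out every subject of one site as the test set.
for train_idx, test_idx in LeaveOneGroupOut().split(X, age, groups=site):
    # Hyperparameters are tuned with K-fold CV on the training sites only.
    search = GridSearchCV(
        Ridge(),
        param_grid={"alpha": [0.1, 1.0, 10.0]},
        cv=KFold(n_splits=5, shuffle=True, random_state=0),
        scoring="neg_mean_absolute_error",
    )
    search.fit(X[train_idx], age[train_idx])
    held_out_site = str(site[test_idx][0])
    predictions = search.predict(X[test_idx])
    mae_per_site[held_out_site] = mean_absolute_error(age[test_idx], predictions)

for name, value in mae_per_site.items():
    print(f"{name}: MAE = {value:.2f}")
```

Keeping the grid search inside the loop means the held-out site never influences model selection. Dropping the outer loop and running the grid search on all subjects corresponds to the second item.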

Results

Typical regression metrics (MAE, MSE) have been evaluated, and the Pearson correlation coefficient (PR) has also been calculated. For pipeline 1, result plots are collected in the 'images' folder, while fitted models and the related metrics are stored in the 'best_estimator' and 'metrics/grid' folders respectively. Variability plots are stored in the 'images_SITE/grid' folder. For pipeline 2, metrics are stored in 'metrics/site' and summary plots are stored in the 'images_SITE/site' folder.
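For reference, the three metrics can be computed with NumPy alone, as in the sketch below. The helper name `regression_metrics` and the example ages are hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE and the Pearson correlation (PR) of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_pred - y_true
    return {
        "MAE": float(np.mean(np.abs(residuals))),
        "MSE": float(np.mean(residuals ** 2)),
        "PR": float(np.corrcoef(y_true, y_pred)[0, 1]),
    }


# Hypothetical true and predicted ages, in years.
metrics = regression_metrics([10, 20, 30, 40], [12, 18, 33, 39])
print(metrics)  # MAE is 2.0 and MSE is 4.5 for these values
```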

Requirements

To use this Python code, the following packages are required:

  • keras
  • matplotlib
  • neuroHarmonize
  • numpy
  • pandas
  • prettytable
  • scikit-learn
  • scipy
  • seaborn
  • sphinx
  • statsmodels
  • tensorflow

Usage

  • 1) Download the repository from GitHub: git clone https://github.com/Pastiera/brain_age_predictor
  • 2) Change directory: cd path/to/brain_age_predictor/brain_age_predictor
  • 3) The modules brain_age_pred.py, brain_age_site.py, variability.py and preprocess.py can be run from the command line, either with the -h option to display their help message or simply with no arguments. Example of usage (Pipeline 1):

usage: brain_age_pred.py [-h] [-dp DATAPATH] [-grid] [-pred] [-neuroharm] [-verb]

Main module for brain age predictor package.

optional arguments:
  -h, --help            show this help message and exit
  
  -dp,  --datapath DATAPATH
                        Path to the data folder.
  
  -grid, --gridcv       Use GridSearch cross validation to train and fit models.
  
  -pred, --predict      Make predictions with models pre-trained with GridSearchCV.
  
  -neuroharm, --harmonize
                        Use NeuroHarmonize to harmonize data by provenance site.
  
  -verb, --verbose      Set DDN Regressor model's verbosity. If True, it shows model summary.Default = False
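For readers who want to see how these options combine, the sketch below rebuilds an equivalent argparse parser from the help text above and parses an example command line. It is a reconstruction under assumptions, not the module's actual source, and the `../dataset` path is a hypothetical example.

```python
import argparse


def build_parser():
    """Rebuild a parser matching the help message shown above (assumed layout)."""
    parser = argparse.ArgumentParser(
        prog="brain_age_pred.py",
        description="Main module for brain age predictor package.",
    )
    parser.add_argument("-dp", "--datapath", type=str,
                        help="Path to the data folder.")
    parser.add_argument("-grid", "--gridcv", action="store_true",
                        help="Use GridSearch cross validation to train and fit models.")
    parser.add_argument("-pred", "--predict", action="store_true",
                        help="Make predictions with models pre-trained with GridSearchCV.")
    parser.add_argument("-neuroharm", "--harmonize", action="store_true",
                        help="Use NeuroHarmonize to harmonize data by provenance site.")
    parser.add_argument("-verb", "--verbose", action="store_true",
                        help="Set DDN Regressor model's verbosity.")
    return parser


# Equivalent to: python brain_age_pred.py -dp ../dataset -grid -neuroharm
args = build_parser().parse_args(["-dp", "../dataset", "-grid", "-neuroharm"])
print(args)
```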


Pre-trained models in /best_estimator can be run for reproducibility, and newly trained models will be saved in the same folder. If no fitted model is present in this folder, brain_age_pred.py must be run first in order to use variability.py. Result plots are collected in the 'images' or 'images_site' folder, while fitted models and the related metrics are stored in the 'best_estimator' and 'metrics' folders respectively.

References
