
Parkinson's disease detection using Causal Forest for dimensionality reduction


GabrielSolana29/fmri_causalForest


Causal Forest Machine Learning Analysis of Parkinson's Disease in resting-state Functional Magnetic Resonance Imaging

OS: Linux, Windows | Python: 3.10.9

About the project

This work presents a methodology for analyzing resting-state functional Magnetic Resonance Imaging (fMRI) of Parkinson's disease (PD) patients. The images are pre-processed, and features are extracted from the time series of activations in the brain's Regions of Interest (ROIs). A pipeline that combines Causal Forest and Wrapper Feature Subset Selection (WFSS) then performs dimensionality reduction and subset selection. Finally, the relations between the ROIs and the classes are visualized with bubble plots and Multiple Correspondence Analysis (MCA).

Requirements

Libraries

  • Python 3.9
  • pandas 2.0.2
  • numpy 1.24.4
  • scikit-learn 1.2.2
  • scipy 1.11.0
  • econml 0.14.1
  • statsmodels 0.14.0
  • xgboost 1.7.6
  • prince 0.7.1
  • matplotlib 3.7.1
  • shap 0.41.0
  • mlxtend 0.22.0

Dataset

The data used in this project comes from fMRI scans obtained from the Parkinson's Progression Markers Initiative (PPMI) https://www.ppmi-info.org/ and the 1000 Functional Connectomes Project https://www.nitrc.org/projects/fcon_1000/. Both datasets are publicly available. The data included here contains no information that could identify the patients and has already gone through the pre-processing stage explained in the paper. The CSV folder contains the following files:

  • activations.csv: Contains the time series for each Region of Interest, obtained from the changes in the gray level of the fMRI of patients (class 1) and controls (class 0).
  • features.csv: Contains the features extracted from the time series for each patient and control. In total there are 11600 features per patient.
  • features_female_control_vs_allcausal_features.csv: Best-ranked features from Causal Forest across multiple iterations for healthy female controls.
  • features_female_PD_vs_allcausal_features.csv: Best-ranked features from Causal Forest across multiple iterations for female PD patients.
  • features_male_control_vs_allcausal_features.csv: Best-ranked features from Causal Forest across multiple iterations for healthy male controls.
  • features_male_PD_vs_allcausal_features.csv: Best-ranked features from Causal Forest across multiple iterations for male PD patients.
  • mca_pd_control_female_male.csv: Contingency table in which each row corresponds to a group and each column to a brain region. A value of 1 signals a causal effect; 0 indicates no causal effect.
  • mca_updrs_pd_female_male.csv: Contingency table in which each row corresponds to a group and each column to a brain region. A value of 1 signals a causal effect; 0 indicates no causal effect.

For more information on the datasets please refer to the article.
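To make the relationship between activations.csv and features.csv more concrete, the following is a minimal sketch of how per-ROI time series could be turned into statistical, frequency-based and connectivity-based features. It is not the repository's feature_extraction.py: the function name `extract_features`, the feature names, the array layout (time points by ROIs) and the sampling rate `fs` are all assumptions made for illustration, and the sketch produces only a handful of features rather than the 11600 described above.

```python
import numpy as np
from scipy import signal, stats


def extract_features(ts, fs):
    """Toy feature extraction for one subject.

    ts : array of shape (n_timepoints, n_rois), one column per ROI (assumed layout).
    fs : sampling rate in Hz (assumed; for fMRI this would be 1 / TR).
    """
    feats = {}
    n_rois = ts.shape[1]

    for i in range(n_rois):
        x = ts[:, i]
        # Statistical features of the ROI time series.
        feats[f"roi{i}_mean"] = float(np.mean(x))
        feats[f"roi{i}_std"] = float(np.std(x, ddof=1))
        feats[f"roi{i}_skew"] = float(stats.skew(x))
        # Frequency-based feature: dominant frequency, ignoring the DC bin.
        freqs, psd = signal.periodogram(x, fs=fs)
        feats[f"roi{i}_peak_freq"] = float(freqs[np.argmax(psd[1:]) + 1])

    # Connectivity-based features: Pearson correlation between ROI pairs.
    corr = np.corrcoef(ts, rowvar=False)
    for i in range(n_rois):
        for j in range(i + 1, n_rois):
            feats[f"conn_roi{i}_roi{j}"] = float(corr[i, j])

    return feats


# Synthetic example: three ROIs oscillating at 0.05 Hz, sampled every 2 s.
t = np.arange(200) * 2.0
toy_ts = np.column_stack([
    np.sin(2 * np.pi * 0.05 * t),
    -np.sin(2 * np.pi * 0.05 * t),
    np.cos(2 * np.pi * 0.05 * t),
])
toy_features = extract_features(toy_ts, fs=0.5)
print(len(toy_features), "features extracted")
```

Applying such a function to every subject and stacking the resulting dictionaries into a pandas DataFrame would give one row per subject, which is the general shape described for features.csv.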

Feature selection with Causal Forest and Wrapper Feature Subset Selection

  • feature_extraction.py: Loads the file "activations.csv" and extracts 11600 features per patient, including statistical, frequency-based and connectivity-based features. It generates the file "features.csv".
  • causal_feature_selection.py: Loads the file "features.csv" and performs feature selection using Causal Forest and a custom function which combines Causal Forest and WFSS, providing both the rankings of the features, and the best subset obtained by the WFSS algorithm.
  • bubble_plots.py: Loads the files containing the best-ranked features from multiple iterations of the Causal Forest algorithm. Each file corresponds to a two-class split of one group against the rest of the observations: (1) female PD patients, (2) female controls, (3) male PD patients, and (4) male controls. Bubble plots are generated to visualize the importance and frequency of appearance of the selected features.
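The wrapper step described above can be illustrated with a short sketch. It assumes that a ranking step (Causal Forest in this project) has already produced a list of candidate columns, and then runs a forward sequential search that keeps the subset with the best cross-validated accuracy. This is not the code in causal_feature_selection.py: it uses scikit-learn's `SequentialFeatureSelector` as a stand-in, the classifier, the forward search direction and the helper name `wrapper_select` are assumptions, and the data is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def wrapper_select(X, y, candidate_idx, n_select, cv=5):
    """Forward wrapper selection restricted to pre-ranked candidate columns.

    candidate_idx : column indices kept by an earlier ranking step
                    (hypothetical input; here it stands in for a Causal Forest ranking).
    n_select      : size of the subset to return (must be < len(candidate_idx)).
    """
    candidate_idx = np.asarray(candidate_idx)
    # The wrapped model is an assumption; any scikit-learn classifier would work.
    estimator = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    selector = SequentialFeatureSelector(
        estimator,
        n_features_to_select=n_select,
        direction="forward",
        scoring="accuracy",
        cv=cv,
    )
    selector.fit(X[:, candidate_idx], y)
    # Map the boolean mask back to column indices of the full feature matrix.
    return candidate_idx[selector.get_support()]


# Synthetic stand-in for features.csv: 120 subjects, 20 features, binary class.
X_toy, y_toy = make_classification(
    n_samples=120, n_features=20, n_informative=3, n_redundant=0,
    shuffle=False, random_state=0,
)
top_ranked = list(range(8))  # pretend these came out of the ranking step
chosen = wrapper_select(X_toy, y_toy, top_ranked, n_select=3)
print("selected columns:", chosen)
```

Restricting the wrapper search to a pre-ranked candidate list is what keeps it tractable: a sequential search over thousands of raw features would require far more model fits than a search over a short list.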

Multiple Correspondence Analysis

  • multiple_correspondence_analysis.py: Performs Multiple Correspondence Analysis to better visualize the relation between the ROIs and the classes. Loads a contingency table from one of the two provided files ("mca_pd_control_female_male.csv" or "mca_updrs_pd_female_male.csv").
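For readers unfamiliar with the technique, the sketch below shows the core computation behind Multiple Correspondence Analysis on a 0/1 table of groups by brain regions: the table is expanded into an indicator matrix, and a correspondence analysis is carried out through a singular value decomposition of the standardized residuals. It is written with NumPy and pandas only and is not the repository's script, which may rely on a dedicated library instead. The function name `mca_coordinates`, the group labels and the toy values are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def mca_coordinates(table, n_components=2):
    """Return row coordinates, category coordinates and principal inertias."""
    # Indicator matrix: every 0/1 column becomes one-hot "<region>_0" / "<region>_1".
    indicator = pd.get_dummies(table.astype(str)).astype(float)
    Z = indicator.to_numpy()

    P = Z / Z.sum()            # correspondence matrix
    r = P.sum(axis=1)          # row masses
    c = P.sum(axis=0)          # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)  # standardized residuals

    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    k = n_components
    # Principal coordinates of the rows (groups) and columns (region categories).
    rows = U[:, :k] * sigma[:k] / np.sqrt(r)[:, None]
    cols = Vt.T[:, :k] * sigma[:k] / np.sqrt(c)[:, None]

    names = [f"dim{i + 1}" for i in range(k)]
    return (
        pd.DataFrame(rows, index=table.index, columns=names),
        pd.DataFrame(cols, index=indicator.columns, columns=names),
        sigma ** 2,
    )


# Toy table with made-up values: 1 marks a causal effect for that group and region.
toy_table = pd.DataFrame(
    {"roi_a": [1, 0, 1, 0], "roi_b": [1, 1, 0, 0], "roi_c": [0, 1, 1, 0]},
    index=["female_PD", "female_control", "male_PD", "male_control"],
)
row_coords, col_coords, inertias = mca_coordinates(toy_table)
print(row_coords)
```

Plotting `row_coords` and `col_coords` on the same axes gives the usual MCA map, in which groups and region categories that lie close together tend to co-occur.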
