pyebm - A toolbox for Event Based Models

The event-based model (EBM) for data-driven disease progression modeling estimates the sequence in which biomarkers for a disease become abnormal. This helps in understanding the dynamics of disease progression and facilitates early diagnosis by staging patients on a disease progression timeline. A more accurate and scalable EBM algorithm (Discriminative EBM) was introduced in [2].

Call ebm.fit to find the central ordering of a set of biomarkers using method [1].

Call debm.fit to find the central ordering of a set of biomarkers using method [2].
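A minimal sketch of choosing between the two entry points. It assumes that both modules can be imported as `from pyebm import debm, ebm` and that `fit` accepts the data through a keyword named `DataIn`; neither detail is confirmed here, and the helper name `fit_central_ordering` is hypothetical.

```python
def fit_central_ordering(data_in, method="debm"):
    """Fit an event-based model with method [1] ('ebm') or [2] ('debm')."""
    if method not in ("ebm", "debm"):
        raise ValueError("method must be 'ebm' or 'debm'")
    # Imported lazily so this sketch can be loaded without pyebm installed.
    # ASSUMPTION: both modules are importable from the `pyebm` package.
    from pyebm import debm, ebm
    model = debm if method == "debm" else ebm
    # ASSUMPTION: `fit` takes the data through a keyword named DataIn.
    return model.fit(DataIn=data_in)
```

With pyebm installed, `fit_central_ordering("ADNI_7.csv")` would run the discriminative variant on the example file, provided the file is in the working directory.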

EBM and its variants typically consist of two steps.

Step 1: Fit a mixture model to estimate the biomarker distributions in the normal and abnormal classes.

Step 2: Estimate a mean ordering of the biomarkers.

This toolbox supports 3 different Gaussian mixture models.

  1. Algorithm proposed in [3] by Alexandra Young et al.
  2. Algorithm proposed in [2] by Vikram Venkatraghavan et al.
  3. Algorithm proposed in [4] by Vikram Venkatraghavan et al.
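To illustrate what a Gaussian mixture model provides in Step 1, the sketch below computes the posterior probability that a biomarker value belongs to the abnormal component of a two-component mixture. It is a generic textbook calculation and not the fitting procedure of [2], [3] or [4]; the component parameters and the mixing weight are made-up values.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def prob_abnormal(x, normal, abnormal, mix_abnormal=0.5):
    """Posterior probability that x comes from the abnormal component.

    `normal` and `abnormal` are (mean, std) pairs; `mix_abnormal` is the
    mixing weight of the abnormal component.
    """
    p_normal = (1.0 - mix_abnormal) * gaussian_pdf(x, *normal)
    p_abnormal = mix_abnormal * gaussian_pdf(x, *abnormal)
    return p_abnormal / (p_normal + p_abnormal)

# Hypothetical, already-fitted components for one biomarker.
normal_params = (0.0, 1.0)
abnormal_params = (3.0, 1.0)
for value in (0.0, 1.5, 3.0):
    print(value, round(prob_abnormal(value, normal_params, abnormal_params), 3))
```

A value midway between two equally weighted components of equal width gets a probability of 0.5; values closer to the abnormal mean get probabilities closer to 1.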

Required Libraries

Python 2.7 / 3.4, numpy 1.13, pandas 0.20, sklearn 0.19, scipy 0.18, seaborn 0.8, statsmodels 0.8

Explanation of Inputs:

DataIn:

Path (as a string) to the CSV file where the data is stored, or a pandas DataFrame containing the same data. The CSV file or DataFrame must contain the following fields: PTID (patient ID), Diagnosis (clinical label), the biomarkers, the confounding factors, and EXAMDATE (date of examination). See ADNI_7.csv for an example.
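The following sketch checks that a CSV header contains the fields listed above before the data is handed to the toolbox. The sample rows, the biomarker column names (`Hippocampus`, `Entorhinal`), the date format and the encoding of `Sex` are all invented for illustration; only PTID, Diagnosis, EXAMDATE and the default factor names come from this README.

```python
import csv
import io

REQUIRED_FIELDS = ["PTID", "Diagnosis", "EXAMDATE"]

# Hypothetical sample data; column values and biomarker names are made up.
SAMPLE_CSV = """PTID,Diagnosis,EXAMDATE,Age,Sex,ICV,Hippocampus,Entorhinal
001,CN,2010-01-15,71.2,F,1450000,7600,3900
002,AD,2010-03-02,76.8,M,1580000,5900,2800
"""

def missing_fields(csv_text, factors=("Age", "Sex", "ICV")):
    """Return the required or confounding-factor columns absent from the header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    wanted = list(REQUIRED_FIELDS) + list(factors)
    return [name for name in wanted if name not in header]

print(missing_fields(SAMPLE_CSV))  # prints []
```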

(optional) MethodOptions:

Named Tuple with any or all of the following fields:

  • MixtureModel - Choose the mixture model algorithm (Options: 'vv1' [2], 'vv2' (default) [4], 'ay' [3]).
  • Bootstrap - Number of iterations in the bootstrapping [default - Turned Off].
  • PatientStaging - Choose the patient staging algorithm, with a two-element list consisting of ['exp'/'ml', 'p'/'l']. The first element in the list chooses 'ml' for most likely stage [1,2,3] or 'exp' for expected stage [4]. The second element in the list chooses 'l' for likelihood [1,2,3] or 'p' for posterior probability [4].
  • (Only in EBM.Control) NStartpoints, Niterations and N_MCMC are algorithm-specific parameters for the EBM method.
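A named tuple with these fields could be built with the standard library as sketched below. The type name `MethodOptions`, the use of `0` to mean that bootstrapping is turned off, and the assumption that the toolbox accepts a named tuple instance in this form are not confirmed by this README.

```python
from collections import namedtuple

# Field names are taken from the list above; any subset may be supplied.
MethodOptions = namedtuple(
    "MethodOptions", ["MixtureModel", "Bootstrap", "PatientStaging"]
)

method_options = MethodOptions(
    MixtureModel="vv2",            # default mixture model [4]
    Bootstrap=0,                   # ASSUMPTION: 0 means bootstrapping is off
    PatientStaging=["exp", "p"],   # expected stage from posterior probability
)

print(method_options)
```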

(optional) VerboseOptions:

Named Tuple with any or all of the following fields:

  • Distributions - plots biomarker distributions [default - Turned Off]
  • Ordering - plots the central ordering as a positional variance diagram [default - Turned Off].
  • PlotOrder - plots the positional variance diagram with mean positions along the main diagonal [default - Turned Off]. This is used only when Ordering is turned on.
  • WriteBootstrapData - String which specifies the location and name of the files to save the data used in different bootstrap iterations. [default - Turned Off]
  • PatientStaging - plots the patient stages of subjects in different classes. [default - Turned Off]
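Because PlotOrder only has an effect when Ordering is turned on, a small check like the one below can catch option combinations that would be silently ignored. It assumes the flags are given as 0/1 integers, which this README does not state; the helper `ignored_options` is hypothetical and not part of the toolbox.

```python
from collections import namedtuple

VerboseOptions = namedtuple(
    "VerboseOptions", ["Distributions", "Ordering", "PlotOrder", "PatientStaging"]
)

def ignored_options(options):
    """Return names of options that are set but would have no effect."""
    ignored = []
    # PlotOrder is documented as used only when Ordering is turned on.
    if getattr(options, "PlotOrder", 0) and not getattr(options, "Ordering", 0):
        ignored.append("PlotOrder")
    return ignored

# ASSUMPTION: 1 turns a plot on and 0 turns it off.
verbose_options = VerboseOptions(
    Distributions=0, Ordering=1, PlotOrder=1, PatientStaging=0
)
print(ignored_options(verbose_options))  # prints []
```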

(optional) Factors:

Confounding factors used for correcting the biomarkers. By default, these are Age, Sex and ICV (intracranial volume).

(optional) Labels:

List of clinical labels in the dataset. By default, these are CN, MCI and AD.

(optional) DataTest:

If given, DataTest will be used as a test-set to evaluate the disease progression model obtained using DataIn.
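The optional inputs can be collected into keyword arguments before calling `fit`, as in the sketch below. It assumes that `fit` accepts keywords named exactly as the inputs are named in this section (`DataIn`, `DataTest`, `Factors`, `Labels`) and that `debm` is importable from `pyebm`. The helpers and the file names `train.csv` and `test.csv` are hypothetical.

```python
def build_fit_kwargs(data_in, data_test=None, factors=None, labels=None):
    """Collect only the inputs that were given, leaving the rest at defaults."""
    kwargs = {"DataIn": data_in}
    if data_test is not None:
        kwargs["DataTest"] = data_test
    if factors is not None:
        kwargs["Factors"] = list(factors)
    if labels is not None:
        kwargs["Labels"] = list(labels)
    return kwargs

def fit_debm(**kwargs):
    """Call debm.fit with the collected inputs (requires pyebm)."""
    # ASSUMPTION: `debm` is importable from `pyebm`; imported lazily so the
    # sketch loads without the package installed.
    from pyebm import debm
    return debm.fit(**kwargs)

# Hypothetical file names; correct for Age and Sex only, and use two labels.
fit_kwargs = build_fit_kwargs(
    "train.csv", data_test="test.csv", factors=["Age", "Sex"], labels=["CN", "AD"]
)
print(fit_kwargs)
```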

Explanation of Outputs:

ModelOutput:

A structure with the following fields:

  • BiomarkerList - List of Biomarkers used in EBM
  • BiomarkerParameters - Mixture Model parameters for the biomarkers
  • CentralOrderings - Central ordering in each bootstrap iteration. When bootstrapping is turned off, this gives the central ordering for the entire dataset.
  • MeanCentralOrdering - Mean central ordering across the bootstrap iterations. When bootstrapping is turned off, this is the same as CentralOrderings.
  • EventCenters - Event centers, which determine how close the events are to each other.
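As an illustration of how orderings from several bootstrap iterations can be summarised, the sketch below counts how often each biomarker lands at each position, which is the information a positional variance diagram displays. It assumes each ordering is a sequence of zero-based indices into BiomarkerList; the actual format of CentralOrderings is not specified here, and the biomarker names and orderings are invented.

```python
def positional_counts(orderings, n_biomarkers):
    """counts[b][p] = number of orderings that place biomarker b at position p."""
    counts = [[0] * n_biomarkers for _ in range(n_biomarkers)]
    for ordering in orderings:
        for position, biomarker in enumerate(ordering):
            counts[biomarker][position] += 1
    return counts

# Hypothetical stand-ins for ModelOutput.BiomarkerList and
# ModelOutput.CentralOrderings (four bootstrap iterations).
biomarker_list = ["Hippocampus", "Entorhinal", "Ventricles"]
central_orderings = [[1, 0, 2], [1, 0, 2], [0, 1, 2], [1, 2, 0]]

counts = positional_counts(central_orderings, len(biomarker_list))
for name, row in zip(biomarker_list, counts):
    print(name, row)
```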

SubjTrainAll:

A list where each element is a pandas dataframe corresponding to a different bootstrap iteration. Each dataframe consists of the following fields:

  • PTID - patient identifiers used in training
  • Ordering - Subject-wise orderings of the subjects used for training the model
  • Weights - Probabilistic weights for each position in the subject-wise ordering
  • Stages - Staging of each subject in the training dataset.
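The difference between the 'ml' (most likely stage) and 'exp' (expected stage) choices of the PatientStaging option can be sketched as follows. This is a generic illustration on a made-up vector of per-stage probabilities with stages numbered from 0; it is not the toolbox's staging code.

```python
def most_likely_stage(stage_probs):
    """Index of the stage with the highest probability ('ml')."""
    return max(range(len(stage_probs)), key=lambda k: stage_probs[k])

def expected_stage(stage_probs):
    """Probability-weighted mean stage ('exp'); normalises the weights first."""
    total = sum(stage_probs)
    return sum(k * p for k, p in enumerate(stage_probs)) / total

# Hypothetical probabilities that one subject is at stages 0 to 4.
stage_probs = [0.05, 0.15, 0.40, 0.30, 0.10]
print(most_likely_stage(stage_probs), expected_stage(stage_probs))
```

The most likely stage is always an integer, while the expected stage can fall between stages, which gives a finer-grained position on the disease timeline.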

SubjTestAll:

A list where each element is a pandas dataframe corresponding to a different bootstrap iteration. Each dataframe consists of the following fields:

  • PTID - patient identifiers used in testing
  • Ordering - Subject-wise orderings of the subjects used for testing the model
  • Weights - Probabilistic weights for each position in the subject-wise ordering
  • Stages - Staging of each subject in the testing dataset.

References:

[1] Fonteijn, H.M., et al., ‘An event-based model for disease progression and its application in familial Alzheimer's disease and Huntington's disease’, NeuroImage 60(3), 1880–1889 (2012).

[2] Venkatraghavan, V., et al., ‘A Discriminative Event Based Model for Alzheimer's Disease Progression Modeling’, IPMI (2017).

[3] Young, A.L., et al., ‘A data-driven model of biomarker changes in sporadic Alzheimer's disease’, Brain 137(9), 2564–2577 (2014).

[4] Venkatraghavan, V., et al., ‘Disease Progression Timeline Estimation for Alzheimer's Disease using Discriminative Event Based Modeling’, Manuscript Submitted.

Contact:

v.venkatraghavan@erasmusmc.nl