EHR DREAM Challenge
The EHR DREAM Challenge is a series of community challenges to pilot and develop a predictive analytic ecosystem within the healthcare system.
Evaluation of predictive models in the clinical space is an ever-growing need for the Learning Health System. Healthcare institutions are attempting to move away from a rules-based approach to clinical care toward a more data-driven model of care. To achieve this, machine learning algorithms are being developed to aid physicians in clinical decision making. However, a key limitation in the adoption and widespread deployment of these algorithms into clinical practice is the lack of rigorous assessments and clear evaluation standards. A framework for the systematic benchmarking and evaluation of biomedical algorithms - assessed in a prospective manner that mimics a clinical environment - is needed to ensure patient safety and clinical efficacy.
Alignment to program objectives
The primary objectives of the EHR DREAM Challenge are to:
- Lower barriers to piloting of innovative machine learning and data science methods in healthcare
- Establish clinically relevant prediction benchmarks and evaluation metrics
- Minimize the distance between the model developers and the clinic to reduce the time of model implementation
We are tackling the stated problem by focusing on a specific prediction task: patient mortality. Due to its well-studied nature and relatively well-established predictiveness, patient mortality serves as a well-defined benchmarking problem for assessing predictive models. Mortality models are also widely adopted and implemented at healthcare institutions and CTSAs, a feature we hope will stimulate participation from a wide range of institutions.
DREAM challenges are an instrumental tool for harnessing the wisdom of the broader scientific community to develop computational solutions to biomedical problems. While previous DREAM challenges have worked with complex biological data as well as sensitive medical data, running DREAM Challenges with Electronic Health Records presents unique complications, patient privacy being at the forefront of those concerns. Previous challenges have developed a technique known as the Model to Data (MTD) approach to maintain the privacy of the data. We will be using this MTD approach, facilitated by Docker, on an OMOP dataset provided by the University of Washington to standardize model development.
Patient Mortality DREAM Challenge
We will ask participants of this DREAM Challenge to predict the future mortality status of currently living patients within our OMOP repository. After participants submit predictions, we will evaluate model performance against a gold-standard benchmark dataset. We will carry out this DREAM challenge in three phases (Fig 1).
The Open Phase will be a preliminary testing and validation phase. In this phase, the SynPUF synthetic OMOP data will be used to test submitted models. Participants will submit their predictive models to our system, where those models will train and predict on the split SynPUF dataset. The main objectives of the first phase are to allow the participants to become familiar with the submission system, to allow the organizers to work out any issues in the pipeline, and to give participants a preliminary ranking of their model's performance.
The Leaderboard Phase will be the prospective prediction phase, carried out on the UW OMOP data. Submitted models will have a portion of the UW OMOP repository available for training and will make predictions on all living patients who have had at least one visit in the previous month, predicting whether each patient will be deceased within the next 6 months by assigning a probability score to each patient. Participants will be expected to set up their own training dataset, but the patient IDs for which predictions are expected will be provided to the Docker models.
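The exact submission contract is defined by the Docker harness and is not spelled out here; as an illustration only, assuming a model receives a list of OMOP person IDs and must emit one mortality probability per patient, the prediction step might be sketched in Python like this (the `predict_proba` interface, the `ConstantRisk` stand-in, and the `predictions.csv` file name are all hypothetical):

```python
import csv

def write_predictions(person_ids, model, out_path="predictions.csv"):
    """Write one mortality-risk probability per patient.

    `model` is any object with a `predict_proba(person_id)` method
    returning a float in [0, 1] -- a hypothetical interface, since the
    challenge's real model contract is set by the Docker harness.
    """
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["person_id", "score"])  # person_id is the OMOP patient key
        for pid in person_ids:
            writer.writerow([pid, model.predict_proba(pid)])

# Toy stand-in model that scores every patient at the same baseline risk.
class ConstantRisk:
    def predict_proba(self, person_id):
        return 0.05

write_predictions([1001, 1002, 1003], ConstantRisk())
```

In the MTD setting, a script like this would run inside the submitted container with no network access, reading the provided patient IDs and writing its scores to a mounted output location.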
The Validation Phase will be the final evaluation phase, in which challenge organizers finalize the scores of the models.
Figure 1. The Open Phase will feature a synthetic training and test set to test the pipeline and participant models. The Leaderboard Phase will feature model submissions being evaluated against the UW OMOP repository. The gold standard benchmark set will be withheld from the docker models and used to evaluate model performance. Model performance metrics will be returned to the participants via Synapse.
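The scoring metric is not specified in this document; AUROC is a common choice for ranking probabilistic mortality predictions, so as an assumed example, comparing submitted probability scores against the withheld gold standard could be sketched as follows:

```python
def auroc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 = deceased within the prediction window (gold standard), 0 = alive.
    scores: model-assigned mortality probabilities, same order as labels.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative examples")
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A model that ranks all deceased patients above all living ones scores 1.0:
print(auroc([1, 0, 1, 0], [0.9, 0.2, 0.7, 0.4]))  # → 1.0
```

Because the gold standard never leaves the evaluation harness, only summary metrics like this would be returned to participants via Synapse.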
Project Scientific Leads
| Name | Affiliation |
| --- | --- |
| Justin Guinney | Sage Bionetworks |
| Thomas Schaffter | Sage Bionetworks |
Project Team Members
See Team README
Repositories and Websites
- Code and Dockerized package for deploying an evaluation harness (e.g. Model to Data) for predictive algorithms applied against an OMOP CDM.
- NCATS deployed and hosted evaluation harness using the OMOP CDM (see Deliverable #1) populated with SynPUF data.
- Best practices and Standard Operating Procedure documentation for prospective model evaluation and benchmarking on EHR data.
- Library of Dockerized algorithms for prediction of patient mortality - acquired through EHR DREAM Challenge - with performance metrics.
Milestones

| Date | Milestone | Status |
| --- | --- | --- |
| Feb 4 | Complete the aggregation and quality assessment of the UW cohort that will be used in the study. | Done |
| Feb 27 | Conduct an internal evaluation by applying previously developed models to the UW cohort. | Done |
| March 6 | Survey the CTSAs to identify sites that have mortality and 30-day readmission prediction models and would be willing to participate. | Ongoing |
| March 20 | Build the Synapse pilot challenge site with instructions for participating in the challenge. | Ongoing |
| April | Build the infrastructure for facilitating the DREAM challenge, using Docker, Synapse, and UW servers. | Ongoing |
| June | Phase 1: Open a submission window during which the sites identified in the survey submit their models to predict on UW patients. This will not be a prospective evaluation. | Not Started |
| Summer | Phase 2: Prospectively evaluate model performance, comparing accuracy and recall across models. | Not Started |
| Jan 2020 | Make scripts and documentation available for the CTSAs. | Ongoing |
The project Google drive folder is accessible to onboarded participants.
The project slack room is accessible to onboarded participants.