isaacmg edited this page Jul 19, 2020 · 42 revisions

Task Time Series (TS)

Crossroads

Task TS sits at the crossroads of epidemiology, machine learning, time-series forecasting, data analysis, and data engineering. We aim to provide value to the broader CoronaWhy organization and the global AI4Good community in the following ways:

(1) Coronavirus and response

Directly reducing the loss of life and economic impact of Coronavirus through effective forecasting and gauging policy interventions remains our primary objective. Specifically:

  • Develop models that can effectively forecast new Coronavirus cases, deaths, and hospitalizations 14 days into the future. These can aid public-policy decisions about when to lock down or open up specific counties. They can also tell hospitals when to stock up on PPE.
  • Use trained models to provide insights into virus transmission mechanics and utility of public policy interventions.
  • Develop a central publicly available temporal data lake for Coronavirus that other epidemiologists and researchers can leverage.
  • Develop models to help forecast the economic impact in terms of jobs and GDP of proposed interventions.
  • If possible, develop models to forecast patient length of stay in hospitals and risk of decompensation, to aid discharge planning and early-warning systems for physicians.
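As a rough illustration of the 14-day forecasting setup named in the first objective, the sketch below turns a daily series into (history, target) training pairs. The function name `make_windows`, the 28-day history length, and the synthetic `daily_cases` series are assumptions made for this example; they are not taken from the project's code or data.

```python
import numpy as np

def make_windows(series, history=28, horizon=14):
    """Split a 1-D daily series into (input, target) pairs.

    Each input holds `history` consecutive days; each target holds the
    `horizon` days that immediately follow. Both lengths are illustrative.
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - history - horizon + 1
    if n <= 0:
        raise ValueError("series too short for the requested window sizes")
    X = np.stack([series[i:i + history] for i in range(n)])
    y = np.stack([series[i + history:i + history + horizon] for i in range(n)])
    return X, y

# Synthetic placeholder data: 60 days of made-up counts, not real case numbers.
daily_cases = np.arange(60)
X, y = make_windows(daily_cases)
```

Any forecasting model, statistical or deep, can then be trained to map each row of `X` to the matching row of `y`.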

(2) Epidemiology

While epidemiology has long used simple statistical models, few studies have explored modern deep learning techniques such as LSTMs and NARX. Specifically, we aim to do the following:

  • Demonstrate the utility of temporal machine learning as a broader tool for studying pandemics, spread of disease, and public health crises.
  • Show how temporal machine learning models can help gauge the impact of public policy interventions.

(3) Machine Learning Research

Little deep learning research has studied few-shot learning for time series forecasting. We aim to develop broadly applicable models that learn to transfer knowledge in a hierarchical fashion so that, in a pandemic, natural disaster, or any other forecasting problem, models perform better with limited or noisy training data. Similarly, we hope to give time series its ImageNet or RoBERTa moment.

  • Develop few-shot learning techniques for time series forecasting that can effectively train models to forecast on limited data.
  • Devise multi-modal learning techniques that can effectively incorporate geo-spatial data and other forms of meta-data with temporal data.
  • Crack open the "black box" of deep models and provide new ways to explain and visualize their predictions.
  • Find innovative ways to integrate traditional statistical time series methods with the latest DL research from NLP and CV.

(4) MLOps

MLOps is a relatively new field that seeks to integrate techniques from software engineering and DevOps to make ML models function well in a production setting. Because the field is new, we have the opportunity to help set standards and best practices through the way we develop our models and deploy them to production.

  • Showcase best practices for experiment tracking and extensibility using configuration files.
  • Show a template for proper unit tests and continuous deployment of packages and ML models to production.
  • Demonstrate how versioned datasets enable completely reproducible results.
  • Develop continuous retraining pipelines that automatically train and re-deploy top models as new data comes in.
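One way to picture how configuration files and versioned datasets support reproducibility is to derive a run identifier from the full configuration, including the dataset version. The sketch below is only an illustration under stated assumptions: the `fingerprint` helper and every key and value in `config` are hypothetical and do not reflect the project's actual configuration schema.

```python
import hashlib
import json

def fingerprint(obj):
    """Return a stable SHA-256 hex digest of a JSON-serializable object."""
    # sort_keys makes the digest independent of key insertion order.
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

# Hypothetical experiment configuration; all keys and values are illustrative.
config = {
    "model": "lstm",
    "forecast_length": 14,
    "history_length": 28,
    "dataset_version": "v1",  # assumed tag naming a frozen dataset snapshot
    "seed": 0,
}

# Short identifier that changes whenever any setting or the data version changes.
run_id = fingerprint(config)[:12]
```

Logging `run_id` next to each result ties a metric back to the exact settings and dataset snapshot that produced it.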

(5) Data Science Teams

  • Create a template for how a globally distributed volunteer data science team of various experience levels can effect positive social change.
  • Highlight pair-programming techniques and documentation strategies that allow new teammates to get up to speed quickly.
  • Show how data scientists, data engineers, epidemiologists, and public health officials can work together effectively despite different backgrounds and vocabularies.

Goals/Concrete Deliverables

  1. Publications in top machine learning conferences on novel time series forecasting models and few-shot time series learning methods.

  2. Publications in top epidemiology journals on machine learning as a methodological tool in studying COVID-19 and recommended interventions based on our model.

  3. Daily dashboard of expected COVID-19 cases, deaths, and hospitalizations in all U.S. and Western European counties, with contributing factors listed.

  4. Task-agnostic open-source forecasting framework that can be applied to time series forecasting problems, along with pre-trained weights that can easily be loaded as a starting point for training models.

  5. Geo-spatial/temporal data lake of COVID-19, SARS, MERS, Ebola, and other related datasets persisted to Dataverse.

Current Issues and Timeline

For more details on our active projects and issues, see our GitHub board and view our most recent meeting.

  1. Analyze the effects of pre-training on COVID-19 forecasting performance.
  2. Analyze the effects of Google mobility data on COVID-19 forecasting performance.
  3. Create general pre-trained weights on flow data for n = 3 encoders.
  4. Incorporate new data sources into our data lake.

Accomplishments