Forecast Eval

The forecast evaluation dashboard provides a robust set of tools and methods for evaluating the performance of epidemic forecasts. The project's goal is to help epidemiological researchers gain insight into the performance of their forecasts and, ultimately, to make epidemic forecasting more accurate.

Background

This app collects and scores COVID-19 forecasts submitted to the CDC. The dashboard was developed by CMU Delphi in collaboration with the Reich Lab and US COVID-19 Forecast Hub from UMass-Amherst, as part of the Forecast Evaluation Research Collaborative.

The Reich Lab created and maintains the COVID-19 Forecast Hub, a collaborative effort with over 80 groups submitting forecasts to be part of the official CDC COVID-19 ensemble forecast. All Forecast Hub forecasters that are designated "primary" or "secondary" are scored and included in the dashboard.

The Delphi Group created and maintains COVIDcast, a platform for epidemiological surveillance data. COVIDcast provides the ground truth data that forecasts are scored against.

The public version of the dashboard runs off of the main branch.

The version on the dev branch appears on the staging website. The username and password are included in the meeting notes doc and on Slack.

The dashboard is backed by the forecast evaluation pipeline. The pipeline runs three times a week, on Sunday, Monday, and Tuesday, using the code on the dev branch. It collects and scores forecasts from the Forecast Hub, and posts the resulting files to a publicly-accessible AWS S3 bucket.
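
The scored output can be fetched directly from the bucket. A minimal sketch, assuming the AWS CLI is installed and using placeholders for the bucket and file names (check the pipeline code or the "About" writeup for the actual locations):

# List the publicly readable score files (no AWS credentials needed)
aws s3 ls s3://<bucket-name>/ --no-sign-request

# Download one score file to the current directory
aws s3 cp s3://<bucket-name>/<score-file>.rds . --no-sign-request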

See the "About" writeup for more information about the data and processing steps.

Contributing

main is the production branch and shouldn't be directly modified. Pull requests should be based on and merged into dev. When enough changes have accumulated on dev, a release will be made to sync main with it.

This project requires a recent version of GNU make and docker.
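
To quickly confirm both are available (and that make is GNU make rather than BSD make), check from a shell:

# Both should print version info; make should identify itself as GNU Make
make --version
docker --version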

The easiest way to view and develop this project locally is to run the Shiny app from RStudio:

(Screenshot: RStudio, with the "Run App" button circled)

This is the same as running

shiny::runApp("<directory>")

in R. However, dashboard behavior can differ when running locally versus in a container (due to package versioning, packages that haven't been properly added to the container environment, etc.), so the dashboard should also be tested in a container.
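
For example, to launch the app from a shell at the repository root (a sketch assuming the Shiny app lives in the app directory, as described under Code Structure, and that port 3838 is free):

# Serve the dashboard locally on port 3838
R -e 'shiny::runApp("app", port = 3838, launch.browser = FALSE)'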

The dashboard can be run in a Docker container using make. See notes in the Makefile for workarounds if you don't have image repository access.

The pipeline can be run locally with the Report/create_reports.R script or in a container. See notes in the Makefile for workarounds if you don't have image repository access.

Running the scoring pipeline

The scoring pipeline uses a containerized R environment. See the docker_build directory for more details.

The pipeline can be run locally with the Report/create_reports.R script or in a container via

> make score_forecast

See notes in the Makefile for workarounds if you don't have image repository access.
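
When running outside the container, the script is invoked directly with Rscript. A sketch, assuming the packages from the containerized R environment (e.g. evalcast) are already installed locally:

# Fetch, score, and write out the forecast score files
Rscript Report/create_reports.R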

Running the Shiny app

The dashboard can be run in a Docker container using

> make start_dashboard

See notes in the Makefile for workarounds if you don't have image repository access.
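
The make target wraps a standard Docker build-and-run cycle. Roughly, and only as a sketch (the image name, tag, and port below are illustrative, not necessarily what the Makefile uses):

# Build the dashboard image from the Shiny Dockerfile
docker build -t forecast-eval-dashboard -f devops/Dockerfile .

# Run it, exposing the Shiny port on localhost
docker run --rm -p 3838:3838 forecast-eval-dashboard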

Releasing

main is the production branch and contains the code that the public dashboard uses. Code changes accumulate on the dev branch, and when we want to make a release, dev is merged into main via the "Create Release" workflow. The version bump type (major, minor, etc.) is specified manually when running the action.
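
The workflow can be triggered from the GitHub Actions UI or with the GitHub CLI. The bump-type input name below is illustrative; check .github/workflows/create_release.yml for the actual input:

# Trigger the "Create Release" workflow with a minor version bump
gh workflow run create_release.yml -f versionName=minor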

If there's some issue with the workflow-based release process, a release can be done manually with:

git checkout dev
git pull origin dev
git checkout -b release_v<major>.<minor>.<patch> origin/dev

Update the version number in the DESCRIPTION file and in the dashboard.
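
The DESCRIPTION file uses the standard R package Version: field, so that part of the bump can be done by hand or, as a sketch, with sed (the dashboard's displayed version string still needs to be updated separately):

# Bump the package version in DESCRIPTION (on macOS, use: sed -i '' ...)
sed -i 's/^Version: .*/Version: <major>.<minor>.<patch>/' DESCRIPTION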

git add .
git commit -m "Version <major>.<minor>.<patch> updates"
git tag -a v<major>.<minor>.<patch> -m "Version <major>.<minor>.<patch>"
git push origin release_v<major>.<minor>.<patch>
git push origin v<major>.<minor>.<patch>

Create a PR into main. After the branch is merged to main, perform cleanup by merging main into dev so that dev stays up to date.

Dependencies

The scoring pipeline runs in a Docker container built from docker_build/Dockerfile, which is a straight copy of the Dockerfile in the covidcast-docker repository. The dashboard runs in a Docker container built from devops/Dockerfile.

When updates are made to the evalcast package, the behavior of the scoring script can be affected, and the covidcast Docker image must be rebuilt. The workflow in the covidcast-docker repository that does this needs to be triggered manually. Before building the new image, ensure that the changes in evalcast are compatible with the scoring pipeline.
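
As with the release workflow, the rebuild can be triggered with the GitHub CLI. The workflow file name below is a placeholder; check the covidcast-docker repository for the actual name:

# Manually trigger the covidcast image rebuild
gh workflow run <workflow-file>.yml --repo cmu-delphi/covidcast-docker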

Currently, the scoring pipeline uses the evalcast package from the evalcast branch of the covidcast repository. However, if we need to make forecast-eval-specific changes to the evalcast package that would conflict with other use cases, we have in the past created a dedicated forecast-eval branch of evalcast.

Performing a manual rollback

For the dashboard

This should only be performed if absolutely necessary.

  1. Change the forecasteval image line in the ansible settings file to point to the desired (most recently working) sha256 hash rather than the latest tag. The hashes can be found in the Delphi ghcr.io image repository -- these require special permissions to view. Ask Brian for permissions, ask Nat for hash info.
  2. Create a PR into main. Tag Brian as reviewer and let him know over Slack. Changes will automatically propagate to production once merged.
  3. When creating the next normal release, code changes will no longer automatically propagate via the latest image to the public dashboard; the tag in the ansible settings file must be manually changed back to latest.
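
If you can pull the images, the digest for a tag can also be looked up locally with Docker. The image path below is an assumption; substitute the actual ghcr.io path used in the ansible settings:

# Resolve the latest tag to its sha256 digest
docker pull ghcr.io/cmu-delphi/forecast-eval:latest
docker inspect --format '{{index .RepoDigests 0}}' ghcr.io/cmu-delphi/forecast-eval:latest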

For the pipeline

  1. Change the FROM line in the docker_build Dockerfile to point to the most recently working sha256 hash rather than the latest tag. The hashes can be found in the Delphi ghcr.io image repository -- these require special permissions to view. Ask Brian for permissions, ask Nat for hash info.
  2. Create a PR into dev. Tag Katie or Nat as reviewer and let them know over Slack. Changes will automatically propagate to production once merged.
  3. When building the next covidcast docker image, changes will no longer automatically propagate via the latest covidcast image to the local pipeline image; the tag in docker_build/Dockerfile must be manually changed back to latest.
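
Pinning the base image amounts to replacing the tag in the FROM line with a digest. As a sketch (the image reference below is illustrative; match it to the actual FROM line in docker_build/Dockerfile):

# Pin the scoring pipeline's base image to a known-good digest
sed -i 's|^FROM .*covidcast.*|FROM ghcr.io/cmu-delphi/covidcast@sha256:<digest>|' docker_build/Dockerfile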

Code Structure

  • .github
    • workflows contains GitHub Actions workflow files
      • ci.yml runs linting on branch merge. Also builds new Docker images and pushes to the image repo for the main and dev branches
      • create_release.yml triggered manually to merge dev into main. Increments app version number, and creates PR into main and tags reviewer (currently Katie).
      • release_main.yml runs on merge of release branch. Creates tagged release using release-drafter.yml and merges updated main back into dev to keep them in sync.
      • s3_upload_ec2.yml runs the weekly self-hosted data pipeline workflow action (preceded by s3_upload.yml that ran the pipeline on a GitHub-provided VM)
    • release-drafter.yml creates a release
  • Report contains the code for fetching, scoring, and uploading forecasts. Runs 3 times a week
  • app contains all the code for the Shiny dashboard
    • R contains supporting R functions
      • data.R defines data-fetching functions
      • data_manipulation.R defines various filter functions
      • delphiLayout.R defines the dashboard's main and sub-UIs
      • exportScores.R contains tools to support the score CSV download tool included in the dashboard
    • assets contains supporting Markdown text. about.md contains the content for the "About" tab in the dashboard; other .md files contain explanations of the scores and other text info that appears in the app.
    • www contains CSS stylesheets and the logo images
    • ui.R sets up the UI for the dashboard, and defines starting values for selectors
    • server.R defines dashboard behavior. This is where the logic for the dashboard lives.
    • global.R defines constants and helper functions
  • docker_build contains the Docker build configuration for the scoring pipeline
  • devops contains the Docker build configuration for the Shiny dashboard
    • Note: when adding a new package dependency to the app, it must be specified in this Dockerfile
  • DESCRIPTION summarizes package information, such as contributors, version, and dependencies
  • Makefile contains commands to build and run the dashboard, and score and upload the data