Skip to content
several utilities to help wrangle COVID-19 data into a time-series format
Shell PLpgSQL Dockerfile
Branch: master
Clone or download

COVID-19 - time-series utilities

This repo contains several utilities for wrangling COVID-19 data from the John Hopkins University COVID-19 repository.



A note on cloning this repo, since the COVID19 directory is a git submodule:

  • after cloning, you must initiate the submodule. In the top level directory for the project, run git submodule init and git submodule update to clone the JHU Repo as a submodule


The files in this directory and how they're used:

  • script that converts the JHU COVID-19 daily-report data to a time-series database using TimescaleDB.
  • covid-refine: OpenRefine automation script that converts JHU COVID-19 time-series data into a normalized, enriched format and uploads it to TimescaleDB. (RECOMMENDED)
  • schema.sql: Data definition (DDL) to create the necessary tables & hypertables.
  • environment: Default environment values used in Docker containers.

Using the Timescale covid19-ingest script

  1. Create a TimescaleDB instance - download or signup
  2. Create a database named covid_19, and an application user covid19_user
  create database covid_19;
  create user covid19_user WITH PASSWORD 'your-password-here';
  alter database covid_19 OWNER TO covid19_user;
  1. Run schema.sql as the covid19_user. VACUUM/ANALYZE require owner privs

    psql -U covid19_user -h <the.server.hostname> -f schema.sql covid_19

  2. Install csvkit

    • Ubuntu: sudo apt-get install csvkit
    • MacOS: Using homebrew run brew install csvkit
  3. Using a text editor, replace the environment variables for PGHOST, PGUSER and PGPASSWORD in

  4. Run the script


  5. (OPTIONAL) add shell script to crontab to run daily

  6. Be able to slice-and-dice the data using the full power of PostgreSQL along with Timescale's time-series capabilities!

Using COVIDrefine

NOTE: Due to the changing file format of JHU's daily report data, covid-refine is recommended over COVIDrefine has the added benefit of producing fully normalized, non-sparse, geo-enriched data.

See the detailed README.

If you just want to download the COVIDrefine data, the latest version can be found here.

Using docker-compose

  1. Remember initiate the submodule, run git submodule init
  2. Run docker-compose build
  3. Run docker-compose up
  4. That's all. You can go to Swagger or PostgREST


  • the JHU COVID-19 repository is a git submodule. This was done to automate getting the latest data from their repo.
  • the script will only work in *nix environment (Linux, Unix, MacOS)
  • both scripts maintain a hidden directory called ~/.covid-19 in your home directory. checkslastcsvprocessed. Delete that file to process all daily-report files from the beginning, or change the date in the file to start processing files AFTER the entered date.


  • use postgREST to add a REST API in front of TimescaleDB database
  • create a Grafana dashboard
  • create a Carto visualization
  • create a Superset visualization


  • thanks to Avtar Sewrathan (@avthars), Prashant Sridharan (@CoolAssPuppy) and Mike Freedman (@michaelfreedman) at Timescale for their help & support to implement this project from idea to implementation in 5 days!
  • thanks to Julian Simioni (@orangejulius) at for allowing us to use the API!

Shield: CC BY-SA 4.0

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

CC BY-SA 4.0

You can’t perform that action at this time.