dathere/covid19-time-series-utilities


COVID-19 - time-series utilities

This repo contains several utilities for wrangling COVID-19 data from the Johns Hopkins University COVID-19 repository.

NOTE: The utilities currently do not work because of the new file formats. They will be updated shortly to work with the revised formats.

Requirements

Cloning

A note on cloning this repo, since the COVID19 directory is a git submodule:

  • after cloning, you must initialize the submodule. From the project's top-level directory, run git submodule init and git submodule update to clone the JHU repo as a submodule

Content

The files in this directory and how they're used:

  • covid-19_ingest.sh: script that converts the JHU COVID-19 daily-report data to a time-series database using TimescaleDB.
  • covid-refine: OpenRefine automation script that converts JHU COVID-19 time-series data into a normalized, enriched format and uploads it to TimescaleDB. (RECOMMENDED)
  • schema.sql: Data definition (DDL) to create the necessary tables & hypertables.
  • environment: Default environment values used in Docker containers.

Using the Timescale covid19-ingest script

  1. Create a TimescaleDB instance - download it or sign up for a hosted instance
  2. Create a database named covid_19, and an application user covid19_user
  psql
  CREATE DATABASE covid_19;
  CREATE USER covid19_user WITH PASSWORD 'your-password-here';
  ALTER DATABASE covid_19 OWNER TO covid19_user;
  \quit
  3. Run schema.sql as the covid19_user (VACUUM/ANALYZE require owner privileges)

    psql -U covid19_user -h <the.server.hostname> -f schema.sql covid_19
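    schema.sql itself is the source of truth for the schema; as a sketch of what a TimescaleDB DDL of this kind typically looks like (table and column names below are illustrative assumptions, not the repo's actual schema):

    ```sql
    -- Illustrative only: the actual tables live in schema.sql.
    CREATE TABLE daily_reports (
        observation_time TIMESTAMPTZ NOT NULL,
        country_region   TEXT,
        province_state   TEXT,
        confirmed        INTEGER,
        deaths           INTEGER,
        recovered        INTEGER
    );
    -- Turn the plain table into a hypertable partitioned by time.
    SELECT create_hypertable('daily_reports', 'observation_time');
    ```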

  4. Install csvkit

    • Ubuntu: sudo apt-get install csvkit
    • macOS: using Homebrew, run brew install csvkit
  5. Using a text editor, replace the environment variables PGHOST, PGUSER and PGPASSWORD in covid-19_ingest.sh
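  These are libpq's standard connection variables, so psql picks them up automatically; a sketch of the edit (hostname and password are placeholders, not real values):

```shell
# Placeholder values; substitute your own TimescaleDB instance's details.
PGHOST="your.timescale.hostname"
PGPORT="5432"
PGUSER="covid19_user"
PGPASSWORD="your-password-here"
PGDATABASE="covid_19"
export PGHOST PGPORT PGUSER PGPASSWORD PGDATABASE
```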

  6. Run the script

    bash covid-19_ingest.sh

  7. (OPTIONAL) add the shell script to crontab to run daily
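  Assuming the repo was cloned to ~/covid19-time-series-utilities (adjust the path to your checkout; the log path is also illustrative), a crontab entry that runs the ingest daily at 06:00 might look like:

    ```
    0 6 * * * cd "$HOME/covid19-time-series-utilities" && bash covid-19_ingest.sh >> "$HOME/.covid-19/ingest.log" 2>&1
    ```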

  8. Slice and dice the data using the full power of PostgreSQL along with Timescale's time-series capabilities!
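  Once the data is loaded, a typical time-series query uses Timescale's time_bucket function. For example (table and column names are illustrative, not necessarily those in schema.sql):

    ```sql
    -- Weekly confirmed-case totals per country (names are illustrative).
    SELECT time_bucket('7 days', observation_time) AS week,
           country_region,
           max(confirmed) AS confirmed
    FROM daily_reports
    GROUP BY week, country_region
    ORDER BY week, country_region;
    ```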

Using COVIDrefine

NOTE: Due to the changing file format of JHU's daily report data, covid-refine is recommended over covid-19_ingest.sh. COVIDrefine has the added benefit of producing fully normalized, non-sparse, geo-enriched data.

See the detailed README.

If you just want to download the COVIDrefine data, the latest version can be found here.

Using docker-compose

  1. Remember to initialize the submodule: run git submodule init and git submodule update
  2. Run docker-compose build
  3. Run docker-compose up
  4. That's all. You can now go to Swagger or PostgREST
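  PostgREST listens on port 3000 by default and exposes each table as a REST route, so a quick smoke test could be assembled like this (the table name and query columns are assumptions, not the project's actual API):

```shell
# PostgREST defaults: port 3000, one route per table/view.
BASE_URL="http://localhost:3000"
TABLE="daily_reports"                            # illustrative table name
QUERY="select=country_region,confirmed&limit=5"  # PostgREST query syntax
ENDPOINT="${BASE_URL}/${TABLE}?${QUERY}"
echo "$ENDPOINT"
# Then query it with: curl "$ENDPOINT"
```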

NOTES

  • the JHU COVID-19 repository is a git submodule. This was done to automate getting the latest data from their repo.
  • the scripts only work in *nix environments (Linux, Unix, macOS)
  • both scripts maintain a hidden directory called ~/.covid-19 in your home directory. covid-19_ingest.sh keeps a file named lastcsvprocessed there. Delete that file to process all daily-report files from the beginning, or change the date in the file to start processing files AFTER the entered date.
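The lastcsvprocessed bookkeeping can be sketched as follows (a simplified stand-in for what covid-19_ingest.sh does, using hard-coded file names instead of the real daily reports):

```shell
# Simplified sketch of the ingest script's bookkeeping; not the actual script.
STATE_DIR="${HOME}/.covid-19"
STATE_FILE="${STATE_DIR}/lastcsvprocessed"
mkdir -p "$STATE_DIR"

# JHU daily reports are named MM-DD-YYYY.csv; rewrite to YYYYMMDD so that
# plain string comparison orders the dates correctly.
sortable() {
  local f="${1%.csv}"               # e.g. 03-25-2020
  echo "${f:6:4}${f:0:2}${f:3:2}"   # -> 20200325
}

last="$(cat "$STATE_FILE" 2>/dev/null || echo 00000000)"
for csv in 03-24-2020.csv 03-25-2020.csv; do   # stand-ins for real report files
  if [ "$(sortable "$csv")" \> "$last" ]; then
    echo "processing $csv"                      # real script would ingest here
    last="$(sortable "$csv")"
  fi
done
echo "$last" > "$STATE_FILE"
```

Deleting the state file resets `last` to 00000000, which is why every daily report gets reprocessed from the beginning.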

TODO

  • use PostgREST to add a REST API in front of the TimescaleDB database
  • create a Grafana dashboard
  • create a Carto visualization
  • create a Superset visualization

ACKNOWLEDGEMENTS

  • thanks to Avtar Sewrathan (@avthars), Prashant Sridharan (@CoolAssPuppy) and Mike Freedman (@michaelfreedman) at Timescale for their help & support to implement this project from idea to implementation in 5 days!
  • thanks to Julian Simioni (@orangejulius) at Geocode.earth for allowing us to use the Geocode.earth API!


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
