Time series utilities and enhanced data now available #283

Open
jqnatividad opened this issue Mar 6, 2020 · 2 comments

@jqnatividad jqnatividad commented Mar 6, 2020

First off, thanks JHU for exposing the data behind the dashboard. As an open data advocate, JHU's example should be encouraged and celebrated!

However, the data needs a little data-wrangling for it to be more useful for time-series analysis:

  • the numbers are running totals, not daily counts, which makes it hard to query arbitrary date ranges and locations, to compute rates of infection, death, and recovery, and to benchmark.
  • the location metadata is not detailed enough to do more granular geographic analysis
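
To illustrate the first point, here is a minimal sketch of turning running totals into daily counts. It is not taken from the linked utilities; the function name `daily_counts`, the row layout, and the sample figures (locations "A" and "B") are made up for illustration and are not real JHU numbers.

```python
from itertools import groupby
from operator import itemgetter


def daily_counts(rows):
    """Convert (location, iso_date, running_total) rows into daily counts.

    Hypothetical helper: the first day seen for a location is reported as
    its full running total, because there is no earlier value to subtract.
    """
    result = []
    # Sort by location, then by ISO date (which sorts correctly as text).
    ordered = sorted(rows, key=itemgetter(0, 1))
    for location, group in groupby(ordered, key=itemgetter(0)):
        previous = 0
        for _, day, total in group:
            result.append((location, day, total - previous))
            previous = total
    return result


# Invented sample data, for illustration only.
sample = [
    ("A", "2020-03-01", 100),
    ("A", "2020-03-02", 130),
    ("A", "2020-03-03", 145),
    ("B", "2020-03-01", 10),
    ("B", "2020-03-02", 25),
]

for row in daily_counts(sample):
    print(row)
```

Once the data is stored as daily counts, a query over any date range is a plain sum of the rows in that range, with no need to look up the totals at both endpoints.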

But since this is open data and open source, I decided to scratch an itch and pull together these utilities: :)

https://github.com/dathere/covid19-time-series-utilities

Currently, there are two utilities.

  • covid-19_ingest.sh: script that converts the JHU COVID-19 daily-report data to a time-series database using TimescaleDB.
  • covid-refine: OpenRefine automation script that converts JHU COVID-19 time-series data into a normalized, enriched format and uploads it to TimescaleDB.
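
As a rough idea of what "normalized" means here, the sketch below reshapes a wide table (one column per date) into one row per location and date. It is not the OpenRefine script itself; the column names, the `wide_to_long` helper, and the sample values are assumptions chosen for illustration, and the real files may be laid out differently.

```python
import csv
import io

# Invented wide-format sample: one column per date (assumed layout).
WIDE_CSV = """location,2020-03-01,2020-03-02
A,100,130
B,10,25
"""


def wide_to_long(text, id_columns=("location",)):
    """Reshape wide CSV text into a list of one-row-per-date dicts.

    Hypothetical helper: every column not listed in id_columns is
    treated as a date column holding an integer running total.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        ids = {name: record[name] for name in id_columns}
        for column, value in record.items():
            if column in id_columns:
                continue
            rows.append({**ids, "date": column, "total": int(value)})
    return rows


for row in wide_to_long(WIDE_CSV):
    print(row)
```

The long layout is what a time-series database expects: each row carries its own timestamp, so new dates become new rows rather than new columns.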

Here are some examples of the processed data:

Finally, here's a blog post on the benefits of normalizing the data and feeding it to a true time-series database.

https://blog.timescale.com/blog/charting-the-spread-of-covid-19-using-timescale/


@ladris ladris commented Mar 6, 2020

If this goes in I'll close my feature request. Seems to nail what I was looking for.


@federico-vitale federico-vitale commented Mar 25, 2020

Hi @jqnatividad
I've built some analyses using data from the NYC dataset, but now that the tables have changed, it no longer seems to be updated. Is a fix in the works, or should I look for another source? Thank you very much from Italy!

4 participants