I created a CLEANED dataset that combines cases, recoveries, and deaths to 1 CSV file #1241

Open
jbarton311 opened this issue Mar 22, 2020 · 14 comments

Comments

@jbarton311 jbarton311 commented Mar 22, 2020

This is not an issue per se, but I wanted to share this info with the community.

I've created a SINGLE, cleaned dataset that combines ALL cases, recoveries, and deaths into one CSV file. It also:

  • Contains data in a more "stacked" format and is rolled up by state, country, and date.
  • Cleans all historical US data to roll up at the state level.
  • Adds several additional columns for easier analysis.

I believe it is in an easier-to-use format than the CSVs posted to this repo (especially when performing analysis with BI tools).
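
To illustrate what a "stacked" layout means, here is a minimal sketch using only the Python standard library. It is not the code behind this dataset. The sample rows and numbers are made up, and the column names (`Province/State`, `Country/Region`, `Lat`, `Long`, then one column per date) are an assumption about the wide layout of the source CSVs.

```python
import csv
import io
from collections import defaultdict

# Tiny made-up samples in the assumed wide layout (one column per date).
# The numbers are illustrative only, not real counts.
CASES_CSV = """Province/State,Country/Region,Lat,Long,3/20/20,3/21/20
Washington,US,47.4,-121.5,10,12
,Italy,43.0,12.0,100,120
"""
DEATHS_CSV = """Province/State,Country/Region,Lat,Long,3/20/20,3/21/20
Washington,US,47.4,-121.5,1,2
,Italy,43.0,12.0,5,7
"""

# Assumed identifier columns; every other column is treated as a date.
ID_COLUMNS = ("Province/State", "Country/Region", "Lat", "Long")


def stack(wide_csv_text, metric, combined):
    """Add one wide CSV to `combined`, keyed by (state, country, date)."""
    for row in csv.DictReader(io.StringIO(wide_csv_text)):
        state, country = row["Province/State"], row["Country/Region"]
        for column, value in row.items():
            if column in ID_COLUMNS:
                continue
            combined[(state, country, column)][metric] = int(value)


def combine(sources):
    """Merge {metric: wide_csv_text} into a list of stacked rows."""
    combined = defaultdict(dict)
    for metric, text in sources.items():
        stack(text, metric, combined)
    rows = []
    # Keys are sorted as plain strings, so dates are not guaranteed
    # to come out in chronological order in general.
    for (state, country, date), metrics in sorted(combined.items()):
        row = {"state": state, "country": country, "date": date}
        for metric in sources:
            # None marks a metric that was missing for this key.
            row[metric] = metrics.get(metric)
        rows.append(row)
    return rows


stacked_rows = combine({"cases": CASES_CSV, "deaths": DEATHS_CSV})
for row in stacked_rows:
    print(row)
```

Each output row carries one state, country, and date together with all metrics, which is the shape most BI tools expect. A recoveries file could be passed to the hypothetical `combine` helper as a third entry in the same way.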

The dataset along with more info (and the code used to create it) can be found here

Thank you JHU for all of the hard work here!

@valeriupredoi valeriupredoi commented Mar 22, 2020

very cool! I am already using JHU data for linear analyses here - might switch to yours soon if that's okay with you 🍺

@rufuspollock rufuspollock commented Mar 22, 2020

Cool. We did this a couple of weeks ago for Open Data Day and repo is here (details in https://www.datopian.com/blog/2020/03/17/odd-covid-19/):

https://github.com/datasets/covid-19

Maybe we can join forces?

There's also JSON data from the DataHub dataset we are publishing here: https://datahub.io/core/covid-19#data

@jbarton311 jbarton311 commented Mar 22, 2020

I should have also mentioned - I have leveraged this dataset to build a simple dashboard using Google Data Studio.

@paolinic03 paolinic03 commented Mar 22, 2020

Hey @jbarton311 so when you say the dataset contains all recoveries, why does your simple dashboard show no recoveries? Not seeing the difference. Thanks.

@jbarton311 jbarton311 commented Mar 22, 2020

Hi @paolinic03,

There are currently issues with the JHU data for US recoveries. Once those are fixed, the dashboard will display proper US recoveries data. Here is a sample of what data is contained in my dataset. You'll see several recoveries columns.

@paolinic03 paolinic03 commented Mar 22, 2020

@jbarton311 ok makes sense. Appreciate it. Good job with your dashboard. Like the look.

@jbarton311 jbarton311 commented Mar 22, 2020

FYI - based on the recent announcement from JHU, I will have to modify what I've done to match their new data moving forward. Output data format will have to change slightly.

@joshp112358 joshp112358 commented Mar 23, 2020

I made this based on the dataset. It does some basic time series visualisations.

https://joshyp.shinyapps.io/COVID_VIZ/

@verayanakieva verayanakieva commented Mar 24, 2020

Hi,

Do you plan to continue updating the data (combined incl. recovered)?

https://datahub.io/core/covid-19#data

It'd be great if that's the case.

@jbarton311 jbarton311 commented Mar 25, 2020

@verayanakieva

I plan to continue to update the data but am waiting for JHU to come out with the US-specific datasets that they mentioned they'd be releasing. I will not be including recovered, as they are no longer tracking that metric.

@valeriupredoi valeriupredoi commented Mar 25, 2020

> @verayanakieva
>
> I plan to continue to update the data but am waiting for JHU to come out with the US-specific datasets that they mentioned they'd be releasing. I will not be including recovered, as they are no longer tracking that metric.

I think the changes are out since yesterday's daily dataset, mate - I had to change my data module in my code, since as of yesterday there's heaps more on individual cities in the US and also a total restructuring of the order in which members are in the table 🍺

@rufuspollock rufuspollock commented Mar 25, 2020

> Do you plan to continue updating the data (combined incl. recovered)?
>
> https://datahub.io/core/covid-19#data

Yes, we (@datasets / @datopian) plan to keep that updated.

BTW the GitHub source is here: https://github.com/datasets/covid-19

@jbarton311 jbarton311 commented Mar 25, 2020

@valeriupredoi - I am planning on waiting for them to release the time series CSVs for US which hopefully will be very soon. From what I hear, the county data in the daily CSVs only goes back so far.

@rufuspollock - where are you sourcing recoveries data from? Is it a reliable source of data? As I understood, this metric was not reliably tracked.

@josebsalazar josebsalazar commented Mar 25, 2020

> Cool. We did this a couple of weeks ago for Open Data Day and repo is here (details in https://www.datopian.com/blog/2020/03/17/odd-covid-19/):
>
> https://github.com/datasets/covid-19
>
> Maybe we can join forces?
>
> There's also JSON data from the DataHub dataset we are publishing here: https://datahub.io/core/covid-19#data

> > Do you plan to continue updating the data (combined incl. recovered)?
> > https://datahub.io/core/covid-19#data
>
> Yes, we (@datasets / @datopian) plan to keep that updated.
>
> BTW the GitHub source is here: https://github.com/datasets/covid-19

Thanks, I was going to write my own Python script, but I will leverage your work.
