Deprecated time_series_19-covid-Confirmed.csv but no US State data in the Global confirmed file #1534

Open
allenbroadman opened this issue Mar 25, 2020 · 18 comments

@allenbroadman allenbroadman commented Mar 25, 2020

Hi. Thank you for all the efforts at providing this data.

time_series_19-covid-Confirmed.csv has been deprecated and we are instructed in the Readme file to use the Global confirmed file instead.

I have been relying on time_series_19-covid-Confirmed.csv for United States individual state data which is not contained in the Global confirmed file.

Where are you putting the USA state data?

@davidlyon3 davidlyon3 commented Mar 25, 2020

These new files are nothing short of a DISASTER.

@smillerd smillerd commented Mar 25, 2020

This is a big bummer. I was making decent progress on building my monitoring suite on this data for my US state. I'm looking for fresh sources, but they all seem to trace back to this repo! What a mess.

@cjparisi cjparisi commented Mar 25, 2020

Refer to issue #1250.

@soaring52 soaring52 commented Mar 25, 2020

Please add US states (and all other states/provinces that might be missing). Many of us depend on the full set of data.
Also, why are the "recovered" cases not reported? It would be good to bring them back. I am aware of issue #1250!
Patient ages for all three categories (confirmed, deaths, and recovered) would add significant value to the "cleaned" data sets.
Thanks for all the effort and time putting this together.

@soaring52 soaring52 commented Mar 25, 2020

### Canada States: ['Alberta' 'British Columbia' 'Grand Princess' 'Manitoba' 'New Brunswick'
'Newfoundland and Labrador' 'Nova Scotia' 'Ontario'
'Prince Edward Island' 'Quebec' 'Saskatchewan' 'Diamond Princess'
'Recovered']
The highlighted entries look like a bug.

@josephsdavid josephsdavid commented Mar 25, 2020

Agreed with this entirely; the sudden data change is outrageous.

@Tolga28A Tolga28A commented Mar 25, 2020

Dear JH researchers,

I can't tell you how much I appreciate the amazing work you've been putting in during these dark times. I see that I am not the only one who had been building models on your US-territory data until two days ago, and our deliverables might impact the progression of this disease on a global scale (I am a data scientist at one of the largest global pharma companies).

I personally relied on your statement that you were going to release a separate dataset for the US; however, we can see neither an explanation nor any data. As you'd imagine, time is critical, especially for predictions like disease-spread forecasts, and we can see the US data being updated in the dashboard. So please, could someone at least give an update on where you are with publishing the US data again? If this is not going to happen for some reason, then we can go and try to find other data sources instead of losing time here...

@AndBurns AndBurns commented Mar 25, 2020

Concur with above

@mrwallison mrwallison commented Mar 25, 2020

This seems to have been a step backward and a loss of granularity, especially for the United States.

@cognospaul cognospaul commented Mar 25, 2020

Until the granular data files come back here, I've been sourcing my data from usafacts.org:
https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/

I've seen a few issues with the data, such as deaths in Michigan disappearing. Overall I think it's a decent source.

@Tolga28A Tolga28A commented Mar 25, 2020

> Until the granular data files come back here, I've been sourcing my data from usafacts.org:
> https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/
>
> I've seen a few issues with the data, such as deaths in Michigan disappearing. Overall I think it's a decent source.

This is great, thank you so much!

@mskymoore mskymoore commented Mar 25, 2020

Agreed. Would really appreciate the granularity for US states especially.

@cognospaul cognospaul commented Mar 25, 2020

Also, another tidbit: if you want to include county population, you can pull that from:

https://www.census.gov/data/datasets/time-series/demo/popest/2010s-counties-total.html

The join would be on two keys: usafacts.statesFIPS to census.STATE, and usafacts.countyFIPS to (if census.COUNTY = 0 then 0 else census.COUNTY + census.STATE*1000).
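
For anyone who wants to try it, here is a rough Python sketch of that join rule. It is only an illustration under some assumptions: the rows are plain dicts as you would get from `csv.DictReader`, the `population` column name and both helper function names are hypothetical, and the sample counts are made up. Only the key rule itself comes from the description above.

```python
def census_county_fips(state: int, county: int) -> int:
    """Build the usafacts-style county key from census STATE and COUNTY codes."""
    # COUNTY == 0 maps to 0, per the rule above; otherwise prefix the state code.
    return 0 if county == 0 else state * 1000 + county


def join_population(usafacts_rows, census_rows, pop_col="population"):
    """Left-join census population onto usafacts rows by (state, county key)."""
    lookup = {}
    for r in census_rows:
        state, county = int(r["STATE"]), int(r["COUNTY"])
        lookup[(state, census_county_fips(state, county))] = r[pop_col]
    joined = []
    for row in usafacts_rows:
        key = (int(row["statesFIPS"]), int(row["countyFIPS"]))
        # Rows with no census match get None instead of being dropped.
        joined.append({**row, pop_col: lookup.get(key)})
    return joined


# Illustrative rows only; the counts and populations are made up.
usafacts = [{"statesFIPS": "26", "countyFIPS": "26163", "confirmed": "10"}]
census = [
    {"STATE": "26", "COUNTY": "000", "population": "500"},
    {"STATE": "26", "COUNTY": "163", "population": "100"},
]
result = join_population(usafacts, census)
```

Check the real column headers in both downloads before relying on this, since they may differ from the names used here.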

@rks125 rks125 commented Mar 25, 2020

I recreated the Time Series CSV files for Confirmed, Recovered, and Deaths with US State data:

https://www.soothsawyer.com/john-hopkins-time-series-data-confirmed-case-csv-after-march-22-2020/

I also included the PowerQuery tool I created to make those CSV files. This should hold you over. I currently plan on refreshing the CSV files by 5:15pm PT daily, until no longer necessary.

Author

@allenbroadman allenbroadman commented Mar 25, 2020

Thanks very much rks125. This collectively saves a lot of duplicated effort for all of us basing our models on the deprecated file format, until Johns Hopkins gets this working.

@soaring52 soaring52 commented Mar 25, 2020

Please be aware that the country "The West Bank and Gaza" is missing from the "deaths" data set. If you want to correlate the number of deaths with global reported cases, you first need to remove "The West Bank and Gaza" from the "global" dataset.
It is great that others provide alternatives for the "US states" data, but it would still be best if we could have access to a single official, validated (CDC-approved) dataset for all of us to use.
Thanks all.
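
One way to handle the mismatch without hard-coding a country name is to keep only the countries present in both files. A minimal sketch, assuming each file has been read into a list of dicts and that the country column is called `Country/Region` (check your copy); the function name is hypothetical and the sample counts are made up:

```python
def restrict_to_shared(confirmed_rows, deaths_rows, key="Country/Region"):
    """Keep only rows whose country appears in both datasets."""
    shared = {r[key] for r in confirmed_rows} & {r[key] for r in deaths_rows}
    confirmed = [r for r in confirmed_rows if r[key] in shared]
    deaths = [r for r in deaths_rows if r[key] in shared]
    return confirmed, deaths


# Illustrative rows only; the counts are made up.
confirmed = [
    {"Country/Region": "Italy", "count": 3},
    {"Country/Region": "The West Bank and Gaza", "count": 1},
]
deaths = [{"Country/Region": "Italy", "count": 2}]
c, d = restrict_to_shared(confirmed, deaths)
```

After this, `c` and `d` cover the same set of countries, so they can be compared row for row once sorted on the same key.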

@pybokeh pybokeh commented Mar 31, 2020

FYI, time series data for US has been released: #1250

@josephsdavid josephsdavid commented Mar 31, 2020
