I think there are some issues with this update:
Also, there is a duplicate entry for Taiwan as well.
The updated version includes an entry "Taiwan, Taipei and environs" which is inconsistent with the previous records, which were using "Taiwan, Taiwan".
The country rename produced bad data. We have two records:
,Republic of Korea,36.0,128.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,54
The record under the old name is missing the latest data point, and the record under the new name is all zeros except for the last value. The same is true of the other renamed records.
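For anyone who needs a stopgap, here is a rough sketch of how the two halves of a renamed record could be stitched back together. It assumes the first four columns are metadata (province, country, lat, long) and that the counts are cumulative, so taking the larger value per date is safe. The function name `merge_renamed_rows`, the old name "South Korea", and the sample values other than the final 54 are made up for illustration.

```python
def merge_renamed_rows(old_row, new_row, meta_cols=4):
    """Combine two rows for the same region that were split by a rename."""
    if len(old_row) != len(new_row):
        raise ValueError("rows must have the same number of columns")
    # Keep the metadata (province, country, lat, long) from the new-name row.
    merged = list(new_row[:meta_cols])
    for old_value, new_value in zip(old_row[meta_cols:], new_row[meta_cols:]):
        # Empty cells count as 0; counts are assumed cumulative,
        # so the larger of the two values is kept for each date.
        merged.append(str(max(int(old_value or 0), int(new_value or 0))))
    return merged


# Illustrative rows only: the old-name row lacks the last value,
# the new-name row is all zeros except the last value.
old = ["", "South Korea", "36.0", "128.0", "1", "2", "30", ""]
new = ["", "Republic of Korea", "36.0", "128.0", "0", "0", "0", "54"]
print(merge_renamed_rows(old, new))
```

This only papers over the split on the consumer side; the rows still need to be fixed in the source files.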
Thank you for adding State-level US data.
To add to what @aatishb said: all US cities are missing data for 3/10/20.
This inconsistency might be coming from the latest daily report. They might be changing sources.
There's an issue open for this: #405
US is still 605......
I'm assuming the Hong Kong rename has obliterated Hong Kong from the "Total Confirmed" section of the website?
It looks like there's some inconsistency between state level and county level data in the US? Take Washington as an example - the new 'Washington' state classification shows 267 cases for 10-Mar but from county level data I only get to 162
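A quick way to check this kind of gap for any state is sketched below. It assumes county rows are labelled with a ", WA" style suffix in the province column. The helper name `county_total` and the county names and per-county numbers are placeholders; only the 267 and 162 totals come from the comment above.

```python
def county_total(counts, state_abbrev):
    """Sum the counts of all rows whose name ends with ', <state_abbrev>'."""
    suffix = ", " + state_abbrev
    return sum(n for name, n in counts.items() if name.endswith(suffix))


# Placeholder data: one state-level row plus county-level rows.
counts = {
    "Washington": 267,
    "County A, WA": 100,
    "County B, WA": 62,
    "County C, CA": 5,
}

gap = counts["Washington"] - county_total(counts, "WA")
print(gap)  # 105 cases in the state row that the county rows do not account for
```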
In addition to @JBrooks137's comments, having double data for state and city/county is confusing. No other countries have double data like this. Please review.
First, let me say that I really really appreciate this data set. I'm sure I represent a large number of academics and software/data savvy people when I say that this has been an invaluable resource.
That said, I think it behooves you to maintain as much forwards and backwards compatibility as possible. If the structure of the data is going to change, it's much better to give some advance warning and preserve backwards compatibility in existing files. If a format change is absolutely necessary, you can mark the old file deprecated and create a new file with a new name and new notation. The current time series files, COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_19*.csv, are severely broken, both backwards and forwards.
Please don't break the who_covid_19_situation_reports/who_covid_19_sit_rep_time_series/who_covid_19_sit_rep_time_series.csv without considering some of the steps I've outlined above. Thanks!
@danslee You have a really great point. But given that the team working on this might not have a strong CS/Data Science background, I'm not sure they would have the capacity to maintain this repo with that level of compatibility.
Having said that -- I think a better way is to either help them with this repository or create a fork/separate repo to support better usability.
I am not down-playing their contribution, I do think they have provided us a great resource, but I think people have different priorities and I would really like them to focus on the correctness and speedy update of the numbers.
Just my two cents.
@eugene-yang I'd love to help out, but the only data I can see is what they have checked into the repo. I am working on some scripts that will bring the time-series csv files under a unified naming scheme, but I have run into what are clearly some doubly entered data points around 2020-03-10 and 03-11, which calls the integrity of the entire file into question. Hopefully, the morning will bring some order to the data chaos.