update
CSSEGISandData committed Mar 11, 2020
1 parent d417797 commit 0cea9b2179306618bd7917798819ebf6608d67de
Showing 3 changed files with 1,002 additions and 801 deletions.

13 comments on commit 0cea9b2

@aatishb aatishb commented on 0cea9b2 Mar 11, 2020

Hi,

I think there are some issues with this update:

  • Republic of Korea and South Korea are the same country
  • Iran and Iran (Islamic Republic of) are the same country
  • Hong Kong and Hong Kong (SAR) are the same country
  • Many entries for 3/10/20 have missing data
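A minimal sketch of how a consumer of the files could fold the duplicate names listed above into one label before aggregating. The `ALIASES` table and the `canonical_name` helper are hypothetical, and the choice of which spelling counts as canonical is an assumption made only for illustration.

```python
# Hypothetical alias table built from the duplicates listed above.
# Which spelling is treated as canonical is an arbitrary assumption.
ALIASES = {
    "Republic of Korea": "South Korea",
    "Iran (Islamic Republic of)": "Iran",
    "Hong Kong (SAR)": "Hong Kong",
}


def canonical_name(name: str) -> str:
    """Return the canonical label for a country/region name."""
    cleaned = name.strip()
    # Names without a known alias pass through unchanged.
    return ALIASES.get(cleaned, cleaned)


print(canonical_name("Republic of Korea"))  # South Korea
print(canonical_name("Italy"))              # Italy
```

Rows that map to the same canonical name still need their values combined, which this sketch does not attempt.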

@eugene-yang eugene-yang commented on 0cea9b2 Mar 11, 2020

There is a duplicate entry for Taiwan as well.
The updated version includes an entry "Taiwan, Taipei and environs", which is inconsistent with the previous records, which used "Taiwan, Taiwan".

@vladchel vladchel commented on 0cea9b2 Mar 11, 2020

The country rename produced bad data. We now have two records:

,South Korea,36.0,128.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2,2,6,8,10,12,13,13,16,17,28,28,35,35,42,44,50,53,
,Republic of Korea,36.0,128.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,54

The row under the old name is missing the latest data point, and the row under the new name is all zeros except for the latest one. The same is true of the other renamed records.
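One possible workaround, sketched under assumptions: merge the two rows date by date, treating an empty cell as missing and keeping the larger known value (reasonable only because the counts are cumulative). The helpers `parse_count` and `merge_series` are hypothetical, and the rows below are shortened tails of the two records, not the full data.

```python
def parse_count(cell: str):
    """Convert a CSV cell to int, or None when the cell is empty."""
    cell = cell.strip()
    return int(cell) if cell else None


def merge_series(old, new):
    """Merge two cumulative series date by date, keeping the larger known value."""
    if len(old) != len(new):
        raise ValueError("series must cover the same dates")
    merged = []
    for a, b in zip(old, new):
        known = [v for v in (parse_count(a), parse_count(b)) if v is not None]
        # None marks a date that is missing in both rows.
        merged.append(max(known) if known else None)
    return merged


# Shortened, illustrative tails of the two rows (not the full records).
old_row = ["44", "50", "53", ""]   # old name: latest value missing
new_row = ["0", "0", "0", "54"]    # new name: zeros except the latest
print(merge_series(old_row, new_row))  # [44, 50, 53, 54]
```

Taking the maximum would hide a legitimate downward correction, so this is only a stopgap until the source rows are fixed.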

@halfvector halfvector commented on 0cea9b2 Mar 11, 2020

Thank you for adding State-level US data.
To add to what @aatishb said: all US cities are missing data for 3/10/20.

@eugene-yang eugene-yang commented on 0cea9b2 Mar 11, 2020

@aatishb aatishb commented on 0cea9b2 Mar 11, 2020

There's an issue open for this: #405

@Moelf Moelf commented on 0cea9b2 Mar 11, 2020

US is still 605......

@sjmackenzie sjmackenzie commented on 0cea9b2 Mar 11, 2020

I'm assuming the Hong Kong rename has obliterated Hong Kong from the "Total Confirmed" section of the website?

@JBrooks137 JBrooks137 commented on 0cea9b2 Mar 11, 2020

Hi,
It looks like there is some inconsistency between the state-level and county-level data in the US. Take Washington as an example: the new 'Washington' state entry shows 267 cases for 10-Mar, but summing the county-level data I only get 162.
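The comparison described above can be sketched as a simple difference between the state total and the sum of its county rows. The helper `county_gap` is hypothetical, and the county counts below are placeholders chosen only so that they sum to 162; they are not real data.

```python
from typing import Iterable


def county_gap(state_total: int, county_totals: Iterable[int]) -> int:
    """Return the state-level count minus the sum of county-level counts."""
    return state_total - sum(county_totals)


# Placeholder county counts chosen only so they sum to 162; not real data.
placeholder_counties = [100, 50, 12]
print(county_gap(267, placeholder_counties))  # 105
```

A nonzero gap does not say which level is wrong, only that the two disagree.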

@Tweeb123 Tweeb123 commented on 0cea9b2 Mar 11, 2020

In addition to @JBrooks137's comments, having duplicate data at both the state and the city/county level is confusing. No other country has doubled data like this. Please review.

@danslee danslee commented on 0cea9b2 Mar 12, 2020

First, let me say that I really really appreciate this data set. I'm sure I represent a large number of academics and software/data savvy people when I say that this has been an invaluable resource.

That said, I think it behooves you to maintain as much forwards and backwards compatibility as possible. If the structure of the data is going to change, it's much better to give some advance warning and preserve backwards compatibility in existing files. If a format change is absolutely necessary, you can mark the old file deprecated and create a new file with a new name and new notation. The current time series files, COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_19*.csv, are severely broken, both backwards and forwards.

Please don't break the who_covid_19_situation_reports/who_covid_19_sit_rep_time_series/who_covid_19_sit_rep_time_series.csv without considering some of the steps I've outlined above. Thanks!

@eugene-yang eugene-yang commented on 0cea9b2 Mar 12, 2020

@danslee You make a really good point. But given that the team working on this might not have a strong CS/data science background, I'm not sure they would have the capacity to maintain this repo with that kind of compatibility.

Having said that, I think a better way is either to help them with this repository or to create a fork/separate repo to support better usability.
I am not downplaying their contribution. I do think they have provided us a great resource, but people have different priorities, and I would really like them to focus on the correctness and speedy update of the numbers.

Just my two cents.

@danslee danslee commented on 0cea9b2 Mar 12, 2020

@eugene-yang I'd love to help out, but the only data I can see is what they have checked into the repo. I am working on some scripts that will unify the time-series csv files under a single naming scheme, but I have run into what are clearly some doubly entered data points around 2020-03-10 and 03-11, which bring the integrity of the entire file into question. Hopefully, the morning will bring some order to the data chaos.
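As a rough illustration of the kind of integrity check such scripts might include (not necessarily the one meant here), a cumulative series can be scanned for dates where the count drops. The helper `find_decreases` is hypothetical, and the series below is made up for the example.

```python
def find_decreases(dates, counts):
    """Return (date, previous, current) wherever a cumulative count drops."""
    problems = []
    for i in range(1, len(counts)):
        if counts[i] < counts[i - 1]:
            problems.append((dates[i], counts[i - 1], counts[i]))
    return problems


# Made-up series for illustration only.
dates = ["3/8/20", "3/9/20", "3/10/20", "3/11/20"]
counts = [10, 12, 0, 15]
print(find_decreases(dates, counts))  # [('3/10/20', 12, 0)]
```

A drop can also be a legitimate correction, so flagged dates are candidates for manual review rather than proven errors.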
