This repository has been archived by the owner on Mar 10, 2023. It is now read-only.

Commit

update
CSSEGISandData committed Mar 11, 2020
1 parent d417797 commit 0cea9b2
Showing 3 changed files with 1,002 additions and 801 deletions.

13 comments on commit 0cea9b2

@aatishb

Hi,

I think there are some issues with this update:

  • Republic of Korea and South Korea are the same country
  • Iran and Iran (Islamic Republic of) are the same country
  • Hong Kong and Hong Kong (SAR) are the same country
  • Many entries for 3/10/20 have missing data
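
For anyone consuming the files in the meantime, one possible workaround for duplicate names like those listed above is an alias table applied while reading the rows. This is only a sketch: the spellings on the left are the ones reported in this comment, while the choice of canonical spelling on the right and the names `ALIASES` and `canonical_name` are assumptions made for illustration.

```python
# Hypothetical alias table: the keys are spellings reported in this thread;
# the canonical values are an arbitrary choice made for illustration.
ALIASES = {
    "Republic of Korea": "South Korea",
    "Iran (Islamic Republic of)": "Iran",
    "Hong Kong (SAR)": "Hong Kong",
}


def canonical_name(name: str) -> str:
    """Return the canonical spelling for a country/region name."""
    cleaned = name.strip()
    # Names without an alias pass through unchanged.
    return ALIASES.get(cleaned, cleaned)
```

A name with no entry in the table, such as `canonical_name("Italy")`, is returned as-is, so the table only needs to list the known duplicates.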

@eugene-yang

There is a duplicate entry for Taiwan as well.
The updated version includes an entry "Taiwan, Taipei and environs", which is inconsistent with the previous records, which used "Taiwan, Taiwan".

@vladchel

The country rename produced bad data. We now have two records:

,South Korea,36.0,128.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2,2,6,8,10,12,13,13,16,17,28,28,35,35,42,44,50,53,
,Republic of Korea,36.0,128.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,54

The record under the old name is missing the latest value, and the record under the new name is all zeros except for the latest value. The same is true of the other renamed records.
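
A minimal sketch of how the two rows could be stitched back together on the consumer side, assuming both rows start on the same date and that one row is zero (or missing) wherever the other holds the real value, as in the pair shown above. The function name `merge_split_series` is hypothetical, and taking the per-date maximum is a choice made for this sketch, not something the maintainers have described.

```python
from typing import List


def merge_split_series(old: List[int], new: List[int]) -> List[int]:
    """Merge two cumulative-count rows that describe the same place.

    Assumes both rows start on the same date. The shorter row is padded
    with trailing zeros, then the larger value is kept for each date.
    """
    length = max(len(old), len(new))
    old_padded = old + [0] * (length - len(old))
    new_padded = new + [0] * (length - len(new))
    return [max(a, b) for a, b in zip(old_padded, new_padded)]
```

Applied to the tails of the two rows above, `merge_split_series([50, 53], [0, 0, 54])` gives `[50, 53, 54]`.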

@halfvector

Thank you for adding State-level US data.
To add to what @aatishb said: all US cities are missing data for 3/10/20.

@eugene-yang

@aatishb

There's an issue open for this: #405

@Moelf commented on 0cea9b2 Mar 11, 2020

US is still 605......

@sjmackenzie

I'm assuming the Hong Kong rename has obliterated Hong Kong from the "Total Confirmed" section of the website?

@JBrooks137

Hi,
It looks like there's some inconsistency between state-level and county-level data in the US. Take Washington as an example: the new 'Washington' state classification shows 267 cases for 10-Mar, but the county-level data only adds up to 162.
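
The comparison described above can be written as a one-line check. This is a sketch only; `aggregation_gap` is a hypothetical helper, and how the county rows for a given state are selected from the file is left out because the thread does not specify it.

```python
from typing import Iterable


def aggregation_gap(state_total: int, county_counts: Iterable[int]) -> int:
    """Return the state-level total minus the sum of the county-level counts."""
    return state_total - sum(county_counts)
```

With the figures quoted for Washington, a state total of 267 against county counts summing to 162, the gap is 105 cases that appear at state level but not in any county row.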

@Tweeb123

In addition to @JBrooks137's comments, having duplicate data at both the state and the city/county level is confusing. No other countries have duplicate data like this. Please review.

@danslee

First, let me say that I really really appreciate this data set. I'm sure I represent a large number of academics and software/data savvy people when I say that this has been an invaluable resource.

That said, I think it behooves you to maintain as much forwards and backwards compatibility as possible. If the structure of the data is going to change, it's much better to give some advance warning and preserve backwards compatibility in existing files. If a format change is absolutely necessary, you can mark the old file deprecated and create a new file with a new name and new notation. The current time series files, COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_19*.csv, are severely broken going both backwards and forwards.

Please don't break the who_covid_19_situation_reports/who_covid_19_sit_rep_time_series/who_covid_19_sit_rep_time_series.csv without considering some of the steps I've outlined above. Thanks!
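
A backward-compatibility check of the kind argued for above could be sketched as follows. It assumes that the first `key_cols` columns identify a row and that new columns are only ever appended on the right; both are assumptions made for this sketch, and `compatibility_problems` is a hypothetical name, not part of the repository.

```python
import csv
import io
from typing import List


def compatibility_problems(old_csv: str, new_csv: str, key_cols: int = 2) -> List[str]:
    """List ways in which new_csv breaks readers written against old_csv."""
    old_rows = list(csv.reader(io.StringIO(old_csv)))
    new_rows = list(csv.reader(io.StringIO(new_csv)))
    problems = []

    old_header, new_header = old_rows[0], new_rows[0]
    # New columns may be appended, but existing ones must keep their position.
    if new_header[: len(old_header)] != old_header:
        problems.append("header changed")

    # Every row key present in the old file should still exist in the new one.
    new_keys = {tuple(row[:key_cols]) for row in new_rows[1:]}
    for row in old_rows[1:]:
        key = tuple(row[:key_cols])
        if key not in new_keys:
            problems.append(f"row key missing: {key}")
    return problems
```

Run against the previous and the updated version of a file before publishing, an empty list means existing readers keep working, while a renamed row shows up as a missing key.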

@eugene-yang

@danslee You make a really great point. But given that the team working on this might not have a strong CS/Data Science background, I'm not sure whether they have the capacity to maintain this repo with that kind of compatibility.

Having said that, I think a better way is either to help them with this repository or to create a fork/separate repo to support better usability.
I am not downplaying their contribution; I do think they have provided us with a great resource. But people have different priorities, and I would really like them to focus on the correctness and speedy update of the numbers.

Just my two cents.

@danslee

@eugene-yang I'd love to help out, but the only data I can see is what they have checked into the repo. I am working on some scripts that will unify the time-series CSV files under consistent naming schemes and such, but I have run into what are clearly some doubly entered data points around 2020-03-10 and 03-11, which calls the integrity of the entire file into question. Hopefully, the morning will bring some order to the data chaos.
