New US file #1527
Comments
Second this problem. Please give an update. |
I've been pulling in state-level data from the daily reports and joining it with the time_series data since 3/23. It was a lot of work since daily_report and time_series are in different formats, but it could be a solution if you urgently need the state-level data. |
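For anyone attempting the same join, here is a minimal sketch of one way it could work; it is not the commenter's actual code. It assumes the daily rows have already been loaded as dictionaries with `Province_State`, `Confirmed`, and a hypothetical `Report_Date` key added at load time, and that the existing time series is held as a `state -> {date: value}` mapping. The function names are illustrative.

```python
from collections import defaultdict


def to_time_series(daily_rows):
    """Pivot long rows (one per state or county per day) into state -> {date: total}."""
    series = defaultdict(lambda: defaultdict(int))
    for row in daily_rows:
        # County rows for the same state and date are summed together.
        series[row["Province_State"]][row["Report_Date"]] += int(row["Confirmed"] or 0)
    return {state: dict(sorted(by_date.items())) for state, by_date in series.items()}


def extend_series(existing, new):
    """Add dates from `new` that `existing` lacks; existing values win on overlap."""
    merged = {state: dict(dates) for state, dates in existing.items()}
    for state, dates in new.items():
        target = merged.setdefault(state, {})
        for date, value in dates.items():
            target.setdefault(date, value)
    return merged
```

Keeping the existing values on overlapping dates is a design choice for this sketch; swap the `setdefault` for a plain assignment if the daily reports should take precedence.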
That's amazing. I don't have the time to do that, but thanks for the tip.
|
Thanks. I saw the daily_report (most likely) includes the same data, but it seems much more efficient overall if the updated time series are provided centrally (as they were a few days ago). One other question: It looks like the daily_report is at US county level - have you confirmed that the sum of the county data in the daily_report gives the same total as state-level data in the time_series? |
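One way to run the check asked about above is to sum the county rows per state and compare against the state-level figures. This is a rough sketch under stated assumptions: county rows are dictionaries with `Province_State` and `Confirmed` keys, the state-level figures are a plain `state -> count` mapping for the same date, and both function names are made up for illustration.

```python
from collections import defaultdict


def state_totals(county_rows):
    """Sum the Confirmed counts of county-level rows by state."""
    totals = defaultdict(int)
    for row in county_rows:
        # Blank cells are treated as zero.
        totals[row["Province_State"]] += int(row["Confirmed"] or 0)
    return dict(totals)


def find_mismatches(county_rows, state_level):
    """Return {state: (county_sum, state_value)} for states where the two differ."""
    sums = state_totals(county_rows)
    return {
        state: (sums.get(state, 0), value)
        for state, value in state_level.items()
        if sums.get(state, 0) != value
    }
```

An empty result means the county sums match the state-level numbers for every state in the mapping; anything else lists the states worth a closer look.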
The daily reports have inconsistent data in the field that contains the Province/State. Also, in the 3/23 and later datasets the field is now called Province_State. In some rows this field contains "City, ST"; in others it has the name of the state in long form. It will require a fair amount of cleanup before it can be used. The datasets also have the Country/Region problem, where the names of countries were changed: Mainland China is also listed as China, and Taiwan appears as "Taiwan", as "Taipei and environs", and in at least one other representation. Exercise caution when using this data due to the many inconsistencies. I've created a Jupyter notebook in Python 3.8 that reads all of these daily files and creates a single CSV. It does NOT do the data cleansing yet. If you want it, it is attached. Note: there was a small coding error that has been fixed. The error caused the 3/22 dataset to be read and appended 3 times, while the 3/23 and 3/24 datasets were not read. |
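For readers without the attachment, a minimal sketch of the same idea follows. It is not the notebook mentioned above. It assumes each daily report is available as CSV text keyed by an ISO date string, that `Country/Region` was renamed to `Country_Region` in the same way `Province/State` became `Province_State`, and it adds a hypothetical `Report_Date` column. The sample numbers are invented for illustration, and only the "Mainland China" alias from the thread is handled.

```python
import csv
import io

# Old header -> new header. "Province_State" is named in the thread;
# "Country_Region" is assumed to follow the same renaming pattern.
COLUMN_ALIASES = {
    "Province/State": "Province_State",
    "Country/Region": "Country_Region",
}

# Only the alias mentioned in the thread; extend as needed.
COUNTRY_ALIASES = {"Mainland China": "China"}

# Invented sample data in the old and new layouts.
SAMPLE_REPORTS = {
    "2020-03-22": "Province/State,Country/Region,Confirmed\nKansas,US,10\nHubei,Mainland China,20\n",
    "2020-03-23": "Admin2,Province_State,Country_Region,Confirmed\nJohnson,Kansas,US,5\n",
}


def normalize_row(row, report_date):
    """Return a copy of one CSV row with unified column names."""
    out = {
        COLUMN_ALIASES.get(key, key): (value or "").strip()
        for key, value in row.items()
        if key is not None  # ignore overflow cells from malformed rows
    }
    country = out.get("Country_Region", "")
    out["Country_Region"] = COUNTRY_ALIASES.get(country, country)
    out["Report_Date"] = report_date
    return out


def combine_daily_reports(reports):
    """reports: mapping of ISO date -> CSV text. Returns one list of rows."""
    combined = []
    # Iterating over the mapping's keys reads each report exactly once.
    for report_date in sorted(reports):
        reader = csv.DictReader(io.StringIO(reports[report_date]))
        for row in reader:
            combined.append(normalize_row(row, report_date))
    return combined
```

Calling `combine_daily_reports(SAMPLE_REPORTS)` returns rows from both layouts under the newer column names. Cleaning up the "City, ST" values in `Province_State` is still left out here.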
I totally agree about the efficiency issue. I had to do it because there was an urgent need, and I could not wait for a more reasonable data structure to make my day easier. I have been using time_series_global since it was updated over the weekend, and it does not contain any state-level data, so I cannot really check. Have they also updated the old time_series files? I thought they had already moved to the _global files, so I am gathering everything for the US (both state and county level) from the daily reports. |
Hello! Chief Data Officer of San Francisco here and we really need state and county level data so we can inform our response in San Francisco and comingle datasets with other local data. We can build our own pipeline of daily reports to timeseries, but I want to assess if this is necessary or if the US file is imminent. Any signal about when this is coming would be helpful. |
@jasonlally The encoding of counties in this data is inconsistent over time and I wouldn't expect it to be fixed any time soon. I found another source that has California county-level data here: https://coronavirus.1point3acres.com/#map You might consider using that as your source. |
Yeah, we are using that one at the moment. We'll continue to use it, then, but we are eagerly awaiting comprehensive datasets from Johns Hopkins. Thanks for warning us away from trying to build a pipeline off the dailies. |
Please let us know where to get US state (or county) information. I have been unable to find this. |
For anybody doing USA by-state time series visualization in JavaScript, here is an example of doing this aggregation on a webpage. It automatically loads and sums as many individual days as needed, starting 3/23, so it can hold us over until the new time series appears. Although the data structure is not commented in detail, this is what's used for the visualization at https://covid19chart.org/, so you can see the raw merged data structure by using the JS debugger on that site and inspecting the "csse" variable there. You can do time series rollups by executing a function call like this (to get a time series for Kansas, for example):
Maybe somebody will find it helpful. |
Duplicate of #1534 |
davidbau, an outstanding site. Thanks. |
Where is it? You stopped supporting the database to offer a new "clean" database with more US data fields. Where is it? We have had zero state-level data since the weekend.