Upcoming changes in time series tables #1250

Open
CSSEGISandData opened this issue Mar 22, 2020 · 273 comments

Comments

@CSSEGISandData CSSEGISandData commented Mar 22, 2020

We will update the time series tables in the coming days, aiming to provide a cleaner and more organized dataset consistent with our new/current naming convention. We will also be reporting a new variable (i.e., testing), as well as data at the county level for the US. All files will continue to be updated daily around 11:59 PM UTC.

The following specific changes will be made:

  • Three new time series tables will be added for the US. The first two will be the confirmed cases and deaths, reported at the county level. The third, number of tests conducted, will be reported at the state level. These new tables will be named time_series_covid19_confirmed_US.csv, time_series_covid19_deaths_US.csv, time_series_covid19_testing_US.csv, respectively.

  • Changes to the current time series include the removal of the US state and county-level entries, which will be replaced with a single new country-level entry for the US. The tables will be renamed time_series_covid19_confirmed_global.csv, time_series_covid19_deaths_global.csv, and time_series_covid19_testing_global.csv, respectively.

  • The ISO code will be added in the global time series tables.

  • The FIPS code will be added in the new US time series tables.

  • We will no longer provide recovered cases.

  • The current set of time series files will be moved to our archive folder, and the new files will be added to the current folder.

Thanks!

Update: time_series_covid19_recovered_global.csv is added.
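
For anyone adapting a script or dashboard to the renamed files, here is a minimal sketch of reading one of the new tables. The file names come from the announcement above. The column layout (Province/State, Country/Region, Lat, Long, then one column per date in m/d/yy form) and the sample rows are assumptions for illustration only, not taken from the published files.

```python
import csv
import io
from datetime import datetime


def table_name(metric, scope):
    """Build a file name following the announced pattern (metric and scope are free text)."""
    return f"time_series_covid19_{metric}_{scope}.csv"


def parse_date(label):
    """Return a date if the column label looks like m/d/yy, else None."""
    try:
        return datetime.strptime(label, "%m/%d/%y").date()
    except ValueError:
        return None


def wide_to_long(text):
    """Turn one wide row per region into one record per (region, date)."""
    reader = csv.DictReader(io.StringIO(text))
    # Detect date columns by parsing the header rather than by position,
    # so extra columns (for example an ISO or FIPS code) do not break the script.
    date_cols = [c for c in reader.fieldnames if parse_date(c) is not None]
    records = []
    for row in reader:
        for col in date_cols:
            records.append({
                "province_state": row.get("Province/State", ""),
                "country_region": row.get("Country/Region", ""),
                "date": parse_date(col),
                "value": int(row[col]),
            })
    return records


# Hypothetical sample in the assumed layout; these are not real figures.
SAMPLE = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
,Exampleland,0.0,0.0,1,3
North,Otherland,1.0,1.0,0,2
"""

if __name__ == "__main__":
    print(table_name("confirmed", "global"))
    for record in wide_to_long(SAMPLE):
        print(record)
```

Detecting date columns from the header is a deliberate choice here: the announcement says code columns will be added, so position-based indexing would be fragile.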

@DataChant DataChant commented Mar 22, 2020

Will recovered cases still be reported on the daily CSV files? Will they reflect the daily recovered or aggregated?

@CSSEGISandData CSSEGISandData commented Mar 22, 2020

@DataChant No recovered cases will be reported in the daily reports or the time series tables.

Update: we have added a recovered time series table for most countries. Thanks!

@paolinic03 paolinic03 commented Mar 22, 2020

Woah, major news. Let’s do this. Bummed about no recovered but seems to be difficult to collect. County level data is going to be massive. Thank you

@billyburgoa billyburgoa commented Mar 22, 2020

Thanks for your work. I'd like to know why you won't report or provide recovered cases.

@CSSEGISandData CSSEGISandData commented Mar 22, 2020

There is no reliable data source reporting recovered cases for many countries, such as the US.

@ryanwoconnor ryanwoconnor commented Mar 22, 2020

Can you please provide us a date/time for that cutover?
Can we place these new files into a different folder and leave the old files in place?
That way, any dashboards we currently have running won't be full of errors when the cutover happens.

Thank you,
Ryan

@DMiradakis DMiradakis commented Mar 22, 2020

Thanks so much! I'm making a Power BI Report now, so it's good to know about these upcoming changes!

@bevanward bevanward commented Mar 22, 2020

Thanks @CSSEGISandData
With respect to your second bullet point, will Province/State remain for countries (excluding US) where you can source the data?

Changes look good - thanks for all the hard work - this is a very important data set!
Bevan

@christophGeoHealthCentre

How do you count active cases without having recovered available?

@paolinic03 paolinic03 commented Mar 22, 2020

You don’t, just confirmed, deaths, and testing.

@DMiradakis DMiradakis commented Mar 22, 2020

> How do you count active cases without having recovered available?

I'm just grouping the difference together into a group called "Active or Recovered". Like @paolinic03 said, it's the best we can do for the moment.
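
The grouping described above is plain subtraction. A minimal sketch, assuming cumulative confirmed and death counts for the same region and date (the function names are made up for illustration):

```python
def active_or_recovered(confirmed, deaths):
    """Cases that are not deaths: still active or already recovered."""
    if deaths > confirmed:
        raise ValueError("deaths exceed confirmed; check the inputs")
    return confirmed - deaths


def active(confirmed, deaths, recovered=None):
    """Active cases, or None when no recovered count is available."""
    if recovered is None:
        return None  # cannot be separated from recovered without that count
    return confirmed - deaths - recovered


if __name__ == "__main__":
    # Hypothetical counts, not real data.
    print(active_or_recovered(100, 5))
    print(active(100, 5))
    print(active(100, 5, recovered=20))
```

Returning None rather than a guess makes it explicit in a dashboard that the active figure is unknown, not zero.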

@analyzewithpower analyzewithpower commented Mar 22, 2020

THANK YOU!!! :)

@shahesam84 shahesam84 commented Mar 23, 2020

Will there be a release for those mentioned tables today? I don't see US tables yet.

@aatishb aatishb commented Mar 23, 2020

Thank you for this. This really is an amazing resource, and I'm excited for these changes. I recommend pinning this issue so that folks don't miss it. https://help.github.com/en/github/managing-your-work-on-github/pinning-an-issue-to-your-repository

@zevarito zevarito commented Apr 7, 2020

I cannot find time_series_covid19_testing_global. Can you point me to where it is published? Thanks!

@osrodas007 osrodas007 commented Apr 7, 2020

@jcampos8782 jcampos8782 commented Apr 7, 2020

In case anyone needs it, I have written some scripts that append the US data to the global files using the same format as the global file. I commit the concatenated files up to my github account once per day.

https://github.com/jcampos8782/COVID19-API/tree/master/init/data/processed/covid19
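
As a rough idea of what such a merge can involve (this is a sketch under assumptions, not the scripts in the linked repository), county rows can be summed per state and emitted in a global-style row shape. The column names Province_State, Province/State and Country/Region, and the sample rows, are assumed for illustration.

```python
from collections import defaultdict


def rollup_by_state(us_rows, date_cols):
    """Sum county-level rows into one row per state, shaped like a global-table row."""
    totals = defaultdict(lambda: [0] * len(date_cols))
    for row in us_rows:
        acc = totals[row["Province_State"]]
        for i, col in enumerate(date_cols):
            acc[i] += int(row[col])
    return [
        {"Province/State": state, "Country/Region": "US", **dict(zip(date_cols, values))}
        for state, values in sorted(totals.items())
    ]


if __name__ == "__main__":
    # Hypothetical county rows, not real data.
    dates = ["1/22/20", "1/23/20"]
    counties = [
        {"Province_State": "StateA", "1/22/20": "1", "1/23/20": "2"},
        {"Province_State": "StateA", "1/22/20": "0", "1/23/20": "5"},
        {"Province_State": "StateB", "1/22/20": "3", "1/23/20": "3"},
    ]
    for row in rollup_by_state(counties, dates):
        print(row)
```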

This data is part of a bigger project which puts this data set into MongoDB and enables geolocation lookups of the series data. In other words, given a latitude and longitude, you can resolve regions associated with the geolocation and then find series data for those regions. The location data is from Google's APIs and falls back on the names provided in the data files if the region names cannot be resolved for the coordinates (for example, the Grand Princess data row).

The documentation for the mongo database is here:
https://github.com/jcampos8782/COVID19-API/tree/master/init/mongo-init

There are also REST APIs using SpringBoot in this project and a React/Redux UI. If any particular part of this project is useful to you, feel free to fork or contribute. Thanks!

@jcampos8782 jcampos8782 commented Apr 7, 2020

Has anyone come across time series data for individual Mexico states by any chance? Looking for something similar to what we have here for US states.

@nugget-dot nugget-dot commented Apr 7, 2020

Very interesting 🤔

RouxRC added a commit to boogheta/coronavirus-countries that referenced this issue Apr 8, 2020
CSSEGISandData/COVID-19#1250 (comment)
Remove disclaimer and whole oldrecovered data mess
only disable recovered and sick for Canada and USA scopes with missing source

@jcampos8782 jcampos8782 commented Apr 8, 2020

I have added Mexico state-level data to my data sets from some scraping. I can scrape daily to get data from this point forward, but I have no historical data. If anyone comes across it, could you please point me in the right direction?

https://github.com/jcampos8782/COVID19-API/tree/master/init/mongo-init/data/processed/covid19

@owahltinez owahltinez commented Apr 8, 2020

@jcampos8782 check out https://github.com/open-covid-19/data. We have data all the way back to the first reported case.

@jcampos8782 jcampos8782 commented Apr 8, 2020

@owahltinez Thank you! I appreciate your work. It looks like you are using https://github.com/carranco-sga/Mexico-COVID-19 as the canonical source for Mexico so I am going to go with that. I've got to give credit where credit is due.

@owahltinez owahltinez commented Apr 8, 2020

@jcampos8782 I'm not trying to take any credit for the source. The Open COVID-19 dataset is not the canonical source of any of the data, and neither is the repo you linked; canonical data comes from the local authorities. The main purpose of my repo is to provide a consistent dataset automatically populated with a source of data as close as possible to the official, local authority.

@jcampos8782 jcampos8782 commented Apr 8, 2020

@owahltinez I didn't mean to make it seem like you were. Sorry if it sounded like that. Thanks for your work! I just wanted to link to the data set you were deriving yours from to give him some credit. I realize that none of this is canonical data and most of it is from screen scrapes from all over the web. Like you said, just trying to get as close as possible.

@leandroparedes leandroparedes commented Apr 10, 2020

Is testing going to refer to ‘performed tests’ or ‘individuals tested’?

Any update on this?

@blisx264 blisx264 commented Apr 10, 2020

> We will no longer provide recovered cases.

Good decision. I'm not sure about anywhere else in the world, but Australia / QLD only tested travellers, or those showing severe respiratory symptoms. Many have been sent home untested, so they may have recovered with no data to say they have. Our state is now moving to testing non-travellers with fevers or contact with others. While the health systems are mitigating loads, recovered data is a feel-good factor and can't be gathered completely.

@Kratoklastes Kratoklastes commented Apr 12, 2020

Is there any plan for CSSE to include counts of cases/comorbidities/hospitalisations/deaths by age-cohort?

There is a very large variation in CFR across age groups, so much so that it's criminal to report an age-agnostic CFR: it understates risk for over-70s by a factor of 4, and overstates risk for under-50s by two orders of magnitude. Any metric with that characteristic is not worth having.
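
To make the arithmetic concrete, here is a small sketch comparing a pooled CFR with per-cohort CFRs (deaths divided by confirmed cases). The cohort labels and counts are invented purely for illustration and are not real data; they are not meant to reproduce the factors quoted above.

```python
def cfr(deaths, cases):
    """Case fatality ratio: deaths divided by confirmed cases."""
    return deaths / cases if cases else float("nan")


# Hypothetical (cases, deaths) per cohort; NOT real data.
COHORTS = {
    "under 50": (6000, 6),
    "50-69": (3000, 60),
    "70+": (1000, 134),
}


def pooled_cfr(cohorts):
    """Age-agnostic CFR: total deaths over total cases."""
    total_cases = sum(cases for cases, _ in cohorts.values())
    total_deaths = sum(deaths for _, deaths in cohorts.values())
    return cfr(total_deaths, total_cases)


def cohort_cfrs(cohorts):
    """CFR computed separately for each cohort."""
    return {name: cfr(deaths, cases) for name, (cases, deaths) in cohorts.items()}


if __name__ == "__main__":
    print(f"pooled: {pooled_cfr(COHORTS):.3%}")
    for name, value in cohort_cfrs(COHORTS).items():
        print(f"{name}: {value:.3%}")
```

With these invented counts the pooled figure is 2%, while the per-cohort figures range from 0.1% to 13.4%, which is the kind of spread a single pooled number hides.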

At the moment very few countries are making age-cohort data readily available. In the US, age-cohort deaths and hospitalisations are available with a long lag; age-cohort cases are somewhat easier; age-cohort case-comorbidity data are like hen's teeth. NYC's API makes all of it available daily, so it's obviously doable.

Hospitalisations by age-cohort are in CDC's COVID-NET database - they are presented on a webpage - but the mechanism to download is like someone copied a udemy GUI assignment (clunky and not readily amenable to algorithmic download). There's an API endpoint at gis.cdc.gov/grasp/covid19_3_api but it's undocumented (easy enough to work out, tho).

In Australia cases by age-cohort are updated daily, but put in a repository that's hard to find, in a way that's moderately difficult to scrape: another 'HelloWorld' effort (got it done tho: python requests FTW).

The UK appears committed to not producing it (Oxford actually used Chinese age-cohort data to do a study in mid-March: why not their own?)

Iceland has a good API.

CSSE's efforts thus far seem more about giving HelloWorld coders a chance to put coloured dots on maps, and to scare people who don't understand the irrelevance of CFR when 'F' has an inherently different probability structure in a priori identifiable cohorts of the population.

Italian data look benign if you get age-cohort data - that's how important age-cohort data are.

@osrodas007 osrodas007 commented Apr 12, 2020

@hrebohm hrebohm commented Apr 12, 2020

Maybe there is an error in the dataset of Germany. The number of deaths decreased today in the datafile „time_series_covid19_deaths_global.csv“. Please correct the numbers.

@CaptainChemist CaptainChemist commented Apr 15, 2020

What's the ETA on adding the ISO codes for time_series_covid19_deaths_global and time_series_covid19_confirmed_global? It would make things much easier to compare to maps rather than the Province/State Country/Region. I've also tried the lat and long as a way to get the ISO from the lookup table that you provide, but the numbers don't exactly match for a given country.

Thanks for all of your hard work, this is amazing!
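
Until the codes are added, matching on the country name tends to be more stable than matching on coordinates. A minimal sketch, assuming a lookup table with Country_Region, Province_State and iso3 columns (these column names and the sample rows are assumptions for illustration, not confirmed against the published lookup table):

```python
def normalize(name):
    """Normalize a country name for matching."""
    return name.strip().casefold()


def build_iso_index(lookup_rows):
    """Map normalized country name to ISO3, using country-level rows only."""
    index = {}
    for row in lookup_rows:
        if row.get("Province_State"):
            continue  # skip sub-national rows
        index[normalize(row["Country_Region"])] = row["iso3"]
    return index


def attach_iso(series_rows, index):
    """Return copies of the series rows with an iso3 key (None when unmatched)."""
    return [
        {**row, "iso3": index.get(normalize(row["Country/Region"]))}
        for row in series_rows
    ]


if __name__ == "__main__":
    # Hypothetical lookup and series rows in the assumed layout.
    lookup = [
        {"Country_Region": "France", "Province_State": "", "iso3": "FRA"},
        {"Country_Region": "Germany", "Province_State": "", "iso3": "DEU"},
    ]
    series = [{"Country/Region": "France"}, {"Country/Region": "Atlantis"}]
    for row in attach_iso(series, build_iso_index(lookup)):
        print(row)
```

Unmatched names come back with iso3 set to None, so they can be listed and fixed by hand rather than silently dropped.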

@cipriancraciun cipriancraciun commented Apr 15, 2020

@CaptainChemist -- as I wrote a few weeks ago on this same issue (sorry for repeating this) -- I have derived and augmented the JHU, NY Times and ECDC datasets, and among other augmentations I've also included the ISO country codes for all three datasets (and their variants).

I have described this in #1281 and it is available at https://github.com/cipriancraciun/covid19-datasets

Thus you could use these derived datasets until JHU does the updates you require. Moreover, you can also use and compare the ECDC dataset against the JHU one.

@SoundSpinning SoundSpinning commented Apr 15, 2020

The latest cumulative Confirmed value for France (14-Apr-2020) is lower than the previous day's value. Is it a typo?
image

@osrodas007 osrodas007 commented Apr 15, 2020

@SoundSpinning SoundSpinning commented Apr 15, 2020

> Worldometer marks: 143,303

I'm referring to the GitHub database we are dealing with here. If you go to their live map app and click on France, the graph (see below) shows the lower latest point. Where did you get your value from?
image

@zevarito zevarito commented Apr 15, 2020

@SoundSpinning they probably had to adjust the counters. For example, in my country they adjusted the number once they realized that you can't count total positive tests as total confirmed cases, since some people have been tested twice. I'm not saying the same is happening with France, but it is a possibility.

@SoundSpinning SoundSpinning commented Apr 15, 2020

> @SoundSpinning they probably had to adjust the counters. For example, in my country they adjusted the number once they realized that you can't count total positive tests as total confirmed cases, since some people have been tested twice. I'm not saying the same is happening with France, but it is a possibility.

Anything is possible, since most countries are counting cases differently and with inconsistent methods. Good point though: it could just be an adjustment made to the last entry alone. It's just odd to see a cumulative value lower than a previous one, but it's not that important.
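
For anyone who wants to flag such corrections automatically, here is a minimal sketch that lists every date on which a cumulative series drops below the previous day's value. The dates and values in the example are hypothetical, not the France figures.

```python
def find_decreases(dates, values):
    """Return (date, previous, current) for each day the cumulative value drops."""
    if len(dates) != len(values):
        raise ValueError("dates and values must have the same length")
    drops = []
    for i in range(1, len(values)):
        if values[i] < values[i - 1]:
            drops.append((dates[i], values[i - 1], values[i]))
    return drops


if __name__ == "__main__":
    # Hypothetical cumulative series with one downward revision.
    dates = ["4/12/20", "4/13/20", "4/14/20", "4/15/20"]
    values = [100, 120, 115, 130]
    for date, previous, current in find_decreases(dates, values):
        print(f"{date}: {previous} -> {current}")
```

A flagged date is not necessarily an error; as noted above, it may reflect a source revising its counting method.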
