US Recovered Cases #1113

Open
CSSEGISandData opened this issue Mar 20, 2020 · 28 comments

Comments

@CSSEGISandData CSSEGISandData (Owner) commented Mar 20, 2020

Due to the inconsistency in reporting recovered cases around the US, we have decided to report recovered cases at the country level until a more reliable source for recovered cases becomes available.
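To show what reporting "at the country level" means for people consuming the time series, per-province rows can be rolled up into one total per country. This is a minimal sketch. It assumes the column layout visible in the row quoted later in this thread: `Province/State`, `Country/Region`, `Lat`, `Long`, then one cumulative column per date. The function name `country_totals` and the sample values are hypothetical.

```python
import csv
import io
from collections import defaultdict


def country_totals(csv_text):
    """Sum cumulative counts per Country/Region across all Province/State rows.

    Assumes columns: Province/State, Country/Region, Lat, Long, then one
    cumulative count per date. Empty cells are treated as 0 (an assumption).
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    dates = header[4:]
    totals = defaultdict(lambda: [0] * len(dates))
    for row in reader:
        if not row:
            continue  # skip blank lines
        country = row[1]
        for i, value in enumerate(row[4:]):
            totals[country][i] += int(float(value)) if value.strip() else 0
    return dates, dict(totals)


# Made-up sample data, for illustration only
sample = (
    "Province/State,Country/Region,Lat,Long,3/18/20,3/19/20\n"
    "State A,US,0,0,1,2\n"
    "State B,US,0,0,0,3\n"
)
dates, totals = country_totals(sample)
print(dates, totals["US"])  # ['3/18/20', '3/19/20'] [1, 5]
```

If a file contains both per-state rows and a country-level "US, US" row, summing every row this way would double count. A consumer would need to pick one or the other.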


@JiPiBi JiPiBi commented Mar 20, 2020

Please, what is the impact on the time series .csv files? Do you create a "US, US" line in every time series? Please explain.


@aheib987 aheib987 commented Mar 20, 2020

The time series data on 3-18 had a US, US field where the total recovery count was listed, which was rolling up the total recoveries in the US (that was great). On the upload yesterday evening, there is no US, US row showing the total recoveries.

Was this an error? Will US recoveries show today?

Thanks for all the work you have been doing around this! :)


@JiPiBi JiPiBi commented Mar 20, 2020

In the Recovered file I see a value of 108 from the 19th.
There is also a US,US line in Confirmed, with a value of 1 from today. Why?
And in Deaths, a value of 0.


@smouksassi smouksassi commented Mar 20, 2020

bug

putting a visual on this


@aheib987 aheib987 commented Mar 20, 2020

@JiPiBi yes, in the 03-19-2020 daily report, you do see the US, US column showing total recoveries. The problem we are talking about is there is no US, US row in the time series data that was posted on 3-19-2020.


@JiPiBi JiPiBi commented Mar 20, 2020

@aheib987 the value I gave was observed in my time series

US,US,37.0902,-95.7129,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,108  
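To check for the row under discussion, one can look for the line whose first two fields are both `US` and read its last (most recent) value. This is a minimal sketch that assumes the CSV layout of the row quoted above. `latest_value` is a hypothetical helper and is not part of the dataset's tooling.

```python
import csv
import io


def latest_value(csv_text, province, country):
    """Return the last cumulative value for the given Province/State and
    Country/Region pair, or None if no such row exists."""
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) > 4 and row[0] == province and row[1] == country:
            return int(float(row[-1]))
    return None


# Shortened version of the row quoted above (most leading zeros omitted)
row = "US,US,37.0902,-95.7129,0,0,0,108"
print(latest_value(row, "US", "US"))  # 108
print(latest_value(row, "", "US"))    # None
```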

@aheib987 aheib987 commented Mar 20, 2020

@JiPiBi can you send a link to that file? This time series file loaded last night does not have a row that lists the US in the Province/State field.


@paolinic03 paolinic03 commented Mar 20, 2020

aheib987 is right, and I have not been able to use the file for the last 2 days because of this. It is showing US recoveries as 0, meaning that no one has recovered. A few days ago it was at 17. We are reporting misinformation, and it is troubling.
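A cumulative count should never go down from one day to the next. A drop like the one described here (17 to 0) can therefore be flagged automatically. This is a minimal sketch with illustrative values, and `find_decreases` is a hypothetical helper. A flagged drop may also be a legitimate downward correction rather than a data error.

```python
def find_decreases(series):
    """Return (index, previous, current) for every point where a cumulative
    series goes down, which should not happen for cumulative counts."""
    return [
        (i, series[i - 1], series[i])
        for i in range(1, len(series))
        if series[i] < series[i - 1]
    ]


# Illustrative values only: a cumulative recovered count that falls to 0
print(find_decreases([0, 17, 17, 0]))  # [(3, 17, 0)]
```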


@ryanwoconnor ryanwoconnor commented Mar 20, 2020

Understood. I'm sure this is a challenging data collection effort. When can we expect to see this change? Even at a country level this would be good to have.

Can you also explain a bit about what some of the inconsistencies are?

Thank you,
Ryan


@JiPiBi JiPiBi commented Mar 20, 2020

@aheib987 Quite strange. My CSV file comes from this site; my PR was made at about 1:00 UTC, and now I don't see my line on the site anymore...
But they made modifications: I now see that they removed duplicates. Click the link at the top left and you should see the removed lines in red, as I did (see line 478).


@aheib987 aheib987 commented Mar 20, 2020

@JiPiBi yeah, it was correct on the 18th but then dropped off on the 19th. I have some Python code that deletes the old files and re-downloads the new ones, and I noticed the US dropped to 0 for recovered. Hopefully they'll put it back in today's upload. You can see in their daily report for the 19th that US is listed there, as you showed above, but it's gone from the time series file.

[Screenshot: Daily File Link and Snapshot]
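The download-and-check workflow described above can be sketched with the Python standard library. This is a minimal sketch and not aheib987's actual script. `refresh` and `has_row` are hypothetical names, and `RECOVERED_CSV_URL` in the commented usage is a placeholder for whichever raw CSV file you track.

```python
import csv
import io
import urllib.request
from pathlib import Path


def refresh(url, path):
    """Download `url` and overwrite the local copy at `path` (replacing the old file)."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    Path(path).write_text(text, encoding="utf-8")
    return text


def has_row(csv_text, province, country):
    """True if some row has this Province/State and Country/Region pair."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    return any(
        len(row) > 1 and row[0] == province and row[1] == country
        for row in reader
    )


# Usage (RECOVERED_CSV_URL is a placeholder, not a real constant):
# text = refresh(RECOVERED_CSV_URL, "recovered.csv")
# if not has_row(text, "US", "US"):
#     print("warning: no US, US row in this upload")
```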


@JiPiBi JiPiBi commented Mar 20, 2020

I understand, but these days my favourite site for reliable data is https://www.worldometers.info/coronavirus/ and the only issue is that I don't know how to get their data...


@theronrr theronrr commented Mar 20, 2020

It is disturbing that past information keeps changing and is not merged into a common field.
I use the recovered count to maintain an active-cases calculation.
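Active cases are commonly computed as confirmed minus deaths minus recovered for each date. That is why a missing or zeroed recovered row inflates the result. This is a minimal sketch with made-up numbers, and `active_cases` is a hypothetical helper.

```python
def active_cases(confirmed, deaths, recovered):
    """Active = confirmed - deaths - recovered, element-wise per date."""
    if not (len(confirmed) == len(deaths) == len(recovered)):
        raise ValueError("series must have the same length")
    return [c - d - r for c, d, r in zip(confirmed, deaths, recovered)]


# Made-up values: recovered drops to 0 when the country-level row disappears
print(active_cases([100, 150], [1, 2], [17, 0]))  # [82, 148]
```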


@paolinic03 paolinic03 commented Mar 20, 2020

JiPiBi, I check worldometers too and wish I could connect to that source :(...


@LunaPg LunaPg commented Mar 20, 2020

It seems we have to pay for the data...
Even for non-profit orgs.

Not cool ...

https://www.worldometers.info/licensing/what/


@JiPiBi JiPiBi commented Mar 20, 2020

Perhaps JHU or the CDC are more powerful and have some leverage to obtain their data?


@paolinic03 paolinic03 commented Mar 20, 2020

Not sure... what a bummer.


@gohkokhan gohkokhan commented Mar 21, 2020

I'm wondering how Worldometers can have more reliable data than JHU. Is it because they do a lot of manual intervention to ensure data accuracy?


@JiPiBi JiPiBi commented Mar 21, 2020

@gohkokhan
For every new value, they give their sources. You can also read how they process their data on their site.


@paolinic03 paolinic03 commented Mar 21, 2020

Are there any updates on this issue????????????????


@ryanwoconnor ryanwoconnor commented Mar 21, 2020

Recovered cases are not being reported at the country level in the timeseries recovered data. Can we please get an update?


@JiPiBi JiPiBi commented Mar 21, 2020

@paolinic03
If you get an answer on this site, consider it a miracle. So many people try every day...


@paolinic03 paolinic03 commented Mar 21, 2020

@JiPiBi it’s all good lol. I’ve lost hope. Found a workaround for now.


@JiPiBi JiPiBi commented Mar 22, 2020

@paolinic03
Yes, some people are fixing the data themselves.
On my side, I close my issues a few days after opening them, without them having been fixed by the site.
A bit strange...


@chrisjbillington chrisjbillington commented Mar 22, 2020

Anyone aware of timeseries data from worldometers?

This dataset is becoming increasingly complicated to deal with.

FWIW the setup of the github repository is totally the wrong approach. As you change the name of a country, you shouldn't be leaving old data with the old name in the repository and making an increasingly complex script to collect the data into a single timeseries. The repository should be a single timeseries, and changing the name of a country should be retroactive and universal. That is what version control is for. Instead, by having per-day datasets with the format and conventions changing over time, you're forced to deal with this heterogeneity when combining data together, and it's causing all kinds of issues. You're inventing your own version control.

If you just had a single timeseries csv file, you could correct errors by modifying and committing the file, instead of whatever manual list of overrides you currently have to fix the data as it is collated.

Then pull requests would make sense and people could help fix errors, and there would be a single source of truth instead of the mess we have now.
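From the consumer side, the rename problem looks like this: with per-day files, every script that collates them needs its own rename map so that old and new names land in the same series. This is a minimal sketch that assumes a simplified in-memory shape for the daily reports. `merge_daily_reports` and the example rename are illustrative and are not the repository's actual override list.

```python
from collections import defaultdict

# Example rename map (illustrative). In a single-file design, a rename would
# be applied once, retroactively, and committed instead.
RENAMES = {"Mainland China": "China"}


def merge_daily_reports(reports, renames=RENAMES):
    """Combine per-day reports into one timeseries keyed by country.

    `reports` maps a date string to a list of (country, cumulative_count)
    pairs, an assumed, simplified shape for the daily files.
    Returns {country: {date: count}} with counts summed per country.
    """
    series = defaultdict(lambda: defaultdict(int))
    for date, rows in reports.items():
        for country, count in rows:
            name = renames.get(country, country)
            series[name][date] += count
    return {country: dict(by_date) for country, by_date in series.items()}


reports = {
    "03-10": [("Mainland China", 10)],  # made-up counts
    "03-11": [("China", 12)],
}
print(merge_daily_reports(reports))
# {'China': {'03-10': 10, '03-11': 12}}
```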


@anjankarpak2110 anjankarpak2110 commented Mar 22, 2020

Is there any update on this issue? I don't see any changes in the data updated today. This issue has existed for quite some time. Will this be fixed, or do we need to move to another source? It's disturbing when you put a lot of effort into developing a report and then can't publish it because of inconsistencies in the data.


@JiPiBi JiPiBi commented Mar 22, 2020

The general issue with this dataset is that it is not treated and managed as a database: numerical ids instead of strings as keys, one record per entity per day, all in the same file, and daily values rather than cumulative ones.

Result: when there is an error on one day, you have to change all the following days, and the values are never fixed in all the daily cumulative reports...

And when you change a name, you have to change all the values, instead of keeping a table of ids associated with the entity names (state / country / continent) where you change only one element.
Result: we keep getting daily reports that are never fixed and are not coherent, because the entities' labels have changed.

A bit disappointing ....
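The design suggested above uses numeric ids, one record per entity per day, and daily rather than cumulative values. It can be sketched in a few lines. This is a hypothetical illustration with made-up values. Correcting a single daily record automatically fixes every later cumulative total, and a rename touches only the id table.

```python
from itertools import accumulate

# Hypothetical id table: renaming an entity changes only this mapping
ENTITIES = {1: "US"}

# One record per (entity id, date) holding the daily new value (made-up numbers)
records = {(1, "03-17"): 1, (1, "03-18"): 2, (1, "03-19"): 3}


def to_daily(cumulative):
    """Convert a cumulative series into daily new values."""
    return [cur - prev for prev, cur in zip([0] + cumulative[:-1], cumulative)]


def cumulative_for(records, entity_id):
    """Rebuild the cumulative series for one entity, ordered by date key."""
    daily = [v for (eid, _), v in sorted(records.items()) if eid == entity_id]
    return list(accumulate(daily))


print(cumulative_for(records, 1))  # [1, 3, 6]
records[(1, "03-18")] = 4  # correct a single day
print(cumulative_for(records, 1))  # [1, 5, 8]
print(to_daily([1, 5, 8]))  # [1, 4, 3]
```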


@jawz101 jawz101 commented Apr 15, 2020

@CSSEGISandData

How you calculate recovered cases should be pretty easy:

If confirmed date was > 2-4 weeks ago and you are presently not dead, you're likely recovered.
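The rule of thumb proposed here can be written as: estimated recovered on day t ≈ confirmed on day t − L, minus deaths on day t, with L somewhere in the 2 to 4 week range. This is a minimal sketch of that heuristic, not JHU's method. `estimate_recovered` and the default `lag_days=14` are assumptions, and subtracting all deaths up to day t is a crude approximation.

```python
def estimate_recovered(confirmed, deaths, lag_days=14):
    """Heuristic: cases confirmed at least `lag_days` ago that have not died
    are assumed recovered.

    `confirmed` and `deaths` are daily cumulative series of equal length.
    """
    estimates = []
    for t in range(len(confirmed)):
        if t < lag_days:
            estimates.append(0)  # not enough history yet
        else:
            estimates.append(max(0, confirmed[t - lag_days] - deaths[t]))
    return estimates


# Made-up numbers, with a short lag so the effect is visible
print(estimate_recovered([1, 2, 4, 8, 16], [0, 0, 0, 1, 1], lag_days=2))
# [0, 0, 1, 1, 3]
```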

The "number of U.S. patients tested" and this number of recoveries are unrealistic expectations:

I wouldn't expect to have billions of tests to administer to everyone in the world to periodically make sure each person is or is not infected. Because that's what you're telling me is the reasoning behind keeping a tally of tests done. We have patients who want a test each week because they're convinced they have a virus. Counting number of people tested is misguided.

"Let's make sure everyone gets tested" is frivolous when the 99.999% negative today will just need another test tomorrow. That's what you're saying we should count?
