This repository has been archived by the owner on Mar 10, 2023. It is now read-only.

Small problem: calculating death rates #23

Open
PenelopeFudd opened this issue Feb 12, 2020 · 10 comments
Labels
enhancement New feature or request

Comments

@PenelopeFudd

It's tempting to try and calculate how deadly a virus is by just dividing the number of people who have died by the number of people who have died plus the number who have recovered, but there's a problem with the number you get: it's too high.

The calculation needs to compare people who were infected at the same time, because it takes longer to recover than it does to die.

Eventually the simple calculation and the complicated calculation will converge, but we're not there yet. :-/
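To make the bias concrete, here is a minimal sketch of a toy outbreak. Every number in it (the daily growth factor, the 2% true rate, 14 days to death, 21 days to recovery) is an assumption picked for illustration, not something measured from this repository's data, and `simulate_naive_ratio` is a made-up helper name.

```python
def simulate_naive_ratio(num_days, growth=1.1, true_rate=0.02,
                         days_to_death=14, days_to_recovery=21):
    """Return deaths / (deaths + recoveries) observed at the end of num_days.

    All parameters are hypothetical: new cases on day d are growth ** d,
    a fixed share true_rate of them die after days_to_death days, and the
    rest recover after days_to_recovery days.
    """
    deaths = 0.0
    recoveries = 0.0
    for infection_day in range(num_days):
        new_cases = growth ** infection_day
        # Only outcomes that have already happened by num_days are counted.
        if infection_day + days_to_death < num_days:
            deaths += true_rate * new_cases
        if infection_day + days_to_recovery < num_days:
            recoveries += (1 - true_rate) * new_cases
    if deaths + recoveries == 0:
        return None  # nobody has reached an outcome yet
    return deaths / (deaths + recoveries)


if __name__ == "__main__":
    # While cases are still growing, the naive ratio sits above the true 2%.
    print(simulate_naive_ratio(60))
    # Long after the outbreak has faded, it settles back toward 2%.
    print(simulate_naive_ratio(400, growth=0.9))
```

Under these assumptions the first figure comes out above the true rate and the second is essentially equal to it, which is the convergence described above.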

@smiletwchien

The association between deaths (on the Y axis) and recoveries (on the X axis) might be enough to address this concern; see http://www.smilechien.idv.tw/kpiall/deathrecovery.asp

@PenelopeFudd
Author

PenelopeFudd commented Feb 12, 2020

What a person who has just become infected wants to know is: What are my chances of dying (the death-to-case ratio)?

The easiest way is to wait until everyone has either recovered or died, then number died / total confirmed cases would be exactly correct. The problem is, we can't wait that long.

Because it takes longer for a person to recover than it does for a person to die, the death statistics (when compared to recovery statistics) arrive sooner. If the graph of cases over time is still going up, then the deaths-to-recoveries ratio will appear higher than it really is; if the graph is going down, then the ratio will appear lower than it really is.

The thing to do instead is to examine a group of patients (a 'cohort') who are no longer sick (one way or the other). In that group, we can do the calculation number died / total confirmed cases and get the right result.
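A rough sketch of that cohort calculation, assuming per-patient records are available (this repository only publishes aggregate counts, so the `Patient` record and its fields are hypothetical). The cohort is defined by confirmation date, and the rate is only reported once everyone in it has an outcome.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Patient:
    # Hypothetical record layout, for illustration only.
    confirmed_on: date
    outcome: Optional[str]  # "died", "recovered", or None if still sick


def cohort_fatality_rate(patients: List[Patient],
                         cohort_start: date,
                         cohort_end: date) -> Optional[float]:
    """Return died / total for patients confirmed in [cohort_start, cohort_end].

    Returns None if the cohort is empty or anyone in it is still sick,
    because the ratio is only meaningful once every case is resolved.
    """
    cohort = [p for p in patients
              if cohort_start <= p.confirmed_on <= cohort_end]
    if not cohort or any(p.outcome is None for p in cohort):
        return None
    died = sum(1 for p in cohort if p.outcome == "died")
    return died / len(cohort)


if __name__ == "__main__":
    # Made-up records, not real data.
    records = [
        Patient(date(2020, 1, 20), "recovered"),
        Patient(date(2020, 1, 21), "died"),
        Patient(date(2020, 1, 22), "recovered"),
        Patient(date(2020, 1, 23), "recovered"),
        Patient(date(2020, 2, 10), None),  # still sick, outside the cohort
    ]
    print(cohort_fatality_rate(records, date(2020, 1, 20), date(2020, 1, 23)))
```

Selecting the cohort by confirmation date, rather than by "has an outcome", matters: picking only resolved patients out of everyone would bring back the same bias, since deaths resolve sooner.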

@PenelopeFudd
Author

PenelopeFudd commented Feb 12, 2020

Here's an article comparing this virus to Ebola and Influenza, from National Geographic: https://www.nationalgeographic.com/science/2020/02/graphic-coronavirus-compares-flu-ebola-other-major-outbreaks/

From that, it looks like Influenza has killed many more people than Ebola, MERS, SARS and COVID-19 (this virus) combined.

This article from Wired https://www.wired.com/story/coronavirus-is-bad-comparing-it-to-the-flu-is-worse/ says such comparisons are unhelpful. If this virus spreads as far as Influenza has, many more people will die, as it looks like this virus has a 2% death-to-case ratio, compared to less than 0.1% for Influenza.

@jmcastagnetto

You might want to take a look at https://www.worldometers.info/coronavirus/coronavirus-death-rate/

@googled

googled commented Feb 13, 2020

All deaths are accounted for, but not all infected people. Some people at home might have been asymptomatic or have recovered on their own. The number of infected people is believed to be as much as five times the numbers we currently have, which would put the death-to-case ratio under 1%.

Professor Neil Ferguson, director of J-IDEA (UK), on the current 2019-nCoV coronavirus outbreak.
https://www.youtube.com/watch?v=ALQTdCYGISw
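The arithmetic behind that adjustment, as a small sketch. The counts are made up to give a 2% observed ratio (the figure quoted earlier in this thread), and the factor of 5 is the upper estimate mentioned above; `adjusted_ratio` is a hypothetical helper, not part of any dataset tooling.

```python
def adjusted_ratio(deaths, confirmed_cases, undercount_factor):
    """Death-to-case ratio if true infections are undercount_factor times
    the confirmed count and all deaths are already counted (both are
    assumptions)."""
    if confirmed_cases <= 0 or undercount_factor <= 0:
        raise ValueError("confirmed_cases and undercount_factor must be positive")
    return deaths / (confirmed_cases * undercount_factor)


if __name__ == "__main__":
    # Hypothetical counts: 20 deaths among 1000 confirmed cases is 2%.
    print(f"{adjusted_ratio(20, 1000, 1):.1%}")
    # With five times as many real infections, the ratio drops below 1%.
    print(f"{adjusted_ratio(20, 1000, 5):.1%}")
```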

@Fromalaska49

While enlightening, I believe the analysis and discussions here are beyond the scope of data sharing, the purpose of this repository.

@avatorl

avatorl commented Feb 13, 2020

Is there a reason that number died / total confirmed cases would not work?

http://avatorl.org/en/novel-coronavirus-2019-ncov-fatality-rate-who-and-media-vs-reality/

@JHale716

JHale716 commented Feb 14, 2020

The lag factor is problematic. One approach to consider is to compare the infections from X days ago against the deaths we know of today. It is unlikely to be accurate, but it is a better ‘guide’ to the trend than the raw same-day approach, which hides the rate behind the exponential growth in new cases.

Media reports of a 3-7 day progression give an interesting picture. There is also a further lag from testing, which adds up to 48 hours between presentation and confirmation. Then we have the new CT approach, which now shortcuts that delay, making the data inconsistent if you are pursuing absolutes.

This assumes data accuracy, with complete reporting and no added manipulation. I have more confidence in the international data than I do in the Chinese data, especially when we have reports that deaths may not be reported accurately, on top of what has been said in this thread about infection data. We do what we can with what we have.

Looking at the X-day lag, the rate is converging around 3-4% with a straight 3-, 5-, or 7-day lag, ignoring the 48-hour test delay, which ‘feels’ about right when looking at the whole data set we have.
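For anyone who wants to try the X-day lag, a minimal sketch follows. It assumes two cumulative daily series of equal length, oldest day first; the numbers in the example are invented and `lagged_ratio` is a hypothetical helper name.

```python
def lagged_ratio(cumulative_cases, cumulative_deaths, lag_days):
    """Deaths known today divided by cumulative cases lag_days ago.

    Both inputs are cumulative daily counts, oldest first, same length.
    Returns None when the series is too short for the requested lag or
    when there were no cases on the lagged day.
    """
    if len(cumulative_cases) != len(cumulative_deaths):
        raise ValueError("series must have the same length")
    if lag_days < 0 or lag_days >= len(cumulative_cases):
        return None
    cases_then = cumulative_cases[-1 - lag_days]
    if cases_then == 0:
        return None
    return cumulative_deaths[-1] / cases_then


if __name__ == "__main__":
    # Invented series that doubles every day, not real data.
    cases = [100, 200, 400, 800, 1600, 3200, 6400, 12800]
    deaths = [0, 1, 2, 4, 8, 16, 32, 64]
    for lag in (0, 3, 5, 7):
        print(lag, lagged_ratio(cases, deaths, lag))
```

A lag of 0 reproduces the raw same-day ratio, so the loop shows how much the estimate moves as the lag grows. Which lag is right is itself a guess, as noted above.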

@PenelopeFudd
Author

Yes: as time goes on, the lag will have less and less effect on the overall rate.

@designerfuzzi

When the virus stops infecting more people, the death rate would of course drop again.
So any calculation is tied to the situation while the virus is still going around.

Because cases that end in death resolve earlier than cases that recover, there should obviously be a time span where the time from confirmed infection to death can be compared with cases where the same time has passed but the patient survived or recovered. Both are usually treated in clinical settings; otherwise they would not be confirmed cases. So any calculation based on comparing the two within this particular time span is an expression of health care quality and lots of other factors. But you can compare them. It would express the death rate even for cases confirmed inside a health care system.
It doesn't make sense to declare that cases without symptoms (the unconfirmed cohort) are inside the cohort just to lower the mortality rate, while cases inside the confirmed cohort are usually under health care treatment. You could include the influenza cohort and watch its curve: if it is double the normal height, then you know that one or the other has a higher rate. This is because the COVID-19 cohort is new, not even a year old, while influenza data is well known.

@CSSEGISandData CSSEGISandData added the enhancement New feature or request label Mar 9, 2020