Linear fit of exponential cases and deaths data and THANK YOU #922

Open
valeriupredoi opened this issue Mar 17, 2020 · 24 comments

@valeriupredoi valeriupredoi commented Mar 17, 2020

Hey guys, feel free to close as soon as you have read this

I have started a project fitting the data you guys provide, and I have to say I am rather concerned by the gap between what the media say and what the actual data show, see my project. I also wanted to thank you greatly for providing such a wealth of open source data!! Cheers from the UK 🍺

@JiPiBi JiPiBi commented Mar 17, 2020

Hi
IMHO, it is also important to understand that every infected country has a province and a town that are hit harder than the country as a whole, with a high death rate and a risk of ICU capacity being overwhelmed:

  • In China, it was Hubei and Wuhan
  • In Italy, it is Lombardia and Bergamo
  • In France, it is Alsace and Mulhouse, but cases in Ile de France are growing

I think it would also be interesting to focus on these hotspots, because contamination of medical staff and a shortage of ICU beds caused by the flood of patients are major risks.

But these data are not available in this repo for Italy and France (there is another repo for Italy, because official data are available from Protezione Civile, but nothing consistent is available for France).
I don't know the situation in Spain, where the numbers have been rising fast these last few days.

@JiPiBi JiPiBi commented Mar 17, 2020

About your curves: is it possible to have a trend line that is polynomial rather than linear? For example, in Italy I see a slight inflection in total cases.

@valeriupredoi valeriupredoi commented Mar 17, 2020

> Hi
> IMHO, it is also important to understand that every infected country has a province and a town that are hit harder than the country as a whole, with a high death rate and a risk of ICU capacity being overwhelmed:
>
> * In China, it was Hubei and Wuhan
> * In Italy, it is Lombardia and Bergamo
> * In France, it is Alsace and Mulhouse, but cases in Ile de France are growing
>
> I think it would also be interesting to focus on these hotspots, because contamination of medical staff and a shortage of ICU beds caused by the flood of patients are major risks.
>
> But these data are not available in this repo for Italy and France (there is another repo for Italy, because official data are available from Protezione Civile, but nothing consistent is available for France).
> I don't know the situation in Spain, where the numbers have been rising fast these last few days.

Completely agree! In the case of the UK it's England (and London) that drives the infection numbers - see the daily indicators and the NHS England sheets at the UK Government data repo - London alone has 621 cases as of today.

Unfortunately I don't know how to get such numbers for other countries, and since this project is not actually for my day job, I think I won't have the time to start digging. But by all means, feel free to contribute to my repo - opening a pull request should do it 🍺

@valeriupredoi valeriupredoi commented Mar 17, 2020

> About your curves: is it possible to have a trend line that is polynomial rather than linear? For example, in Italy I see a slight inflection in total cases.

Good point, and I should be able to do that by fitting a polynomial P(x^n) rather than just a line. Sadly, for Italy, I fail to see any such tendency: it's as straight a line as it can be (see the R param too and the small LS errors). The good news is that the rate is lower than the usual 0.25-0.30 day^-1 seen in almost all the other countries. Poor Italy.
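
A minimal sketch of the line-versus-polynomial comparison discussed here, not the project's actual code. It assumes the fit is done on the logarithm of the counts, and it uses the Italy confirmed-case series that JiPiBi posts further down in this thread. The helper names `fit_polynomial` and `residual_sum_of_squares` are made up for the illustration.

```python
import math

# Italy confirmed cases, copied from the series quoted later in this thread.
confirmed = [9172, 10149, 12462, 15113, 17660, 21157, 24747,
             27980, 31506, 35713, 41035, 47021, 53578]
days = list(range(len(confirmed)))
log_cases = [math.log(c) for c in confirmed]


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ..., c_degree], lowest power first.
    """
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    A = [[float(sum(x ** (i + j) for x in xs)) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / A[i][i]
    return coeffs


def residual_sum_of_squares(coeffs, xs, ys):
    """Sum of squared differences between the fitted polynomial and the data."""
    return sum((sum(c * x ** k for k, c in enumerate(coeffs)) - y) ** 2
               for x, y in zip(xs, ys))


# Degree 1: pure exponential growth, log(N) = a + b*t, with b the daily rate.
linear = fit_polynomial(days, log_cases, 1)
# Degree 2: adds a curvature term c*t^2; c < 0 means growth is slowing down.
quadratic = fit_polynomial(days, log_cases, 2)

growth_rate = linear[1]
curvature = quadratic[2]
rss_linear = residual_sum_of_squares(linear, days, log_cases)
rss_quadratic = residual_sum_of_squares(quadratic, days, log_cases)

print(f"linear fit: rate = {growth_rate:.3f} per day, RSS = {rss_linear:.4f}")
print(f"quadratic fit: curvature = {curvature:.4f}, RSS = {rss_quadratic:.4f}")
```

A higher-degree fit always lowers the residuals, so a smaller RSS alone does not prove the extra term is real. The sign and size of `curvature` are the more useful hint, and thirteen points are too few to read much into either.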

@alkimiadev alkimiadev commented Mar 17, 2020

I think fitting a curve to confirmed case counts is a little naive since confirmed case counts largely depend on the testing capacity that country has. I would say modeling deaths would make more sense since it is likely to be a number that is closer to the ground truth.

I wish I could find a reliable dataset that showed how many tests were given as well as the number confirmed. I think the ratio of confirmed cases to tests would be more useful than just confirmed cases.

@valeriupredoi valeriupredoi commented Mar 17, 2020

Yes, completely agree - the death rate is the only hard number (provided that those deaths are entirely due to complications from the virus). But even if the actual number of cases is underestimated by an order of magnitude or more, the growth rate is still representative as long as the confirmed cases come from effectively random testing (people showing up at the hospital or being drive-thru tested): the confirmed cases are then a randomly drawn sample from the true distribution of cases and follow the same evolution over time. This of course breaks down once biases are introduced, e.g. more and more people deciding not to get tested, or people being refused testing (which is an emerging situation in Spain).
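
The sampling argument can be written out as a short derivation. This is a sketch under stated assumptions: the symbols are introduced here for illustration, with N(t) the true number of cases growing at rate b and f the fraction of cases that end up confirmed.

```latex
% Constant detection fraction f: the slope in log space is unchanged.
C(t) = f\,N(t) = f\,N_0\,e^{bt}
\quad\Longrightarrow\quad
\ln C(t) = \ln(f N_0) + b\,t

% Time-dependent detection fraction f(t): the measured rate is biased.
\frac{d}{dt}\ln C(t) = b + \frac{d}{dt}\ln f(t)
```

With a constant f, undercounting only shifts the intercept and the fitted rate is still b. If f changes over time, for example because testing is being ramped up or restricted, the second term is non-zero and the fitted rate no longer equals b.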

@valeriupredoi valeriupredoi commented Mar 17, 2020

And as you say, the fatalities are the most reliable statistical measure of the actual cases: we know those cases were real, and their distribution is drawn from the distribution of all infected individuals, although it might not be representative of the whole distribution (small-number statistics). The rate should be representative though: it should approach the infection rate we estimate from the number of tested cases (the one above), provided the population doesn't have a massive age bias (like Italy). Look at Germany, France and The Netherlands, where these two rates are almost equal. Anyway, that's my 2 cents, cheers 🍺
Also, feel free to close the issue, I am not a virologist so I may not be the right person to listen to 😁 Cheers again for the data!

@JiPiBi JiPiBi commented Mar 17, 2020

@alkimiadev
IMHO, the number of tests performed is very difficult to interpret. I agree that if you do no tests you get fewer confirmed cases, but:

  • in Italy they do many tests and, leaving the results aside, I was told that the same person can be tested several times (e.g. a person in quarantine), so it seems difficult to build a ratio between the number of tests and the number of cases found
  • in France they now do fewer and fewer tests: if you have mild symptoms your doctor tells you to stay at home, and when your symptoms become worrying you are sent to the hospital, so both the number of tests and the number of cases are difficult to use for a theory or a trend

What is important now is to deal with the need for ICU capacity and hospital beds, and above all to apply the confinement rules to contain the epidemic. It worked in China; we are eager to see what the results will be in Italy, Spain and France.

Germany is a mystery to me: they have more cases than France and the lowest death rate (= deaths/confirmed). This ratio is questionable, though: first, the number of confirmed cases is not reliable, and second, as has already been said, you would have to compare today's deaths with the cases confirmed some days earlier, but how many days? If anyone has an explanation...
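
To make the "how many days?" question concrete, here is a rough sketch of a lagged deaths/confirmed ratio. It uses the Italy series posted further down in this thread. The lags of 0, 3 and 7 days are arbitrary assumptions chosen only to show how sensitive the ratio is, and `lagged_fatality_ratio` is a hypothetical helper name.

```python
# Cumulative Italy series, copied from a later comment in this thread.
deaths = [463, 631, 827, 1016, 1266, 1441, 1809,
          2158, 2503, 2978, 3405, 4032, 4825]
confirmed = [9172, 10149, 12462, 15113, 17660, 21157, 24747,
             27980, 31506, 35713, 41035, 47021, 53578]


def lagged_fatality_ratio(deaths, confirmed, lag_days):
    """Latest cumulative deaths divided by cumulative confirmed cases lag_days earlier."""
    if not 0 <= lag_days < len(confirmed):
        raise ValueError("lag_days must fit inside the series")
    return deaths[-1] / confirmed[-1 - lag_days]


# The lag values are illustrative assumptions, not estimates from the thread.
ratios = {lag: lagged_fatality_ratio(deaths, confirmed, lag) for lag in (0, 3, 7)}
for lag, ratio in ratios.items():
    print(f"lag {lag} days: deaths/confirmed = {ratio:.1%}")
```

While cases are still growing, a longer lag always gives a larger ratio, so the choice of lag matters a lot. None of these numbers corrects for unconfirmed cases.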

@JiPiBi JiPiBi commented Mar 17, 2020

@valeriupredoi

On this curve, it seems to me that the confirmed curve for Italy is improving (it is difficult to compare countries, but for a given country it must indicate something; in fact I mainly hope that the confinement in Italy is proving useful...)

image

@JiPiBi JiPiBi commented Mar 17, 2020

@valeriupredoi
about the age of Italians, I have no information about such a bias in comparison with other countries. But after it was said that the people who died in China and in Italy were elderly with serious underlying conditions, one now hears things in France like: even young people with no known conditions die, and that is more worrying.

@JimBudde JimBudde commented Mar 18, 2020

@alkimiadev spot on. There are too many biases introduced in how testing is performed in each country and region. There are also structural differences in how each country manages its healthcare system. My take-away, be careful of trying to compare apples to apples. Simple take home message, get New Case volume down! That is something everyone can understand.

@valeriupredoi valeriupredoi commented Mar 18, 2020

> @valeriupredoi
>
> On this curve, it seems to me that the confirmed curve for Italy is improving (it is difficult to compare countries, but for a given country it must indicate something; in fact I mainly hope that the confinement in Italy is proving useful...)
>
> image

Yes, true that! And it's great news! But I only started running the automated daily plotting from March 1st (I am very interested in seeing the UK trend, and the UK is delayed compared to Italy), and it really is linear after that date. For some reason it looks like the confirmed cases are still increasing steadily, at a lower rate than before but still at ~0.2 per day. I wonder when we'll see the next slowdown (hopefully soon!)

@alkimiadev alkimiadev commented Mar 18, 2020

I have been using a method similar to the one Tomas Pueyo used in his Medium article (https://medium.com/@tomaspueyo/coronavirus-act-today-or-people-will-die-f4d3d9cd99ca) to estimate total cases. That simple method needs some assumptions, such as the death rate, the time from infection to death and the case doubling time. For example:

death rate: 2% (has to be lower than deaths/confirmed cases since there are definitely more cases than currently confirmed)
infection to death in days: 20
case doubling in days: 5

When looking at the deaths for a given day, we could say we're looking at infections that started roughly 20 days ago, and with a case doubling time of 5 days the real case count would have doubled 4 times in that period. Which would make a formula that looks something like:

(deaths/0.02)^1.60206=estimated total cases

@valeriupredoi valeriupredoi commented Mar 18, 2020

Interesting, except that measured cases double every 2.5-3 days (2.7 days in the UK, see here), so that would make it roughly 8 doublings in the period. Actually, using the exp(bt) formula with b ~ 0.27 per day and t = 20 days gives roughly 220 times the initial cases, i.e. roughly eight doublings.
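
The arithmetic behind those numbers, as a small sketch. The rate b = 0.27 per day and the 20-day window are the values from the discussion above, and the function names are made up for the example.

```python
import math


def doubling_time(rate_per_day):
    """Days needed to double under exp(b*t) growth: T = ln(2) / b."""
    return math.log(2) / rate_per_day


def growth_factor(rate_per_day, days):
    """Multiplicative growth over a period: exp(b * t)."""
    return math.exp(rate_per_day * days)


b = 0.27      # fitted growth rate per day, as quoted above
period = 20   # assumed infection-to-death time in days

t_double = doubling_time(b)
n_doublings = period / t_double
factor = growth_factor(b, period)

print(f"doubling time: {t_double:.2f} days")
print(f"doublings in {period} days: {n_doublings:.2f}")
print(f"growth factor: {factor:.0f}x")
```

Note that `2 ** n_doublings` and `factor` are the same quantity written two ways, so either form can be used in an estimate.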

@alkimiadev alkimiadev commented Mar 18, 2020

@valeriupredoi The assumptions I used there are a bit on the conservative side, but overall that method is likely going to be more useful than trying to model the confirmed case rates. Confirmed cases largely depend on testing capacity, and as a country ramps up its testing we'll see that country's confirmed cases double at a rate faster than the actual case doubling rate.

@valeriupredoi valeriupredoi commented Mar 18, 2020

aha, got you now! Good reasoning indeed, so that, in effect, would mean halving the observed rate of daily cases. Cheers, I'll use this to place lower limits on the numbers I plot 🍺

@TakeItAndRun TakeItAndRun commented Mar 19, 2020

I would not put more trust in the number of confirmed deaths than in the number of confirmed cases.
It is easy to say you don't need a test if you have only light symptoms.
It is easy to say you're dead, therefore you don't need a test anymore.
Both happen when the number of tests is limited.

@alkimiadev alkimiadev commented Mar 19, 2020

@TakeItAndRun "All models are wrong, but some are useful" - George E. P. Box

@nick-lagrassa nick-lagrassa commented Mar 21, 2020

> I have been using a method similar to the one Tomas Pueyo used in his Medium article (https://medium.com/@tomaspueyo/coronavirus-act-today-or-people-will-die-f4d3d9cd99ca) to estimate total cases. That simple method needs some assumptions, such as the death rate, the time from infection to death and the case doubling time. For example:
>
> death rate: 2% (has to be lower than deaths/confirmed cases since there are definitely more cases than currently confirmed)
> infection to death in days: 20
> case doubling in days: 5
>
> When looking at the deaths for a given day, we could say we're looking at infections that started roughly 20 days ago, and with a case doubling time of 5 days the real case count would have doubled 4 times in that period. Which would make a formula that looks something like:
>
> (deaths/0.02)^1.60206=estimated total cases

Thanks for sharing this. There seems to be an increasing interest in using functions of observed deaths to estimate the number of true cases.

I just want to expand on the formula you provide for readers who might want to use it.

The formula, as listed, is correct for the specific case where deaths = 2.

The general formula is:
estimated total cases = (observed_deaths/ fatality_rate) * 2^(infection_to_death_in_days / case_doubling_in_days)

So for the case where:
observed_deaths = 10
fatality_rate = 2%
infection_to_death_in_days = 20
case_doubling_in_days = 5

estimated total cases = (10/0.02) * 2^(20/5) = 500 * 2^4 = 8,000
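
The general formula translates directly into a few lines of code. This is only a sketch: the parameter defaults are the example assumptions from this thread, not measured values, and `power_law_estimate` reproduces the earlier fixed-exponent formula purely for comparison.

```python
def estimate_total_cases(observed_deaths, fatality_rate=0.02,
                         infection_to_death_in_days=20, case_doubling_in_days=5):
    """Back-project observed deaths to an estimate of current true cases.

    deaths / fatality_rate gives the cases that existed when those people were
    infected; that count has since doubled (lag / doubling time) times.
    """
    cases_at_infection = observed_deaths / fatality_rate
    doublings = infection_to_death_in_days / case_doubling_in_days
    return cases_at_infection * 2 ** doublings


def power_law_estimate(observed_deaths, fatality_rate=0.02, exponent=1.60206):
    """The earlier fixed-exponent formula, kept only to show where it diverges."""
    return (observed_deaths / fatality_rate) ** exponent


for d in (2, 10, 100):
    general = estimate_total_cases(d)
    fixed = power_law_estimate(d)
    print(f"deaths={d}: general formula {general:,.0f}, fixed exponent {fixed:,.0f}")
```

The two functions agree only when deaths/fatality_rate equals 100, which is the deaths = 2 case. For larger death counts the fixed exponent overshoots, because it raises the case count to a power instead of multiplying it by a constant factor of 16.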

@alkimiadev alkimiadev commented Mar 21, 2020

@nick-lagrassa if there were 100 cases 20 days ago and it doubled every 5 days there would be 1600 cases today, meaning an exponent of roughly 1.60206 (since 100^1.60206 ≈ 1600)

@nick-lagrassa nick-lagrassa commented Mar 21, 2020

@alkimiadev You're right. My point was just that the exponent of 1.60206 only works for that specific example (100 cases with a 2% death rate => 2 deaths).

@alkimiadev alkimiadev commented Mar 21, 2020

@nick-lagrassa ah you are correct! thanks for pointing that out!

@JiPiBi JiPiBi commented Mar 21, 2020

@nick-lagrassa

the cumulative deaths in Italy, day by day, are the following; the pace of deaths seems to be changing
463 631 827 1016 1266 1441 1809 2158 2503 2978 3405 4032 4825
At the beginning they doubled in 2 days, and now in about 5 days. What are your forecasts based on these values, please?

Edit: confirmed cases over the same days
9172 10149 12462 15113 17660 21157 24747 27980 31506 35713 41035 47021 53578

Addendum: I was told that in Italy, whatever people die of, if they have COVID they are recorded as COVID deaths, but that should not change the trends
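
One way to put numbers on the changing pace is to compute a local doubling time from two points of the death series above. This is a rough sketch that assumes exponential growth between the two chosen days. The two-day windows are an arbitrary choice, and `doubling_time_between` is a hypothetical helper name.

```python
import math

# Cumulative deaths in Italy, as listed above.
deaths = [463, 631, 827, 1016, 1266, 1441, 1809,
          2158, 2503, 2978, 3405, 4032, 4825]


def doubling_time_between(start_count, end_count, days_apart):
    """Doubling time implied by growth from start_count to end_count.

    Assumes exponential growth over the window: T = days * ln(2) / ln(end / start).
    """
    return days_apart * math.log(2) / math.log(end_count / start_count)


# First two days of the series versus the last two days.
early = doubling_time_between(deaths[0], deaths[2], 2)
late = doubling_time_between(deaths[-3], deaths[-1], 2)

print(f"early doubling time: {early:.1f} days")
print(f"late doubling time: {late:.1f} days")
```

On these values the doubling time lengthens from a bit over two days at the start to roughly four days at the end, which supports the observation that the pace is slowing. Short windows are noisy, so the exact figures should not be over-read.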

@valeriupredoi valeriupredoi commented Mar 21, 2020

It's clear that the deaths in Italy are no longer following an exponential and are starting to plateau: Italy. Note that the least-squares errors on the last points are of order 200-300 daily, so I am hoping this is real and not a data artefact due to Italy no longer reporting deaths the way it did before. Any news on this maybe? Cheers guys 🍺

6 participants