# Calculating implied infection numbers

This notebook estimates the likely total number of infections, both in the past and in the present.

For the past, it combines two parameters, "median days from infection to death" and "infection fatality rate" (IFR), with smoothed daily deaths. In other words, days_to_death days before date D, there must have been roughly (deaths_on_date_D / IFR) new infections to end up with the number of deaths seen on date D.
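A toy version of that back-calculation may make the shift clearer. This is only a sketch: the death series, the 18-day lag, and the 1% IFR below are made-up illustration values, not numbers taken from the real data.

```python
import numpy
import pandas

# Hypothetical illustration values, not taken from the real data set
days_to_death = 18
ifr = 0.01

dates = pandas.period_range('2020-04-01', periods=30, freq='D')
# Toy smoothed death series: 0, 10, 20, ... deaths per day
smoothed_deaths = pandas.Series(numpy.arange(30) * 10.0, index=dates)

# Move each death count back by days_to_death days, then scale by 1 / IFR
implied_infections = smoothed_deaths.shift(-days_to_death) / ifr

# Estimate for the first date comes from the deaths 18 days later
first_estimate = implied_infections.iloc[0]
# The most recent days_to_death days have no death-based estimate yet
missing_tail = int(implied_infections.isna().sum())
```

The last `days_to_death` entries come out as NaN, which is the gap the "present" estimate has to fill.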

For the present, it looks at what percentage of infections were confirmed on the last day covered by the death-based estimate, and applies that percentage to the confirmed cases reported since then. That doesn't account for a significant ramp-up in testing during that period, but it should be close enough.
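As a rough sketch of that extrapolation, using invented numbers and hypothetical variable names (`estimated`, `confirmed`), the recent gap can be filled by scaling confirmed cases with a fixed ratio:

```python
import pandas

nan = float('nan')
dates = pandas.period_range('2020-07-01', periods=6, freq='D')
# Toy numbers: death-based estimates exist only for the first three days
estimated = pandas.Series([4000.0, 4400.0, 5000.0, nan, nan, nan], index=dates)
confirmed = pandas.Series([800.0, 900.0, 1000.0, 1100.0, 1200.0, 900.0], index=dates)

# Last day that still has a death-based estimate
last_known = estimated.last_valid_index()
# Infections per confirmed case on that day
ratio = estimated.loc[last_known] / confirmed.loc[last_known]

# Fill the recent days by scaling confirmed cases with that fixed ratio
nowcast = estimated.fillna(confirmed * ratio)
# Share of infections that were confirmed on the last known day
share_confirmed = 1 / ratio
```

Because the ratio is frozen at a single day, any change in testing coverage after that day shows up as an error in the recent estimates.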

Death data comes primarily from the NY Times data files, supplemented by a more accurate DateOfDeath.csv from Massachusetts. Testing data comes from The COVID Tracking Project, maintained by The Atlantic.

NOTE: Prior to running this notebook, you should retrieve the latest DateOfDeath.csv file by:

1. going to https://www.mass.gov/info-details/covid-19-response-reporting
2. downloading the raw data zip from the line saying "Raw data used to create the dashboard is available here:"
3. copying the DateofDeath.csv from that zip to the same directory as the notebook

Yeah, that could be automated. Just haven't done it yet...
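One possible way to automate step 3 is sketched below. It assumes the zip has already been downloaded by hand (the download link itself is not automated here), and `extract_date_of_death` is a hypothetical helper, not part of `common`.

```python
import zipfile
from pathlib import Path


def extract_date_of_death(zip_path, dest_dir='.', wanted='dateofdeath.csv'):
    """Copy the date-of-death CSV out of an already downloaded raw data zip."""
    dest_dir = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            # Compare case-insensitively and ignore any folder prefix in the zip
            if Path(member).name.lower() == wanted:
                target = dest_dir / Path(member).name
                target.write_bytes(archive.read(member))
                return target
    raise FileNotFoundError(f'no {wanted} found in {zip_path}')
```

Calling it with the path of the downloaded zip writes the CSV next to the notebook, keeping whatever capitalization the file has inside the archive.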

In [None]:
%matplotlib inline
import numpy
import pandas
import matplotlib
import matplotlib.pyplot as plt

from common import load_data

In [None]:
# Earliest date that there is sufficient data for all states, including MA
EARLIEST_DATE = pandas.Period('2020-03-10', freq='D')
# The None default is overridden on the next line; comment that line out to pass None to load_data
LATEST_DATE = None
LATEST_DATE = pandas.Period('2020-07-31', freq='D')


In [None]:
meta, nyt_stats, ct_stats = load_data(EARLIEST_DATE, LATEST_DATE)

### Group on date and calculate new stats

In [None]:
nyt = nyt_stats.groupby('Date').sum().sort_index()[['Deaths']].copy()
ct = ct_stats.groupby('Date').sum().sort_index()[['Pos', 'Neg']].copy()

# Calculate the cumulative share of tests that came back positive
ct['PctPos'] = ct.Pos / (ct.Pos + ct.Neg)

# Calculate daily deaths and smoothed (avg of trailing 7 days) deaths
nyt['Daily'] = (nyt.Deaths - nyt.shift().Deaths)
nyt['Deaths7'] = (nyt.Deaths - nyt.shift(7).Deaths) / 7

# Calculate the negative-to-positive ratio and the average daily confirmed cases over the trailing 7 days
ct7 = ct.shift(7)[['Pos', 'Neg']]
ct['NRatio'] = (ct.Neg - ct7.Neg) / (ct.Pos - ct7.Pos)
ct['DailyConfirms'] = (ct.Pos - ct7.Pos) / 7

ct.tail()
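
The smoothing above works directly on cumulative counts: the change over seven days divided by seven. As a quick sanity check on made-up numbers, that should match a trailing seven-day mean of the daily increments:

```python
import numpy
import pandas

# Toy cumulative death counts (hypothetical), one value per day
cumulative = pandas.Series(numpy.cumsum([0, 2, 3, 5, 4, 6, 8, 7, 9, 10, 12, 11]))

# Approach used in the cell above: change over seven days, divided by seven
weekly_avg = (cumulative - cumulative.shift(7)) / 7

# Equivalent form: daily increments averaged over a trailing seven-day window
rolling_avg = cumulative.diff().rolling(7).mean()

# Both leave the first seven days as NaN and agree afterwards
same = numpy.allclose(weekly_avg, rolling_avg, equal_nan=True)
```

The cumulative-difference form needs only one subtraction per day, which is presumably why it is convenient when the source data is already cumulative.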

## Now for the charts...

In [None]:
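def get_infections_df(scenarios):
    """Return a DataFrame of daily infection estimates, one column per scenario.

    Each scenario is a (name, death_lag, ifr_high, ifr_low) tuple; the IFR is
    interpolated linearly from ifr_high on the first date to ifr_low on the last.
    """
    data = {}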
    for name, death_lag, ifr_high, ifr_low in scenarios:
        # Calculate the IFR to apply for each day
        ifr = pandas.Series(numpy.linspace(ifr_high, ifr_low, len(nyt)), index=nyt.index)
        # Calculate the infections in the past
        infections = nyt.shift(-death_lag).Deaths7 / ifr
        
        # Find out the ratio of infections that were detected on the last date in the past
        last_date = infections.index[-(death_lag+1)]
        last_ratio = infections.loc[last_date] / ct.loc[last_date, 'DailyConfirms']
        
        # Apply that ratio to the dates since that date
        infections.iloc[-death_lag:] = ct.DailyConfirms.iloc[-death_lag:] * last_ratio

        # Print the share of estimated infections that were confirmed on last_date
        print(1 / last_ratio)
        data[name] = infections

    return pandas.DataFrame(data)

In [None]:
SCENARIOS = (('20', 20, 0.01, 0.01), ('18', 18, 0.01, 0.01), ('16', 16, 0.01, 0.01), )

df = get_infections_df(SCENARIOS)
foo = df.plot(title="New Infections Estimates, varying average days to death, IFR = 1.0%", figsize=(10,5))

In [None]:
SCENARIOS = (('1.3%', 18, 0.013, 0.013), ('1.0%', 18, 0.01, 0.01), ('0.7%', 18, 0.007, 0.007), )

df = get_infections_df(SCENARIOS)
foo = df.plot(title="New Infections Estimates, varying IFR, days to death = 18", figsize=(10,5))

In [None]:
SCENARIOS = (('1.2% - 0.8%', 18, 0.012, 0.008), ('1.0% - 0.7%', 18, 0.01, 0.007), ('0.9% - 0.6%', 18, 0.009, 0.006), )

df = get_infections_df(SCENARIOS)
foo = df.plot(title="Infection Estimations, improving IFR, days to death = 18", figsize=(10,5))

In [None]:
SCENARIOS = (('1.1% - 0.6%', 19, 0.011, 0.006), )

df = get_infections_df(SCENARIOS)
foo = df.plot(title="Infection Estimations, my hunch, 19 median days to death", figsize=(10,5))

In [None]:
df.sum()

In [None]:
df.tail()

In [None]:
SCENARIOS = (('1.2% - 0.5%', 21, 0.012, 0.005), )

df = get_infections_df(SCENARIOS)
foo = df.plot(title="Worst case? 21 days to death, improving IFR", figsize=(10,5))