In [None]:
import warnings
warnings.filterwarnings('ignore')
from helpers import *
import numpy as np
import plotly.express as px
import plotly.graph_objects as go  # used for the deliberate-deaths line plot in section 3.3

# Estimating Pandemic Risk

Here, we will estimate the expected number of deaths this century from the following types of pandemics:
1. Natural pandemics (based on estimates from [Marani et al. 2021](https://doi.org/10.1073/pnas.2105482118))
2. Accidental pandemics (based on estimates from [Klotz 2021](https://armscontrolcenter.org/wp-content/uploads/2017/04/LWC-paper-final-version-for-CACNP-website.pdf))
3. Deliberate pandemics (based on estimates from [Esvelt 2022](https://dam.gcsp.ch/files/doc/gcsp-geneva-paper-29-22?_gl=1*ieur8b*_ga*MTk1NzA0MTU3My4xNjk2NzcyODA0*_ga_Z66DSTVXTJ*MTY5Njc3MjgwNC4xLjAuMTY5Njc3MjgwNi41OC4wLjA.))

In [None]:
# Define vars
params = Params()

We will estimate the expected number of deaths this century, i.e., from 2024 to 2100. For simplicity, we'll assume a fixed population of 9.2 billion people over this period, the average of the starting population and the projected population in 2100 (based on [UN population projections](https://www.worldometers.info/world-population/world-population-projections/)):

$
\text{Average population} = \frac{\text{Starting population} + \text{Population in 2100}}{2} = \frac{8.0 \text{ billion} + 10.3 \text{ billion}}{2} = 9.15 \text{ billion} \approx 9.2 \text{ billion}
$
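
As a quick arithmetic check of the average above, using only the two population figures quoted in the text (the exact midpoint is 9.15 billion, which the notebook rounds to 9.2 billion):

```python
# Average of the starting and 2100 world populations, in billions,
# using the figures quoted above.
starting_population_bn = 8.0
population_2100_bn = 10.3

average_population_bn = (starting_population_bn + population_2100_bn) / 2
print(f"{average_population_bn:.2f} billion")
```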

For the estimates, we'll use Monte Carlo simulations. The parameters shared by several of the estimates are shown below:

In [None]:
params.print_category('Global')

## 1. Estimating Natural Pandemic Risk

We'll be estimating the expected number of deaths from natural pandemics this century. This analysis will involve three main sections:

1. Load data
2. Load the GPD model
3. Use the GPD model to estimate the expected number of deaths from natural pandemics this century

To estimate natural pandemic risk, we'll use the following parameters from [Marani et al.](https://doi.org/10.1073/pnas.2105482118):

In [None]:
params.print_category('Natural')

### 1.1. Load Data

[Marani et al.](https://doi.org/10.1073/pnas.2105482118) assembled a dataset of epidemics that have occurred since 1500.

For their analysis, which we'll replicate next, they used only data from 1600 onward, as earlier records are less reliable. They also excluded epidemics that were ongoing at the time of writing (e.g., AIDS/HIV, malaria, and COVID-19) or that were ended by pharmaceutical interventions (e.g., smallpox), since these likely follow a different distribution. In practice, this excluded all epidemics after the end of WWII.

The full dataset is visualised below, with the period from 1600 to 1944 that we'll use in subsequent analyses highlighted in green.

In [None]:
marani_xls = params.Natural.dataset.val
marani_df, disease_totals = load_and_preprocess_natural_data(marani_xls)
color_map = generate_color_map(disease_totals)
fig = plot_disease_timeline(marani_df, disease_totals, color_map)

### 1.2. Load the GPD model

Epidemic intensity refers to the number of deaths due to an epidemic in a given year. The exceedance probability of an intensity is the probability that this intensity will be exceeded in a given year. Plotting intensity against its exceedance probability therefore shows how likely an epidemic of a given intensity is to occur in any given year.
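
To make the definition concrete, here is a minimal sketch that estimates an empirical exceedance probability as the fraction of years whose intensity is strictly greater than each threshold. The intensities are made-up illustrative numbers, not the Marani et al. data, and `empirical_exceedance` is a hypothetical helper, not part of `helpers`.

```python
import numpy as np

def empirical_exceedance(yearly_intensities, thresholds):
    """Fraction of years in which the intensity is strictly greater than each threshold."""
    yearly_intensities = np.asarray(yearly_intensities, dtype=float)
    thresholds = np.atleast_1d(np.asarray(thresholds, dtype=float))
    # Compare every year against every threshold, then average over the years
    exceeds = yearly_intensities[np.newaxis, :] > thresholds[:, np.newaxis]
    return exceeds.mean(axis=1)

# Hypothetical yearly intensities (illustrative only)
intensities = [0.0, 0.1, 0.1, 0.5, 2.0, 0.0, 0.3, 5.7, 0.0, 0.2]
print(empirical_exceedance(intensities, [0.0, 0.1, 1.0, 5.7]))
```

For these ten illustrative years the exceedance probabilities at 0.0, 0.1, 1.0 and 5.7 are 0.7, 0.5, 0.2 and 0.0: the curve falls as the intensity grows, which is the shape the GPD is fitted to.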

Marani et al. demonstrated that this relationship between epidemic intensity and exceedance probability is well-described by the Generalized Pareto Distribution (GPD), a family of continuous probability distributions often used to model the tail of the distribution of extreme events.

The GPD is mathematically defined as:

$
H(i) = P_0 + (1 - P_0) \times \left(1 + \frac{\xi \times (i - \mu)}{\sigma}\right)^{-\frac{1}{\xi}}
$

Where:
- $ H(i) $ is the exceedance probability of the epidemic intensity $ i $.
- $ \mu $ is the threshold parameter.
- $ \sigma $ is the scale parameter.
- $ \xi $ is the shape parameter.
- $ P_0 $ is the probability for intensities below the threshold $ \mu $.

The epidemic intensity is plotted against its exceedance probability below, including the GPD using parameter estimates from [Marani et al.](https://doi.org/10.1073/pnas.2105482118). Note: both axes are on a log scale.

<!-- #### Understanding Exceedance Probability

The exceedance probability of the yearly maximum epidemic intensity, $ H_1(i) $, is given by $ H_1(i) = 1 - P_1(i) $. This value represents the likelihood that an extreme novel epidemic (irrespective of its causative disease) with an intensity equal to or greater than $ i $, will occur anywhere in the world within a year.

To put this into perspective, consider an epidemic with an intensity similar to the 1918-1920 "Spanish flu." For this epidemic, with an intensity of $ i = 5.7 $ deaths per thousand per year, the yearly probability of exceedance is represented by $ H_1(i = 5.7 \text{‰/year}) $. -->


In [None]:
mu = params.Natural.mu.val
sigma = params.Natural.sigma.val
xi = params.Natural.xi.val

# Keep epidemics above the 1e-5 intensity threshold that started between 1600 and 1944
marani_1e_5_df = marani_df[
    (marani_df["Intensity (deaths per mil/year)"] > 1e-5)
    & (marani_df["Start Year"].between(1600, 1944))
]
# Alternative: post-1944 epidemics only
# marani_1e_5_df = marani_df[(marani_df["Intensity (deaths per mil/year)"] > 1e-5) & (marani_df["Start Year"] > 1944)]

# Plot the empirical exceedance probabilities with the fitted GPD overlaid
fig = plot_exceedance_probability(
    marani_df=marani_1e_5_df,
    plot_gpd=True,
    plot_lognorm=False,
    mu=mu,
    sigma=sigma,
    xi=xi,
)

### 1.3 Use the GPD Model to Estimate the Expected Number of Deaths from Natural Pandemics this Century

To estimate the expected yearly intensity, I first truncated the GPD at 33.3 million deaths per year, similar to the method used by [Glennerster et al.](https://www.nber.org/system/files/working_papers/w30565/w30565.pdf). This cap corresponds to the highest credible death toll reported for a pandemic: 100 million deaths from the 1918 influenza pandemic over 3 years. I then integrated the intensity weighted by the GPD from the threshold $\mu$ up to 33.3 million.
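
A minimal sketch of this truncated integration, assuming the standard GPD density $f(i) = \frac{1}{\sigma}\left(1 + \frac{\xi (i - \mu)}{\sigma}\right)^{-1/\xi - 1}$ for $i \ge \mu$ as the weighting function. The notebook's own `compute_gpd` helper may use a different form, `gpd_pdf` and `truncated_expected_intensity` are hypothetical names, and the parameter values in the check are illustrative, not the Marani et al. estimates.

```python
import numpy as np

def gpd_pdf(i, mu, sigma, xi):
    """Standard generalized Pareto density for i >= mu (exponential limit when xi == 0)."""
    z = (np.asarray(i, dtype=float) - mu) / sigma
    if xi == 0:
        return np.exp(-z) / sigma
    return (1 + xi * z) ** (-1 / xi - 1) / sigma

def truncated_expected_intensity(mu, sigma, xi, max_intensity, num_points=10_001):
    """Integrate i * f(i) from the threshold mu up to the cap max_intensity."""
    i = np.linspace(mu, max_intensity, num_points)
    y = i * gpd_pdf(i, mu, sigma, xi)
    # Trapezoid rule written out explicitly so it behaves the same on NumPy 1.x and 2.x
    return float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(i)))

# Sanity check with illustrative parameters: for xi = 0 the GPD is exponential with
# mean mu + sigma, so a cap far out in the tail should give roughly 1.0 here.
print(truncated_expected_intensity(mu=0.0, sigma=1.0, xi=0.0, max_intensity=50.0))
```

The untruncated GPD has a finite mean, $\mu + \sigma / (1 - \xi)$, only when $\xi < 1$; for heavier tails the integral grows without bound as the cap increases, which is one reason the choice of cap matters.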

In [None]:
max_intensity = params.Natural.max_intensity.val

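# Evaluate the GPD on a grid from the threshold up to the truncation cap
intensities = np.linspace(mu, max_intensity, 100)
gpd_values = compute_gpd(intensities, mu, sigma, xi)
# Numerically integrate intensity x GPD with the trapezoid rule
expected_yearly_intensity = np.trapz(intensities * gpd_values, intensities)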

fig = plot_exceedance_probability(intensities, gpd_values, log_axis=False, title_text=f"Expected yearly intensity = {expected_yearly_intensity:.1f} million deaths")

To get the expected number of natural pandemic deaths this century, I simply multiplied the yearly expected intensity by the number of years remaining this century:

$
E[\text{#natural_deaths}] = E[\text{#natural_deaths_per_year}] \times \text{#years}
$

In [None]:
num_years = params.Global.num_years.val
E_natural_deaths = expected_yearly_intensity * num_years

display_text(f"Expected number of deaths from natural epidemics this century = <strong>{E_natural_deaths:.1f} million deaths</strong>")

### Limitations

Limitations of this analysis include:
- The estimate may be inaccurate as:
    - The GPD may not be a good fit for the data. It is particularly hard to determine the probability of extreme events from a small sample size.
- The estimate may be an overestimate as: 
    - Based on visual inspection, the GPD appears to overestimate the probability of extreme events.
    - The GPD is fit to data from before the widespread use of antibiotics and vaccines.

## 2. Estimating Accidental Pandemic Risk

We'll be examining the potential risk of a pandemic from the accidental release of viruses from research facilities. This analysis will involve three main sections:

1. Probability a single facility in a single year seeds a pandemic
2. Expected number of accidental pandemics this century
3. Expected number of deaths from accidental pandemics this century

In [None]:
params.print_category('Accidental')

### 2.1 Probability a single facility in a single year seeds a pandemic

The probability that a single facility seeds a pandemic in a single year is given by:

$
P_{\text{single_pandemic}} = P_{\text{release}} \times P_{\text{seeds_pandemic}}
$

Where:
- $P_{\text{release}}$ is the probability of community release from a single facility in a single year.
- $P_{\text{seeds_pandemic}}$ is the probability that a virus release seeds a pandemic. Since $P_{\text{seeds_pandemic}}$ is given as a range, we will sample from a uniform distribution between 0.05 and 0.4.
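
The sampling step can be sketched as follows. This is a minimal illustration, not the notebook's code: the notebook calls `np.random.uniform` on the global random state, whereas this sketch uses a seeded `np.random.default_rng` for reproducibility, and the `P_release` and `num_simulations` values are hypothetical; only the 0.05 to 0.4 range comes from the text.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded so the sketch is reproducible

num_simulations = 100_000  # hypothetical
P_release = 0.002          # hypothetical per-facility, per-year release probability
P_seeds_pandemic_min, P_seeds_pandemic_max = 0.05, 0.4  # range quoted in the text

# One draw of P_seeds_pandemic per simulation, uniform over the quoted range
P_seeds_pandemic = rng.uniform(P_seeds_pandemic_min, P_seeds_pandemic_max, num_simulations)
P_single_pandemic = P_release * P_seeds_pandemic

# For a uniform draw, the Monte Carlo mean should approach P_release * (min + max) / 2
print(P_single_pandemic.mean(), P_release * (P_seeds_pandemic_min + P_seeds_pandemic_max) / 2)
```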

In [None]:
P_seeds_pandemic_min, P_seeds_pandemic_max = params.Accidental.P_seeds_pandemic.val
num_simulations = params.Global.num_simulations.val
P_release = params.Accidental.P_release.val
accidental_colour = params.Accidental.colour.val

P_seeds_pandemic = np.random.uniform(P_seeds_pandemic_min, P_seeds_pandemic_max, num_simulations)
P_single_pandemic = P_release * P_seeds_pandemic

# Plot the distribution of P_single_pandemic
fig = plot_P_single_pandemic_hist(P_single_pandemic, accidental_colour)

### 2.2. Expected number of accidental pandemics this century

The expected number of pandemics seeded by any facility over the whole period is given by:

$
E[\text{#Pandemics}] = P_{\text{single_pandemic}} \times \text{#years} \times \text{#facilities}
$

Where:
- $\text{#years}$ is the number of years in the simulation, i.e., the years remaining this century (2024 to 2100).
- $\text{#facilities}$ is the number of facilities in the world.
- $P_{\text{single_pandemic}}$ is the probability that a single facility seeds a pandemic in a single year.
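
A short sketch of the expected-count formula with hypothetical inputs. If, as an extra assumption not made in the notebook, each facility-year is treated as an independent trial with the same small probability $p$, the same inputs also give the probability of at least one accidental pandemic over $n$ facility-years, $1 - (1 - p)^{n}$, which never exceeds the expected count $np$.

```python
import numpy as np

P_single_pandemic = np.array([1e-4, 4e-4, 8e-4])  # hypothetical Monte Carlo draws
num_years = 77        # hypothetical, e.g. 2024 to 2100 inclusive
num_facilities = 40   # hypothetical

# Expected number of pandemics: one "trial" per facility-year
E_pandemics = P_single_pandemic * num_years * num_facilities

# Probability of at least one pandemic over all facility-years (assumes independence)
P_at_least_one = 1 - (1 - P_single_pandemic) ** (num_years * num_facilities)

print(E_pandemics)
print(P_at_least_one)
```

The expected count can exceed 1 while the probability cannot, which is why the notebook's histograms of expected pandemics should not be read as probabilities.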

In [None]:
num_facilities = params.Accidental.num_facilities.val

E_accidental_pandemics = P_single_pandemic * num_years * num_facilities

# Plot the distribution of E_accidental_pandemics
fig = plot_E_accidental_pandemics_hist(E_accidental_pandemics, accidental_colour)

### 2.3. Expected number of deaths from accidental pandemics this century

The expected number of deaths is the expected number of pandemics multiplied by the number of deaths per pandemic:

$
E[\text{#Deaths}] = E[\text{#Pandemics}] \times \text{Population} \times \text{Infection Rate} \times \text{CFR}
$

Where:
- $\text{Population}$ is the world population, held fixed at the 9.2 billion average described above.
- $\text{Infection Rate}$ is the fraction of the population infected during the pandemic.
- $\text{CFR}$ is the case fatality rate, i.e., the fraction of infected people who die.
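
A minimal sketch of this calculation. Only the 9.2 billion population comes from the text; the expected pandemic counts, infection rate and case fatality rate are hypothetical values chosen for illustration.

```python
import numpy as np

E_pandemics = np.array([0.3, 1.2, 2.5])  # hypothetical Monte Carlo draws
population = 9.2e9       # fixed average population used in this notebook
infection_rate = 0.3     # hypothetical fraction of the population infected
fatality_rate = 0.02     # hypothetical case fatality rate

# Deaths caused by a single pandemic, then scaled by the expected number of pandemics
deaths_per_pandemic = population * infection_rate * fatality_rate
E_deaths = E_pandemics * deaths_per_pandemic

print(f"Deaths per pandemic: {deaths_per_pandemic / 1e6:.1f} million")
print("Expected deaths (millions):", E_deaths / 1e6)
```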

In [None]:
population = params.Global.population.val
infection_rate = params.Accidental.infection_rate.val
fatality_rate = params.Accidental.fatality_rate.val

E_accidental_deaths = E_accidental_pandemics * population * infection_rate * fatality_rate

# Plot the distribution of E_accidental_deaths
fig = plot_E_accidental_deaths_hist(E_accidental_deaths, accidental_colour)

### Limitations

Limitations of this analysis include:
- The estimate may be inaccurate as:
    - The results are highly dependent on the probabilities of release computed in [Klotz 2020](https://armscontrolcenter.org/wp-content/uploads/2020/03/Quantifying-the-risk-9-17-Supplementary-material-at-end.pdf)
- The results may be an underestimate as:
    - We are only considering the risk from highly pathogenic avian influenza (HPAI) viruses, and not other types of viruses
    - We are only considering the risk from research facilities, and not other sources of accidental release
    - We are assuming that the number of facilities conducting HPAI research will remain constant over this century

## 3. Estimating Deliberate Pandemic Risk

To estimate the expected number of deaths from deliberate pandemics this century, we'll perform the following steps:
1. Estimate the number of bioterrorists this century<br>
    1.1. Using previous terrorist attacks (Method 1)<br>
    1.2. Using historical examples of bioterrorists (Method 2)<br>
2. Estimate the number of bioterrorists this century (accounting for individuals retraining)
3. Estimate the number of deaths (when pandemic blueprints become available)

In [None]:
params.print_category('Deliberate')

In [None]:
gtd_xls = params.Deliberate.dataset.val
deaths_per_attack = params.Deliberate.deaths_per_attack.val
population_us_1995 = params.Deliberate.population_us_1995.val
num_indv_capability = params.Deliberate.num_indv_capability.val
num_indv_capability_intent_last_century = params.Deliberate.num_indv_capability_intent_last_century.val
retrain_indv_multiplier_min, retrain_indv_multiplier_max = params.Deliberate.retrain_indv_multiplier.val
deliberate_multiplier_min, deliberate_multiplier_max = params.Deliberate.deliberate_multiplier.val
num_years_blueprints_min, num_years_blueprints_max = params.Deliberate.num_years_until_blueprints.val
deliberate_colour = params.Deliberate.colour.val

### 3.1 Estimating the number of bioterrorists this century
We'll use two different methods to estimate the number of individuals who already have both the capability and the intent, i.e., scientists turned bioterrorists.

#### 3.1.1 Using previous terrorist attacks (Method 1)

$
E[\text{#Bioterrorists per year}] = \text{New Individuals With Capability per Year} \times \text{Fraction of People With Intent}
$

For the number of individuals with capability, we'll use the number of new individuals who learn to assemble a virus each year from [Esvelt 2022](https://dam.gcsp.ch/files/doc/gcsp-geneva-paper-29-22).

For the fraction of these people with intent, we'll use previous terrorist incidents in the US. Given the widespread access to firearms in the US, we'll assume that any individual who wants to cause mass harm, defined here as killing at least two people, can do so. Under this assumption, the number of such attacks per person approximates the fraction of people with intent. Specifically, we'll count the terrorist events in the Global Terrorism Database ([GTD](https://www.start.umd.edu/gtd/)) in the US between 1970 and 2020.
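
Method 1 in miniature, as a hedged sketch: the notebook divides the number of qualifying GTD events by the US population in 1995 and multiplies by the number of newly capable individuals per year. All three input values below are hypothetical placeholders, not the notebook's parameters.

```python
num_events = 100                 # hypothetical count of qualifying US attacks, 1970 to 2020
population_us_1995 = 266e6       # hypothetical reference population
num_indv_capability = 30_000     # hypothetical new individuals per year able to assemble a virus

# Fraction of people with intent, approximated as attacks per person
frac_indv_intent = num_events / population_us_1995

# Expected new bioterrorists per year = capable individuals x fraction with intent
E_num_bioterrorists_per_year = num_indv_capability * frac_indv_intent
print(f"{frac_indv_intent:.2e} -> {E_num_bioterrorists_per_year:.4f} per year")
```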

In [None]:
# Load and preprocess data
gtd_df = load_and_preprocess_deliberate_data(gtd_xls, deaths_per_attack)
num_events = len(gtd_df)

# Plot 
fig = plot_deaths_per_attack_scatter(gtd_df, deaths_per_attack, num_events)

In [None]:
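# Fraction of the population with intent, approximated as attacks per person
frac_indv_intent = num_events / population_us_1995
display_text(format_intent_fraction(frac_indv_intent))

# Method 1: new capable individuals per year x fraction with intent
E_num_bioterrorists_per_year_1 = num_indv_capability * frac_indv_intent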

#### 3.1.2 Using historical examples of bioterrorists (Method 2)
For this method, we'll look at the historical base rate of bioterrorists. Historically, terrorists have failed to use bioweapons. In the last century, arguably the individuals who came closest to releasing an infectious agent were members of the terrorist organisations Aum Shinrikyo and al-Qaeda. Here, we'll assume that they failed due to a lack of capability and would have succeeded using biotechnology from this century.
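
A sketch of Method 2 and of the simple average the notebook uses to combine the two methods. The near-miss count, the Method 1 rate and the horizon are hypothetical; the division by 100 spreads last century's count over its 100 years, as in the notebook.

```python
num_indv_capability_intent_last_century = 2  # hypothetical count of near-miss bioterrorists
# Method 2: spread last century's count evenly over its 100 years
E_per_year_method_2 = num_indv_capability_intent_last_century / 100

E_per_year_method_1 = 0.011  # hypothetical output of Method 1
# Combine the two methods with an unweighted average
E_per_year = (E_per_year_method_1 + E_per_year_method_2) / 2

num_years = 77  # hypothetical, e.g. 2024 to 2100 inclusive
print(f"Method 2: {E_per_year_method_2:.3f} per year; combined: {E_per_year * num_years:.2f} this century")
```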

In [None]:
# Method 2: spread last century's count of capable individuals with intent over its 100 years
E_num_bioterrorists_per_year_2 = num_indv_capability_intent_last_century / 100

In [None]:
E_num_bioterrorists_per_year = np.mean([E_num_bioterrorists_per_year_1, E_num_bioterrorists_per_year_2])
display_text(f"""
    Expected number of bioterrorists this century
    = (Method 1 + Method 2) / 2
    = ({E_num_bioterrorists_per_year_1 * num_years:.1f} + {E_num_bioterrorists_per_year_2 * num_years:.1f}) / 2
    = <strong> {E_num_bioterrorists_per_year * num_years:.1f} scientists turned bioterrorists </strong>
    """
    )

### 3.2 Estimating the number of bioterrorists this century (accounting for individuals retraining)
Here, we'll adjust our estimate to account for terrorists who retrain to become bioterrorists, using an estimate from [Esvelt 2023](https://dam.gcsp.ch/files/doc/securing-civilisation-against-catastrophic-pandemics-gp-31).

In [None]:
# Account for retraining
retrain_indv_multiplier = np.random.uniform(retrain_indv_multiplier_min, retrain_indv_multiplier_max, num_simulations)
E_num_bioterrorists_per_year = E_num_bioterrorists_per_year * retrain_indv_multiplier
E_num_bioterrorists_this_century = E_num_bioterrorists_per_year * num_years

display_text(f"Expected Number of Bioterrorists this century (accounting for individuals retraining) = <strong> {np.mean(E_num_bioterrorists_this_century):.1f} total bioterrorists </strong>") 

### 3.3 Estimating the number of deaths (when pandemic blueprints become available)

Here we assume that:
1. Blueprints for a pathogen with pandemic potential will become available within the next 50 years
2. Blueprints for a pathogen with pandemic potential have an equal probability of becoming available at any point in this 50-year window
3. Bioterrorists are unable to cause a deliberate pandemic until such blueprints become available
4. A deliberate pandemic would be 1-10x worse than the 1918 influenza pandemic due to factors such as multiple pathogens and/or releases
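
The timing assumptions can be sketched in vectorised form with hypothetical numbers: the blueprint year is drawn uniformly from the 50-year window, and a constant one "unit" of deaths is counted in every year on or after it. With a uniform blueprint year $Y$ on $[0, 50)$, year $j$ counts exactly when $j \ge \lceil Y \rceil$, and $\lceil Y \rceil$ is uniform on $1, \dots, 50$, so the expected number of counted years is $\text{num\_years} - 25.5$. The Monte Carlo mean should match this.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

num_simulations = 50_000
num_years = 77  # hypothetical horizon, e.g. 2024 to 2100 inclusive
year_blueprints_available = rng.uniform(0, 50, num_simulations)  # "within the next 50 years"
deaths_per_year_once_available = 1.0  # hypothetical constant, so totals count "active" years

years = np.arange(num_years)
# available[i, j] is True once year j has reached simulation i's blueprint year
available = years[np.newaxis, :] >= year_blueprints_available[:, np.newaxis]
deaths = np.where(available, deaths_per_year_once_available, 0.0)

# Year j counts iff j >= ceil(Y); ceil(Y) is uniform on 1..50 with mean 25.5
analytical_mean = num_years - 25.5
print(deaths.sum(axis=1).mean(), analytical_mean)
```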

In [None]:
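# Year (counted from the start of the period) in which blueprints become available, per simulation
year_blueprints_available = np.random.uniform(num_years_blueprints_min, num_years_blueprints_max, num_simulations)
# How much worse than the 1918 influenza pandemic a deliberate pandemic would be, per simulation
deliberate_multiplier = np.random.uniform(deliberate_multiplier_min, deliberate_multiplier_max, num_simulations)

# Expected deaths per year once blueprints exist (reuses the accidental-pandemic infection and fatality rates)
deaths_per_year = E_num_bioterrorists_per_year * population * infection_rate * fatality_rate * deliberate_multiplier

# blueprints_available[i, j] is True once year j has reached simulation i's blueprint year
years = np.arange(num_years)
blueprints_available = years[np.newaxis, :] >= year_blueprints_available[:, np.newaxis]
# Shape (num_simulations, num_years): zero before blueprints exist, deaths_per_year afterwards
E_deliberate_deaths = np.where(blueprints_available, deaths_per_year[:, np.newaxis], 0.0)

E_deliberate_deaths_avg = np.mean(E_deliberate_deaths, axis=0)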

# total_individuals = calculate_individual_capability(params)
# fig = plot_capability_growth(params, deliberate_colour)

fig = go.Figure(data=[
    go.Scatter(x=list(range(num_years)), y=E_deliberate_deaths_avg, mode='lines', name='Expected Deaths', line=dict(color=deliberate_colour))
])
fig.update_layout(title="Expected deaths per year from deliberate pandemics this century", xaxis_title="Years since 2024", yaxis_title="Expected Deaths")
fig.show()

In [None]:
fig = plot_E_deliberate_deaths_hist(np.sum(E_deliberate_deaths, axis=1), deliberate_colour)