In [None]:
import warnings
warnings.filterwarnings('ignore')
from helpers import *
import numpy as np
import plotly.express as px

# Estimating Pandemic Risk

Here, we will estimate the expected number of deaths this century from the following types of pandemics:
1. Natural pandemics (based on estimates from [Marani et al. 2021](https://doi.org/10.1073/pnas.2105482118))
2. Accidental pandemics (based on estimates from [Klotz 2021](https://armscontrolcenter.org/wp-content/uploads/2017/04/LWC-paper-final-version-for-CACNP-website.pdf))
3. Deliberate pandemics (based on estimates from [Esvelt 2022](https://dam.gcsp.ch/files/doc/gcsp-geneva-paper-29-22?_gl=1*ieur8b*_ga*MTk1NzA0MTU3My4xNjk2NzcyODA0*_ga_Z66DSTVXTJ*MTY5Njc3MjgwNC4xLjAuMTY5Njc3MjgwNi41OC4wLjA.))

In [None]:
# Define vars
params = Params()

We will estimate the expected number of deaths this century, i.e. from 2023 to 2100. For simplicity, we'll assume a fixed population of 9.2 billion people over this period (based on [UN population projections](https://www.worldometers.info/world-population/world-population-projections/)):

$
\text{Average population} = \frac{\text{Starting population} + \text{Population in 2100}}{2} = \frac{8.0 \text{ billion} + 10.3 \text{ billion}}{2} \approx 9.2 \text{ billion}
$

We'll produce the estimates using Monte Carlo simulations. The parameters shared across several of the estimates are shown below:

In [None]:
params.print_category('Global')

## 1. Estimating Natural Pandemic Risk
To estimate natural pandemic risk, we'll use the following parameters from [Marani et al.](https://doi.org/10.1073/pnas.2105482118):

In [None]:
params.print_category('Natural')

Marani et al. assembled a dataset of epidemics that have occurred since 1500. The dataset includes the start and end dates of each epidemic and the number of deaths. It only contains epidemics that were no longer active when it was compiled, so HIV/AIDS, malaria, and COVID-19 were excluded.

The dataset is visualised below:

In [None]:
marani_xls = params.Natural.dataset.val
marani_df, disease_totals = load_and_preprocess_natural_data(marani_xls)
color_map = generate_color_map(disease_totals)
fig = plot_disease_timeline(marani_df, disease_totals, color_map)

### Epidemic Intensity and Exceedance Probability

Epidemic intensity refers to the number of deaths due to an epidemic in a given year. When we plot this intensity against its exceedance probability, it provides insights into the likelihood of extreme epidemic events. The exceedance probability indicates the probability that the epidemic intensity will be exceeded in a given year.
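As a rough illustration of the idea, the exceedance probability can be estimated empirically as the fraction of observations that lie above a given intensity. The sketch below uses made-up intensity values and a hypothetical helper name (`empirical_exceedance`); it is not the calculation performed by `plot_exceedance_probability` in `helpers`.

```python
def empirical_exceedance(intensities):
    """Return (sorted intensities, fraction of observations strictly above each)."""
    ordered = sorted(intensities)
    n = len(ordered)
    # For each value, count how many observations exceed it
    exceedance = [sum(1 for x in ordered if x > value) / n for value in ordered]
    return ordered, exceedance


# Hypothetical intensities, for illustration only (not taken from the dataset)
sample = [0.01, 0.5, 0.02, 5.7, 0.1]
ordered, exceedance = empirical_exceedance(sample)

for value, prob in zip(ordered, exceedance):
    print(f"intensity {value:>5}: exceeded by {prob:.0%} of observations")
```

The largest observation always has an empirical exceedance of zero, which is one reason a fitted distribution is useful for reasoning about events more extreme than any in the record.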

Marani et al. demonstrated that the relationship between epidemic intensity and exceedance probability is well described by the Generalized Pareto Distribution (GPD). For intensities at or above the threshold $\mu$, the exceedance probability is:

$
H(i) = (1 - P_0) \times \left(1 + \frac{\xi \times (i - \mu)}{\sigma}\right)^{-\frac{1}{\xi}}
$

Where:
- $H(i)$ is the exceedance probability of the epidemic intensity $i$.
- $\mu$ is the threshold parameter.
- $\sigma$ is the scale parameter.
- $\xi$ is the shape parameter.
- $P_0$ is the probability that the intensity is below the threshold $\mu$.
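The power-law tail term in the expression above can be evaluated directly. The following is a minimal sketch with a hypothetical function name (`gpd_tail`) and illustrative parameter values that are not the estimates from Marani et al.; it is independent of the `compute_gpd` helper used later in this notebook, whose implementation is not shown here.

```python
import math


def gpd_tail(i, mu, sigma, xi):
    """Tail term (1 + xi * (i - mu) / sigma) ** (-1 / xi), defined for i >= mu."""
    if i < mu:
        raise ValueError("the tail term is only defined at or above the threshold mu")
    if xi == 0:
        # The limit of the expression as xi -> 0 is an exponential tail
        return math.exp(-(i - mu) / sigma)
    base = 1 + xi * (i - mu) / sigma
    if base <= 0:
        # Only possible for xi < 0: i lies beyond the upper end of the support
        return 0.0
    return base ** (-1 / xi)


# Illustrative values only (assumptions, not fitted parameters)
mu, sigma, xi = 0.001, 0.01, 1.4

for i in (0.001, 0.01, 0.1, 1.0):
    print(f"i = {i:<6} tail term = {gpd_tail(i, mu, sigma, xi):.4f}")
```

The tail term equals 1 at the threshold and decreases as the intensity grows; a larger shape parameter $\xi$ gives a heavier tail, meaning extreme intensities are less improbable.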

The epidemic intensity is plotted against its exceedance probability below, including the GPD using parameter estimates from Marani et al.

<!-- #### Understanding Exceedance Probability

The exceedance probability of the yearly maximum epidemic intensity, $ H_1(i) $, is given by $ H_1(i) = 1 - P_1(i) $. This value represents the likelihood that an extreme novel epidemic (irrespective of its causative disease) with an intensity equal to or greater than $ i $, will occur anywhere in the world within a year.

To put this into perspective, consider an epidemic with an intensity similar to the 1918-1920 "Spanish flu." For this epidemic, with an intensity of $ i = 5.7 $ deaths per thousand per year, the yearly probability of exceedance is represented by $ H_1(i = 5.7 \text{‰/year}) $. -->


In [None]:
mu = params.Natural.mu.val
sigma = params.Natural.sigma.val
xi = params.Natural.xi.val

fig = plot_exceedance_probability(
        marani_df=marani_df,
        plot_gpd=True,
        mu=mu,
        sigma=sigma,
        xi=xi
    )

### Using the GPD to Estimate the Expected Number of Deaths from Natural Pandemics this Century

In [None]:
max_intensity = params.Natural.max_intensity.val

# Evaluate the GPD on an evenly spaced grid from the threshold mu up to max_intensity
intensities = np.linspace(mu, max_intensity, 100)
gpd_values = compute_gpd(mu, sigma, xi, intensities)

# Integrate intensity * GPD value over the grid using the trapezoidal rule
expected_yearly_intensity = np.trapz(intensities * gpd_values, intensities)

fig = plot_exceedance_probability(intensities, gpd_values, log_axis=False, title_text=f"Expected yearly intensity = {expected_yearly_intensity:.1f} million deaths")

In [None]:
num_years = params.Global.num_years.val
mean_duration = marani_df['Duration'].mean()
E_natural_deaths = expected_yearly_intensity * mean_duration * num_years

display_text(f"Expected number of deaths from natural epidemics over {num_years} years, assuming {mean_duration:.1f} years per outbreak = {E_natural_deaths:.1f} million deaths")

## 2. Estimating Accidental Pandemic Risk

We'll be examining the potential risk of a pandemic from the accidental release of viruses from research facilities. This analysis will involve three main sections:

1. Probability a single facility in a single year seeds a pandemic
2. Expected number of accidental pandemics this century
3. Expected number of deaths from accidental pandemics this century

Limitations of this analysis include:
- The results are highly dependent on the probabilities of release computed in [Klotz 2020](https://armscontrolcenter.org/wp-content/uploads/2020/03/Quantifying-the-risk-9-17-Supplementary-material-at-end.pdf)
- We are only considering the risk from highly pathogenic avian influenza (HPAI) viruses, and not other types of viruses
- We are only considering the risk from research facilities, and not other sources of accidental release
- We are assuming that the number of facilities conducting HPAI research will remain constant over this century


In [None]:
params.print_category('Accidental')

### 2.1. Probability a single facility in a single year seeds a pandemic

The probability that a single facility seeds a pandemic in a single year is given by:

$
P_{\text{single_pandemic}} = P_{\text{release}} \times P_{\text{seeds_pandemic}}
$

Where:
- $P_{\text{release}}$ is the probability of community release from a single facility in a single year.
- $P_{\text{seeds_pandemic}}$ is the probability that a virus release seeds a pandemic. Since $P_{\text{seeds_pandemic}}$ is given as a range, we will sample from a uniform distribution between 0.05 and 0.4.
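As a sanity check on the sampling step described above, the sample mean of the product should approach $P_{\text{release}}$ times the midpoint of the uniform range, $(0.05 + 0.4) / 2 = 0.225$. The sketch below uses only the standard library; the value of `p_release` is a hypothetical placeholder rather than the parameter used in this notebook, and `sample_p_single_pandemic` is an illustrative name.

```python
import random
import statistics


def sample_p_single_pandemic(p_release, p_seed_min, p_seed_max, n, seed=0):
    """Draw n Monte Carlo samples of P_release * P_seeds_pandemic."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [p_release * rng.uniform(p_seed_min, p_seed_max) for _ in range(n)]


# Hypothetical placeholder for P_release; 0.05 and 0.4 are the bounds quoted above
p_release = 0.002
samples = sample_p_single_pandemic(p_release, 0.05, 0.4, n=100_000)

# Mean of a uniform distribution is the midpoint of its range
analytic_mean = p_release * (0.05 + 0.4) / 2

print(f"sample mean   = {statistics.fmean(samples):.6f}")
print(f"analytic mean = {analytic_mean:.6f}")
```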

In [None]:
P_seeds_pandemic_min = params.Accidental.P_seeds_pandemic_min.val
P_seeds_pandemic_max = params.Accidental.P_seeds_pandemic_max.val
num_simulations = params.Global.num_simulations.val
P_release = params.Accidental.P_release.val
accidental_colour = params.Accidental.colour.val

P_seeds_pandemic = np.random.uniform(P_seeds_pandemic_min, P_seeds_pandemic_max, num_simulations)
P_single_pandemic = P_release * P_seeds_pandemic

# Plot the distribution of P_single_pandemic
fig = plot_P_single_pandemic_hist(P_single_pandemic, accidental_colour)

### 2.2. Expected number of accidental pandemics this century

The expected number of pandemics seeded by any facility over $\text{num_years}$ years is given by:

$
E[\text{#Pandemics}] = P_{\text{single_pandemic}} \times \text{num_years} \times \text{num_facilities}
$

Where:
- $\text{num_years}$ is the number of years in the simulation (typically set to 100 for a century).
- $\text{num_facilities}$ is the number of facilities in the world.
- $P_{\text{single_pandemic}}$ is the probability that a single facility seeds a pandemic in a single year.
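A small worked example may help show what this product represents. All input values below are hypothetical placeholders, not the parameters used in this notebook. The sketch also computes the probability of at least one pandemic, under the added assumption that every facility-year is independent, to make clear that the expected count is not itself a probability.

```python
def expected_pandemics(p_single_pandemic, num_years, num_facilities):
    """Expected number of pandemics: per-facility-year probability times facility-years."""
    return p_single_pandemic * num_years * num_facilities


# Hypothetical inputs, for illustration only
p_single = 0.0005
num_years = 78
num_facilities = 20

expected = expected_pandemics(p_single, num_years, num_facilities)

# Probability of at least one pandemic, assuming independent facility-years
p_at_least_one = 1 - (1 - p_single) ** (num_years * num_facilities)

print(f"expected number of pandemics     = {expected:.3f}")
print(f"probability of at least one      = {p_at_least_one:.3f}")
```

The expected count is always at least as large as the probability of one or more pandemics, because it also weights the possibility of several pandemics occurring.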

In [None]:
num_facilities = params.Accidental.num_facilities.val

E_accidental_pandemics = P_single_pandemic * num_years * num_facilities

# Plot the distribution of E_accidental_pandemics
fig = plot_E_accidental_pandemics_hist(E_accidental_pandemics, accidental_colour)

### 2.3. Expected number of deaths from accidental pandemics this century

The expected number of deaths is the expected number of pandemics multiplied by the expected deaths per pandemic:

$
E[\text{#Deaths}] = E[\text{#Pandemics}] \times \text{Population} \times \text{Infection Rate} \times \text{CFR}
$

Where:
- $\text{Population}$ is the world population, fixed at the average value defined in the global parameters.
- $\text{Infection Rate}$ is the fraction of the population infected during the pandemic.
- $\text{CFR}$ is the case fatality rate of the pandemic.
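To make the arithmetic concrete, here is a minimal worked example. The population is the 9.2 billion average used in this notebook; the expected number of pandemics, infection rate, and CFR are hypothetical placeholders chosen only to illustrate the calculation.

```python
def expected_deaths(expected_pandemics, population, infection_rate, cfr):
    """Expected deaths = expected pandemics * population * infection rate * CFR."""
    return expected_pandemics * population * infection_rate * cfr


population = 9.2e9          # average population assumed in this notebook
n_pandemics = 0.5           # hypothetical expected number of pandemics
infection_rate = 0.15       # hypothetical fraction of the population infected
cfr = 0.025                 # hypothetical case fatality rate

deaths = expected_deaths(n_pandemics, population, infection_rate, cfr)
print(f"expected deaths = {deaths / 1e6:.2f} million")
```

Because the formula is a simple product, the result scales linearly with each input: doubling the CFR or the expected number of pandemics doubles the expected deaths.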

In [None]:
population = params.Global.population.val
infection_rate = params.Accidental.infection_rate.val
fatality_rate = params.Accidental.fatality_rate.val

E_accidental_deaths = E_accidental_pandemics * population * infection_rate * fatality_rate

# Plot the distribution of E_accidental_deaths
fig = plot_E_accidental_deaths_hist(E_accidental_deaths, accidental_colour)

## 3. Estimating Deliberate Pandemic Risk

In [None]:
params.print_category('Deliberate')

In [None]:
gtd_xls = params.Deliberate.dataset.val
deaths_per_attack = params.Deliberate.deaths_per_attack.val
population_us_1995 = params.Deliberate.population_us_1995.val
num_indv_capability = params.Deliberate.num_indv_capability.val
deliberate_multiplier_max = params.Deliberate.deliberate_multiplier_max.val
deliberate_colour = params.Deliberate.colour.val

# Load and preprocess data
gtd_df = load_and_preprocess_deliberate_data(gtd_xls, deaths_per_attack)
num_events = len(gtd_df)
frac_invd_intent = num_events / population_us_1995

# Plot 
fig = plot_deaths_per_attack_scatter(gtd_df, deaths_per_attack, num_events, frac_invd_intent)

In [None]:
total_individuals = calculate_individual_capability(params)
fig = plot_capability_growth(params, deliberate_colour)

In [None]:
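# Expected number of individuals with both the capability and the intent to cause mass harm this century:
# take the final value of the capability series and scale it by the fraction of individuals with intent
num_indv_capability = total_individuals[-1]
E_num_indv_capability_per_century = num_indv_capability * frac_invd_intent

# Sample a multiplier uniformly between 1 and deliberate_multiplier_max for each simulation
deliberate_multiplier = np.random.uniform(1, deliberate_multiplier_max, num_simulations)

# Note: infection_rate and fatality_rate are the values loaded from the Accidental parameters above
E_deliberate_deaths = E_num_indv_capability_per_century * population * infection_rate * fatality_rate * deliberate_multiplier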

# Plot the distribution of E_deliberate_deaths
fig = plot_E_deliberate_deaths_hist(E_deliberate_deaths, deliberate_colour)