# How to resume a simulation

This tutorial shows how to resume a simulation from the states of a previous simulation. Resuming also speeds up the generation of results: after estimating parameters on some data, you can start from the last day of the simulation results and run multiple counterfactual simulations.

In the following we use the same model as in the [how to simulate](how_to_simulate.ipynb) tutorial.

1. Simulate data for some periods.
2. Inspect the simulation results.
3. Restart the simulation.

In [1]:
%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sid
from sid.config import INDEX_NAMES

## Simulate the data

Let's create an artificial population of 10,000 people.
Every individual will be characterized by their region and age group.

The age group will affect the progression of the disease.
Both region and age group will have an influence on who our individuals are going to meet.

In [2]:
available_ages = [
    "0-9",
    "10-19",
    "20-29",
    "30-39",
    "40-49",
    "50-59",
    "60-69",
    "70-79",
    "80-100",
]

ages = np.random.choice(available_ages, size=10_000)
regions = np.random.choice(["North", "South"], size=10_000)

initial_states = pd.DataFrame({"age_group": ages, "region": regions}).astype("category")
initial_states.head(5)

|   | age_group | region |
|---|-----------|--------|
| 0 | 70-79     | South  |
| 1 | 50-59     | South  |
| 2 | 50-59     | South  |
| 3 | 60-69     | South  |
| 4 | 20-29     | North  |


### Specifying the Contact Models

Next, let's define how many contacts people have every day.
We assume people have two types of contacts: close and distant. On average, they have fewer close than distant contacts.

In [3]:
def meet_distant(states, params):
    possible_nr_contacts = np.arange(10)
    contacts = np.random.choice(possible_nr_contacts, size=len(states))
    return pd.Series(contacts, index=states.index)


def meet_close(states, params):
    possible_nr_contacts = np.arange(5)
    contacts = np.random.choice(possible_nr_contacts, size=len(states))
    return pd.Series(contacts, index=states.index)


assort_by = ["age_group", "region"]

contact_models = {
    "distant": {"model": meet_distant, "assort_by": assort_by, "is_recurrent": False},
    "close": {"model": meet_close, "assort_by": assort_by, "is_recurrent": False},
}
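As a quick check on the claim that people have fewer close than distant contacts, the sketch below computes the expected number of contacts implied by a uniform draw from `np.arange(10)` and `np.arange(5)`, which is what `np.random.choice` does by default. The helper name `expected_uniform_draw` is made up for this illustration and is not part of sid.

```python
import numpy as np


def expected_uniform_draw(possible_values):
    """Expected value of a draw in which every entry is equally likely.

    Hypothetical helper for illustration only; not part of sid.
    """
    return float(np.mean(possible_values))


# Same ranges as in meet_distant and meet_close above.
expected_distant = expected_uniform_draw(np.arange(10))  # values 0..9
expected_close = expected_uniform_draw(np.arange(5))  # values 0..4

print(expected_distant, expected_close)
```

Under these assumptions, an individual has on average more than twice as many distant contacts as close contacts per period.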

### Specifying the model parameters

sid allows you to estimate one infection probability per contact type.
In this example, close contacts are more infectious than distant contacts: 5% versus 3%.

In [4]:
epidemiological_parameters = pd.read_csv("infection_probs.csv", index_col=INDEX_NAMES)
epidemiological_parameters

| category       | subcategory | name      | value | note | source |
|----------------|-------------|-----------|-------|------|--------|
| infection_prob | close       | close     | 0.05  |      |        |
| infection_prob | distant     | distant   | 0.03  |      |        |
| infection_prob | household   | household | 0.2   |      |        |


Similarly, we specify for each contact model how assortatively people meet across their respective `assort_by` keys.

We assume that 90% of contacts are with people from the same region and 50% are with people of the same age group, for both the "close" and the "distant" contact model.
The rest of the probability mass is split evenly between the other regions and age groups.

In [5]:
assort_probs = pd.read_csv("assort_by_params.csv", index_col=INDEX_NAMES)
assort_probs

| category             | subcategory | name      | value | note | source |
|----------------------|-------------|-----------|-------|------|--------|
| assortative_matching | close       | age_group | 0.5   |      |        |
| assortative_matching | close       | region    | 0.9   |      |        |
| assortative_matching | distant     | age_group | 0.5   |      |        |
| assortative_matching | distant     | region    | 0.9   |      |        |
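To make the split of the probability mass concrete, here is a small worked example for a person aged 20-29 living in the North. It is a sketch, not sid's internal implementation: the helper `meeting_probs` is hypothetical, and it assumes that the age and region dimensions combine independently, so that joint probabilities are simple products.

```python
def meeting_probs(own_group, groups, same_prob):
    """Probability of meeting each group (hypothetical helper, not part of sid).

    One's own group gets `same_prob`; the remaining mass is split evenly
    across all other groups.
    """
    other_prob = (1 - same_prob) / (len(groups) - 1)
    return {g: same_prob if g == own_group else other_prob for g in groups}


age_groups = [
    "0-9", "10-19", "20-29", "30-39", "40-49",
    "50-59", "60-69", "70-79", "80-100",
]
regions = ["North", "South"]

age_probs = meeting_probs("20-29", age_groups, same_prob=0.5)
region_probs = meeting_probs("North", regions, same_prob=0.9)

# Assumption: the two dimensions are independent, so joint probabilities
# are products of the marginal probabilities.
joint_probs = {
    (age, region): age_probs[age] * region_probs[region]
    for age in age_groups
    for region in regions
}

print(age_probs["30-39"], region_probs["South"], joint_probs[("20-29", "North")])
```

With nine age groups, each of the eight other age groups receives 0.5 / 8 = 0.0625 of the mass, and the single other region receives the remaining 0.1.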


Lastly, we load some parameters that specify how Covid-19 progresses. They cover asymptomatic cases and reflect that severe cases are more common among the elderly.

`cd_` stands for countdown. When a countdown is -1 the event never happens. So for example, 25% of infected people will never develop symptoms and the rest will develop symptoms 3 days after they start being infectious. 

In [6]:
disease_params = sid.get_epidemiological_parameters()
disease_params.head(6).round(2)

| category           | subcategory        | name               | value |
|--------------------|--------------------|--------------------|-------|
| health_system      | icu_limit_relative | icu_limit_relative | 50.0  |
| cd_immune_false    | all                | 1000               | 1.0   |
| cd_infectious_true | all                | 1                  | 0.39  |
| cd_infectious_true | all                | 2                  | 0.35  |
| cd_infectious_true | all                | 3                  | 0.22  |
| cd_infectious_true | all                | 5                  | 0.04  |
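The countdown logic can be illustrated with a few lines of plain Python. This is only a sketch under stated assumptions, not sid's actual code: it uses the symptom example from the text (25% never, 75% after 3 days), assumes countdowns decrease by one each period, and assumes the event fires when the countdown reaches 0. The names `draw_countdowns` and `periods_until_event` are invented for the example.

```python
import random

# Distribution from the text: -1 means "never", 3 means "in three days".
cd_symptoms_dist = {-1: 0.25, 3: 0.75}


def draw_countdowns(dist, n, seed=0):
    """Draw n countdown values according to the probabilities in dist."""
    rng = random.Random(seed)
    return rng.choices(list(dist), weights=list(dist.values()), k=n)


def periods_until_event(countdown, max_periods=50):
    """Return after how many periods the countdown hits 0, or None if it never does.

    Assumption: the countdown decreases by one every period.
    """
    for period in range(max_periods + 1):
        if countdown == 0:
            return period
        countdown -= 1
    return None


countdowns = draw_countdowns(cd_symptoms_dist, n=10_000)
share_never = countdowns.count(-1) / len(countdowns)

print(round(share_never, 2))
print(periods_until_event(3), periods_until_event(-1))
```

A countdown that starts at -1 only moves further away from 0, which is why the event never happens for those individuals.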


In [7]:
params = pd.concat([disease_params, epidemiological_parameters, assort_probs])

### Contact Policies

We also allow the government to react to the rising number of infections by enforcing a
mild curfew which halves the contacts of all individuals whenever more than 10% of all
individuals are infectious.

You could also implement policies inside the contact models. This allows you more flexibility. For example, we could implement that sick individuals stay home, that vulnerable people react with stricter social distancing or implement the [locally adaptive lockdown policy in Germany](https://www.dw.com/en/merkel-cautiously-optimistic-as-she-announces-lockdown-rollback/a-53346427).

In [8]:
def contact_policy_is_active(states):
    return states["infectious"].mean() > 0.1


contact_policies = {
    "policy_close_contacts": {
        "start": "2020-03-12",
        "policy": 0.5,
        "is_active": contact_policy_is_active,
        "affected_contact_model": "close",
    },
    "policy_distant_contacts": {
        "start": "2020-03-12",
        "policy": 0.5,
        "is_active": contact_policy_is_active,
        "affected_contact_model": "distant",
    },
}
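The alternative mentioned above, implementing a policy inside a contact model, could look roughly like the following variant of `meet_distant` in which sick individuals stay home. It is a minimal sketch: it assumes that `states` carries a boolean column named `symptomatic`, and the function name `meet_distant_sick_stay_home` as well as the toy data are made up for illustration.

```python
import numpy as np
import pandas as pd


def meet_distant_sick_stay_home(states, params):
    """Variant of meet_distant in which symptomatic individuals have no contacts."""
    possible_nr_contacts = np.arange(10)
    contacts = np.random.choice(possible_nr_contacts, size=len(states))
    contacts = pd.Series(contacts, index=states.index)
    # Assumption: `states` has a boolean column named "symptomatic".
    # Keep the drawn contacts for healthy people, set them to 0 for the sick.
    return contacts.where(~states["symptomatic"], 0)


# Toy states with two symptomatic and two healthy individuals.
toy_states = pd.DataFrame({"symptomatic": [True, False, True, False]})
toy_contacts = meet_distant_sick_stay_home(toy_states, params=None)

print(toy_contacts)
```

Such a function would take the place of `meet_distant` in the `contact_models` dictionary, so no separate entry in `contact_policies` is needed for this behavior.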

Finally, there must be some initial infections in our population. They are specified via the initial conditions, which are thoroughly explained in the [how-to guide](../how_to_guides/how_to_use_initial_conditions.ipynb). For now, we assume that there are 100 infected individuals and 50 with pre-existing immunity.

In [9]:
initial_conditions = {"initial_infections": 100, "initial_immunity": 50}

### Run the simulation

We are going to simulate this population for 50 periods.

In [10]:
simulate = sid.get_simulate_func(
    initial_states=initial_states,
    contact_models=contact_models,
    params=params,
    contact_policies=contact_policies,
    initial_conditions=initial_conditions,
    duration={"start": "2020-02-27", "periods": 50},
    saved_columns={"time": "date"},
    seed=0,
)
df = simulate(params=params)

The return of `simulate` is a [Dask DataFrame](https://docs.dask.org/en/latest/dataframe.html) which lazily loads the data. If your data fits into your working memory, do the following to convert it to a pandas DataFrame.

In [11]:
df = df.compute()

Let us take a look at various statistics of the sample.

In [None]:
fig, axs = plt.subplots(2, 2, figsize=(12, 8))
fig.subplots_adjust(bottom=0.15, wspace=0.2, hspace=0.4)

axs = axs.flatten()

df.resample("D", on="date")["ever_infected"].mean().plot(ax=axs[0], color="#5c7f8e")
df.resample("D", on="date")["infectious"].mean().plot(ax=axs[1], color="#5c7f8e")
df.resample("D", on="date")["dead"].sum().plot(ax=axs[2], color="#5c7f8e")
infectious_last_seven_days = df.cd_infectious_false.between(-7, 0)
df.loc[infectious_last_seven_days].resample("D", on="date")[
    "n_has_infected"
].mean().plot(ax=axs[3], color="#5c7f8e")

for ax in axs:
    ax.set_xlabel("")
    ax.spines["right"].set_visible(False)
    ax.spines["top"].set_visible(False)

axs[0].set_title("Share of Infected People")
axs[1].set_title("Share of Infectious People in the Population")
axs[2].set_title("Total Number of Deaths")
axs[3].set_title("$R_t$ (Effective Reproduction Number)")

plt.show()