# Counterfactual simulation
This notebook demonstrates how to run counterfactual simulations across different intervention packages and detection times in pathosim.

In [1]:
import pathosim as inf
import sciris as sc
import numpy as np

# Result keys of interest
result_keys = [
    'n_infectious',
    'n_symptomatic',
    'n_severe',
    'n_recovered',
    'n_dead',
]

PathoSim 3.1.2 (2022-01-16) — © 2023 by McGill University


##### Baseline simulation parameters

In [2]:
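sim_pars = dict(
    use_waning = True,                # model waning immunity
    pop_size   = 5000,                # number of agents
    pop_type   = 'behaviour_module',  # population / contact-network type
    n_days     = 100,                 # simulation length in days
    verbose    = 0,                   # silence per-simulation output
    rand_seed  = 42,                  # base seed (the runs below use seeds 42 to 46)
)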

##### Intervention packages
We define two intervention packages, each consisting of several interventions.

In [3]:

# Interventions: change_beta scales transmission (beta) in one contact layer
# ('s' = school, 'w' = work, 'c' = community) from the given day onward
school_closure = inf.change_beta(days=[40], changes=[0], layers='s')
home_office = inf.change_beta(days=[20], changes=[0.2], layers='w')
event_cancellation = inf.change_beta(days=[20], changes=[0.8], layers='c')
# Several change_beta interventions cannot currently be combined on the same
# layer, so event cancellation and business closure are merged a priori
event_and_business_closure = inf.change_beta(days=[40], changes=[0.3], layers='c')

# Intervention packages (dict: package name -> list of interventions)
packages = {
    "medium": [home_office, event_cancellation],
    "strong": [home_office, event_and_business_closure, school_closure],
}
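Because several `change_beta` interventions cannot be stacked on one layer, their effects have to be merged by hand. The sketch below shows one way to do that arithmetic. It assumes that pathosim's `change_beta` behaves like Covasim's, where each entry in `changes` scales the layer's *original* beta, and that overlapping measures stack multiplicatively. The helper `merge_layer_schedule` and the factor 0.5 are hypothetical.

```python
def merge_layer_schedule(measures):
    """Merge (start_day, factor) measures for one contact layer into the
    `days` and `changes` lists of a single change_beta-style intervention.

    Assumptions (hypothetical, not pathosim's code):
      * each factor scales the layer's ORIGINAL beta from start_day onward;
      * measures that overlap in time stack multiplicatively.
    """
    days, changes = [], []
    combined = 1.0  # running product of all factors active so far
    for day, factor in sorted(measures):
        combined *= factor
        if days and days[-1] == day:
            # Two measures start on the same day: fold them into one entry.
            changes[-1] = combined
        else:
            days.append(day)
            changes.append(combined)
    return days, changes


# Community layer: events cancelled from day 20 (factor 0.8), businesses
# additionally closed from day 40 (hypothetical factor 0.5).
days, changes = merge_layer_schedule([(20, 0.8), (40, 0.5)])
print(days, changes)  # [20, 40] [0.8, 0.4]
```

Under these assumptions, the result could then be passed on as `inf.change_beta(days=days, changes=changes, layers='c')`.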

##### Pathogen

In [4]:
pathogen = inf.SARS_COV_2(10)

##### Set up counterfactual simulation

In [5]:
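cf = inf.CounterfactualMultiSim(
    sim_pars,
    pathogens=[pathogen],
    intervention_packages=packages,
    n_sims=5,          # number of seeds (42 to 46 in the logs below)
    maxcpu=0.9,        # start a new process only while CPU load is below 0.9
    maxmem=0.9,        # and memory use is below 0.9
    parallelize=True,
)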

Run the baseline scenario (no interventions). This also checks whether a sizeable epidemic occurs in each simulation and determines a realistic range of detection times.

In [6]:
cf.run_baseline(verbose = True)

CPU ✓ (0.35<0.90), memory ✓ (0.54<0.90): starting process 0 after 1 tries
Running baseline (seed 42) simulation.
CPU ✓ (0.28<0.90), memory ✓ (0.55<0.90): starting process 1 after 1 tries
Running baseline (seed 43) simulation.
CPU ✓ (0.37<0.90), memory ✓ (0.55<0.90): starting process 2 after 1 tries
Running baseline (seed 44) simulation.
CPU ✓ (0.44<0.90), memory ✓ (0.55<0.90): starting process 3 after 1 tries
Running baseline (seed 45) simulation.
CPU ✓ (0.54<0.90), memory ✓ (0.55<0.90): starting process 4 after 1 tries
Running baseline (seed 46) simulation.


In [7]:
i = 2
print(cf.sims[i].sim_baseline)
print(f"Detection range: {cf.sims[i].sim_baseline.get_detection_ranges()}")
print(f"Is epidemic: {cf.sims[i].sim_baseline.is_epidemic()}")

Sim(<no label>; 2020-03-01 to 2020-06-09; pop: 5000 behaviour_module; epi: 9322⚙, 36☠)
Detection range: [{'lower': 4.0, 'upper': 27}]
Is epidemic: [True]
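`get_detection_ranges()` returns one `{'lower', 'upper'}` dict per pathogen, and `is_epidemic()` returns one flag per pathogen. The criterion pathosim uses is not shown in this notebook. The sketch below only illustrates one *hypothetical* threshold-based rule that yields the same shapes, with `None` standing in for "no epidemic". The thresholds and the daily series are made up.

```python
def detection_window(cum_infections, lower_threshold, upper_threshold):
    """Hypothetical detection window for one pathogen.

    lower: first day cumulative infections reach lower_threshold
    upper: first day they reach upper_threshold
    Returns None if the upper threshold is never reached (no epidemic).
    Assumes lower_threshold <= upper_threshold.
    """
    lower = None
    for day, total in enumerate(cum_infections):
        if lower is None and total >= lower_threshold:
            lower = day
        if total >= upper_threshold:
            return {"lower": lower, "upper": day}
    return None


# Made-up cumulative infections for one pathogen, one value per day.
daily_cum = [10, 12, 20, 45, 110, 300, 800]

ranges = [detection_window(daily_cum, lower_threshold=20, upper_threshold=300)]
is_epidemic = [window is not None for window in ranges]
print(ranges)       # [{'lower': 2, 'upper': 5}]
print(is_epidemic)  # [True]
```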


You can also get a summary of the baseline simulation as a pandas DataFrame:

In [8]:
cf.sims[i].summary_baseline

|  | pathogen | cum_infections | cum_reinfections | cum_infectious | cum_symptomatic | cum_severe | cum_critical | cum_recoveries | cum_deaths | cum_tests | ... | doubling_time | test_yield | rel_test_yield | frac_vaccinated | pop_imm | pop_nabs | pop_protection | pop_symp_protection | new_diagnoses_custom | cum_diagnoses_custom |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 9322.0 | 4330.0 | 9233.0 | 5262.0 | 351.0 | 109.0 | 8975.0 | 36.0 | 0.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.514877 | 0.825653 | 0.32994 | 0.0 | 0.0 |


##### Run counterfactual simulations
To run a counterfactual simulation, specify which intervention package to use and which detection times to assume.

The simulations are parallelized across seeds.
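The logs below suggest one worker process per seed (processes 0 to 4 for seeds 42 to 46), each running its detection times in sequence (seed 42 reports detection time 2 before detection time 4). The sketch below mimics that layout with hypothetical stand-in functions. It uses `ThreadPoolExecutor` only to stay self-contained; CPU-bound simulations would normally use processes, and the CPU/memory throttling controlled by `maxcpu`/`maxmem` is not modelled.

```python
from concurrent.futures import ThreadPoolExecutor


def run_seed(seed, detection_times):
    """Hypothetical worker: run every requested detection time for one seed,
    one after another. A string stands in for a finished simulation."""
    return {t: f"counterfactual(seed={seed}, detection_time={t})" for t in detection_times}


def run_counterfactual_grid(seeds, detection_times, max_workers=None):
    """Submit one task per seed; collect results keyed by seed, then detection time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {seed: pool.submit(run_seed, seed, detection_times) for seed in seeds}
        return {seed: future.result() for seed, future in futures.items()}


results = run_counterfactual_grid(range(42, 47), detection_times=[2, 4])
print(results[42])
```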

In [15]:
cf.run_counterfactual(intervention_package_key="medium", detection_times=[2, 4], verbose = True)

CPU ✓ (0.30<0.90), memory ✓ (0.65<0.90): starting process 0 after 1 tries
Running counterfactual (seed 42) for intervention package "medium" with detection time 2.
CPU ✓ (0.31<0.90), memory ✓ (0.65<0.90): starting process 1 after 1 tries
Running counterfactual (seed 43) for intervention package "medium" with detection time 2.
CPU ✓ (0.43<0.90), memory ✓ (0.65<0.90): starting process 2 after 1 tries
Running counterfactual (seed 44) for intervention package "medium" with detection time 2.
CPU ✓ (0.62<0.90), memory ✓ (0.65<0.90): starting process 3 after 1 tries
Running counterfactual (seed 45) for intervention package "medium" with detection time 2.
Running counterfactual (seed 42) for intervention package "medium" with detection time 4.
CPU ✓ (0.59<0.90), memory ✓ (0.65<0.90): starting process 4 after 1 tries
Running counterfactual (seed 46) for intervention package "medium" with detection time 2.
Running counterfactual (seed 43) for intervention package "medium" with detection time 4.


For debugging or testing, you can also run the simulations in non-parallel mode:

In [None]:
cf.run_counterfactual(intervention_package_key="medium", detection_times=[2, 4], verbose = True, parallelize=False)

The counterfactual simulations are stored per seed and can be inspected in `cf.sims[i].sims_counterfactual`:

In [14]:
i = 1
cf.sims[i].sims_counterfactual

#0: 'medium': {2: Sim(<no label>; 2020-03-01 to 2020-06-09; pop: 5000
behaviour_module; epi: 8473⚙, 37☠), 4: Sim(<no label>; 2020-03-01 to 2020-06-09;
pop: 5000 behaviour_module; epi: 8473⚙, 37☠)}
#1: 'strong': {}




This is a two-level dictionary. The first key represents the intervention package, and the second key the detection time.
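The printed `#0:`/`#1:` prefixes suggest the object is a sciris `odict`, but key-based access works as with a plain dict. The sketch below uses a plain-dict stand-in (strings instead of `Sim` objects) to show direct lookup and a flat iteration over all stored runs; `iter_counterfactuals` is a hypothetical helper.

```python
# Stand-in for one seed's sims_counterfactual; strings replace Sim objects.
sims_counterfactual = {
    "medium": {2: "Sim(medium, detection time 2)", 4: "Sim(medium, detection time 4)"},
    "strong": {},  # package defined but not run yet
}


def iter_counterfactuals(nested):
    """Yield (package, detection_time, value) for every stored run."""
    for package, by_time in nested.items():
        for detection_time in sorted(by_time):
            yield package, detection_time, by_time[detection_time]


# Direct access: first key = intervention package, second key = detection time
print(sims_counterfactual["medium"][2])

for package, detection_time, sim in iter_counterfactuals(sims_counterfactual):
    print(package, detection_time, sim)
```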

In [12]:
i = 0
cf.sims[i].sims_counterfactual["medium"][2]

Sim(<no label>; 2020-03-01 to 2020-06-09; pop: 5000 behaviour_module; epi: 8702⚙, 28☠)

The summary of each counterfactual run is stored in `summaries_counterfactual`, again a two-level dictionary keyed first by intervention package and then by detection time, just like `sims_counterfactual`.

In [16]:
cf.sims[i].summaries_counterfactual["medium"][4]

|  | pathogen | cum_infections | cum_reinfections | cum_infectious | cum_symptomatic | cum_severe | cum_critical | cum_recoveries | cum_deaths | cum_tests | ... | doubling_time | test_yield | rel_test_yield | frac_vaccinated | pop_imm | pop_nabs | pop_protection | pop_symp_protection | new_diagnoses_custom | cum_diagnoses_custom |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 8702.0 | 3715.0 | 8611.0 | 5033.0 | 321.0 | 97.0 | 8407.0 | 28.0 | 0.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.256794 | 0.805765 | 0.328463 | 0.0 | 0.0 |


It is usually much more convenient to get a single pandas DataFrame with all results across all seeds:

In [14]:
cf.get_summaries_df()

|  | seed | intervention_package | delay | pathogen | cum_infections | cum_reinfections | cum_infectious | cum_symptomatic | cum_severe | cum_critical | ... | doubling_time | test_yield | rel_test_yield | frac_vaccinated | pop_imm | pop_nabs | pop_protection | pop_symp_protection | new_diagnoses_custom | cum_diagnoses_custom |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 42 | baseline | 0 | 0 | 9284.0 | 4289.0 | 9169.0 | 5309.0 | 328.0 | 107.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.397622 | 0.819455 | 0.329671 | 0.0 | 0.0 |
| 0 | 42 | medium | 2 | 0 | 8702.0 | 3715.0 | 8611.0 | 5033.0 | 321.0 | 97.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.256794 | 0.805765 | 0.328463 | 0.0 | 0.0 |
| 0 | 42 | medium | 4 | 0 | 8702.0 | 3715.0 | 8611.0 | 5033.0 | 321.0 | 97.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.256794 | 0.805765 | 0.328463 | 0.0 | 0.0 |
| 0 | 43 | baseline | 0 | 0 | 9424.0 | 4428.0 | 9282.0 | 5327.0 | 338.0 | 114.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.634543 | 0.826632 | 0.33023 | 0.0 | 0.0 |
| 0 | 43 | medium | 2 | 0 | 8473.0 | 3484.0 | 8419.0 | 4917.0 | 307.0 | 107.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.233403 | 0.804375 | 0.328525 | 0.0 | 0.0 |
| 0 | 43 | medium | 4 | 0 | 8473.0 | 3484.0 | 8419.0 | 4917.0 | 307.0 | 107.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.233403 | 0.804375 | 0.328525 | 0.0 | 0.0 |
| 0 | 44 | baseline | 0 | 0 | 9322.0 | 4330.0 | 9233.0 | 5262.0 | 351.0 | 109.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.514877 | 0.825653 | 0.32994 | 0.0 | 0.0 |
| 0 | 44 | medium | 2 | 0 | 8446.0 | 3459.0 | 8376.0 | 4875.0 | 329.0 | 104.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.182454 | 0.80879 | 0.328658 | 0.0 | 0.0 |
| 0 | 44 | medium | 4 | 0 | 8446.0 | 3459.0 | 8376.0 | 4875.0 | 329.0 | 104.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.182454 | 0.80879 | 0.328658 | 0.0 | 0.0 |
| 0 | 45 | baseline | 0 | 0 | 9024.0 | 4027.0 | 8955.0 | 5236.0 | 331.0 | 93.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.395917 | 0.813721 | 0.329556 | 0.0 | 0.0 |
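A common next step is to compare each counterfactual with the baseline of the same seed. The pandas-free sketch below does this for cumulative infections, using values copied from the table above (the `delay` column holds the detection time). The helper names are hypothetical; with the real DataFrame, the same comparison can be done in pandas by matching rows on `seed`.

```python
# Rows copied from the table above: (seed, package, delay, cum_infections).
rows = [
    (42, "baseline", 0, 9284.0),
    (42, "medium", 2, 8702.0),
    (42, "medium", 4, 8702.0),
    (43, "baseline", 0, 9424.0),
    (43, "medium", 2, 8473.0),
    (43, "medium", 4, 8473.0),
    (44, "baseline", 0, 9322.0),
    (44, "medium", 2, 8446.0),
    (44, "medium", 4, 8446.0),
]


def infections_averted(rows):
    """Baseline minus counterfactual cumulative infections, matched by seed."""
    baseline = {seed: total for seed, package, _, total in rows if package == "baseline"}
    return {
        (seed, package, delay): baseline[seed] - total
        for seed, package, delay, total in rows
        if package != "baseline"
    }


def mean_over_seeds(averted):
    """Average the per-seed differences for each (package, delay) pair."""
    groups = {}
    for (_seed, package, delay), value in averted.items():
        groups.setdefault((package, delay), []).append(value)
    return {key: sum(values) / len(values) for key, values in groups.items()}


averted = infections_averted(rows)
print(averted[(42, "medium", 2)])                # 582.0
print(mean_over_seeds(averted)[("medium", 2)])   # 803.0
```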


If you only want the results for a specific seed, call `get_summaries_df()` on the corresponding per-seed object:

In [17]:
cf.sims[1].get_summaries_df()

|  | seed | intervention_package | delay | pathogen | cum_infections | cum_reinfections | cum_infectious | cum_symptomatic | cum_severe | cum_critical | ... | doubling_time | test_yield | rel_test_yield | frac_vaccinated | pop_imm | pop_nabs | pop_protection | pop_symp_protection | new_diagnoses_custom | cum_diagnoses_custom |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 43 | baseline | 0 | 0 | 9424.0 | 4428.0 | 9282.0 | 5327.0 | 338.0 | 114.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.634543 | 0.826632 | 0.33023 | 0.0 | 0.0 |
| 0 | 43 | medium | 2 | 0 | 8473.0 | 3484.0 | 8419.0 | 4917.0 | 307.0 | 107.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.233403 | 0.804375 | 0.328525 | 0.0 | 0.0 |
| 0 | 43 | medium | 4 | 0 | 8473.0 | 3484.0 | 8419.0 | 4917.0 | 307.0 | 107.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.233403 | 0.804375 | 0.328525 | 0.0 | 0.0 |


##### Scan detection times with counterfactual simulation

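This scans detection times across the detection range determined from each seed's baseline simulation, so each seed can start from a different detection time (see the logs below). The range is divided according to `n_steps`, which sets how many detection times are scanned.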
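One plausible reading of how `n_steps` divides the range is a grid of evenly spaced detection times from `lower` to `upper`, rounded to whole days. This is an assumption: seed 44's logged scan starts at 4, the lower end of its reported range `{'lower': 4.0, 'upper': 27}`, and seed 42's logged times (8, 16, 23) are roughly evenly spaced, which is consistent with this reading but does not confirm pathosim's actual spacing or rounding rule. `detection_time_grid` is a hypothetical helper.

```python
def detection_time_grid(lower, upper, n_steps):
    """Assumed behaviour: n_steps evenly spaced detection times from lower to
    upper (both inclusive), rounded to whole days with Python's round()
    (round-half-to-even); duplicates from narrow ranges are dropped."""
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    if n_steps == 1:
        return [round(lower)]
    step = (upper - lower) / (n_steps - 1)
    return sorted({round(lower + k * step) for k in range(n_steps)})


# Detection range reported for the seed-44 baseline: {'lower': 4.0, 'upper': 27}
print(detection_time_grid(4.0, 27, 3))  # [4, 16, 27]
```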

In [18]:
cf.scan_detection_range(intervention_package_keys="medium", n_steps = 3, verbose = True)

CPU ✓ (0.19<0.90), memory ✓ (0.67<0.90): starting process 0 after 1 tries
Running counterfactual (seed 42) for intervention package "medium" with detection time 8.
CPU ✓ (0.38<0.90), memory ✓ (0.67<0.90): starting process 1 after 1 tries
Running counterfactual (seed 43) for intervention package "medium" with detection time 5.
Running counterfactual (seed 42) for intervention package "medium" with detection time 16.
CPU ✓ (0.46<0.90), memory ✓ (0.68<0.90): starting process 2 after 1 tries
Running counterfactual (seed 44) for intervention package "medium" with detection time 4.
Running counterfactual (seed 43) for intervention package "medium" with detection time 15.
CPU ✓ (0.60<0.90), memory ✓ (0.69<0.90): starting process 3 after 1 tries
Running counterfactual (seed 45) for intervention package "medium" with detection time 4.
Running counterfactual (seed 42) for intervention package "medium" with detection time 23.
Running counterfactual (seed 44) for intervention package "medium" with

In [19]:
i = 0
cf.sims[i].sims_counterfactual

#0: 'medium': {2: Sim(<no label>; 2020-03-01 to 2020-06-09; pop: 5000
behaviour_module; epi: 8702⚙, 28☠), 4: Sim(<no label>; 2020-03-01 to 2020-06-09;
pop: 5000 behaviour_module; epi: 8702⚙, 28☠), 8: Sim(<no label>; 2020-03-01 to
2020-06-09; pop: 5000 behaviour_module; epi: 8702⚙, 28☠), 16: Sim(<no label>;
2020-03-01 to 2020-06-09; pop: 5000 behaviour_module; epi: 8702⚙, 28☠), 23:
Sim(<no label>; 2020-03-01 to 2020-06-09; pop: 5000 behaviour_module; epi:
8702⚙, 28☠)}
#1: 'strong': {}




In [20]:
cf.get_summaries_df()

|  | seed | intervention_package | delay | pathogen | cum_infections | cum_reinfections | cum_infectious | cum_symptomatic | cum_severe | cum_critical | ... | doubling_time | test_yield | rel_test_yield | frac_vaccinated | pop_imm | pop_nabs | pop_protection | pop_symp_protection | new_diagnoses_custom | cum_diagnoses_custom |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 42 | baseline | 0 | 0 | 9284.0 | 4289.0 | 9169.0 | 5309.0 | 328.0 | 107.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.397622 | 0.819455 | 0.329671 | 0.0 | 0.0 |
| 0 | 42 | medium | 2 | 0 | 8702.0 | 3715.0 | 8611.0 | 5033.0 | 321.0 | 97.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.256794 | 0.805765 | 0.328463 | 0.0 | 0.0 |
| 0 | 42 | medium | 4 | 0 | 8702.0 | 3715.0 | 8611.0 | 5033.0 | 321.0 | 97.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.256794 | 0.805765 | 0.328463 | 0.0 | 0.0 |
| 0 | 42 | medium | 8 | 0 | 8702.0 | 3715.0 | 8611.0 | 5033.0 | 321.0 | 97.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.256794 | 0.805765 | 0.328463 | 0.0 | 0.0 |
| 0 | 42 | medium | 16 | 0 | 8702.0 | 3715.0 | 8611.0 | 5033.0 | 321.0 | 97.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.256794 | 0.805765 | 0.328463 | 0.0 | 0.0 |
| 0 | 42 | medium | 23 | 0 | 8702.0 | 3715.0 | 8611.0 | 5033.0 | 321.0 | 97.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.256794 | 0.805765 | 0.328463 | 0.0 | 0.0 |
| 0 | 43 | baseline | 0 | 0 | 9424.0 | 4428.0 | 9282.0 | 5327.0 | 338.0 | 114.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.634543 | 0.826632 | 0.33023 | 0.0 | 0.0 |
| 0 | 43 | medium | 2 | 0 | 8473.0 | 3484.0 | 8419.0 | 4917.0 | 307.0 | 107.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.233403 | 0.804375 | 0.328525 | 0.0 | 0.0 |
| 0 | 43 | medium | 4 | 0 | 8473.0 | 3484.0 | 8419.0 | 4917.0 | 307.0 | 107.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.233403 | 0.804375 | 0.328525 | 0.0 | 0.0 |
| 0 | 43 | medium | 5 | 0 | 8473.0 | 3484.0 | 8419.0 | 4917.0 | 307.0 | 107.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.233403 | 0.804375 | 0.328525 | 0.0 | 0.0 |


You can also leave out `intervention_package_keys` (or set it to `None`), in which case the detection time scan will be run for all packages.
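The argument handling just described (omitted or `None` means all packages, a single key means only that package) can be sketched with a hypothetical helper. Whether pathosim also accepts a list of keys or rejects unknown keys is an assumption made here for illustration.

```python
def resolve_package_keys(keys, packages):
    """Hypothetical helper: normalise intervention_package_keys to a list.

    None      -> every package, in definition order
    "medium"  -> ["medium"]
    [k1, k2]  -> [k1, k2] (assumed to be accepted as well)
    """
    if keys is None:
        return list(packages)
    if isinstance(keys, str):
        keys = [keys]
    unknown = [key for key in keys if key not in packages]
    if unknown:
        raise KeyError(f"unknown intervention package(s): {unknown}")
    return list(keys)


packages = {
    "medium": ["home_office", "event_cancellation"],
    "strong": ["home_office", "event_and_business_closure", "school_closure"],
}
print(resolve_package_keys(None, packages))      # ['medium', 'strong']
print(resolve_package_keys("medium", packages))  # ['medium']
```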

In [21]:
cf.scan_detection_range(n_steps = 3, verbose = True)

CPU ✓ (0.31<0.90), memory ✓ (0.61<0.90): starting process 0 after 1 tries
Running counterfactual (seed 42) for intervention package "medium" with detection time 8.
Running counterfactual (seed 42) for intervention package "medium" with detection time 16.
CPU ✓ (0.49<0.90), memory ✓ (0.62<0.90): starting process 1 after 1 tries
Running counterfactual (seed 43) for intervention package "medium" with detection time 5.
CPU ✓ (0.52<0.90), memory ✓ (0.63<0.90): starting process 2 after 1 tries
Running counterfactual (seed 44) for intervention package "medium" with detection time 4.
Running counterfactual (seed 43) for intervention package "medium" with detection time 15.
Running counterfactual (seed 42) for intervention package "medium" with detection time 23.
CPU ✓ (0.61<0.90), memory ✓ (0.64<0.90): starting process 3 after 1 tries
Running counterfactual (seed 45) for intervention package "medium" with detection time 4.
Running counterfactual (seed 44) for intervention package "medium" with

In [22]:
i = 0
cf.sims[i].sims_counterfactual

#0: 'medium': {2: Sim(<no label>; 2020-03-01 to 2020-06-09; pop: 5000
behaviour_module; epi: 8702⚙, 28☠), 4: Sim(<no label>; 2020-03-01 to 2020-06-09;
pop: 5000 behaviour_module; epi: 8702⚙, 28☠), 8: Sim(<no label>; 2020-03-01 to
2020-06-09; pop: 5000 behaviour_module; epi: 8702⚙, 28☠), 16: Sim(<no label>;
2020-03-01 to 2020-06-09; pop: 5000 behaviour_module; epi: 8702⚙, 28☠), 23:
Sim(<no label>; 2020-03-01 to 2020-06-09; pop: 5000 behaviour_module; epi:
8702⚙, 28☠)}
#1: 'strong': {8: Sim(<no label>; 2020-03-01 to 2020-06-09; pop: 5000
behaviour_module; epi: 7908⚙, 32☠), 16: Sim(<no label>; 2020-03-01 to
2020-06-09; pop: 5000 behaviour_module; epi: 7908⚙, 32☠), 23: Sim(<no label>;
2020-03-01 to 2020-06-09; pop: 5000 behaviour_module; epi: 7908⚙, 32☠)}




In [23]:
cf.get_summaries_df()

|  | seed | intervention_package | delay | pathogen | cum_infections | cum_reinfections | cum_infectious | cum_symptomatic | cum_severe | cum_critical | ... | doubling_time | test_yield | rel_test_yield | frac_vaccinated | pop_imm | pop_nabs | pop_protection | pop_symp_protection | new_diagnoses_custom | cum_diagnoses_custom |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 42 | baseline | 0 | 0 | 9284.0 | 4289.0 | 9169.0 | 5309.0 | 328.0 | 107.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.397622 | 0.819455 | 0.329671 | 0.0 | 0.0 |
| 0 | 42 | medium | 2 | 0 | 8702.0 | 3715.0 | 8611.0 | 5033.0 | 321.0 | 97.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.256794 | 0.805765 | 0.328463 | 0.0 | 0.0 |
| 0 | 42 | medium | 4 | 0 | 8702.0 | 3715.0 | 8611.0 | 5033.0 | 321.0 | 97.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.256794 | 0.805765 | 0.328463 | 0.0 | 0.0 |
| 0 | 42 | medium | 8 | 0 | 8702.0 | 3715.0 | 8611.0 | 5033.0 | 321.0 | 97.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.256794 | 0.805765 | 0.328463 | 0.0 | 0.0 |
| 0 | 42 | medium | 16 | 0 | 8702.0 | 3715.0 | 8611.0 | 5033.0 | 321.0 | 97.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.256794 | 0.805765 | 0.328463 | 0.0 | 0.0 |
| 0 | 42 | medium | 23 | 0 | 8702.0 | 3715.0 | 8611.0 | 5033.0 | 321.0 | 97.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.256794 | 0.805765 | 0.328463 | 0.0 | 0.0 |
| 0 | 42 | strong | 8 | 0 | 7908.0 | 2922.0 | 7881.0 | 4675.0 | 314.0 | 95.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.880356 | 0.78689 | 0.327339 | 0.0 | 0.0 |
| 0 | 42 | strong | 16 | 0 | 7908.0 | 2922.0 | 7881.0 | 4675.0 | 314.0 | 95.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.880356 | 0.78689 | 0.327339 | 0.0 | 0.0 |
| 0 | 42 | strong | 23 | 0 | 7908.0 | 2922.0 | 7881.0 | 4675.0 | 314.0 | 95.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.880356 | 0.78689 | 0.327339 | 0.0 | 0.0 |
| 0 | 43 | baseline | 0 | 0 | 9424.0 | 4428.0 | 9282.0 | 5327.0 | 338.0 | 114.0 | ... | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.634543 | 0.826632 | 0.33023 | 0.0 | 0.0 |


If you run many simulations, pass `store_sims=False`. Only the summaries are then stored, which reduces overall memory use.
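The memory trade-off behind `store_sims` can be illustrated with a hypothetical container (not pathosim's class) that uses the same two-level layout: the small summary is always kept, while the full simulation object is kept only when `store_sims=True`. The list of daily values stands in for a simulation and is made up.

```python
class CounterfactualStore:
    """Hypothetical container with the two-level layout
    (package -> detection time) for full runs and their summaries."""

    def __init__(self, store_sims=True):
        self.store_sims = store_sims
        self.sims_counterfactual = {}
        self.summaries_counterfactual = {}

    def add(self, package, detection_time, sim, summarize):
        # The small summary is always kept ...
        self.summaries_counterfactual.setdefault(package, {})[detection_time] = summarize(sim)
        # ... the (potentially large) simulation object only on request.
        if self.store_sims:
            self.sims_counterfactual.setdefault(package, {})[detection_time] = sim


def final_totals(sim):
    """Summarise a run by its last daily value (stand-in for a summary row)."""
    return {"cum_infections": sim[-1]}


store = CounterfactualStore(store_sims=False)
store.add("medium", 2, [0, 5, 40, 300, 1200], final_totals)
print(store.sims_counterfactual)       # {}
print(store.summaries_counterfactual)  # {'medium': {2: {'cum_infections': 1200}}}
```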

In [None]:
cf.scan_detection_range(n_steps = 10, store_sims = False, verbose = True)

In [None]:
cf.get_summaries_df()