A Bayesian hierarchical approach to account for reporting uncertainty, variants of concern and vaccination coverage when estimating the effects of non-pharmaceutical interventions on the spread of infectious disease
Online material
This repo contains 4 directories:
application
data
model
simulation_study
Results from the paper can be reproduced using reproduce_results.sh. Please note that a long runtime is required. The script assumes that all required R packages are installed and, for Python, that a virtual environment virtualenv with all dependencies installed exists.
Results from the sampled Markov chain can be found at https://doi.org/10.6084/m9.figshare.24183246.v1.
The application directory contains the files to fit the model to the real data:
- fit_model_europe.py: Script to run the model on 20 European countries, i.e. the main results of the application
- sensitivity_analysis.py: Script to run the sensitivity analysis (8 settings)
- prepare_data.R: Script to build the main data frame for the application. Downloads data, defines NPIs and brings everything together
- calculate_weighted_ifr.R: Script which was used to calculate the IFR (also for the sensitivity analysis)
- estimate_new_mutants.R: Script which was used to estimate the prevalence of the variants of concern
- plot_results.R: Includes all code to plot the main results from the manuscript and supporting information. It also contains a part to plot the prior of the first infections at t=1
- plot_sensitivity.R: Script to plot the results of the sensitivity analysis
- plot_time_series.R: Script to plot case, death and hospital data along with virus variants, vaccination and NPIs; also plots for NPIs alone
- Subdirectories for storage
Please note that the data sources may have changed since the data were prepared. The download links may therefore no longer work, or the structure of the downloaded data may have changed, so recreating the final data set is probably no longer possible.
The data folder contains the data used. These are:
- All time-shifting distributions required for the model (gamma_generation_time.csv, Xi_C_incubation_period.csv, Xi_D_symptoms_to_death_weekdays.csv, XiH_all.csv, Xi_R_reporting_times_weekdays_estimated_lgl.csv)
- The IFRs (subdirectory ifr) as generated by calculate_weighted_ifr.R for main results and sensitivity
- Prevalence of variants of concern (variants_of_concern.csv) as calculated by estimate_new_mutants.R
- Data for the application real_data.csv (generated by prepare_data.R)
- JSON files which contain info about good proposal sds for the sampler
- Subfolder simulated_data with a few datasets generated by simulate_data_dynamic.R and simulate_data_dynamic_diffusion.R
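Before starting a long run, it can help to confirm that the input files listed above are in place. The following is a minimal sketch, not part of the repository: the helper `missing_inputs` and the list `EXPECTED_FILES` are hypothetical names, and the sketch assumes the CSV files sit directly inside the data directory and that the script is started from the repository root.

```python
from pathlib import Path

# File names taken from the list above; their location directly inside
# the data directory is an assumption.
EXPECTED_FILES = [
    "gamma_generation_time.csv",
    "Xi_C_incubation_period.csv",
    "Xi_D_symptoms_to_death_weekdays.csv",
    "XiH_all.csv",
    "Xi_R_reporting_times_weekdays_estimated_lgl.csv",
    "variants_of_concern.csv",
    "real_data.csv",
]


def missing_inputs(data_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in expected if not (data_dir / name).is_file()]


if __name__ == "__main__":
    # Assumes the current working directory is the repository root.
    missing = missing_inputs("data")
    if missing:
        print(f"missing input files: {missing}")
    else:
        print("all input files found")
```

The check only looks at file names, not at their contents or column structure.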
The model directory contains all Python modules of the custom MCMC sampler, which is used in the application and the simulation study:
- MCMC.py: Main class for sampling
- Parameter.py: Contains all parameter classes (standard parameters, parameter vectors, fixed parameters)
- LatentVariables.py: Implements the latent variable class, i.e. the latent number of infections
- Prediction.py: Class to make posterior predictions within and possibly outside the considered range
- update.py: A module containing schemes for the update, distribution ratios, the likelihood, etc.
- basics.py: A module which implements many basic and helper functions
The simulation_study directory contains all files for the simulation study:
- fit_model_simulation_study.py: Main file to fit the model to the simulated data
- fit_model_simulation_study_diffusion.py: Main file to fit the model, which uses the aggregated form of the data, to the simulated data with stratification and diffusion
- simulate_data_dynamic.R: Script to simulate the data for the simulation study
- simulate_data_dynamic_diffusion.R: Script to simulate the data with stratification and diffusion
- plot_results_standard.R: Script to plot the results for the standard scenario
- plot_results_diffusion.R: Script to plot the results of the data with diffusion
- Subdirectories for storage
To fit the models, run one of the following files:
- fit_model_europe.py: Used to obtain the main results of the paper
- fit_model_simulation_study.py: Used to obtain the results of the simulation study
- sensitivity_analysis.py: Sensitivity analysis from the supporting information on real data
- fit_model_simulation_study_diffusion.py: Fits the model (which assumes aggregated data) to stratified data
The computation time and required resources depend on the length of the chains, the number of sampled chains and the number of cores used. It is recommended to run short chains (e.g. 1000 iterations) first to get an approximate sampling time. The code uses multiple cores, so please make sure to set an appropriate number. Despite the parallelization, the computation time is rather high because the sampling is mainly done in Python. The complexity of the model requires a sufficient number of iterations with thinning due to high autocorrelations. Depending on the machine and the number of cores, expect a runtime of one to several days. Also note that the written results require free disk space: ~3GB for the application and ~137GB for the full simulation study (both scripts).
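The two recommendations above, timing a short pilot chain and thinning autocorrelated draws, can be illustrated with a small sketch. This is not the repository's sampler: `toy_chain` is a hypothetical stand-in (an AR(1) series), `estimate_runtime_seconds` and `lag1_autocorrelation` are illustrative helper names, and the full chain length of 100,000 iterations and the thinning interval of 10 are arbitrary example values. The extrapolation assumes the cost per iteration stays roughly constant.

```python
import random
import time


def estimate_runtime_seconds(run_chain, pilot_iterations, full_iterations):
    """Time a short pilot run and extrapolate linearly to the full length."""
    start = time.perf_counter()
    run_chain(pilot_iterations)
    elapsed = time.perf_counter() - start
    return elapsed / pilot_iterations * full_iterations


def lag1_autocorrelation(draws):
    """Sample lag-1 autocorrelation of a sequence of draws."""
    n = len(draws)
    mean = sum(draws) / n
    variance = sum((x - mean) ** 2 for x in draws)
    covariance = sum(
        (draws[i] - mean) * (draws[i + 1] - mean) for i in range(n - 1)
    )
    return covariance / variance


def toy_chain(n_iterations, phi=0.9, seed=1):
    """Hypothetical stand-in for a sampler: a strongly autocorrelated AR(1) series."""
    rng = random.Random(seed)
    draws, x = [], 0.0
    for _ in range(n_iterations):
        x = phi * x + rng.gauss(0.0, 1.0)
        draws.append(x)
    return draws


if __name__ == "__main__":
    # Pilot run of 1000 iterations, extrapolated to an example full length.
    projected = estimate_runtime_seconds(toy_chain, 1000, 100_000)
    print(f"projected runtime: {projected:.2f} seconds")

    # Keeping every 10th draw lowers the autocorrelation of the stored sample.
    draws = toy_chain(20_000)
    thinned = draws[::10]
    print(f"lag-1 autocorrelation, all draws: {lag1_autocorrelation(draws):.2f}")
    print(f"lag-1 autocorrelation, thinned:   {lag1_autocorrelation(thinned):.2f}")
```

For a real run, the stand-in would be replaced by a short run of one of the fitting scripts, and the projected time should be treated as a rough guide only.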