RaphaelRe/BayesModelCOVID

A Bayesian hierarchical approach to account for reporting uncertainty, variants of concern and vaccination coverage when estimating the effects of non-pharmaceutical interventions on the spread of infectious disease

Online material

This repo contains 4 directories:

  • application
  • data
  • model
  • simulation_study

Results from the paper can be reproduced using reproduce_results.sh. Please note that this requires a long runtime. The script assumes that all required R packages are installed and, for Python, that a virtual environment named virtualenv with all dependencies installed exists. Results from the sampled Markov chain can be found at https://doi.org/10.6084/m9.figshare.24183246.v1.

application

Contains the files to fit the model to the real data:

  • fit_model_europe.py: Script to run the model on 20 European countries, i.e. main results of the application
  • sensitivity_analysis.py: Script to run the sensitivity analysis (8 settings)
  • prepare_data.R: Script to build the main data frame for the application. Downloads data, defines NPIs and brings everything together
  • calculate_weighted_ifr.R: Script which was used to calculate the IFR (also for the sensitivity analysis)
  • estimate_new_mutants.R: Script which was used to estimate the prevalence of the variants of concern
  • plot_results.R: Includes all code to plot the main results from the manuscript and supporting information. It also contains a part to plot the prior of the first infections at t=1
  • plot_sensitivity.R: Script to plot the results of the sensitivity analysis
  • plot_time_series.R: Script to plot case, death and hospital data along with virus variants, vaccination and NPIs; also produces plots for the NPIs alone
  • subdirectories for storage

Please note that the data sources may have changed since the data were prepared. Download links may therefore no longer work, or the structure of the downloaded data may have changed, so the final data set can probably no longer be recreated from scratch.

data

Contains the data used in the application and simulation study:

  • All time-shifting distributions required for the model (gamma_generation_time.csv, Xi_C_incubation_period.csv, Xi_D_symptoms_to_death_weekdays.csv, XiH_all.csv, Xi_R_reporting_times_weekdays_estimated_lgl.csv)
  • The IFRs (subdirectory ifr) as generated by calculate_weighted_ifr.R for main results and sensitivity
  • Prevalence of variants of concern (variants_of_concern.csv) as calculated by estimate_new_mutants.R
  • Data for the application real_data.csv (generated by prepare_data.R)
  • JSON files containing suitable proposal standard deviations for the sampler
  • Subfolder simulated_data with a few datasets generated by simulate_data_dynamic.R and simulate_data_dynamic_diffusion.R

model

All Python modules of the custom MCMC sampler used in the application and simulation study:

  • MCMC.py: Main class for sampling
  • Parameter.py: Contains all parameter classes (standard parameters, parameter vectors, fixed parameters)
  • LatentVariables.py: Implements the latent variable class, i.e. the latent number of infections
  • Prediction.py: Class to make posterior predictions within the considered range (and possibly outside it)
  • update.py: A module containing schemes for the update, distribution ratios, the likelihood, etc
  • basics.py: A module which implements many basic and helper functions

simulation_study

Contains all files for the simulation study:

  • fit_model_simulation_study.py: Main file to fit the model to the simulated data
  • fit_model_simulation_study_diffusion.py: Main file to fit the model, which uses the aggregated form of the data, to the simulated data with stratification and diffusion
  • simulate_data_dynamic.R: Script to simulate the data for the simulation study
  • simulate_data_dynamic_diffusion.R: Script to simulate the data with stratification and diffusion
  • plot_results_standard.R: Script to plot the results for the standard scenario
  • plot_results_diffusion.R: Script to plot the results of the data with diffusion
  • Subdirectories for the storage

Run the model on data

To fit the models, run one of the following files:

  • fit_model_europe.py: Is used to get the main results of the paper
  • fit_model_simulation_study.py: Is used to get the results from the simulation study
  • sensitivity_analysis.py: Sensitivity analysis from the supporting information on real data
  • fit_model_simulation_study_diffusion.py: Fits the model (which assumes aggregated data) on stratified data

The calculation time and required resources vary depending on the length of the chains, the number of sampled chains and the number of cores used. It is recommended to run short chains first (e.g. 1000 iterations) to get an approximate sampling time. The code uses multiple cores, so please make sure to set an appropriate number. Despite the parallelization, the calculation time is rather high because the sampling is mainly done in Python. Due to high autocorrelations, the complexity of the model requires a sufficient number of iterations with thinning. Depending on the machine and the number of cores, expect a runtime of one to several days. Also note that the written results require some free disk space: ~3GB for the application and ~137GB for the full simulation study (both scripts).
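
The recommendation to time a short chain first can be turned into a rough estimate as in the sketch below. It is not part of the repository: the helper names (estimate_total_seconds, time_pilot, dummy_chain) are hypothetical, dummy_chain is only a stand-in for a short run of one of the fit scripts, and the estimate assumes that runtime grows linearly with the number of iterations, which ignores start-up costs and any changes in speed over the course of sampling.

```python
import time


def estimate_total_seconds(pilot_seconds, pilot_iterations, target_iterations):
    """Linearly extrapolate wall-clock time from a short pilot run.

    Assumption: every iteration costs roughly the same amount of time.
    """
    if pilot_iterations <= 0:
        raise ValueError("pilot_iterations must be positive")
    return pilot_seconds * target_iterations / pilot_iterations


def time_pilot(run_chain, pilot_iterations):
    """Return the seconds needed by run_chain(pilot_iterations)."""
    start = time.perf_counter()
    run_chain(pilot_iterations)
    return time.perf_counter() - start


def dummy_chain(n_iterations):
    """Hypothetical placeholder for a short sampling run."""
    total = 0.0
    for i in range(n_iterations):
        total += i ** 0.5
    return total


# Example: time 1000 pilot iterations and extrapolate to 100000 iterations.
pilot_seconds = time_pilot(dummy_chain, 1000)
estimate = estimate_total_seconds(pilot_seconds, 1000, 100_000)
print(f"Estimated runtime: {estimate / 3600:.4f} hours")
```

In practice, dummy_chain would be replaced by whatever call starts a short chain on the target machine with the intended number of cores, since the per-iteration cost depends on both.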