Data and code for "The global landscape of smallpox vaccination history and implications for current and future orthopoxvirus susceptibility: a modelling study."
This repository provides the data and source code for the following study: Juliana C. Taube, Eva C. Rest, James O. Lloyd-Smith, Shweta Bansal. "The global landscape of smallpox vaccination history and implications for current and future orthopoxvirus susceptibility: a modelling study." The Lancet Infectious Diseases. https://doi.org/10.1016/S1473-3099(22)00664-8
To rerun the analysis and reproduce the figures, start by opening `mpx_landscape.Rproj`. You will need to download GADM shapefile data version 4.0 to reproduce the maps and US Census data to reproduce the fine-scale U.S. estimates. Then run the `make_figures.R` script to reproduce the main figures in the text and many of the supplementary figures. To reproduce Figure 3, run the `global_susceptibility_profiles.R` script. Brief descriptions of the other scripts can be found below.
We hope for this to be a living database, with updated historical and future data on vaccination, and invite the global health community to contribute. (We'll add details on how you can contribute soon. In the meantime, please reach out at shweta.bansal@georgetown.edu with any data you would like to share).
`latest_cessation_coverage_estimates.csv` contains smallpox vaccination coverage and cessation estimates, lower and upper bounds, and sources for values used in our analyses. We anticipate this resource will change as more historical data become available. The grading scheme for vaccination cessation and coverage data is as follows:

- A. Good evidence (e.g., known scar/serum survey; known cessation date from literature)
- B. Some evidence (e.g., vaccination coverage estimate mentioned in text or model without known scar/serum survey data; cessation date range)
- C. No direct evidence, use country-specific assumptions (e.g., average of coverage estimates from neighboring countries; only one known bound (high or low) for cessation date)
- D. Default values
Pre-run analyses and estimates that can be used by others without re-running the code.
- `uncertainty-differences.csv` contains mean percentage vaccinated estimates and standard deviations from our parametric bootstrapping uncertainty analysis
- `vaxxed-us-pumas.csv` contains percentage vaccinated estimates at the PUMAs level in the U.S.
- `vaxxed-world-by-country.csv` contains percentage vaccinated estimates at the country level
- `vaxxed-world.csv` contains percentage vaccinated estimates at the admin-1 level
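
The means and standard deviations above come from a parametric bootstrapping analysis. As a rough illustration of the general idea only, the sketch below repeatedly draws an uncertain coverage and cessation year, recomputes the percentage vaccinated, and summarizes the draws. The function name, the uniform sampling distributions, and the toy age distribution are all assumptions made for this example; the distributions used in `uncertainty_analysis.r` may differ.

```r
# Hypothetical sketch: propagate input uncertainty by repeated random draws.
# Assumes (for illustration) uniform draws between the lower and upper bounds.
bootstrap_percent_vaccinated <- function(n_draws, coverage_bounds, cessation_bounds,
                                         ages, age_props, current_year) {
  n_years <- cessation_bounds[2] - cessation_bounds[1] + 1
  estimates <- replicate(n_draws, {
    # Draw one coverage value and one cessation year within their bounds
    coverage <- runif(1, coverage_bounds[1], coverage_bounds[2])
    cessation <- cessation_bounds[1] + sample.int(n_years, 1) - 1
    # An age group counts as eligible if born on or before the cessation year
    eligible <- (current_year - ages) <= cessation
    100 * sum(age_props * eligible * coverage)
  })
  c(mean = mean(estimates), sd = sd(estimates))
}

set.seed(1)
bootstrap_percent_vaccinated(
  n_draws = 5000,
  coverage_bounds = c(0.7, 0.9),     # assumed lower and upper coverage
  cessation_bounds = c(1975, 1985),  # assumed cessation year range
  ages = c(20, 40, 60, 80),          # toy age groups
  age_props = c(0.4, 0.3, 0.2, 0.1), # toy age distribution, sums to 1
  current_year = 2022
)
```

The returned vector mirrors the mean and standard deviation layout described for the pre-run estimates, but the numbers it produces are not study results.
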
Inputs for data analysis can be found in this folder, and outputs from running the code will populate here. Some intermediate data inputs called in the code are provided; others can be reproduced.
- `age_dist_absolute_differences_stdevs.csv` contains country-specific standard deviations for adding age distribution uncertainty based on a comparison of 2010 GPW data and 2020 UN age distribution data
- `bootstrapped_estimates_country_age_5000_mu_sig.csv` contains results of the parametric bootstrapping analysis with mean and standard deviation of vaccination percentage for each age group in each country
- `cessation_coverage_estimates.csv`, the version of cessation and coverage estimates that can be used to reproduce results in our paper, contains upper and lower bounds of vaccination coverage and cessation estimates for uncertainty analysis. The latest version of these estimates is provided in `latest_cessation_coverage_estimates.csv`
- `cleaned_gadm_data_no_shapefile.csv`, GADM data without shapefiles, precursor to mapping
- `cleaned_gpw_age_data_props.csv`, cleaned GPW admin-1 level age distribution data
- `gpw_to_gadm_country_join.csv` and `one_gpw_to_multiple_gadm.csv` allow for correct joining of GPW and GADM data
- `national_age_dist.csv` and `world_age_dist.csv` contain age distribution data if all admin-1s within a country or across the world adopt the national or global average, respectively
- `natural_immunity.csv` contains case count data from Fenner et al.
- `pock_survey_coverage.csv` contains pock mark survey data
- `polio95_3dose_states.csv` contains coverage estimates for 3 dose polio vaccination in the U.S. at the state level, used to add spatial heterogeneity to national smallpox vaccination coverage in the U.S.
- `state_fips.csv` helps convert PUMAs level estimates in the U.S. to the state level
- `WPP2022_POP_F02_1_POPULATION_5-YEAR_AGE_GROUPS_BOTH_SEXES copy.csv` is a converted version of UN age distribution data in 5-year age groups with both sexes from: https://population.un.org/wpp/Download/Standard/Population/
The files below are quite large and will be added in the future. To access them immediately, you can download them from their respective websites.
- `extracted_pums_2019data_age_birthplace_weights_region.csv` from https://www.census.gov/programs-surveys/acs/microdata/access/2019.html contains PUMS 2019 data used in U.S. analyses
- `gadm404-levels.gpkg` from https://gadm.org/data.html contains the GADM shape files data version 4.0
- `gpw-v4-admin-unit-center-points-population-estimates-rev11_global_csv 2/` from https://sedac.ciesin.columbia.edu/data/set/gpw-v4-basic-demographic-characteristics-rev11 contains GPW data
Scripts to prepare demography data, join mapping data, calculate the proportion of a population vaccinated, calculate world and national age distributions, and calculate the landscape of orthopoxvirus susceptibility can be found in this folder.
- `admin1_avg_age.r` calculates the average age for each admin-1, maps it, and calculates the global average age
- `age_dist_comparison.r` compares aggregated GPW 2010 age distribution data with 2020 UN country level age distribution data
- `calc_immunity_split.r` contains the functions to calculate the percent vaccinated at the admin-1 level across all age groups or at the country level for each age group
- `calc_immunity_us.r` contains the functions to calculate the percent vaccinated at the PUMAs level in the U.S.
- `draw_maps.r` contains the functions to map global admin-1 level data
- `global_susceptibility_profile.r` calculates and plots susceptibility profiles for each country, and produces Figure 3 with four case studies with various vaccine effectiveness and waning rates
- `gpw_admin2.r` cleans admin-2 level and admin-3 level GPW data for some spatial analyses
- `load_files_for_run.r` loads all files and functions to make figures
- `make_figures.r` runs the analysis and creates most figures in the paper
- `monkeypox_world_data_cleaning.r` cleans the demographic data (GPW) and shapefiles (GADM) and prepares them for joining; its output is used to make figures
- `natural_immunity.r` analyzes and plots natural immunity and pock mark survey data
- `scar_survey_coverage_calcs.r` contains functions to calculate the proportion of an age group eligible and subsequently vaccinated based on cessation dates and vaccination coverage data
- `spatial_analysis` calculates Moran's I and tests associations between population size and density and current vaccination history
- `uncertainty_analysis.r` runs the parametric bootstrapping uncertainty analysis (note this code takes longer to run)
- `us_gpw_data.r` cleans and aggregates GPW data on U.S. age distributions not provided in the main GPW dataset
- `world_national_age_dists.r` calculates average global and national age distributions for use in counterfactual analyses
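
To make the core calculation concrete, here is a minimal sketch of how a cessation date and a coverage value can be combined with an age distribution to give a percentage vaccinated. It is not the code in `scar_survey_coverage_calcs.r` or `calc_immunity_split.r`: the function names are hypothetical, and it assumes that everyone born on or before the cessation year was eligible and that one fixed coverage applied to all eligible cohorts.

```r
# Hypothetical helper: fraction of each age group vaccinated.
# Assumption: eligibility means birth year <= cessation year, and the same
# coverage applies to every eligible age group.
cohort_vaccinated <- function(ages, current_year, cessation_year, coverage) {
  birth_year <- current_year - ages
  eligible <- birth_year <= cessation_year
  ifelse(eligible, coverage, 0)
}

# Population-level percentage vaccinated: weight each age group's vaccinated
# fraction by its share of the population.
percent_vaccinated <- function(ages, age_props, current_year,
                               cessation_year, coverage) {
  stopifnot(length(ages) == length(age_props),
            abs(sum(age_props) - 1) < 1e-8)
  frac <- cohort_vaccinated(ages, current_year, cessation_year, coverage)
  100 * sum(age_props * frac)
}

# Toy inputs (not study data): four age groups and their population shares
ages <- c(20, 40, 60, 80)
age_props <- c(0.4, 0.3, 0.2, 0.1)
percent_vaccinated(ages, age_props, current_year = 2022,
                   cessation_year = 1980, coverage = 0.8)
```

With these toy inputs only the 60 and 80 year old groups were born before cessation, so the result is 0.8 times their combined 30% share of the population.
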
When the code is run, this folder will contain the figure outputs shown in the manuscript.