
In epidemiology, the basic reproduction number of an infection (sometimes called the basic reproductive ratio or, incorrectly, the basic reproductive rate; denoted R0 and pronounced "R nought" or "R zero") is the expected number of cases directly generated by one case in a population where all individuals are susceptible to infection. The definition assumes that no other individuals are infected or immunized, whether naturally or through vaccination. Some definitions, such as that of the Australian Department of Health, also assume the absence of "any deliberate intervention in disease transmission". The basic reproduction number should not be confused with the effective reproduction number R, which is the number of cases generated in the current state of a population, and that state need not be the uninfected one. By definition, R0 cannot be modified through vaccination campaigns. R0 is also a dimensionless number: it is not a rate, which would have units of inverse time, nor a time such as the doubling time.
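
The distinction between R0 and the effective reproduction number R can be made concrete with a small sketch. It assumes the common homogeneous-mixing simplification, in which R is approximately R0 multiplied by the fraction of the population that is still susceptible. That formula and the function name `effective_reproduction_number` are illustrative assumptions, not something the text above states.

```python
def effective_reproduction_number(r0: float, susceptible_fraction: float) -> float:
    """Approximate R as R0 scaled by the susceptible share of the population.

    Assumes homogeneous mixing (an assumption of this sketch); real populations
    can deviate from it.
    """
    if not 0.0 <= susceptible_fraction <= 1.0:
        raise ValueError('susceptible_fraction must be between 0 and 1')
    return r0 * susceptible_fraction


# In a fully susceptible population, R equals R0
print(effective_reproduction_number(5.5, 1.0))

# With half the population immune, an infection with R0 = 2 has R = 1
print(effective_reproduction_number(2.0, 0.5))
```

Under this simplification, vaccination lowers R by shrinking the susceptible fraction while R0 itself stays fixed.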


# Set up the notebook

In [1]:

%pprint

Pretty printing has been turned OFF


In [2]:

import sys

# Insert at 1, 0 is the script path (or '' in REPL)
sys.path.insert(1, '../py')

import numpy as np
import pandas as pd
import re
from stats_scraping_utils import StatsScrapingUtilities
from storage import Storage

s = Storage()
ssu = StatsScrapingUtilities(s=s)


## Build the Basic Reproduction Number Dataset

In [3]:

ev_explanation_str = 'Basic Reproduction Number'
url = 'https://en.wikipedia.org/wiki/Basic_reproduction_number'
print(f'The {ev_explanation_str} data is from {url}.')
tables_list = ssu.get_page_tables(url)
R0s_df = tables_list[0].copy()
print(R0s_df.columns.tolist())
display(R0s_df.sample(7).T)

The Basic Reproduction Number data is from https://en.wikipedia.org/wiki/Basic_reproduction_number.
[(0, (23, 4)), (1, (15, 2)), (3, (5, 2)), (2, (3, 2)), (4, (2, 2))]
['Disease', 'Transmission', 'R0', 'HIT[a]']


| | 8 | 2 | 6 | 22 | 1 | 13 | 20 |
|---|---|---|---|---|---|---|---|
| Disease | Smallpox | Mumps | Pertussis | MERS | Chickenpox (varicella) | Diphtheria | Andes hantavirus |
| Transmission | Respiratory droplets | Respiratory droplets | Respiratory droplets | Respiratory droplets | Aerosol | Saliva | Respiratory droplets and body fluids |
| R0 | 3.5–6.0[39] | 10–12[31] | 5.5[37] | 0.5 (0.3–0.8)[53] | 10–12[30] | 2.6 (1.7–4.3)[45] | 1.2 (0.8–1.6)[51] |
| HIT[a] | 71–83% | 90–92% | 82% | 0%[c] | 90–92% | 62% (41–77%) | 16% (0–36%)[c] |
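
The HIT column in the table above (herd immunity threshold) appears to follow the standard relation HIT = 1 − 1/R0, floored at 0 when R0 is below 1. The sketch below checks that relation against a few sampled rows. The function name `herd_immunity_threshold` is hypothetical, and the formula is the textbook one rather than anything the notebook computes.

```python
def herd_immunity_threshold(r0: float) -> float:
    """Return 1 - 1/R0, floored at 0 for infections with R0 <= 1."""
    if r0 <= 0:
        raise ValueError('r0 must be positive')
    return max(0.0, 1.0 - 1.0 / r0)


# R0 values copied from the sampled rows above
samples = [
    ('Pertussis', 5.5),
    ('Smallpox (low)', 3.5),
    ('Smallpox (high)', 6.0),
    ('MERS', 0.5),
]
for name, r0 in samples:
    print(f'{name}: R0 = {r0}, HIT = {herd_immunity_threshold(r0):.0%}')
```

Rows whose published HIT was derived from unrounded R0 estimates may differ from this calculation by a percentage point or two.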


In [4]:

# Cast and split columns
R0s_df.columns = ['disease_name', 'transmitted_by', 'R0', 'HIT']
dash_regex = re.compile('[–-]')

# Flag the cells that hold a range (e.g. '12–18') rather than a single value
mask_series = R0s_df.R0.map(lambda x: bool(dash_regex.search(str(x))))
for i, cn in enumerate(['R0_low', 'R0_high']):
    R0s_df[cn] = np.nan

    # Single values: drop the footnote markers and use the value for both bounds
    R0s_df.loc[~mask_series, cn] = R0s_df[~mask_series].R0.map(lambda x: str(x).strip().split('[')[0])

    # Ranges: take the last number left of the dash (low) or the first number right of it (high)
    R0s_df.loc[mask_series, cn] = R0s_df[mask_series].R0.map(lambda x: re.split(r'[^0-9\.]+', dash_regex.split(str(x))[i])[i-1])
    R0s_df[cn] = pd.to_numeric(R0s_df[cn], errors='coerce', downcast='float')
display(R0s_df.sample(7).T)

| | 0 | 16 | 21 | 2 | 18 | 9 | 4 |
|---|---|---|---|---|---|---|---|
| disease_name | Measles | Influenza (1918 pandemic strain) | Nipah virus | Mumps | Influenza (2009 pandemic strain) | COVID-19 (Alpha variant) | Rubella |
| transmitted_by | Aerosol | Respiratory droplets | Body fluids | Respiratory droplets | Respiratory droplets | Respiratory droplets and aerosol | Respiratory droplets |
| R0 | 12–18[29][7] | 2[48] | 0.5[52] | 10–12[31] | 1.6 (1.3–2.0)[2] | 4–5[40][medical citation needed] | 6–7[b] |
| HIT | 92–94% | 50% | 0%[c] | 90–92% | 37% (25–51%) | 75–80% | 83–86% |
| R0_low | 12.0 | 2.0 | 0.5 | 10.0 | 1.3 | 4.0 | 6.0 |
| R0_high | 18.0 | 2.0 | 0.5 | 12.0 | 2.0 | 5.0 | 7.0 |
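
The splitting logic in the cell above can be hard to follow inside nested lambdas. As a reading aid, here is a standalone sketch of the same idea using only the standard library. The helper name `parse_r0_bounds` and its two regexes are hypothetical and are not part of `StatsScrapingUtilities`. It strips the footnotes first, which is a slightly different route than the cell takes to the same low and high values.

```python
import re

DASH_REGEX = re.compile('[–-]')  # en dash or hyphen, as in the cell above
NUMBER_REGEX = re.compile(r'\d+(?:\.\d+)?')


def parse_r0_bounds(cell: str) -> tuple[float, float]:
    """Return (low, high) from an R0 cell such as '12–18[29][7]' or '0.5 (0.3–0.8)[53]'."""
    # Drop bracketed footnotes like [53] or [medical citation needed]
    text = re.sub(r'\[[^\]]*\]', '', cell).strip()
    if DASH_REGEX.search(text):
        left, right = DASH_REGEX.split(text, maxsplit=1)
        # Last number before the dash is the low bound, first number after it is the high bound
        low = NUMBER_REGEX.findall(left)[-1]
        high = NUMBER_REGEX.findall(right)[0]
        return float(low), float(high)
    # No range given: use the single value for both bounds
    value = float(NUMBER_REGEX.findall(text)[0])
    return value, value


for cell in ['12–18[29][7]', '2[48]', '1.6 (1.3–2.0)[2]', '4–5[40][medical citation needed]']:
    print(cell, '->', parse_r0_bounds(cell))
```

When a cell holds a point estimate followed by a parenthesized range, both the cell above and this sketch keep the range and discard the point estimate.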


In [5]:

# Map each disease name to its short, normalized form (names missing from the dictionary are kept as-is)
R0s_df['short_disease_name'] = R0s_df.disease_name.map(lambda x: ssu.disease_name_dict.get(x, x))

In [6]:

# Assume these flu strains have the same R0 as seasonal flu and add them as new rows
mask_series = (R0s_df.short_disease_name == 'Seasonal Flu')
flu_df = R0s_df[mask_series].copy()  # copy so the renames below never touch R0s_df

# DataFrame.append was removed in pandas 2.0, so use pd.concat instead
flu_df.short_disease_name = '1968 Flu'
flu_df.disease_name = 'Hong Kong (1968–69) flu'
R0s_df = pd.concat([R0s_df, flu_df])
flu_df.short_disease_name = '1956 Flu'
flu_df.disease_name = 'Asian (1956–58) flu'
R0s_df = pd.concat([R0s_df, flu_df])
flu_df.short_disease_name = 'H5N1 Flu'
flu_df.disease_name = 'Influenza A virus subtype H5N1'
R0s_df = pd.concat([R0s_df, flu_df])

In [7]:

# https://watermark.silverchair.com/taac037.pdf
# The Omicron variant has an average basic reproduction number of 9.5
# and a range from 5.5 to 24
# (median 10 and interquartile range, IQR: 7.25, 11.88).
# Use the IQR as the low and high bounds.
mask_series = (R0s_df.disease_name == 'COVID-19 (Omicron variant)')
R0s_df.loc[mask_series, 'R0_low'] = 7.25
R0s_df.loc[mask_series, 'R0_high'] = 11.88
R0s_df[mask_series].to_dict('records')[0]

{'disease_name': 'COVID-19 (Omicron variant)', 'transmitted_by': 'Respiratory droplets and aerosol', 'R0': '9.5[32]', 'HIT': '89%', 'R0_low': 7.25, 'R0_high': 11.880000114440918, 'short_disease_name': 'COVID-19 Omicron'}
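
The value `11.880000114440918` in the record above is most likely a 32-bit float artifact: `pd.to_numeric(..., downcast='float')` in the earlier cell leaves the column as float32, and 11.88 has no exact float32 representation. The sketch below reproduces the effect with the standard library only. The helper name `as_float32` is hypothetical.

```python
import struct


def as_float32(value: float) -> float:
    """Round a Python float to the nearest 32-bit float, then widen it back to 64 bits."""
    return struct.unpack('f', struct.pack('f', value))[0]


stored = as_float32(11.88)
print(stored)                     # the widened float32 value, slightly above 11.88
print(stored == 11.88)            # exact comparison with the literal fails
print(round(stored, 2) == 11.88)  # comparing after rounding succeeds
print(as_float32(7.25) == 7.25)   # 7.25 is exactly representable, so it survives unchanged
```

If exact comparisons against these bounds matter downstream, one option is to drop `downcast='float'` or to round before comparing.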

In [8]:

# https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2889930/
# Our main findings are that the seroincidence of infections as estimated with the first method
# in all countries lies between 1% and 6% per annum with a peak in the adolescent age groups
# and a second lower peak in young adults. The incidence of infections as estimated by the second method
# lies slightly lower with ranges between 1% and 4% per annum.
# There is a remarkably good agreement of the results obtained with the two methods.
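# The basic reproduction numbers are similar across countries at around 5.5.
# NOTE: the bounds of 1 and 6 set below mirror the quoted seroincidence range (percent per annum);
# the quoted passage itself gives only a point estimate of about 5.5 for R0.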
mask_series = (R0s_df.disease_name == 'Pertussis')
R0s_df.loc[mask_series, 'R0_low'] = 1
R0s_df.loc[mask_series, 'R0_high'] = 6
R0s_df[mask_series].to_dict('records')[0]

{'disease_name': 'Pertussis', 'transmitted_by': 'Respiratory droplets', 'R0': '5.5[37]', 'HIT': '82%', 'R0_low': 1.0, 'R0_high': 6.0, 'short_disease_name': 'Pertussis'}

In [9]:

# https://www.npr.org/sections/goatsandsoda/2021/08/11/1026190062/covid-delta-variant-transmission-cdc-chickenpox
# For the delta variant, the R0 is now calculated at between six and seven
mask_series = (R0s_df.disease_name == 'COVID-19 (Delta variant)')
R0s_df.loc[mask_series, 'R0_low'] = 6
R0s_df.loc[mask_series, 'R0_high'] = 7
R0s_df[mask_series].to_dict('records')[0]

{'disease_name': 'COVID-19 (Delta variant)', 'transmitted_by': 'Respiratory droplets and aerosol', 'R0': '5.1[38]', 'HIT': '80%', 'R0_low': 6.0, 'R0_high': 7.0, 'short_disease_name': 'COVID-19 Delta'}

In [10]:

# https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7095078/
# We estimate that R for 1918 pandemic influenza was approximately 2–3 (Fig. 2).
# The median estimated R for 45 cities was 2 (interquartile range 1.7, 2.3),
# based on the first three weeks of each epidemic curve with
# greater than one excess P&I death per 100,000 population. 
mask_series = (R0s_df.disease_name == 'Influenza (1918 pandemic strain)')
R0s_df.loc[mask_series, 'R0_low'] = 2
R0s_df.loc[mask_series, 'R0_high'] = 3
R0s_df[mask_series].to_dict('records')[0]

{'disease_name': 'Influenza (1918 pandemic strain)', 'transmitted_by': 'Respiratory droplets', 'R0': '2[48]', 'HIT': '50%', 'R0_low': 2.0, 'R0_high': 3.0, 'short_disease_name': '1918 Flu'}

In [11]:

s.store_objects(R0s_df=R0s_df)

Pickling to C:\Users\daveb\OneDrive\Documents\GitHub\covid19\saves\pkl\R0s_df.pkl
