# Vaccine RCT Examples

Victor Chernozhukov. This notebook contains some RCT examples that I will be using for teaching.

  
# Polio RCT

Among the earliest randomized experiments were the Polio vaccination trials conducted by the Public Health Service in 1954. The question was whether the Salk vaccine prevented polio. Children in the study were randomly assigned to receive either a treatment (polio vaccine shot) or a placebo (saline solution shot), without knowing which one they received. The doctors in the study, who made the diagnoses, did not know whether a child had received the vaccine or not. In other words, the trial was a double-blind, randomized controlled trial. The trial had to be large, because the rate at which Polio occurred in the population was 50 per 100,000. The treatment group saw 33 polio cases among 200,745 children; the control group saw 115 cases among 201,229. The estimated average treatment effect, in cases per 100,000, is about
$$
-40
$$
and the 95% confidence band (based on approximate normality of the two sample means and their difference) is:
$$[-52, -28].$$
The confidence band suggests that the Polio vaccine **caused** the reduction in the risk of polio.

The interesting thing here is that we don't need the underlying individual data to evaluate the effectiveness of the vaccine. This is because the outcomes are Bernoulli random variables, and we have enough information to compute the estimate of the ATE as well as the confidence intervals.
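
To spell out why the group counts suffice, here is a sketch of the calculation. The notation is introduced here for illustration: $\hat p_1$ and $\hat p_0$ denote the observed polio rates in the treatment and control groups, and $n_1$ and $n_0$ the group sizes. A Bernoulli outcome with mean $p$ has variance $p(1-p)$, so, with the two groups independent,
$$
\widehat{\operatorname{ATE}} = \hat p_1 - \hat p_0, \qquad
\widehat{\operatorname{se}} = \sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_0(1-\hat p_0)}{n_0}},
$$
and the approximate 95% confidence band is $\widehat{\operatorname{ATE}} \pm 1.96\,\widehat{\operatorname{se}}$.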


We also compute the Vaccine Efficacy metric, which (I googled [CDC](https://www.cdc.gov/csels/dsepd/ss1978/lesson3/section6.html)) refers to the following measure:
$$
\operatorname{VE} = \frac{\text{Risk for Unvaccinated} - \text{Risk for Vaccinated}}{\text{Risk for Unvaccinated}}.
$$
It describes the relative reduction in risk caused by vaccination.
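
As a quick worked example, plugging the Polio trial counts quoted above into this definition gives (rounded):
$$
\operatorname{VE} = 1 - \frac{33/200745}{115/201229} \approx 1 - 0.288 \approx 0.71,
$$
that is, vaccination reduced the risk of polio by roughly 71% relative to the placebo group.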


It is straightforward to get the VE estimate by just plugging in the numbers, but how do we get an approximate variance estimate? I am too lazy to do the calculations for the delta method, so I will just use a simulation (a form of approximate bootstrap) to obtain the confidence intervals.
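
For comparison with the simulation, here is a minimal sketch of what the delta-method calculation could look like. It is not part of the original analysis: the function name `ve_delta_ci` and its arguments are hypothetical, and it assumes the two groups are independent and that a first-order (normal) approximation is adequate.

```python
import numpy as np

def ve_delta_ci(cases_v, n_v, cases_u, n_u, z=1.96):
    """Delta-method CI for VE = (RU - RV)/RU (illustrative helper, not from the notebook)."""
    rv = cases_v / n_v               # risk for vaccinated
    ru = cases_u / n_u               # risk for unvaccinated
    ve = (ru - rv) / ru
    var_rv = rv * (1 - rv) / n_v     # Bernoulli variance of each sample mean
    var_ru = ru * (1 - ru) / n_u
    # Gradient of VE = 1 - rv/ru is (-1/ru, rv/ru**2); the groups are independent,
    # so the variances add after weighting by the squared gradient entries.
    var_ve = var_rv / ru**2 + (rv**2) * var_ru / ru**4
    se_ve = np.sqrt(var_ve)
    return ve, ve - z * se_ve, ve + z * se_ve

# Polio trial counts quoted in the text above
ve, ci_low, ci_high = ve_delta_ci(33, 200745, 115, 201229)
print(f"VE = {ve:.3f}, delta-method 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

The interval should be close to the simulated one, since both rest on the same normal approximation for the two risk estimates.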



In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KernelDensity

num_treated = 200745
num_controls = 201229
outcome_treated = 33/num_treated
outcome_controls =115/num_controls

print(f"Incidence per 100000 among treated: {outcome_treated*100000}")
print(f"Incidence per 100000 among controls: {outcome_controls*100000}")

treatment_effect = 100000*(outcome_treated-outcome_controls)

print(f"Estimated TE in occurrences per 100,000: {treatment_effect}")

var_treatment_effect = (100000**2)*(outcome_treated*(1-outcome_treated)/num_treated 
                                    +  outcome_controls*(1-outcome_controls)/num_controls)
std_treatment_effect = np.sqrt(var_treatment_effect)
print(f"Standard error of the ATE estimate: {std_treatment_effect}")
# here we are using the fact that outcomes are Bernoulli 

CI_delta = [treatment_effect - 1.96*std_treatment_effect, 
            treatment_effect + 1.96*std_treatment_effect]

print(f"95% confidence interval is [{CI_delta[0]}, {CI_delta[1]}]")

# Here we calculate the overall effectiveness of the vaccine and construct confidence intervals for it

NV = 200745  # number vaccinated
NU = 201229  # number unvaccinated
RV = 33/NV   # risk for vaccinated
RU = 115/NU  # risk for unvaccinated
VE = (RU - RV)/RU
print("Overall VE is " + str(VE))

# this recovers the number in the table.

# we set up a simulation example.

# calculate variance of risk estimates:

Var_RV = RV*(1-RV)/NV
Var_RU = RU*(1-RU)/NU

# set up MC draws: perturb each risk estimate with normal noise scaled by its standard error

B = 10000
RVs = RV + np.random.normal(0, 1, B)*(Var_RV)**.5
RUs = RU + np.random.normal(0, 1, B)*(Var_RU)**.5
VEs = (RUs - RVs)/RUs


CI_VE_L = np.quantile(VEs, .025)
CI_VE_U = np.quantile(VEs, .975)

print(f"95% confidence interval is [{CI_VE_L}, {CI_VE_U}]")

X= VEs[:, np.newaxis]
X_plot = np.linspace(0, 1, 1000)[:, np.newaxis]
kde = KernelDensity(kernel='gaussian', bandwidth=0.02).fit(X)
log_dens = kde.score_samples(X_plot)
plt.fill_between(X_plot[:, 0], np.exp(log_dens))

# Pfizer/BNTX Covid-19 RCT

Here is a link to the FDA [briefing](https://www.fda.gov/media/144245/download) and an interesting [discussion](https://garycornell.com/2020/12/09/statistics-in-the-pfizer-data-how-good-is-the-vaccine/?fbclid=IwAR282lS0Vl3tWmicQDDhIJAQCMO8NIsCXyWbUWwTtPuKcnuJ2v0VWXRDQac), as well as data.

Pfizer/BNTX is the first vaccine approved for emergency use to reduce the risk of Covid-19 disease. Volunteers were randomly assigned to receive either a treatment (2-dose vaccination) or a placebo, without knowing which they received. The doctors making the diagnoses did not know whether a given volunteer had received a vaccination or not. The results of the study are given in the following table: ![](https://lh6.googleusercontent.com/oiO6gYom1UZyrOhgpFx2iq8ike979u3805JHiVygP-Efh1Yaz2ttyPcgWKlT1AqHDM4v46th3EPIkOvRLyXA0fNUloPL-mL9eOFmSAzfbNOHyCZSQ0DyzMhcFUtQuZ520R5Qd2lj)

Here we see both the overall effects and the effects by age group. The confidence intervals for the overall ATE are tight and suggest high effectiveness of the vaccine. The confidence intervals for the age group 65-75 are much wider. We could combine the 65-75 and >75 groups to evaluate the effectiveness of the vaccine for older participants and also narrow the confidence band.

In this case, the reported results are for vaccine effectiveness. We use the same approach as above.
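
Since the same steps are repeated for each group, they can be wrapped in one helper. The following is only a sketch: the function name `ve_simulation_ci`, its arguments, and the fixed seed are assumptions made here for illustration and are not part of the original notebook, which uses the unseeded `np.random` functions directly.

```python
import numpy as np

def ve_simulation_ci(cases_v, n_v, cases_u, n_u, n_draws=10000, seed=0):
    """Simulation-based 95% CI for VE (illustrative helper, not from the notebook)."""
    rng = np.random.default_rng(seed)        # seeded only to make the sketch reproducible
    rv = cases_v / n_v                       # risk for vaccinated
    ru = cases_u / n_u                       # risk for unvaccinated
    se_rv = np.sqrt(rv * (1 - rv) / n_v)     # standard errors from the Bernoulli variance
    se_ru = np.sqrt(ru * (1 - ru) / n_u)
    # Perturb each risk estimate with normal noise scaled by its standard error
    rv_draws = rv + rng.normal(0, 1, n_draws) * se_rv
    ru_draws = ru + rng.normal(0, 1, n_draws) * se_ru
    ve_draws = (ru_draws - rv_draws) / ru_draws
    lower, upper = np.quantile(ve_draws, [0.025, 0.975])
    return (ru - rv) / ru, lower, upper

# Overall Pfizer/BNTX counts used in this notebook
ve, lower, upper = ve_simulation_ci(9, 19965, 169, 20172)
print(f"VE = {ve:.3f}, simulated 95% CI = [{lower:.3f}, {upper:.3f}]")
```

Passing the group-specific counts instead of the overall ones gives the corresponding intervals by age group.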



In the code cell below we calculate the overall effectiveness of the vaccine and construct confidence intervals for it.

In [None]:

NV = 19965   # number vaccinated
NU = 20172   # number unvaccinated (placebo)
RV = 9/NV    # risk for vaccinated
RU = 169/NU  # risk for unvaccinated
VE = (RU - RV)/RU
print("Overall VE is " + str(VE))
Var_RV = RV*(1-RV)/NV
Var_RU = RU*(1-RU)/NU

# MC draws
B = 10000
RVs = RV + np.random.normal(0, 1, B)*(Var_RV)**.5
RUs = RU + np.random.normal(0, 1, B)*(Var_RU)**.5
VEs = (RUs - RVs)/RUs
CI_VE_L = np.quantile(VEs, .025)
CI_VE_U = np.quantile(VEs, .975)
print(f"95% confidence interval is [{CI_VE_L}, {CI_VE_U}]")

X= VEs[:, np.newaxis]
X_plot = np.linspace(0, 1, 1000)[:, np.newaxis]
kde = KernelDensity(kernel='gaussian', bandwidth=0.02).fit(X)
log_dens = kde.score_samples(X_plot)
plt.fill_between(X_plot[:, 0], np.exp(log_dens))

In the code cell below we calculate the effectiveness of the vaccine for the two groups that are 65 or older, pooled together.

In [None]:
NV = 3239+805    # vaccinated, the two oldest age groups combined
NU = 3255+812    # unvaccinated, the two oldest age groups combined
RV = 1/NV        # risk for vaccinated
RU = (14+5)/NU   # risk for unvaccinated
VE = (RU - RV)/RU
print("VE for ages 65 and older is " + str(VE))
Var_RV = RV*(1-RV)/NV
Var_RU = RU*(1-RU)/NU

# MC draws:
B = 10000
RVs = RV + np.random.normal(0, 1, B)*(Var_RV)**.5
RUs = RU + np.random.normal(0, 1, B)*(Var_RU)**.5
VEs = (RUs - RVs)/RUs

# two-sided interval
CI_VE_L = np.quantile(VEs, .025)
CI_VE_U = np.quantile(VEs, .975)
print(f"95% confidence interval is [{CI_VE_L}, {CI_VE_U}]")

# one-sided interval: 5% quantile as the lower bound, VE capped at 1 from above
CI_VE_L = np.quantile(VEs, .05)
print(f"One-sided 95% confidence interval is [{CI_VE_L}, 1]")




# instantiate and fit the KDE model
X= VEs[:, np.newaxis]
X_plot = np.linspace(0, 1, 1000)[:, np.newaxis]
kde = KernelDensity(kernel='gaussian', bandwidth=0.02).fit(X)
log_dens = kde.score_samples(X_plot)
plt.fill_between(X_plot[:, 0], np.exp(log_dens))

In [None]:
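NV = 3239+805    # vaccinated, the two oldest age groups combined
NU = 3255+812    # unvaccinated, the two oldest age groups combined
RV = 1/NV        # risk for vaccinated
RU = (14+5)/NU   # risk for unvaccinated
VE = (RU - RV)/RU

print("VE for ages 65 and older is " + str(VE))

B = 10000  # number of simulation draws
# Draw case counts from binomial distributions and divide by the group sizes to get risks.
# Unlike the normal draws, simulated risks cannot be negative, so VE never exceeds 1.
RVs = np.random.binomial(NV, RV, B)/NV
RUs = np.random.binomial(NU, RU, B)/NU
VEs = (RUs - RVs)/RUs

# two-sided interval
CI_VE_L = np.quantile(VEs, .025)
CI_VE_U = np.quantile(VEs, .975)
print(f"95% confidence interval is [{CI_VE_L}, {CI_VE_U}]")
# one-sided interval: 5% quantile as the lower bound, VE capped at 1 from above
CI_VE_L = np.quantile(VEs, .05)
print(f"One-sided 95% confidence interval is [{CI_VE_L}, 1]")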
X= VEs[:, np.newaxis]
X_plot = np.linspace(0, 1.1, 1000)[:, np.newaxis]
kde = KernelDensity(kernel='gaussian', bandwidth=0.02).fit(X)
log_dens = kde.score_samples(X_plot)
plt.fill_between(X_plot[:, 0], np.exp(log_dens))