In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np 

## Grid approximation

You have just been hired as a data analyst at your government's Department of Health. The cabinet is considering the purchase of a brand-new drug against a deadly and contagious virus. There are some doubts, however, regarding how effective the new drug is against the virus. You have been tasked with estimating the drug's efficacy rate, i.e. the percentage of patients cured by the drug.

An experiment was quickly set up in which 10 sick patients were treated with the drug. Once you know how many of them were cured, you can model the outcome with the binomial distribution, treating a cured patient as a "success" and the efficacy rate as the "probability of success". While you wait for the experiment's results, you decide to prepare the parameter grid.
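For reference, the binomial probability of exactly k cured patients out of n, given an efficacy rate p, is C(n, k) · p^k · (1 - p)^(n - k). The small sketch below uses a hypothetical helper `binomial_pmf` (not part of the original notebook) to check this formula against `scipy.stats.binom.pmf`, using the illustrative case of 9 cured patients out of 10 at an efficacy rate of 0.9:

```python
from math import comb

from scipy.stats import binom


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Hypothetical helper: probability of exactly k successes (cured patients)
    in n independent trials (patients), each succeeding with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_patients = 10
# Illustrative values only: 9 cured patients at an efficacy rate of 0.9
manual = binomial_pmf(9, n_patients, 0.9)
library = float(binom.pmf(9, n_patients, 0.9))
print(round(manual, 6), round(library, 6))  # → 0.38742 0.38742
```

Both values equal 10 · 0.9^9 · 0.1 ≈ 0.387420, so the grid below can rely on `binom.pmf` directly.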

In [4]:
# Create cured patients array from 0 to 10 (inclusive)
num_patients_cured = np.arange(0, 11)

# Create efficacy rate array from 0 to 1 in steps of 0.01 (101 values);
# linspace includes the endpoint exactly, avoiding floating-point drift in arange
efficacy_rate = np.linspace(0, 1, 101)

# Combine the two arrays into one DataFrame holding every (cured, rate) pair
df = pd.DataFrame(
    [(x, y) for x in num_patients_cured for y in efficacy_rate],
    columns=["num_patients_cured", "efficacy_rate"],
)
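An equivalent way to build the same grid without a Python-level loop is NumPy's `meshgrid`. This is a sketch of an alternative, not the notebook's own method; `grid`, `cured_grid` and `rate_grid` are illustrative names. With `indexing="ij"`, the number of cured patients varies slowest, matching the row order of the nested list comprehension:

```python
import numpy as np
import pandas as pd

# 0..10 cured patients and efficacy rates 0.00..1.00 in steps of 0.01
num_patients_cured = np.arange(0, 11)
efficacy_rate = np.linspace(0, 1, 101)

# indexing="ij" gives arrays of shape (11, 101); ravel() flattens them row by row,
# so each cured count is paired with all 101 efficacy rates in turn
cured_grid, rate_grid = np.meshgrid(num_patients_cured, efficacy_rate, indexing="ij")

grid = pd.DataFrame({
    "num_patients_cured": cured_grid.ravel(),
    "efficacy_rate": rate_grid.ravel(),
})

print(grid.shape)  # → (1111, 2)
```

The grid has 11 × 101 = 1111 rows, one per combination of cured count and efficacy rate.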

In [5]:
df

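|      | num_patients_cured | efficacy_rate |
|------|-------------------:|--------------:|
| 0    | 0                  | 0.00          |
| 1    | 0                  | 0.01          |
| 2    | 0                  | 0.02          |
| 3    | 0                  | 0.03          |
| 4    | 0                  | 0.04          |
| ...  | ...                | ...           |
| 1106 | 10                 | 0.96          |
| 1107 | 10                 | 0.97          |
| 1108 | 10                 | 0.98          |
| 1109 | 10                 | 0.99          |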


## Grid approximation without prior knowledge
- The experiment's results are in: of the 10 sick patients treated with the drug, 9 were cured. What can you say about the drug's efficacy rate based on such a small sample? Assume you have no prior knowledge whatsoever about how good the drug is.

In [8]:
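from scipy.stats import uniform, binom

# Uniform prior over the efficacy rate (density 1 everywhere on [0, 1]),
# encoding no prior knowledge about the drug
df["prior"] = uniform.pdf(df["efficacy_rate"])

# Binomial likelihood of each cured count out of 10 patients at each efficacy rate
df["likelihood"] = binom.pmf(df["num_patients_cured"], 10, df["efficacy_rate"])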
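With the prior and likelihood on the grid, one way to answer the question for the observed outcome is the usual grid-approximation recipe: keep the rows where 9 patients were cured, multiply prior by likelihood, and normalize so the values sum to 1. The following is a minimal sketch under that assumption; it rebuilds the grid so it runs on its own, and `observed` and `best` are illustrative names:

```python
import numpy as np
import pandas as pd
from scipy.stats import binom, uniform

# Rebuild the grid, prior and likelihood so this sketch is self-contained
cured, rate = np.meshgrid(np.arange(0, 11), np.linspace(0, 1, 101), indexing="ij")
df = pd.DataFrame({"num_patients_cured": cured.ravel(), "efficacy_rate": rate.ravel()})
df["prior"] = uniform.pdf(df["efficacy_rate"])
df["likelihood"] = binom.pmf(df["num_patients_cured"], 10, df["efficacy_rate"])

# Condition on the observed outcome: 9 of 10 patients cured
observed = df[df["num_patients_cured"] == 9].copy()

# Unnormalized posterior = prior * likelihood; dividing by the sum turns it into
# a discrete probability distribution over the 101 efficacy rates
observed["posterior"] = observed["prior"] * observed["likelihood"]
observed["posterior"] /= observed["posterior"].sum()

# Efficacy rate with the highest posterior probability on the grid
best = float(observed.loc[observed["posterior"].idxmax(), "efficacy_rate"])
print(round(best, 2))  # → 0.9
```

With a flat prior the posterior is proportional to the likelihood p^9 (1 - p), which peaks at p = 0.9; the spread of `observed["posterior"]` around that peak shows how uncertain a 10-patient sample leaves the estimate.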

In [9]:
df

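|      | num_patients_cured | efficacy_rate | prior | likelihood |
|------|-------------------:|--------------:|------:|-----------:|
| 0    | 0                  | 0.00          | 1.0   | 1.000000   |
| 1    | 0                  | 0.01          | 1.0   | 0.904382   |
| 2    | 0                  | 0.02          | 1.0   | 0.817073   |
| 3    | 0                  | 0.03          | 1.0   | 0.737424   |
| 4    | 0                  | 0.04          | 1.0   | 0.664833   |
| ...  | ...                | ...           | ...   | ...        |
| 1106 | 10                 | 0.96          | 1.0   | 0.664833   |
| 1107 | 10                 | 0.97          | 1.0   | 0.737424   |
| 1108 | 10                 | 0.98          | 1.0   | 0.817073   |
| 1109 | 10                 | 0.99          | 1.0   | 0.904382   |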


In [13]:
df["likelihood"].describe()

count    1111.000000
mean        0.090909
std         0.130410
min         0.000000
25%         0.000177
50%         0.022890
75%         0.164156
max         1.000000
Name: likelihood, dtype: float64
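The mean likelihood of 0.090909 is not a coincidence. For each fixed efficacy rate, the binomial probabilities over 0 to 10 cured patients form a complete distribution and sum to 1, so the 1111 likelihoods sum to 101 and their mean is 101 / 1111 = 1/11 ≈ 0.090909. The maximum of 1.0 comes from the corner cases where the outcome is certain (rate 0.00 with 0 cured, rate 1.00 with 10 cured). A short sketch that checks the first claim, rebuilding the grid so it runs on its own:

```python
import numpy as np
import pandas as pd
from scipy.stats import binom

cured, rate = np.meshgrid(np.arange(0, 11), np.linspace(0, 1, 101), indexing="ij")
df = pd.DataFrame({"num_patients_cured": cured.ravel(), "efficacy_rate": rate.ravel()})
df["likelihood"] = binom.pmf(df["num_patients_cured"], 10, df["efficacy_rate"])

# For every efficacy rate, the likelihoods over 0..10 cured patients
# form a full binomial distribution, so each group sums to 1
totals = df.groupby("efficacy_rate")["likelihood"].sum()
print(np.allclose(totals, 1.0))  # → True

# 101 groups each summing to 1, spread over 1111 rows, gives a mean of 1/11
print(round(float(df["likelihood"].mean()), 6), round(1 / 11, 6))  # → 0.090909 0.090909
```

This is also why the likelihood column on its own is not a distribution over efficacy rates: it only sums to 1 across cured counts for a fixed rate.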