## Impact Evaluation (in Real Life)

Let us now see how the concepts we have learned so far work in real life. The lecture is largely based on the book Impact Evaluation in Practice, which is free to download from the World Bank website.

In [25]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
from patsy import dmatrices
from statsmodels.stats.power import TTestIndPower

df = pd.read_stata('evaluation.dta')
df

Unnamed: 0,locality_identifier,household_identifier,treatment_locality,promotion_locality,eligible,enrolled,enrolled_rp,poverty_index,round,health_expenditures,...,educ_hh,educ_sp,female_hh,indigenous,hhsize,dirtfloor,bathroom,land,hospital_distance,hospital
0,26.0,5.0,1.0,1.0,1.0,1.0,1.0,55.950542,0.0,15.185455,...,0.0,6.0,0.0,0.0,4.0,1,0,1,124.819966,0.0
1,26.0,5.0,1.0,1.0,1.0,1.0,1.0,55.950542,1.0,19.580902,...,0.0,6.0,0.0,0.0,4.0,1,0,1,124.819966,0.0
2,26.0,11.0,1.0,1.0,1.0,1.0,0.0,46.058731,0.0,13.076257,...,4.0,0.0,0.0,0.0,6.0,1,0,2,124.819966,0.0
3,26.0,11.0,1.0,1.0,1.0,1.0,0.0,46.058731,1.0,2.398854,...,4.0,0.0,0.0,0.0,6.0,1,0,2,124.819966,1.0
4,26.0,13.0,1.0,1.0,1.0,1.0,0.0,54.095825,1.0,0.000000,...,0.0,0.0,0.0,0.0,6.0,1,0,4,124.819966,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19822,35.0,15738.0,0.0,0.0,0.0,0.0,0.0,59.737247,0.0,16.811539,...,0.0,2.0,0.0,1.0,7.0,0,1,2,162.748811,
19823,40.0,15769.0,1.0,1.0,0.0,0.0,0.0,62.055641,0.0,15.906003,...,5.0,2.0,0.0,1.0,5.0,1,1,1,114.763392,
19824,40.0,15769.0,1.0,1.0,0.0,0.0,0.0,62.055641,1.0,8.248152,...,5.0,2.0,0.0,1.0,5.0,1,1,1,114.763392,
19825,40.0,15778.0,1.0,1.0,0.0,0.0,0.0,62.828438,0.0,8.737772,...,3.0,0.0,0.0,1.0,9.0,1,1,4,114.763392,


Here, we loaded the evaluation.dta dataset. A description of the dataset and its variables can be found online.

**TASK 1**. Under this scenario, you will estimate the effect of the program by comparing the change in outcomes over time for a group of households that enrolled in the program. Assume full compliance, meaning that all of the households eligible for the program enrolled in it. Compare the average health expenditures before (round = 0) and after the program (round = 1) for the group of eligible households (eligible = 1) in treatment communities (treatment_locality = 1).

In [10]:
# Filter for eligible households in treatment communities
eligible_treatment = df[(df['eligible'] == 1) & (df['treatment_locality'] == 1)]

# Calculate average health expenditures before (round=0) and after (round=1)
avg_before = eligible_treatment[eligible_treatment['round'] == 0]['health_expenditures'].mean()
avg_after = eligible_treatment[eligible_treatment['round'] == 1]['health_expenditures'].mean()

print(f"Average health expenditures before (round=0): {avg_before:.2f}")
print(f"Average health expenditures after (round=1): {avg_after:.2f}")

Average health expenditures before (round=0): 14.49
Average health expenditures after (round=1): 7.84
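
The before-after estimate for Task 1 is simply the difference between the two means printed above. A small worked sketch of that arithmetic follows; the two values are copied from the output, and reading the result as a program effect assumes that nothing else affecting health expenditures changed between the two rounds.

```python
# Means copied from the printed output above
# (eligible households in treatment communities)
avg_before = 14.49  # round = 0
avg_after = 7.84    # round = 1

# Before-after estimate: change in the group mean over time
before_after_diff = avg_after - avg_before

# Same change expressed relative to the baseline mean
pct_change = 100 * before_after_diff / avg_before

print(f"Before-after difference: {before_after_diff:.2f}")
print(f"Percent change vs. baseline: {pct_change:.1f}%")
```

With these inputs the difference is about -6.65, roughly a 46% drop relative to baseline.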


It turns out that when communities were being selected for inclusion in the health insurance pilot, there were many more eligible communities than could be covered with the available budget. The provincial authorities decided to run a lottery to select the communities that would participate in the insurance scheme in year 1, thus giving all communities a fair chance to start in the program first. Your data contains information on communities selected at random for participation in year 1, as well as on communities that would only enter the program in subsequent years.  The variable “treatment_locality” indicates treatment communities (treatment_locality = 1) and non-treatment or control communities (treatment_locality = 0). For this case, use both treatment and control communities in your analysis. The sample is structured as follows:

|                     | Treatment Communities | Control Communities |
|---------------------|-----------------------|---------------------|
| **Eligible**        | 5,929                 | 5,328               |
| **Ineligible**      | 3,990                 | 4,580               |

**TASK 2**. Compare baseline out-of-pocket health expenditures and other covariates between eligible households in treatment and control communities. Is the sample balanced on observables? Is this what you would expect? Why or why not?

In [11]:

# Make sure some keys are the expected dtypes
df["round"] = df["round"].astype(int)
df["eligible"] = df["eligible"].astype(int)
df["treatment_locality"] = df["treatment_locality"].astype(int)

# -------------------------------------------------------------
# 0) Sanity checks: sample structure and means by cells
# -------------------------------------------------------------
print("\nEligible × Treatment counts (all rounds):")
print(pd.crosstab(df["eligible"], df["treatment_locality"]))

# -------------------------------------------------------------
# 1) BASELINE BALANCE (round==0) among the ELIGIBLE population
#    Compare treatment vs control on OOP and covariates.
#    In a lottery, we expect no systematic differences at baseline
#    (up to sampling noise). We'll show SMDs and clustered p-values.
# -------------------------------------------------------------
baseline = df.query("round == 0 and eligible == 1").copy()

# Choose a set of baseline variables to compare.
# Feel free to add more here (e.g., household size, head/spouse edu/age) if present.
cand_vars = [
    "health_expenditures",   # baseline OOP
    "poverty_index"
]
baseline_vars = [v for v in cand_vars if v in baseline.columns]

def two_group_summary(data, var, treat="treatment_locality"):
    """Return means by group, diff, standardized diff, and clustered p-value (cluster=locality_identifier)."""
    g0 = data.loc[data[treat]==0, var].mean()
    g1 = data.loc[data[treat]==1, var].mean()
    diff = g1 - g0

    # Std. mean diff (pooled SD)
    s0 = data.loc[data[treat]==0, var].std()
    s1 = data.loc[data[treat]==1, var].std()
    n0 = data.loc[data[treat]==0, var].notna().sum()
    n1 = data.loc[data[treat]==1, var].notna().sum()
    sp = np.sqrt(((n0-1)*s0**2 + (n1-1)*s1**2) / (n0+n1-2)) if n0+n1-2 > 0 else np.nan
    smd = diff / sp if sp>0 else np.nan

    # Cluster-robust p-value from OLS(var ~ treatment), clustering by locality_identifier
    model = smf.ols(f"{var} ~ {treat}", data=data).fit(
        cov_type="cluster",
        cov_kwds={"groups": data["locality_identifier"]}
    )
    pval = model.pvalues[treat]
    return pd.Series({"mean_ctrl": g0, "mean_treat": g1, "diff": diff, "smd": smd, "pval_cluster": pval})

balance_tbl = pd.concat(
    [two_group_summary(baseline, v) for v in baseline_vars],
    axis=1
).T.round(3)
print("\n[1] Baseline balance among ELIGIBLE (round==0)")
print(balance_tbl)

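# Interpretation:
# - Rows 0 and 1 of the printed table follow the order of baseline_vars
#   (health_expenditures, then poverty_index); the unnamed Series lose their labels in pd.concat.
# - If the lottery worked as intended, SMD values should be small (|SMD| < 0.1 is commonly viewed as well balanced)
#   and cluster-robust p-values should not show systematic differences.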




Eligible × Treatment counts (all rounds):
treatment_locality     0     1
eligible                      
0                   4580  3990
1                   5328  5929

[1] Baseline balance among ELIGIBLE (round==0)
   mean_ctrl  mean_treat   diff    smd  pval_cluster
0     14.574      14.490 -0.084 -0.020         0.693
1     49.752      49.331 -0.421 -0.069         0.306
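
The `smd` column above divides the difference in means by a pooled standard deviation. As a minimal standalone sketch of that formula (toy numbers, not taken from evaluation.dta; `standardized_mean_diff` is a hypothetical helper name), using only the standard library:

```python
import math
import statistics


def standardized_mean_diff(treated, control):
    """Difference in means (treated - control) divided by the pooled sample SD."""
    n1, n0 = len(treated), len(control)
    var1 = statistics.variance(treated)  # sample variance (ddof = 1)
    var0 = statistics.variance(control)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n0 - 1) * var0) / (n1 + n0 - 2))
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_sd


# Hypothetical toy values, for illustration only
toy_treated = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_control = [2.0, 3.0, 4.0, 5.0, 6.0]

smd = standardized_mean_diff(toy_treated, toy_control)
print(f"SMD = {smd:.3f}")
```

Because the SMD is unit-free, the same rough threshold (|SMD| < 0.1) can be applied to expenditures and to the poverty index alike.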


**TASK 3**. In the treatment period (round = 1), compare the average out-of-pocket health expenditures for the eligible population in treatment and control communities. Is this the impact of the health insurance program on out-of-pocket health expenditures?

In [12]:
# -------------------------------------------------------------
# 2) SIMPLE MEANS in treatment period (round==1), ELIGIBLE only
#    (This is a raw ITT comparison, not yet controlling for anything.)
# -------------------------------------------------------------
t1 = df.query("round == 1 and eligible == 1").copy()
oop_means = t1.groupby("treatment_locality")["health_expenditures"].mean()
diff_means = oop_means.get(1, np.nan) - oop_means.get(0, np.nan)
print("\n[2] Round 1 (treatment period) average OOP among ELIGIBLE:")
print(oop_means)
print(f"Difference (Treat - Control): {diff_means:.3f}")

# Note:
# - This difference is the reduced-form ITT (intention-to-treat) contrast for eligible households.
# - It is *not* necessarily the treatment-on-the-treated effect because not all eligible may enroll.
# - We’ll estimate the ITT more formally via OLS with cluster-robust SE next.



[2] Round 1 (treatment period) average OOP among ELIGIBLE:
treatment_locality
0    17.980551
1     7.840179
Name: health_expenditures, dtype: float32
Difference (Treat - Control): -10.140
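
A raw difference in means like the one above is numerically the same thing as the slope from an OLS regression of the outcome on a 0/1 treatment indicator, which is why a regression without controls should reproduce it. A minimal sketch to check this identity, using made-up numbers (not from evaluation.dta) and a hypothetical helper `ols_slope`:

```python
from statistics import mean


def ols_slope(x, y):
    """Slope of a simple regression of y on x with an intercept: cov(x, y) / var(x)."""
    x_bar, y_bar = mean(x), mean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var_x = sum((xi - x_bar) ** 2 for xi in x)
    return cov_xy / var_x


# Hypothetical toy data: treat is the 0/1 indicator, y is the outcome
treat = [0, 0, 0, 1, 1, 1, 1]
y = [18.0, 17.0, 19.0, 8.0, 7.0, 9.0, 8.0]

treated_y = [yi for yi, t in zip(y, treat) if t == 1]
control_y = [yi for yi, t in zip(y, treat) if t == 0]

diff_in_means = mean(treated_y) - mean(control_y)
slope = ols_slope(treat, y)

print(f"Difference in means: {diff_in_means:.3f}")
print(f"OLS slope on treat:  {slope:.3f}")
```

The regression adds standard errors (clustered, in this notebook) and room for controls; the point estimate without controls is unchanged.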


**TASK 4**. Now use an OLS regression to estimate the effect of the program on out-of-pocket health expenditures: (i) without controls, (ii) including characteristics of the household head and spouse, and (iii) including baseline covariates.


In [20]:
# ============================================================
# CLUSTER THINGS AND HOUSEKEEPING
# ============================================================

# Ensure key vars are ints (Stata-style 0/1 and round indicators)
for c in ["round", "eligible", "treatment_locality", "locality_identifier"]:
    if c in df.columns:
        df[c] = df[c].astype(int)

def fit_cluster_ols(formula: str, data: pd.DataFrame, cluster_col: str):
    """
    OLS with cluster-robust SEs, safely aligned to the exact rows used by the formula.
    - Uses patsy.dmatrices to drop NAs on *only* the variables in `formula`.
    - Then aligns the cluster vector to that same index (prevents length-mismatch errors).
    """
    y, X = dmatrices(formula, data=data, return_type="dataframe", NA_action="drop")
    groups = data.loc[y.index, cluster_col]
    res = sm.OLS(y, X).fit(cov_type="cluster", cov_kwds={"groups": groups})
    return res

# Convenience subsets
eligible_r1 = df.query("eligible == 1 and round == 1").copy()
ineligible_r1 = df.query("eligible == 0 and round == 1").copy()

# Build baseline covariates (round==0) and merge when needed
baseline = (
    df.query("round == 0")
      .sort_values(["household_identifier", "round"])
      .drop_duplicates(subset=["household_identifier"])
      .loc[:, ["household_identifier", "health_expenditures", "poverty_index"]]
      .rename(columns={
          "health_expenditures": "bl_health_expenditures",
          "poverty_index": "bl_poverty_index"
      })
)

In [21]:
# ============================================================
# Q3(i) — ELIGIBLE, ROUND 1: OLS without controls
# Stata: reg health_expenditures treatment_locality if eligible==1 & round==1, cl(locality_identifier)
# ============================================================
form1 = "health_expenditures ~ treatment_locality"
res1 = fit_cluster_ols(form1, eligible_r1, cluster_col="locality_identifier")
print(res1.summary())

                             OLS Regression Results                            
Dep. Variable:     health_expenditures   R-squared:                       0.300
Model:                             OLS   Adj. R-squared:                  0.300
Method:                  Least Squares   F-statistic:                     656.8
Date:                 Tue, 23 Sep 2025   Prob (F-statistic):           1.70e-64
Time:                         22:08:25   Log-Likelihood:                -19497.
No. Observations:                 5629   AIC:                         3.900e+04
Df Residuals:                     5627   BIC:                         3.901e+04
Df Model:                            1                                         
Covariance Type:               cluster                                         
                         coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------------
Intercept             17.9

In [22]:
# ============================================================
# Q3(ii) — ELIGIBLE, ROUND 1: OLS + HH head/spouse characteristics
# Stata: reg health_expenditures treatment_locality $controls1 if eligible==1 & round==1, cl(locality_identifier)
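# NOTE: The names in the list below are placeholders. The code keeps only those that exist in
#       the data; if none match (as in the run shown here, which prints "(none found)"),
#       the model silently reduces to the no-controls specification of Q3(i).
#       Edit the list to match the head/spouse columns in your file.
# ============================================================
candidate_controls1 = [
    # PLACEHOLDERS: replace with your actual columns if present: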
    "head_age", "head_education", "head_literate",
    "spouse_age", "spouse_education", "spouse_literate",
    "household_size"
]
controls1 = [c for c in candidate_controls1 if c in eligible_r1.columns]

form2 = "health_expenditures ~ treatment_locality" + ((" + " + " + ".join(controls1)) if controls1 else "")
print("Controls used in Q3(ii):", controls1 if controls1 else "(none found)")
res2 = fit_cluster_ols(form2, eligible_r1, cluster_col="locality_identifier")
print(res2.summary())

Controls used in Q3(ii): (none found)
                             OLS Regression Results                            
Dep. Variable:     health_expenditures   R-squared:                       0.300
Model:                             OLS   Adj. R-squared:                  0.300
Method:                  Least Squares   F-statistic:                     656.8
Date:                 Tue, 23 Sep 2025   Prob (F-statistic):           1.70e-64
Time:                         22:08:34   Log-Likelihood:                -19497.
No. Observations:                 5629   AIC:                         3.900e+04
Df Residuals:                     5627   BIC:                         3.901e+04
Df Model:                            1                                         
Covariance Type:               cluster                                         
                         coef    std err          z      P>|z|      [0.025      0.975]
---------------------------------------------------------------------------
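
Since the run above found no matching controls, specification (ii) ended up identical to (i). One way to avoid this silent fallback is to build the formula with a guard that fails loudly when nothing matches. The sketch below is only an illustration: `build_formula` is a hypothetical helper, and the column names are taken from the DataFrame header printed at the top of the notebook (treating `educ_hh`, `educ_sp`, `female_hh`, and `hhsize` as head/spouse/household characteristics is an assumption based on their names).

```python
def build_formula(outcome, treatment, candidate_controls, available_columns):
    """Return a patsy-style formula using only the candidates present in the data.

    Raises ValueError when no candidate matches, instead of silently
    falling back to the no-controls model.
    """
    controls = [c for c in candidate_controls if c in available_columns]
    if not controls:
        raise ValueError(
            "None of the candidate controls exist in the data; "
            "the model would collapse to the no-controls specification."
        )
    return " + ".join([f"{outcome} ~ {treatment}"] + controls)


# Columns visible in the printed header (the elided middle columns are not listed)
available = [
    "health_expenditures", "treatment_locality", "educ_hh", "educ_sp",
    "female_hh", "indigenous", "hhsize", "dirtfloor", "bathroom", "land",
    "hospital_distance", "hospital",
]

# Assumed head/spouse controls; "head_age" is a placeholder that gets dropped
formula = build_formula(
    "health_expenditures", "treatment_locality",
    ["educ_hh", "educ_sp", "female_hh", "hhsize", "head_age"],
    available,
)
print(formula)
```

The resulting string could then be passed to `fit_cluster_ols` in place of `form2`.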

In [23]:
# ============================================================
# Q3(iii) — ELIGIBLE, ROUND 1: OLS + HH head/spouse + BASELINE covariates
# Stata: reg health_expenditures treatment_locality $controls if eligible==1 & round==1, cl(locality_identifier)
# Baseline covariates are pre-program (round==0) values merged to round==1 rows.
# ============================================================
eligible_r1_bl = eligible_r1.merge(baseline, on="household_identifier", how="left")
baseline_controls = [c for c in ["bl_health_expenditures", "bl_poverty_index"] if c in eligible_r1_bl.columns]

# Assemble the right-hand side from whichever control sets are available
rhs_terms = ["treatment_locality"] + controls1 + baseline_controls
form3 = "health_expenditures ~ " + " + ".join(rhs_terms)

print("Head/Spouse controls in Q3(iii):", controls1 if controls1 else "(none)")
print("Baseline controls in Q3(iii):   ", baseline_controls if baseline_controls else "(none)")
res3 = fit_cluster_ols(form3, eligible_r1_bl, cluster_col="locality_identifier")
print(res3.summary())

# Effect size as % of control mean (interpretation helper)
ctrl_mean = eligible_r1_bl.loc[eligible_r1_bl["treatment_locality"] == 0, "health_expenditures"].mean()
b = res3.params.get("treatment_locality", np.nan)
pct = 100 * b / ctrl_mean if pd.notna(b) and ctrl_mean != 0 else np.nan
print(f"\n[Impact Q3(iii)] Coef on treatment_locality = {b:.3f}")
print(f"Control mean (eligible, r1) = {ctrl_mean:.3f}")
print(f"Percent change vs control mean = {pct:.2f}%")

Head/Spouse controls in Q3(iii): (none)
Baseline controls in Q3(iii):    ['bl_health_expenditures', 'bl_poverty_index']
                             OLS Regression Results                            
Dep. Variable:     health_expenditures   R-squared:                       0.429
Model:                             OLS   Adj. R-squared:                  0.429
Method:                  Least Squares   F-statistic:                     444.1
Date:                 Tue, 23 Sep 2025   Prob (F-statistic):           4.08e-87
Time:                         22:09:03   Log-Likelihood:                -18920.
No. Observations:                 5628   AIC:                         3.785e+04
Df Residuals:                     5624   BIC:                         3.788e+04
Df Model:                            3                                         
Covariance Type:               cluster                                         
                             coef    std err          z      P>|z|      [0.025  

**TASK 5**. What is the impact of the program on out-of-pocket health expenditures? What is the percent decrease that can be attributed to the program?

In [24]:
# =========================================
# Q4 — Program impact & percent decrease
# Using the preferred spec from Q3(iii):
#   Eligible households, round==1
#   OLS: health_expenditures ~ treatment_locality + head/spouse chars + baseline covariates
# Assumes you already ran Q3(iii) and have:
#   - res3  : regression results object for the eligible sample with controls
#   - eligible_r1_bl : the DataFrame used in Q3(iii) (eligible & round==1 with baseline merged)
# =========================================

# Coefficient on the treatment indicator is the ITT effect in currency units
b = res3.params["treatment_locality"]
se = res3.bse["treatment_locality"]
p  = res3.pvalues["treatment_locality"]
ci_low, ci_high = res3.conf_int().loc["treatment_locality"].tolist()

# Control-group mean OOP (eligible, round 1) — for scaling to %
control_mean = (
    eligible_r1_bl.loc[eligible_r1_bl["treatment_locality"] == 0, "health_expenditures"]
    .mean()
)

# Percent change relative to the control mean
pct = 100 * b / control_mean
pct_low = 100 * ci_low / control_mean
pct_high = 100 * ci_high / control_mean

print("=== Q4. Program Impact on Out-of-Pocket (Preferred spec: Q3(iii)) ===")
print(f"ITT impact (coef on treatment_locality): {b:,.3f} currency units")
print(f"Cluster-robust SE: {se:,.3f}   p-value: {p:.3f}")
print(f"95% CI for impact: [{ci_low:,.3f}, {ci_high:,.3f}]")
print(f"Control mean (eligible, round 1): {control_mean:,.3f}")

direction = "decrease" if b < 0 else "increase"
print(f"\nPercent {direction} vs control mean: {pct:.2f}% "
      f"(95% CI: {pct_low:.2f}%, {pct_high:.2f}%)")

# ------------------------------------------------------------------
# Interpretation notes (brief):
# - b < 0 ⇒ the program reduced OOP by |pct| percent on average (ITT).
# - This is an intention-to-treat effect: it reflects assignment of a
#   locality to start the program in year 1, not necessarily take-up.
# - For treatment-on-the-treated, you’d need an IV using assignment as
#   an instrument for enrollment (not requested here).
# ------------------------------------------------------------------

=== Q4. Program Impact on Out-of-Pocket (Preferred spec: Q3(iii)) ===
ITT impact (coef on treatment_locality): -10.068 currency units
Cluster-robust SE: 0.342   p-value: 0.000
95% CI for impact: [-10.739, -9.398]
Control mean (eligible, round 1): 17.981

Percent decrease vs control mean: -55.99% (95% CI: -59.72%, -52.26%)
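
The interpretation notes in the cell above point out that the coefficient is an intention-to-treat (ITT) effect. With a binary assignment and no covariates, the IV estimate of the effect on those who actually enroll reduces to the Wald ratio: the ITT divided by the difference in enrollment rates between arms. A back-of-the-envelope sketch follows; `wald_tot` is a hypothetical helper, the 80% take-up figure is invented for illustration, and applying the ratio to a covariate-adjusted ITT is only an approximation.

```python
def wald_tot(itt, takeup_treat, takeup_ctrl=0.0):
    """Wald estimator: ITT scaled by the difference in enrollment rates."""
    first_stage = takeup_treat - takeup_ctrl
    if first_stage <= 0:
        raise ValueError("Assignment must raise enrollment for the ratio to be meaningful.")
    return itt / first_stage


itt = -10.068  # coefficient on treatment_locality printed above

# Full compliance (the assumption stated in Task 1): ToT equals ITT
tot_full = wald_tot(itt, takeup_treat=1.0)

# Hypothetical partial take-up, for illustration only
tot_partial = wald_tot(itt, takeup_treat=0.8)

print(f"ToT under full compliance: {tot_full:.3f}")
print(f"ToT under 80% take-up:     {tot_partial:.3f}")
```

The lower the take-up among eligible households, the further the effect on enrollees sits from the ITT.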


Let us say that the minister of health was pleased with the quality and results of the evaluation of the Health Insurance Subsidy Program (HISP). However, before scaling up the HISP, the government decides to pilot an expanded version of the program (which they call HISP+). HISP pays for part of the cost of health insurance for poor rural households, covering costs of primary care and drugs, but it does not cover hospitalization. The minister of health wonders whether an expanded HISP+ that also covers hospitalization would further lower out-of-pocket health expenditures. They ask you to design an impact evaluation to assess whether HISP+ further lowers health expenditures for poor rural households. In this case, choosing an impact evaluation design is not a challenge for you: HISP+ has limited resources and cannot be implemented universally immediately. As a result, you have concluded that randomized assignment would be the most viable and robust impact evaluation method. The minister of health understands how well the randomized assignment method works and is supportive. To finalize the design of the impact evaluation, you need to determine how big a sample is needed. Note that since we are using a random assignment scenario, you can drop the ineligible (eligible = 0) households from the dataset to simplify your code.

**TASK 6**. Use data from the follow-up HISP survey to obtain the benchmark mean and standard deviation for the two outcome indicators of interest to the minister of health: health expenditures and hospitalization.

In [26]:
# 0) Keep only eligible households
df_e = df.loc[df["eligible"] == 1].copy()

# 1) Benchmarks from follow-up HISP survey (round==1) — in the treatment arm
followup_treat = df_e.query("round == 1 and treatment_locality == 1").copy()

# Hospitalization variable name can be 'hospital' or 'hospitalization' depending on the file
hosp_col = "hospital" if "hospital" in followup_treat.columns else "hospitalization"

m1_exp = followup_treat["health_expenditures"].mean()           # mean OOP in treated at follow-up
sd_exp = followup_treat["health_expenditures"].std(ddof=1)      # SD OOP (used for both arms in power calc)

m1_h   = followup_treat[hosp_col].mean()                         # hospitalization rate in treated (0–1)
sd_h   = followup_treat[hosp_col].std(ddof=1)                    # SD of the binary outcome (≈ sqrt(p*(1-p)))

print("=== Q1. Benchmarks from follow-up (treated, eligible only) ===")
print(f"Health exp — mean = {m1_exp:,.3f},  SD = {sd_exp:,.3f}")
print(f"Hospital    — mean = {m1_h:.3f} ({100*m1_h:.1f}%),  SD = {sd_h:.3f}")
print("\nInterpretation: These are the reference mean and variability used to size the trial. "
      "Higher SDs or smaller MDEs will require larger samples.\n")

=== Q1. Benchmarks from follow-up (treated, eligible only) ===
Health exp — mean = 7.840,  SD = 7.994
Hospital    — mean = 0.049 (4.9%),  SD = 0.215

Interpretation: These are the reference mean and variability used to size the trial. Higher SDs or smaller MDEs will require larger samples.
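
For a 0/1 outcome such as hospitalization, the standard deviation is determined by the mean: SD = sqrt(p(1 - p)). A quick check against the benchmarks printed above, using the rounded rate from that output (so the last digit may differ slightly from the unrounded calculation):

```python
import math

# Hospitalization rate copied from the printed benchmark (rounded to 3 decimals)
p = 0.049

# Standard deviation of a Bernoulli(p) outcome
sd_binary = math.sqrt(p * (1 - p))

print(f"sqrt(p * (1 - p)) = {sd_binary:.3f}")
```

This is why rare binary outcomes need no separate SD estimate for power calculations: the assumed rate fixes it.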



**TASK 7**. Determine the sample size needed for a minimum detectable effect of a $1, $2, and $3 decrease in household out-of-pocket health expenditures. Compare the sample sizes required depending on the power level you use, 0.8 or 0.9.

In [27]:
# -------------------------------------------------------------------
# Helper: sample size per group for a two-sample difference in means
# Uses the same SD for both arms and equal allocation (ratio=1), α=0.05.
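# This approximates Stata: sampsi m1 m2, r(1) sd1(sd) sd2(sd)
# (sampsi uses a normal approximation; TTestIndPower uses the t distribution,
#  so results can differ by a unit or so).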
# -------------------------------------------------------------------
power_calc = TTestIndPower()
ALPHA = 0.05

def n_per_group_from_mde(sd, mde, power):
    """
    sd   : common standard deviation for the outcome
    mde  : absolute difference you want to detect (|μ_T - μ_C|)
    power: desired power (e.g., 0.8 or 0.9)
    Returns: required sample size per group as a float (round up in practice).
    """
    effect_size = mde / sd
    # solve_power returns the required n in the *first* group; with ratio=1 this is per group
    n = power_calc.solve_power(effect_size=effect_size, alpha=ALPHA,
                               power=power, ratio=1.0, alternative="two-sided")
    return n


In [28]:
# 2) Expenditures — MDEs of $1, $2, $3; power 0.8 and 0.9
mde_exp_list = [1.0, 2.0, 3.0]
powers = [0.8, 0.9]

rows_exp = []
for pwr in powers:
    for mde in mde_exp_list:
        n_pg = n_per_group_from_mde(sd=sd_exp, mde=mde, power=pwr)
        rows_exp.append({"Outcome":"Health expenditures",
                         "MDE (currency units)": mde,
                         "Power": pwr,
                         "N per group (equal arms)": np.ceil(n_pg),
                         "N total (2 arms)": int(np.ceil(2*n_pg))})
exp_table = pd.DataFrame(rows_exp)
print("=== Q2. Required sample sizes for OOP expenditures (eligible only) ===")
print(exp_table.to_string(index=False))
print("\nImplication: Detecting smaller dollar changes requires disproportionately larger samples. "
      "At a fixed MDE, moving from 80% to 90% power increases the required sample size.\n")

=== Q2. Required sample sizes for OOP expenditures (eligible only) ===
            Outcome  MDE (currency units)  Power  N per group (equal arms)  N total (2 arms)
Health expenditures                   1.0    0.8                    1005.0              2009
Health expenditures                   2.0    0.8                     252.0               504
Health expenditures                   3.0    0.8                     113.0               225
Health expenditures                   1.0    0.9                    1345.0              2689
Health expenditures                   2.0    0.9                     337.0               674
Health expenditures                   3.0    0.9                     151.0               301

Implication: Detecting smaller dollar changes requires disproportionately larger samples. At a fixed MDE, moving from 80% to 90% power increases the required sample size.
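
The "disproportionately larger" pattern comes from the closed-form normal approximation n = 2 (z_{1-α/2} + z_{power})² σ² / MDE² per group: halving the MDE quadruples the sample. A minimal standard-library sketch of that formula follows; `n_per_group_normal` is a hypothetical helper, the SD is the benchmark printed in Task 6, and because it uses z rather than t quantiles its results land slightly below the `TTestIndPower` numbers in the table.

```python
import math
from statistics import NormalDist


def n_per_group_normal(sd, mde, power, alpha=0.05):
    """Per-group n for a two-sided, two-sample comparison of means (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return 2 * (z_alpha + z_power) ** 2 * sd ** 2 / mde ** 2


sd_exp = 7.994  # benchmark SD of health expenditures printed in Task 6

for power in (0.8, 0.9):
    for mde in (1.0, 2.0, 3.0):
        n = n_per_group_normal(sd_exp, mde, power)
        print(f"power={power}, MDE={mde}: n per group = {math.ceil(n)}")

# Halving the MDE multiplies the required n by four
ratio = n_per_group_normal(sd_exp, 1.0, 0.8) / n_per_group_normal(sd_exp, 2.0, 0.8)
print(f"n(MDE=1) / n(MDE=2) = {ratio:.1f}")
```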



**TASK 8**. Determine the sample size needed for a minimum detectable effect of a 1%, 2%, and 3% increase in the hospitalization rate (read here as absolute changes of 1, 2, and 3 percentage points). Compare the sample sizes required depending on the power level you use, 0.8 or 0.9.

In [29]:
# 3) Hospitalization — MDEs of +1, +2, +3 percentage points
#    We follow the same 'two means with common SD' approach as in the Stata script:
#    treat hospitalization as continuous (0/1) and use sd_h for both arms.
mde_pp = [0.01, 0.02, 0.03]  # absolute differences (e.g., +0.01 = +1 percentage point)

rows_h = []
for pwr in powers:
    for mde in mde_pp:
        n_pg = n_per_group_from_mde(sd=sd_h, mde=mde, power=pwr)
        rows_h.append({"Outcome":"Hospitalization rate",
                       "MDE (abs. points)": f"{round(mde*100)} pp",  # round() avoids float truncation
                       "Power": pwr,
                       "N per group (equal arms)": np.ceil(n_pg),
                       "N total (2 arms)": int(np.ceil(2*n_pg))})
hosp_table = pd.DataFrame(rows_h)
print("=== Q3. Required sample sizes for hospitalization rate (eligible only) ===")
print(hosp_table.to_string(index=False))
print("\nImplication: Because hospitalization is relatively rare, its SD is tied to p(1-p). "
      "Smaller MDEs in percentage points demand very large samples; increasing target power "
      "from 0.80 to 0.90 also raises the sample size notably.\n")

# -------------------------------------------
# Quick design takeaways (printed as plain text)
# -------------------------------------------
print("Design notes:")
print("* Use equal allocation unless logistics argue otherwise; with fixed total N, equal arms maximize power.")
print("* If your pilot suggests clustering (e.g., village-level randomization), inflate N using a design effect:")
print("    DEFF = 1 + (m - 1) * ICC   (m = cluster size, ICC = intra-cluster correlation).")
print("* The tables above report simple individual-level calculations; adjust upward if clustering or attrition is expected.")

=== Q3. Required sample sizes for hospitalization rate (eligible only) ===
             Outcome MDE (abs. points)  Power  N per group (equal arms)  N total (2 arms)
Hospitalization rate              1 pp    0.8                    7257.0             14514
Hospitalization rate              2 pp    0.8                    1815.0              3630
Hospitalization rate              3 pp    0.8                     808.0              1615
Hospitalization rate              1 pp    0.9                    9715.0             19430
Hospitalization rate              2 pp    0.9                    2430.0              4859
Hospitalization rate              3 pp    0.9                    1081.0              2161

Implication: Because hospitalization is relatively rare, its SD is tied to p(1-p). Smaller MDEs in percentage points demand very large samples; increasing target power from 0.80 to 0.90 also raises the sample size notably.

Design notes:
* Use equal allocation unless logistics argue otherwise;
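
The design notes in the cell above mention inflating the sample with a design effect, DEFF = 1 + (m - 1) * ICC, when randomization happens at the cluster level. A minimal sketch of that adjustment follows; the cluster size and ICC are hypothetical values chosen for illustration (in practice they would come from pilot data or earlier surveys), and the helper names are made up.

```python
import math


def design_effect(cluster_size, icc):
    """DEFF = 1 + (m - 1) * ICC for clusters of equal size m."""
    return 1 + (cluster_size - 1) * icc


def inflate_for_clustering(n_individual, cluster_size, icc):
    """Scale an individual-level sample size by the design effect, rounding up."""
    return math.ceil(n_individual * design_effect(cluster_size, icc))


# Per-group N for a $1 MDE at 80% power, from the Task 7 table
n_srs = 1005

# Hypothetical design parameters, for illustration only
m, icc = 20, 0.05

deff = design_effect(m, icc)
n_clustered = inflate_for_clustering(n_srs, m, icc)

print(f"DEFF = {deff:.2f}")
print(f"N per group after clustering adjustment = {n_clustered}")
```

Even a modest ICC nearly doubles the required sample here, so the individual-level tables are best read as lower bounds.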