# Question 1 (5 points)
Estimate the intention-to-treat (ITT) effect of offering the discount on the improvement of recovery, E[Y(Z = 1)] − E[Y(Z = 0)], using a difference-in-means estimator.
Also estimate the standard error and the asymptotic 95% confidence interval. Explain why the ITT effect can differ from the contrast that compares the outcomes Yobs of the patients who take vs. do not take physiotherapy.
Be aware that the input data is aggregated, so you should use weighted
estimators (for the mean and standard error). You can use Python code for the
computations, and in that case manually create the input dataframe from the given
table.

In [12]:
import numpy as np
import pandas as pd

# Create a DataFrame from the provided data
data = {
    'Z': [0, 0, 0, 0, 1, 1, 1, 1],
    'Tobs': [0, 0, 1, 1, 0, 0, 1, 1],
    'Yobs': [0, 1, 0, 1, 0, 1, 0, 1],
    'n': [185, 123, 9, 41, 37, 20, 26, 96]
}

df = pd.DataFrame(data)

# Yobs is binary, so the weighted mean of Yobs in each arm is the proportion recovered
arm0 = df[df['Z'] == 0]
arm1 = df[df['Z'] == 1]
prop_Yobs_0 = (arm0['Yobs'] * arm0['n']).sum() / arm0['n'].sum()
prop_Yobs_1 = (arm1['Yobs'] * arm1['n']).sum() / arm1['n'].sum()

# Calculate the ITT effect as the difference in proportions
ITT_effect = prop_Yobs_1 - prop_Yobs_0

# Calculate the standard error for the ITT effect
n1 = df[(df['Z'] == 1)]['n'].sum()
n0 = df[(df['Z'] == 0)]['n'].sum()

SE_ITT = np.sqrt(prop_Yobs_1 * (1 - prop_Yobs_1) / n1 + prop_Yobs_0 * (1 - prop_Yobs_0) / n0)

# Calculate the 95% confidence interval
conf_int = (ITT_effect - 1.96 * SE_ITT, ITT_effect + 1.96 * SE_ITT)

# Print the results
print("ITT Effect: {:.4f}".format(ITT_effect))
print("Standard Error: {:.4f}".format(SE_ITT))
print("95% Confidence Interval: ({:.4f}, {:.4f})".format(conf_int[0], conf_int[1]))


ITT Effect: 0.1899
Standard Error: 0.0444
95% Confidence Interval: (0.1030, 0.2769)
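
The question also asks why the ITT effect can differ from the contrast between patients who take and do not take physiotherapy. The offer Z is randomized, but Tobs is chosen by the patient, so the take vs. not-take contrast can mix the effect of physiotherapy with differences between the kinds of patients who choose it. The ITT effect, in turn, is diluted by patients whose uptake the offer does not change. The following is a minimal sketch that computes both contrasts from the same aggregated table; the names `weighted_mean_y`, `itt` and `as_treated` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Aggregated counts copied from the table used in the cell above
df = pd.DataFrame({
    "Z":    [0, 0, 0, 0, 1, 1, 1, 1],
    "Tobs": [0, 0, 1, 1, 0, 0, 1, 1],
    "Yobs": [0, 1, 0, 1, 0, 1, 0, 1],
    "n":    [185, 123, 9, 41, 37, 20, 26, 96],
})

def weighted_mean_y(sub):
    """Recovery rate in a subset of the aggregated table, weighting rows by their counts."""
    return (sub["Yobs"] * sub["n"]).sum() / sub["n"].sum()

# Contrast by randomized offer (ITT)
itt = weighted_mean_y(df[df["Z"] == 1]) - weighted_mean_y(df[df["Z"] == 0])

# Contrast by treatment actually received (not protected by randomization)
as_treated = weighted_mean_y(df[df["Tobs"] == 1]) - weighted_mean_y(df[df["Tobs"] == 0])

print(f"ITT contrast (by Z):           {itt:.4f}")
print(f"As-treated contrast (by Tobs): {as_treated:.4f}")
```

The two numbers answer different questions: the first is the effect of offering the discount, the second is a descriptive comparison that has no causal interpretation without further assumptions.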


# In plain language of this setting, and using the potential treatment notation, what are the four possible strata defined by the instrument and the treatment values?

In this study, the instrument Z indicates whether the patient was offered discounted physiotherapy, and Tobs indicates whether the patient actually received physiotherapy. Writing T(z) for the treatment a patient would take if assigned Z = z, the four strata are defined by the pair of potential treatments (T(0), T(1)):

Stratum 1: Never-takers, T(0) = 0 and T(1) = 0:

Patients who would not take physiotherapy whether or not they are offered the discount.

Stratum 2: Always-takers, T(0) = 1 and T(1) = 1:

Patients who would take physiotherapy whether or not they are offered the discount. Without the offer they pay the standard cost.

Stratum 3: Compliers, T(0) = 0 and T(1) = 1:

Patients who would take physiotherapy if offered the discount but would not take it at the standard cost.

Stratum 4: Defiers, T(0) = 1 and T(1) = 0:

Patients who would take physiotherapy at the standard cost but would refuse it if offered the discount.

These strata are latent, because only one of T(0) and T(1) is observed for each patient. Each observed (Z, Tobs) cell therefore mixes two strata: for example, patients with Z = 0 and Tobs = 0 are either never-takers or compliers, and patients with Z = 1 and Tobs = 1 are either always-takers or compliers.

# In plain language of this setting, and in terms of potential outcomes, state the four assumptions under which the randomizer Zi is an "instrument", and the local ATE is non-parametrically identified. Discuss their plausibility.

In the context of this study, four key assumptions make the randomizer Zi a valid "instrument" and allow non-parametric identification of the local average treatment effect (local ATE). Each is stated below, followed by a note on its plausibility:

Random assignment (independence):

Assumption: The discount offer Zi is independent of each patient's potential treatments Ti(0), Ti(1) and potential recovery outcomes.
Plausibility: Zi is the randomizer, so this holds by design, provided the randomization was carried out properly and was not influenced by patient characteristics.

Relevance (first stage):

Assumption: The discount offer changes physiotherapy uptake for at least some patients, i.e. P(T(1) = 1) ≠ P(T(0) = 1).
Plausibility: Plausible, since a lower price should make some patients more willing to take physiotherapy. It can also be checked in the data: 122 of 179 patients (68%) offered the discount took physiotherapy, against 50 of 358 (14%) of those not offered it.

Exclusion restriction:

Assumption: The discount offer affects recovery only through whether the patient takes physiotherapy. For a given treatment status, being offered the discount or not does not change the patient's outcome.
Plausibility: Reasonable but not testable from the data. It could fail if the offer itself changed behavior in other ways, for example if patients who were offered the discount also changed their use of other care.

Monotonicity:

Assumption: There are no "defiers", meaning that no patient would take physiotherapy at the standard cost but refuse it when offered the discount: Ti(1) ≥ Ti(0) for every patient.
Plausibility: Monotonicity cannot be verified from the data, because only one of Ti(0) and Ti(1) is observed for each patient. It is nevertheless plausible here, since a price reduction is unlikely to deter a patient who would have paid the standard cost.

# Which of the assumptions from question 3 is/are enough to estimate the proportion of "never-takers", i.e. patients who would not take physiotherapy whether or not they had been offered the discount in this study? Under this/these assumption(s), report estimates of the proportions of the groups defined in question 2.

To estimate the proportion of "never-takers" (patients who would not take physiotherapy regardless of whether they were offered the discount), we need random assignment of Z together with the Monotonicity assumption. Among patients offered the discount (Z=1), those who did not take physiotherapy (Tobs=0) are either never-takers or defiers, and with no defiers they are all never-takers. Because Z is randomized, the share of never-takers in the Z=1 arm estimates their share in the whole study population. The group with Z=0 and Tobs=0 cannot be used for this purpose, because it mixes never-takers with compliers.

Under these assumptions, the proportions are estimated as follows:

Proportion of Never-Takers = P(Tobs=0 | Z=1)

From the provided data:

P(Tobs=0 | Z=1) = (37 + 20) / 179 = 0.318
Proportion of Always-Takers = P(Tobs=1 | Z=0) = (9 + 41) / 358 = 0.140
Proportion of Compliers = 1 − 0.318 − 0.140 = 0.542
Proportion of Defiers = 0 (by Monotonicity)
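
A minimal sketch of how the shares of the compliance types could be computed from the aggregated table, assuming random assignment of Z and no defiers (Monotonicity). Under those assumptions the never-taker share is P(Tobs=0 | Z=1), the always-taker share is P(Tobs=1 | Z=0), and compliers make up the remainder. The variable names are illustrative.

```python
import pandas as pd

# Aggregated counts copied from the table given in the assignment
df = pd.DataFrame({
    "Z":    [0, 0, 0, 0, 1, 1, 1, 1],
    "Tobs": [0, 0, 1, 1, 0, 0, 1, 1],
    "Yobs": [0, 1, 0, 1, 0, 1, 0, 1],
    "n":    [185, 123, 9, 41, 37, 20, 26, 96],
})

# Patients per arm, and patients per arm who took physiotherapy
n_by_arm = df.groupby("Z")["n"].sum()
took_by_arm = df[df["Tobs"] == 1].groupby("Z")["n"].sum()

# Uptake rate in each arm: P(Tobs = 1 | Z = z)
p_take = took_by_arm / n_by_arm

# Assumption: no defiers, so each arm identifies one non-complier type
p_always = p_take.loc[0]                  # took physiotherapy without the offer
p_never = 1 - p_take.loc[1]               # refused physiotherapy despite the offer
p_complier = p_take.loc[1] - p_take.loc[0]

print(f"Never-takers:  {p_never:.3f}")
print(f"Always-takers: {p_always:.3f}")
print(f"Compliers:     {p_complier:.3f}")
```

The complier share is also the first-stage effect of the offer on uptake, which is the denominator of the local ATE estimate in question 5.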

# Question 5 (5 points)
Under assumptions from question 3, estimate the local ATE. In which group defined in question 2 is this treatment effect estimated? You can use the Python function IV2SLS to provide the standard error and a 95% confidence interval for your estimate.

In [14]:
pip install linearmodels


In [None]:
from linearmodels.iv import IV2SLS

# Create a DataFrame from the provided data
data = {
    'Z': [0, 0, 0, 0, 1, 1, 1, 1],
    'Tobs': [0, 0, 1, 1, 0, 0, 1, 1],
    'Yobs': [0, 1, 0, 1, 0, 1, 0, 1],
    'n': [185, 123, 9, 41, 37, 20, 26, 96]
}

df = pd.DataFrame(data)

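# The table is aggregated, so expand it to one row per patient before fitting
df_long = df.loc[df.index.repeat(df['n'])].reset_index(drop=True)

# Define the IV model: Tobs is the endogenous treatment, Z is the instrument
iv_model = IV2SLS.from_formula("Yobs ~ 1 + [Tobs ~ Z]", data=df_long)

# Estimate the local ATE with heteroskedasticity-robust standard errors
results = iv_model.fit(cov_type='robust')

# Get the local ATE, standard error, and 95% confidence interval
local_ATE = results.params['Tobs']
SE_local_ATE = results.std_errors['Tobs']
ci = results.conf_int().loc['Tobs']
conf_int_local_ATE = (ci['lower'], ci['upper'])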

# Print the results
print("Local ATE: {:.4f}".format(local_ATE))
print("Standard Error: {:.4f}".format(SE_local_ATE))
print("95% Confidence Interval: ({:.4f}, {:.4f})".format(conf_int_local_ATE[0], conf_int_local_ATE[1]))
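
With a single binary instrument and no covariates, the 2SLS point estimate coincides with the Wald ratio: the ITT effect on recovery divided by the ITT effect on physiotherapy uptake. Under the assumptions of question 3, this ratio is the effect of physiotherapy among compliers. The following is a minimal sketch that computes it directly from the aggregated table, which can serve as a cross-check on the IV2SLS point estimate; `arm_mean`, `itt_y`, `itt_t` and `wald_late` are illustrative names, and the sketch does not compute a standard error.

```python
import pandas as pd

# Aggregated counts copied from the table given in the assignment
df = pd.DataFrame({
    "Z":    [0, 0, 0, 0, 1, 1, 1, 1],
    "Tobs": [0, 0, 1, 1, 0, 0, 1, 1],
    "Yobs": [0, 1, 0, 1, 0, 1, 0, 1],
    "n":    [185, 123, 9, 41, 37, 20, 26, 96],
})

def arm_mean(col, z):
    """Count-weighted mean of a binary column among patients with Z == z."""
    arm = df[df["Z"] == z]
    return (arm[col] * arm["n"]).sum() / arm["n"].sum()

# Numerator: effect of the offer on recovery (ITT on the outcome)
itt_y = arm_mean("Yobs", 1) - arm_mean("Yobs", 0)

# Denominator: effect of the offer on uptake (first stage, the complier share)
itt_t = arm_mean("Tobs", 1) - arm_mean("Tobs", 0)

# Wald ratio: local ATE among compliers under the IV assumptions
wald_late = itt_y / itt_t

print(f"ITT on outcome: {itt_y:.4f}")
print(f"First stage:    {itt_t:.4f}")
print(f"Wald local ATE: {wald_late:.4f}")
```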



# Question 6 (5 points)
Discuss briefly (i) the clinical and (ii) the health policy implications of the difference between your estimates in question 5 vs. question 1.

(i) Clinical Implications:

Question 1 (ITT Effect): The ITT effect estimates the overall impact of offering a discount on physiotherapy on patient recovery, regardless of whether patients actually took up the offer. It provides a broad perspective on the potential benefits of making physiotherapy more accessible by reducing costs. Because it compares patients as randomized, it is not biased by patients' self-selection into physiotherapy, but it is diluted by patients whose uptake the offer does not change.

Question 5 (Local ATE): The Local ATE is the effect of physiotherapy itself on recovery among compliers, i.e. patients who take physiotherapy when offered the discount but would not take it at the standard cost. It says nothing about always-takers or never-takers, whose treatment status the discount does not change.

Clinical Implications Comparison: Under the assumptions of question 3, the ITT effect equals the Local ATE multiplied by the proportion of compliers, so it is smaller in magnitude than the Local ATE whenever some patients do not change their behavior in response to the offer. Clinically, the Local ATE is the more relevant measure of how much physiotherapy helps a patient who actually receives it, although it applies only to patients whose uptake depends on cost and may not carry over to other patients.

(ii) Health Policy Implications:

Question 1 (ITT Effect): The ITT effect is relevant for health policymakers in understanding the overall population-level impact of a policy change. In this case, it provides insight into how offering discounted physiotherapy can impact recovery outcomes for patients across the board, taking into account the entire eligible patient population.

Question 5 (Local ATE): The Local ATE concerns the patients whose uptake of physiotherapy is changed by the discount. For policy, it indicates how much recovery improves per additional patient brought into physiotherapy by the discount, which can be weighed against the cost of the discount and of the additional physiotherapy provided.

Health Policy Implications Comparison: Policymakers need to consider the trade-off between broad access to healthcare services and managing costs. The ITT effect gives the population-level return on offering the discount, while the Local ATE gives the benefit per patient whose behavior the discount changes. The gap between the two reflects incomplete uptake of the offer, so health policy decisions should consider both the size of the benefit among patients who respond and how many patients respond.