# Python: Confidence intervals for instrumental variables models that are robust to weak instruments

In this example we will show how to use the DoubleML package to obtain confidence sets for the treatment effects that are robust to weak instruments. Weak instruments are those that have a relatively weak correlation with the treatment. It is well known that in this case, standard methods to construct confidence intervals have poor properties and can have coverage much lower than the nominal value. We will assume that the reader of this notebook is already familiar with DoubleML and how it can be used to fit instrumental variable models.

Throughout this example

- Z is the instrument,
- X is a vector of covariates,
- D is the treatment variable,
- Y is the outcome.

Next, we will generate two synthetic data sets, one where the instrument is weak and another where it is not. Then, we will compare the standard confidence intervals reported by the DoubleMLIIVM class with the confidence sets computed by the robust_confset method of the same class. We will see that using the robust_confset method is an easy way to ensure the results of an analysis are robust to weak instruments.

In [15]:
import numpy as np
import pandas as pd
import random
from sklearn.linear_model import LinearRegression, LogisticRegression
import doubleml as dml

np.random.seed(1234)
random.seed(1234)


## Generating synthetic data

The following function generates data from an instrumental variables model. The true_effect argument is the estimand of interest, the true effect of the treatment on the outcome. The instrument_strength argument is a measure of the strength of the instrument. The higher it is, the stronger the correlation is between the instrument and the treatment. Notice that the instrument is fully randomized.

In [16]:
def generate_weakiv_data(n_samples, true_effect, instrument_strength):
    u = np.random.normal(0, 2, size=n_samples)
    X = np.random.normal(0, 1, size=n_samples)
    Z = np.random.binomial(1, 0.5, size=n_samples)
    D = instrument_strength * Z + u 
    D = np.array(D > 0, dtype=int)
    Y = true_effect * D + np.sign(u)
    return pd.DataFrame({"Y": Y, "Z": Z, "D": D, "X": X})
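
Because the unobserved term `u` drives both the treatment and the outcome, a naive comparison of treated and untreated units is confounded, while the randomized instrument is not. The following sketch illustrates this on data with the same design as `generate_weakiv_data` (the covariate is omitted, and the sample size of 20000 is an arbitrary choice for this illustration). It uses the simple Wald ratio as a stand-in for an IV estimator; it is not what DoubleML computes internally.

```python
import numpy as np

rng = np.random.default_rng(1234)
n_samples, true_effect, instrument_strength = 20000, 1, 1.0

# Same design as generate_weakiv_data, without the covariate X.
u = rng.normal(0, 2, size=n_samples)
Z = rng.binomial(1, 0.5, size=n_samples)
D = (instrument_strength * Z + u > 0).astype(int)
Y = true_effect * D + np.sign(u)

# Naive comparison of treated and untreated units: confounded by u.
naive = Y[D == 1].mean() - Y[D == 0].mean()

# Wald estimator: effect of Z on Y divided by effect of Z on D.
reduced_form = Y[Z == 1].mean() - Y[Z == 0].mean()
first_stage = D[Z == 1].mean() - D[Z == 0].mean()
wald = reduced_form / first_stage

print(f"naive difference in means: {naive:.2f}")
print(f"Wald (IV) estimate:        {wald:.2f}")
```

With a strong instrument the Wald ratio should land near the true effect, whereas the naive difference overstates it considerably.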

We call the function two times to get two data sets, one where the instrument is weak, the other where the instrument is strong. In both cases the true effect is 1.

In [17]:
data_weak = generate_weakiv_data(5000, 1, 0.003)
data_strong = generate_weakiv_data(5000, 1, 1)
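
Before fitting any model, it can help to check how strongly the instrument moves the treatment. The sketch below is an illustration rather than part of the DoubleML workflow: the helpers `simulate_first_stage` and `first_stage_difference` are hypothetical names, and the block regenerates `Z` and `D` with the same design as `generate_weakiv_data` so that it runs on its own. It estimates the difference in treatment rates between `Z = 1` and `Z = 0`, together with a z-statistic for that difference.

```python
import numpy as np

def simulate_first_stage(n_samples, instrument_strength, rng):
    """Draw (Z, D) with the same design as generate_weakiv_data."""
    u = rng.normal(0, 2, size=n_samples)
    Z = rng.binomial(1, 0.5, size=n_samples)
    D = (instrument_strength * Z + u > 0).astype(int)
    return Z, D

def first_stage_difference(Z, D):
    """Difference in treatment rates between Z=1 and Z=0, with its z-statistic."""
    d1, d0 = D[Z == 1], D[Z == 0]
    diff = d1.mean() - d0.mean()
    se = np.sqrt(d1.var(ddof=1) / d1.size + d0.var(ddof=1) / d0.size)
    return diff, diff / se

rng = np.random.default_rng(1234)
for label, strength in [("weak", 0.003), ("strong", 1.0)]:
    Z, D = simulate_first_stage(5000, strength, rng)
    diff, z_stat = first_stage_difference(Z, D)
    print(f"{label:>6}: difference = {diff:.4f}, z = {z_stat:.2f}")
```

A difference that is statistically indistinguishable from zero is a warning sign that standard confidence intervals for the treatment effect may be unreliable.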

## Fitting the DoubleML model

Next, we fit the DoubleML model. We begin by converting the two data sets to the DoubleMLData format.

In [18]:
dml_data_strong = dml.DoubleMLData(
    data_strong, y_col='Y', d_cols='D', 
    z_cols='Z', x_cols='X'
)
dml_data_weak = dml.DoubleMLData(
    data_weak, y_col='Y', d_cols='D', 
    z_cols='Z', x_cols='X'
)

Next, we define the nuisance estimators we will use. We will use a linear regression model for g and a logistic regression for r. We will assume that we know the true m function, as is the case in a controlled experiment such as an A/B test.
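
For reference, and as we understand the DoubleML user guide for the interactive IV model, the three nuisance functions can be written in the notation of this example as

$$
g_0(Z, X) = \mathbb{E}[Y \mid Z, X], \qquad
m_0(X) = \mathbb{P}(Z = 1 \mid X), \qquad
r_0(Z, X) = \mathbb{E}[D \mid Z, X],
$$

and the target parameter is the ratio

$$
\theta_0 = \frac{\mathbb{E}[g_0(1, X) - g_0(0, X)]}{\mathbb{E}[r_0(1, X) - r_0(0, X)]}.
$$

In the synthetic data, Z is drawn with probability 0.5 independently of X, so $m_0(X) = 0.5$. The denominator is the effect of the instrument on the treatment; when it is close to zero, as with a weak instrument, the ratio becomes unstable.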

In [19]:
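class TrueMFunction(LogisticRegression):
    # DoubleML calls predict_proba for the classifier ml_m, so we override it
    # (not predict) to return the known instrument probability of 0.5.
    def predict_proba(self, X):
        return np.full((X.shape[0], 2), 0.5)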

ml_g = LinearRegression()
ml_m = TrueMFunction()
ml_r = LogisticRegression(penalty=None)

Now, we fit the DoubleML model on the data set with a strong instrument and then print both the standard confidence interval and the robust confidence set. We see that the results are similar.

In [20]:
dml_iivm_strong = dml.DoubleMLIIVM(dml_data_strong, ml_g, ml_m, ml_r)

print("Standard confidence interval results")
print(dml_iivm_strong.fit().summary)
print("Uniform confidence set results")
print(dml_iivm_strong.robust_confset())

Standard confidence interval results
       coef   std err         t         P>|t|    2.5 %    97.5 %
D  0.951444  0.144382  6.589761  4.405342e-11  0.66846  1.234428
Uniform confidence set results
[(np.float64(0.6317710537372013), np.float64(1.206690551996729))]
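
Judging from the output above, `robust_confset()` returns a list of `(lower, upper)` tuples. The helper below, `describe_confset`, is a hypothetical convenience function (not part of DoubleML) that turns such a list into a short readable summary; it assumes nothing beyond that list-of-tuples shape.

```python
import math

def describe_confset(confset):
    """Summarize a list of (lower, upper) tuples as a readable string."""
    if not confset:
        return "empty set"
    parts = []
    for lower, upper in confset:
        if math.isinf(lower) and math.isinf(upper):
            return "whole real line (uninformative)"
        parts.append(f"[{lower:.3f}, {upper:.3f}]")
    return " U ".join(parts)

print(describe_confset([(0.6317710537372013, 1.206690551996729)]))
# prints: [0.632, 1.207]
print(describe_confset([(-math.inf, math.inf)]))
# prints: whole real line (uninformative)
```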


We now repeat the process with the weak instruments data set. In the run shown below, the standard method reports a point estimate of 2.21 with a 95% confidence interval of roughly [-5.02, 9.44]. An analyst reading this might conclude that the data bound the effect between about -5 and 9.4, and take 2.21 as the best guess. We know, however, that the true effect is equal to 1, less than half of the point estimate, and that standard intervals can have coverage far below the nominal level when the instrument is weak. On the other hand, the robust confidence set method (robust_confset) returns the whole real line. This indicates that the data do not contain enough information to make any claims about the effect of the treatment on the outcome, because the instrument is too weak.

In [21]:
ml_g = LinearRegression()
ml_m = TrueMFunction()
ml_r = LogisticRegression(penalty=None)
dml_iivm_weak = dml.DoubleMLIIVM(dml_data_weak, ml_g, ml_m, ml_r)

print("Standard confidence interval results")
print(dml_iivm_weak.fit().summary)
print("Uniform confidence set results")
print(dml_iivm_weak.robust_confset())

Standard confidence interval results
       coef   std err         t     P>|t|     2.5 %    97.5 %
D  2.210317  3.689771  0.599039  0.549147 -5.021502  9.442136
Uniform confidence set results
[(-inf, inf)]
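
To see why a confidence set that is robust to weak instruments can be unbounded, it may help to look at a stripped-down version of test inversion in the spirit of the Anderson-Rubin approach. The sketch below is not DoubleML's implementation of `robust_confset`: it ignores covariates and nuisance estimation, assumes a known instrument probability of 0.5, and only scans a finite grid of candidate effects. The names `simulate_iv` and `ar_accepted` are hypothetical. A candidate effect is kept whenever a t-test of the moment condition E[(Y - theta * D) * (Z - 0.5)] = 0 does not reject at the 5% level.

```python
import numpy as np

def simulate_iv(n_samples, true_effect, instrument_strength, rng):
    """Same design as generate_weakiv_data, without the covariate X."""
    u = rng.normal(0, 2, size=n_samples)
    Z = rng.binomial(1, 0.5, size=n_samples)
    D = (instrument_strength * Z + u > 0).astype(int)
    Y = true_effect * D + np.sign(u)
    return Y, D, Z

def ar_accepted(Y, D, Z, grid, crit=1.96):
    """Boolean mask over grid: True where H0: effect = theta is not rejected.

    Uses the moment E[(Y - theta * D) * (Z - 0.5)] = 0, which holds at the
    true effect when Z is randomized with probability 0.5.
    """
    n = Y.size
    # One row per candidate theta, one column per observation.
    psi = (Y[None, :] - grid[:, None] * D[None, :]) * (Z[None, :] - 0.5)
    t_stat = np.sqrt(n) * psi.mean(axis=1) / psi.std(axis=1, ddof=1)
    return np.abs(t_stat) <= crit

rng = np.random.default_rng(1234)
grid = np.linspace(-10, 10, 401)
for label, strength in [("strong", 1.0), ("weak", 0.003)]:
    Y, D, Z = simulate_iv(5000, 1, strength, rng)
    accepted = ar_accepted(Y, D, Z, grid)
    kept = grid[accepted]
    if kept.size:
        print(f"{label:>6}: {accepted.mean():.0%} of grid accepted, "
              f"from {kept.min():.2f} to {kept.max():.2f}")
    else:
        print(f"{label:>6}: no grid point accepted")
```

With a strong instrument, the accepted candidates form a short interval. With a weak instrument, the moment condition barely depends on the candidate effect, so the test typically cannot reject over a much larger part of the grid.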

