# Quickstart

The main class in the package is ``dualbounds.generic.DualBounds``, which computes dual bounds on a partially identified estimand of the form

$$\theta = E[f(Y(0), Y(1), X)]. $$
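To see why an estimand of this form is only partially identified, consider $f(Y(0), Y(1), X) = \mathbb{I}(Y(0) < Y(1))$, the choice used later in this quickstart. The sketch below is purely illustrative and does not use ``dualbounds``. It builds two hypothetical joint distributions of the potential outcomes that share the same marginals (the normal marginals and both couplings are assumptions made for this example) yet give different values of $\theta$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
y0 = rng.standard_normal(n)  # Y(0) ~ N(0, 1), an illustrative assumption

# Coupling A: Y(1) = Y(0) + 1, so Y(1) ~ N(1, 1) and every unit benefits.
y1_a = y0 + 1.0
# Coupling B: Y(1) = 1 - Y(0), which is also N(1, 1) marginally.
y1_b = 1.0 - y0

# theta = P(Y(0) < Y(1)) under each coupling
theta_a = np.mean(y0 < y1_a)
theta_b = np.mean(y0 < y1_b)
print(theta_a, theta_b)
```

Both couplings imply the same distributions of $Y(0)$ and $Y(1)$ taken separately, which is all the data can reveal because each unit shows only one potential outcome. Under coupling A, $\theta = 1$; under coupling B, $\theta = P(Y(0) < 0.5) \approx 0.69$. The data alone cannot distinguish the two, so only bounds on $\theta$ can be learned.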

<!--For example, when $f(Y(0), Y(1), X) = \mathbb{I}(Y(0) < Y(1))$, $\theta$ is the proportion of individuals who benefit from the treatment. Such estimands are *partially identified* because we never observe the joint law of the potential outcomes, but the data still contains information on the law of $(Y(0), X)$ and $(Y(1), X)$, allowing us to *bound* $\theta$.-->

<!--Given covariates $X \in \mathbb{R}^{n \times p}$, a treatment vector $W \in \{0,1\}^n$, an outcome vector $y \in \mathbb{R}^n$, and (optional) propensity scores $\pi \in [0,1]^n$ where $\pi_i = P(W_i = 1 \mid X_i)$, the ``DualBounds`` class performs provably valid inference on $\theta$ using one of a wide variety of machine learning models. -->

Crucially, the confidence intervals produced by ``DualBounds`` are **always** valid in randomized experiments, even if the underlying machine learning model is arbitrarily misspecified.

In [1]:
# Import packages
import sys; sys.path.insert(0, "../../")
import numpy as np
import dualbounds as db
from dualbounds.generic import DualBounds

# Generate synthetic data from a heavy-tailed linear model
data = db.gen_data.gen_regression_data(
    n=900, # Num. datapoints
    p=30, # Num. covariates
    r2=0.95, # population R^2
    tau=3, # average treatment effect
    interactions=True, # ensures treatment effect is heterogeneous
    eps_dist='laplace', # heavy-tailed residuals
    sample_seed=123, # random seed
)

# Initialize dual bounds object
dbnd = DualBounds(
    f=lambda y0, y1, x: y0 < y1,
    X=data['X'], # n x p covariate matrix
    W=data['W'], # n-length treatment vector
    y=data['y'], # n-length outcome vector
    pis=data['pis'], # n-length propensity scores (optional)
    Y_model='ridge', # model for Y | X, W
)

# Compute dual bounds and observe output
dbnd.compute_dual_bounds(
    alpha=0.05 # nominal level
)

Cross-fitting the outcome model.



Estimating optimal dual variables.



{'estimates': array([0.58374648, 0.93389944]),
 'ses': array([0.02336725, 0.01422112]),
 'cis': array([0.5379475 , 0.96177232])}

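Note that there are two estimates, a lower and an upper one, because the estimand is only partially identified. In the output, ``estimates`` holds the estimated lower and upper bounds on $\theta$, ``ses`` holds their standard errors, and ``cis`` holds the lower and upper endpoints of the confidence interval at the nominal level ``alpha``.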
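As a sanity check on how to read this output, the printed numbers are consistent with a Wald-type interval: the lower endpoint is the lower estimate minus a normal quantile times its standard error, and the upper endpoint is the upper estimate plus the same quantile times its standard error. The sketch below recomputes the interval from the printed values. The use of the two-sided quantile $z_{1-\alpha/2}$ is an inference from the output shown here, not a statement about the library's internals.

```python
from statistics import NormalDist

import numpy as np

# Values copied from the output printed above.
estimates = np.array([0.58374648, 0.93389944])
ses = np.array([0.02336725, 0.01422112])
alpha = 0.05

# Assumption: a two-sided standard normal quantile z_{1 - alpha/2}.
z = NormalDist().inv_cdf(1 - alpha / 2)

# Lower endpoint from the lower estimate, upper endpoint from the upper estimate.
cis = np.array([estimates[0] - z * ses[0], estimates[1] + z * ses[1]])
print(cis)
```

Up to rounding of the printed inputs, this matches the ``cis`` entry above.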