# Quickstart

The main class in the package is ``dualbounds.generic.DualBounds``, which computes dual bounds on a partially identified estimand of the form

$$\theta = E[f(Y(0), Y(1), X)] $$

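For example, when $f(Y(0), Y(1), X) = \mathbb{I}(Y(0) < Y(1))$, $\theta$ is the proportion of individuals who benefit from the treatment. Because $Y(0)$ and $Y(1)$ are never observed for the same individual, $\theta$ is only partially identified: the data determine a range of possible values rather than a single one.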
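
To make the estimand concrete, the sketch below simulates both potential outcomes for every unit, which is only possible with synthetic data, and evaluates $\theta$ directly. It does not use ``dualbounds``; the distributions and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical potential outcomes: the unit-level effect Y(1) - Y(0) is N(1, 1)
y0 = rng.normal(loc=0.0, scale=1.0, size=n)
y1 = y0 + rng.normal(loc=1.0, scale=1.0, size=n)

def f(y0, y1, x=None):
    """Indicator that a unit benefits from treatment."""
    return y0 < y1

# "Oracle" value of theta, computable only because both outcomes were simulated
theta_oracle = np.mean(f(y0, y1))
print(theta_oracle)  # roughly P(N(1, 1) > 0), i.e. about 0.84

# In real data only one potential outcome per unit is observed
W = rng.binomial(1, 0.5, size=n)
y_obs = np.where(W == 1, y1, y0)
```

With only `W` and `y_obs` in hand, the pairing of `y0` and `y1` is lost, so `theta_oracle` cannot be recomputed exactly; it can only be bounded.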

Given covariates $X \in \mathbb{R}^{n \times p}$, a treatment vector $W \in \{0,1\}^n$, an outcome vector $y \in \mathbb{R}^n$, and (optionally) propensity scores $\pi \in [0,1]^n$ with entries $\pi_i = P(W_i = 1 \mid X_i)$, the ``DualBounds`` class performs provably valid inference on $\theta$ using a wide variety of machine learning models, as shown below.
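
Before the full example, here is a small sketch of the expected input shapes. `check_inputs` is a hypothetical helper written for this illustration, not part of ``dualbounds``, and the data mimic a randomized experiment with known propensity scores of 0.5.

```python
import numpy as np

def check_inputs(X, W, y, pis=None):
    """Hypothetical helper: check the shapes and ranges described above."""
    X, W, y = np.asarray(X), np.asarray(W), np.asarray(y)
    if X.ndim != 2:
        raise ValueError("X must be an (n, p) array")
    n, p = X.shape
    if W.shape != (n,) or y.shape != (n,):
        raise ValueError("W and y must be length-n vectors")
    if not np.isin(W, [0, 1]).all():
        raise ValueError("W must be binary")
    if pis is not None:
        pis = np.asarray(pis)
        if pis.shape != (n,) or ((pis < 0) | (pis > 1)).any():
            raise ValueError("pis must be a length-n vector in [0, 1]")
    return n, p

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))                 # covariates
pis = np.full(200, 0.5)                       # known propensity scores
W = rng.binomial(1, pis)                      # treatment indicators
y = X[:, 0] + 2.0 * W + rng.normal(size=200)  # observed outcomes

n, p = check_inputs(X, W, y, pis)
```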

In [69]:
# Import packages
# (the sys.path line makes a local checkout importable; omit it if dualbounds is installed)
import sys
sys.path.insert(0, "../../")
import numpy as np
import dualbounds as db
from dualbounds.generic import DualBounds

# Generate synthetic data from a heavy-tailed linear model
data = db.gen_data.gen_regression_data(
    n=900, # Number of datapoints
    p=30, # Dimensionality
    r2=0.95, # population R^2
    tau=3, # average treatment effect
    interactions=True, # ensures treatment effect is heterogeneous
    eps_dist='laplace', # heavy-tailed residuals
    sample_seed=123, # random seed
)

# Initialize dual bounds object
dbnd = DualBounds(
    f=lambda y0, y1, x: y0 < y1, # indicator that the unit benefits from treatment
    X=data['X'],
    W=data['W'],
    y=data['Y'],
    pis=data['pis'], # propensity scores
    # Y_model=db.dist_reg.RidgeDistReg, # optional: model for Y | X, W
)

# Compute dual bounds and observe output
dbnd.compute_dual_bounds()

100%|████████████████████████████████████████████████████████| 900/900 [00:01<00:00, 817.92it/s]


{'estimates': array([0.57745911, 0.92498299]),
 'ses': array([0.02331765, 0.0129798 ]),
 'cis': array([0.53175736, 0.95042292])}
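
In the output above, `estimates` appears to hold the estimated lower and upper bounds on $\theta$, `ses` their standard errors, and `cis` the lower and upper confidence limits. The printed numbers are consistent with a normal-approximation interval that subtracts $z_{0.975}$ standard errors from the lower estimate and adds the same multiple to the upper one. This reading is inferred from the numbers themselves rather than from the library's documentation, so treat the sketch below as a sanity check under that assumption.

```python
from statistics import NormalDist

import numpy as np

# Values copied from the output above
estimates = np.array([0.57745911, 0.92498299])
ses = np.array([0.02331765, 0.0129798])

# Assumption: alpha = 0.05, split evenly between the two ends of the interval
alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)

# Lower limit from the lower estimate, upper limit from the upper estimate
cis = np.array([estimates[0] - z * ses[0], estimates[1] + z * ses[1]])
print(cis)
```

Read this way, the data suggest that roughly 53% to 95% of individuals benefit from the treatment, at the 95% confidence level.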