# Additional estimands

Dual bounds can also be applied to estimands beyond those covered in the previous sections.

## Variance of the CATE

Dual bounds can also be used to *lower-bound* the variance of the conditional average treatment effect $\theta = \text{Var}(E[Y(1) - Y(0) \mid X])$, as shown below.

```python
# Import packages
import sys
sys.path.insert(0, "../../../")  # notebook-specific: import the package from a relative path
import numpy as np
import dualbounds as db
from dualbounds.generic import DualBounds

# Generate synthetic data
data = db.gen_data.gen_regression_data(n=500, p=30)
```

```python
# Lower-bound Var(CATE) with the calibrated estimator
vdb = db.varcate.CalibratedVarCATEDualBounds(
    outcome=data['y'],
    treatment=data['W'],
    covariates=data['X'],
    propensities=data['pis'],
    outcome_model='elasticnet',
)
vdb.fit()
print(vdb.results().to_markdown())
```

Cross-fitting the outcome model.


Fitting cluster bootstrap to aggregate results.


|            |    Lower |   Upper |
|:-----------|---------:|--------:|
| Estimate   | 8.48493  |     nan |
| SE         | 0.748497 |     nan |
| Conf. Int. | 6.98069  |     nan |


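Because dual bounds only provide a lower bound on this estimand, the upper bound is reported as ``nan``. We broadly recommend using the class ``CalibratedVarCATEDualBounds`` instead of ``VarCATEDualBounds``. (Both have the same API, but the former will yield more powerful results.)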
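
To make the estimand concrete, the NumPy sketch below computes $\theta$ directly in a simulation where both potential outcomes are known, which is never possible with real data. The data-generating process is a hypothetical one (unrelated to ``db.gen_data``) with a discrete covariate, so the CATE is simply a group mean. The sketch also checks the law of total variance, $\text{Var}(Y(1) - Y(0)) = \text{Var}(E[Y(1) - Y(0) \mid X]) + E[\text{Var}(Y(1) - Y(0) \mid X)]$, which shows that the variance of the CATE can never exceed the variance of the ITE discussed next.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical DGP: a discrete covariate X in {0, 1, 2}; both potential
# outcomes are available only because we simulate them.
X = rng.integers(0, 3, size=n)
Y0 = X + rng.normal(size=n)
Y1 = Y0 + 2.0 * X + rng.normal(scale=1.5, size=n)
ite = Y1 - Y0  # individual treatment effects: 2X plus noise

# Oracle CATE: within-group mean of the ITE at each value of X.
cate_by_group = np.array([ite[X == k].mean() for k in range(3)])
cate = cate_by_group[X]

var_cate = cate.var()                     # theta = Var(E[Y(1) - Y(0) | X])
var_ite = ite.var()                       # Var(Y(1) - Y(0))
mean_within = np.mean((ite - cate) ** 2)  # E[Var(Y(1) - Y(0) | X)]

# Law of total variance: Var(ITE) = Var(CATE) + E[Var(ITE | X)] >= Var(CATE)
print(f"Var(CATE) = {var_cate:.3f}, Var(ITE) = {var_ite:.3f}, "
      f"sum = {var_cate + mean_within:.3f}")
```

Here $\text{Var}(E[Y(1) - Y(0) \mid X]) = \text{Var}(2X) = 8/3$, so ``var_cate`` should be close to 2.667.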

## Variance of the ITE

Dual bounds can also be used to both upper- and lower-bound the variance of the individual treatment effect (ITE) $\theta = \text{Var}(Y(1) - Y(0))$, as shown below.
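
This estimand depends on the joint distribution of $Y(1)$ and $Y(0)$, which is never observed because each unit receives only one treatment, so it is only partially identified. The NumPy sketch below does not use dualbounds and draws hypothetical marginals; it illustrates the range of possible values when only the two marginals are known and covariates are ignored. Pairing the sorted samples gives the smallest possible variance of the difference, and pairing them in opposite order gives the largest. Using covariates, as the dual bounds below do, can narrow this range.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Hypothetical marginal samples of Y(1) and Y(0). Data can reveal each marginal,
# but not how the two potential outcomes are paired within a unit.
y1 = rng.normal(loc=1.0, scale=2.0, size=n)
y0 = rng.exponential(scale=1.0, size=n)

def var_of_difference(a, b):
    """Variance (ddof=0) of a - b under the pairing a[i] <-> b[i]."""
    return np.var(a - b)

# Pairing sorted with sorted (comonotone) minimizes Var(Y(1) - Y(0)) by the
# rearrangement inequality; pairing sorted with reverse-sorted maximizes it.
lower = var_of_difference(np.sort(y1), np.sort(y0))
upper = var_of_difference(np.sort(y1), np.sort(y0)[::-1])

# Any other pairing, e.g. a random one, falls in between.
random_pairing = var_of_difference(y1, rng.permutation(y0))
print(f"lower = {lower:.3f}, random = {random_pairing:.3f}, upper = {upper:.3f}")
```

Both extremes also respect the Cauchy-Schwarz range $(\sigma_1 - \sigma_0)^2 \le \text{Var}(Y(1) - Y(0)) \le (\sigma_1 + \sigma_0)^2$, but they are typically tighter when the two marginals have different shapes.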

```python
# Upper- and lower-bound Var(ITE)
vdb = db.varite.VarITEDualBounds(
    outcome=data['y'],
    treatment=data['W'],
    covariates=data['X'],
    propensities=data['pis'],
    outcome_model='elasticnet',
)
vdb.fit()
print(vdb.results().to_markdown())
```

Cross-fitting the outcome model.


Estimating optimal dual variables.


|            |    Lower |     Upper |
|:-----------|---------:|----------:|
| Estimate   | 8.37082  | 12.8302   |
| SE         | 0.746638 |  0.818289 |
| Conf. Int. | 6.90744  | 14.434    |
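
The ``Conf. Int.`` row in the table above appears to equal the lower estimate minus $1.96 \times \text{SE}$ and the upper estimate plus $1.96 \times \text{SE}$, i.e., a normal-approximation interval placing $\alpha/2 = 0.025$ in each tail. This is inferred from the printed numbers rather than from the dualbounds documentation (the VarCATE interval above does not match this formula exactly), so treat the sketch below, with its hypothetical helper ``normal_confidence_bounds``, as a reading aid only.

```python
from statistics import NormalDist

def normal_confidence_bounds(lower, lower_se, upper, upper_se, alpha=0.05):
    """Hypothetical helper (not part of dualbounds): extend the estimated lower
    and upper bounds by z * SE, with z = Phi^{-1}(1 - alpha / 2) ~= 1.96."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return lower - z * lower_se, upper + z * upper_se

# Estimates and standard errors copied from the VarITE table above.
ci_lo, ci_hi = normal_confidence_bounds(8.37082, 0.746638, 12.8302, 0.818289)
print(f"{ci_lo:.5f}, {ci_hi:.3f}")
```

This prints ``6.90744, 14.434``, matching the table to the printed precision.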


## Lee Bounds under monotonicity

Lee bounds, named after [Lee (2009)](https://www.jstor.org/stable/40247633), are a method for bounding the average treatment effect in the presence of nonrandom, post-treatment sample selection. Precisely, we assume that we observe the following data:

- Pre-treatment covariates $X_i \in \mathcal{X}$.
- A binary treatment $W_i \in \{0,1\}$.
- A post-treatment selection indicator $S_i \in \{0,1\}$.
- An outcome $Y_i \in \mathbb{R}$.

Note that both $Y_i$ and $S_i$ have potential outcomes, $(Y_i(0), Y_i(1))$ and $(S_i(0), S_i(1))$ respectively, since both may depend on the treatment.

A classic example is a setting where $W_i$ denotes enrollment in a job training program, $S_i$ denotes whether a subject entered the labor market, and the outcome $Y_i$ denotes wages, which are only observed for subjects who entered the labor market. A natural estimand in this setting is the average treatment effect for subjects who would have entered the labor market regardless of their treatment status, i.e.,

$$E[Y(1) - Y(0) \mid S(1) = S(0) = 1]. $$
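
A simulation shows why this estimand calls for more than a naive comparison of selected units. In the hypothetical data-generating process below, a latent trait drives both labor-market entry and wages. The treated, selected group then contains subjects who entered the labor market only because of training, so comparing mean wages among selected treated and selected control subjects compares different subpopulations and, here, understates the true effect of 1.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical job-training DGP: a latent trait U drives both labor-market
# entry (selection) and wages. Here S(1) >= S(0) for every unit.
U = rng.normal(size=n)
S0 = (U > 0.5).astype(int)            # enters the labor market without training
S1 = (U > 0.0).astype(int)            # enters the labor market with training
Y0 = 10 + 2 * U + rng.normal(size=n)  # potential wages without training
Y1 = Y0 + 1.0                         # training raises wages by exactly 1
W = rng.integers(0, 2, size=n)        # randomized treatment

# Observed data; wages are only used where S == 1.
S = np.where(W == 1, S1, S0)
Y = np.where(W == 1, Y1, Y0)

# Oracle estimand: average effect among units selected under both treatments.
always_selected = (S0 == 1) & (S1 == 1)
theta = np.mean(Y1[always_selected] - Y0[always_selected])

# Naive contrast: treated-and-selected units include those with S(0)=0, S(1)=1,
# who have lower U and hence lower wages, so the comparison is biased.
naive = Y[(W == 1) & (S == 1)].mean() - Y[(W == 0) & (S == 1)].mean()
print(f"theta = {theta:.3f}, naive = {naive:.3f}")
```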

Dual bounds can be used to bound this partially identified estimand under the **monotonicity** assumption that $S(1) \ge S(0)$ a.s., as shown below.

```python
# Create data
lee_data = db.gen_data.gen_lee_bound_data(n=900, p=30, sample_seed=123)

# Fit Lee bounds
ldb = db.lee.LeeDualBounds(
    # Data
    selections=lee_data['S'],
    covariates=lee_data['X'],
    treatment=lee_data['W'],
    propensities=lee_data['pis'],
    outcome=lee_data['y'],
    # Model specifications
    outcome_model='ridge',
    selection_model='monotone_logistic',
)
ldb.fit().summary()
```

Cross-fitting the selection model.


Cross-fitting the outcome model.


Estimating optimal dual variables.


```text
___________________Inference_____________________
               Lower     Upper
Estimate    2.220620  3.155182
SE          0.191205  0.193274
Conf. Int.  1.845866  3.533992

________________Selection model__________________
                            Model  No covariates
Out-of-sample R^2        0.172740       0.000000
Accuracy                 0.703333       0.594444
Likelihood (geom. mean)  0.557705       0.508488

_________________Outcome model___________________
                      Model  No covariates
Out-of-sample R^2  0.919644       0.000000
RMSE               1.057244       3.729639
MAE                0.846142       2.954590

________________Treatment model__________________
                            Model  No covariates
Out-of-sample R^2        0.001111       0.000000
Accuracy                 0.500000       0.516667
Likelihood (geom. mean)  0.500000       0.499721
```



It is also possible to bound this estimand without assuming monotonicity by using the generic ``DualBounds`` class, although we caution that the resulting bounds may be too wide to be useful. Please see Section 2.5 of [Ji et al. (2023)](https://arxiv.org/pdf/2310.08115.pdf) for details.
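
Under monotonicity, the classic covariate-free Lee (2009) bounds have a closed form that may help build intuition. Let $q = [P(S=1 \mid W=1) - P(S=1 \mid W=0)] / P(S=1 \mid W=1)$; under monotonicity and random assignment, a fraction $q$ of the selected treated subjects are not always selected, while all selected controls are. Trimming the top (respectively bottom) fraction $q$ of the selected treated outcomes and subtracting the selected control mean gives the lower (respectively upper) bound. The NumPy sketch below implements this trimming estimator on hypothetical simulated data; it is not how ``LeeDualBounds`` computes its bounds, which also use covariates through the selection and outcome models.

```python
import numpy as np

def lee_trimming_bounds(y, w, s):
    """Classic covariate-free Lee (2009) bounds on E[Y(1) - Y(0) | S(1) = S(0) = 1],
    assuming randomized treatment and monotonicity S(1) >= S(0)."""
    p1 = s[w == 1].mean()  # P(S = 1 | W = 1)
    p0 = s[w == 0].mean()  # P(S = 1 | W = 0)
    q = (p1 - p0) / p1     # share of selected treated units not always selected
    y_treated = np.sort(y[(w == 1) & (s == 1)])
    k = int(np.floor(q * len(y_treated)))  # number of treated outcomes to trim
    control_mean = y[(w == 0) & (s == 1)].mean()
    lower = y_treated[: len(y_treated) - k].mean() - control_mean  # drop the top k
    upper = y_treated[k:].mean() - control_mean                     # drop the bottom k
    return lower, upper

# Hypothetical data satisfying monotonicity; the true estimand equals 1.
rng = np.random.default_rng(3)
n = 200_000
U = rng.normal(size=n)
W = rng.integers(0, 2, size=n)
S = np.where(W == 1, U > 0.0, U > 0.5).astype(int)
Y = 10 + 2 * U + rng.normal(size=n) + 1.0 * W

lower, upper = lee_trimming_bounds(Y, W, S)
print(f"Lee bounds: [{lower:.3f}, {upper:.3f}]")
```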