# Support restrictions

## Main idea

``dualbounds`` also allows analysts to restrict the support of $Y(1), Y(0), X$ to yield sharper partial identification bounds. That said, restricting the support of $Y(1), Y(0), X$ is a real assumption---if the assumption is false, the final bounds will *not* be valid.

In particular, the user can provide a boolean-valued function $s(Y(0), Y(1), X) \in \{0,1\}$, where the support of $(Y(0), Y(1), X)$ is assumed to be $\{(y_0, y_1, x) : s(y_0, y_1, x) = 1\}$, i.e., the set of values for which $s$ evaluates to True. Note that the function takes its arguments in the order ``(y0, y1, x)``. Below, we give some examples of support restrictions.

In [1]:
# No support restriction: this does not make any assumptions
s1 = lambda y0, y1, x: True
# Assume y0 <= y1 holds a.s. 
s2 = lambda y0, y1, x: y0 <= y1
# Assume y0 <= y1 whenever x[0] >= 0
s3 = lambda y0, y1, x: (y0 <= y1) | (x[0] < 0)
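
As a quick sanity check, a restriction can be evaluated directly on candidate values before it is handed to the library. The sketch below uses only NumPy and repeats `s2` and `s3` from above so that it is self-contained. The sample values are made up for illustration, and how ``dualbounds`` calls the function internally (for example, with scalar or array inputs) is not shown here.

```python
import numpy as np

# Restrictions from the cell above, repeated so this sketch is self-contained
s2 = lambda y0, y1, x: y0 <= y1
s3 = lambda y0, y1, x: (y0 <= y1) | (x[0] < 0)

# Illustrative (made-up) candidate values for (y0, y1)
y0 = np.array([0.0, 1.0, 2.0])
y1 = np.array([1.0, 1.0, 0.5])

# s2 allows a pair only when y0 <= y1
allowed_s2 = s2(y0, y1, x=np.array([0.3]))
print(allowed_s2)  # [ True  True False]

# s3 imposes y0 <= y1 only when x[0] >= 0 ...
allowed_s3_pos = s3(y0, y1, x=np.array([0.3]))
# ... and allows every pair when x[0] < 0
allowed_s3_neg = s3(y0, y1, x=np.array([-0.3]))
print(allowed_s3_pos, allowed_s3_neg)
```

The third pair, with `y0 = 2.0` and `y1 = 0.5`, is excluded by `s2` everywhere, but `s3` excludes it only when `x[0] >= 0`.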

Passing this function to a ``DualBounds`` or ``DeltaDualBounds`` object using the ``support_restriction`` argument will yield bounds that incorporate this structural assumption. For example, below, we show how to bound the variance $\text{Var}(Y(1) - Y(0))$ under the assumption that $Y(0) \le Y(1)$.

**Note**: the correct argument to use is the ``support_restriction`` argument, **not** the ``support`` argument (which is used to specify the marginal support of $Y$).

In [2]:
# Import packages
import sys; sys.path.insert(0, "../../../")
import dualbounds as db
from dualbounds.varite import VarITEDualBounds

# Generate synthetic data from a linear model
data = db.gen_data.gen_regression_data(
    n=500, p=30, interactions=False, tau=1, sample_seed=123
)

# Common arguments
db_args = dict(
    outcome=data['y'],
    treatment=data['W'],
    covariates=data['X'], 
    propensities=data['pis'],
    how_transform='identity',
    eps_dist='gaussian',
)
# Fit assumption-free dual bounds
vdb = VarITEDualBounds(**db_args).fit(verbose=False)
# Fit dual bounds assuming Y(0) <= Y(1)
vdb_monotone = VarITEDualBounds(
    **db_args, 
    support_restriction=lambda y0, y1, x: y0 <= y1
).fit(verbose=True, ninterp=0, grid_size=0)

Cross-fitting the outcome model.
Estimating optimal dual variables.

In [3]:
print("The assumption-free results are:")
print(vdb.results().to_markdown())
print("The results assuming monotonicity are:")
print(vdb_monotone.results().to_markdown())

The assumption-free results are:
|            |     Lower |   Upper |
|:-----------|----------:|--------:|
| Estimate   | 0         | 4.29261 |
| SE         | 0.0179071 | 0.26683 |
| Conf. Int. | 0         | 4.81558 |
The results assuming monotonicity are:
|            |      Lower |    Upper |
|:-----------|-----------:|---------:|
| Estimate   | 0          | 2.5705   |
| SE         | 0.00829492 | 0.205357 |
| Conf. Int. | 0          | 2.97299  |


## Best practices and common problems

### Ensuring the outcome model is compatible

It is important that the estimated outcome model is compatible with any assumed support restriction. For example, consider the following scenario:

- You would like to compute bounds which incorporate the monotonicity assumption $Y(0) \le Y(1)$
  
- Your outcome model predicts that the conditional average treatment effect $E[Y(1) - Y(0) \mid X]$ is negative for certain $X$.

Here, the estimated outcome model is incompatible with the monotonicity assumption. Note that this can happen even when the monotonicity assumption $Y(0) \le Y(1)$ is accurate, e.g., because the outcome model has overfit. Mathematically, an incompatible outcome model yields completely vacuous bounds (i.e., bounds from $-\infty$ to $\infty$).
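
One way to spot this kind of incompatibility is to compare the model's predicted conditional means under treatment and control. The sketch below is a minimal illustration using NumPy only. The arrays `mu0_hat` and `mu1_hat` are hypothetical predictions of $E[Y(0) \mid X]$ and $E[Y(1) \mid X]$ (simulated here), not outputs of any ``dualbounds`` function.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical predictions; in practice these would come from your fitted outcome model
mu0_hat = rng.normal(size=n)
# The simulated effect is 0.2 > 0, but estimation noise pushes some predicted effects below zero
mu1_hat = mu0_hat + 0.2 + 0.3 * rng.normal(size=n)

cate_hat = mu1_hat - mu0_hat
# Fraction of observations where the predicted CATE contradicts Y(0) <= Y(1)
frac_incompatible = np.mean(cate_hat < 0)
print(f"Predicted CATE < 0 for {100 * frac_incompatible:.1f}% of observations")
```

A nonzero fraction here means that the predictions conflict with the monotonicity assumption for some observations, even though the simulated effect is positive everywhere.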

Incompatible outcome models will not cause errors---instead, ``dualbounds`` will automatically try to force the incompatible outcome model to become compatible with the support restriction. However, this has two consequences:

- Computation speed: Forcing the outcome model to be compatible with the support restriction can be slow.
- Numerical instability: This procedure can also be numerically unstable, leading to large standard errors and loose bounds.

Thus, although it is not strictly necessary, the best solution is to ensure the outcome model is compatible with the support restriction. 

- For example, the scikit-learn ``HistGradientBoostingRegressor`` has an argument (``monotonic_cst``) which can constrain predictions to be non-decreasing in the treatment, guaranteeing that the estimated $E[Y(1) - Y(0) \mid X] \ge 0$.

- For bespoke support restrictions, we suggest that analysts implement custom outcome models wrapping the ``dist_reg.DistReg`` class.

**Very important note**: If you think your outcome model should be compatible but you are still getting numerical errors, try setting ``ninterp=0`` and ``grid_size=0`` when calling the ``DualBounds.fit()`` method. These technical arguments (described in the documentation for ``DualBounds.compute_dual_variables()``) are used to ensure validity even when the data are very heavy-tailed---however, they can sometimes cause a compatible outcome model to become incompatible.

### Large standard errors and numerical problems

If you cannot create a compatible outcome model, you may (or may not) have numerical problems and large standard errors. However, ``dualbounds`` has a few ways to address this problem:

1. Try setting ``ninterp=0`` and ``grid_size=0`` when calling the ``.fit()`` method.
2. Try increasing the values of ``nvals0`` and ``nvals1`` when calling the ``.fit()`` method.
3. Try changing the ``interp_fn`` input when calling ``.fit()``.
4. Try setting ``dual_strategy='se'`` when calling the ``.fit()`` method.
5. If the outcome variable is heavy-tailed, try transforming it to make it lighter-tailed, e.g., by using a ``np.arcsinh`` transformation. This won't necessarily change the estimand, since one can undo the transformation when specifying the estimand in the ``DualBounds`` class.
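
To illustrate the last suggestion, the sketch below applies ``np.arcsinh`` to simulated heavy-tailed outcomes and checks that ``np.sinh`` undoes it. It uses NumPy only. The simulated data are an assumption made for illustration, and the way a transformed outcome and its inverse are passed to ``DualBounds`` is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
# Heavy-tailed simulated outcomes (Student's t with 2 degrees of freedom)
y = rng.standard_t(df=2, size=10_000)

# arcsinh is roughly linear near zero and logarithmic in the tails
y_t = np.arcsinh(y)
print("max |y|:", np.abs(y).max(), " max |arcsinh(y)|:", np.abs(y_t).max())

# np.sinh inverts the transformation, so the original scale is recoverable
y_back = np.sinh(y_t)
assert np.allclose(y_back, y)

# arcsinh is strictly increasing, so orderings such as y0 <= y1 are preserved
y0, y1 = y[:-1], y[1:]
assert np.array_equal(y0 <= y1, np.arcsinh(y0) <= np.arcsinh(y1))
```

Since the transformation is strictly increasing, a monotonicity restriction such as $Y(0) \le Y(1)$ means the same thing on the transformed scale as on the original scale.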

## Summary

In sum, incorporating support restrictions can substantially sharpen partial identification bounds. However, for optimal statistical and computational performance, we recommend the following:

1. Try to ensure that the outcome model is compatible with the support restriction.
2. Always inspect the diagnostic results (via the ``.diagnostics()`` method) to see if there are major numerical problems. If so, see the section "Large standard errors and numerical problems" above.