Estimator Model: Estimating the Causal Effects

Once a causal effect expressed with the do-operator has been converted into a statistical estimand through identification, causal inference reduces to estimating that estimand. Before turning to specific estimation methods, we briefly introduce the problem setting for estimating causal effects.

Problem Setting

As introduced in causal_model, every causal structure has a corresponding DAG called a causal graph. Furthermore, each child-parent family in a DAG G represents a deterministic function


$$X_i = F_i(pa_i, \eta_i), \quad i = 1, \dots, n,$$

where $pa_i$ are the parents of $X_i$ in G and $\eta_i$ are random disturbances representing exogenous factors not present in the analysis. We call these functions the structural equation model of the causal structure. For a set of variables W that satisfies the back-door criterion (see identification), the causal effect of X on Y is given by the formula


$$P(y \mid do(x)) = \sum_w P(y \mid x, w) P(w).$$

In this case, the variables X for which the above equality holds are also said to be "conditionally ignorable given W" in the potential outcome framework. A set of variables W satisfying this condition is called an adjustment set. In the language of the structural equation model, these relations are encoded by

$$\begin{aligned} X & = F_1 (W, \epsilon),\\ Y & = F_2 (W, X, \eta). \end{aligned}$$

The estimation problems below can all be expressed in terms of this structural equation model.
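As a small worked illustration of the back-door adjustment formula, the sketch below evaluates $\sum_w P(y \mid x, w) P(w)$ for binary W, X, and Y. All probabilities are made-up numbers chosen for illustration and are not taken from YLearn. The naive conditional difference, which weights by $P(w \mid x)$ instead of $P(w)$, is computed for comparison.

```python
# Hypothetical distribution over binary (W, X, Y); every number is illustrative.
p_w = {0: 0.6, 1: 0.4}            # P(W = w)
p_x1_given_w = {0: 0.2, 1: 0.8}   # P(X = 1 | W = w): W influences the treatment
p_y1_given_xw = {                 # P(Y = 1 | X = x, W = w): W influences the outcome
    (0, 0): 0.2, (1, 0): 0.5,
    (0, 1): 0.6, (1, 1): 0.9,
}


def p_y1_do(x):
    """Back-door adjustment: sum_w P(Y=1 | x, w) P(w)."""
    return sum(p_y1_given_xw[(x, w)] * p_w[w] for w in p_w)


def p_y1_given(x):
    """Plain conditioning: sum_w P(Y=1 | x, w) P(w | x)."""
    def p_x_given_w(w):
        return p_x1_given_w[w] if x == 1 else 1.0 - p_x1_given_w[w]

    p_x = sum(p_x_given_w(w) * p_w[w] for w in p_w)
    return sum(
        p_y1_given_xw[(x, w)] * p_x_given_w(w) * p_w[w] / p_x for w in p_w
    )


adjusted_effect = p_y1_do(1) - p_y1_do(0)
naive_effect = p_y1_given(1) - p_y1_given(0)
print(f"adjusted: {adjusted_effect:.3f}, naive: {naive_effect:.3f}")
```

Because W raises both the chance of treatment and the chance of the outcome in this toy distribution, the naive difference overstates the adjusted effect.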

ATE

Specifically, one particularly important causal quantity in YLearn is the difference


$$\mathbb{E}(Y \mid do(X = x_1)) - \mathbb{E}(Y \mid do(X = x_0))$$

which is also called the average treatment effect (ATE), where Y is called the outcome and X is called the treatment. Furthermore, when conditional independence (conditional ignorability) holds given a set of variables W that potentially affect both the outcome Y and the treatment X, the ATE can be evaluated by averaging over W:

$$\mathbb{E}_W\left[\mathbb{E}(Y \mid X = x_1, W) - \mathbb{E}(Y \mid X = x_0, W)\right].$$

Using the structural equation model, we can describe the above relation as

$$\begin{aligned} X & = F_1 (W, \epsilon) \\ Y & = F_2 (X, W, \eta) \\ \text{ATE} & = \mathbb{E}\left[ F_2(x_1, W, \eta) - F_2(x_0, W, \eta)\right]. \end{aligned}$$
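To make the ATE definition concrete, the following sketch simulates a hypothetical structural equation model with a binary confounder W and compares a naive difference of conditional means with an estimate that adjusts for W. The functions `f1` and `f2`, their coefficients, and the sample size are assumptions made for illustration only. The true ATE equals 2.0 by construction.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible


def f1(w, eps):
    """Treatment X = F_1(W, eps): units with W = 1 are treated more often."""
    return 1 if eps < (0.8 if w == 1 else 0.2) else 0


def f2(x, w, eta):
    """Outcome Y = F_2(X, W, eta); the true ATE is 2.0 by construction."""
    return 2.0 * x + 3.0 * w + eta


samples = []
for _ in range(20000):
    w = int(random.random() < 0.5)
    x = f1(w, random.random())
    y = f2(x, w, random.gauss(0.0, 1.0))
    samples.append((w, x, y))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def cond_mean(x_val, w_val=None):
    """Sample mean of Y given X = x_val (and W = w_val when provided)."""
    return mean(
        y for w, x, y in samples
        if x == x_val and (w_val is None or w == w_val)
    )


# Naive contrast ignores W and is therefore confounded.
naive = cond_mean(1) - cond_mean(0)

# Adjusted contrast: average the within-stratum differences over P(w).
adjusted = 0.0
for w_val in (0, 1):
    p_w = mean(1.0 if w == w_val else 0.0 for w, _, _ in samples)
    adjusted += (cond_mean(1, w_val) - cond_mean(0, w_val)) * p_w

print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}")
```

In this setup the naive contrast absorbs part of the effect of W on Y, while the adjusted contrast should land near the true value of 2.0 up to sampling noise.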

CATE

Suppose that we assign a special role to a subset of the variables in the adjustment set W and name them **covariates** V. Then, in the structural equation model, the conditional average treatment effect (CATE, also called the heterogeneous treatment effect) is defined by

$$\begin{aligned} X & = F_1 (W, V, \epsilon) \\ Y & = F_2 (X, W, V, \eta) \\ \text{CATE} & = \mathbb{E}\left[ F_2(x_1, W, V, \eta) - F_2(x_0, W, V, \eta) \mid V = v\right]. \end{aligned}$$
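The sketch below illustrates the CATE on simulated data in which the treatment effect depends on a binary covariate V. The structural functions, coefficients, and sample size are hypothetical. For each value of V, the estimate averages the within-stratum treated/control differences over the empirical distribution of W given V. By construction the true CATE is 1.0 for V = 0 and 3.0 for V = 1.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible


def f1(w, v, eps):
    """Treatment X = F_1(W, V, eps): both W and V raise the treatment rate."""
    return 1 if eps < 0.2 + 0.4 * w + 0.2 * v else 0


def f2(x, w, v, eta):
    """Outcome Y = F_2(X, W, V, eta): the effect of X is 1 + 2 * V."""
    return (1.0 + 2.0 * v) * x + 3.0 * w + eta


rows = []
for _ in range(40000):
    w = int(random.random() < 0.5)
    v = int(random.random() < 0.5)
    x = f1(w, v, random.random())
    y = f2(x, w, v, random.gauss(0.0, 1.0))
    rows.append((w, v, x, y))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def cate(v_val):
    """Estimate E[Y(x=1) - Y(x=0) | V = v_val] by adjusting for W within V."""
    stratum = [r for r in rows if r[1] == v_val]
    total = 0.0
    for w_val in (0, 1):
        cell = [r for r in stratum if r[0] == w_val]
        p_w_given_v = len(cell) / len(stratum)
        treated = mean(y for _, _, x, y in cell if x == 1)
        control = mean(y for _, _, x, y in cell if x == 0)
        total += (treated - control) * p_w_given_v
    return total


cate_v0, cate_v1 = cate(0), cate(1)
print(f"CATE(V=0): {cate_v0:.2f}, CATE(V=1): {cate_v1:.2f}")
```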

Counterfactual

Besides causal estimands that are differences of effects, there is also the causal quantity known as the counterfactual. For this quantity, we estimate the following causal estimand:


𝔼[Y|do(x), V = v].
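One way to read this estimand is as an intervention on the structural equation model: do(x) replaces the treatment equation with the constant x, and the outcome is then averaged over the remaining randomness for units with V = v. The sketch below is a Monte Carlo illustration under assumed ingredients: the outcome function `f2`, its coefficients, and a W drawn independently of V are all hypothetical.

```python
import random

random.seed(2)  # fixed seed so the sketch is reproducible


def f2(x, w, v, eta):
    """Hypothetical outcome mechanism Y = F_2(X, W, V, eta)."""
    return (1.0 + 2.0 * v) * x + 3.0 * w + eta


def expected_outcome_do(x, v, n=20000):
    """Monte Carlo estimate of E[Y | do(X = x), V = v].

    The intervention fixes X at x, so the treatment equation is never
    evaluated; only W and eta are sampled.
    """
    total = 0.0
    for _ in range(n):
        w = int(random.random() < 0.5)   # W ~ Bernoulli(0.5), independent of V
        eta = random.gauss(0.0, 1.0)
        total += f2(x, w, v, eta)
    return total / n


y_do_1 = expected_outcome_do(x=1, v=1)  # analytic value under these assumptions: 4.5
y_do_0 = expected_outcome_do(x=0, v=1)  # analytic value under these assumptions: 1.5
print(f"E[Y|do(1), V=1] ~ {y_do_1:.2f}, E[Y|do(0), V=1] ~ {y_do_0:.2f}")
```

The difference of the two estimates recovers the CATE at V = 1 for this toy model, which links the counterfactual quantity back to the effect estimands above.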

Estimator Models

YLearn implements several estimator models for the estimation of causal effects:

est_model/approx_bound
est_model/meta
est_model/dml
est_model/dr
est_model/causal_tree
est_model/forest
est_model/iv
est_model/score

The evaluation of

$$\mathbb{E}\left[F_2(x_1, W, \eta) - F_2(x_0, W, \eta)\right]$$

for the ATE and

$$\mathbb{E}\left[F_2(x_1, W, V, \eta) - F_2(x_0, W, V, \eta) \mid V = v\right]$$

for the CATE is the task of the various estimator models in YLearn. The EstimatorModel concept in YLearn is designed for this purpose.

A typical EstimatorModel should have the following structure:

class BaseEstModel:
    """
    Base class for various estimator models.

    Parameters
    ----------
    random_state : int, default=2022
    is_discrete_treatment : bool, default=False
        Set this to True if the treatment is discrete.
    is_discrete_outcome : bool, default=False
        Set this to True if the outcome is discrete.
    categories : str, optional, default='auto'

    """
    def fit(
        self,
        data,
        outcome,
        treatment,
        **kwargs,
    ):
        """Fit the estimator model.

        Parameters
        ----------
        data : pandas.DataFrame
            The dataset used for training the model

        outcome : str or list of str
            Names of the outcome variables

        treatment : str or list of str
            Names of the treatment variables

        Returns
        -------
        instance of BaseEstModel
            The fitted estimator model.
        """

    def estimate(
        self,
        data=None,
        quantity=None,
        **kwargs
    ):
        """Estimate the causal effect.

        Parameters
        ----------
        data : pandas.DataFrame, optional
            The test data on which the estimator evaluates the causal effect.
            If data is None, the estimator evaluates all quantities on the
            training data, by default None

        quantity : str, optional
            The possible values of quantity include:
                'CATE' : the estimator will evaluate the CATE;
                'ATE' : the estimator will evaluate the ATE;
                None : the estimator will evaluate the ITE or CITE, by default None

        Returns
        -------
        ndarray
            The estimated causal effect with the type of the quantity.
        """

    def effect_nji(self, data=None, *args, **kwargs):
        """Return causal effects for all possible values of treatments.

        Parameters
        ----------
        data : pandas.DataFrame, optional
            The test data on which the estimator evaluates the causal effect.
            If data is None, the estimator evaluates all quantities on the
            training data, by default None
        """

Usage

Any EstimatorModel can be applied with the following procedure:

  1. For data given as a pandas.DataFrame, identify the names of the treatment, outcome, adjustment, and covariate columns.
  2. Pass the data, along with the names of the treatments, outcomes, adjustment set, and covariates, to the fit() method of the EstimatorModel.
  3. Call the estimate() method to apply the fitted EstimatorModel to test data.
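The three steps can be sketched end to end with a toy estimator that follows the fit()/estimate() structure shown earlier. `ToyStratifiedEstimator` is a hypothetical class written for this illustration and is not part of YLearn. It assumes a binary 0/1 treatment and discrete adjustment and covariate columns, and its `adjustment` and `covariate` keyword arguments are assumed names. The simulated data is also made up.

```python
import numpy as np
import pandas as pd


class ToyStratifiedEstimator:
    """Hypothetical estimator (not part of YLearn) for a binary 0/1 treatment
    and discrete adjustment/covariate columns."""

    def fit(self, data, outcome, treatment, adjustment=(), covariate=(), **kwargs):
        self.covariate_ = list(covariate)
        self.keys_ = list(adjustment) + self.covariate_
        # Mean outcome in every (stratum, treatment) cell, one column per treatment value.
        cell_means = (
            data.groupby(self.keys_ + [treatment])[outcome].mean().unstack(treatment)
        )
        # Stratum-level effect: E(Y | X=1, stratum) - E(Y | X=0, stratum).
        self.effects_ = (cell_means[1] - cell_means[0]).rename("effect").reset_index()
        self.train_data_ = data
        return self

    def estimate(self, data=None, quantity=None, **kwargs):
        data = self.train_data_ if data is None else data
        # Attach the fitted stratum effect to every row of the evaluation data.
        rows = data[self.keys_].merge(self.effects_, on=self.keys_, how="left")
        if quantity == "ATE":
            return np.array([rows["effect"].mean()])
        if quantity == "CATE":
            # One value per covariate group, ordered by the covariate values.
            return rows.groupby(self.covariate_)["effect"].mean().to_numpy()
        return rows["effect"].to_numpy()  # per-row effects when quantity is None


# Step 1: build a DataFrame and name the columns (simulated, illustrative data).
rng = np.random.default_rng(0)
n = 20000
w = rng.integers(0, 2, n)
v = rng.integers(0, 2, n)
x = (rng.random(n) < 0.2 + 0.4 * w + 0.2 * v).astype(int)
y = (1.0 + 2.0 * v) * x + 3.0 * w + rng.normal(size=n)
data = pd.DataFrame({"w": w, "v": v, "x": x, "y": y})

# Step 2: fit with the treatment, outcome, adjustment, and covariate names.
est = ToyStratifiedEstimator().fit(
    data, outcome="y", treatment="x", adjustment=["w"], covariate=["v"]
)

# Step 3: estimate the quantities of interest (here on the training data).
ate = est.estimate(quantity="ATE")
cate = est.estimate(quantity="CATE")
print(ate, cate)
```

The simulated effect of the treatment is 1 + 2 V, so the ATE should come out near 2 and the CATE near 1 and 3 for V = 0 and V = 1, up to sampling noise. The estimator models listed above replace this simple stratification with more capable methods behind the same style of interface.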