Once a causal effect expressed with the do-operator has been converted into the corresponding statistical estimand through identification, the task of causal inference becomes estimating that statistical estimand. Before turning to specific estimation methods, we briefly introduce the problem setting for the estimation of causal effects.
As introduced in causal_model, every causal structure has a corresponding DAG called the causal graph. Furthermore, each child-parent family in a DAG G represents a deterministic function
X_i = F_i(pa_i, η_i), i = 1, …, n,
where pa_i are the parents of X_i in G and η_i are random disturbances representing exogenous factors not present in the analysis. We call these functions the structural equation model associated with the causal structure. For a set of variables W that satisfies the back-door criterion (see identification), the causal effect of X on Y is given by the formula
P(y|do(x)) = ∑_w P(y|x, w) P(w).
In this case, a variable X for which the above equality holds is also said to be "conditionally ignorable given W" in the potential outcome framework. A set of variables W satisfying this condition is called an adjustment set. In the language of structural equation models, these relations are encoded by
X = F_1(W, ε),
Y = F_2(X, W, η).
Our problems can be expressed with the structural equation model.
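The back-door adjustment formula can be sketched on a small discrete dataset. The data frame, its column names `w`, `x`, `y`, and the helper `backdoor_adjust` are hypothetical and chosen only for illustration. The sketch assumes every (x, w) stratum is non-empty; otherwise the conditional frequency is undefined.

```python
import pandas as pd

# Hypothetical toy data: binary confounder w, binary treatment x, binary outcome y.
data = pd.DataFrame({
    "w": [0, 0, 0, 0, 1, 1, 1, 1],
    "x": [0, 0, 1, 1, 0, 1, 1, 1],
    "y": [0, 1, 1, 1, 0, 0, 1, 1],
})


def backdoor_adjust(df, x_value, y_value):
    """Estimate P(y | do(x)) = sum_w P(y | x, w) P(w) from empirical frequencies."""
    total = 0.0
    for w_value, p_w in df["w"].value_counts(normalize=True).items():
        stratum = df[(df["w"] == w_value) & (df["x"] == x_value)]
        # P(y | x, w): share of rows in the stratum with the requested outcome.
        p_y_given_xw = (stratum["y"] == y_value).mean()
        total += p_y_given_xw * p_w
    return total


# Interventional probability P(y = 1 | do(x = 1)).
p_do = backdoor_adjust(data, x_value=1, y_value=1)

# Plain conditional probability P(y = 1 | x = 1), for comparison.
p_cond = (data.loc[data["x"] == 1, "y"] == 1).mean()
```

On this toy data the two quantities differ because the treated rows over-represent the stratum w = 1, which is exactly what reweighting by P(w) corrects.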
ATE
Specifically, one particularly important causal quantity in YLearn is the difference
𝔼(Y|do(X = x_1)) − 𝔼(Y|do(X = x_0))
which is also called the average treatment effect (ATE), where Y is called the outcome and X is called the treatment. Furthermore, when conditional independence (conditional ignorability) holds given a set of variables W that potentially affect both the outcome Y and the treatment X, the ATE can be evaluated as
𝔼_W[𝔼(Y|X = x_1, W) − 𝔼(Y|X = x_0, W)].
Using the structural equation model, we can describe the above relation as
X = F_1(W, ε),
Y = F_2(X, W, η),
ATE = 𝔼[F_2(x_1, W, η) − F_2(x_0, W, η)].
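As a minimal sketch of the ATE in structural-equation form, the simulation below draws data from a pair of hypothetical structural equations and compares three numbers: the ATE computed directly from F_2, the naive difference of conditional means, and the difference adjusted for W. The functional forms, coefficients, and sample size are assumptions made for illustration and do not come from YLearn.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical structural equations (illustrative only):
#   W ~ Bernoulli(0.5)
#   X = F_1(W, eps): treatment is more likely when W = 1
#   Y = F_2(X, W, eta) = 2 * X + 3 * W + eta
w = rng.integers(0, 2, size=n)
x = (rng.random(n) < 0.2 + 0.6 * w).astype(int)
eta = rng.normal(size=n)


def f2(x, w, eta):
    return 2.0 * x + 3.0 * w + eta


y = f2(x, w, eta)

# ATE from the structural equation: E[F_2(1, W, eta) - F_2(0, W, eta)].
ate_structural = np.mean(f2(1, w, eta) - f2(0, w, eta))

# Naive difference of conditional means, confounded by W.
ate_naive = y[x == 1].mean() - y[x == 0].mean()

# Adjustment: average the within-stratum differences over the distribution of W.
ate_adjusted = 0.0
for w_value in (0, 1):
    in_stratum = w == w_value
    diff = y[in_stratum & (x == 1)].mean() - y[in_stratum & (x == 0)].mean()
    ate_adjusted += diff * in_stratum.mean()
```

In this setup the structural ATE is 2 by construction. The naive difference is biased upward because W raises both the treatment probability and the outcome, while the adjusted estimate recovers a value close to 2.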
CATE
Suppose that we assign special roles to a subset of variables in the adjustment set W and name them **covariates** V. Then, in the structural equation model, the CATE (also called the heterogeneous treatment effect) is defined by
X = F_1(W, V, ε),
Y = F_2(X, W, V, η),
CATE = 𝔼[F_2(x_1, W, V, η) − F_2(x_0, W, V, η) | V = v].
Counterfactual
Besides causal estimands that are differences of effects, there is also a causal quantity called the counterfactual. For this quantity, we estimate the following causal estimand:
𝔼[Y|do(x), V = v].
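The CATE and the counterfactual estimand can both be illustrated with a hypothetical structural function whose treatment effect depends on a binary covariate V. Everything in this sketch (the form of `f2`, the coefficients, the distributions of W, V, and η) is an assumption made for illustration. In practice F_2 is unknown and these quantities have to be estimated from data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical structural equation with a binary covariate V (illustrative only):
#   Y = F_2(X, W, V, eta) = (1 + 2 * V) * X + W + eta
w = rng.normal(size=n)
v = rng.integers(0, 2, size=n)
eta = rng.normal(size=n)


def f2(x, w, v, eta):
    return (1.0 + 2.0 * v) * x + w + eta


# CATE(v) = E[F_2(1, W, V, eta) - F_2(0, W, V, eta) | V = v].
individual_effect = f2(1, w, v, eta) - f2(0, w, v, eta)
cate = {v_value: individual_effect[v == v_value].mean() for v_value in (0, 1)}


def counterfactual_mean(x_value, v_value):
    """E[Y | do(x), V = v]: set X to x_value for every unit with V = v_value."""
    mask = v == v_value
    return f2(x_value, w[mask], v[mask], eta[mask]).mean()


cf_treated = counterfactual_mean(1, 1)
cf_control = counterfactual_mean(0, 1)
```

Here the CATE is 1 for V = 0 and 3 for V = 1, and the difference between the two counterfactual means at V = 1 reproduces the CATE at that value.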
YLearn implements several estimator models for the estimation of causal effects:
- est_model/approx_bound
- est_model/meta
- est_model/dml
- est_model/dr
- est_model/causal_tree
- est_model/forest
- est_model/iv
- est_model/score
The evaluation of
𝔼[F_2(x_1, W, η) − F_2(x_0, W, η)]
for the ATE and of
𝔼[F_2(x_1, W, V, η) − F_2(x_0, W, V, η) | V = v]
for the CATE is the task of the various estimator models in YLearn. The concept of EstimatorModel in YLearn is designed for this purpose.
A typical EstimatorModel should have the following structure:
class BaseEstModel:
    """
    Base class for the various estimator models.

    Parameters
    ----------
    random_state : int, default=2022
    is_discrete_treatment : bool, default=False
        Set this to True if the treatment is discrete.
    is_discrete_outcome : bool, default=False
        Set this to True if the outcome is discrete.
    categories : str, optional, default='auto'
    """

    def fit(
        self,
        data,
        outcome,
        treatment,
        **kwargs,
    ):
        """Fit the estimator model.

        Parameters
        ----------
        data : pandas.DataFrame
            The dataset used for training the model.
        outcome : str or list of str
            Names of the outcome variables.
        treatment : str or list of str
            Names of the treatment variables.

        Returns
        -------
        instance of BaseEstModel
            The fitted estimator model.
        """

    def estimate(
        self,
        data=None,
        quantity=None,
        **kwargs
    ):
        """Estimate the causal effect.

        Parameters
        ----------
        data : pandas.DataFrame, optional, default=None
            The test data on which the estimator evaluates the causal effect.
            If data is None, the estimator evaluates all quantities on the
            training data.
        quantity : str, optional, default=None
            The possible values of quantity include:
                'CATE' : the estimator will evaluate the CATE;
                'ATE' : the estimator will evaluate the ATE;
                None : the estimator will evaluate the ITE or CITE.

        Returns
        -------
        ndarray
            The estimated causal effect with the type of the quantity.
        """

    def effect_nji(self, data=None, *args, **kwargs):
        """Return causal effects for all possible values of treatments.

        Parameters
        ----------
        data : pandas.DataFrame, optional, default=None
            The test data on which the estimator evaluates the causal effect.
            If data is None, the estimator evaluates all quantities on the
            training data.
        """
Usage
One can apply any EstimatorModel using the following procedure:
- For data in the form of a pandas.DataFrame, find the names of the treatment, outcome, adjustment, and covariate variables.
- Call the fit() method of the EstimatorModel, passing the data along with the names of the treatments, outcomes, adjustment set, and covariates.
- Call the estimate() method to use the fitted EstimatorModel on test data.
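The procedure above can be sketched with a toy estimator that mimics the fit()/estimate() interface. `StratifiedEstimator` is hypothetical and is not part of YLearn. The `adjustment` keyword, the column names, and the simulated data are likewise assumptions made for illustration. The sketch handles only a binary treatment coded 0/1 and a discrete adjustment variable.

```python
import numpy as np
import pandas as pd


class StratifiedEstimator:
    """Hypothetical toy estimator (not part of YLearn) with a fit/estimate interface."""

    def fit(self, data, outcome, treatment, adjustment, **kwargs):
        self.adjustment = adjustment
        self._train = data
        # Mean outcome per (adjustment value, treatment value) cell.
        means = data.groupby([adjustment, treatment])[outcome].mean().unstack(treatment)
        # Per-stratum effect: E[Y | X = 1, w] - E[Y | X = 0, w].
        self.effect_by_stratum_ = means[1] - means[0]
        return self

    def estimate(self, data=None, quantity=None, **kwargs):
        data = self._train if data is None else data
        # Look up the stratum effect for every row of the evaluation data.
        effects = data[self.adjustment].map(self.effect_by_stratum_).to_numpy()
        if quantity == "ATE":
            return effects.mean()
        return effects


# Simulated data with a true treatment effect of 2 (illustrative only).
rng = np.random.default_rng(0)
n = 10_000
w = rng.integers(0, 2, size=n)
x = (rng.random(n) < 0.2 + 0.6 * w).astype(int)
y = 2.0 * x + 3.0 * w + rng.normal(size=n)
data = pd.DataFrame({"w": w, "x": x, "y": y})
train, test = data.iloc[:8000], data.iloc[8000:]

# Step 1: identify the names of the outcome, treatment, and adjustment columns.
outcome, treatment, adjustment = "y", "x", "w"

# Step 2: fit the estimator on the training data.
model = StratifiedEstimator().fit(
    train, outcome=outcome, treatment=treatment, adjustment=adjustment
)

# Step 3: estimate the causal effect on the test data.
ate = model.estimate(data=test, quantity="ATE")
per_row_effects = model.estimate(data=test)
```

The estimators shipped with YLearn accept their own sets of arguments, so the exact keywords should be checked against the documentation of the estimator in use.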