# Basic definitions


When considering causal effect estimation methods, we usually face an experimental setting in which we have collected a sample of $n$ instances $S = \{(X_i; T_i; Y_i)\}_{i=1}^{n}$. For each unit $i$, the sample contains its $d$ measured covariates $X_i \in \mathbb{R}^d$, $X_i = (X_{i1}; \dots; X_{id})$, its treatment vector $T_i \in \{0; 1\}^k$ indicating which of the $k$ treatments was given, and its one observed outcome $Y_i \in \mathbb{R}$.

The treatment variable $T_i$ defines what treatment has been applied to a given unit $i$. 

This section is heavily based on Franz (2019) **"A Systematic Review of Machine Learning Estimators for Causal Effect"**, [here](https://justcause.readthedocs.io/en/latest/_downloads/e054f7a0fc9cf9e680173600cb4b4350/thesis-mfranz.pdf). 






<img src="http://drive.google.com/uc?export=view&id=16B2lbld6ABBvHYRucHhnZN1mOnDngcE0" width=75%>

From Franz (2019) **"A Systematic Review of Machine Learning Estimators for Causal Effect"**, [here](https://justcause.readthedocs.io/en/latest/_downloads/e054f7a0fc9cf9e680173600cb4b4350/thesis-mfranz.pdf). 

## 2 Treatment Effects

**Individual treatment effect (ITE)** is defined as: 

$$τ_i := Y_i(1) − Y_i(0)$$

Notice that this is the same as in the previous notebook but with a different formulation.



**Average treatment effect (ATE)** is defined as: 

$$τ := \mathbb{E}[Y_i(1) − Y_i(0)]$$

and for a finite sample population of $n$ units as 

$$τ := \dfrac{1}{n} \sum_{i=1}^{n} (Y_i(1) − Y_i(0))$$

Another formulation, useful for considering the treatment effect in a subpopulation
(e.g. only females, only people aged 34, ...), is the Conditional Average Treatment
Effect (CATE).

**Conditional Average Treatment Effect (CATE)** of a subsample $\{(Y_i(1); Y_i(0); X_i; T_i)| X_i = x\}$ or the corresponding distribution $P(Y (0); Y (1); T|X = x)$ is defined as: 

$$ τ(x) := \mathbb{E}[τ_i | X_i = x] = \mathbb{E}[Y_i(1) − Y_i(0) | X_i = x]$$
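The three definitions can be made concrete with a minimal sketch. The six units, their potential outcomes and the binary covariate `x` are invented for illustration; in real data both potential outcomes of a unit are never observed together.

```python
import numpy as np

# Hypothetical potential outcomes for six units (illustrative values only).
y1 = np.array([5.0, 6.0, 7.0, 3.0, 2.0, 4.0])  # Y_i(1)
y0 = np.array([4.0, 4.0, 4.0, 3.0, 3.0, 3.0])  # Y_i(0)
x = np.array([1, 1, 1, 0, 0, 0])               # one binary covariate X_i

# Individual treatment effects: tau_i = Y_i(1) - Y_i(0)
ite = y1 - y0

# Finite-sample average treatment effect: mean of the ITEs
ate = float(ite.mean())

# Conditional average treatment effect for each covariate value x
cate = {int(v): float(ite[x == v].mean()) for v in np.unique(x)}

print("ITE:", ite)
print("ATE:", ate)
print("CATE:", cate)
```

Here the ATE is 1.0, while the CATE is 2.0 for the stratum `x = 1` and 0.0 for `x = 0`, so the average hides a difference between the two strata.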




There is a **difference** between **average treatment effects** and
**individual treatment effects** in the **presence of heterogeneous treatment effects**. Treatment heterogeneity is the presence of **subgroups in the population** that **react differently to the same treatment**. For example, a given drug might be more effective for some patients
due to genetic factors. Especially in the social and political sciences and in econometrics, we can rarely assume treatment effect homogeneity.

Choosing to consider the **Average Treatment Effect (ATE)** amounts to **ignoring**
the presence of **varying subgroups**. Consider a drug that is wildly effective for some
individuals, but has small negative effects on the desired outcome (e.g.
health) for most others. If the large benefits outweigh the small harms on average, the ATE is positive, which would
result in the recommendation to prescribe the drug as often as possible. It becomes
clear that for an individual medical recommendation it is reasonable to consider the Individual Treatment Effect (ITE), or at least a Conditional Average Treatment Effect
(CATE) on a specific stratum.
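As a worked illustration with made-up numbers (an assumption, not taken from the source): suppose 10% of patients respond strongly with an individual effect of $+20$, while the remaining 90% have an individual effect of $-1$. The average treatment effect is then

$$\tau = 0.1 \cdot 20 + 0.9 \cdot (-1) = 2 - 0.9 = 1.1 > 0$$

The ATE is positive even though nine out of ten patients are harmed. The CATEs of the two strata, $\tau(\text{responder}) = 20$ and $\tau(\text{non-responder}) = -1$, reveal what the average hides.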

## 3 Assignment Mechanisms

An essential factor in determining whether or not a treatment effect is tractable
given data is the assignment mechanism. The **assignment mechanism** is a function of the covariates $X_i$ and the potential outcomes $Y_i(1)$; $Y_i(0)$ and
determines the **probability of treatment**.


**Assignment Mechanism**: An assignment mechanism is a function
$P(T_i | X_i; Y_i(0); Y_i(1))$ mapping **features** and **potential outcomes** to a **probability of treatment** in $[0; 1]$.


**Individualistic Assignment**: An assignment mechanism is individualistic if the **probability** that a **unit is assigned treatment** does **not depend** on the **covariates or potential outcomes of other units**.

**Probabilistic Assignment**: An assignment mechanism is probabilistic if the probability that a unit $i$ is assigned treatment is strictly between zero
and one:


$$ 0 < P(T_i | X_i; Y_i(0); Y_i(1)) < 1 \quad \text{for all } i = 1, \dots, n$$


**Unconfounded Assignment**: An assignment mechanism is unconfounded if it does **not depend** on the **potential outcomes**. That is to say, 

$$P(T_i | X_i; Y_i(0); Y_i(1)) = P(T_i | X_i)$$

Given the covariates, the treatment is then as good as randomly assigned.
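The properties above can be sketched with two hypothetical assignment mechanisms for a single binary treatment. The function names, the logistic form and all coefficients are assumptions chosen for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unconfounded_propensity(x, y0, y1):
    """Hypothetical P(T_i = 1 | X_i, Y_i(0), Y_i(1)) that uses only X_i."""
    return sigmoid(0.8 * x)

def confounded_propensity(x, y0, y1):
    """Hypothetical mechanism that also depends on the potential outcome Y_i(0)."""
    return sigmoid(0.8 * x - 1.5 * y0)

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)    # one covariate per unit
y0 = rng.normal(size=n)   # Y_i(0)
y1 = y0 + 1.0             # Y_i(1)

p_unc = unconfounded_propensity(x, y0, y1)
p_con = confounded_propensity(x, y0, y1)

# Probabilistic: every unit's treatment probability is strictly inside (0, 1).
is_probabilistic = bool(np.all((p_unc > 0) & (p_unc < 1)))

# Individualistic: each unit's treatment is drawn from its own probability only.
t = rng.binomial(1, p_unc)

# Unconfounded: shifting the potential outcomes leaves the probabilities unchanged.
unc_unchanged = bool(np.allclose(unconfounded_propensity(x, y0 + 5, y1 + 5), p_unc))
con_unchanged = bool(np.allclose(confounded_propensity(x, y0 + 5, y1 + 5), p_con))

print(is_probabilistic, unc_unchanged, con_unchanged)
```

Only the first mechanism passes the unconfoundedness check, because its output ignores `y0` and `y1` entirely.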

## 4 Randomised vs observational data

Randomised trials differ from non-randomised ones in two ways [Rosenbaum and Rubin (1983)](https://academic.oup.com/biomet/article/70/1/41/240879): 

- The **assignment mechanism** is **known**, because it is usually chosen by the researcher, and the treatment assignment is probabilistic (the definition of probabilistic assignment above is met).


- The **treatment assignment** and the **potential outcomes** are **conditionally independent** given the covariates. Formally, $(Y (1); Y (0)) \perp T | X$. In other words, the assignment of treatment only depends on observed covariates. We say that treatment assignment is ignorable (the unconfounded assignment condition above is met).


**Additional, related definitions**:

**Conditional Independence:** Given random variables $X, Y, Z$, we
say $X$ is independent of $Y$ conditioned on $Z$, written $X \perp Y | Z$, when
$$P(X = x; Y = y | Z = z) = P(X = x | Z = z) \cdot P(Y = y | Z = z)$$   


or equivalently:

$$P(Y = y | X = x; Z = z) = P(Y = y | Z = z) \quad \text{whenever } P(X = x; Z = z) > 0$$
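The factorisation can be checked numerically on a small discrete distribution. The sketch below builds a joint table over three binary variables from invented conditional probabilities and tests whether $P(X; Y | Z)$ equals the product of its conditional marginals; the helper name `is_cond_independent` is hypothetical, and the check assumes $P(Z = z) > 0$ for every $z$.

```python
import numpy as np

# Illustrative probabilities (assumed values).
p_z = np.array([0.4, 0.6])                 # P(Z = z)
p_x_given_z = np.array([[0.2, 0.7],        # P(X = x | Z = z): rows x, columns z
                        [0.8, 0.3]])
p_y_given_z = np.array([[0.5, 0.1],        # P(Y = y | Z = z): rows y, columns z
                        [0.5, 0.9]])

# joint[x, y, z] = P(X=x | Z=z) * P(Y=y | Z=z) * P(Z=z), independent by construction
joint = np.einsum("xz,yz,z->xyz", p_x_given_z, p_y_given_z, p_z)

def is_cond_independent(joint, tol=1e-12):
    """Check P(X, Y | Z) == P(X | Z) * P(Y | Z) for a table indexed [x, y, z]."""
    pz = joint.sum(axis=(0, 1))            # P(Z = z), assumed positive
    p_xy_z = joint / pz                    # P(X = x, Y = y | Z = z)
    p_x_z = p_xy_z.sum(axis=1)             # P(X = x | Z = z)
    p_y_z = p_xy_z.sum(axis=0)             # P(Y = y | Z = z)
    product = p_x_z[:, None, :] * p_y_z[None, :, :]
    return bool(np.allclose(p_xy_z, product, atol=tol))

# Move probability mass inside the Z = 0 slice so that X and Y become dependent
# there, while the conditional marginals stay the same.
dependent = joint.copy()
dependent[0, 0, 0] += 0.02
dependent[1, 1, 0] += 0.02
dependent[0, 1, 0] -= 0.02
dependent[1, 0, 0] -= 0.02

print(is_cond_independent(joint))      # True
print(is_cond_independent(dependent))  # False
```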



**Strong Ignorability:** Treatment Assignment is strongly ignorable if


$$(Y (1); Y (0)) \perp T | X \quad \text{and} \quad 0 < P(T = 1 | X) < 1$$

Under these assumptions treatment effects can be identified. That is to say, we can
use the presence of multiple units to make up for the missing information about the
unobserved potential outcome.

## 5 Traditional assumptions to identify causes

The task of estimating the ITE from data is often
preceded by the task of identification. **Identification** establishes **whether
a given counterfactual quantity (e.g. a treatment effect) can be calculated** given
only factual data.

In the simplified setting of the potential outcome framework, **assumptions** are **made**
to **enable identification**. These assumptions are reasonable in randomised controlled trials (RCTs), but remain questionable in observational settings:



**Stable Unit Treatment Value Assumption (SUTVA)**. The SUTVA
combines two aspects that make working with causal data easier, namely:
- no interference between units 
- well defined treatment levels (i.e. there is no half-treatment or treatment with
decreased efficacy).
<br>
<br>
Formally, we can write 

$$Y_i = T_iY_i(1) + (1 − T_i)Y_i(0)$$

to capture these assumptions.
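As a minimal sketch with made-up values, the formula selects, for every unit, the potential outcome that matches the treatment it actually received:

```python
import numpy as np

# Illustrative potential outcomes and treatment indicators for four units.
y0 = np.array([3.0, 1.0, 4.0, 2.0])   # Y_i(0)
y1 = np.array([5.0, 1.5, 3.0, 6.0])   # Y_i(1)
t = np.array([1, 0, 0, 1])            # T_i

# Observed outcome: Y_i = T_i * Y_i(1) + (1 - T_i) * Y_i(0)
y_obs = t * y1 + (1 - t) * y0

print(y_obs)  # [5. 1. 4. 6.]
```

Note that `y_obs` for unit $i$ depends only on that unit's own treatment `t[i]`, which is exactly the no-interference part of the assumption.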


**Unconfoundedness**. Unconfoundedness is identical to ignorability: 


$$(Y (1); Y (0)) \perp T | X$$


In words, the potential outcomes are independent of the treatment given the covariates.


**Overlap**: The overlap assumption makes up the second part
of strong ignorability. It states that for all instances the probability of treatment
must be strictly between zero and one. Formally, 

$$0 < P(T = 1 | X) < 1$$

Under unconfoundedness, this coincides with the definition of **Probabilistic Assignment** above.











**When is the Unconfoundedness Assumption questionable?** Consider an example
where the data is collected in a hospital setting. We are considering the administration of a given treatment, but the treatment assignment to individuals is not random
(i.e. we are merely observing the operational processes in the hospital). Thus the
assignment of the treatment to a patient might depend on some insight or knowledge
the doctor has about the patient, which we do not observe in the covariate vector
$X_i$. In this case, the treatment assignment is confounded and our methods will not
work properly. If, however, we were to measure whatever feature the doctor was
considering for their decision, then the treatment would be unconfounded conditional
on the covariate vector.
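The hospital example can be simulated under invented assumptions: a binary severity indicator `u` that the doctor sees, a true treatment effect of 2 for every patient, and treatment probabilities of 0.8 for severe and 0.2 for mild cases. None of these numbers come from the source. The sketch compares the naive difference in means, which ignores `u`, with an estimate that stratifies on `u` once it is measured.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

# Severity known to the doctor (1 = severe); hidden from us unless measured.
u = rng.binomial(1, 0.5, size=n)

# Doctors treat severe patients more often (assumed probabilities).
p_treat = np.where(u == 1, 0.8, 0.2)
t = rng.binomial(1, p_treat)

# Severe patients have worse outcomes; the true effect is +2 for everyone.
y0 = 10.0 - 4.0 * u + rng.normal(size=n)
y1 = y0 + 2.0
y = t * y1 + (1 - t) * y0

# Naive estimate: difference in observed means, ignoring severity.
naive = y[t == 1].mean() - y[t == 0].mean()

# Adjusted estimate: difference in means within each severity stratum,
# weighted by the share of each stratum in the sample.
adjusted = sum(
    (u == v).mean()
    * (y[(t == 1) & (u == v)].mean() - y[(t == 0) & (u == v)].mean())
    for v in (0, 1)
)

print(f"naive:    {naive:.2f}")
print(f"adjusted: {adjusted:.2f}")
```

In this setup the naive estimate comes out negative (about $-0.4$ in expectation), suggesting the treatment harms patients, while the stratified estimate recovers a value close to the true effect of 2.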

**When is the Stable Unit Treatment Value Assumption questionable?** Consider for
example a setting in which the individuals in both treatment and control group are
in contact with each other. In this case the positive effect of treatment might spill
over to other individuals in the control group (e.g. the improved mood of the treated
affects the mood of the control group).

**When is the Overlap Assumption questionable?** In a case where there is a prohibitive factor for receiving a treatment, the overlap assumption is questionable.
For example, if pregnant women are not allowed to take a drug, their probability of
assignment is exactly zero and the overlap assumption does not hold. In such a case,
provided the prohibitive factors are known, the data can be trimmed in order to enforce
the overlap assumption. That would mean we do not consider pregnant women at
all in our study.
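Trimming on a known prohibitive factor can be sketched as follows. The data, the 10% pregnancy rate and the propensity of 0.5 for everyone else are assumptions for illustration; in practice the propensity is usually unknown and has to be estimated.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Hypothetical covariate: 1 = pregnant, 0 = not pregnant.
pregnant = rng.binomial(1, 0.1, size=n)

# Assumed propensity: pregnant women never receive the drug.
propensity = np.where(pregnant == 1, 0.0, 0.5)
t = rng.binomial(1, propensity)

def has_overlap(p):
    """True if every treatment probability is strictly between 0 and 1."""
    return bool(np.all((p > 0) & (p < 1)))

overlap_before = has_overlap(propensity)

# Trim: drop all units for which the prohibitive factor applies.
keep = pregnant == 0
overlap_after = has_overlap(propensity[keep])

print(overlap_before, overlap_after)  # False True
```

Any conclusions drawn from the trimmed sample then apply only to the retained population, here non-pregnant individuals.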

## 6 Causal inference as a missing data problem


We can also see the problem of causal inference as one of missing data. [Li and Ding (2013)](https://arxiv.org/abs/1712.06170) discuss how the formalisation of
missing data problems closely matches the causal inference notation and methods introduced in the Potential Outcomes framework. 

We are trying to reconstruct two distributions from incomplete
data. The goal is to know the distribution of $Y(1)$ and $Y(0)$ over all covariates
that describe a unit. 

The Fundamental Problem of Causal Inference now makes this problem essentially a missing data problem. For any individual, we only ever observe either $Y_i(1)$ or $Y_i(0)$, thus missing the respective counterpart. The treatment indicator in the treatment effect case tells us which of the two distributions we observe. 

Analogously, it tells us which data is missing. The distribution of this
missingness is what we aim to constrain with our assumptions. Having **ignorable
treatment assignment** means that the distribution of the missingness does not depend on the outcomes once we know all the covariates. Without the Unconfoundedness Assumption, the missingness might depend on the value of the outcome, such that, for example, higher outcomes are more likely to be missing.
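The missing-data view can be made explicit with a small sketch using invented observations: starting from the observed outcome and the treatment indicator, we lay out both potential-outcome columns and mark the unobserved one as `NaN`.

```python
import numpy as np

# Illustrative factual data for four units.
y_obs = np.array([5.0, 1.0, 4.0, 6.0])   # observed outcome Y_i
t = np.array([1, 0, 0, 1])               # treatment indicator T_i

# The treatment indicator decides which potential outcome is observed;
# the other one is missing for that unit.
y1 = np.where(t == 1, y_obs, np.nan)     # Y_i(1), known only for treated units
y0 = np.where(t == 0, y_obs, np.nan)     # Y_i(0), known only for control units

# Columns: T_i, Y_i(0), Y_i(1)
table = np.column_stack([t, y0, y1])
missing_per_unit = np.isnan(table[:, 1:]).sum(axis=1)

print(table)
print(missing_per_unit)  # [1 1 1 1]
```

Every row has exactly one missing entry, which is the Fundamental Problem of Causal Inference written as a data table.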