# Mutual information and causal influence

To relate causal influence to mutual information, consider two random variables $X$ and $Y$, where $X$ is said to be causally sufficient for $Y$ if there exists an $x \in X$ for which there is a change in $Y$. Note that this can only occur if there is a direct relation between $X$ and $Y$, i.e. $X$ and $Y$ form a Markov chain:

$$X \rightarrow Y.$$

$X$ and $Y$ have corresponding distributions $P(X)$ and $P(Y)$ respectively. We denote the causal influence of $X$ on $Y$ as

$$ C_{X \rightarrow Y} =  P( Y | \neg do(X)) - P(Y | do(X))$$ 
where $do(X)$ indicates an intervention on $X$ that changes $P(X)$ to $P'(X) = P(X | Z = z)$ for some external variable $Z$. Alternatively, one can interpret the do-operator as affecting the strength of the arrow; in the extreme case this would result in a removal of the arrow, i.e. $P(Y | do(X)) = P(Y)$.

**Theorem**. The causal influence in the Markov chain $X \rightarrow Y$ is equal to the mutual information between $X$ and $Y$, i.e. $C_{X \rightarrow Y} = I(X;Y)$.

__Proof__: 
A change in the distribution of $X$ will have an effect on the distribution of $Y$. Therefore, we can quantify the causal effect of $X$ on $Y$ by how much the distribution of $Y$ differs for each $x \in X$. The difference between $P(Y | do(X)) = P(Y)$ and $P(Y | \neg do(X)) = P(Y | X)$ can be captured by the Kullback-Leibler divergence of $P(Y | X)$ and $P(Y)$.


# Postulates for causal strength
Let $G$ be a causal DAG with nodes $X_1, \dots, X_n$, where $X_i \rightarrow X_j$ means that $X_i$ influences $X_j$ directly, in the sense that intervening on $X_i$ changes the distribution of $X_j$ even if all the other variables are held constant (also by interventions). Let $PA_j$ denote the set of parent variables of $X_j$ in $G$, i.e. its direct causes. The joint probability factorizes into

$$ P(x_1, \cdots, x_n) = \prod_{j = 1} ^ n P(x_j | pa_j)$$
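
The factorization above can be illustrated on a small example. The following is a minimal sketch, assuming a hypothetical three-node DAG ($X_1 \rightarrow X_2$, $X_1 \rightarrow X_3$, $X_2 \rightarrow X_3$) with binary variables; the conditional probabilities and the names `parents`, `cpt`, and `joint_probability` are made up for illustration and are not part of the notes.

```python
from itertools import product

# Hypothetical DAG (assumption): X1 -> X2, X1 -> X3, X2 -> X3, all binary.
parents = {"X1": (), "X2": ("X1",), "X3": ("X1", "X2")}

# cpt[node][parent values] = P(node = 1 | parents); illustrative numbers only.
cpt = {
    "X1": {(): 0.3},
    "X2": {(0,): 0.2, (1,): 0.7},
    "X3": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9},
}


def conditional(node, value, assignment):
    """Return P(node = value | pa_node) for the parent values in `assignment`."""
    pa_values = tuple(assignment[p] for p in parents[node])
    p_one = cpt[node][pa_values]
    return p_one if value == 1 else 1.0 - p_one


def joint_probability(assignment):
    """Multiply the local conditionals P(x_j | pa_j) over all nodes."""
    prob = 1.0
    for node in parents:
        prob *= conditional(node, assignment[node], assignment)
    return prob


nodes = list(parents)
joint = {
    values: joint_probability(dict(zip(nodes, values)))
    for values in product((0, 1), repeat=len(nodes))
}

# The product of conditionals defines a proper joint distribution.
total = sum(joint.values())
print(f"sum over all assignments = {total:.6f}")
```

Each entry of `joint` is the product of one conditional per node, so the table is fully determined by the DAG and the local conditionals.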

The causal strength $C_S$ of a set of arrows $S$ is supposed to measure the impact of an intervention that removes the arrows in $S$.

- P0: Causal Markov condition. If $C_S = 0$, then the joint distribution satisfies the Markov condition with respect to the DAG $G_S$ obtained by removing the arrows in $S$.

For a fixed $x \in X$, the Kullback-Leibler divergence used in the proof of the theorem above is

$$
\begin{aligned}
D_{KL} ( P(Y | X = x) || P(Y) ) &= E_{Y | X = x} \left[ \log \left( \frac{ P(Y | X = x) }{ P(Y) } \right) \right] \\
                                &= \sum_{y \in Y} P(y | X = x) \log \left( \frac{P(y | X = x)}{P(y)} \right)
\end{aligned}
$$

Taking the expected value over $X$ then gives

$$
\begin{aligned}
C_{X \rightarrow Y} = E_X [D_{KL} ( P(Y | X = x) || P(Y) )] &= \sum_{x \in X} P(x) \sum_{y \in Y} P(y | x) \log \frac{P(y | x)}{P(y)} \\
                                      &= \sum_{x \in X} P(x) \sum_{y \in Y} P(y | x) \log P(y | x) - \sum_{y \in Y} \log P(y) \underbrace{ \sum_{x \in X} P(x) P(y | x)}_{P(y)} \\
                                      &= H(Y) - H(Y | X) \\
                                      &= I(X; Y) \quad \blacksquare
\end{aligned}
$$
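
The identity $E_X [D_{KL} ( P(Y | X = x) || P(Y) )] = H(Y) - H(Y | X)$ can also be checked numerically. The sketch below assumes a hypothetical joint distribution over two binary variables (the numbers are illustrative only) and uses natural logarithms, so all quantities are in nats.

```python
import math

# Hypothetical joint distribution P(x, y) over binary X and Y (assumption).
joint = {
    (0, 0): 0.4, (0, 1): 0.1,
    (1, 0): 0.2, (1, 1): 0.3,
}

xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

# Marginals P(x), P(y) and the conditional P(y | x), keyed as (y, x).
p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}
p_y_given_x = {(y, x): joint[(x, y)] / p_x[x] for x in xs for y in ys}


def kl_given_x(x):
    """D_KL(P(Y | X = x) || P(Y)) for one fixed value x."""
    return sum(
        p_y_given_x[(y, x)] * math.log(p_y_given_x[(y, x)] / p_y[y])
        for y in ys
        if p_y_given_x[(y, x)] > 0
    )


# Left-hand side: expectation of the divergence over P(x).
expected_kl = sum(p_x[x] * kl_given_x(x) for x in xs)

# Right-hand side: H(Y) - H(Y | X).
h_y = -sum(p * math.log(p) for p in p_y.values() if p > 0)
h_y_given_x = -sum(
    joint[(x, y)] * math.log(p_y_given_x[(y, x)])
    for x in xs
    for y in ys
    if joint[(x, y)] > 0
)
mutual_information = h_y - h_y_given_x

print(f"E_X[D_KL]       = {expected_kl:.6f}")
print(f"H(Y) - H(Y | X) = {mutual_information:.6f}")
```

The two printed values should agree up to floating-point error for any valid joint distribution, not only for the one chosen here.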


# Inferred causation (Pearl)

Definition: A causal structure of a set of variables $V$ is a directed acyclic graph (DAG) in which each node corresponds to a distinct element of $V$, and each link represents a direct functional relationship among the corresponding variables.

Definition: A causal model is a pair $M = \langle D, \Theta_D \rangle$ consisting of a causal structure $D$ and a set of parameters $\Theta_D$ compatible with $D$. The parameters $\Theta_D$ assign a function $x_i = f_i (pa_i, u_i)$ to each $X_i \in V$ and a probability measure $P(u_i)$ to each $u_i$, where $PA_i$ are the parents of $X_i$ in $D$ and where each $U_i$ is a random disturbance distributed according to $P(u_i)$, independently of all other $u$.

Interpretation: a causal model consists of a structure (the interactions between agents) and a rule for how the agents interact, based on nearest-neighbor interactions. Causal structure is thus a physical relation between agents, i.e. there must be some form of energy exchange between agents. Additionally, the model includes arbitrary disturbances through the $u$ parameters. In the case of no noise, the model reduces to a deterministic system.
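
The definition of a causal model can be made concrete with a small simulation. The following is a minimal sketch, assuming a hypothetical chain $X_1 \rightarrow X_2 \rightarrow X_3$ with linear functions $f_i$ and independent Gaussian disturbances $u_i$; the structure, the functions, and the names `parents`, `functions`, and `sample` are illustrative assumptions, not taken from Pearl's text.

```python
import random

# Hypothetical causal structure D (assumption): X1 -> X2 -> X3.
parents = {"X1": (), "X2": ("X1",), "X3": ("X2",)}

# Hypothetical parameters Theta_D: one function x_i = f_i(pa_i, u_i) per node.
functions = {
    "X1": lambda pa, u: 1.0 + u,
    "X2": lambda pa, u: 2.0 * pa["X1"] + u,
    "X3": lambda pa, u: pa["X2"] - 1.0 + u,
}

# Nodes listed so that every parent precedes its children.
topological_order = ("X1", "X2", "X3")


def sample(noise_scale, rng):
    """Draw one joint sample; each u_i is drawn independently of the others."""
    values = {}
    for node in topological_order:
        # With noise_scale == 0 every disturbance is exactly zero.
        u = rng.gauss(0.0, noise_scale) if noise_scale > 0 else 0.0
        pa = {p: values[p] for p in parents[node]}
        values[node] = functions[node](pa, u)
    return values


rng = random.Random(0)
noisy_a = sample(1.0, rng)
noisy_b = sample(1.0, rng)
noiseless_a = sample(0.0, rng)
noiseless_b = sample(0.0, rng)

print("noisy:    ", noisy_a, noisy_b)
print("noiseless:", noiseless_a, noiseless_b)
```

With `noise_scale = 0` every call returns the same values, which matches the remark that the model reduces to a deterministic system when there is no noise.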

Definition: A variable $X$ is said to have a causal influence on a variable $Y$ if a directed path from $X$ to $Y$ exists in every minimal structure consistent with the data.
