# Causality Tutorial Exercises – R

Contributors: Rune Christiansen, Jonas Peters, Niklas Pfister, Sorawit Saengkyongam, Sebastian Weichwald.
The MIT License applies; copyright is with the authors.
Some exercises are adapted from "Elements of Causal Inference: Foundations and Learning Algorithms" by J. Peters, D. Janzing and B. Schölkopf.


# Exercise 1 – Structural Causal Model

Let's first draw a sample from an SCM.

In [None]:
set.seed(1)

n <- 200
C <- rnorm(n)
A <- 0.8*rnorm(n)
K <- A + 0.1*rnorm(n)
X <- C - 2*A + 0.2*rnorm(n)
F <- 3*X + 0.8*rnorm(n)
D <- -2*X + 0.5*rnorm(n)
G <- D + 0.5*rnorm(n)
Y <- 2*K - D + 0.2*rnorm(n)
H <- 0.5*Y + 0.1*rnorm(n)

data.obs <- cbind(C, A, K, X, F, D, G, Y, H)

__a)__

What are the parents and children of $X$ in the above SCM?

Take a pair of variables and think about whether you expect this pair to be dependent
(at this stage you can only guess; later you will have tools to decide). Then check empirically.
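One possible empirical check is a correlation test. The sketch below re-simulates only three of the variables under new, hypothetical names (`c.sim`, `a.sim`, `x.sim`) so that it runs on its own; the values therefore differ from `data.obs`. Note that `cor.test` only detects linear dependence, so a large p-value does not establish independence.

```r
set.seed(1)

n <- 200
# Re-simulate part of the SCM under hypothetical names (not the same draws as data.obs)
c.sim <- rnorm(n)
a.sim <- 0.8*rnorm(n)
x.sim <- c.sim - 2*a.sim + 0.2*rnorm(n)

# Pearson correlation tests (sensitive to linear dependence only)
test.ca <- cor.test(c.sim, a.sim)
test.cx <- cor.test(c.sim, x.sim)

# Small p-values speak against independence of the pair
print(c(p.ca = test.ca$p.value, p.cx = test.cx$p.value))
```

The same call can be applied to any two columns of `data.obs`, for example `cor.test(data.obs[, "C"], data.obs[, "Y"])`.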

__b)__

Generate a sample of size 300 from the interventional distribution $P_{\mathrm{do}(X=\mathcal{N}(2, 1))}$
and store the data matrix as `data.int`.
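The mechanics of such an intervention can be sketched on a smaller SCM. Everything in the block below (the variables `z`, `x`, `y`, the coefficients, and the names `toy.obs` and `toy.int`) is hypothetical and is not the SCM from this exercise. The idea is that only the structural assignment of the intervened variable is replaced; all other assignments stay as they are.

```r
set.seed(1)
n <- 300

# Observational sample from a hypothetical SCM: z -> x, z -> y, x -> y
z <- rnorm(n)
x <- z + 0.5*rnorm(n)
y <- 2*x + z + 0.5*rnorm(n)
toy.obs <- cbind(z, x, y)

# Sample under do(x := N) with N ~ N(2, 1):
# only the assignment for x changes, z and y keep their assignments
z <- rnorm(n)
x <- rnorm(n, mean = 2, sd = 1)
y <- 2*x + z + 0.5*rnorm(n)
toy.int <- cbind(z, x, y)

# Compare column means of both samples
print(rbind(obs = colMeans(toy.obs), int = colMeans(toy.int)))
```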

__c)__

Do you expect the marginal distribution of $Y$ to differ between the two samples?


__d)__

Do you expect the joint distribution of $(A, Y)$ to differ between the two samples?



__e)__

Check your answers to c) and d) empirically.
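For a marginal distribution, one option among several is a two-sample Kolmogorov-Smirnov test (`ks.test` from the `stats` package). The vectors `y.obs` and `y.int` below are hypothetical stand-ins for a column taken from an observational and an interventional sample; they are simulated here only so that the sketch runs on its own.

```r
set.seed(1)

# Hypothetical stand-ins for the same variable in two samples
y.obs <- rnorm(200)
y.int <- rnorm(300, mean = 1)

# Two-sample Kolmogorov-Smirnov test; a small p-value suggests
# that the two marginal distributions differ
ks <- ks.test(y.obs, y.int)
print(ks$p.value)
```

For a joint distribution of two variables, overlaying the scatter plots of both samples is a simple visual complement.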

# Exercise 2 – Adjusting


![DAG](https://raw.githubusercontent.com/sweichwald/causality-tutorial-exercises/main/data/Exercise-ANM.png)

Suppose we are given a fixed DAG (like the one above).

a) What are valid adjustment sets (VAS) used for?

b) Assume we want to find a VAS for the causal effect from $X$ to $Y$.
What are general recipes (plural 😉) for constructing VASs (no proof)?
Which sets are VASs in the DAG above?

c) The following code samples from an SCM. Perform linear regressions using different VAS and compare the regression coefficient against the causal effect from $X$ to $Y$.


In [None]:
set.seed(1)

n <- 200
C <- rnorm(n)
A <- 0.8*rnorm(n)
K <- A + 1.1*rnorm(n)
X <- C - 2*A + 0.2*rnorm(n)
F <- 3*X + 0.8*rnorm(n)
D <- -2*X + 0.5*rnorm(n)
G <- D + 0.5*rnorm(n)
Y <- 2*K - D + 0.2*rnorm(n)
H <- 0.5*Y + 0.1*rnorm(n)

data.obs <- cbind(C, A, K, X, F, D, G, Y, H)
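The effect of adjusting can be sketched on a much smaller SCM. The block below is an illustration under assumptions: the variables `z`, `x`, `y` and all coefficients are hypothetical, with `z` confounding the effect of `x` on `y` and a true causal coefficient of 1.5. In this linear setting, the coefficient of `x` in the regression that includes `z` should be close to 1.5, while the unadjusted one should not.

```r
set.seed(1)
n <- 2000

# Hypothetical SCM: z -> x, z -> y, x -> y (causal coefficient 1.5)
z <- rnorm(n)
x <- z + rnorm(n)
y <- 1.5*x - 2*z + rnorm(n)

# Regression without adjustment (confounded by z)
coef.unadj <- coef(lm(y ~ x))["x"]

# Regression adjusting for the set {z}
coef.adj <- coef(lm(y ~ x + z))["x"]

print(c(unadjusted = unname(coef.unadj), adjusted = unname(coef.adj)))
```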

d) Why could it be interesting to have several options for choosing a VAS?

e) If you indeed have access to several VASs, what would you do?

# Exercise 3 – Independence-based Causal Structure Learning

__a)__

Assume $P^{X,Y,Z}$ is Markov and faithful wrt. $G$. Assume all (!) conditional independences are

$$
\newcommand{\indep}{{\,⫫\,}}
\newcommand{\dep}{\not{}\!\!\indep}
$$

$$X \indep Z \mid \emptyset$$

(plus symmetric statements). What is $G$?

__b)__

Assume $P^{W,X,Y,Z}$ is Markov and faithful wrt. $G$. Assume all (!) conditional independences are

$$\begin{aligned}
(Y,Z) &\indep W \mid \emptyset \\
W &\indep Y \mid (X,Z) \\
(X,W) &\indep Y \mid Z
\end{aligned}
$$

(plus symmetric statements). What is $G$?
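In practice, (conditional) independence statements like the ones above have to be tested from data. The following sketch assumes a linear Gaussian setting and uses a hypothetical chain `x -> y -> z`, which is not one of the graphs asked for in this exercise. Conditional independence is checked by correlating residuals after regressing out the conditioning variable (a partial correlation); the resulting p-value is only approximate.

```r
set.seed(1)
n <- 500

# Hypothetical chain x -> y -> z
x <- rnorm(n)
y <- x + rnorm(n)
z <- y + rnorm(n)

# Marginal test: is x independent of z?
p.marg <- cor.test(x, z)$p.value

# Conditional test given y: correlate the residuals of x and z
# after linearly regressing each of them on y
r.x <- resid(lm(x ~ y))
r.z <- resid(lm(z ~ y))
p.cond <- cor.test(r.x, r.z)$p.value

print(c(marginal = p.marg, given.y = p.cond))
```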

# Exercise 4 – Additive Noise Models

Set-up required packages:

In [None]:
# set up – not needed when run on mybinder
# if needed (colab), change FALSE to TRUE and run cell
if (FALSE) {
  install.packages('dHSIC')
}

In [None]:
library(mgcv)
library(dHSIC)

Let's load and plot a real data set:

In [None]:
# Load a real data set
real.dat <- read.csv('https://raw.githubusercontent.com/sweichwald/causality-tutorial-exercises/main/data/Exercise-ANM.csv')
Y <- real.dat[, "Y"]
X <- real.dat[, "X"]

# Let us plot the data
par(mfrow=c(1,1))
plot(X, Y, pch = 19, cex = .8)

__a)__

Do you believe that $X \to Y$ or that $X \gets Y$? Why?


$$
\newcommand{\indep}{{\,⫫\,}}
\newcommand{\dep}{\not{}\!\!\indep}
$$

__b)__
Let us now try to get a more statistical answer. We have heard that we cannot 
have  
$$Y = f(X) + N_Y,\ N_Y \indep X$$
and
$$X = g(Y) + N_X,\ N_X \indep Y$$
at the same time.

Given a data set over $(X,Y)$,
we now want to decide between the two models.

Come up with a method to do so.

Hints: 
* `gam(B ~ s(A))$residuals` provides residuals when regressing $B$ on $A$.
* `dhsic.test` (with `method = "gamma"`) can be used as an independence test.
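How the two hinted functions fit together can be sketched on simulated data. The block below is an illustration, not a reference solution: the variables `x.sim` and `y.sim` and the cubic mechanism are hypothetical, and the real data set is not touched. It assumes that `mgcv` and `dHSIC` are installed.

```r
library(mgcv)
library(dHSIC)

set.seed(1)
n <- 300

# Hypothetical additive noise model x.sim -> y.sim
x.sim <- rnorm(n)
y.sim <- x.sim^3 + rnorm(n)

# Residuals of nonparametric regressions in both directions
res.fwd <- gam(y.sim ~ s(x.sim))$residuals  # candidate model x.sim -> y.sim
res.bwd <- gam(x.sim ~ s(y.sim))$residuals  # candidate model y.sim -> x.sim

# Test independence between residuals and the respective input
p.fwd <- dhsic.test(res.fwd, x.sim, method = "gamma")$p.value
p.bwd <- dhsic.test(res.bwd, y.sim, method = "gamma")$p.value

# A small p-value speaks against the corresponding direction
print(c(forward = p.fwd, backward = p.bwd))
```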

__c)__

Assume that the error terms are Gaussian with zero mean and variances 
$\sigma_X^2$ and $\sigma_Y^2$, respectively.
The maximized log-likelihood for DAG $G$ is
then, up to an additive constant, proportional to
$-\log(\mathrm{var}(R^G_X)) - \log(\mathrm{var}(R^G_Y))$,
where $R^G_X$ and $R^G_Y$ are the residuals obtained from regressing $X$ and $Y$ on 
their parents in $G$, respectively (no proof).

Find the maximum likelihood solution.
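The score above can be computed for both candidate DAGs and compared. The sketch below does this on simulated data; the variables `x.sim` and `y.sim`, the cubic mechanism, and the names `score.fwd` and `score.bwd` are hypothetical. A variable without parents is "regressed" on the empty set, so its residual variance is simply its sample variance. Using `gam` for the regression is one possible choice.

```r
library(mgcv)

set.seed(1)
n <- 500

# Hypothetical additive noise model x.sim -> y.sim with Gaussian noise
x.sim <- rnorm(n)
y.sim <- x.sim^3 + rnorm(n)

# Score for G1: x.sim -> y.sim (x.sim has no parents)
score.fwd <- -log(var(x.sim)) - log(var(gam(y.sim ~ s(x.sim))$residuals))

# Score for G2: y.sim -> x.sim (y.sim has no parents)
score.bwd <- -log(var(gam(x.sim ~ s(y.sim))$residuals)) - log(var(y.sim))

# The DAG with the larger score is preferred
print(c(forward = score.fwd, backward = score.bwd))
```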

# Exercise 5 – Invariant Causal Prediction

__a)__

Generate some observational and interventional data:

In [None]:
# Generate n=1000 observations from the observational distribution
na <- 1000
Xa <- rnorm(na)
Ya <- 1.5*Xa + rnorm(na)

# Generate n=1000 observations from an interventional distribution
nb <- 1000
Xb <- rnorm(nb, 2, 1)
Yb <- 1.5*Xb + rnorm(nb)
red <- rgb(1,0,0,alpha=0.4)
blue <- rgb(0,0,1,alpha=0.4)

# plot Y vs X
plot(Xa,Ya,pch=16,col=blue,xlim=range(c(Xa,Xb)),ylim=range(c(Ya,Yb)),xlab="X",ylab="Y")
points(Xb,Yb,pch=17,col=red)
legend("topright",c("observational","interventional"),pch=c(16,17),col=c(blue,red),inset=0.02)

Look at the above plot. Is the predictor $\{X\}$ an invariant set, that is (roughly speaking), does $Y \mid X = x$ have the same distribution in the red and blue data?


__b)__

We now consider data over a response and three covariates $X1, X2$, and $X3$
and try to infer $\mathrm{pa}(Y)$. To do so, we need to find all sets for which this
invariance is satisfied.

In [None]:
data <- as.matrix(read.csv('https://raw.githubusercontent.com/sweichwald/causality-tutorial-exercises/main/data/Exercise-ICP.csv'))
pairs(data, col = c(rep(1,140), rep(2,80)))

# The code below plots the residuals versus fitted values for all sets of 
# predictors. 
# extract response and predictors
Y <- data[,1]
Xmat  <- data[,2:4]
S <- list( c(1), c(2), c(3), c(1,2), c(1,3), c(2,3), c(1,2,3))
resid <- fitted <- vector("list", length(S))
for(i in 1:length(S)){
  modelfit <- lm.fit(Xmat[,S[[i]],drop=FALSE], Y)
  resid[[i]] <- modelfit$residuals
  fitted[[i]] <- modelfit$fitted.values
}
env <- c(rep(0,140),rep(1,80))
par(mfrow=c(2,2))
red <- rgb(1,0,0,alpha=0.4)
blue <- rgb(0,0,1,alpha=0.4)
names <- c("X1", "X2", "X3", "X1, X2", "X1, X3", "X2, X3", "X1, X2, X3")
plot((1:length(Y))[env==0], Y[env==0], pch=16, col=blue, xlim=c(0,220), ylim=range(Y), xlab="index", ylab="Y", main="empty set")
points((1:length(Y))[env==1], Y[env==1], pch=17, col=red)
legend("topleft",c("observational","interventional"),pch=c(16,17),col=c(blue,red),inset=0.02)
for(i in 1:length(S)){
  plot(fitted[[i]][env==0], resid[[i]][env==0], pch=16, col=blue, xlim=range(fitted[[i]]), ylim=range(resid[[i]]), xlab="fitted values", ylab="residuals", main=names[i])
  points(fitted[[i]][env==1], resid[[i]][env==1], pch=17, col=red)
  legend("topleft",c("observational","interventional"),pch=c(16,17),col=c(blue,red),inset=0.02)
}


Which of the sets are invariant? (There are two plots with four scatter plots each.)
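A visual check can be complemented by two-sample tests on the residuals. The sketch below is one simple way to do this on simulated data; the helper `check.invariance`, the variables `x1` and `y.sim`, and the environment vector `env.sim` are hypothetical and unrelated to the data loaded above. It compares the mean and the variance of the residuals between the two environments, which is only a rough proxy for equality of the full distributions.

```r
set.seed(1)

# Hypothetical data: environment 1 shifts the mean of x1, and x1 -> y.sim
env.sim <- c(rep(0, 200), rep(1, 200))
x1 <- rnorm(400, mean = 2*env.sim)
y.sim <- 1.5*x1 + rnorm(400)

# Hypothetical helper: p-values for a shift in mean (t-test)
# and in variance (F-test) of the residuals between environments
check.invariance <- function(res, env) {
  c(mean = t.test(res[env == 0], res[env == 1])$p.value,
    var  = var.test(res[env == 0], res[env == 1])$p.value)
}

# Empty set: residuals are the centered response
p.empty <- check.invariance(y.sim - mean(y.sim), env.sim)

# Set {x1}: residuals from a pooled linear regression on x1
p.x1 <- check.invariance(resid(lm(y.sim ~ x1)), env.sim)

# Small p-values speak against invariance of the corresponding set
print(rbind(empty = p.empty, x1 = p.x1))
```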


__c)__
What is your best guess for $\mathrm{pa}(Y)$?


__d) (optional, and R only)__

Use the function `ICP` to check your result.

In [None]:
# set up – not needed when run on mybinder
# if needed (colab), change FALSE to TRUE and run cell
if (FALSE) {
  install.packages('InvariantCausalPrediction')
}

In [None]:
library(InvariantCausalPrediction)
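A call to `ICP` takes a predictor matrix, a response vector, and an environment indicator. The sketch below shows the call pattern on simulated data; the variables `x1`, `x2`, `y.sim`, and `exp.ind` are hypothetical, and the choice `alpha = 0.05` is an assumption made for illustration. It assumes the `InvariantCausalPrediction` package is installed.

```r
library(InvariantCausalPrediction)

set.seed(1)
n <- 400

# Hypothetical data: x1 is a parent of y.sim, x2 is a child of y.sim;
# the second environment shifts the mean of x1
exp.ind <- c(rep(0, n/2), rep(1, n/2))
x1 <- rnorm(n, mean = 2*exp.ind)
y.sim <- 1.5*x1 + rnorm(n)
x2 <- y.sim + rnorm(n)
x.mat <- cbind(x1, x2)

# Run invariant causal prediction over all subsets of predictors
icp <- ICP(x.mat, y.sim, exp.ind, alpha = 0.05,
           showAcceptedSets = FALSE, showCompletion = FALSE)
print(icp)
```

For the exercise data, the analogous call would use `Xmat`, `Y`, and `env` from the cell in b).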