# Causal graphs

Anders Ledberg

In this notebook we will do some simple exercises to illustrate the concepts of confounding and colliders.


## Confounding

Let's start with confounding. A variable Z confounds the association between X and Y iff Z has a causal influence on both X and Y. Let's draw a graph:

In [None]:
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np


# create a directed multi-graph
G = nx.MultiDiGraph()
G.add_edges_from([
    ("X", "Y"),
    ("Z", "X"),
    ("Z", "Y"),
])
plt.figure(figsize=(3,3))
## set fixed node positions: Z on top, X and Y below
## (passing the positions directly to nx.draw keeps them exactly as given;
## spring_layout without `fixed` would move the nodes)
mypos={'X':(0,0),'Z':(1,1),'Y':(2,0)}
nx.draw(G, mypos,with_labels=True,node_size=1000,font_size=20,node_color="lightgray",width=3)
plt.show()

Next we will generate data according to this model. In particular, we let the effects of Z on X and on Y be positive, but the effect of X on Y negative.

In [None]:
## define short-hands for random variable functions,
## both drawing from one shared random generator
rng = np.random.default_rng()
rnorm = rng.normal
runif = rng.uniform

## standard Normal distribution
def u(N=100):
    return rnorm(size=N)

## discrete uniform distribution on the values -1.0, -0.9, ..., 0.9, 1.0
def uz(N=100):
    return np.round(runif(-10.5, 10.5, N)) / 10

## number of samples
N=10000

## confounder
Z=uz(N)
## exposure
X=2*Z + u(N)
## outcome
Y=5*Z -0.6*X + u(N)

plt.plot(X,Y,'+')
plt.xlabel("Exposure (X)")
plt.ylabel("Outcome (Y)")
plt.show()

There is a fairly strong positive association between X and Y, even though the direct effect of X on Y is negative. We can quantify the association using the correlation coefficient.

In [None]:
print(np.corrcoef(X,Y))
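
The correlation coefficient only measures the strength of the association. As a complementary check, the following sketch fits a straight line of Y on X while ignoring Z, using the same model as above. The fixed seed and the names `rng`, `naive_slope` and `naive_intercept` are assumptions made for this illustration and are not part of the original cells.

```python
import numpy as np

# Assumption: a fixed seed, used only to make this sketch reproducible.
rng = np.random.default_rng(0)
N = 10000

# Same data-generating model as above: Z influences both X and Y.
Z = np.round(rng.uniform(-10.5, 10.5, N)) / 10
X = 2 * Z + rng.normal(size=N)
Y = 5 * Z - 0.6 * X + rng.normal(size=N)

# Least-squares line Y = slope * X + intercept, with Z left out.
naive_slope, naive_intercept = np.polyfit(X, Y, 1)
print("slope of Y on X, ignoring Z:", naive_slope)
```

The fitted slope comes out positive, whereas the coefficient of X in the generating equation for Y is -0.6.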

Next we condition on Z and see how that affects the dependency between X and Y. That is, we keep Z constant, and look at the variation in X and Y. 

In [None]:
## conditioning on Z means holding Z constant
## and looking at how X and Y vary together.
## first check which values Z takes:
print(np.unique(Z))

## fix Z at a particular value
## (np.isclose avoids relying on exact floating-point equality)
indx = np.isclose(Z, -0.2)

## plot the corresponding values of X and Y
plt.plot(X[indx],Y[indx],'+')
plt.xlabel("Exposure (X)")
plt.ylabel("Outcome (Y)")
plt.show()
print(np.corrcoef(X[indx],Y[indx]))

When we hold Z fixed, the negative dependence between X and Y becomes clear. The positive dependence was entirely due to confounding. Please verify that the negative dependence is also present for some other values of Z.
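
One possible way to do this check for all values of Z at once is sketched below. It regenerates data from the same model, computes the correlation between X and Y within each stratum of Z, and then fits a linear regression of Y on both X and Z. The fixed seed and the names `strata_corr`, `design` and `coef` are assumptions introduced for this sketch.

```python
import numpy as np

# Assumption: a fixed seed, used only to make this sketch reproducible.
rng = np.random.default_rng(0)
N = 10000

# Same confounding model as above.
Z = np.round(rng.uniform(-10.5, 10.5, N)) / 10
X = 2 * Z + rng.normal(size=N)
Y = 5 * Z - 0.6 * X + rng.normal(size=N)

# Correlation between X and Y within each stratum of Z.
strata_corr = {}
for z in np.unique(Z):
    idx = Z == z
    strata_corr[round(float(z), 1)] = np.corrcoef(X[idx], Y[idx])[0, 1]

for z, r in strata_corr.items():
    print(f"Z = {z:+.1f}: corr(X, Y) = {r:.2f}")

# Linear regression of Y on X and Z (the first column is the intercept).
design = np.column_stack([np.ones(N), X, Z])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
print("coefficient of X, adjusted for Z:", coef[1])
print("coefficient of Z:", coef[2])
```

Stratifying and adjusting by regression are two ways of conditioning on Z. In this linear model the adjusted coefficient of X should lie close to the value -0.6 used to generate the data.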



## Colliders

Next we consider the case of colliders. The corresponding causal graph looks like this:

In [None]:
# create a directed multi-graph
G = nx.MultiDiGraph()

G.add_edges_from([
    ("X", "Z"),
    ("Y", "Z"),
])
plt.figure(figsize=(3,3))
## set fixed node positions: Z on top, X and Y below
## (passing the positions directly to nx.draw keeps them exactly as given;
## spring_layout without `fixed` would move the nodes)
mypos={'X':(0,0),'Z':(1,1),'Y':(2,0)}
nx.draw(G, mypos,with_labels=True,node_size=1000,font_size=20,node_color="lightgray",width=3)
plt.show()

Now we generate data according to this model.

In [None]:
## exposure
X=u(N)
## outcome
Y=u(N)
## collider
Z=X+Y+uz(N)

plt.plot(X,Y,'+')
plt.xlabel("Exposure (X)")
plt.ylabel("Outcome (Y)")
plt.show()
print(np.corrcoef(X,Y))

Here there is no indication of an association (i.e. dependence) between X and Y. Let's check what happens if we condition on Z.

In [None]:
## restrict the variation of the collider Z
indx= (Z>-0.3) & (Z < -0.2)
plt.plot(X[indx],Y[indx],'+')
plt.xlabel("Exposure (X)")
plt.ylabel("Outcome (Y)")
plt.show()

print(np.corrcoef(X[indx],Y[indx]))

This time conditioning introduced a negative linear dependence between X and Y. Explain the reason for this.
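
Before working out the explanation, it can help to confirm that the effect does not depend on the particular window chosen for Z. The sketch below regenerates data from the collider model and repeats the conditioning for a few windows of width 0.1. The fixed seed, the window positions and the names `marginal_corr` and `window_corr` are assumptions made for this illustration.

```python
import numpy as np

# Assumption: a fixed seed, used only to make this sketch reproducible.
rng = np.random.default_rng(0)
N = 10000

# Same collider model as above: X and Y are independent, and both influence Z.
X = rng.normal(size=N)
Y = rng.normal(size=N)
Z = X + Y + np.round(rng.uniform(-10.5, 10.5, N)) / 10

# Correlation between X and Y in the full data set.
marginal_corr = np.corrcoef(X, Y)[0, 1]
print(f"all data: corr(X, Y) = {marginal_corr:.2f}")

# Correlation between X and Y within narrow windows of Z.
# Assumption: these lower edges are arbitrary choices for illustration.
window_corr = {}
for lo in [-1.5, -0.3, 0.0, 1.0]:
    idx = (Z > lo) & (Z < lo + 0.1)
    window_corr[lo] = np.corrcoef(X[idx], Y[idx])[0, 1]
    print(f"{lo:+.1f} < Z < {lo + 0.1:+.1f}: n = {idx.sum()}, "
          f"corr(X, Y) = {window_corr[lo]:.2f}")
```

The correlation is close to zero in the full data and negative inside each window, which suggests that the negative dependence arises from the conditioning itself and not from the specific value of Z.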