# CBS Week 3 Tutorial: Causal reasoning
## Semester 2 2021


In [None]:
library(tidyverse)
library(nimble)
library(testthat)

This tutorial focuses on causal Bayesian networks, and will give you some practice in modeling inferences about interventions.

Bayes nets are often specified by thinking about causal relationships --- for example, the edge from Robbery to Alarm in last week's network was intended to capture the fact that a robbery causes the alarm to sound. When developing a Bayes net, we suggested that you should only include arrows that capture causal relationships. But the formalism of Bayes nets actually makes no causal assumptions, and there are valid Bayes nets where the arrows do not capture causal relationships.

Unlike a regular Bayes net, a *causal* Bayes net has arrows that capture causal relationships. Because of this property, a causal Bayes net supports inferences about interventions: for example, an inference about how the other variables in a network might change if we reach in and alter the value of one variable.

In this tutorial we'll work with two causal Bayes nets from Figure 6-6 of Hagmayer et al., [Causal reasoning through intervention](https://www.ucl.ac.uk/lagnado-lab/publications/lagnado/intervention%20hagmayer%20et%20al.pdf). Both networks specify causal relationships between three hormones, and the level of each hormone is either normal (1) or elevated (2). In the common cause model, elevated levels of Pixin (P = 2) cause elevated levels of both Sonin and Xanthan. In the chain model, elevated levels of Sonin (S = 2) cause elevated levels of Pixin (P = 2), which in turn cause elevated levels of Xanthan (X = 2). The probability distributions shown include expressions like `P(P=2)`, which is slightly confusing: here the first `P` is the probability symbol and the second is the variable for Pixin.

<figure>
  <img src="images/commoncause_chain_models.png" alt="commoncause_chain_models" style="width:50%">
  <figcaption  class="figure-caption text-center">Figure 1: Two models specifying causal relationships between three hormones. (a) Common cause model (b) Chain model. This figure is a redrawn version of Fig 6-6 of Hagmayer et al, Causal reasoning through intervention.</figcaption>
</figure>

The two models were deliberately chosen to capture the same joint distribution over the three variables. 

### Exercise 1

Write down the joint distribution by hand.

In [None]:
# Fix the distribution below so that it reflects the joint distribution P(P,S,X) captured by both the Common Cause and Chain models. The column labelled P_P_S_X currently contains zeros but you should replace these with the correct probabilities.
joint <- tibble(P = c(1,1,1,1,2,2,2,2), 
                S = c(1,1,2,2,1,1,2,2), 
                X = c(1,2,1,2,1,2,1,2), 
                p_P_S_X = c(0,0,0,0,0,0,0,0) )

# YOUR CODE HERE
stop('No Answer Given!')

print(joint)

In [None]:
expect_equal(sum(joint$p_P_S_X),  1) 

Waldmann and Hagmayer (2005) carried out an experiment in which they asked participants to reason about the two causal models in Figure 1 using a scenario involving hormone levels of chimpanzees.  Participants first went through a training phase in which they learned either the Common Cause model or the Chain model. The training included a written description of the model---e.g. common-cause participants were told that an increased level of the hormone Pixin causes increases in the level of both Sonin and Xanthan. The training also included observations of the hormone levels of 20 chimpanzees, which allowed participants to estimate the parameters of the causal networks. For example, half of these 20 chimpanzees had elevated levels of Pixin, allowing participants to figure out that $P(P=2) = 0.5$.
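
The parameter estimate described above is just a relative frequency. Here is a minimal sketch, assuming a hypothetical vector `pixin_obs` that encodes the 20 training chimpanzees. The text only states that half of them had elevated Pixin, so the ordering of the values is made up.

```r
# Hypothetical encoding of the 20 training observations: 1 = normal, 2 = elevated.
# Only the 10/10 split comes from the text; the ordering is arbitrary.
pixin_obs <- rep(c(1, 2), times = 10)

# Relative-frequency estimate of P(P = 2)
p_pixin_elevated <- mean(pixin_obs == 2)
print(p_pixin_elevated)
```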

Participants were then asked to reason about a new set of 20 previously unseen chimpanzees. They were asked about both hypothetical *observations* and hypothetical *interventions*. The observation questions asked people to imagine that Sonin had been observed to be either elevated or normal in each of the 20 new chimpanzees, and to estimate the number of these chimpanzees that would have elevated levels of Xanthan. In terms of our models, these two estimates correspond to the probabilities $P(X=2|S=2)$ and $P(X=2|S=1)$. The intervention questions were similar but asked people to imagine that the Sonin levels of all chimpanzees had been determined by an injection (i.e., an intervention) instead of just being observed. The corresponding two probabilities are $P(X=2|\text{do}(S=2))$ and $P(X=2|\text{do}(S=1))$, where we've used Pearl's $\text{do}(\cdot)$ operator to indicate that the value of $S$ is set by an intervention instead of merely being observed.

The light grey bars in the figure below show average human inferences, and the dark grey bars show model predictions. 


<figure>
  <img src="images/model_human_inferences.png" alt="model_human_inferences" style="width:50%">
  <figcaption  class="figure-caption text-center">Figure 2: Results of experiment carried out by Waldmann and Hagmayer (2005). The y axis shows the number of animals out of a set of 20 that were estimated to have elevated levels of Xanthan. In the Observation condition, the two groups of bars show results when Sonin is observed to be either elevated or normal. In the Intervention condition, the two groups of bars show results when the animals were given injections that either made the Sonin level elevated or normal. This figure is taken from Fig 6-7 of Hagmayer et al, Causal reasoning through intervention.</figcaption>
</figure>

### Exercise 2

Think about how you would have responded if you were a participant. For the common cause model, participants gave a higher estimate of $P(X=2|S=2)$ than of $P(X=2|\text{do}(S=2))$. Would you have done the same? Why or why not? For the chain model, participants indicated that $P(X=2|S=2)$ and $P(X=2|\text{do}(S=2))$ were roughly the same. Would you have made the same inference? Why or why not?


YOUR ANSWER HERE

## Observation questions

We'll try to replicate the model predictions using NIMBLE. First let's compute predictions for the observation questions. We'll start with the common cause model.

In [None]:
commoncause_code <- nimbleCode({
  # dcat specifies a discrete categorical distribution
  p ~ dcat(P_cpd[1:2])
  s ~ dcat(S_cpd[p,1:2])
  x ~ dcat(X_cpd[p,1:2])
})

commoncause_data <- list(
  P_cpd = c(0.5, 0.5), 
  S_cpd =  array(c(0.9, 0.1, 0.1, 0.9), dim = c(2,2)),
  X_cpd =  array(c(0.9, 0.1, 0.1, 0.9), dim = c(2,2))
)

Compute $P(X=2|S=1)$:



In [None]:
# Condition on the observation S = 1 by supplying s as data
commoncause_data$s <- 1
samples <- nimbleMCMC(
  code = commoncause_code,
  data = commoncause_data,
  monitors = c("p", "s", "x"),
  inits = list(p = 1, x = 1)
)

# function for turning a bag of samples into a sample-based posterior on x
x_posterior <- function( samples ) {
  ps <- samples %>% 
    as_tibble() %>% 
    group_by(x) %>% 
    summarize(count = n(), .groups = "drop") %>%  
    mutate(prob = count/sum(count))               
  return(ps)
}

p_x2_given_s1_cc <- x_posterior(samples)$prob[2]
print(paste0('For the common cause model, P(X=2|S=1) ≈ ', as.character(p_x2_given_s1_cc)))
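
Indexing with `$prob[2]` assumes that both values of `x` appear in the samples, so that the second row corresponds to `x = 2`. As a cross-check, the same quantity can be computed directly as the proportion of samples with `x == 2`. The sketch below uses a small made-up matrix, `toy_samples`, as a stand-in for real NIMBLE output (with default settings, `nimbleMCMC()` returns a matrix with one column per monitored node). The helper name `prop_x2` is hypothetical.

```r
# Made-up stand-in for MCMC output: one row per sample, one column per node.
toy_samples <- cbind(p = c(1, 1, 2, 1, 2),
                     s = c(1, 1, 1, 1, 1),
                     x = c(1, 1, 2, 1, 1))

# Proportion of samples in which x equals 2.
# Returns 0 (rather than NA) when x = 2 never occurs in the samples.
prop_x2 <- function(samples) {
  mean(samples[, "x"] == 2)
}

print(prop_x2(toy_samples))
```

Applied to the real `samples` object, `prop_x2(samples)` should agree with `x_posterior(samples)$prob[2]` whenever both values of `x` were sampled.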

And now compute $P(X=2|S=2)$:



In [None]:
# Condition on the observation S = 2 by supplying s as data
commoncause_data$s <- 2
samples <- nimbleMCMC(
  code = commoncause_code,
  data = commoncause_data,
  monitors = c("p", "s", "x"),
  inits = list(p = 1, x = 1)
)

p_x2_given_s2_cc <- x_posterior(samples)$prob[2]
print(paste0('For the common cause model, P(X=2|S=2) ≈ ', as.character(p_x2_given_s2_cc)))
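
Because the network is tiny, the sampled estimates can be checked against exact values obtained by summing out `p`. The sketch below uses base R only and redefines the CPDs from `commoncause_data` so that it is self-contained. The function name `exact_x_given_s_cc` is a hypothetical helper, not part of NIMBLE.

```r
# CPDs copied from commoncause_data (rows index p, columns index the child's value).
P_cpd <- c(0.5, 0.5)
S_cpd <- array(c(0.9, 0.1, 0.1, 0.9), dim = c(2, 2))
X_cpd <- array(c(0.9, 0.1, 0.1, 0.9), dim = c(2, 2))

# Exact P(X = x | S = s) for the common cause model:
#   sum_p P(p) P(s | p) P(x | p)  /  sum_p P(p) P(s | p)
exact_x_given_s_cc <- function(x, s) {
  numerator   <- sum(P_cpd * S_cpd[, s] * X_cpd[, x])
  denominator <- sum(P_cpd * S_cpd[, s])
  numerator / denominator
}

print(exact_x_given_s_cc(x = 2, s = 1))
print(exact_x_given_s_cc(x = 2, s = 2))
```

The NIMBLE estimates should land close to these values, up to Monte Carlo error.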

### Exercise 3 

Use NIMBLE to compute $P(X=2|S=1)$ and  $P(X=2|S=2)$ according to the chain model in Figure 1.


In [None]:
# Redefine these two variables
p_x2_given_s1_chain <- 0
p_x2_given_s2_chain <- 0

# YOUR CODE HERE
stop('No Answer Given!')

print(paste0('For the chain model, P(X=2|S=1) ≈ ', as.character(p_x2_given_s1_chain)))
print(paste0('For the chain model, P(X=2|S=2) ≈ ', as.character(p_x2_given_s2_chain)))

In [None]:
expect_lt(p_x2_given_s1_chain, 1)
expect_gt(p_x2_given_s1_chain, 0)
expect_lt(p_x2_given_s2_chain, 1)
expect_gt(p_x2_given_s2_chain, 0)

## Intervention questions

Now compute model predictions for the intervention questions. The probabilities to estimate are 
$P(X=2|\text{do}(S=1))$ and  $P(X=2|\text{do}(S=2))$, where we've used Pearl's  $\text{do}(\cdot)$ operator to indicate that the values of $S$ are set by an intervention instead of simply observed.

If a package like NIMBLE supported interventions we could reuse our `nimbleCode()` specifications of the two models and include a `data` argument formulated using the `do()` operator. For example, something like:


```R
commoncause_intervention_data <- list(
  s = do(1),
  P_cpd = c(0.5, 0.5), 
  S_cpd =  array(c(0.9, 0.1, 0.1, 0.9), dim = c(2,2)),
  X_cpd =  array(c(0.9, 0.1, 0.1, 0.9), dim = c(2,2))
)
```

In reality, NIMBLE doesn't support the `do()` operator, so we'll handle interventions by transforming the original network into a *manipulated* network that captures the intervention.  Recall that graph manipulation involves cutting all arrows that lead into the node that is the target of the intervention, and adjusting the CPD of this node to reflect that its value is set by external means. The intervention `do(S=2)` produces the following manipulated networks:

<figure>
  <img src="images/commoncause_chain_intervention.png" alt="commoncause_chain_intervention" style="width:50%">
  <figcaption  class="figure-caption text-center">Figure 3: The models from Figure 1 have been manipulated to reflect an intervention (symbolized here by a hammer) that fixes the level of Sonin to 2.</figcaption>
</figure>
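
To see what graph manipulation does in the simplest possible case, consider a two-node network A → B that is not one of this tutorial's models. All CPD values below are invented for illustration. Observing B = 2 changes our beliefs about its cause A, whereas the intervention do(B = 2) cuts the arrow into B, so A keeps its prior distribution.

```r
# Hypothetical two-node network A -> B; all numbers are invented for illustration.
A_prior <- c(0.7, 0.3)                 # P(A = 1), P(A = 2)
B_given_A <- rbind(c(0.8, 0.2),        # P(B = 1 | A = 1), P(B = 2 | A = 1)
                   c(0.1, 0.9))        # P(B = 1 | A = 2), P(B = 2 | A = 2)

# Observation: P(A = 2 | B = 2) by Bayes' rule
joint_b2 <- A_prior * B_given_A[, 2]   # P(A = a, B = 2) for a = 1, 2
p_a2_given_b2 <- joint_b2[2] / sum(joint_b2)

# Intervention: do(B = 2) removes the arrow A -> B, so A keeps its prior
p_a2_given_do_b2 <- A_prior[2]

print(c(observed = p_a2_given_b2, intervened = p_a2_given_do_b2))
```

The same logic applies to the manipulated networks in Figure 3: only the arrows leading into the intervened node are removed, and every other CPD is left unchanged.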

### Exercise 4

Use NIMBLE to compute $P(X=2|\text{do}(S=1))$ and $P(X=2|\text{do}(S=2))$ according to the common cause model. You'll need to define a new model that matches Figure 3a.

In [None]:
# Redefine these two variables
p_x2_given_do_s1_cc<- 0
p_x2_given_do_s2_cc<- 0

# YOUR CODE HERE
stop('No Answer Given!')

print(paste0('For the common cause model, P(X=2|do(S=1)) ≈ ', as.character(p_x2_given_do_s1_cc)))
print(paste0('For the common cause model, P(X=2|do(S=2)) ≈ ', as.character(p_x2_given_do_s2_cc)))

In [None]:
expect_lt(p_x2_given_do_s1_cc, 1)
expect_gt(p_x2_given_do_s1_cc, 0)
expect_lt(p_x2_given_do_s2_cc, 1)
expect_gt(p_x2_given_do_s2_cc, 0)

### Exercise 5

Use NIMBLE to compute $P(X=2|\text{do}(S=1))$ and $P(X=2|\text{do}(S=2))$ according to the chain model. You'll need to define a new model that matches Figure 3b.


In [None]:
# Redefine these two variables
p_x2_given_do_s1_chain <- 0
p_x2_given_do_s2_chain <- 0

# YOUR CODE HERE
stop('No Answer Given!')

print(paste0('For the chain model, P(X=2|do(S=1)) ≈ ', as.character(p_x2_given_do_s1_chain)))
print(paste0('For the chain model, P(X=2|do(S=2)) ≈ ', as.character(p_x2_given_do_s2_chain)))

In [None]:
expect_lt(p_x2_given_do_s1_chain, 1)
expect_gt(p_x2_given_do_s1_chain, 0)
expect_lt(p_x2_given_do_s2_chain, 1)
expect_gt(p_x2_given_do_s2_chain, 0)

## Summary of model predictions

Let's gather all of the NIMBLE estimates in an order that matches Figure 2.


In [None]:
inference_order <- c("cc_i_i", "cc_i_l", "cc_o_i", "cc_o_l", "chn_i_i", "chn_i_l", "chn_o_i", "chn_o_l")

nimblepreds <- tibble(cc_i_i=p_x2_given_do_s2_cc,  
                      cc_i_l=p_x2_given_do_s1_cc,  
                      cc_o_i=p_x2_given_s2_cc,  
                      cc_o_l=p_x2_given_s1_cc,  
                      chn_i_i=p_x2_given_do_s2_chain,  
                      chn_i_l=p_x2_given_do_s1_chain,  
                      chn_o_i=p_x2_given_s2_chain,  
                      chn_o_l=p_x2_given_s1_chain) %>% 
                gather() %>% 
                mutate(inference= factor(key, levels=inference_order), probability=value) %>% 
                select(inference, probability)

nimblepredplot <- nimblepreds %>% 
  ggplot(aes(x=inference, y = probability)) +
  geom_col() +
  ylab("model prediction")

print(nimblepredplot)

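The order of the bars from left to right matches the order in Figure 2. Each label has three parts: the model (`cc` for common cause, `chn` for chain), the type of question (`i` for intervention, `o` for observation), and the Sonin level (`i` for increasing, i.e. S = 2, and `l` for lowering, i.e. S = 1). For example, `cc_i_i` is short for common cause/intervention/increasing and `cc_i_l` is short for common cause/intervention/lowering.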

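Check that your NIMBLE estimates line up with the model predictions (the dark grey bars) in Figure 2. Note that Figure 2 shows numbers of animals out of 20, so multiply each probability by 20 before comparing. If they don't match, we've done something wrong!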

## Optional Exercises (if time permits)

1. Is it surprising that two different Bayes nets can specify the same joint distribution? If you show me any Bayes net, can I always give you a different Bayes net that captures the same joint distribution?


YOUR ANSWER HERE

2. In the week 3 lecture we talked about causal structure learning, or learning the causal relationships that hold between a set of variables.  Imagine that you do not know the causal relationships between Pixin, Sonin and Xanthan. To attempt to figure this out you measure hormone levels from a large number of chimpanzees. For example, you might discover that the first chimp has elevated levels of all three hormones, that the second has normal levels of Sonin but elevated levels of Pixin and Xanthan, and so on. Will taking measurements in this way allow you to figure out the causal relationships between the three hormones? Why or why not?


YOUR ANSWER HERE

3. If your answer to the previous question is no, how might you figure out the causal relationships between the hormones? 


YOUR ANSWER HERE