# CBS Week 2 Tutorial: Bayesian Networks
## Semester 2, 2021

In [1]:
library(tidyverse)
library(nimble)
library(testthat)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

✔ ggplot2 3.3.2     ✔ purrr   0.3.4
✔ tibble  3.0.6     ✔ dplyr   1.0.2
✔ tidyr   1.1.1     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.5.0

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

nimble version 0.11.1 is loaded.
For more information on NIMBLE and a User Manual,
please visit http://R-nimble.org.


Attaching package: ‘nimble’


The following object is masked from ‘package:stats’:

    simulate



Attaching package: ‘testthat’


The following object is masked from ‘package:dplyr’:

    matches



This tutorial notebook is to support the in-class activities led by your tutor. Unlike the assessment notebooks, this tutorial notebook will not be manually graded, so please make use of the tutorial time to discuss the material with your tutor. After the tutorial you'll be able to consult a complete version of the notebook (including solutions), and you'll also be able to submit your notebook and receive feedback generated by the automated grader. For all tutorials, we ask you not to look at the solution notebook until after your tutorial.

This tutorial focuses on Bayesian networks, also known as Bayes nets or directed graphical models. Bayes nets provide a natural way to capture and compute with joint probability distributions. In general, a joint distribution defined over $n$ binary variables requires $2^n - 1$ numbers to specify. A Bayes net can allow this distribution to be specified using many fewer numbers. The savings arise because a Bayes net allows a high-dimensional joint distribution to be expressed as a product of lower-dimensional distributions, each of which captures a modular piece of a situation.

Here we'll work with a variant of an example introduced by Judea Pearl. Pearl lives in Los Angeles, and suppose that he's just had a new alarm installed. The alarm reliably detects robbers, but also occasionally malfunctions and sounds for no apparent reason. 

A Bayes net for this example is shown below. On any given day, we've assumed that the probability of a robbery occurring is low (0.05). This number will make things simple but is too high to be realistic --- even in LA burglaries don't occur once per twenty days! 

To adopt a convention we'll use for the next few weeks, we'll use 1 for FALSE and 2 (which rhymes with TRUE) for TRUE. So A = 2 means that the alarm sounds and A = 1 means that the alarm does not sound.


<figure>
  <img src="images/alarm_2node.png" alt="alarm_2node" style="width:50%">
  <figcaption  class="figure-caption text-center">Figure 1: Bayes net specifying the relationship between the occurrence of a Robbery and the sounding of the Alarm.</figcaption>
</figure>


Whenever you develop a Bayes net you need to think carefully about the numbers that go into the conditional probability distributions (or CPDs for short). Here we've used a noisy-OR function with a background cause. We've assumed that $w_R = 0.94$, where $w_R$ is the "causal strength" of the relationship between robbery and the alarm sounding. This means that if no other causes of the alarm sounding are present (i.e. the alarm has not malfunctioned), then the alarm sounds with probability 0.94 if there is a robbery.

We've also assumed that the alarm may sound because of some Background cause other than robbery, e.g. because of malfunction. The Background cause is present with probability $b = 0.01$ and has causal strength of 1 (which means that the alarm definitely sounds when the Background cause is present).

To make the meaning of $w_R$ and $b$ clear we can explicitly introduce a variable in the network for the Background cause: 


<figure>
  <img src="images/alarm_2node_background.png" alt="alarm_2node_background" style="width:50%">
  <figcaption  class="figure-caption text-center">
       Figure 2: Figure 1 extended with a catch-all node that represents all Background causes of the alarm sounding (including malfunction).
 </figcaption>
</figure>

When $b =0.01$ and $w_R = 0.94$, the networks in Figures 1 and 2 are equivalent in the sense that they capture the same joint distribution $P(R,A)$ over the Robbery and Alarm variables.

### Exercise 1

Use Figure 2 to show that $P(A = 2 | R = 2) = 0.9406$ when $b = 0.01$ and $w_R = 0.94$.

=== BEGIN MARK SCHEME ===

We'll show every step here, but once you get comfortable working with probability distributions you probably won't need to write down every step.

\begin{align}
P(A=2 |R = 2) &= \sum_B P(A = 2, B | R = 2)  & \text{(marginalization)}\\
              &= P(A = 2, B = 1 | R = 2) + P(A = 2, B = 2|R = 2) & \\
              &= P(A = 2 |B = 1, R = 2)P(B=1|R=2) + P(A = 2|B = 2,R = 2)P(B=2|R=2) & \text{(chain rule)} \\
              &= P(A = 2 |B = 1, R = 2)P(B=1) + P(A = 2|B = 2,R = 2)P(B=2) & \text{(because $B$ and $R$ are independent)} \\
              &= w_R \times (1-b) + 1 \times b & \\
              &= 0.94 \times 0.99 + 0.01 & \\
              &= 0.9406
\end{align}


=== END MARK SCHEME ===

Even though the Background variable is often left implicit, thinking about background rates and causal strengths is a good way to figure out what numbers should go into a conditional probability distribution. For example, I came up with the CPD in Figure 1 by first figuring out what values for $b$ and $w_R$ might be reasonable and then using these numbers to compute $P(A|R)$.
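
The step from $b$ and $w_R$ to $P(A|R)$ can also be checked numerically. The following is a minimal sketch in base R; the helper name `noisy_or` and its arguments are hypothetical and are not part of the tutorial code or of NIMBLE. It uses the fact that, under a noisy-OR with a background cause, the alarm stays silent only if the Background cause is absent and every cause that is present fails to trigger it.

```r
# Noisy-OR with a background cause (illustrative sketch; the function name is hypothetical).
# w:       causal strengths of the explicit causes
# present: logical vector, TRUE where the corresponding cause is present
# b:       probability that the Background cause is present (its causal strength is 1)
noisy_or <- function(w, present, b) {
  # The effect is absent only if the background is absent and every present cause fails
  p_effect_absent <- (1 - b) * prod(1 - w[present])
  1 - p_effect_absent
}

b <- 0.01
w_R <- 0.94

p_a2_given_r1 <- noisy_or(w_R, FALSE, b)  # no robbery: only the background can trigger the alarm
p_a2_given_r2 <- noisy_or(w_R, TRUE, b)   # robbery present

print(c(p_a2_given_r1, p_a2_given_r2))
```

The two values should match the second column of the CPD in Figure 1, namely $P(A=2|R=1) = 0.01$ and $P(A=2|R=2) = 0.9406$.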

Now that we've used a Bayes net to specify the joint distribution $P(a,r)$, we can use this distribution to answer queries about the variables in the network.

### Exercise 2

Given that the alarm sounds ($A = 2$), what is the probability that a robbery has occurred? Answer this question by computing $P(R = 2 | A = 2)$ by hand.

=== BEGIN MARK SCHEME ===

There are different ways to compute this conditional probability. The approach presented in the first week of classes would involve creating a table that specifies the entire joint distribution $P(R,A)$, then using this table to compute $P(R|A)$. In principle, enumerating the joint distribution will always work, although in practice the table may be so big that it is unwieldy or impossible to write down.

Here we use a more direct approach.

\begin{align}
P(R=2|A=2) &= \frac{ P(A=2|R=2)P(R=2) }{P(A=2)} & \text{(Bayes rule)}\\
           &= \frac{ 0.9406 \times 0.05}{P(A=2)} & \\
           &= \frac{0.04703}{P(A=2)} 
\end{align}
and
\begin{align}
P(A=2) &= \sum_R{ P(A=2,R) } & \text{(marginalization)}\\
       &=  P(A=2,R=1) + P(A=2,R=2) & \\
       &=  P(A=2|R=1)P(R=1) + P(A=2|R=2)P(R=2) & \text{(chain rule)} \\
       &=  0.01 \times (1 - 0.05) + 0.9406 \times 0.05 & \\
       &= 0.05653 & 
\end{align}
       
so
\begin{equation}
P(R=2|A=2) = \frac{0.04703}{0.05653} \approx 0.8319
\end{equation}

=== END MARK SCHEME ===
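
The table-based approach mentioned above can be sketched in a few lines of base R. This is an illustrative check rather than part of the tutorial code: it builds the joint table $P(R,A)$ from the CPDs in Figure 1, then conditions on $A=2$ by taking the matching column and renormalising. The variable names `joint` and `p_r_given_a2` are assumptions made for this sketch.

```r
# CPDs from Figure 1 (index 1 = FALSE, index 2 = TRUE)
R_cpd <- c(0.95, 0.05)
A_cpd <- matrix(c(0.99,   0.01,
                  0.0594, 0.9406), nrow = 2, byrow = TRUE)  # rows: r, columns: a

# Joint table P(R = r, A = a) = P(R = r) * P(A = a | R = r).
# R_cpd is recycled down the columns, so row r is scaled by R_cpd[r].
joint <- R_cpd * A_cpd

# Condition on A = 2: take that column and renormalise
p_r_given_a2 <- joint[, 2] / sum(joint[, 2])
print(round(p_r_given_a2[2], 4))
```

The printed value should agree with the hand calculation of roughly 0.8319.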

## Inference by Sampling with NIMBLE

In addition to providing a convenient way to specify a joint distribution, Bayes nets support efficient algorithms for computing with joint distributions. We won't cover these algorithms in this class, but will instead use a package called [NIMBLE]( https://r-nimble.org/manuals/NimbleUserManual.pdf ) to carry out inference by sampling. For today we won't worry about what NIMBLE is doing under the hood --- the goal is just to learn how to use the package.

NIMBLE can be a bit tricky to use and the error messages it produces aren't always very informative. So don't worry if it takes you a while to get comfortable using it, and as always please ask for help when you get stuck. When you're using NIMBLE for real, you'd probably want to specify parameters including the number of samples that you'd like to draw. We won't worry about that today -- we'll just use package defaults to keep things simple.

To get started with NIMBLE the first step is to write down a specification of the probabilistic model that you'd like to use. NIMBLE can be used to reason about very complex models but the Bayes net in Figure 1 is very simple.

In [2]:
alarm_code <- nimbleCode({
  # dcat specifies a discrete categorical distribution
  r ~ dcat(R_cpd[1:2])
  a ~ dcat(A_cpd[r,1:2])
})


`R_cpd` and `A_cpd` (which we haven't yet defined) correspond to the CPDs for nodes R and A in Figure 1.  Let's define them now.

In [3]:
# use the background probability b and the causal strength w_R described earlier
b <- 0.01
w_R <- 0.94

alarm_data <- list(
  R_cpd = c(0.95, 0.05), 
  A_cpd =  array(c(1-b,(1-w_R)*(1-b),b,1 - (1-w_R)*(1-b)), dim = c(2,2))
)

print(alarm_data)

$R_cpd
[1] 0.95 0.05

$A_cpd
       [,1]   [,2]
[1,] 0.9900 0.0100
[2,] 0.0594 0.9406



Remember that 2 stands for TRUE, so `R_cpd[2]` is the probability that a robbery occurs. When you specify CPDs for nodes with two or more parents you need to be careful about the order in which the entries are listed when defining the array. Here `A_cpd[2,1]` specifies $P(A=1|R=2)$, and `A_cpd[2,2]` specifies $P(A=2|R=2)$.
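
The ordering matters because `array()` fills its entries column by column. The sketch below rebuilds the same `A_cpd` and attaches labels purely for inspection; the `dimnames` are an addition made here for readability and are not something the model code relies on. It also checks that each row sums to 1, which is a quick way to catch entries listed in the wrong order.

```r
b <- 0.01
w_R <- 0.94

# array() fills column by column: A_cpd[1,1], A_cpd[2,1], A_cpd[1,2], A_cpd[2,2]
A_cpd <- array(c(1 - b, (1 - w_R) * (1 - b), b, 1 - (1 - w_R) * (1 - b)), dim = c(2, 2))

# Optional labels (for printing only) make the layout easier to read
dimnames(A_cpd) <- list(r = c("1", "2"), a = c("1", "2"))
print(A_cpd)

# Each row is a distribution over a, so each row should sum to 1
print(rowSums(A_cpd))
```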

Now that we've defined the CPDs, let's go back to the code for the model. The line `r ~ dcat(R_cpd[1:2])` indicates that variable `r` is drawn from a discrete categorical distribution with parameters specified by `R_cpd`. The following line `a ~ dcat(A_cpd[r,1:2])` indicates that `a` is drawn from the distribution in the row of `A_cpd` that corresponds to the value of variable `r`.

Now let's actually run the model. We'll tell NIMBLE that we'd like to monitor (i.e. store samples of) variables `r` and `a`, and we'll initialize both variables to 1.

In [4]:
alarm_monitors <- c("r", "a")
alarm_inits <- list(r=1, a=1)

We can now use NIMBLE to sample from the joint distribution on `r` and `a` given the data that we've provided. Because `alarm_data` includes the CPDs only, sampling in this way is equivalent to sampling from the joint distribution $P(a,r)$.

In [5]:
# Sample from the joint distribution P(a,r)
alarm_samples <- nimbleMCMC(
  code = alarm_code,
  data = alarm_data,
  inits = alarm_inits,
  monitors = alarm_monitors
)    

alarm_samples <- as_tibble(alarm_samples)
head(alarm_samples)


defining model...

building model...

setting data and initial values...

running calculate on model (any error reports that follow may simply reflect missing values in model variables) ... 


checking model sizes and dimensions...


checking model calculations...

model building finished.

compiling... this may take a minute. Use 'showCompilerOutput = TRUE' to see C++ compilation details.

compilation finished.

running chain 1...



|-------------|-------------|-------------|-------------|
|-------------------------------------------------------|


a,r
<dbl>,<dbl>
2,2
1,1
1,1
1,1
1,1
1,1


We can now use our bag of samples to answer queries about the variables `r` and `a`. Let's compute the marginal probability $P(R=2)$: 

In [6]:
r_marginal <- alarm_samples %>% 
  group_by(r) %>% 
  summarize(count = n(), .groups = "drop") %>%  # first compute counts for each value of r 
  mutate(prob = count/sum(count))               # divide by total counts to yield probabilities

print(paste0('According to the sample, P(R=2) ≈ ', as.character(r_marginal$prob[2])))

[1] "According to the sample, P(R=2) ≈ 0.0511"


Checking Figure 1 tells us that $P(R=2) = 0.05$, and you probably got a slightly different estimate based on your bag of samples. Each time you run the sampler you'll get a different estimate, but they should all be close to the true value.
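
To get a feel for how much an estimate like this can vary, here is a minimal base R sketch that draws independent samples directly from the network (parents first, then children) and reports a binomial standard error. This is not what NIMBLE does under the hood; the seed and the sample size `n` are arbitrary choices made for illustration, and the standard-error formula assumes independent samples.

```r
set.seed(1)  # arbitrary seed, only so that the sketch is repeatable
n <- 10000   # number of samples (an assumption chosen for illustration)

R_cpd <- c(0.95, 0.05)
A_cpd <- array(c(0.99, 0.0594, 0.01, 0.9406), dim = c(2, 2))

# Sample each parent before its child
r <- sample(1:2, n, replace = TRUE, prob = R_cpd)
# Look up P(A = 2 | R = r) for each sampled r, then draw a accordingly
a <- 1 + rbinom(n, size = 1, prob = A_cpd[r, 2])

p_hat <- mean(r == 2)
std_err <- sqrt(p_hat * (1 - p_hat) / n)  # binomial standard error for independent samples
print(c(estimate = p_hat, std_err = std_err))
```

With a sample of this size the standard error is on the order of 0.002, so estimates that differ from 0.05 in the third decimal place are to be expected.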

Our first run of NIMBLE sampled from the joint distribution $P(r,a)$, but we can also sample the value of a set of variables conditioned on observing the values of some other set of variables. Let's sample from the posterior distribution $P(R|A=2)$. The only change we need to make is to add the observation $A=2$ to `alarm_data`:

In [7]:
alarm_data <- list(
  a = 2,
  R_cpd = c(0.95, 0.05), 
  A_cpd =  array(c(1-b,(1-w_R)*(1-b),b,1 - (1-w_R)*(1-b)), dim = c(2,2))
)

We'll run NIMBLE again, and this time we'll monitor `r` only:

In [8]:
alarm_samples <- nimbleMCMC(
  code = alarm_code,
  data = alarm_data,
  inits = list(r=1),
  monitors = c('r')
)    
alarm_samples <- as_tibble(alarm_samples)

defining model...

building model...

setting data and initial values...

running calculate on model (any error reports that follow may simply reflect missing values in model variables) ... 


checking model sizes and dimensions...


checking model calculations...

model building finished.

compiling... this may take a minute. Use 'showCompilerOutput = TRUE' to see C++ compilation details.

compilation finished.

running chain 1...



|-------------|-------------|-------------|-------------|
|-------------------------------------------------------|


Now use the sample to estimate $P(R=2|A=2)$:

In [9]:
r_given_a2 <- alarm_samples %>% 
  group_by(r) %>% 
  summarize(count = n(), .groups = "drop") %>%  # first compute counts for each value of r 
  mutate(prob = count/sum(count))               # divide by total counts to yield probabilities

print(paste0('According to the sample, P(R=2|A=2) ≈ ', as.character(r_given_a2$prob[2])))

[1] "According to the sample, P(R=2|A=2) ≈ 0.8301"


Compare the sample-based estimate of $P(R=2|A=2)$ to the value you computed earlier by hand. The sample-based estimate should be close to the true value.
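
One simple way to see what conditioning on an observation means is rejection sampling: draw from the joint distribution and keep only the samples that agree with the observation. The base R sketch below does this for $A=2$. It is offered as an illustration under stated assumptions (arbitrary seed and sample size) and is not a description of the sampler NIMBLE uses.

```r
set.seed(2)  # arbitrary seed for repeatability
n <- 100000  # number of joint samples (an assumption chosen for illustration)

R_cpd <- c(0.95, 0.05)
A_cpd <- array(c(0.99, 0.0594, 0.01, 0.9406), dim = c(2, 2))

# Draw from the joint distribution P(r, a)
r <- sample(1:2, n, replace = TRUE, prob = R_cpd)
a <- 1 + rbinom(n, size = 1, prob = A_cpd[r, 2])

# Keep only the samples that agree with the observation a = 2
kept <- r[a == 2]
p_r2_given_a2_hat <- mean(kept == 2)
print(c(kept = length(kept), estimate = p_r2_given_a2_hat))
```

Only a small fraction of the samples survive, because $P(A=2)$ is small. That inefficiency is one reason to prefer a sampler that conditions on the data directly.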


### Exercise 3

Use NIMBLE to estimate $P(A=2|R=2)$, and compare the answer you get to the true value.

In [10]:
p_a2_given_r2 <- 0

# replace p_a2_given_r2 with an estimate derived from a bag of samples generated using NIMBLE

### BEGIN SOLUTION

alarm_data["a"] <- NULL # remove the observation that a = 2 
alarm_data["r"] <- 2    # add the observation that r = 2. The remaining elements of alarm_data stay the same as before

alarm_samples <- nimbleMCMC(
  code = alarm_code,
  data = alarm_data,
  inits = list(a=1),
  monitors = c('a')
)    
alarm_samples <- as_tibble(alarm_samples)

a_given_r2 <- alarm_samples %>% 
  group_by(a) %>% 
  summarize(count = n(), .groups = "drop") %>%  # first compute counts for each value of a
  mutate(prob = count/sum(count))               # divide by total counts to yield probabilities

p_a2_given_r2 <- a_given_r2$prob[2]
print(paste0('According to the sample, P(A=2|R=2) ≈ ', as.character(p_a2_given_r2)))

### END SOLUTION

defining model...

building model...

setting data and initial values...

running calculate on model (any error reports that follow may simply reflect missing values in model variables) ... 


checking model sizes and dimensions...


checking model calculations...

model building finished.

compiling... this may take a minute. Use 'showCompilerOutput = TRUE' to see C++ compilation details.

compilation finished.

running chain 1...



|-------------|-------------|-------------|-------------|
|-------------------------------------------------------|
[1] "According to the sample, P(A=2|R=2) ≈ 0.9435"


In [11]:
expect_lt(p_a2_given_r2, 0.96)
expect_gt(p_a2_given_r2, 0.92)

# Optional material (if time permits)

Let's extend the network to allow for the fact that the alarm can be triggered by earthquakes. We'll assume that the probability of an earthquake occurring is 0.1. We'll continue to assume that the causes of the Alarm variable combine according to a noisy-OR function, and will assume that the causal strength of the Earthquake cause is $w_E = 0.29$. This means that if no other causes of the alarm sounding are present (i.e. there is no robbery and the alarm has not malfunctioned) then the alarm sounds with probability 0.29 if there is an earthquake.

As before we include a catch-all Background cause that stands for causes other than Robbery and Earthquake. The Background cause is present with probability $b= 0.01$ and has causal strength of 1 (which means that the alarm definitely sounds when the Background cause is present).

If desired we can leave the Background cause implicit:

<figure>
  <img src="images/alarm_3node.png" alt="alarm_3node" style="width:50%">
      <figcaption class="figure-caption text-center">
        Figure 3: Bayes net that includes Earthquake along with Robbery as a cause of the Alarm sounding.
      </figcaption>
</figure>

or show it explicitly:

<figure>
  <img src="images/alarm_3node_background.png" alt="alarm_3node_background" style="width:50%">
      <figcaption class="figure-caption text-center">
        Figure 4: Figure 3 extended with a Background node.
      </figcaption>
</figure>

Either way, we've defined the CPD for the Alarm node using a noisy-OR function with a background cause. This function assumes that all of the potential causes of Alarm (here Robbery, Earthquake and the Background) operate independently of each other, and that just one of these causes is enough to activate the alarm. Making this assumption means that the CPD for a node with $n$ parents can be specified using $n$ rather than $2^n - 1$ parameters.

###  Exercise 4

Use Figure 4 to show that $P(A=2 | R=2, E=2) = 0.957826$ when $b = 0.01$, $w_R = 0.94$ and $w_E = 0.29$.

=== BEGIN MARK SCHEME ===
This time we'll write down a derivation that skips some steps:
\begin{align}
P(A=2|R=2,E=2) &= P(A=2|R=2,E=2,B=1)P(B=1) + P(A=2|R=2,E=2,B=2)P(B=2) \\
               &= [1 - (1-w_R)(1-w_E)](1-b) + b \\
               &= [1 - 0.06 \times 0.71]\times 0.99 + 0.01 \\
               &= 0.957826
\end{align}

=== END MARK SCHEME ===
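
The same noisy-OR rule gives every entry of the Alarm CPD, not just the one derived above. As a hedged sketch in base R, the loop below fills a three-dimensional array by explicit index, so that the entry order never has to be worked out by hand. The index convention `A_cpd[r, e, a]` is an assumption of this sketch.

```r
b <- 0.01
w_R <- 0.94
w_E <- 0.29
w <- c(w_R, w_E)

# A_cpd[r, e, a]: indices 1 = FALSE, 2 = TRUE
A_cpd <- array(NA_real_, dim = c(2, 2, 2))
for (r in 1:2) {
  for (e in 1:2) {
    present <- c(r == 2, e == 2)                # which explicit causes are present
    p_a2 <- 1 - (1 - b) * prod(1 - w[present])  # noisy-OR with a background cause
    A_cpd[r, e, 1] <- 1 - p_a2
    A_cpd[r, e, 2] <- p_a2
  }
}
print(A_cpd[2, 2, 2])
```

The printed entry corresponds to $P(A=2|R=2,E=2)$ and should match the value 0.957826 derived by hand.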

###  Exercise 5

Run NIMBLE twice to estimate two probabilities: $P(E=2|A=2)$ and $P(E=2|A=2, R=2)$. Which probability is higher, and why? Your NIMBLE code can be based on either Figure 3 or Figure 4 -- you should get the same answer either way.

In [12]:
p_e2_given_a2 <- 0
p_e2_given_a2_r2 <- 0

# replace the values of p_e2_given_a2 and p_e2_given_a2_r2 with estimates generated using NIMBLE

### BEGIN SOLUTION
b <- 0.01
w_R <- 0.94
w_E <- 0.29

# making sure that the numbers in A_cpd are in the right order is tricky! The entries in the A_cpd list below are 
# A_cpd[1,1,1], A_cpd[2,1,1], A_cpd[1,2,1], A_cpd[2,2,1], A_cpd[1,1,2], etc where the three indices correspond to
# r, e, and a in that order

alarm_data <- list(
  R_cpd = c(0.95, 0.05),
  E_cpd = c(0.9, 0.1),
  A_cpd = array( c( 1-b,
                    (1-w_R)*(1-b),
                    (1-w_E)*(1-b),
                    (1-w_R)*(1-w_E)*(1-b),
                    b,
                    1-(1-w_R)*(1-b),
                    1-(1-w_E)*(1-b),
                    1-(1-w_R)*(1-w_E)*(1-b)),  dim = c(2,2,2))
   )

alarm_code <- nimbleCode({
  # dcat specifies a discrete categorical distribution
  r ~ dcat(R_cpd[1:2])
  e ~ dcat(E_cpd[1:2])
  a ~ dcat(A_cpd[r,e,1:2])
})

# add observation that a = 2
alarm_data["a"] <- 2

alarm_samples <- nimbleMCMC(
  code = alarm_code,
  data = alarm_data,
  inits = list(r=1,e=1),
  monitors = c('e')
)    

alarm_samples <- as_tibble(alarm_samples)
e_given_a2 <- alarm_samples %>% 
  group_by(e) %>% 
  summarize(count = n(), .groups = "drop") %>%  # first compute counts for each value of e
  mutate(prob = count/sum(count))               # divide by total counts to yield probabilities

p_e2_given_a2 <- e_given_a2$prob[2]

# add observation that r = 2 and collect a second bag of samples
alarm_data["r"] <- 2

alarm_samples <- nimbleMCMC(
  code = alarm_code,
  data = alarm_data,
  inits = list(e=1),
  monitors = c('e')
)    

alarm_samples <- as_tibble(alarm_samples)
e_given_a2_r2 <- alarm_samples %>% 
  group_by(e) %>% 
  summarize(count = n(), .groups = "drop") %>%  # first compute counts for each value of e
  mutate(prob = count/sum(count))               # divide by total counts to yield probabilities

p_e2_given_a2_r2 <- e_given_a2_r2$prob[2]

print(paste0('According to the first bag of samples, P(E=2|A=2) ≈ ', as.character(p_e2_given_a2)))
print(paste0('According to the second bag of samples, P(E=2|A=2,R=2) ≈ ', as.character(p_e2_given_a2_r2)))

# You should find that P(E=2|A=2) > P(E=2|A=2,R=2). Before making any observations, the probability that an earthquake
# has occurred is P(E=2) = 0.1, which is just the base rate of earthquakes. After observing the alarm sound, an earthquake
# is a plausible cause of the alarm, which means that P(E=2|A=2) is greater than the base rate 0.1. After learning that
# a robbery has occurred, the sounding of the alarm now has an explanation, which means that the probability of an
# earthquake drops back towards the base rate. This phenomenon is sometimes called "explaining away": observing the
# robbery explains the sounding of the alarm, meaning that beliefs about the earthquake return towards the base rate.

### END SOLUTION

defining model...

building model...

setting data and initial values...

running calculate on model (any error reports that follow may simply reflect missing values in model variables) ... 


checking model sizes and dimensions...


checking model calculations...

model building finished.

compiling... this may take a minute. Use 'showCompilerOutput = TRUE' to see C++ compilation details.

compilation finished.

running chain 1...



|-------------|-------------|-------------|-------------|
|-------------------------------------------------------|


defining model...

building model...

setting data and initial values...

running calculate on model (any error reports that follow may simply reflect missing values in model variables) ... 


checking model sizes and dimensions...


checking model calculations...

model building finished.

compiling... this may take a minute. Use 'showCompilerOutput = TRUE' to see C++ compilation details.

compilation finished.

running chain 1...



|-------------|-------------|-------------|-------------|
|-------------------------------------------------------|
[1] "According to the first bag of samples, P(E=2|A=2) ≈ 0.3938"
[1] "According to the second bag of samples, P(E=2|A=2,R=2) ≈ 0.1009"


In [13]:
# this cell contains some hidden tests! You can leave it empty except for this comment
### BEGIN HIDDEN TESTS
expect_lt(p_e2_given_a2, 0.41)
expect_gt(p_e2_given_a2, 0.37)
expect_lt(p_e2_given_a2_r2, 0.12)
expect_gt(p_e2_given_a2_r2, 0.08)
### END HIDDEN TESTS
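
The three-node network is small enough that both probabilities in Exercise 5 can also be computed exactly, which gives a reference point for the sampled estimates. The base R sketch below is an illustrative check rather than part of the tutorial solution; the `_exact` variable names are hypothetical and chosen so that they do not overwrite the estimates above.

```r
b <- 0.01; w_R <- 0.94; w_E <- 0.29
R_cpd <- c(0.95, 0.05)
E_cpd <- c(0.9, 0.1)

# P(A = 2 | r, e) under noisy-OR, as a 2 x 2 matrix indexed [r, e]
p_a2 <- 1 - (1 - b) * outer(c(1, 1 - w_R), c(1, 1 - w_E))

# P(r, e, A = 2) = P(r) P(e) P(A = 2 | r, e)
joint_a2 <- outer(R_cpd, E_cpd) * p_a2

# Condition on A = 2, then additionally on R = 2
p_e2_given_a2_exact    <- sum(joint_a2[, 2]) / sum(joint_a2)
p_e2_given_a2_r2_exact <- joint_a2[2, 2] / sum(joint_a2[2, ])
print(round(c(p_e2_given_a2_exact, p_e2_given_a2_r2_exact), 4))
```

The exact values are roughly 0.39 and 0.10, so learning about the robbery pulls the probability of an earthquake almost all the way back to its base rate of 0.1.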

# Even more optional material 

Bayes nets are modular and therefore easy to extend. Suppose that Pearl has two neighbours, Jan and Kim, who keep him informed when the alarm sounds, and also sometimes call for other reasons. Jan calls very reliably when the alarm sounds ($P(J=2|A=2) = 0.9$) and Kim is a bit less reliable  ($P(K=2|A=2) = 0.7$). The situation is captured by the following network.


<figure>
  <img src="images/alarm_5node.png" alt="alarm_5node" style="width:50%">
  <figcaption  class="figure-caption text-center">Figure 5: Alarm network with nodes for two neighbours (Jan and Kim) who often call when the alarm sounds.</figcaption>
</figure>

###  Exercise 6

In general a joint distribution over 5 binary variables may require 31 numbers to specify. How many different numbers did we need in order to define the joint distribution using the network in Figure 5? 

=== BEGIN MARK SCHEME ===

We needed 9 numbers ($P(R=2)$, $P(E=2)$, $b$, $w_R$, $w_E$, and two numbers each for Jan and Kim). So the Bayes net lets us specify the joint distribution relatively compactly.

=== END MARK SCHEME ===

###  Exercise 7

What is the probability that a robbery has occurred given that Jan calls but Kim does not?
Answer this question by using NIMBLE to compute $P(R=2|J=2,K=1)$.

In [14]:
p_r2_given_j2_k1 <- 0

# replace the value of p_r2_given_j2_k1 with an estimate generated using NIMBLE

### BEGIN SOLUTION
b <- 0.01
w_R <- 0.94
w_E <- 0.29

alarm_data <- list(
  R_cpd = c(0.95, 0.05),
  E_cpd = c(0.9, 0.1),
  A_cpd = array( c( 1-b,
                    (1-w_R)*(1-b),
                    (1-w_E)*(1-b),
                    (1-w_R)*(1-w_E)*(1-b),
                    b,
                    1-(1-w_R)*(1-b),
                    1-(1-w_E)*(1-b),
                    1-(1-w_R)*(1-w_E)*(1-b)),  dim = c(2,2,2)),
  J_cpd = array(c(1-0.05,1-0.9, 0.05,0.9), dim = c(2,2)),
  K_cpd = array(c(1-0.01,1-0.7, 0.01,0.7), dim = c(2,2))
)

alarm_code <- nimbleCode({
  # dcat specifies a discrete categorical distribution
  r ~ dcat(R_cpd[1:2])
  e ~ dcat(E_cpd[1:2])
  a ~ dcat(A_cpd[r,e,1:2])
  j ~ dcat(J_cpd[a,1:2])
  k ~ dcat(K_cpd[a,1:2])
})

# add observations (j=2, k=1)
alarm_data["j"] <- 2
alarm_data["k"] <- 1

alarm_samples <- nimbleMCMC(
  code = alarm_code,
  data = alarm_data,
  inits = list(r=1,e=1,a=1),
  monitors = c('r')
)    

alarm_samples <- as_tibble(alarm_samples)
r_given_j2_k1 <- alarm_samples %>% 
  group_by(r) %>% 
  summarize(count = n(), .groups = "drop") %>%  # first compute counts for each value of r
  mutate(prob = count/sum(count))               # divide by total counts to yield probabilities

p_r2_given_j2_k1 <- r_given_j2_k1$prob[2]

print(paste0('According to the bag of samples, P(R=2|J=2,K=1) ≈ ', as.character(p_r2_given_j2_k1)))

### END SOLUTION

defining model...

building model...

setting data and initial values...

running calculate on model (any error reports that follow may simply reflect missing values in model variables) ... 


checking model sizes and dimensions...


checking model calculations...

model building finished.

compiling... this may take a minute. Use 'showCompilerOutput = TRUE' to see C++ compilation details.

compilation finished.

running chain 1...



|-------------|-------------|-------------|-------------|
|-------------------------------------------------------|
[1] "According to the bag of samples, P(R=2|J=2,K=1) ≈ 0.1883"


In [15]:
# this cell contains some hidden tests! You can leave it empty except for this comment
### BEGIN HIDDEN TESTS
expect_lt(p_r2_given_j2_k1, 0.22)
expect_gt(p_r2_given_j2_k1, 0.18)
### END HIDDEN TESTS
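
With five binary variables there are only 32 joint states, so the answer to Exercise 7 can be cross-checked by brute-force enumeration. The sketch below uses base R only. The helper `pr` and the `states` data frame are hypothetical names introduced for this illustration, and the values 0.05 and 0.01 for $P(J=2|A=1)$ and $P(K=2|A=1)$ are taken from the `J_cpd` and `K_cpd` arrays in the solution above.

```r
b <- 0.01; w_R <- 0.94; w_E <- 0.29

# All 32 joint states; 1 = FALSE, 2 = TRUE
states <- expand.grid(r = 1:2, e = 1:2, a = 1:2, j = 1:2, k = 1:2)

# Probability of a binary value x (1 or 2) when P(x = 2) = p2
pr <- function(x, p2) ifelse(x == 2, p2, 1 - p2)

# Noisy-OR probability that the alarm sounds, given r and e in each state
p_a2 <- with(states, 1 - (1 - b) * (1 - w_R)^(r == 2) * (1 - w_E)^(e == 2))

# Joint probability of each state as a product of the five CPD entries
states$p <- with(states,
  pr(r, 0.05) * pr(e, 0.1) * pr(a, p_a2) *
  pr(j, ifelse(a == 2, 0.9, 0.05)) *
  pr(k, ifelse(a == 2, 0.7, 0.01)))

# Condition on the evidence J = 2, K = 1
evidence <- states$j == 2 & states$k == 1
p_r2_given_j2_k1_exact <- sum(states$p[evidence & states$r == 2]) / sum(states$p[evidence])
print(round(p_r2_given_j2_k1_exact, 4))
```

The exact value is close to 0.19, which sits inside the range checked by the hidden tests. Enumeration like this stops being practical as networks grow, which is where sampling becomes useful.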