# Workshop 8: Filtering

## Overview

This workshop relates to Lecture 8. In that lecture, we discussed Dynamic Bayesian Networks
(DBNs). Here you will see how DBNs work using a mixture of Excel and the Python `pomegranate`
package. In particular, you will carry out filtering and prediction tasks on the umbrella
network that we studied in the lecture.

## Task 1: Excel for filtering

The spreadsheet `umbrella-filtering.xls`, which can be found on Blackboard, models the umbrella example over the first 2 days.
On the top line, the probability of rain for `Day 0` is the prior probability (see Slides 26 and 49 in Lecture 8).
At the bottom of the sheet are the conditional probability tables for the transition model and the sensor model. The predicted probability for rain on `Day 1` (top) is computed from the probability for `Day 0` and the transition model. This is exactly as on Slide 49 (and 53).

To get the filtered probability, we have to bring in information about whether we saw an umbrella or not. The filtered probability of rain for `Day 1` (middle of the sheet) is computed by combining the predicted probability for `Day 1`, the sensor model, and what we know about umbrellas. This gives the results you see on Slide 50 (and 53).

Note: There are two versions of the filtered probability: the raw values, which we get directly from the calculation, and the normalized values (the raw values scaled so that they add up to 1).
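
The `Day 1` calculation described above can be sketched in plain Python as a cross-check for the spreadsheet. This is a minimal sketch, not the spreadsheet's own formulas: the function and constant names are made up for illustration, and the numbers are the ones used in the `pomegranate` cells later in this workshop, which are assumed to match the spreadsheet's tables.

```python
# Umbrella model parameters (assumed to match the spreadsheet's tables).
PRIOR_RAIN = 0.5             # P(Rain0 = y)
P_RAIN_GIVEN_RAIN = 0.7      # P(Rain_t = y | Rain_{t-1} = y)
P_RAIN_GIVEN_DRY = 0.3       # P(Rain_t = y | Rain_{t-1} = n)
P_UMBRELLA_GIVEN_RAIN = 0.9  # P(Umbrella_t = y | Rain_t = y)
P_UMBRELLA_GIVEN_DRY = 0.2   # P(Umbrella_t = y | Rain_t = n)


def predict(p_rain_prev):
    """Apply the transition model to the previous day's probability of rain."""
    return (P_RAIN_GIVEN_RAIN * p_rain_prev
            + P_RAIN_GIVEN_DRY * (1.0 - p_rain_prev))


def filter_raw(p_rain_pred, umbrella_seen):
    """Weight the prediction by the sensor model; returns (rain, no rain)."""
    if umbrella_seen:
        like_rain, like_dry = P_UMBRELLA_GIVEN_RAIN, P_UMBRELLA_GIVEN_DRY
    else:
        like_rain = 1.0 - P_UMBRELLA_GIVEN_RAIN
        like_dry = 1.0 - P_UMBRELLA_GIVEN_DRY
    return like_rain * p_rain_pred, like_dry * (1.0 - p_rain_pred)


def normalize(raw):
    """Scale the raw values so that they add up to 1."""
    total = sum(raw)
    return tuple(value / total for value in raw)


predicted_day1 = predict(PRIOR_RAIN)
raw_day1 = filter_raw(predicted_day1, umbrella_seen=True)
filtered_day1 = normalize(raw_day1)

print("Predicted P(rain) on Day 1:", predicted_day1)
print("Raw filtered values (rain, no rain):", raw_day1)
print("Normalized filtered values (rain, no rain):", filtered_day1)
```

Changing `umbrella_seen` to `False` gives the "no umbrella" case, which you can compare against the spreadsheet.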

**Look at what happens if you change the probability of umbrella/not umbrella. Currently the values say you see an umbrella (probability of umbrella is 1 and that of not umbrella is 0). What happens if you don’t see an umbrella (probability of umbrella is 0 and that of not umbrella is 1)? What about if you have no information (probability of umbrella is 0.5 and that of not umbrella is 0.5)?**

## Task 2: More filtering with Excel

The column for `Day 2` just repeats the calculations for `Day 1`, but starting from the results from `Day 1`.
Thus the predicted probability for `Day 2` is calculated by applying the transition model to the (normalized) filtered probability for `Day 1`. The results are just like those on Slide 51 (and 53).

The filtered probability of `Day 2` is calculated from the predicted probability for `Day 2`, the sensor model, and what we know about umbrellas. The results are just like those on Slide 52 (and 53).

In other words, the probabilities for `Day 2` are computed just like those for `Day 1`. The calculation is modular.
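
Because the calculation is modular, it can be written as one step function applied once per day. The following is a minimal sketch under the same assumptions as the workshop's model parameters (0.7/0.3 transition model, 0.9/0.2 sensor model); `forward_step` and `filter_days` are hypothetical names, not part of Excel or `pomegranate`. An observation of `None` is taken to mean "no information", in which case the step returns the prediction unchanged.

```python
def forward_step(p_rain_prev, umbrella):
    """One predict-then-filter step.

    umbrella is True (seen), False (not seen) or None (no information).
    Returns the normalized probability of rain for the new day.
    """
    # Prediction: apply the transition model.
    predicted = 0.7 * p_rain_prev + 0.3 * (1.0 - p_rain_prev)
    if umbrella is None:
        return predicted
    # Filtering: weight by the sensor model, then normalize.
    like_rain = 0.9 if umbrella else 0.1
    like_dry = 0.2 if umbrella else 0.8
    raw_rain = like_rain * predicted
    raw_dry = like_dry * (1.0 - predicted)
    return raw_rain / (raw_rain + raw_dry)


def filter_days(prior_rain, observations):
    """Apply forward_step once per observation, feeding each result forward."""
    beliefs = []
    belief = prior_rain
    for umbrella in observations:
        belief = forward_step(belief, umbrella)
        beliefs.append(belief)
    return beliefs


beliefs = filter_days(0.5, [True, True])
for day, p_rain in enumerate(beliefs, start=1):
    print(f"Day {day}: P(rain) = {p_rain:.3f}")
```

Extending the observation list by one entry adds one more day, which mirrors copying a column in the spreadsheet.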

**Look at what happens when the probabilities of umbrella/not umbrella on `Day 1` and `Day 2` vary.**

## Task 3: Predicting with Python

For this example we will use a Python package called `pomegranate`, which provides support for probabilistic reasoning.

### Setup and Installation

In [None]:
%pip install pomegranate==0.15.0

### Imports

Let's import the `pomegranate` library for building our model.

In [None]:
from pomegranate import *


### Model Setup

You can now run the version of the umbrella model in the following cells. `pomegranate` can only solve Bayesian networks (not Dynamic Bayesian Networks), so we have to unroll the whole example to the depth that we want. The following code has the network unrolled to a depth of 2 days. Read through the code, which defines the probability distributions, nodes, and network edges.

In [None]:
# Define the distributions
Rain0 = DiscreteDistribution({'y': 0.5, 'n': 0.5})

# Conditional distributions for rain on subsequent days
Rain1 = ConditionalProbabilityTable([
    ['y', 'y', 0.7],
    ['y', 'n', 0.3],
    ['n', 'y', 0.3],
    ['n', 'n', 0.7]
], [Rain0])

Rain2 = ConditionalProbabilityTable([
    ['y', 'y', 0.7],
    ['y', 'n', 0.3],
    ['n', 'y', 0.3],
    ['n', 'n', 0.7]
], [Rain1])

# Sensor model for umbrella
Umbrella1 = ConditionalProbabilityTable([
    ['y', 'y', 0.9],
    ['y', 'n', 0.1],
    ['n', 'y', 0.2],
    ['n', 'n', 0.8]
], [Rain1])

Umbrella2 = ConditionalProbabilityTable([
    ['y', 'y', 0.9],
    ['y', 'n', 0.1],
    ['n', 'y', 0.2],
    ['n', 'n', 0.8]
], [Rain2])

# Nodes in the network
s1 = Node(Rain0, name='Rain0')
s2 = Node(Rain1, name='Rain1')
s3 = Node(Umbrella1, name='Umbrella1')
s4 = Node(Rain2, name='Rain2')
s5 = Node(Umbrella2, name='Umbrella2')

# Define the network
model = BayesianNetwork('Umbrella Network')
model.add_states(s1, s2, s3, s4, s5)

# Add edges between nodes
model.add_edge(s1, s2)
model.add_edge(s2, s3)
model.add_edge(s2, s4)
model.add_edge(s4, s5)

# Finalize the network
model.bake()
print('Model setup complete')

`pomegranate` makes it possible to specify the following elements:
* Variables
  
  `Rain0`, `Rain1`, `Rain2`, `Umbrella1` and `Umbrella2` are the variables here.

* Probability distributions

  Variables can have probability distributions associated with them. The distribution associated with `Rain0` is an example of a prior distribution, whereas those connecting `Rain0` to `Rain1`, and `Rain1` to `Rain2`, are conditional. Similarly, there are conditional distributions connecting `Rain1` to `Umbrella1`, and `Rain2` to `Umbrella2`.

* Nodes in a network

  `s1` to `s5` are nodes, associated with the variables `Rain0`, `Rain1`, `Umbrella1`, `Rain2` and `Umbrella2` respectively.

* Models

  `model` is defined as a Bayesian Network that includes all the nodes and the edges that connect them.
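
The unrolled structure follows a regular pattern: each day adds one rain variable and one umbrella variable, a transition edge from the previous day's rain, and a sensor edge to that day's umbrella. The sketch below generates the variable names and edges for any depth in plain Python. It is only an illustration of the pattern; `unrolled_structure` is a hypothetical helper and does not call `pomegranate`.

```python
def unrolled_structure(num_days):
    """Return (names, edges) for the umbrella network unrolled over num_days."""
    names = ["Rain0"]
    edges = []
    for day in range(1, num_days + 1):
        rain = f"Rain{day}"
        umbrella = f"Umbrella{day}"
        names += [rain, umbrella]
        edges.append((f"Rain{day - 1}", rain))  # transition model
        edges.append((rain, umbrella))          # sensor model
    return names, edges


names, edges = unrolled_structure(2)
print(names)
for parent, child in edges:
    print(f"{parent} -> {child}")
```

For a depth of 2, the names and edges come out in the same order as the `Node` definitions and `add_edge` calls in the cell above.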


### Task 3.1: Predict without Evidence

In addition to the above, we can also define the evidence. In particular, `scenario` specifies values for the variables in the model. 

`scenario = [[None, None, None, None, None]]`

leaves all variables unspecified, while:

`scenario = [[None, None, 'y', None, None]]`

specifies that an umbrella was seen on `Day 1` (i.e. the variable `Umbrella1`, associated with node `s3`, has value `y`).
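
Since the position of each value in `scenario` matters, it can be easier to build the list from variable names. The helper below is a minimal sketch: `make_scenario` and `NODE_ORDER` are hypothetical names, and the order is assumed to be the one described above (`Rain0`, `Rain1`, `Umbrella1`, `Rain2`, `Umbrella2`).

```python
# Assumed order of the variables, matching nodes s1 to s5 in the model above.
NODE_ORDER = ["Rain0", "Rain1", "Umbrella1", "Rain2", "Umbrella2"]


def make_scenario(evidence=None, node_order=NODE_ORDER):
    """Build a one-row scenario list from a {variable name: value} dict.

    Variables that are not mentioned are left as None (unspecified).
    """
    evidence = evidence or {}
    unknown = set(evidence) - set(node_order)
    if unknown:
        raise KeyError(f"Unknown variables: {sorted(unknown)}")
    return [[evidence.get(name) for name in node_order]]


print(make_scenario())
print(make_scenario({"Umbrella1": "y"}))
```

A misspelled variable name raises a `KeyError` instead of silently producing a scenario with no evidence.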

We can define a function for printing the evidence in the scenario.

In [None]:
def print_evidence(scenario):
    # A message about the evidence presented.
    #
    # This is hard-coded to reflect the variables used in the model (see
    # above).
    msg = ""
    if scenario[0][0] == 'y':
        msg += "Rain on Day 0. "
    if scenario[0][0] == 'n':
        msg += "No rain on Day 0. "
    if scenario[0][1] == 'y':
        msg += "Rain on Day 1. "
    if scenario[0][1] == 'n':
        msg += "No rain on Day 1. "
    if scenario[0][2] == 'y':
        msg += "Umbrella on Day 1. "
    if scenario[0][2] == 'n':
        msg += "No umbrella on Day 1. "
    if scenario[0][3] == 'y':
        msg += "Rain on Day 2. "
    if scenario[0][3] == 'n':
        msg += "No rain on Day 2. "
    if scenario[0][4] == 'y':
        msg += "Umbrella on Day 2. "
    if scenario[0][4] == 'n':
        msg += "No umbrella on Day 2. "
    
    print("Evidence is: ", msg)
    print("\n")


Run the following code to compute the probability of rain on `Day 1` and `Day 2` when there is no evidence about umbrellas. This is predicting rain on those days with no evidence. What results do you get?

In [None]:
scenario = [[None, None, None, None, None]]
print_evidence(scenario)
predict_proba = model.predict_proba(scenario)
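# predict_proba returns one entry per node, aligned with model.states.
for state, dist in zip(model.states, predict_proba[0]):
    # Observed variables come back as plain values, so only print distributions.
    if isinstance(dist, DiscreteDistribution):
        print(f"{state.name}:", dist.items())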

### Task 3.2: Adding Evidence
Here, modify the following code to add evidence of an umbrella on `Day 1` and observe the change in probability. How does the probability of rain on `Day 1` change? This is now the filtered probability of rain on `Day 1`.

In [None]:
scenario = [[None, None, None, None, None]]
print_evidence(scenario)
predict_proba = model.predict_proba(scenario)
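# predict_proba returns one entry per node, aligned with model.states.
for state, dist in zip(model.states, predict_proba[0]):
    # Observed variables come back as plain values, so only print distributions.
    if isinstance(dist, DiscreteDistribution):
        print(f"{state.name}:", dist.items())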

Note that, in both these cases, `pomegranate` is giving us values for the probabilities of all the variables in the model. When we give it evidence about `Umbrella1`, it not only gives us the filtered probability of rain on `Day 1`, but it also predicts the probability of rain on `Day 2`, and provides a smoothed estimate of the probability of rain on `Day 0`.
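
To see where these three kinds of estimate come from, the same posteriors can be computed by brute force: enumerate every assignment of the five variables, keep those consistent with the evidence, and sum the joint probabilities. The sketch below does this in plain Python as a cross-check. It is not how `pomegranate` performs inference internally, and `posterior_rain` and the table names are made up for illustration.

```python
from itertools import product

NAMES = ["Rain0", "Rain1", "Umbrella1", "Rain2", "Umbrella2"]
P_RAIN0 = {"y": 0.5, "n": 0.5}
# Keys are (previous day's rain, today's rain).
TRANSITION = {("y", "y"): 0.7, ("y", "n"): 0.3, ("n", "y"): 0.3, ("n", "n"): 0.7}
# Keys are (today's rain, today's umbrella).
SENSOR = {("y", "y"): 0.9, ("y", "n"): 0.1, ("n", "y"): 0.2, ("n", "n"): 0.8}


def posterior_rain(evidence):
    """Return P(Rain_t = y | evidence) for t = 0, 1, 2 by full enumeration.

    evidence maps variable names to 'y' or 'n'; other variables are unobserved.
    """
    totals = {"Rain0": 0.0, "Rain1": 0.0, "Rain2": 0.0}
    norm = 0.0
    for assignment in product("yn", repeat=len(NAMES)):
        values = dict(zip(NAMES, assignment))
        # Skip assignments that contradict the evidence.
        if any(values[name] != value for name, value in evidence.items()):
            continue
        joint = (P_RAIN0[values["Rain0"]]
                 * TRANSITION[(values["Rain0"], values["Rain1"])]
                 * SENSOR[(values["Rain1"], values["Umbrella1"])]
                 * TRANSITION[(values["Rain1"], values["Rain2"])]
                 * SENSOR[(values["Rain2"], values["Umbrella2"])])
        norm += joint
        for name in totals:
            if values[name] == "y":
                totals[name] += joint
    return {name: total / norm for name, total in totals.items()}


result = posterior_rain({"Umbrella1": "y"})
for name, p_rain in result.items():
    print(f"{name}: P(rain) = {p_rain:.3f}")
```

With evidence on `Umbrella1` only, the `Rain0` entry is the smoothed estimate, `Rain1` is the filtered estimate, and `Rain2` is the prediction.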

### Task 3.3: Day 2 with pomegranate

Use `pomegranate` to calculate the filtered probability of rain on `Day 2` when we see an umbrella on `Day 1` and `Day 2`. What is the filtered probability of rain on `Day 2` when we don’t see an umbrella on `Day 1` (that is, the evidence is “no umbrella” on `Day 1`)? How about if we just have no information about umbrellas on `Day 1`? You can experiment with the code block below.
How do these results compare with what you get using Excel?

In [None]:
scenario = [[None, None, None, None, None]]
print_evidence(scenario)
predict_proba = model.predict_proba(scenario)
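# predict_proba returns one entry per node, aligned with model.states.
for state, dist in zip(model.states, predict_proba[0]):
    # Observed variables come back as plain values, so only print distributions.
    if isinstance(dist, DiscreteDistribution):
        print(f"{state.name}:", dist.items())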

## Task 4: Filtering Day 3 using Excel

Take your spreadsheet from the previous example, and add the filtered probability calculation for `Day 3`.

## Task 5: Day 3 with pomegranate
This task involves extending the network to include `Day 3` and analyzing probabilities.
Use your model to predict the probability of rain on `Day 3` when the only evidence that you have is that you see an umbrella on `Day 1`.
How does this value change when you also see an umbrella on `Day 3`?

In [None]:
# Extend to day 3
Rain3 = ConditionalProbabilityTable(# TODO: fill in the table)
Umbrella3 = ConditionalProbabilityTable(# TODO: fill in the table)

# TODO: Define and Add nodes
s6 = # TODO: Add node for Rain3
s7 = # TODO: Add node for Umbrella3
model.add_states(s6, s7)

# TODO: Add edges


# Re-bake the model
model.bake()

# Run prediction with new model
scenario = # TODO: Create a new model scenario
print_evidence(scenario)
predict_proba = model.predict_proba(scenario)
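# predict_proba returns one entry per node, aligned with model.states.
for state, dist in zip(model.states, predict_proba[0]):
    # Observed variables come back as plain values, so only print distributions.
    if isinstance(dist, DiscreteDistribution):
        print(f"{state.name}:", dist.items())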