# Bayesian Belief Networks 

Based on the tutorial: https://towardsdatascience.com/bbn-bayesian-belief-networks-how-to-build-them-effectively-in-python-6b7f93435bba

Data for this practice: https://www.kaggle.com/jsphyg/weather-dataset-rattle-package


Most of you may already be familiar with the Naive Bayes algorithm, a fast and simple modeling technique used in classification problems. While it is used widely due to its speed and relatively good performance, Naive Bayes is built on the assumption that all variables (model features) are independent of each other given the class, which in reality is often not true.

In some cases, you may want to build **a model where you can specify which variables are dependent, independent, or conditionally independent** (this is explained in the next section). You may also want to track in real time how event probabilities change as new evidence is introduced to the model.

This is where Bayesian Belief Networks come in handy: they let you construct a model with nodes and directed edges that explicitly outline the relationships between variables.

## Contents
+ The category of algorithms Bayesian Belief Networks (BBN) belong to
+ Introduction to Bayesian Belief Networks (BBN) and Directed Acyclic Graphs (DAG)
+ Bayesian Belief Network Python example using real-life data
    - Directed Acyclic Graph for weather prediction
    - Data and Python library setup
    - BBN setup
    - Using BBN for predictions
+ Conclusions


## What category of algorithms do Bayesian Belief Networks (BBN) belong to?
Technically, there is no training happening within a BBN. We simply define how different nodes in the network are linked together. Then we observe how the probabilities change after passing some evidence into specific nodes.

So, we can place these networks in a category of their own: Probabilistic Graphical Models.

## Bayesian Belief Networks (BBN) and Directed Acyclic Graphs (DAG)
A Bayesian Belief Network (BBN) is a Probabilistic Graphical Model (PGM) that represents a set of variables and their conditional dependencies via a Directed Acyclic Graph (DAG).

To understand what this means, let’s draw a DAG and analyze the relationship between different nodes.

<img src="dag.png" width="400"/>

Using the above, we can state the **relationship between variables (nodes)**:

+ **Independence**: A and C are independent of each other. So are B and C. This is because knowing whether C has happened does not change our knowledge about A or B and vice versa.
+ **Dependence**: B is dependent on A since A is the parent of B. This relationship can be written as a conditional probability: P(B|A). D is also dependent on other variables, and in this case, it depends on two of them — B and C. Again, this can be written as a conditional probability: P(D|B,C).
+ **Conditional Independence**: D is considered conditionally independent of A. This is because as soon as we know whether event B has happened, A becomes irrelevant from the perspective of D. In other words, the following is true: P(D|B,A) = P(D|B).
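
The three relationships above can be checked numerically. The following is a minimal sketch, assuming the DAG has the edges A → B, B → D and C → D as described in the list; all probability values are made up for illustration and do not come from any dataset. It builds the joint distribution from the factorization P(A) · P(C) · P(B|A) · P(D|B,C) and compares P(D|B) with P(D|B,A).

```python
from itertools import product

# Hypothetical probabilities for binary events (True = the event happened)
p_a = {True: 0.3, False: 0.7}                  # P(A)
p_c = {True: 0.6, False: 0.4}                  # P(C)
p_b_given_a = {True: 0.9, False: 0.2}          # P(B=True | A)
p_d_given_bc = {(True, True): 0.95, (True, False): 0.70,
                (False, True): 0.40, (False, False): 0.05}  # P(D=True | B, C)


def joint(a, b, c, d):
    """P(A, B, C, D) = P(A) * P(C) * P(B|A) * P(D|B,C), read off the DAG."""
    p_b = p_b_given_a[a] if b else 1 - p_b_given_a[a]
    p_d = p_d_given_bc[(b, c)] if d else 1 - p_d_given_bc[(b, c)]
    return p_a[a] * p_c[c] * p_b * p_d


def prob(query, given):
    """P(query | given), obtained by summing the joint over all matching worlds."""
    numerator = denominator = 0.0
    for a, b, c, d in product([True, False], repeat=4):
        world = {"A": a, "B": b, "C": c, "D": d}
        if all(world[k] == v for k, v in given.items()):
            p = joint(a, b, c, d)
            denominator += p
            if all(world[k] == v for k, v in query.items()):
                numerator += p
    return numerator / denominator


p_d_b = prob({"D": True}, {"B": True})
p_d_ba = prob({"D": True}, {"B": True, "A": True})
p_d_bna = prob({"D": True}, {"B": True, "A": False})
print(p_d_b, p_d_ba, p_d_bna)
```

With these made-up numbers, all three printed values agree up to floating-point error: once B is known, learning A does not change the probability of D.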

## Bayesian Belief Network Python example using real-life data
### Directed Acyclic Graph for weather prediction

Let’s use Australian weather data to build a BBN. This will enable us to predict if it will rain tomorrow based on a few weather observations from today.

First, let’s take a look at a DAG before we go through the details of how to build it. Note, I have displayed probabilities for all the different event combinations. You will see how we calculate these using our weather data in the following few sections.

<img src="weather_data.png" width="1000"/>

### Data and Python library setup

We will use the following data and libraries:

+ Australian weather data from Kaggle
+ PyBBN for creating Bayesian Belief Networks
+ Pandas for data manipulation
+ NetworkX and Matplotlib for drawing graphs

Let’s import all the libraries:

In [None]:
import pandas as pd # for data manipulation 
import networkx as nx # for drawing graphs
import matplotlib.pyplot as plt # for drawing graphs
import numpy as np

# for creating Bayesian Belief Networks (BBN)
from pybbn.graph.dag import Bbn
from pybbn.graph.edge import Edge, EdgeType
from pybbn.graph.jointree import EvidenceBuilder
from pybbn.graph.node import BbnNode
from pybbn.graph.variable import Variable
from pybbn.pptc.inferencecontroller import InferenceController

Then we get the Australian weather data from Kaggle, which you can download following this link: https://www.kaggle.com/jsphyg/weather-dataset-rattle-package.

We ingest the data and derive a few new variables for usage in the model.

In [None]:
# Set Pandas options to display more columns
pd.options.display.max_columns=50

# Read in the weather data csv
df=pd.read_csv('weatherAUS.csv', encoding='utf-8')

In [None]:
df.columns

In [None]:
# Count records where the target RainTomorrow is missing
df['RainTomorrow'].isnull().sum()

In [None]:
# Drop records where target RainTomorrow=NaN
df = df[df['RainTomorrow'].notnull()]

In [None]:
# Select the necessary columns (copy to avoid modifying a view of the original frame)
df = df[['RainTomorrow', 'WindGustSpeed', 'Humidity9am', 'Humidity3pm']].copy()

In [None]:
# Count missing values per column
df.isnull().sum()

In [None]:
# For other columns with missing values, fill them in with the column mean
# (numeric_only=True skips the text column RainTomorrow, which has no mean)
df = df.fillna(df.mean(numeric_only=True))

In [None]:

# Create bands for variables that we want to use in the model
df['WindGustSpeedCat']=df['WindGustSpeed'].apply(lambda x: '0.<=40'   if x<=40 else
                                                            '1.40-50' if 40<x<=50 else '2.>50')
df['Humidity9amCat']=df['Humidity9am'].apply(lambda x: '1.>60' if x>60 else '0.<=60')
df['Humidity3pmCat']=df['Humidity3pm'].apply(lambda x: '1.>60' if x>60 else '0.<=60')

# Show a snapshot of data
df
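
As an alternative to the `apply` calls with lambdas, the same banding can be sketched with `pd.cut`. This is an illustration on a tiny made-up sample (the frame name `sample` and its values are hypothetical), using the same band edges and labels as above. Note that `pd.cut` returns a categorical column rather than plain strings.

```python
import pandas as pd

# Hypothetical wind gust speeds, chosen to include the band edges 40 and 50
sample = pd.DataFrame({"WindGustSpeed": [35.0, 40.0, 44.0, 50.0, 61.0]})

# Intervals are closed on the right by default: (-inf, 40], (40, 50], (50, inf)
bins = [float("-inf"), 40, 50, float("inf")]
labels = ["0.<=40", "1.40-50", "2.>50"]

sample["WindGustSpeedCat"] = pd.cut(sample["WindGustSpeed"], bins=bins, labels=labels)
print(sample)
```

Because the intervals are closed on the right, a value of exactly 40 falls into the first band and exactly 50 into the second, matching the `x<=40` and `40<x<=50` conditions used in the lambdas.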

### Setting up Bayesian Belief Network
Now that we have all the libraries and data ready, it is time to set up a BBN. The first stage requires us to define nodes.

In [None]:
# Create nodes by manually typing in probabilities
H9am = BbnNode(Variable(0, 'H9am', ['<=60', '>60']), [0.30658, 0.69342])
H3pm = BbnNode(Variable(1, 'H3pm', ['<=60', '>60']), [0.92827, 0.07173, 
                                                      0.55760, 0.44240])
W = BbnNode(Variable(2, 'W', ['<=40', '40-50', '>50']), [0.58660, 0.24040, 0.17300])
RT = BbnNode(Variable(3, 'RT', ['No', 'Yes']), [0.92314, 0.07686, 
                                                0.89072, 0.10928, 
                                                0.76008, 0.23992, 
                                                0.64250, 0.35750, 
                                                0.49168, 0.50832, 
                                                0.32182, 0.67818])

A few things to note:

+ Probabilities here are normalized frequencies of the variable categories from the data. E.g., the “H9am” variable has 43,594 observations where the value is ≤60 and 98,599 observations where the value is >60.


+ While I have used normalized frequencies (probabilities), it also works if you put actual frequencies instead. In that case, your code would look like this: 

```
H9am = BbnNode(Variable(0, 'H9am', ['<=60', '>60']), [43594, 98599])
```

+ For child nodes, like “Humidity3pmCat”, which has a parent “Humidity9amCat”, we need to provide probabilities (or frequencies) for each combination as shown in the DAG (note each row adds up to 1):

<img src="table_prob1.png" width="500"/>

+ You can do this by calculating probabilities/frequencies of “H3pm” twice — the first time by taking a subset of data where “H9am”≤60 and the second time by taking a subset of data where “H9am”>60.
+ Since calculating frequencies one at a time is time-consuming, I have written a short function that gives us what we need.


In [None]:
# This function calculates the probability distribution that goes into a BBN node (handles up to 2 parents)
def probs(data, child, parent1=None, parent2=None):
    if parent1 is None:
        # No parents: marginal probabilities of the child categories
        prob = pd.crosstab(data[child], 'Empty', margins=False, normalize='columns').sort_index().to_numpy().reshape(-1).tolist()
    elif parent2 is None:
        # One parent: probabilities of the child within each parent category
        prob = pd.crosstab(data[parent1], data[child], margins=False, normalize='index').sort_index().to_numpy().reshape(-1).tolist()
    else:
        # Two parents: probabilities of the child within each combination of parent categories
        prob = pd.crosstab([data[parent1], data[parent2]], data[child], margins=False, normalize='index').sort_index().to_numpy().reshape(-1).tolist()
    return prob
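
Since the probabilities are passed to a node as one flat list, it can help to verify that the list splits into rows that each sum to 1. Below is a small sketch of such a check; `check_cpt` is a hypothetical helper (not part of PyBBN), applied here to the manually typed "RT" values from earlier. It only applies to normalized probabilities, not to raw frequencies.

```python
def check_cpt(flat, n_child_states, tol=1e-6):
    """Split a flat probability list into rows and verify each row sums to 1."""
    if len(flat) % n_child_states != 0:
        raise ValueError("List length is not a multiple of the number of child states")
    rows = [flat[i:i + n_child_states] for i in range(0, len(flat), n_child_states)]
    for i, row in enumerate(rows):
        if abs(sum(row) - 1.0) > tol:
            raise ValueError(f"Row {i} sums to {sum(row)}, expected 1")
    return rows


# Manually typed probabilities for RT (2 child states: 'No', 'Yes')
rt_flat = [0.92314, 0.07686,
           0.89072, 0.10928,
           0.76008, 0.23992,
           0.64250, 0.35750,
           0.49168, 0.50832,
           0.32182, 0.67818]

rt_rows = check_cpt(rt_flat, n_child_states=2)
print(len(rt_rows), "rows, one per combination of parent categories")
```

For "RT", the six rows correspond to the 2 × 3 combinations of the "H3pm" and "W" categories.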

So, instead of manually typing in all the probabilities, let’s use the above function. At the same time, we will create an actual network:



In [None]:
# Create nodes by using our earlier function to automatically calculate probabilities
H9am = BbnNode(Variable(0, 'H9am', ['<=60', '>60']), probs(df, child='Humidity9amCat'))
H3pm = BbnNode(Variable(1, 'H3pm', ['<=60', '>60']), probs(df, child='Humidity3pmCat', parent1='Humidity9amCat'))
W = BbnNode(Variable(2, 'W', ['<=40', '40-50', '>50']), probs(df, child='WindGustSpeedCat'))
RT = BbnNode(Variable(3, 'RT', ['No', 'Yes']), probs(df, child='RainTomorrow', parent1='Humidity3pmCat', parent2='WindGustSpeedCat'))

# Create Network
bbn = Bbn() \
    .add_node(H9am) \
    .add_node(H3pm) \
    .add_node(W) \
    .add_node(RT) \
    .add_edge(Edge(H9am, H3pm, EdgeType.DIRECTED)) \
    .add_edge(Edge(H3pm, RT, EdgeType.DIRECTED)) \
    .add_edge(Edge(W, RT, EdgeType.DIRECTED))

# Convert the BBN to a join tree
join_tree = InferenceController.apply(bbn)

Note: if you are working with a small data sample, there is a risk of some event combinations not being present. In such a scenario, you would get a “list index out of range” error. A solution could be to expand your data to include all possible event combinations, or to identify missing combinations and add them in.
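
One way to identify missing combinations is to compare the parent-category pairs observed in the data with the full set of possible pairs. The sketch below uses a small made-up list of observations; `missing_combinations` is a hypothetical helper name. With the weather DataFrame, the observed pairs could be taken from `zip(df['Humidity3pmCat'], df['WindGustSpeedCat'])`.

```python
from itertools import product


def missing_combinations(observed, parent_states):
    """Return parent-category combinations that never occur in the observations."""
    seen = set(observed)
    return [combo for combo in product(*parent_states) if combo not in seen]


# Hypothetical observations of (H3pm band, W band) from a small sample
observed_pairs = [("<=60", "<=40"), ("<=60", "40-50"),
                  (">60", "<=40"), (">60", ">50"), ("<=60", "<=40")]

# All categories of each parent, in the order used by the BBN nodes
states = [["<=60", ">60"], ["<=40", "40-50", ">50"]]

missing = missing_combinations(observed_pairs, states)
print(missing)
```

Any combination reported here has no row in the probability table, so it needs to be added before the node is created.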

Now, we want to draw the graph to check that we have set it up as intended:

In [None]:
# Set node positions
pos = {0: (-1, 2), 1: (-1, 0.5), 2: (1, 0.5), 3: (0, -1)}

# Set options for graph looks
options = {
    "font_size": 16,
    "node_size": 4000,
    "node_color": "white",
    "edgecolors": "black",
    "edge_color": "red",
    "linewidths": 5,
    "width": 5,
}
    
# Generate graph
n, d = bbn.to_nx_graph()
nx.draw(n, with_labels=True, labels=d, pos=pos, **options)

# Update margins and print the graph
ax = plt.gca()
ax.margins(0.10)
plt.axis("off")
plt.show()

### Using BBN for predictions
With our model ready, we can use it to predict whether it will rain tomorrow.

First, let’s plot probabilities for each node without passing any additional information to the graph. Note, I have set up a simple function so we don’t have to retype the same code later on, as we will want to regenerate the results multiple times.

In [None]:
# Define a function for printing marginal probabilities
def print_probs():
    for node in join_tree.get_bbn_nodes():
        potential = join_tree.get_bbn_potential(node)
        print("Node:", node)
        print("Values:")
        print(potential)
        print('----------------')
        
# Use the above function to print marginal probabilities
print_probs()

As you can see, this gives us the likelihood of each event occurring with a “Rain Tomorrow (RT)” probability of 22%. While this is cool, we could have got the same 22% probability by looking at the frequency of the “RainTomorrow” variable in our original dataset.

That said, the following step is where we get a lot of value out of our BBN. We can pass evidence into the BBN and see how that affects probabilities for every node in the network.

Let’s say it is 9 am right now, and we have measured the humidity outside. It says 72, which obviously belongs to the “>60” band. Hence, let’s pass this evidence into the BBN and see what happens. Note, I have created another small function to help us with that.

In [None]:
# Add evidence of an event that happened so the probability distributions can be recalculated
# Note: the first argument only serves as a label; it is overwritten below and not used otherwise
def evidence(ev, nod, cat, val):
    ev = EvidenceBuilder() \
        .with_node(join_tree.get_bbn_node_by_name(nod)) \
        .with_evidence(cat, val) \
        .build()
    join_tree.set_observation(ev)
    
# Use above function to add evidence
evidence('ev1', 'H9am', '>60', 1.0)

# Print marginal probabilities
print_probs()

As you can see, “Humidity9am>60” is now equal to 100%, and the likelihood of “Humidity3pm>60” has increased from 32.8% to 44.2%. At the same time, the chance of “RainTomorrow” has gone up to 26.1%.

Also, note how probabilities for “WindGustSpeed” did not change since “W” and “H9am” are independent of each other.
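
To see where these numbers come from, the "RT" probability can be reproduced by hand. The sketch below does not use PyBBN; it simply sums over all combinations of "H9am", "H3pm" and "W" using the manually typed probabilities from the node definitions. It assumes the rows of the "RT" list are ordered by "H3pm" band first and "W" band second, as produced by the sorted crosstab.

```python
# Probabilities copied from the manually typed node definitions
p_h9 = {"<=60": 0.30658, ">60": 0.69342}
p_h3_given_h9 = {"<=60": {"<=60": 0.92827, ">60": 0.07173},
                 ">60": {"<=60": 0.55760, ">60": 0.44240}}
p_w = {"<=40": 0.58660, "40-50": 0.24040, ">50": 0.17300}
# P(RT='Yes' | H3pm, W), assuming rows are ordered by H3pm band, then W band
p_rain_given = {("<=60", "<=40"): 0.07686, ("<=60", "40-50"): 0.10928,
                ("<=60", ">50"): 0.23992, (">60", "<=40"): 0.35750,
                (">60", "40-50"): 0.50832, (">60", ">50"): 0.67818}


def p_rain(h9_dist):
    """P(RT='Yes') for a given distribution over H9am, by full enumeration."""
    total = 0.0
    for h9, p9 in h9_dist.items():
        for h3, p3 in p_h3_given_h9[h9].items():
            for w, pw in p_w.items():
                total += p9 * p3 * pw * p_rain_given[(h3, w)]
    return total


prior = p_rain(p_h9)                # no evidence
with_evidence = p_rain({">60": 1.0})  # evidence: H9am > 60
print(round(prior, 4), round(with_evidence, 4))
```

The two values come out at roughly 22% and 26%, in line with the probabilities reported by the network before and after adding the "H9am" evidence.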

You can run the same evidence code one more time to remove the evidence from the network. After that, let’s pass two pieces of evidence, one for “H3pm” and one for “W”.



In [None]:
# Add more evidence
evidence('ev1', 'H3pm', '>60', 1.0)
evidence('ev2', 'W', '>50', 1.0)
# Print marginal probabilities
print_probs()

Unsurprisingly, this tells us that the chance of rain tomorrow has gone up to 67.8%. Note how “H9am” probabilities also changed, which tells us that despite us only measuring humidity at 3 pm, we are 93% certain that humidity was also above 60 at 9 am this morning.
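
The 93% figure can be checked with Bayes' rule, again using the manually typed probabilities. This is a sketch rather than what PyBBN does internally; it relies on "W" being independent of "H9am", so only the "H3pm" evidence matters for this node.

```python
# Values copied from the manually typed node definitions
p_h9_high = 0.69342                 # P(H9am > 60)
p_h9_low = 0.30658                  # P(H9am <= 60)
p_h3_high_given_h9_high = 0.44240   # P(H3pm > 60 | H9am > 60)
p_h3_high_given_h9_low = 0.07173    # P(H3pm > 60 | H9am <= 60)

# Total probability of the evidence: P(H3pm > 60)
p_h3_high = (p_h3_high_given_h9_high * p_h9_high
             + p_h3_high_given_h9_low * p_h9_low)

# Bayes' rule: P(H9am > 60 | H3pm > 60)
posterior = p_h3_high_given_h9_high * p_h9_high / p_h3_high
print(round(p_h3_high, 4), round(posterior, 4))
```

The first value is the prior probability of “Humidity3pm>60” (about 32.8%), and the second is the updated belief about the 9 am humidity (about 93%).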

## Task for you:

Deadline: 18.10.2022 12:00. Send your solution in **html** format to my e-mail: aspestova@hse.ru


1. Build a similar Bayesian model for predicting the target variable RainTomorrow, adding new variables. Add at least one independent variable and one dependent variable (like the H9am and H3pm variables in our case).

2. Explore the model a little. Define some events and see how they affect the predicted probabilities of the target variable.