<img src="https://i.ibb.co/SRy7CbN/Picture1.png" style="display: block; margin-left: auto;margin-right: 38%;width: 30%;">


# <span style="color:#008BBB">Reasoning with Uncertainty</span>  

# _Content_
*   Pre-requisites
*   Bayesian Networks
    * Inference by Enumeration
    * Sampling
    * Likelihood Weighting







### Pre-requisites

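Install the `pomegranate` package first. The examples below import `Node`, `DiscreteDistribution`, and `ConditionalProbabilityTable`, which belong to the pre-1.0 pomegranate API, so a 0.x release is likely needed to run them unchanged.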

In [3]:
!pip install pygraphviz pomegranate



### Inference by Enumeration

Several Python libraries simplify probabilistic inference. We will use pomegranate to represent a Bayesian network over four variables (Rain, Maintenance, Train, and Appointment) in code.

First, we create the nodes and provide a probability distribution for each one.

In [2]:
from pomegranate import BayesianNetwork
from pomegranate import Node
from pomegranate import DiscreteDistribution
from pomegranate import ConditionalProbabilityTable

# Rain node has no parents
rain = Node(DiscreteDistribution({
    "none": 0.7,
    "light": 0.2,
    "heavy": 0.1
}), name="rain")

# Track maintenance node is conditional on rain
maintenance = Node(ConditionalProbabilityTable([
    ["none", "yes", 0.4],
    ["none", "no", 0.6],
    ["light", "yes", 0.2],
    ["light", "no", 0.8],
    ["heavy", "yes", 0.1],
    ["heavy", "no", 0.9]
], [rain.distribution]), name="maintenance")

# Train node is conditional on rain and maintenance
train = Node(ConditionalProbabilityTable([
    ["none", "yes", "on time", 0.8],
    ["none", "yes", "delayed", 0.2],
    ["none", "no", "on time", 0.9],
    ["none", "no", "delayed", 0.1],
    ["light", "yes", "on time", 0.6],
    ["light", "yes", "delayed", 0.4],
    ["light", "no", "on time", 0.7],
    ["light", "no", "delayed", 0.3],
    ["heavy", "yes", "on time", 0.4],
    ["heavy", "yes", "delayed", 0.6],
    ["heavy", "no", "on time", 0.5],
    ["heavy", "no", "delayed", 0.5],
], [rain.distribution, maintenance.distribution]), name="train")

# Appointment node is conditional on train
appointment = Node(ConditionalProbabilityTable([
    ["on time", "attend", 0.9],
    ["on time", "miss", 0.1],
    ["delayed", "attend", 0.6],
    ["delayed", "miss", 0.4]
], [train.distribution]), name="appointment")


Second, we create the model by adding all the nodes and then describing which node is the parent of which other node by adding edges between them (recall that a Bayesian network is a directed graph, consisting of nodes with arrows between them).

In [3]:
# Create a Bayesian Network and add states
model = BayesianNetwork()
model.add_states(rain, maintenance, train, appointment)

# Add edges connecting nodes
model.add_edge(rain, maintenance)
model.add_edge(rain, train)
model.add_edge(maintenance, train)
model.add_edge(train, appointment)

# Finalize model
model.bake()


Now, to ask how probable a certain event is, we run the model with the values we are interested in. In this example, we ask for the probability that there is no rain, there is track maintenance, the train is on time, and we miss the meeting.

In [4]:
# Calculate probability for a given observation

probability = model.probability([["none", "yes", "on time", "miss"]])

# probability = model.probability([["none", "no", "on time", "miss"]])
print(probability)


0.022400000000000003
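The printed value can be checked by hand with the chain rule: the joint probability of a full assignment is the product of each node's probability given its parents. The following is a minimal sketch that copies the four relevant entries from the tables defined earlier; the variable names are illustrative and not part of pomegranate.

```python
# Entries copied from the probability tables defined earlier
p_rain_none = 0.7                 # P(Rain = none)
p_maint_yes_given_none = 0.4      # P(Maintenance = yes | Rain = none)
p_on_time_given_none_yes = 0.8    # P(Train = on time | Rain = none, Maintenance = yes)
p_miss_given_on_time = 0.1        # P(Appointment = miss | Train = on time)

# Chain rule: multiply each node's probability given its parents
joint = (p_rain_none
         * p_maint_yes_given_none
         * p_on_time_given_none_yes
         * p_miss_given_on_time)

print(round(joint, 4))  # 0.0224
```

The tiny difference between 0.0224 and the value printed by the model is floating-point rounding.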


Alternatively, we can use the program to provide probability distributions for all variables given some observed evidence. In the following case, we know that the train was delayed. Given this information, we compute and print the probability distributions of the variables Rain, Maintenance, and Appointment.

In [5]:
model.predict_proba({
    "train": "delayed"
})


array([{
           "class" : "Distribution",
           "dtype" : "str",
           "name" : "DiscreteDistribution",
           "parameters" : [
               {
                   "none" : 0.4582663523106501,
                   "light" : 0.30694146412284706,
                   "heavy" : 0.23479218356650278
               }
           ],
           "frozen" : false
       }                                         ,
       {
           "class" : "Distribution",
           "dtype" : "str",
           "name" : "DiscreteDistribution",
           "parameters" : [
               {
                   "no" : 0.6432016166879331,
                   "yes" : 0.35679838331206687
               }
           ],
           "frozen" : false
       }                                      , 'delayed',
       {
           "class" : "Distribution",
           "dtype" : "str",
           "name" : "DiscreteDistribution",
           "parameters" : [
               {
                   "attend" : 0.59999999999

In [6]:
# Calculate predictions based on the evidence that the train was delayed
predictions = model.predict_proba({
    "train": "delayed"
})

# Print predictions for each node
for node, prediction in zip(model.states, predictions):
    if isinstance(prediction, str):
        # Observed (evidence) nodes come back as plain strings
        print(f"{node.name}: {prediction}")
    else:
        # Unobserved nodes come back as distributions over their values
        print(f"{node.name}")
        for value, probability in prediction.parameters[0].items():
            print(f"    {value}: {probability:.4f}")


rain
    none: 0.4583
    light: 0.3069
    heavy: 0.2348
maintenance
    no: 0.6432
    yes: 0.3568
train: delayed
appointment
    attend: 0.6000
    miss: 0.4000


The queries above are the kind that inference by enumeration answers exactly, by summing the joint probability over every combination of the unobserved variables. However, this becomes inefficient as the number of variables grows, because the number of combinations grows exponentially. An alternative is to abandon exact inference in favor of approximate inference. We lose some precision in the resulting probabilities, but the imprecision is often negligible, and in exchange we gain a method that scales.
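To make enumeration concrete, here is a library-free sketch that computes P(Rain | Train = delayed) by summing the joint probability over the hidden variables (Maintenance and Appointment) and normalizing. The dictionaries mirror the tables defined earlier; the names `P_RAIN`, `P_MAINT`, `P_TRAIN`, `P_APPT`, `joint`, and `rain_posterior` are hypothetical helpers introduced only for this illustration.

```python
from itertools import product

P_RAIN = {"none": 0.7, "light": 0.2, "heavy": 0.1}

# P(Maintenance | Rain)
P_MAINT = {
    "none": {"yes": 0.4, "no": 0.6},
    "light": {"yes": 0.2, "no": 0.8},
    "heavy": {"yes": 0.1, "no": 0.9},
}

# P(Train | Rain, Maintenance)
P_TRAIN = {
    ("none", "yes"): {"on time": 0.8, "delayed": 0.2},
    ("none", "no"): {"on time": 0.9, "delayed": 0.1},
    ("light", "yes"): {"on time": 0.6, "delayed": 0.4},
    ("light", "no"): {"on time": 0.7, "delayed": 0.3},
    ("heavy", "yes"): {"on time": 0.4, "delayed": 0.6},
    ("heavy", "no"): {"on time": 0.5, "delayed": 0.5},
}

# P(Appointment | Train)
P_APPT = {
    "on time": {"attend": 0.9, "miss": 0.1},
    "delayed": {"attend": 0.6, "miss": 0.4},
}


def joint(rain, maint, train, appt):
    """Joint probability of one full assignment (chain rule)."""
    return (P_RAIN[rain] * P_MAINT[rain][maint]
            * P_TRAIN[(rain, maint)][train] * P_APPT[train][appt])


def rain_posterior(train_value):
    """P(Rain | Train = train_value) by enumerating the hidden variables."""
    unnormalized = {
        rain: sum(joint(rain, maint, train_value, appt)
                  for maint, appt in product(("yes", "no"), ("attend", "miss")))
        for rain in P_RAIN
    }
    total = sum(unnormalized.values())  # equals P(Train = train_value)
    return {rain: p / total for rain, p in unnormalized.items()}


posterior = rain_posterior("delayed")
for rain, p in posterior.items():
    print(f"{rain}: {p:.4f}")
```

On this network the sketch gives roughly 0.4601, 0.3005, and 0.2394 for none, light, and heavy, which differs slightly from the 0.4583, 0.3069, and 0.2348 printed by the model above. One plausible explanation, stated here as an assumption rather than something the notebook confirms, is that pomegranate's `predict_proba` relies on loopy belief propagation, which is only approximate when the graph contains a loop such as the one formed by rain, maintenance, and train.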

### Sampling

Sampling is one technique of approximate inference. In sampling, each variable is sampled for a value according to its probability distribution. 


To generate a distribution using sampling with a die, we can roll the die multiple times and record the value each time. Suppose we roll the die 600 times. We count how many times we got 1, which should be roughly 100, and repeat for the remaining values, 2 through 6. Then we divide each count by the total number of rolls. This yields an approximate distribution over the die's values: the estimate for each value is unlikely to be exactly 1/6 (the exact probability), but it will be close to it.
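The die experiment can be simulated in a few lines. This is a minimal sketch using Python's standard `random` module; the seed and the names `rolls`, `counts`, and `estimate` are arbitrary choices made for the illustration.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the run is repeatable

# Roll a fair six-sided die 600 times
rolls = [random.randint(1, 6) for _ in range(600)]

# Count each face, then divide by the number of rolls
counts = Counter(rolls)
estimate = {face: counts[face] / len(rolls) for face in range(1, 7)}

for face, p in estimate.items():
    print(f"{face}: {p:.3f}  (exact: {1 / 6:.3f})")
```

Increasing the number of rolls should bring each estimate closer to 1/6.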

Here is an example: if we start by sampling the Rain variable, the value none will be generated with probability 0.7, light with probability 0.2, and heavy with probability 0.1. Suppose that the sampled value is none. When we get to the Maintenance variable, we sample it too, but only from the probability distribution where Rain is equal to none, because that value has already been sampled. We continue in this way through all the nodes. Now we have one sample, and repeating this process many times generates a distribution. If we want to answer a question such as what is P(Train = on time), we count the number of samples where the variable Train has the value on time and divide by the total number of samples. This gives an approximate probability for P(Train = on time).

We can also answer questions that involve conditional probability, such as P(Rain = light | Train = on time). In this case, we ignore all samples where the value of Train is not on time, and then proceed as before. We count how many samples have the variable Rain = light among those samples that have Train = on time, and then divide by the total number of samples where Train = on time.
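Both queries just described can be sketched without any library beyond the standard `random` module. The dictionaries below copy the tables defined earlier; `draw` and `sample_once` are hypothetical helpers, and Appointment is left out because neither query involves it.

```python
import random

P_RAIN = {"none": 0.7, "light": 0.2, "heavy": 0.1}
P_MAINT = {  # P(Maintenance | Rain)
    "none": {"yes": 0.4, "no": 0.6},
    "light": {"yes": 0.2, "no": 0.8},
    "heavy": {"yes": 0.1, "no": 0.9},
}
P_TRAIN = {  # P(Train | Rain, Maintenance)
    ("none", "yes"): {"on time": 0.8, "delayed": 0.2},
    ("none", "no"): {"on time": 0.9, "delayed": 0.1},
    ("light", "yes"): {"on time": 0.6, "delayed": 0.4},
    ("light", "no"): {"on time": 0.7, "delayed": 0.3},
    ("heavy", "yes"): {"on time": 0.4, "delayed": 0.6},
    ("heavy", "no"): {"on time": 0.5, "delayed": 0.5},
}


def draw(distribution, rng):
    """Pick one key of a {value: probability} dict at random."""
    values = list(distribution)
    weights = [distribution[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]


def sample_once(rng):
    """Sample parents before children so each draw can condition on them."""
    rain = draw(P_RAIN, rng)
    maint = draw(P_MAINT[rain], rng)
    train = draw(P_TRAIN[(rain, maint)], rng)
    return {"rain": rain, "maintenance": maint, "train": train}


rng = random.Random(0)
samples = [sample_once(rng) for _ in range(100_000)]

# P(Train = on time): fraction of all samples with an on-time train
on_time = [s for s in samples if s["train"] == "on time"]
p_on_time = len(on_time) / len(samples)

# P(Rain = light | Train = on time): fraction among the on-time samples only
p_light_given_on_time = sum(s["rain"] == "light" for s in on_time) / len(on_time)

print(f"P(Train = on time) ~ {p_on_time:.3f}")
print(f"P(Rain = light | Train = on time) ~ {p_light_given_on_time:.3f}")
```

For comparison, the exact values from these tables are 0.787 and 0.136 / 0.787 ≈ 0.173, so the estimates should land close to those numbers.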

In code, a sampling function can look like generate_sample:

In [None]:
import pomegranate

from collections import Counter




def generate_sample():

    # Mapping of random variable name to sample generated
    sample = {}

    # Mapping of distribution to sample generated
    parents = {}

    # Loop over all states, assuming topological order
    for state in model.states:

        # If we have a non-root node, sample conditional on parents
        if isinstance(state.distribution, pomegranate.ConditionalProbabilityTable):
            sample[state.name] = state.distribution.sample(
                parent_values=parents)

        # Otherwise, just sample from the distribution alone
        else:
            sample[state.name] = state.distribution.sample()

        # Keep track of the sampled value in the parents mapping
        parents[state.distribution] = sample[state.name]

    # Return generated sample
    return sample


In [8]:
model.states


[{
     "class" : "State",
     "distribution" : {
         "class" : "Distribution",
         "dtype" : "str",
         "name" : "DiscreteDistribution",
         "parameters" : [
             {
                 "none" : 0.7,
                 "light" : 0.2,
                 "heavy" : 0.1
             }
         ],
         "frozen" : false
     },
     "name" : "rain",
     "weight" : 1.0
 },
 {
     "class" : "State",
     "distribution" : {
         "class" : "Distribution",
         "name" : "ConditionalProbabilityTable",
         "table" : [
             [
                 "none",
                 "no",
                 "0.6"
             ],
             [
                 "none",
                 "yes",
                 "0.4"
             ],
             [
                 "light",
                 "no",
                 "0.8"
             ],
             [
                 "light",
                 "yes",
                 "0.2"
             ],
             [
                 "hea

In [9]:
generate_sample()


{'rain': 'none',
 'maintenance': 'no',
 'train': 'on time',
 'appointment': 'attend'}

Now, to compute P(Appointment | Train = delayed), which is the probability distribution of the Appointment variable given that the train is delayed, we do the following:

In [10]:
# Rejection sampling
# Compute distribution of Appointment given that train is delayed
N = 1_000_000

data = []

# Repeat sampling 1,000,000 times
for i in range(N):

    # Generate a sample based on the function that we defined earlier
    sample = generate_sample()

    # Keep the sample only if Train is delayed. We want the distribution of
    # Appointment given a delayed train, so samples where the train was
    # on time are discarded.
    if sample["train"] == "delayed":
        data.append(sample["appointment"])

# Count how many times each Appointment value appeared, then normalize by
# the number of kept samples so the approximate probabilities add up to 1.
counts = Counter(data)
print(counts)
print(counts['attend'] / len(data), counts['miss'] / len(data))


Counter({'attend': 127361, 'miss': 85259})
0.5990076192267896 0.4009923807732104


In [11]:
len(data)


212620

### Likelihood Weighting



In the sampling example above, we discarded the samples that did not match the evidence that we had. This is inefficient. One way to get around this is with likelihood weighting, using the following steps:

*   Start by fixing the values for evidence variables.
*   Sample the non-evidence variables using conditional probabilities in the Bayesian network.
*   Weight each sample by its likelihood: the probability of all the evidence occurring.

For example, if we have the observation that the train was on time, we will start sampling as before. We sample a value of Rain given its probability distribution, then Maintenance, but when we get to Train, we always give it the observed value, in our case, on time. Then we proceed and sample Appointment based on its probability distribution given Train = on time. Now that this sample exists, we weight it by the conditional probability of the observed variable given its sampled parents. That is, if we sampled Rain and got light, and then we sampled Maintenance and got yes, then we will weight this sample by P(Train = on time | light, yes).
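The steps above can be sketched in plain Python. This is an illustrative implementation under stated assumptions, not pomegranate code: the dictionaries copy the tables defined earlier, and `draw` and `weighted_sample` are hypothetical helpers. It estimates P(Rain | Train = on time) by summing sample weights per Rain value instead of counting samples.

```python
import random
from collections import defaultdict

P_RAIN = {"none": 0.7, "light": 0.2, "heavy": 0.1}
P_MAINT = {  # P(Maintenance | Rain)
    "none": {"yes": 0.4, "no": 0.6},
    "light": {"yes": 0.2, "no": 0.8},
    "heavy": {"yes": 0.1, "no": 0.9},
}
P_TRAIN = {  # P(Train | Rain, Maintenance)
    ("none", "yes"): {"on time": 0.8, "delayed": 0.2},
    ("none", "no"): {"on time": 0.9, "delayed": 0.1},
    ("light", "yes"): {"on time": 0.6, "delayed": 0.4},
    ("light", "no"): {"on time": 0.7, "delayed": 0.3},
    ("heavy", "yes"): {"on time": 0.4, "delayed": 0.6},
    ("heavy", "no"): {"on time": 0.5, "delayed": 0.5},
}
P_APPT = {  # P(Appointment | Train)
    "on time": {"attend": 0.9, "miss": 0.1},
    "delayed": {"attend": 0.6, "miss": 0.4},
}


def draw(distribution, rng):
    """Pick one key of a {value: probability} dict at random."""
    values = list(distribution)
    weights = [distribution[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]


def weighted_sample(rng, observed_train):
    """Return one sample with Train fixed to the evidence, plus its weight."""
    rain = draw(P_RAIN, rng)
    maint = draw(P_MAINT[rain], rng)
    # Train is evidence: do not sample it, weight by its likelihood instead
    weight = P_TRAIN[(rain, maint)][observed_train]
    appt = draw(P_APPT[observed_train], rng)
    sample = {"rain": rain, "maintenance": maint,
              "train": observed_train, "appointment": appt}
    return sample, weight


rng = random.Random(0)
totals = defaultdict(float)
for _ in range(100_000):
    sample, weight = weighted_sample(rng, "on time")
    totals[sample["rain"]] += weight  # accumulate weight, not a count of 1

norm = sum(totals.values())
rain_given_on_time = {rain: w / norm for rain, w in totals.items()}

for rain, p in rain_given_on_time.items():
    print(f"{rain}: {p:.4f}")
```

Unlike rejection sampling, every generated sample contributes to the estimate, so no work is thrown away. The exact value of P(Rain = light | Train = on time) from these tables is 0.136 / 0.787 ≈ 0.173, which the weighted estimate should approximate.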

## Resources
* https://cs50.harvard.edu/ai/2020/notes/2/
* https://www.linkedin.com/pulse/uncertainty-bayesian-network-inference-tusar/