# week 2: uncertainty

## probability

- Each possible world is denoted $\omega$
- The set of all possible worlds is $\Omega$
- Each world has a probability $P(\omega)$, with $0 \le P(\omega) \le 1$
- The probabilities of all possible worlds must sum to one: $\sum_{\omega\in\Omega}P(\omega) = 1$
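
As a small illustration of possible worlds, the sketch below assumes a hypothetical example that is not from these notes: rolling two fair dice, where each pair of faces is one world. It checks that the probabilities sum to one and computes the probability of an event by summing over the worlds in which it holds.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: each possible world is one outcome of rolling two dice.
worlds = list(product(range(1, 7), repeat=2))

# Assumption: the dice are fair, so every world has probability 1/36.
P = {w: Fraction(1, len(worlds)) for w in worlds}

# The probabilities of all possible worlds sum to one.
total = sum(P.values())

# The probability of an event is the sum over the worlds in which it holds.
p_sum_is_7 = sum(p for (a, b), p in P.items() if a + b == 7)

print(total)       # 1
print(p_sum_is_7)  # 1/6
```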


## conditional probability

- Probability of $a$ given $b$: $P(a|b)$
- Probability that $a$ is true given that we know that $b$ is true
$$
P(a|b)=\frac{P(a\land b)}{P(b)}
$$

## independence

**Dependence** (general case)
- $P(a\land b)=P(a)P(b|a)$

**Independence**
- $P(a\land b)=P(a)P(b)$, that is, knowing $a$ does not change the probability of $b$: $P(b|a)=P(b)$

## bayes' rule
$$
P(a\land b)=P(a)P(b|a)
$$

- By symmetry, $P(a\land b)=P(b)P(a|b)$ as well, so $P(b)P(a|b)=P(a)P(b|a)$

$$
P(b|a)=\frac{P(b)P(a|b)}{P(a)}
$$

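- Example with $C$ = cloud and $R$ = rain, using the joint distribution in the table below; $\alpha = 1/P(\text{rain})$ is the normalisation constant
$$
P(C | \text{rain})=\frac{P(C, \text{rain})}{P(\text{rain})}=\alpha P(C, \text{rain})=\alpha\langle0.08,0.02\rangle=\langle0.8,0.2\rangle
$$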

||R=rain|R=!rain|
|:--:|:--:|:--:|
|C=cloud|0.08|0.32|
|C=!cloud|0.02|0.58|
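
The normalisation step can be sketched in a few lines of Python. The helper name `condition_on_rain` and the dictionary layout are illustrative choices, not part of the notes; the numbers are copied from the table above.

```python
# Joint distribution P(C, R) copied from the table above.
joint = {
    ("cloud", "rain"): 0.08,
    ("cloud", "!rain"): 0.32,
    ("!cloud", "rain"): 0.02,
    ("!cloud", "!rain"): 0.58,
}

def condition_on_rain(joint, r):
    """Return P(C | R = r) by selecting the matching column and normalising."""
    column = {c: p for (c, rr), p in joint.items() if rr == r}
    alpha = 1 / sum(column.values())  # alpha = 1 / P(R = r)
    return {c: alpha * p for c, p in column.items()}

posterior = condition_on_rain(joint, "rain")
print({c: round(p, 4) for c, p in posterior.items()})
# {'cloud': 0.8, '!cloud': 0.2}
```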

## probability rules

### negation
- $P(!a)=1-P(a)$

### or
- $P(a\lor b)=P(a)+P(b)-P(a\land b)$

### marginalisation
- $P(a)=P(a, b)+P(a, !b)$
- $P(X=x_i)=\sum_jP(X=x_i, Y=y_j)$

### conditioning
- $P(a)=P(a|b)P(b)+P(a|!b)P(!b)$
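
These rules can be checked numerically against the cloud/rain joint table from the Bayes' rule section. The sketch below is only a sanity check; the helper `prob` is a hypothetical name introduced for illustration.

```python
import math

# Joint distribution P(C, R) from the cloud/rain table earlier in these notes.
joint = {
    ("cloud", "rain"): 0.08,
    ("cloud", "!rain"): 0.32,
    ("!cloud", "rain"): 0.02,
    ("!cloud", "!rain"): 0.58,
}

def prob(event):
    """Sum the probability of every world (c, r) for which event(c, r) is true."""
    return sum(p for (c, r), p in joint.items() if event(c, r))

# Marginalisation: P(cloud) = P(cloud, rain) + P(cloud, !rain)
p_cloud = prob(lambda c, r: c == "cloud")
p_rain = prob(lambda c, r: r == "rain")
p_both = prob(lambda c, r: c == "cloud" and r == "rain")
p_either = prob(lambda c, r: c == "cloud" or r == "rain")

# Negation: P(!cloud) = 1 - P(cloud)
assert math.isclose(prob(lambda c, r: c != "cloud"), 1 - p_cloud)

# Or: P(cloud or rain) = P(cloud) + P(rain) - P(cloud and rain)
assert math.isclose(p_either, p_cloud + p_rain - p_both)

# Conditioning: P(cloud) = P(cloud|rain)P(rain) + P(cloud|!rain)P(!rain)
p_cloud_given_rain = p_both / p_rain
p_cloud_given_not_rain = prob(lambda c, r: c == "cloud" and r != "rain") / (1 - p_rain)
assert math.isclose(
    p_cloud,
    p_cloud_given_rain * p_rain + p_cloud_given_not_rain * (1 - p_rain),
)

print(round(p_cloud, 2), round(p_rain, 2), round(p_either, 2))  # 0.4 0.1 0.42
```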

## bayesian network

A data structure that represents the dependencies among random variables

- directed graph
- each node represents a random variable
- arrow from X to Y means X is a parent of Y
- each node has a probability distribution $P(X | \text{Parents}(X))$

```
[Rain {none, light, heavy}] -> [Maintenance {yes, no}] -> [Train {on time, delayed}] -> [Appointment {attend, miss}]
            |                                                       ^
            +-------------------------------------------------------+
```

### extracting joint probability

- $P(\text{light, no}) = P(\text{light})P(\text{no}|\text{light})$
- $P(\text{light, no, delayed}) = P(\text{light})P(\text{no}|\text{light})P(\text{delayed}|\text{light, no})$
- $P(\text{light, no, delayed, miss}) = P(\text{light})\cdot P(\text{no}|\text{light})\cdot P(\text{delayed}|\text{light, no})\cdot P(\text{miss}|\text{delayed})$
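
The factorisation above can be evaluated directly in plain Python. This is a minimal sketch: the probabilities are the ones used in the pomegranate model in this section, while the dictionary layout and the function name `joint_probability` are illustrative assumptions.

```python
# CPT entries copied from the network defined in this section's pomegranate code.
p_rain = {"none": 0.7, "light": 0.2, "heavy": 0.1}

p_maintenance = {  # P(Maintenance | Rain), keyed by (rain, maintenance)
    ("none", "yes"): 0.4, ("none", "no"): 0.6,
    ("light", "yes"): 0.2, ("light", "no"): 0.8,
    ("heavy", "yes"): 0.1, ("heavy", "no"): 0.9,
}

p_train = {  # P(Train | Rain, Maintenance), keyed by (rain, maintenance, train)
    ("none", "yes", "on time"): 0.8, ("none", "yes", "delayed"): 0.2,
    ("none", "no", "on time"): 0.9, ("none", "no", "delayed"): 0.1,
    ("light", "yes", "on time"): 0.6, ("light", "yes", "delayed"): 0.4,
    ("light", "no", "on time"): 0.7, ("light", "no", "delayed"): 0.3,
    ("heavy", "yes", "on time"): 0.4, ("heavy", "yes", "delayed"): 0.6,
    ("heavy", "no", "on time"): 0.5, ("heavy", "no", "delayed"): 0.5,
}

p_appointment = {  # P(Appointment | Train), keyed by (train, appointment)
    ("on time", "attend"): 0.9, ("on time", "miss"): 0.1,
    ("delayed", "attend"): 0.6, ("delayed", "miss"): 0.4,
}

def joint_probability(rain, maintenance, train, appointment):
    """Multiply each node's probability given the values of its parents."""
    return (
        p_rain[rain]
        * p_maintenance[(rain, maintenance)]
        * p_train[(rain, maintenance, train)]
        * p_appointment[(train, appointment)]
    )

print(round(joint_probability("light", "no", "delayed", "miss"), 4))   # 0.0192
print(round(joint_probability("none", "no", "on time", "attend"), 4))  # 0.3402
```

Each node contributes one factor, and each factor only looks at that node's parents, which is why *Appointment* depends on *Train* alone.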

### extracting new information

Probabilistic inference
- Query variable `X`: the variable whose distribution we want to compute
- Evidence variables `E`: observed variables for event `e`
- Hidden variables `Y`: variables that are neither evidence nor query variables
- Goal: $P(X|e)$

### inference by enumeration

**Goal: $P(\text{Appointment}|\text{light, no})$**

*Train* is a hidden variable

- $P(\text{Appointment}|\text{light, no}) = \alpha P(\text{Appointment, light, no})$
- Use marginalisation to expand hidden variable *Train*
- $= \alpha[P(\text{Appointment, light, no, on time})+P(\text{Appointment, light, no, delayed})]$

$$
P(X|e)=\alpha P(X, e)=\alpha\sum_yP(X, e, y)
$$
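
A minimal pure-Python sketch of inference by enumeration for this network follows. The function `enumerate_query` and the nested-dictionary layout are assumptions made for illustration (they are not pomegranate's API); the probabilities are the ones used in this section's model.

```python
from itertools import product

# Variable domains, listed in topological order.
domains = {
    "rain": ["none", "light", "heavy"],
    "maintenance": ["yes", "no"],
    "train": ["on time", "delayed"],
    "appointment": ["attend", "miss"],
}

P_RAIN = {"none": 0.7, "light": 0.2, "heavy": 0.1}
P_MAINT = {  # P(Maintenance | Rain)
    "none": {"yes": 0.4, "no": 0.6},
    "light": {"yes": 0.2, "no": 0.8},
    "heavy": {"yes": 0.1, "no": 0.9},
}
P_TRAIN = {  # P(Train | Rain, Maintenance)
    ("none", "yes"): {"on time": 0.8, "delayed": 0.2},
    ("none", "no"): {"on time": 0.9, "delayed": 0.1},
    ("light", "yes"): {"on time": 0.6, "delayed": 0.4},
    ("light", "no"): {"on time": 0.7, "delayed": 0.3},
    ("heavy", "yes"): {"on time": 0.4, "delayed": 0.6},
    ("heavy", "no"): {"on time": 0.5, "delayed": 0.5},
}
P_APPT = {  # P(Appointment | Train)
    "on time": {"attend": 0.9, "miss": 0.1},
    "delayed": {"attend": 0.6, "miss": 0.4},
}

def joint(w):
    """Joint probability of one fully specified world w (a dict of values)."""
    return (
        P_RAIN[w["rain"]]
        * P_MAINT[w["rain"]][w["maintenance"]]
        * P_TRAIN[(w["rain"], w["maintenance"])][w["train"]]
        * P_APPT[w["train"]][w["appointment"]]
    )

def enumerate_query(query, evidence):
    """Compute P(query | evidence) = alpha * sum over hidden y of P(query, e, y)."""
    hidden = [v for v in domains if v != query and v not in evidence]
    unnormalised = {}
    for x in domains[query]:
        total = 0.0
        # Marginalise: sum the joint over every assignment to the hidden variables.
        for values in product(*(domains[h] for h in hidden)):
            world = {**evidence, query: x, **dict(zip(hidden, values))}
            total += joint(world)
        unnormalised[x] = total
    alpha = 1 / sum(unnormalised.values())
    return {x: alpha * p for x, p in unnormalised.items()}

result = enumerate_query("appointment", {"rain": "light", "maintenance": "no"})
print({k: round(v, 4) for k, v in result.items()})
# {'attend': 0.81, 'miss': 0.19}
```

Enumeration is exact, but the number of terms grows exponentially with the number of hidden variables, which is the motivation for approximate methods such as sampling.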

```python
# Assumes the legacy (pre-1.0) pomegranate API
from pomegranate import *

# Rain node has no parents
rain = Node(DiscreteDistribution({
    "none": 0.7,
    "light": 0.2,
    "heavy": 0.1
}), name="rain")

# Track maintenance node is conditional on rain
maintenance = Node(ConditionalProbabilityTable([
    ["none", "yes", 0.4],
    ["none", "no", 0.6],
    ["light", "yes", 0.2],
    ["light", "no", 0.8],
    ["heavy", "yes", 0.1],
    ["heavy", "no", 0.9]
], [rain.distribution]), name="maintenance")

# Train node is conditional on rain and maintenance
train = Node(ConditionalProbabilityTable([
    ["none", "yes", "on time", 0.8],
    ["none", "yes", "delayed", 0.2],
    ["none", "no", "on time", 0.9],
    ["none", "no", "delayed", 0.1],
    ["light", "yes", "on time", 0.6],
    ["light", "yes", "delayed", 0.4],
    ["light", "no", "on time", 0.7],
    ["light", "no", "delayed", 0.3],
    ["heavy", "yes", "on time", 0.4],
    ["heavy", "yes", "delayed", 0.6],
    ["heavy", "no", "on time", 0.5],
    ["heavy", "no", "delayed", 0.5],
], [rain.distribution, maintenance.distribution]), name="train")

# Appointment node is conditional on train
appointment = Node(ConditionalProbabilityTable([
    ["on time", "attend", 0.9],
    ["on time", "miss", 0.1],
    ["delayed", "attend", 0.6],
    ["delayed", "miss", 0.4]
], [train.distribution]), name="appointment")
```

```python
# Create a Bayesian Network and add states
model = BayesianNetwork()
model.add_states(rain, maintenance, train, appointment)

# Add edges connecting nodes
model.add_edge(rain, maintenance)
model.add_edge(rain, train)
model.add_edge(maintenance, train)
model.add_edge(train, appointment)

# Finalize model
model.bake()
```

```python
# Calculate probability for a given observation
probability = model.probability([["none", "no", "on time", "attend"]])

print(probability)
```

```
0.34019999999999995
```


```python
# Calculate predictions based on the evidence that rain was heavy and the train was delayed
predictions = model.predict_proba({
    "rain": "heavy",
    "train": "delayed"
})

# Print predictions for each node
for node, prediction in zip(model.states, predictions):
    if isinstance(prediction, str):
        print(f"{node.name}: {prediction}")
    else:
        print(f"{node.name}")
        for value, probability in prediction.parameters[0].items():
            print(f"    {value}: {probability:.4f}")
```

```
rain: heavy
maintenance
    no: 0.8824
    yes: 0.1176
train: delayed
appointment
    attend: 0.6000
    miss: 0.4000
```


## approximate inference: rejection sampling

Sample every variable in the Bayesian network, in topological order, using a random number generator
- For *Rain*, *none* is sampled 70% of the time
- For *Maintenance*, given *Rain = none*, we sample from the *none* row and choose *yes* 40% of the time
- For *Train*, given *Rain = none* and *Maintenance = yes*, we sample *on time* 80% of the time
- For *Appointment*, given *Train = on time*, *attend* is chosen 90% of the time

This sampling is repeated hundreds or thousands of times

- Query: $P(\text{Train = on time})$
- If, say, 6 of 8 samples have the train on time, the estimate is 6/8
- For a conditional query, reject the samples that don't match the evidence and compute the proportion among the remaining samples

```python
from collections import Counter
import pomegranate

def generate_sample():

    # Mapping of random variable name to sample generated
    sample = {}

    # Mapping of distribution to sample generated
    parents = {}

    # Loop over all states, assuming topological order
    for state in model.states:

        # If we have a non-root node, sample conditional on parents
        if isinstance(state.distribution, pomegranate.ConditionalProbabilityTable):
            sample[state.name] = state.distribution.sample(parent_values=parents)

        # Otherwise, just sample from the distribution alone
        else:
            sample[state.name] = state.distribution.sample()

        # Keep track of the sampled value in the parents mapping
        parents[state.distribution] = sample[state.name]

    # Return generated sample
    return sample
```

```python
# Rejection sampling
# Compute distribution of Appointment given that train is delayed
N = 10000
data = []

# Repeat sampling 10,000 times
for i in range(N):

    # Generate a sample based on the function that we defined earlier
    sample = generate_sample()

    # Keep the sample only if Train is "delayed". We want the distribution of
    # Appointment given that the train is delayed, so samples where the train
    # was on time are discarded.
    if sample["train"] == "delayed":
        data.append(sample["appointment"])

# Count how often each value of Appointment appeared among the kept samples.
# Dividing each count by the number of kept samples gives approximate
# probabilities that sum to 1.
print(Counter(data))
```

```
Counter({'attend': 1309, 'miss': 881})
```
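
To turn these counts into an estimate of $P(\text{Appointment}|\text{delayed})$, divide each count by the number of samples that were kept. A short sketch using the counts printed above (a different run would give slightly different numbers):

```python
from collections import Counter

# Counts copied from the rejection-sampling output above.
counts = Counter({"attend": 1309, "miss": 881})

# Number of samples that survived rejection (those with train == "delayed").
kept = sum(counts.values())

# Normalise the counts so that the estimates sum to 1.
estimate = {value: n / kept for value, n in counts.items()}

print(kept)  # 2190
print({v: round(p, 4) for v, p in estimate.items()})
# {'attend': 0.5977, 'miss': 0.4023}
```

The estimate is close to the exact values 0.6 and 0.4 from the *delayed* row of the Appointment table. Only 2190 of the 10,000 samples were usable, which shows how rejection sampling wastes samples when the evidence is unlikely.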


## likelihood weighting

- Start by fixing the values for evidence variables
- Sample non-evidence variables using conditional probabilities in the Bayesian Network
- Weight each sample by its likelihood: the probability of all of the evidence

### sample

- R = light
- M = yes
- \[T = on time\] FIXED
- A = attend
- weight: P(on time | light, yes) = 0.6
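
A minimal pure-Python sketch of likelihood weighting on this network, with *Train* as the fixed evidence variable. The helper names (`draw`, `weighted_sample`, `estimate_rain_given_train`), the choice of query $P(\text{Rain}|\text{Train = on time})$, the sample count, and the seed are all assumptions made for illustration; the probabilities are the ones used in this section's model.

```python
import random

P_RAIN = {"none": 0.7, "light": 0.2, "heavy": 0.1}
P_MAINT = {  # P(Maintenance | Rain)
    "none": {"yes": 0.4, "no": 0.6},
    "light": {"yes": 0.2, "no": 0.8},
    "heavy": {"yes": 0.1, "no": 0.9},
}
P_TRAIN = {  # P(Train | Rain, Maintenance)
    ("none", "yes"): {"on time": 0.8, "delayed": 0.2},
    ("none", "no"): {"on time": 0.9, "delayed": 0.1},
    ("light", "yes"): {"on time": 0.6, "delayed": 0.4},
    ("light", "no"): {"on time": 0.7, "delayed": 0.3},
    ("heavy", "yes"): {"on time": 0.4, "delayed": 0.6},
    ("heavy", "no"): {"on time": 0.5, "delayed": 0.5},
}
P_APPT = {  # P(Appointment | Train)
    "on time": {"attend": 0.9, "miss": 0.1},
    "delayed": {"attend": 0.6, "miss": 0.4},
}

def draw(dist, rng):
    """Sample one value from a {value: probability} mapping."""
    values = list(dist)
    return rng.choices(values, weights=[dist[v] for v in values])[0]

def weighted_sample(rng, train="on time"):
    """Fix Train to the evidence value, sample the rest, return (sample, weight)."""
    rain = draw(P_RAIN, rng)
    maintenance = draw(P_MAINT[rain], rng)
    # Train is not sampled; the weight is P(evidence | sampled parents).
    weight = P_TRAIN[(rain, maintenance)][train]
    appointment = draw(P_APPT[train], rng)
    sample = {"rain": rain, "maintenance": maintenance,
              "train": train, "appointment": appointment}
    return sample, weight

def estimate_rain_given_train(n=20000, seed=0):
    """Estimate P(Rain | Train = on time) from n weighted samples."""
    rng = random.Random(seed)
    totals = {r: 0.0 for r in P_RAIN}
    for _ in range(n):
        sample, weight = weighted_sample(rng)
        totals[sample["rain"]] += weight
    norm = sum(totals.values())
    return {r: t / norm for r, t in totals.items()}

print(estimate_rain_given_train())
```

No sample is discarded: every sample agrees with the evidence by construction, and samples in which the evidence was unlikely simply count for less. For comparison, exact enumeration gives approximately 0.765, 0.173 and 0.062 for *none*, *light* and *heavy*.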

## uncertainty over time

- $X_t$: Weather at time $t$

**Markov Assumption**: the assumption that the current state depends only on a finite fixed number of previous states

**Markov Chain**: a sequence of random variables where the distribution of each variable follows the Markov assumption

Assume: I can predict tomorrow's weather (sun or rain) using only today's weather

### transition model
||$X_{t+1}=sun$|$X_{t+1}=rain$|
|:--:|:--:|:--:|
|$X_t=sun$|0.8|0.2|
|$X_t=rain$|0.3|0.7|
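
The transition model can be applied by hand to push a distribution over today's weather one day forward. The sketch below assumes a starting distribution in which sun and rain are equally likely; the function name `step` is illustrative.

```python
# Transition model copied from the table above: T[today][tomorrow].
T = {
    "sun": {"sun": 0.8, "rain": 0.2},
    "rain": {"sun": 0.3, "rain": 0.7},
}

def step(dist):
    """Propagate a distribution over today's weather one day forward."""
    return {
        nxt: sum(dist[cur] * T[cur][nxt] for cur in T)
        for nxt in T
    }

# Assumed starting distribution: sun and rain equally likely.
dist = {"sun": 0.5, "rain": 0.5}
for day in range(1, 4):
    dist = step(dist)
    print(day, {k: round(v, 4) for k, v in dist.items()})
# 1 {'sun': 0.55, 'rain': 0.45}
# 2 {'sun': 0.575, 'rain': 0.425}
# 3 {'sun': 0.5875, 'rain': 0.4125}
```

Repeating the step moves the distribution towards 0.6 sun and 0.4 rain, the distribution that this transition model leaves unchanged.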



```python
# Define starting probabilities
start = DiscreteDistribution({
    "sun": 0.5,
    "rain": 0.5
})

# Define transition model
transitions = ConditionalProbabilityTable([
    ["sun", "sun", 0.8],
    ["sun", "rain", 0.2],
    ["rain", "sun", 0.3],
    ["rain", "rain", 0.7]
], [start])

# Create Markov chain
m_model = MarkovChain([start, transitions])

# Sample 50 states from chain
print(m_model.sample(50))
```

```
['rain', 'sun', 'sun', 'sun', 'sun', 'rain', 'rain', 'rain', 'rain', 'sun', 'sun', 'sun', 'sun', 'rain', 'rain', 'rain', 'sun', 'sun', 'sun', 'rain', 'rain', 'sun', 'sun', 'sun', 'sun', 'sun', 'sun', 'sun', 'sun', 'sun', 'sun', 'rain', 'sun', 'sun', 'sun', 'sun', 'sun', 'sun', 'rain', 'sun', 'rain', 'rain', 'rain', 'rain', 'sun', 'sun', 'sun', 'rain', 'rain', 'rain']
```
