# CS486 - Artificial Intelligence
## Lesson 24 - Dynamic Bayes Networks

Today we will wrap up Markov Models and introduce the **Bayesian Network**. 

In [1]:
import helpers
from aima.text import *
from aima.probability import *
from aima.utils import open_data

### Dynamic Bayes' Nets (DBNs)

In a traditional Hidden Markov Model there is a single hidden variable per time step: it depends only on the hidden variable from the previous time step, and the evidence observed at each time step depends only on the current hidden variable. **Dynamic Bayes' Nets** allow multiple hidden variables and multiple sources of evidence per time step. An edge connects a variable in one time step to a variable in a later time step wherever there is a causal relationship between them.

![[DBN]](images/dbn.png)

DBNs are practically useful when there are multiple sources of evidence. They are also computationally useful, since the separate conditional distributions for the individual hidden variables are together much smaller than a single joint distribution across all of them.
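To make the size argument concrete, here is a rough sketch that counts table entries for a transition model over hidden variables. The function names, and the choice of 10 binary variables with 2 parents each, are illustrative assumptions and not part of the lesson's code.

```python
def joint_table_entries(num_vars, domain_size=2):
    """Entries in a single transition table over the joint hidden state."""
    num_states = domain_size ** num_vars
    # One row per previous joint state, one column per current joint state.
    return num_states * num_states


def factored_table_entries(parents_per_var, domain_size=2):
    """Total entries when each variable keeps its own table P(X_i | parents)."""
    return sum(domain_size ** (num_parents + 1) for num_parents in parents_per_var)


# Hypothetical DBN: 10 binary hidden variables, each with 2 parents
# in the previous time step.
joint_size = joint_table_entries(10)
factored_size = factored_table_entries([2] * 10)
print("joint:", joint_size, "factored:", factored_size)
```

The gap grows exponentially with the number of hidden variables, which is why the factored representation matters in practice.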

### Viterbi Algorithm

An HMM encodes the probability distribution of its possible outputs at any given time. Given a sequence of outputs from an HMM, the **Viterbi Algorithm** finds the sequence of hidden states that most probably produced it. The algorithm is essentially the **Forward Algorithm** with the sum over previous states replaced by a max, so that it keeps track of the most likely path into each state at every time step:

$$m_t[x_t] = P(e_t\mid{x_t})\max_{x_{t-1}}P(x_t\mid{x_{t-1}})m_{t-1}[x_{t-1}]$$

For a more visual idea of what's happening, consider the following HMM from [Wikipedia](https://en.wikipedia.org/wiki/Viterbi_algorithm) in which a doctor sees a patient three days in a row. On the first day the patient feels normal; on the second he feels cold; on the last day he feels dizzy. Here is a diagram that captures the transition and emission models for the HMM:

![[Viterbi HMM]](images/viterbi_hmm.png)

The Viterbi Algorithm can produce the most probable sequence of events that explains the observations:

![[Viterbi]](images/viterbi.gif)
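The recurrence above can be sketched in plain Python. This is a minimal sketch, not the `aima` implementation: the function `viterbi` and its argument names are made up for illustration, and the probabilities are assumed to be the ones used in the Wikipedia version of the doctor example (they may differ from the diagram above).

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely hidden state sequence, probability of that path)."""
    # m[t][s] = probability of the best path that ends in state s at time t
    m = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s] = best predecessor of state s at time t
    for t in range(1, len(observations)):
        m.append({})
        back.append({})
        for s in states:
            # The max over previous states replaces the sum of the forward algorithm.
            prev = max(states, key=lambda r: m[t - 1][r] * trans_p[r][s])
            m[t][s] = emit_p[s][observations[t]] * trans_p[prev][s] * m[t - 1][prev]
            back[t][s] = prev
    # Follow the back pointers from the best final state.
    last = max(states, key=lambda s: m[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path, m[-1][last]


# Assumed parameters (Wikipedia's doctor example).
states = ("Healthy", "Fever")
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit_p = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}

path, prob = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
print(path, prob)
```

Under these assumed numbers the most likely explanation is that the patient was healthy on the first two days and had a fever on the third, with path probability of about 0.01512.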

Consider a sentence without spaces. How do you find the most likely sequence of words? We can use our Unigram model and Viterbi:

In [3]:
flatland = open_data("EN-text/flatland.txt").read()
wordseq = words(flatland)

P = UnigramWordModel(wordseq)
text = "itiseasytoreadasentencewithoutspaces"

def viterbi_segment(text, P):
    """Find the best segmentation of the string of characters, given the
    UnigramWordModel P."""
    # best[i] = best probability for text[0:i]
    # words[i] = best word ending at position i
    n = len(text)
    words = [''] + list(text)
    best = [1.0] + [0.0] * n
    # Fill in the vectors best, words via dynamic programming
    for i in range(n+1):
        for j in range(0, i):
            w = text[j:i]
            curr_score = P[w] * best[i - len(w)]
            if curr_score >= best[i]:
                best[i] = curr_score
                words[i] = w
    # Now recover the sequence of best words
    sequence = []
    i = len(words) - 1
    print(words)
    while i > 0:
        sequence[0:0] = [words[i]]
        i = i - len(words[i])
    # Return sequence of best words and overall probability
    return sequence, best[-1]

s, p = viterbi_segment(text,P)
print("Sequence of words is:",s)
print("Probability of sequence is:",p)

['', 'i', 'it', 'i', 'is', 'e', 'sea', 'seas', 'easy', 't', 'to', 'or', 're', 'a', 'read', 'a', 'as', 'e', 'n', 'sent', 'e', 'n', 'c', 'sentence', 'w', 'i', 'wit', 'with', 'o', 'u', 'without', 's', 'p', 'a', 'c', 'space', 'spaces']
Sequence of words is: ['it', 'is', 'easy', 'to', 'read', 'a', 'sentence', 'without', 'spaces']
Probability of sequence is: 2.839001552776948e-27


### Bayes' Nets

Bayes' Nets are **graphical models** that describe a joint distribution through the local conditional probabilities of its random variables. A Bayes' Net is a directed acyclic graph in which nodes are random variables and edges are placed between variables that directly interact. Each node is conditionally independent of its non-descendants given its parents. A Bayes' Net typically (but not necessarily) describes a noisy causal process.

A Bayes' Net encodes the joint distribution across the variables without explicitly storing it. Each node carries a conditional distribution given its parents. For example, in the following network $John\ Calls\ {\perp\!\!\!\perp}\ Mary\ Calls \mid Alarm$:

<img src="images/bayes_net.jpg" width="300">

You can compute the full joint distribution across all variables by multiplying all of the conditionals. The probability of a given full assignment is:

$$P(x_1, x_2, x_3, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid parents(X_i))$$
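As a worked instance of this product, the sketch below evaluates one full assignment of the burglary/alarm network in plain Python. The conditional probability values are assumed to be the standard ones from the AIMA textbook and may not match the figure above; the helper names `bernoulli` and `joint` are made up for illustration.

```python
from itertools import product

# Assumed CPTs (AIMA textbook values): probability that each variable is True.
P_B = 0.001                      # P(Burglary)
P_E = 0.002                      # P(Earthquake)
P_A = {                          # P(Alarm | Burglary, Earthquake)
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}
P_J = {True: 0.90, False: 0.05}  # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}  # P(MaryCalls | Alarm)


def bernoulli(p_true, value):
    """Probability that a Boolean variable takes `value`, given P(True)."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    """Product of each node's conditional probability given its parents."""
    return (bernoulli(P_B, b) * bernoulli(P_E, e) * bernoulli(P_A[(b, e)], a)
            * bernoulli(P_J[a], j) * bernoulli(P_M[a], m))


# Alarm sounds and both neighbors call, but there is no burglary or earthquake.
p = joint(b=False, e=False, a=True, j=True, m=True)
print(p)

# Sanity check: the 32 full assignments should sum to 1.
total = sum(joint(*values) for values in product([True, False], repeat=5))
print(total)
```

The network stores only 10 numbers here, yet it determines all 32 entries of the joint distribution.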

Note that edges encode direct interaction between variables, not necessarily causation. The direction of the edges fixes an ordering of the variables so that the chain rule can be applied when computing probabilities; the same joint distribution can be represented with a different ordering, although the resulting graph may need more edges.
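A small numeric check of this point for the simplest case, two variables: the sketch builds the joint from the factorization $P(A)P(B\mid A)$, then refactors it as $P(B)P(A\mid B)$ and confirms both give the same joint. The probabilities are made up for illustration.

```python
values = (True, False)

# Edge A -> B: the network stores P(A) and P(B | A). Numbers are made up.
p_a = {True: 0.3, False: 0.7}
p_b_given_a = {True: {True: 0.9, False: 0.1},
               False: {True: 0.2, False: 0.8}}

joint_ab = {(a, b): p_a[a] * p_b_given_a[a][b] for a in values for b in values}

# Edge B -> A: derive P(B) and P(A | B) from the same joint.
p_b = {b: sum(joint_ab[(a, b)] for a in values) for b in values}
p_a_given_b = {b: {a: joint_ab[(a, b)] / p_b[b] for a in values} for b in values}

joint_ba = {(a, b): p_b[b] * p_a_given_b[b][a] for a in values for b in values}

# Both factorizations describe the same joint distribution.
same = all(abs(joint_ab[key] - joint_ba[key]) < 1e-12 for key in joint_ab)
print(same)
```

With more than two variables, reversing an edge can force extra edges to be added to keep the same joint, so the choice of ordering still affects how compact the network is.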