# Exact Inference for Graphical Models

We want to compute the posterior marginals $p \left( x _ { t } | \mathbf { v } , \boldsymbol { \theta } \right)$ exactly, where $\mathbf{x}$ are the (discrete) hidden variables and $\mathbf{v}$ are the visible variables. The resulting methods apply to both directed and undirected graphical models.

## Belief Propagation for trees
Belief propagation (BP), also known as the sum-product algorithm, generalizes the forwards-backwards algorithm from chains to trees.

### Serial Protocol

The model is a pairwise MRF (or CRF):

$$p ( \mathbf { x } | \mathbf { v } ) = \frac { 1 } { Z ( \mathbf { v } ) } \prod _ { s \in \mathcal { V } } \psi _ { s } \left( x _ { s } \right) \prod _ { ( s , t ) \in \mathcal { E } } \psi _ { s , t } \left( x _ { s } , x _ { t } \right)$$

where $\psi_s$ is the local evidence for node $s$ and $\psi_{s,t}$ (also written $\psi_{st}$) is the pairwise potential for edge $s-t$. For undirected trees, we pick an arbitrary node and call it the root $r$. We then orient all edges away from $r$, which gives us a well-defined notion of parent and child. Messages are first sent up from the leaves to the root (the **collect evidence** phase) and then back down from the root to the leaves (the **distribute evidence** phase), in a manner analogous to forwards-backwards on chains.

![](../images/20.BP.png)
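
The rooting step described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `orient_tree`, the edge-list representation, and the example tree are assumptions made here, not part of the original notes. A breadth-first search from the root assigns each node its parent, and the visiting order (reversed) gives a schedule in which every child is processed before its parent, which is what the collect phase needs.

```python
from collections import deque

def orient_tree(edges, root):
    """Orient an undirected tree away from `root` (hypothetical helper).

    Returns (parent, children, order). `order` lists nodes in breadth-first
    order from the root, so reversed(order) visits each child before its parent.
    """
    nbrs = {root: []}
    for s, t in edges:
        nbrs.setdefault(s, []).append(t)
        nbrs.setdefault(t, []).append(s)

    parent = {root: None}
    children = {node: [] for node in nbrs}
    order = []
    queue = deque([root])
    while queue:
        t = queue.popleft()
        order.append(t)
        for s in nbrs[t]:
            if s not in parent:      # not visited yet, so s is a child of t
                parent[s] = t
                children[t].append(s)
                queue.append(s)
    return parent, children, order

# Example tree (an assumption for illustration): 0 is the root.
edges = [(0, 1), (0, 2), (2, 3), (2, 4)]
parent, children, order = orient_tree(edges, root=0)
```

Iterating over `reversed(order)` yields a valid leaves-to-root schedule, and iterating over `order` yields a valid root-to-leaves schedule.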

+ We compute the bottom-up belief state at $t$ as follows:
$$\mathrm { bel } _ { t } ^ { - } \left( x _ { t } \right) \triangleq p \left( x _ { t } | \mathbf { v } _ { t } ^ { - } \right) = \frac { 1 } { Z _ { t } } \psi _ { t } \left( x _ { t } \right) \prod _ { c \in \operatorname { ch } ( t ) } m _ { c \rightarrow t } ^ { - } \left( x _ { t } \right)$$

+ We compute the bottom-up message:
$$m _ { s \rightarrow t } ^ { - } \left( x _ { t } \right) = \sum _ { x _ { s } } \psi _ { s t } \left( x _ { s } , x _ { t } \right) \operatorname { bel } _ { s } ^ { - } \left( x _ { s } \right)$$

+ Local belief at the root:
$$\operatorname { bel } _ { r } \left( x _ { r } \right) \triangleq p \left( x _ { r } | \mathbf { v } \right) = p \left( x _ { r } | \mathbf { v } _ { r } ^ { - } \right) \propto \psi _ { r } \left( x _ { r } \right) \prod _ { c \in \operatorname { ch } ( r ) } m _ { c \rightarrow r } ^ { - } \left( x _ { r } \right)$$

+ Probability of the evidence:
$$p ( \mathbf { v } ) = \prod _ { t } Z _ { t }$$

+ Top-down (real) belief:
$$\operatorname { bel } _ { s } \left( x _ { s } \right) \triangleq p \left( x _ { s } | \mathbf { v } \right) \propto \operatorname { bel } _ { s } ^ { - } \left( x _ { s } \right) \prod _ { t \in \operatorname { pa } ( s ) } m _ { t \rightarrow s } ^ { + } \left( x _ { s } \right)$$

+ Downward message: 
$$m _ { t \rightarrow s } ^ { + } \left( x _ { s } \right) \triangleq p \left( x _ { s } | \mathbf { v } _ { s t } ^ { + } \right) = \sum _ { x _ { t } } \psi _ { s t } \left( x _ { s } , x _ { t } \right) \frac { \operatorname { bel } _ { t } \left( x _ { t } \right) } { m _ { s \rightarrow t } ^ { - } \left( x _ { t } \right) }$$
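
The collect and distribute passes listed above can be sketched as follows. This is a minimal pure-Python sketch under stated assumptions, not a reference implementation: the names `serial_bp` and `brute_force`, the dictionary-of-lists layout for the potentials, and the example numbers are all made up for illustration. It also assumes strictly positive potentials, since the downward message divides by the upward message. The product of the local normalizers $Z_t$ is compared with a brute-force sum over all joint states of the unnormalized potentials.

```python
from itertools import product
from math import prod

def serial_bp(node_pot, edge_pot, root):
    """Two-pass sum-product on a tree (illustrative sketch).

    node_pot[s][x_s] = psi_s(x_s); edge_pot[(s, t)][x_s][x_t] = psi_st(x_s, x_t).
    Returns (bel, Z): bel[s][x_s] = p(x_s | v), Z = product of the local Z_t.
    """
    nbrs = {s: [] for s in node_pot}
    for s, t in edge_pot:
        nbrs[s].append(t)
        nbrs[t].append(s)

    def psi(s, t, xs, xt):
        # Look up psi_st whichever way round the edge was stored.
        return edge_pot[(s, t)][xs][xt] if (s, t) in edge_pot else edge_pot[(t, s)][xt][xs]

    # Orient edges away from the root; in `order`, parents precede children.
    parent, order, stack = {root: None}, [], [root]
    while stack:
        t = stack.pop()
        order.append(t)
        for s in nbrs[t]:
            if s not in parent:
                parent[s] = t
                stack.append(s)

    K = {s: len(node_pot[s]) for s in node_pot}
    bel_up, msg_up, Z = {}, {}, 1.0
    # Collect evidence: visit children before parents.
    for s in reversed(order):
        unnorm = [node_pot[s][x] * prod(msg_up[c][x] for c in nbrs[s] if parent[c] == s)
                  for x in range(K[s])]
        Z_s = sum(unnorm)
        Z *= Z_s
        bel_up[s] = [u / Z_s for u in unnorm]
        t = parent[s]
        if t is not None:  # upward message m^-_{s->t}(x_t)
            msg_up[s] = [sum(psi(s, t, xs, xt) * bel_up[s][xs] for xs in range(K[s]))
                         for xt in range(K[t])]

    # Distribute evidence: visit parents before children.
    bel = {root: bel_up[root]}
    for s in order[1:]:
        t = parent[s]
        # Downward message m^+_{t->s}(x_s); assumes msg_up[s][xt] > 0.
        msg_down = [sum(psi(s, t, xs, xt) * bel[t][xt] / msg_up[s][xt] for xt in range(K[t]))
                    for xs in range(K[s])]
        unnorm = [bel_up[s][x] * msg_down[x] for x in range(K[s])]
        total = sum(unnorm)
        bel[s] = [u / total for u in unnorm]
    return bel, Z

def brute_force(node_pot, edge_pot):
    """Exact marginals and normalizer by enumerating every joint state."""
    nodes = sorted(node_pot)
    marg = {s: [0.0] * len(node_pot[s]) for s in nodes}
    Z = 0.0
    for x in product(*(range(len(node_pot[s])) for s in nodes)):
        a = dict(zip(nodes, x))
        w = prod(node_pot[s][a[s]] for s in nodes) * prod(
            tbl[a[s]][a[t]] for (s, t), tbl in edge_pot.items())
        Z += w
        for s in nodes:
            marg[s][a[s]] += w
    return {s: [m / Z for m in marg[s]] for s in nodes}, Z

# Made-up potentials on the tree 1 - 0 - 2 - 3, rooted at node 0.
node_pot = {0: [0.7, 0.3], 1: [0.4, 0.6], 2: [0.5, 0.5], 3: [0.9, 0.1]}
edge_pot = {(0, 1): [[2.0, 1.0], [0.5, 3.0]],
            (0, 2): [[1.0, 4.0], [2.0, 1.0]],
            (2, 3): [[3.0, 1.0], [1.0, 2.0]]}
beliefs, Z = serial_bp(node_pot, edge_pot, root=0)
exact, Z_exact = brute_force(node_pot, edge_pot)
```

On this small example `beliefs` should agree with `exact` up to floating-point error, and `Z` should match `Z_exact`. Normalizing each upward belief keeps the numbers in a reasonable range on larger trees, at the cost of tracking the normalizers separately.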

### Parallel protocol
The basic idea is that all nodes receive messages from their neighbors in parallel, then update their belief states, and finally send new messages back out to their neighbors. This process repeats until convergence.

We initialize all messages to the all-ones vector. Then, in parallel, each node absorbs messages from all its neighbors using:
$$\operatorname { bel } _ { s } \left( x _ { s } \right) \propto \psi _ { s } \left( x _ { s } \right) \prod _ { t \in \mathrm { nbr } _ { s } } m _ { t \rightarrow s } \left( x _ { s } \right)$$

Then, in parallel, each node sends messages to each of its neighbors:
$$m _ { s \rightarrow t } \left( x _ { t } \right) = \sum _ { x _ { s } } \left( \psi _ { s } \left( x _ { s } \right) \psi _ { s t } \left( x _ { s } , x _ { t } \right) \prod _ { u \in \mathrm { nbr } _ { s } \backslash t } m _ { u \rightarrow s } \left( x _ { s } \right) \right)$$

The message $m_{s \rightarrow t}$ is computed by multiplying the local evidence $\psi_s$ with all incoming messages except the one sent by the recipient, and then passing the result through the $\psi_{st}$ potential. At iteration $T$ of the algorithm, $\operatorname{bel}_s(x_s)$ represents the posterior belief of $x_s$ conditioned on the evidence that is at most $T$ steps away in the graph. After $D(G)$ steps, where $D(G)$ is the diameter of the graph, every node has obtained information from all other nodes, and its local belief state is then the correct posterior marginal.
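
The synchronous updates above can be sketched as follows. Again this is a hedged, pure-Python illustration: `parallel_bp`, `exact_marginals`, the data layout, and the example potentials are assumptions introduced here. One deviation from the formula in the text is that each message is rescaled to sum to one after every update. This only changes the messages by a constant factor, so the normalized beliefs are unaffected, and it avoids numerical under- or overflow.

```python
from itertools import product
from math import prod

def parallel_bp(node_pot, edge_pot, num_iters):
    """Synchronous sum-product (illustrative sketch).

    node_pot[s][x_s] = psi_s(x_s); edge_pot[(s, t)][x_s][x_t] = psi_st(x_s, x_t).
    Returns bel[s][x_s] after `num_iters` rounds of parallel message updates.
    """
    nbrs = {s: [] for s in node_pot}
    for s, t in edge_pot:
        nbrs[s].append(t)
        nbrs[t].append(s)

    def psi(s, t, xs, xt):
        # Look up psi_st whichever way round the edge was stored.
        return edge_pot[(s, t)][xs][xt] if (s, t) in edge_pot else edge_pot[(t, s)][xt][xs]

    K = {s: len(node_pot[s]) for s in node_pot}
    # msg[(s, t)][x_t] holds m_{s->t}(x_t); start from the all-ones vector.
    msg = {(s, t): [1.0] * K[t] for s in nbrs for t in nbrs[s]}

    for _ in range(num_iters):
        new = {}
        for (s, t) in msg:
            # Combine psi_s with all messages into s except the one from t,
            # then sum out x_s through psi_st. Only old messages are read.
            raw = [sum(node_pot[s][xs] * psi(s, t, xs, xt)
                       * prod(msg[(u, s)][xs] for u in nbrs[s] if u != t)
                       for xs in range(K[s]))
                   for xt in range(K[t])]
            total = sum(raw)
            new[(s, t)] = [r / total for r in raw]  # rescaling: an addition, not in the text
        msg = new

    bel = {}
    for s in nbrs:
        raw = [node_pot[s][x] * prod(msg[(t, s)][x] for t in nbrs[s]) for x in range(K[s])]
        total = sum(raw)
        bel[s] = [r / total for r in raw]
    return bel

def exact_marginals(node_pot, edge_pot):
    """Exact marginals by enumerating every joint state (for checking only)."""
    nodes = sorted(node_pot)
    marg = {s: [0.0] * len(node_pot[s]) for s in nodes}
    for x in product(*(range(len(node_pot[s])) for s in nodes)):
        a = dict(zip(nodes, x))
        w = prod(node_pot[s][a[s]] for s in nodes) * prod(
            tbl[a[s]][a[t]] for (s, t), tbl in edge_pot.items())
        for s in nodes:
            marg[s][a[s]] += w
    return {s: [m / sum(marg[s]) for m in marg[s]] for s in nodes}

# Made-up tree with edges 0-1, 1-2, 1-3, 3-4; its diameter is 3 (e.g. 0-1-3-4).
node_pot = {0: [0.6, 0.4], 1: [0.3, 0.7], 2: [0.5, 0.5], 3: [0.8, 0.2], 4: [0.1, 0.9]}
edge_pot = {(0, 1): [[2.0, 1.0], [0.5, 3.0]],
            (1, 2): [[1.0, 4.0], [2.0, 1.0]],
            (1, 3): [[3.0, 1.0], [1.0, 2.0]],
            (3, 4): [[1.5, 0.5], [1.0, 2.5]]}
diameter = 3
bel = parallel_bp(node_pot, edge_pot, num_iters=diameter)
exact = exact_marginals(node_pot, edge_pot)
```

On this tree, `bel` after `diameter` rounds should match `exact` up to floating-point error, and running further rounds should leave the beliefs unchanged.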