# Factor Graphs


### Goals of this lecture

- 

### Materials

- [Video lecture](https://www.youtube.com/watch?v=Fv2YbVg9Frc&t=31) by Federico Wadehn (ETH Zurich)
- [Loeliger, 2004](...), read sections ...
- [Loeliger, 2007](...), read sections ...

### Why Factor Graphs?

- Factor graphs provide an efficient approach to attacking the most important computational problems in probabilistic modelling:

  - **Marginalization**
  
$$
\bar{f}(x_2) = \int f(x_1,x_2,x_3,x_4,x_5) \, \mathrm{d}x_1  \mathrm{d}x_3 \mathrm{d}x_4 \mathrm{d}x_5 
$$

  - **Maximization** (computing the "max-marginal")
  
$$
\hat{f}(x_2) = \max_{x_1,x_3,x_4,x_5} f(x_1,x_2,x_3,x_4,x_5)  
$$

- Since these computations suffer from the "curse of dimensionality", we often need to solve a simpler problem in order to get an answer. 

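- Factorization helps here. For example, if $f(x_1,x_2,x_3,x_4,x_5) = \prod_{k=1}^5 f_k(x_k)$ with non-negative factors $f_k$, then 

$$
\hat{f}(x_2) = f_2(x_2) \prod_{k \in \{1,3,4,5\}} \max_{x_k} f_k(x_k)
$$
which is usually _much_ easier to compute, since it requires four one-dimensional maximizations instead of one four-dimensional maximization.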
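The shortcut can be checked numerically. The following is a minimal NumPy sketch, assuming discrete variables with `K` values each and randomly generated non-negative factor tables (both are illustrative assumptions, not part of the lecture material). It compares the brute-force max-marginal over the full joint table with the factorized form.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 4  # hypothetical number of values per variable

# Five single-variable factors f_1..f_5, stored as non-negative vectors.
factors = [rng.random(K) for _ in range(5)]

# Brute force: build the full joint table f(x1,...,x5) with K**5 entries,
# then maximize over every axis except the one of x2 (axis 1).
joint = np.einsum("a,b,c,d,e->abcde", *factors)
brute_force = joint.max(axis=(0, 2, 3, 4))

# Factorized: f_2(x2) times the product of the individual maxima.
other_maxima = np.prod([f.max() for k, f in enumerate(factors) if k != 1])
factorized = factors[1] * other_maxima

print(np.allclose(brute_force, factorized))
```

The brute-force route touches `K**5` table entries, whereas the factorized route only needs four maxima over vectors of length `K`.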






### Construction Rules

- Consider a function 
$$
f(x_1,x_2,x_3,x_4,x_5) = f_a(x_1,x_2,x_3) \cdot f_b(x_3,x_4,x_5) \cdot f_c(x_4)
$$

- The factorization of this function can be graphically represented by a **Forney-style Factor Graph** (FFG):

\begin{center}
INSERT FFG EXAMPLE GRAPH
\end{center}

- An FFG is an **undirected** graph subject to the following construction rules ([Forney, 2001](http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=910573&url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F18%2F19638%2F00910573.pdf%3Farnumber%3D910573), [Loeliger, 2004]()):

  1. A **node** for every factor;
  1. An **edge** (or **half-edge**) for every variable;
  1. Node $g$ is connected to edge $x$ **iff** variable $x$ appears in factor $g$.
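The three rules can be sketched as a small data structure. The function `build_ffg` and the factor and variable names below are hypothetical and only serve as an illustration; the sketch takes the scope of each factor and derives nodes, edges and half-edges. Because an edge can join at most two nodes, it rejects any variable that appears in more than two factors.

```python
from collections import defaultdict

def build_ffg(scopes):
    """Derive FFG nodes, edges and half-edges from factor scopes.

    scopes: dict mapping a factor name to the tuple of variables it depends on.
    """
    # Rule 3: a node is attached to an edge iff the variable appears in the factor.
    attached = defaultdict(list)
    for factor, variables in scopes.items():
        for var in variables:
            attached[var].append(factor)

    nodes = list(scopes)          # Rule 1: one node per factor.
    edges, half_edges = {}, {}    # Rule 2: one (half-)edge per variable.
    for var, facs in attached.items():
        if len(facs) == 1:
            half_edges[var] = facs[0]
        elif len(facs) == 2:
            edges[var] = tuple(facs)
        else:
            raise ValueError(f"variable {var} appears in {len(facs)} factors")
    return nodes, edges, half_edges

# Scopes of f_a(x1,x2,x3) * f_b(x3,x4,x5) * f_c(x4).
scopes = {"fa": ("x1", "x2", "x3"), "fb": ("x3", "x4", "x5"), "fc": ("x4",)}
nodes, edges, half_edges = build_ffg(scopes)
print(edges)       # {'x3': ('fa', 'fb'), 'x4': ('fb', 'fc')}
print(half_edges)  # {'x1': 'fa', 'x2': 'fa', 'x5': 'fb'}
```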

#### Some Terminology

- $f$ is called the **global function** and $f_\bullet$ are the **factors**. 
- A **configuration** is an assignment of values to all variables.
- The **configuration space** is the set of all configurations, i.e., the domain of $f$.
- A configuration $\omega=(x_1,x_2,x_3,x_4,x_5)$ is said to be **valid** iff $f(\omega) \neq 0$.
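As a small illustration of these terms, the sketch below enumerates the configuration space of a hypothetical two-variable global function (the function and its domain are assumptions chosen for brevity) and keeps the valid configurations.

```python
import itertools

def f(x1, x2):
    # Hypothetical global function: zero whenever x1 == x2.
    return 0.0 if x1 == x2 else 1.0 / (1 + x1 + x2)

domain = range(3)
configuration_space = list(itertools.product(domain, repeat=2))
valid = [w for w in configuration_space if f(*w) != 0]
print(len(configuration_space), len(valid))  # 9 6
```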
  

### Equality Nodes for Branching Points


- Note that a variable can appear in at most two factors in an FFG, since an edge connects at most two nodes.

- Consider the factorization (where $x_2$ appears in three factors) 
$$
 f(x_1,x_2,x_3,x_4) = f_a(x_1,x_2)\cdot f_b(x_2,x_3) \cdot f_c(x_2,x_4)
$$

- For the factor graph representation, we will instead consider the function $g$, defined as
$$
 g(x_1,x_2,x_2^\prime,x_2^{\prime\prime},x_3,x_4) = f_a(x_1,x_2)\cdot f_b(x_2^\prime,x_3) \cdot f_c(x_2^{\prime\prime},x_4) \cdot f_=(x_2,x_2^\prime,x_2^{\prime\prime})
$$
  where 
$$
f_=(x_2,x_2^\prime,x_2^{\prime\prime}) \triangleq \delta(x_2-x_2^\prime)\, \delta(x_2-x_2^{\prime\prime})
$$
  (where $\delta$ is the Kronecker delta for discrete variables and the Dirac delta for continuous-valued variables)
  
- Note that through the introduction of the auxiliary variables $x_2^\prime$ and $x_2^{\prime\prime}$, each variable in $g$ appears in at most two factors. 

[show graph of g]

- Also note that $f$ is a marginal of $g$, since
$$
f(x_1,x_2,x_3,x_4) = \int g(x_1,x_2,x_2^\prime,x_2^{\prime\prime},x_3,x_4)\, \mathrm{d}x_2^\prime \mathrm{d}x_2^{\prime\prime}
$$
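For discrete variables the integral becomes a sum and $\delta$ is the Kronecker delta, so the marginalization claim can be verified numerically. The following is a minimal NumPy sketch under that assumption, with randomly generated factor tables and a hypothetical domain size `K`.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 3  # hypothetical number of values per variable

fa = rng.random((K, K))  # f_a(x1, x2)
fb = rng.random((K, K))  # f_b(x2', x3)
fc = rng.random((K, K))  # f_c(x2'', x4)

# Equality factor f_=(x2, x2', x2''): 1 when all three agree, 0 otherwise.
idx = np.arange(K)
f_eq = np.zeros((K, K, K))
f_eq[idx, idx, idx] = 1.0

# Original function f(x1, x2, x3, x4): x2 is shared by all three factors.
f = np.einsum("ab,bc,bd->abcd", fa, fb, fc)

# g with axes (x1, x2, x2', x2'', x3, x4), then sum out x2' and x2''.
g = np.einsum("ab,pc,qd,bpq->abpqcd", fa, fb, fc, f_eq)
f_from_g = g.sum(axis=(2, 3))

print(np.allclose(f, f_from_g))
```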

- Therefore, any inference problem on $f$ can be solved as a corresponding inference problem on $g$, e.g.,
\begin{align}
f(x_1|x_2) &\triangleq \frac{\int f(x_1,x_2,x_3,x_4) \,\mathrm{d}x_3 \mathrm{d}x_4 }{ \int f(x_1,x_2,x_3,x_4) \,\mathrm{d}x_1 \mathrm{d}x_3 \mathrm{d}x_4} \\
  &= \frac{\int g(x_1,x_2,x_2^\prime,x_2^{\prime\prime},x_3,x_4) \,\mathrm{d}x_2^\prime \mathrm{d}x_2^{\prime\prime} \mathrm{d}x_3 \mathrm{d}x_4 }{ \int g(x_1,x_2,x_2^\prime,x_2^{\prime\prime},x_3,x_4) \,\mathrm{d}x_1 \mathrm{d}x_2^\prime \mathrm{d}x_2^{\prime\prime} \mathrm{d}x_3 \mathrm{d}x_4} \\
  &\triangleq g(x_1|x_2)
\end{align}

$\Rightarrow$ Any factorization of a global function $f$ can be represented by a Forney-style Factor Graph.

- More generally, equality nodes are useful to express hard constraints between variables. 


### Probabilistic Models as Factor Graphs

- FFGs can be used to express conditional independence (factorization) in probabilistic models. 
- For example, the (previously shown) graph for $f_a(x_1,x_2,x_3) \cdot f_b(x_3,x_4,x_5) \cdot f_c(x_4)$ would also represent the model
$$
p(x_1,x_2,x_3,x_4,x_5) = p(x_1,x_2|x_3) \cdot p(x_3,x_5|x_4) \cdot p(x_4)
$$
where we identify $f_a(x_1,x_2,x_3)=p(x_1,x_2|x_3)$, $f_b(x_3,x_4,x_5)= p(x_3,x_5|x_4)$ and $f_c(x_4) = p(x_4)$.
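A quick numerical sanity check of this identification, as a sketch assuming discrete variables and randomly generated conditional probability tables (the tables and the domain size `K` are illustrative assumptions): the product of the three factors is a normalized joint distribution, and conditioning that joint on $x_3$ recovers $p(x_1,x_2|x_3)$.

```python
import numpy as np

rng = np.random.default_rng(2)
K = 3  # hypothetical number of values per variable

# f_c = p(x4): a vector that sums to one.
p4 = rng.random(K)
p4 /= p4.sum()

# f_b = p(x3, x5 | x4): axes (x3, x4, x5), normalized over x3 and x5.
fb = rng.random((K, K, K))
fb /= fb.sum(axis=(0, 2), keepdims=True)

# f_a = p(x1, x2 | x3): axes (x1, x2, x3), normalized over x1 and x2.
fa = rng.random((K, K, K))
fa /= fa.sum(axis=(0, 1), keepdims=True)

# Joint p(x1,...,x5) as the product of the three factors.
joint = np.einsum("abc,cde,d->abcde", fa, fb, p4)
print(np.isclose(joint.sum(), 1.0))

# Conditioning the joint on x3 gives back p(x1, x2 | x3).
p123 = joint.sum(axis=(3, 4))   # p(x1, x2, x3)
p3 = p123.sum(axis=(0, 1))      # p(x3)
print(np.allclose(p123 / p3, fa))
```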

[show graph]

- Factorizations provide opportunities to reduce the amount of computation needed for inference. In what follows, we will use FFGs to exploit these opportunities in an automatic way (i.e., by message passing). 

### Processing Observations in a Factor Graph

- Consider a generative model 
$$p(x,y_1,y_2) = p(x)\,p(y_1|x)\,p(y_2|x) .$$ 
This model expresses the assumption that $Y_1$ and $Y_2$ are independent measurements of $X$.

[show the FFG]

- Assume that we are interested in the posterior for $X$ after observing $Y_1=y_1$ and $Y_2=y_2$. Using the product rule we get
$$
p(x|y_1,y_2) = \frac{p(x,y_1,y_2)}{p(y_1,y_2)} \propto p(x,y_1,y_2) = p(x)\,p(y_1|x)\,p(y_2|x)
$$
- Crucially, aside from a scaling factor $\frac{1}{p(y_1,y_2)}$, the factorizations for the posterior $p(x|y_1,y_2)$ and the full model $p(x,y_1,y_2)$ are the same. 

$\Rightarrow$ Making observations does not change the factor graph.
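For a discrete $X$, the posterior follows by multiplying the three factors at the observed values and normalizing. The sketch below assumes a three-valued $X$, a made-up prior, and one likelihood table shared by both measurements; all numbers are hypothetical.

```python
import numpy as np

prior = np.array([0.5, 0.3, 0.2])   # p(x), hypothetical values
lik = np.array([[0.7, 0.2, 0.1],    # p(y|x): row = x, column = y
                [0.2, 0.6, 0.2],
                [0.1, 0.2, 0.7]])
y1, y2 = 0, 1                       # hypothetical observations

# Same factors as the full model, evaluated at the observed y1 and y2.
unnormalized = prior * lik[:, y1] * lik[:, y2]

# The scaling factor is the evidence p(y1, y2).
evidence = unnormalized.sum()
posterior = unnormalized / evidence
print(posterior, evidence)
```

Only the normalization step distinguishes the posterior from the joint evaluated at the observations, which is why the same graph serves both.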



### Inference by Closing Boxes

- Assume we wish to compute the marginal
$$
\bar{f}(x_3) = \sum_{x_1,x_2,x_4,x_5,x_6,x_7}f(x_1,x_2,\ldots,x_7)
$$
where $f$ is factorized as given by the following FFG

<img src="message-passing-in-FFG-1.png">

- Due to the factorization, we decompose this sum by the distributive law as
\begin{align}
\bar{f}(x_3) = & \underbrace{ \left( \sum_{x_1,x_2} f_a(x_1)\,f_b(x_2)\,f_c(x_1,x_2,x_3)\right) }_{\overrightarrow{\mu}_{X_3}(x_3)}  \\
  & \underbrace{ \cdot\left( \sum_{x_4,x_5} f_d(x_4)\,f_e(x_3,x_4,x_5) \cdot \underbrace{ \left( \sum_{x_6,x_7} f_f(x_5,x_6,x_7)\,f_g(x_7)\right) }_{\overleftarrow{\mu}_{X_5}(x_5)} \right) }_{\overleftarrow{\mu}_{X_3}(x_3)}
\end{align}
which is computationally (much) lighter than executing the full sum $\sum_{x_1,x_2,x_4,\ldots,x_7}f(x_1,x_2,\ldots,x_7)$ directly.
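The decomposition can be verified numerically. The sketch below assumes discrete variables with `K` values each and random factor tables whose scopes match the factorization above (the tables themselves are illustrative assumptions). It compares the sum over the full joint table with the product of the two messages obtained by closing boxes.

```python
import numpy as np

rng = np.random.default_rng(3)
K = 3  # hypothetical number of values per variable

# Factors f_a(x1), f_b(x2), f_d(x4), f_g(x7) and
# f_c(x1,x2,x3), f_e(x3,x4,x5), f_f(x5,x6,x7).
fa, fb, fd, fg = (rng.random(K) for _ in range(4))
fc, fe, ff = (rng.random((K, K, K)) for _ in range(3))

# Full sum: build the joint table with K**7 entries, sum out all but x3.
# Index letters: a=x1, b=x2, c=x3, d=x4, e=x5, f=x6, g=x7.
joint = np.einsum("a,b,abc,d,cde,efg,g->abcdefg", fa, fb, fc, fd, fe, ff, fg)
full_sum = joint.sum(axis=(0, 1, 3, 4, 5, 6))

# Closing boxes: three small sums, each yielding a message.
mu_fwd_x3 = np.einsum("a,b,abc->c", fa, fb, fc)         # forward message on X3
mu_bwd_x5 = np.einsum("efg,g->e", ff, fg)               # backward message on X5
mu_bwd_x3 = np.einsum("d,cde,e->c", fd, fe, mu_bwd_x5)  # backward message on X3

print(np.allclose(full_sum, mu_fwd_x3 * mu_bwd_x3))
```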

- Note that the "message" $\overleftarrow{\mu}_{X_5}(x_5)$ is obtained by executing the "sum-product" $\sum_{ \stackrel{ \textrm{enclosed} }{ \textrm{variables} } } \prod_{\stackrel{ \textrm{enclosed} }{ \textrm{factors} }}$ in the red box. This operation is called **closing the box**. Closing the box marginalizes the internal variables away and leads to a new factor with outgoing message $\overleftarrow{\mu}_{X_5}(x_5)$.


- Observe a shared pattern for computing the "messages":
\begin{align}
\overrightarrow{\mu}_{X_3}(x_3) &= \sum_{x_1,x_2} \overrightarrow{\mu}_{X_1}(x_1) \overrightarrow{\mu}_{X_2}(x_2) f_c(x_1,x_2,x_3) \\
\overleftarrow{\mu}_{X_5}(x_5) &= \sum_{x_6,x_7} \overrightarrow{\mu}_{X_7}(x_7) f_f(x_5,x_6,x_7) \\
\overleftarrow{\mu}_{X_3}(x_3) &= \sum_{x_4,x_5} \overrightarrow{\mu}_{X_4}(x_4) \overleftarrow{\mu}_{X_5}(x_5) f_e(x_3,x_4,x_5)
\end{align}
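where we defined $\overrightarrow{\mu}_{X_1}(x_1) \triangleq f_a(x_1)$, $\overrightarrow{\mu}_{X_2}(x_2) \triangleq f_b(x_2)$, $\overrightarrow{\mu}_{X_4}(x_4) \triangleq f_d(x_4)$ and $\overrightarrow{\mu}_{X_7}(x_7) \triangleq f_g(x_7)$. 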

- This pattern for computing messages applies generally and is called the **sum-product rule**:
$$
\overrightarrow{\mu}_{Y}(y) = \sum_{x_1,\ldots,x_n} \overrightarrow{\mu}_{X_1}(x_1)\cdots \overrightarrow{\mu}_{X_n}(x_n) \,f(y,x_1,\ldots,x_n) 
$$
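For discrete variables the rule can be sketched as a short generic function. The name `sum_product_message` and the convention that the outgoing variable $y$ sits on axis 0 of the factor table are assumptions made for this illustration.

```python
import numpy as np

def sum_product_message(factor, incoming):
    """Outgoing message on the first axis of `factor`.

    factor: array holding f(y, x_1, ..., x_n), with y on axis 0.
    incoming: list of n vectors, the incoming messages on x_1, ..., x_n.
    """
    out = factor
    # Multiply in each message and sum out its variable, last axis first.
    for msg in reversed(incoming):
        out = out @ msg  # contracts the last axis of `out` with `msg`
    return out

rng = np.random.default_rng(4)
f = rng.random((2, 3, 4))                  # hypothetical f(y, x1, x2)
mu_x1, mu_x2 = rng.random(3), rng.random(4)

mu_y = sum_product_message(f, [mu_x1, mu_x2])
reference = np.einsum("yab,a,b->y", f, mu_x1, mu_x2)
print(np.allclose(mu_y, reference))
```

If the outgoing variable is not on axis 0 of a factor table, `np.moveaxis` can bring it there before calling the function.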

[show graph]

The **Sum-Product Theorem**
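- If the factor graph for a function $f$ has no cycles, then the marginal $\bar{f}(x)$, obtained by summing $f$ over all variables except $x$, is the product of the two messages on edge $X$:
$$
\bar{f}(x) = \overrightarrow{\mu}_{X}(x)\cdot \overleftarrow{\mu}_{X}(x)
$$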

The **Sum-Product Algorithm**

  

