## 18. Loglinear Models

### 18.1 The Loglinear Model

Let $X = (X_1, \dots, X_m)$ be a discrete random vector with probability function

$$ f(x) = \mathbb{P}(X = x) = \mathbb{P}(X_1 = x_1, \dots, X_m = x_m) $$

Let $r_j$ be the number of values that $X_j$ takes; without loss of generality, assume $X_j \in \{ 0, 1, \dots, r_j - 1 \}$.  Suppose we have $n$ such vectors.

We can think of the data as a sample from a Multinomial with $N = r_1 \times r_2 \times \dots \times r_m$ categories.  The data can be represented as counts in an $r_1 \times r_2 \times \dots \times r_m$ table.  Let $p = (p_1, \dots, p_N)$ denote the multinomial parameter.
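The table of counts can be sketched in a few lines of Python. This is only an illustration: the helper name `contingency_table` and the toy sample are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def contingency_table(data, levels):
    """Count how often each cell (x_1, ..., x_m) occurs in the sample.

    data   : n rows, each a vector with entries in {0, ..., r_j - 1}
    levels : (r_1, ..., r_m)
    """
    data = np.asarray(data, dtype=int)
    counts = np.zeros(levels, dtype=int)
    # np.add.at accumulates correctly when the same cell index repeats.
    np.add.at(counts, tuple(data.T), 1)
    return counts

# Hypothetical sample: n = 6 vectors with m = 2, r_1 = 2, r_2 = 3.
sample = [(0, 0), (0, 2), (1, 1), (1, 1), (0, 0), (1, 2)]
table = contingency_table(sample, levels=(2, 3))

N = table.size               # number of multinomial categories, here 2 * 3
p_hat = table / table.sum()  # empirical cell proportions
```

Flattening `table` gives one multinomial observation with $N$ categories; keeping it as an array preserves the correspondence between cells and values of $x$.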

Let $S = \{ 1, \dots, m \}$.  Given a vector $x = (x_1, \dots, x_m)$ and a subset $A \subset S$, let $x_A = (x_j : j \in A)$.  For example, if $A = \{1, 3\}$ then $x_A = (x_1, x_3)$.

**Theorem 18.1**.  The joint probability function $f(x)$ of a single random vector $X = (X_1, \dots, X_m)$ can be written as 

$$ \log f(x) = \sum_{A \subset S} \psi_A(x) $$

where the sum is over all subsets $A$ of $S = \{1, \dots, m \}$ and the $\psi$'s satisfy the following conditions:

1. $\psi_\varnothing(x)$ is a constant;
2. For every $A \subset S$, $\psi_A(x)$ is a function of $x_A$ only, and not of the rest of the $x_j$'s;
3. If $i \in A$ and $x_i = 0$, then $\psi_A(x) = 0$.

The formula in this theorem is known as the **log-linear expansion** of $f$.  Note that this is the probability function for a single draw.  Each $\psi_A(x)$ will depend on some unknown parameters $\beta_A$.  Let $\beta = (\beta_A : A \subset S)$ be the set of all these parameters.  We will write $f(x) = f(x; \beta)$ when we want to emphasize the dependence on the unknown parameters $\beta$.
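One standard way to obtain such $\psi$-terms, not stated in the text above and valid only when $f(x) > 0$ for every $x$, is the inclusion-exclusion formula $\psi_A(x) = \sum_{B \subset A} (-1)^{|A| - |B|} \log f(x_B, 0)$, where $(x_B, 0)$ keeps the coordinates in $B$ and sets the others to 0. The sketch below applies it to a made-up $2 \times 3$ table; the function name `psi_terms` is hypothetical, and coordinates are indexed from 0 rather than 1 to match NumPy axes.

```python
import itertools
import numpy as np

def psi_terms(f):
    """Return {A: psi_A} for a strictly positive probability table f.

    Each psi_A is stored as an array with the same shape as f.
    """
    m = f.ndim
    logf = np.log(f)
    terms = {}
    for size in range(m + 1):
        for A in itertools.combinations(range(m), size):
            psi = np.zeros_like(logf)
            for k in range(size + 1):
                for B in itertools.combinations(A, k):
                    # log f evaluated at (x_B, 0 elsewhere); axes outside B
                    # are kept with length 1 so the result broadcasts.
                    idx = tuple(slice(None) if j in B else slice(0, 1)
                                for j in range(m))
                    psi = psi + (-1) ** (size - k) * logf[idx]
            terms[A] = psi
    return terms

# Made-up positive table with r_1 = 2, r_2 = 3 (entries sum to 1).
f = np.array([[0.1, 0.2, 0.1],
              [0.2, 0.1, 0.3]])
terms = psi_terms(f)
reconstructed = sum(terms.values())  # should equal log f cell by cell
```

Here `terms[()]` is the constant $\psi_\varnothing$, and every slice of `terms[A]` on which some coordinate in `A` equals 0 is identically zero, as condition 3 requires.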

In terms of the multinomial, the parameter space is

$$ \mathcal{P} = \left\{ p = (p_1, \dots, p_N) : p_j \geq 0, \sum_{j=1}^N p_j = 1 \right\} $$

This is an $(N - 1)$-dimensional space.  In the log-linear representation, the parameter space is

$$ \Theta = \Bigg\{ \beta = (\beta_1, \dots, \beta_N) : \beta = \beta(p), p \in \mathcal{P} \Bigg\} $$

where $\beta(p)$ is the set of $\beta$ values associated with $p$.  The set $\Theta$ is an $(N - 1)$-dimensional surface in $\mathbb{R}^N$.  We can always go back and forth between the two parametrizations by writing $\beta = \beta(p)$ and $p = p(\beta)$.
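As a small worked illustration of the two parametrizations, take $m = 1$ and $r_1 = 2$, so that $N = 2$. Label the cells so that $p_1 = \mathbb{P}(X_1 = 0)$ and $p_2 = \mathbb{P}(X_1 = 1)$, and assume both are positive. The labels $\beta_1, \beta_2$ are chosen here for illustration. Then

$$ \log f(x) = \underbrace{\log p_1}_{\psi_\varnothing} + \underbrace{x_1 \log \frac{p_2}{p_1}}_{\psi_{\{1\}}(x)}, \qquad \beta_1 = \log p_1, \quad \beta_2 = \log \frac{p_2}{p_1} $$

and, in the other direction,

$$ p_1 = e^{\beta_1}, \qquad p_2 = e^{\beta_1 + \beta_2}, \qquad e^{\beta_1} \left( 1 + e^{\beta_2} \right) = 1 $$

The last equation is the constraint $p_1 + p_2 = 1$ written in terms of $\beta$.  It shows that $\Theta$ is a one-dimensional curve in $\mathbb{R}^2$, in agreement with $N - 1 = 1$.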

**Theorem 18.14**.  Let $(X_a, X_b, X_c)$ be a partition of the vector $(X_1, \dots, X_m)$.  Then $X_b \text{ ⫫ } X_c \; | \; X_a$ if and only if all the $\psi$-terms in the log-linear expansion that have at least one coordinate in $b$ and one coordinate in $c$ are 0.

To prove this theorem, we will use the following lemma, whose proof follows from the definition of conditional independence.

**Lemma 18.5**.  A partition $(X_a, X_b, X_c)$ satisfies $X_b \text{ ⫫ } X_c \; | \; X_a$ if and only if $f(x_a, x_b, x_c) = g(x_a, x_b) h(x_a, x_c)$ for some functions $g$ and $h$.

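**Proof of Theorem 18.14**.  Suppose that $\psi_t$ is 0 whenever $t$ has coordinates in both $b$ and $c$.  Hence, $\psi_t$ is 0 whenever $t$ is neither a subset of $a \cup b$ nor a subset of $a \cup c$, so only terms with $t \subset a \cup b$ or $t \subset a \cup c$ can be non-zero.  The sets that belong to both families are exactly those with $t \subset a$, and they must be subtracted once to avoid counting them twice.  Therefore,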

$$ \log f(x) = \sum_{t \subset a \cup b} \psi_t(x) + \sum_{t \subset a \cup c} \psi_t(x) - \sum_{t \subset a} \psi_t(x) $$

Exponentiating, we see that the joint probability function is of the form $g(x_a, x_b) h(x_a, x_c)$.  By Lemma 18.5, $X_b \text{ ⫫ } X_c \; | \; X_a$.  The converse follows by reversing the argument.
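The theorem can be checked numerically on a small case. The sketch below takes three binary variables with $a = \{1\}$, $b = \{2\}$, $c = \{3\}$ (stored on NumPy axes 0, 1, 2) and made-up coefficients; the function name `b_indep_c_given_a` is hypothetical. It tests the identity $f(x_a, x_b, x_c)\, f(x_a) = f(x_a, x_b)\, f(x_a, x_c)$, which is equivalent to conditional independence when $f(x_a) > 0$.

```python
import numpy as np

# Grids of coordinate values for binary (x_a, x_b, x_c).
xa, xb, xc = np.meshgrid([0, 1], [0, 1], [0, 1], indexing="ij")

def normalise(log_f):
    """Exponentiate and rescale so the table sums to 1 (this fixes psi_emptyset)."""
    f = np.exp(log_f)
    return f / f.sum()

def b_indep_c_given_a(f):
    """Check f(a,b,c) * f(a) == f(a,b) * f(a,c) in every cell."""
    f_a = f.sum(axis=(1, 2), keepdims=True)
    f_ab = f.sum(axis=2, keepdims=True)
    f_ac = f.sum(axis=1, keepdims=True)
    return bool(np.allclose(f * f_a, f_ab * f_ac))

# Made-up coefficients. Each term vanishes when one of its coordinates is 0.
# There is no term containing both b and c.
log_f = 0.3 * xa - 0.5 * xb + 0.8 * xc + 1.2 * xa * xb + 0.7 * xa * xc
f_without_bc = normalise(log_f)

# Adding a term with coordinates in both b and c breaks the independence.
f_with_bc = normalise(log_f + 0.5 * xb * xc)

holds = b_indep_c_given_a(f_without_bc)
fails = not b_indep_c_given_a(f_with_bc)
```

Both `holds` and `fails` should come out true: dropping every term that mixes $b$ and $c$ gives the factorization, and a single $\psi_{bc}$ term is enough to destroy it.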

### 18.2 Graphical Log-Linear Models

Let $\log f(x) = \sum_{A \subset S} \psi_A(x)$ be a log-linear model.  Then $f$ is **graphical** if there is a graph $\mathcal{G}$ on the vertices $1, \dots, m$ such that all $\psi$-terms are non-zero except those whose index set contains a pair of coordinates that is not in the edge set of $\mathcal{G}$.  In other words, $\psi_A(x) = 0$ if and only if there exist $i, j \in A$ such that $(i, j)$ is not an edge.

Here is a way to think about this definition: if you can add a term to the model and the graph does not change, then the model is not graphical.
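This rule of thumb can be turned into a brute-force check for small $m$. The sketch below is an illustration under one stated assumption: the graph is read off the model by joining $i$ and $j$ whenever some non-zero $\psi$-term contains both. The function names are hypothetical, and the constant term $\psi_\varnothing$ is left out because it is always present.

```python
from itertools import combinations

def implied_edges(terms):
    """Pairs {i, j} that occur together in at least one non-zero psi-term."""
    return {frozenset(pair)
            for A in terms
            for pair in combinations(sorted(A), 2)}

def is_graphical(terms, m):
    """True if the non-zero terms are exactly the subsets of {1, ..., m}
    in which every pair of coordinates is an edge of the implied graph."""
    terms = {frozenset(A) for A in terms}
    edges = implied_edges(terms)
    for size in range(1, m + 1):
        for A in combinations(range(1, m + 1), size):
            every_pair_is_edge = all(frozenset(p) in edges
                                     for p in combinations(A, 2))
            if every_pair_is_edge != (frozenset(A) in terms):
                return False
    return True

main_effects = [{1}, {2}, {3}]

# Edges 1-2 and 2-3 only: nothing can be added without changing the graph.
chain = main_effects + [{1, 2}, {2, 3}]

# All three pairwise terms but no three-way term: psi_{123} could be added
# without changing the graph, so the model is not graphical.
no_three_way = main_effects + [{1, 2}, {1, 3}, {2, 3}]

saturated = no_three_way + [{1, 2, 3}]

results = [is_graphical(t, 3) for t in (chain, no_three_way, saturated)]
```

The loop enumerates all $2^m - 1$ non-empty subsets, so it is meant for small examples only.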