# Generalized Linear Models
Many common probability distributions belong to a broader class called the "exponential family."

### The exponential family
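A pdf (or, for discrete $x$, a pmf) $p(x|\theta)$, with $x = (x_1, \dots, x_m) \in \chi^m$ and $\theta \in \mathbb{R}^D$, is in the exponential family if it can be written in the following form (for discrete $x$, the integral defining $Z(\theta)$ becomes a sum):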
$$
\begin{align}
    p(x|\theta) &= \frac 1 {Z(\theta)} h(x) e^{\theta^T \phi(x)} \\
                &= h(x) e^{\theta^T \phi(x) - A(\theta)} \\
      Z(\theta) &= \int_{\chi^m} h(x) e^{\theta^T \phi(x)} dx \\
      A(\theta) &= \log Z(\theta)
\end{align}
$$

| Symbol | Description |
|------- | ----------- |
| $\theta \in \mathbb{R}^D$ | natural/canonical parameters |
| $\phi(x) \in \mathbb{R}^D$ | vector of sufficient statistics |
| $Z(\theta)$ | partition function (normalizing constant) |
| $A(\theta)$ | log partition function, also called the cumulant function |
| $h(x)$ | base measure: a scaling function of $x$ alone (often equal to 1) |

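To make the definition concrete, here is a minimal Python sketch that evaluates $p(x|\theta) = h(x) e^{\theta^T \phi(x) - A(\theta)}$ for a distribution on a finite support, so the integral defining $Z(\theta)$ becomes a sum. The helper names (`dot`, `log_partition`, `pmf`) and the toy family on $\{0, 1, 2, 3\}$ with $\phi(x) = [x]$ and $h(x) = 1$ are hypothetical, chosen only for illustration.

```python
import math


def dot(a, b):
    """Inner product theta^T phi(x) for plain Python lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def log_partition(theta, phi, h, support):
    """A(theta) = log Z(theta); over a finite support the integral becomes a sum."""
    Z = sum(h(x) * math.exp(dot(theta, phi(x))) for x in support)
    return math.log(Z)


def pmf(x, theta, phi, h, support):
    """p(x|theta) = h(x) * exp(theta^T phi(x) - A(theta))."""
    return h(x) * math.exp(dot(theta, phi(x)) - log_partition(theta, phi, h, support))


# Hypothetical toy family on {0, 1, 2, 3}: phi(x) = [x], h(x) = 1.
def phi(x):
    return [x]


def h(x):
    return 1.0


support = [0, 1, 2, 3]
theta = [0.5]
probs = [pmf(x, theta, phi, h, support) for x in support]
print(probs, sum(probs))  # probabilities should sum to 1 (up to rounding)
```

Recomputing $A(\theta)$ on every call is wasteful; it is kept inline so that each function mirrors one line of the definition.
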
### Bernoulli as a Member of the Exponential Family
Recall the Bernoulli distribution for $x \in \{0, 1\}$ with mean $\mu = P(x=1)$:
$$
\begin{align}
P(x|\mu) &= \mu^{I(x=1)} (1 - \mu)^{I(x=0)} \\
            &= \mu^x (1 - \mu)^{1-x} \\
            &= e^{x \ln\mu} e^{(1-x) \ln(1-\mu)} \\
            &= e^{x \ln\mu + (1-x) \ln(1-\mu)} \\
            &= e^{\theta^T \phi(x)}
\end{align}
$$

where the canonical parameters $\theta$ are 

$$
\theta = [\ln(\mu),\ \ln(1 - \mu)]
$$

and the vector of sufficient statistics $\phi(x)$ is

$$
\begin{align}
\phi(x) &= [x,\ 1-x] \\
        &= [I(x=1),\ I(x=0)]
\end{align}
$$
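
As a quick numerical check of this derivation, the sketch below compares the direct Bernoulli pmf with the two-parameter form $e^{\theta^T \phi(x)}$, using $\theta = [\ln\mu, \ln(1-\mu)]$ and $\phi(x) = [x, 1-x]$. The function names and the value $\mu = 0.3$ are arbitrary choices for illustration.

```python
import math


def bernoulli_pmf(x, mu):
    """Direct form: mu^x * (1 - mu)^(1 - x) for x in {0, 1}."""
    return mu ** x * (1 - mu) ** (1 - x)


def bernoulli_overcomplete(x, mu):
    """Exponential-family form exp(theta^T phi(x)) with two-entry theta and phi."""
    theta = [math.log(mu), math.log(1 - mu)]
    phi = [x, 1 - x]
    return math.exp(sum(t * f for t, f in zip(theta, phi)))


mu = 0.3
for x in (0, 1):
    # Both columns should agree up to floating-point rounding.
    print(x, bernoulli_pmf(x, mu), bernoulli_overcomplete(x, mu))
```
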

An interesting note here is that this representation, $P(x|\mu) = e^{\theta^T \phi(x)}$, is _over-complete_ for the Bernoulli distribution. This isn't hard to see: the two entries of $\phi(x)$ always sum to one ($x + (1-x) = 1$), so we are using two entries in $\theta$ and $\phi(x)$ when knowing the value of one immediately gives us the value of the other. This is generally undesirable because it means the canonical parameters $\theta$ cannot be uniquely determined from a distribution: adding the same constant to both entries of $\theta$ leaves the distribution unchanged once it is renormalized by $Z(\theta)$.
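
The non-identifiability can be seen numerically. This sketch (hypothetical helper `normalized`, arbitrary $\mu = 0.3$ and shift $c = 2$) normalizes $e^{\theta^T \phi(x)}$ over $x \in \{0, 1\}$ for $\theta$ and for $\theta + c\,[1, 1]$; both parameter vectors give the same probabilities.

```python
import math


def normalized(theta, support=(0, 1)):
    """Normalize exp(theta^T phi(x)), with phi(x) = [x, 1 - x], over x in {0, 1}."""
    def score(x):
        return math.exp(theta[0] * x + theta[1] * (1 - x))

    Z = sum(score(x) for x in support)
    return [score(x) / Z for x in support]


mu = 0.3
theta = [math.log(mu), math.log(1 - mu)]
shifted = [t + 2.0 for t in theta]  # add the same constant c = 2 to both entries

print(normalized(theta))    # [P(x=0), P(x=1)]
print(normalized(shifted))  # the same distribution from a different theta
```
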

A way around this is to rewrite the distribution in terms of a single parameter:
$$
P(x|\mu) = (1-\mu)\, e^{x \ln \frac{\mu}{1-\mu}}
$$

Now each of the exponential-family quantities is a single scalar:

$$
\begin{align}
\phi(x) &= x \\
\theta &= \ln \left [ \frac {\mu} {1-\mu} \right ] \\
h(x) &= 1 \\
Z &= \frac 1 {1-\mu} = 1 + e^{\theta}
\end{align}
$$
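
A similar sketch checks the minimal form: $(1-\mu) e^{x\theta}$ with $\theta = \ln\frac{\mu}{1-\mu}$ should reproduce $P(x=0) = 1-\mu$ and $P(x=1) = \mu$. Since $1 - \mu = \frac{1}{1+e^{\theta}}$, the partition function $Z = \frac{1}{1-\mu}$ can equivalently be written as $1 + e^{\theta}$. The function name `minimal_form` and the value $\mu = 0.3$ are illustrative assumptions.

```python
import math


def minimal_form(x, mu):
    """(1 - mu) * exp(x * theta), theta = ln(mu / (1 - mu)); h(x) = 1, Z = 1 / (1 - mu)."""
    theta = math.log(mu / (1 - mu))
    return (1 - mu) * math.exp(x * theta)


mu = 0.3
theta = math.log(mu / (1 - mu))
print([minimal_form(x, mu) for x in (0, 1)])  # should match [1 - mu, mu]
print(1 / (1 - mu), 1 + math.exp(theta))      # two expressions for the same Z
```
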


You may recognize $\theta$ as the ubiquitous log-odds (also called the logit). If we solve this for $\mu$, the mean parameter of the Bernoulli distribution, we get
$$
\begin{align}
\mu &= \frac {e^{\theta}} {e^{\theta} + 1} \\
    &= \frac {1} {1 + e^{-\theta}} \\
    &= \text{sigm}(\theta)
\end{align}
$$
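
In code, the two directions of this relationship are the sigmoid and its inverse, the logit. The sketch below is one common way to write them in Python; the branch in `sigm` uses the equivalent form $e^{\theta}/(e^{\theta}+1)$ for negative $\theta$, so `math.exp` is only ever called on a non-positive argument and cannot overflow. The names `sigm` and `logit` are chosen to match the notation here, not taken from a particular library.

```python
import math


def sigm(theta):
    """Logistic sigmoid 1 / (1 + e^{-theta}), written to avoid overflow for large |theta|."""
    if theta >= 0:
        return 1.0 / (1.0 + math.exp(-theta))
    z = math.exp(theta)   # theta < 0, so z lies in (0, 1)
    return z / (1.0 + z)  # equals e^theta / (e^theta + 1)


def logit(mu):
    """Log-odds ln(mu / (1 - mu)); the inverse of sigm for mu strictly in (0, 1)."""
    return math.log(mu / (1.0 - mu))


for mu in (0.1, 0.5, 0.9):
    theta = logit(mu)
    print(mu, theta, sigm(theta))  # sigm(logit(mu)) recovers mu
```
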

### High-level Points
* The log-odds $\ln \frac{\mu}{1-\mu}$ is the canonical parameter of the Bernoulli distribution
* The sigmoid function maps the canonical parameter (the log-odds) to the mean parameter: $\mu = \text{sigm}(\theta)$
* Conversely, the logit function is the inverse of the sigmoid and maps the mean parameter back to the canonical parameter