# Poisson Regression

A Poisson distribution is the simplest approach to modelling count data, that is, non-negative integer outcomes such as the number of events observed in an interval.

## Rate Formulation

A Poisson model can be formulated in terms of a rate, $\lambda$, as follows:

$$
y \sim \text{Poisson}(\lambda)
$$

The rate, $\lambda$, is the expected number of counts per unit time (or length or population or whatever!) and is given by

$$
\lambda = \exp(\eta)
$$

which ensures that $\lambda > 0$ and

$$
\eta = \alpha + \beta x,
$$

so that

$$
\log(\lambda) = \alpha + \beta x.
$$

Here $\alpha$ and $\beta$ are the parameters of the model and $\eta$ is the linear predictor. The natural logarithm is the link function: it connects the mean of the Poisson distribution, $\lambda$, to the linear predictor.
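To make the rate formulation concrete, here is a minimal standard-library sketch that simulates data from this model. The parameter values, the range of `x`, and the helper names `poisson_sample` and `simulate` are illustrative assumptions. Python's `random` module has no Poisson sampler, so the sketch uses Knuth's multiplication method, which is adequate for small rates like these.

```python
import math
import random


def poisson_sample(rate: float, rng: random.Random) -> int:
    """Draw one Poisson(rate) variate with Knuth's method (fine for small rates)."""
    threshold = math.exp(-rate)
    k, p = 0, 1.0
    while True:
        p *= rng.random()          # multiply uniforms until the product drops below e^-rate
        if p <= threshold:
            return k
        k += 1


def simulate(alpha: float, beta: float, x: list[float], seed: int = 42):
    """Simulate y_i ~ Poisson(lambda_i) with log(lambda_i) = alpha + beta * x_i."""
    rng = random.Random(seed)
    eta = [alpha + beta * x_i for x_i in x]          # linear predictor
    lam = [math.exp(e) for e in eta]                 # inverse link guarantees lambda > 0
    counts = [poisson_sample(rate, rng) for rate in lam]
    return lam, counts


# Hypothetical "true" parameters and covariate values, for illustration only.
alpha_true, beta_true = 0.5, 0.8
x = [i / 50 for i in range(101)]                     # 101 points in [0, 2]
lam, counts = simulate(alpha_true, beta_true, x)

# Dictionary with the same names as the Stan data block (N, counts, x).
data = {"N": len(x), "counts": counts, "x": x}
```

Because $\lambda_i$ is computed as $\exp(\eta_i)$, every rate is positive even when $\eta_i$ is negative, which is exactly what the log link buys us.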

```python
stan_poisson = """
data {
    int<lower=0> N;                              // Number of observations (with constraint)
    array[N] int<lower=0> counts;                // Dependent variable (array syntax, Stan 2.26+)
    vector[N] x;                                 // Independent variable (vector, so beta * x is defined)
}
parameters {
    real         alpha;
    real         beta;
}
model {
    counts ~ poisson(exp(alpha + beta * x));     // Poisson model with explicit link function
}
"""
```

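Stan also provides distributions with common link functions built in. For example, `poisson_log(eta)` is the Poisson distribution parameterised by the log rate, so it is equivalent to `poisson(exp(eta))` but never has to compute the rate explicitly, which is more efficient and numerically more stable.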

```python
stan_poisson = """
data {
    int<lower=0> N;
    array[N] int<lower=0> counts;
    vector[N] x;
}
parameters {
    real         alpha;
    real         beta;
}
model {
    counts ~ poisson_log(alpha + beta * x);      // Poisson model with implicit log() link function
}
"""
```

## Counts Formulation

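Sometimes it makes more sense to formulate the model in terms of the expected number of counts, $\mu_i$, rather than the rate. This is especially useful if the *exposure*, $\tau_i$ (for example, the length of time or the size of the population observed), varies between observations. In this case the rate is given by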

$$
\lambda_i = \frac{\mu_i}{\tau_i}
$$

so that

$$
\log(\lambda_i) = \log(\mu_i) - \log(\tau_i)
$$

and

$$
\log(\mu_i) = \log(\tau_i) + \alpha + \beta x_i.
$$
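A quick numerical check of this identity, using made-up values of $\alpha$, $\beta$, $x_i$ and $\tau_i$: observations with the same covariate but different exposures get different expected counts but the same rate. The helper name `expected_counts` is hypothetical.

```python
import math


def expected_counts(alpha: float, beta: float, x: list[float], exposure: list[float]) -> list[float]:
    """mu_i = exp(log(tau_i) + alpha + beta * x_i) for exposures tau_i > 0."""
    return [math.exp(math.log(tau) + alpha + beta * x_i) for x_i, tau in zip(x, exposure)]


# Illustrative values (assumed): same covariate, different exposures.
alpha, beta = 0.2, 0.5
x = [1.0, 1.0, 1.0]
exposure = [0.5, 1.0, 10.0]          # e.g. hours of observation
mu = expected_counts(alpha, beta, x, exposure)

for mu_i, tau_i in zip(mu, exposure):
    # mu_i / tau_i equals exp(alpha + beta * x_i) whatever the exposure.
    print(f"tau={tau_i:5.1f}  mu={mu_i:7.3f}  rate={mu_i / tau_i:.3f}")
```

Scaling the exposure by ten scales the expected count by ten, which is the proportionality the offset encodes.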

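In this case you would consider the model

$$
y_i \sim \text{Poisson}(\mu_i)
$$

where the exposure is explicitly included through the *offset* $\log(\tau_i)$, a term in the linear predictor whose coefficient is fixed at 1 rather than estimated.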
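Putting this together, a Stan version of the counts formulation might look like the following sketch, written in the same style as the `stan_poisson` models above. The data variable name `exposure` is an assumption, and `log(exposure)` requires every exposure to be strictly positive; the `<lower=0>` bound alone does not exclude zero.

```python
stan_poisson_exposure = """
data {
    int<lower=0> N;                              // Number of observations
    array[N] int<lower=0> counts;                // Dependent variable
    vector[N] x;                                 // Independent variable
    vector<lower=0>[N] exposure;                 // Exposure tau_i (must be > 0 for log)
}
parameters {
    real         alpha;
    real         beta;
}
model {
    // log(mu_i) = log(tau_i) + alpha + beta * x_i, offset log(tau_i) has fixed coefficient 1
    counts ~ poisson_log(log(exposure) + alpha + beta * x);
}
"""
```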