## 16.6 Introduction to Poisson Generalised Linear Modelling (Poisson Regression)

### 16.6.1 Poisson Distribution Recap
The Poisson distribution was first published by Siméon Denis Poisson in 1837. Poisson was a French mathematician, engineer, and physicist; his name is one of the 72 engraved on the Eiffel Tower in Paris. The Poisson distribution is a right-skewed, discrete distribution restricted to the non-negative integers. Its shape is determined by a single rate parameter $\lambda$, which represents the average number of events in a given time interval and is both the mean and the variance of the distribution. As $\lambda$ increases, the distribution looks more and more like a normal distribution; when $\lambda$ is about 10 or greater, a normal distribution is a good approximation.
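
To make the effect of $\lambda$ concrete, the sketch below evaluates the Poisson probability mass function $P(Y=k) = \lambda^k e^{-\lambda}/k!$ using only the Python standard library, and compares it with a normal density of matching mean and variance (both equal to $\lambda$). The helper names (`poisson_pmf`, `normal_pdf`, `max_abs_gap`) and the chosen values of $\lambda$ are illustrative assumptions, not part of the original text.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(Y = k) for Y ~ Poisson(lam), computed in log space for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def max_abs_gap(lam: float, k_max: int = 200) -> float:
    """Largest absolute difference between the Poisson pmf and the
    N(lam, lam) density, evaluated at the integers 0..k_max."""
    return max(
        abs(poisson_pmf(k, lam) - normal_pdf(k, lam, lam))
        for k in range(k_max + 1)
    )

# Illustrative values of lambda (assumed, not from the text).
for lam in (1, 4, 10, 30):
    skewness = 1 / math.sqrt(lam)  # theoretical skewness of Poisson(lam)
    print(f"lambda={lam:>2}  skewness={skewness:.3f}  "
          f"max gap to normal={max_abs_gap(lam):.4f}")
```

The skewness $1/\sqrt{\lambda}$ shrinks as $\lambda$ grows, and the gap between the two curves shrinks with it, which is the sense in which the normal approximation improves.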

### 16.6.2 Why can't we just use Ordinary Linear Regression?

One of the main assumptions of ordinary linear regression (OLR) is that the residual errors follow a normal distribution. For data from a skewed distribution to satisfy this, a transformation must be applied. With discrete data, however, this can be very problematic (making the findings unfeasibly difficult to interpret) or impossible (for example, a large number of zeros can prevent normality from being achieved). Another issue is that OLR can produce negative predicted values, which are impossible for counts. For these reasons it is better to apply a method that reflects the natural distribution of the data than to force the data to fit the method. This is why Poisson regression is generally better suited to count data than OLR.
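
The negative-prediction problem is easy to reproduce. The sketch below fits a straight line by least squares to a small, made-up set of counts that decay towards zero; the data and the helper name `fit_ols` are assumptions chosen purely for illustration.

```python
def fit_ols(x, y):
    """Closed-form simple linear regression: returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical count data: many small values and zeros at the upper end.
x = [0, 1, 2, 3, 4, 5]
y = [12, 6, 3, 1, 0, 0]

intercept, slope = fit_ols(x, y)
fitted = [intercept + slope * xi for xi in x]

for xi, yi, fi in zip(x, y, fitted):
    flag = "  <- impossible for a count" if fi < 0 else ""
    print(f"x={xi}  observed={yi:>2}  OLS fitted={fi:6.2f}{flag}")
```

For these data the fitted line drops below zero within the observed range of `x`, so the model predicts a negative count where the observed count is zero.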

### 16.6.3 Poisson Regression
A GLM for a Poisson-distributed outcome is commonly known as Poisson regression, and is sometimes referred to as a log-linear model.

Say we want to model $\mathbf{Y}$ using Poisson regression, so $\mathbf{Y} \sim \mathrm{Poisson}(\mu)$, and let the mean $\mu$ (and therefore the variance) depend on a vector of covariates $\mathbf{X}$. We take the linear predictor $\mathbf{X}^T\mathbf{\beta}$ and combine it with a link function $g$, chosen so that the mean $\mu$ is always positive even though the linear predictor can take any real value. The link function enters like this:

$E[\mathbf{Y}|\mathbf{X}] = \mu = g^{-1} (\mathbf{X}^T\mathbf{\beta})$

Because the variance of a Poisson distribution equals its mean, the variance of $\mathbf{Y}$ given $\mathbf{X}$ is the same function of the linear predictor:

$Var[\mathbf{Y}|\mathbf{X}] = \mu = g^{-1} (\mathbf{X}^T\mathbf{\beta})$
 
The link function for Poisson regression is the logarithm, thus:

$\ln(E[\mathbf{Y}|\mathbf{X}]) = \ln(\mu) = \mathbf{X}^T\mathbf{\beta}$
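
As a rough illustration of how the log link behaves, the sketch below maps a few covariate vectors to a linear predictor and then to a mean through the exponential, and checks by direct summation of the Poisson probabilities that the mean and variance of the resulting distribution are both equal to $\mu$. The coefficient values, covariate rows, and function names are hypothetical.

```python
import math

def linear_predictor(x, beta):
    """Inner product X^T beta for one observation."""
    return sum(xj * bj for xj, bj in zip(x, beta))

def poisson_mean(x, beta):
    """Inverse log link: mu = exp(X^T beta), positive for any real input."""
    return math.exp(linear_predictor(x, beta))

def pmf_moments(mu, k_max=400):
    """Mean and variance of Poisson(mu), summing the pmf over 0..k_max."""
    probs = [
        math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))
        for k in range(k_max + 1)
    ]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var

# Hypothetical coefficients: intercept followed by two slopes.
beta = [0.5, 0.8, -1.5]
# Hypothetical covariate rows; the leading 1.0 multiplies the intercept.
rows = [[1.0, 0.0, 0.0], [1.0, 2.0, 0.5], [1.0, -3.0, 1.0]]

for x in rows:
    eta = linear_predictor(x, beta)
    mu = poisson_mean(x, beta)
    mean, var = pmf_moments(mu)
    print(f"eta={eta:6.2f}  mu={mu:7.4f}  mean={mean:7.4f}  variance={var:7.4f}")
```

Note that the third row has a negative linear predictor yet still yields a positive mean, which is the purpose of the link function.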

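Here $\mathbf{\beta}$ contains the regression coefficients. Each element of $\mathbf{\beta}$ represents the expected change in the natural logarithm of the mean per unit change in the corresponding explanatory variable in $\mathbf{X}$ (holding the other variables constant). Equivalently, a one-unit increase in that variable multiplies the expected count by $e^{\beta_j}$, where $\beta_j$ is the corresponding coefficient.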
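
A small numerical check may help with interpreting a coefficient. In the sketch below, the intercept `beta0` and slope `beta1` are hypothetical values; the point is only that a one-unit increase in the covariate adds `beta1` to the log of the mean, and therefore multiplies the mean itself by `exp(beta1)`, whatever the starting value of the covariate.

```python
import math

# Hypothetical coefficients for a model with a single covariate.
beta0, beta1 = 0.2, 0.3

def mean_count(x: float) -> float:
    """Expected count under the log link: exp(beta0 + beta1 * x)."""
    return math.exp(beta0 + beta1 * x)

starting_points = (0.0, 1.0, 5.0)

# Change on the log scale for a one-unit increase in x.
log_changes = [
    math.log(mean_count(x + 1)) - math.log(mean_count(x))
    for x in starting_points
]
# Ratio of expected counts for the same one-unit increase.
ratios = [mean_count(x + 1) / mean_count(x) for x in starting_points]

for x, d, r in zip(starting_points, log_changes, ratios):
    print(f"x={x}: change in log mean={d:.3f}, ratio of means={r:.3f}")
```

With `beta1 = 0.3` the ratio is about 1.35 at every starting point, so each extra unit of the covariate raises the expected count by roughly 35%.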

If you wanted to find the expected value of the outcome variable $\mathbf{Y}$ given $\mathbf{X}$ then the equation looks like this:

$E[\mathbf{Y}|\mathbf{X}] = \mu = e^{\mathbf{X}^T\mathbf{\beta}}$,

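where $\mathbf{\beta}$ can be estimated by maximum likelihood. The likelihood equations have no closed-form solution in general, so the estimate is found numerically.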
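
One way to carry out the maximum likelihood step is Newton-Raphson iteration on the Poisson log-likelihood $\ell(\mathbf{\beta}) = \sum_i \left[ y_i \mathbf{x}_i^T\mathbf{\beta} - e^{\mathbf{x}_i^T\mathbf{\beta}} - \ln(y_i!) \right]$. The sketch below does this for an intercept and a single covariate using only the standard library. It is a minimal illustration under stated assumptions rather than a production fitting routine: the data are made up, the function names are hypothetical, and there is no step-size control or handling of degenerate inputs.

```python
import math

def log_likelihood(b0, b1, x, y):
    """Poisson log-likelihood for ln(mu_i) = b0 + b1 * x_i."""
    return sum(
        yi * (b0 + b1 * xi) - math.exp(b0 + b1 * xi) - math.lgamma(yi + 1)
        for xi, yi in zip(x, y)
    )

def fit_poisson(x, y, n_iter=25, tol=1e-10):
    """Newton-Raphson estimate of (b0, b1) for ln(mu_i) = b0 + b1 * x_i."""
    # Start from the intercept-only fit: log of the mean count, zero slope.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: derivative of the log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix (negative Hessian), a symmetric 2x2 matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * step = score.
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + s0, b1 + s1
        if abs(s0) + abs(s1) < tol:
            break
    return b0, b1

# Hypothetical counts that increase with the covariate.
x_obs = [0, 1, 2, 3, 4, 5]
y_obs = [1, 1, 3, 4, 9, 12]

b0_hat, b1_hat = fit_poisson(x_obs, y_obs)
mu_hat = [math.exp(b0_hat + b1_hat * xi) for xi in x_obs]

print(f"intercept={b0_hat:.4f}  slope={b1_hat:.4f}")
print("fitted means:", [round(m, 2) for m in mu_hat])
print(f"log-likelihood at estimate: "
      f"{log_likelihood(b0_hat, b1_hat, x_obs, y_obs):.4f}")
```

At the estimate the score is zero, so the fitted means sum to the observed total count. In practice a statistical package would normally be used for this step; the hand-written loop is only meant to show what "estimated by maximum likelihood" involves.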