# GLMs

## What are we talking about?

A generalized linear model (GLM) is a linear model ($\eta = x^\top \beta$) wrapped in a transformation (link function) and equipped with a response distribution from an exponential family. The choice of link function and response distribution is very flexible. In a GLM, a predictive distribution for the response variable $Y$ is associated with a vector of observed predictors $x$.  The distribution has the form:

\begin{align*}
  p(y \, |\, x)
&=
  m(y, \phi) \exp\left(\frac{\theta\, T(y) - A(\theta)}{\phi}\right)
\\
  \theta
&:=
  h(\eta)
\\
  \eta
&:=
  x^\top \beta
\end{align*}

Here $\beta$ are the parameters, $\phi$ a parameter representing dispersion ("variance"), and $m$, $h$, $T$, $A$ are characterized by the user-specified model family.

The mean of $Y$ depends on $x$ by composition of **linear response** $\eta$ and (inverse) link function, i.e.:

$$
\mu := g^{-1}(\eta)
$$

where $g$ is the so-called **link function**.  
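
As a minimal sketch of this composition, the snippet below computes $\mu = g^{-1}(x^\top \beta)$ for a single observation. The `LINKS` table and the `predict_mean` helper are hypothetical names introduced here for illustration only; they are not taken from any of the packages discussed later.

```python
import math

# Each entry maps a link name to the pair (g, g_inverse).
# This table is an illustrative assumption, not a library API.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (
        lambda mu: math.log(mu / (1.0 - mu)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    ),
}


def predict_mean(x, beta, link="log"):
    """Return mu = g^{-1}(eta) with eta = x^T beta for one observation."""
    eta = sum(x_j * b_j for x_j, b_j in zip(x, beta))
    _, inverse_link = LINKS[link]
    return inverse_link(eta)
```

With a log link, for example, `predict_mean([1.0, 2.0], [0.5, 0.25])` evaluates $\exp(0.5 + 0.5)$, so predictions stay positive regardless of the sign of $\eta$.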

We usually work with a Tweedie distribution with variance power $p$ (i.e. $\mathrm{Var}(Y) = \phi \mu^p$), which is a generalization of Poisson ($p=1$) and Gamma ($p=2$):

\begin{align*}
\theta &= \left\{\begin{array}{ll}
\frac{\mu^{1-p}}{1-p} & \text{if } p \neq 1\\
\log \mu              & \text{if } p = 1
\end{array}\right.,\\
T(y) &= y, \\
A(\theta) &= \left\{\begin{array}{ll}
\frac{\mu^{2-p}}{2-p} & \text{if } p \neq 2\\
\log \mu              & \text{if } p = 2
\end{array}\right.
\end{align*}
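
The two piecewise definitions can be written down directly, which makes it easy to check the Poisson and Gamma special cases. The sketch below is illustrative only: the function names are hypothetical, and the `unit_deviance` helper assumes $y > 0$ and $\mu > 0$ so that $\theta$ and $A$ can be evaluated at $y$ as well.

```python
import math


def tweedie_theta(mu, p):
    """Natural parameter theta as a function of the mean mu."""
    if p == 1:
        return math.log(mu)
    return mu ** (1 - p) / (1 - p)


def tweedie_A(mu, p):
    """Log-partition function A(theta), expressed through the mean mu."""
    if p == 2:
        return math.log(mu)
    return mu ** (2 - p) / (2 - p)


def unit_deviance(y, mu, p):
    """Unit deviance 2 * [y * (theta(y) - theta(mu)) - (A(y) - A(mu))].

    Assumes y > 0 and mu > 0 (a simplification for this sketch).
    """
    theta_diff = tweedie_theta(y, p) - tweedie_theta(mu, p)
    a_diff = tweedie_A(y, p) - tweedie_A(mu, p)
    return 2.0 * (y * theta_diff - a_diff)
```

For $p=1$ this reduces to the Poisson deviance $2\,(y \log(y/\mu) - (y - \mu))$, and for $p=2$ to the Gamma deviance $2\,(y/\mu - 1 - \log(y/\mu))$.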

To fit a GLM, we minimize a loss $\mathcal L$, the negative log-likelihood (or, more typically, the deviance), plus an elastic net penalty:

$$
\min_{\beta} \mathcal L + \alpha \rho ||\beta||_1  + \frac{\alpha (1-\rho)}{2} ||\beta||_2^2
$$

Here $\alpha \ge 0$ sets the overall penalty strength and $\rho \in [0, 1]$ the share of the L1 (Lasso) term. With no Lasso penalty (i.e. $\rho=0$), these problems are usually estimated using Iteratively Reweighted Least Squares (IRLS). With a Lasso penalty, some version of coordinate descent is used for the optimization. Other optimization approaches exist (e.g. general-purpose convex solvers via cvxpy), but we haven't seen these used in large-scale applications.
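
To make the IRLS idea concrete, here is a minimal sketch for the simplest case: a Poisson GLM with log link and no penalty at all ($\alpha = 0$), without sample weights or offsets. The function name `irls_poisson` and its defaults are assumptions for illustration; a production solver would add step-size control, regularization, and better starting values.

```python
import numpy as np


def irls_poisson(X, y, n_iter=50, tol=1e-10):
    """Fit an unpenalized Poisson GLM with log link by IRLS.

    Each iteration solves a weighted least-squares problem on the
    working response z with working weights w.
    """
    n_features = X.shape[1]
    beta = np.zeros(n_features)
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        # For the log link, d(eta)/d(mu) = 1 / mu, so the working
        # response is eta + (y - mu) / mu and the working weights are mu.
        z = eta + (y - mu) / mu
        w = mu
        XtW = X.T * w  # scales column i of X.T by w[i], i.e. X^T W
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

For an intercept-only design the fitted coefficient is the log of the sample mean of `y`, which gives a quick sanity check of the update.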

## Features

For our use case in insurance, the critical features a GLM implementation needs in order to be useful for us are:
- Need to support Gamma, Poisson
- Need to support sample weights 
- Need to support offsets (an offset is a variable for which we force the parameter to equal 1)
- Need to work on large-ish data sets (25 million rows, up to several hundred columns)
- Can deal with high-dimensional categorical variables (either through sparse matrices or a dedicated solution)
- Need to be otherwise sensible (e.g. do not regularize the intercept, etc)
- Decent performance in terms of CPU and memory (a commercial competitor without regularization needs a couple of minutes on a laptop)
- Reliably convergent algorithms
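
To illustrate how sample weights and offsets enter the model, the sketch below evaluates a Poisson model with log link. It assumes, purely for illustration, that the offset is the log of an exposure measure (a common choice for claim frequency models) and that the loss is the weighted mean of the unit deviances; the function names are hypothetical.

```python
import numpy as np


def poisson_mean(X, beta, offset=None):
    """Return mu = exp(X beta + offset); the offset has a fixed coefficient of 1."""
    eta = X @ beta
    if offset is not None:
        eta = eta + offset
    return np.exp(eta)


def weighted_poisson_deviance(y, mu, sample_weight=None):
    """Weighted mean of Poisson unit deviances 2 * (y log(y/mu) - (y - mu))."""
    if sample_weight is None:
        sample_weight = np.ones_like(mu)
    # Use the convention 0 * log(0) = 0 for observations with y == 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        y_log_ratio = np.where(y > 0, y * np.log(y / mu), 0.0)
    unit_dev = 2.0 * (y_log_ratio - (y - mu))
    return np.sum(sample_weight * unit_dev) / np.sum(sample_weight)
```

With `offset=np.log(exposure)`, doubling the exposure of a row doubles its predicted mean without touching $\beta$, which is exactly the "parameter forced to 1" behaviour described above.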

## Nice to have features

- Support Tweedie
- Ability to specify custom weights for parameters for the L1 penalty
- Ability to specify custom weight matrix for parameters for the L2 penalty (generalized Tikhonov regularization)
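
One possible reading of these two generalizations, written as a sketch: the L1 term becomes $\alpha \rho \sum_j w_j |\beta_j|$ with per-coefficient weights $w_j$, and the L2 term becomes $\frac{\alpha (1-\rho)}{2} \beta^\top P \beta$ with a positive semi-definite matrix $P$. The names `l1_weights` and `P2` are assumptions made for this example; with all weights equal to 1 and $P$ the identity, the expression reduces to the plain elastic net penalty.

```python
import numpy as np


def elastic_net_penalty(beta, alpha, rho, l1_weights=None, P2=None):
    """Elastic net penalty with optional per-coefficient L1 weights
    and an optional Tikhonov matrix P2 for the L2 term."""
    n_features = beta.shape[0]
    if l1_weights is None:
        l1_weights = np.ones(n_features)  # plain Lasso term
    if P2 is None:
        P2 = np.eye(n_features)  # plain ridge term
    l1_term = alpha * rho * np.sum(l1_weights * np.abs(beta))
    l2_term = 0.5 * alpha * (1.0 - rho) * (beta @ P2 @ beta)
    return l1_term + l2_term
```

Setting a weight (and the matching row and column of `P2`) to zero leaves that coefficient unpenalized, which is one way to exclude an intercept column from regularization.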

## Which packages/implementations exist?

- glmnet (R-Package, Fortran implementation, GPL-2, no Tweedie, no gamma)
https://glmnet.stanford.edu/

- pyglmnet (python only, no support of sample weights, no Tweedie, otherwise looks fairly feature-rich, under active development) https://github.com/glm-tools/pyglmnet

- python-glmnet (python bindings to glmnet's Fortran code, not actively developed) https://github.com/civisanalytics/python-glmnet

- glmnet_python (used at Wayfair, GPL licensed) https://github.com/bbalasub1/glmnet_python

- scikit-learn fork (python only, has all the features we want, poor performance with sparse matrices, convergence trouble with lasso) https://github.com/scikit-learn/scikit-learn/pull/9405

- scikit-learn master (python only, only ridge, poor performance with sparse matrices) https://github.com/scikit-learn/scikit-learn/pull/14300

- h2o (Java, python bindings, data needs to be copied from Python to Java all the time, convergence problems) http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/glm.html

- TensorFlow Probability (C++, python bindings, no sample weights for Lasso) https://www.tensorflow.org/probability/api_docs/python/tfp/glm

- statsmodels (python only, very poor performance, lots of issues getting it to work) https://www.statsmodels.org/stable/glm.html

## Other potentially useful reading material

- [TensorFlow GLM Tutorial](https://colab.research.google.com/github/tensorflow/probability/blob/master/tensorflow_probability/examples/jupyter_notebooks/Generalized_Linear_Models.ipynb)
- [Sklearn User guide](https://scikit-learn.org/dev/modules/linear_model.html#generalized-linear-regression)
- [Tutorial on Insurance Claims Modelling with sklearn](https://github.com/lorentzenchr/Tutorial_freMTPL2/blob/master/glm_freMTPL2_example.ipynb)
- [pyglmnet Tutorial](http://glm-tools.github.io/pyglmnet/tutorial.html)
- [Improved GLMNET](https://www.csie.ntu.edu.tw/~cjlin/papers/l1_glmnet/long-glmnet.pdf)
- [h2o Documentation on GLMs (fairly extensive)](http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/glm.html)