# Details of Parameter Estimation

This note is meant to supplement our [paper](https://doi.org/10.1093/gigascience/giaa044). For a review of generalized linear models, I recommend chapter 15.3 of [Applied regression analysis and generalized linear models](https://www.amazon.com/Applied-Regression-Analysis-Generalized-Linear/dp/1452205663/ref=sr_1_2?dchild=1&keywords=Applied+Regression+Analysis+and+Generalized+Linear+Models&qid=1609298891&s=books&sr=1-2) by John Fox, or chapters 3-5 of [An introduction to generalized linear models](https://www.amazon.com/Introduction-Generalized-Chapman-Statistical-Science/dp/1138741515/ref=sr_1_2?crid=18BN4MONNYYJH&dchild=1&keywords=an+introduction+to+generalized+linear+models&qid=1609298924&s=books&sprefix=an+introduction+to+ge%2Cstripbooks%2C222&sr=1-2) by Dobson and Barnett.

## Generalized linear models

In `MendelIHT.jl`, phenotypes $(\bf y)$ are modeled as a [generalized linear model](https://en.wikipedia.org/wiki/Generalized_linear_model):
\begin{aligned}
    \mu_i = E(y_i) = g({\bf x}_i^t {\bf \beta})
\end{aligned}
where ${\bf x}_i$ is sample $i$'s $p$-dimensional vector of *covariates* (genotypes + other fixed effects), $\bf \beta$ is a $p$-dimensional vector of regression coefficients, $g$ is a (possibly non-linear) *inverse-link* function, $y_i$ is sample $i$'s phenotype value, and $\mu_i$ is the *average predicted value* of $y_i$ given ${\bf x}_i$.

The regression coefficients $\bf \beta$ are not observed and are estimated via **maximum likelihood**. The full design matrix $\bf X$ (obtained by stacking each ${\bf x}_i^t$ row-by-row) and phenotypes $\bf y$ are observed. 

GLMs offer a natural way to model common non-continuous phenotypes. For instance, logistic regression for binary phenotypes and Poisson regression for count (non-negative integer) phenotypes are special cases. Of course, when $g(\alpha) = \alpha,$ we get the standard linear model used for Gaussian phenotypes.
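To make the three special cases concrete, here is a minimal sketch of how the inverse link maps a linear predictor to a mean. It is written in plain Python for illustration only and is not code from `MendelIHT.jl`; the function names `inverse_link` and `predict_mean` are hypothetical.

```python
import math

def inverse_link(name, eta):
    """Map a linear predictor eta = x^t beta to the mean mu = g(eta)."""
    if name == "identity":   # Gaussian phenotypes (standard linear model)
        return eta
    if name == "logit":      # binary phenotypes (logistic regression)
        return 1.0 / (1.0 + math.exp(-eta))
    if name == "log":        # count phenotypes (Poisson regression)
        return math.exp(eta)
    raise ValueError(f"unknown link: {name}")

def predict_mean(x, beta, link):
    """Compute mu_i = g(x_i^t beta) for one sample."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))
    return inverse_link(link, eta)
```

For example, with `x = [1.0, 2.0]` and `beta = [0.5, -0.25]` the linear predictor is $0$, so the three models predict means of $0$, $0.5$, and $1$ respectively.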

## Implementation details of loglikelihood, gradient, and expected information

In a GLM, each $y_i$ follows a distribution from the exponential family with density

$$f(y \mid \theta, \phi) = \exp \left[ \frac{y \theta - b(\theta)}{a(\phi)} + c(y, \phi) \right].$$

$\theta$ is called the **canonical (location) parameter**, and under the canonical link it equals the linear predictor: $\theta_i = {\bf x}_i^t {\bf \beta}$. $\phi$ is the **dispersion (scale) parameter**. The functions $a, b, c$ are known functions that vary depending on the distribution of $y$.

Given $n$ independent observations, the loglikelihood is:

\begin{aligned}
    L({\bf \theta}, \phi; {\bf y}) &= \sum_{i=1}^n \left[ \frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \right].
\end{aligned}

To evaluate the loglikelihood, we use the [logpdf](https://juliastats.org/Distributions.jl/latest/univariate/#Distributions.logpdf-Tuple{Distribution{Univariate,S}%20where%20S%3C:ValueSupport,Real}) function in [Distributions.jl](https://github.com/JuliaStats/Distributions.jl).
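As an illustration of the exponential-family form, the sketch below evaluates the Poisson loglikelihood both ways: term by term from $\theta_i = \log \mu_i$, $b(\theta) = e^\theta$, $a(\phi) = 1$, $c(y, \phi) = -\log y!$, and directly from the log density. It is plain Python rather than the package's Julia code, and the function names are hypothetical.

```python
import math

def poisson_logpdf(y, mu):
    """Log density of a Poisson(mu) random variable at the count y."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def poisson_loglikelihood(y, mu):
    """Sum of [y*theta - b(theta)] / a(phi) + c(y, phi) over all samples,
    with theta = log(mu), b(theta) = exp(theta), a(phi) = 1, c = -log(y!)."""
    total = 0.0
    for yi, mui in zip(y, mu):
        theta = math.log(mui)
        total += yi * theta - math.exp(theta) - math.lgamma(yi + 1)
    return total
```

Both routes agree up to floating-point error, which is why summing a library `logpdf` over samples is enough to evaluate $L$.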

To perform maximum likelihood estimation, we compute the partial derivatives of $L$ with respect to each $\beta_j$. The $j$th score component is (eq 4.18 in Dobson):

\begin{aligned}
    \frac{\partial L}{\partial \beta_j} = \sum_{i=1}^n \left[\frac{y_i - \mu_i}{var(y_i)}x_{ij}\left(\frac{\partial \mu_i}{\partial \eta_i}\right)\right].
\end{aligned}

Thus the full **gradient** is

\begin{aligned}
    \nabla L&= {\bf X}^t{\bf W}({\bf y} - {\bf \mu}), \quad {\bf W}_{ii} = \frac{1}{var(y_i)}\left(\frac{\partial \mu_i}{\partial \eta_i}\right),
\end{aligned}

and similarly, the **expected information** is (eq 4.23 in Dobson):

\begin{aligned}
    J = {\bf X^t\tilde{W}X}, \quad {\bf \tilde{W}}_{ii} = \frac{1}{var(y_i)}\left(\frac{\partial \mu_i}{\partial \eta_i}\right)^2
\end{aligned}

To evaluate $\nabla L$ and $J$, note ${\bf y}$ and ${\bf X}$ are known, so we just need to calculate $\mu_i, \frac{\partial\mu_i}{\partial\eta_i},$ and $var(y_i)$. The first simply uses the inverse link: $\mu_i = g({\bf x}_i^t {\bf \beta})$. For the second, note $\frac{\partial \mu_i}{\partial\eta_i} = \frac{\partial g({\bf x}_i^t {\bf \beta})}{\partial{\bf x}_i^t {\bf \beta}}$ is just the derivative of the inverse link function evaluated at the linear predictor $\eta_i = {\bf x}_i^t {\bf \beta}$. This is already implemented for various link functions as [mueta](https://github.com/JuliaStats/GLM.jl/blob/master/src/glmtools.jl#L149) in [GLM.jl](https://github.com/JuliaStats/GLM.jl), which we call internally. To compute $var(y_i)$, we note that the exponential family distributions have variance

\begin{aligned}
    var(y) &= a(\phi)b''(\theta) = a(\phi)\frac{\partial^2b(\theta)}{\partial\theta^2} = a(\phi) var(\mu).
\end{aligned}

That is, $var(y_i)$ is a product of 2 terms where the first depends solely on $\phi$, and the second (the variance function $var(\mu)$) solely on $\mu_i = g({\bf x}_i^t {\bf \beta})$. In our code, we use [glmvar](https://github.com/JuliaStats/GLM.jl/blob/master/src/glmtools.jl#L315) implemented in [GLM.jl](https://github.com/JuliaStats/GLM.jl) to calculate $var(\mu)$. We assume $a(\phi) = 1$ because $\phi$ is unknown, which works well in practice. According to Fox, it is possible to estimate $\phi$ using the method of moments (sec 15.1.1), but that is not implemented.
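The pieces above can be assembled into a small sketch that accumulates the score and the expected information sample by sample, assuming $a(\phi) = 1$. This is plain Python for illustration, not the package's implementation: `mueta_and_var` is a hypothetical stand-in that hard-codes three canonical models and does not call GLM.jl.

```python
import math

def mueta_and_var(link, eta):
    """Return (mu, dmu/deta, var(mu)) for three canonical models, a(phi) = 1."""
    if link == "identity":            # Gaussian
        return eta, 1.0, 1.0
    if link == "logit":               # Bernoulli
        mu = 1.0 / (1.0 + math.exp(-eta))
        return mu, mu * (1.0 - mu), mu * (1.0 - mu)
    if link == "log":                 # Poisson
        mu = math.exp(eta)
        return mu, mu, mu
    raise ValueError(f"unknown link: {link}")

def score_and_information(X, y, beta, link):
    """Compute grad = X^t W (y - mu) and J = X^t W~ X with diagonal weights
    W_ii = (dmu/deta) / var(y_i) and W~_ii = (dmu/deta)^2 / var(y_i)."""
    p = len(beta)
    grad = [0.0] * p
    J = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(X, y):
        eta = sum(xj * bj for xj, bj in zip(xi, beta))
        mu, dmu, v = mueta_and_var(link, eta)
        w = dmu / v
        w_tilde = dmu * dmu / v
        for j in range(p):
            grad[j] += xi[j] * w * (yi - mu)
            for k in range(p):
                J[j][k] += xi[j] * w_tilde * xi[k]
    return grad, J
```

Note that for these canonical links $\frac{\partial \mu_i}{\partial \eta_i} = var(\mu_i)$, so ${\bf W}_{ii} = 1$ and the gradient reduces to ${\bf X}^t({\bf y} - {\bf \mu})$.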

## Iterative hard thresholding

In `MendelIHT.jl`, the loglikelihood is maximized using iterative hard thresholding. This is achieved by

\begin{aligned}
    \beta_{n+1} = \overbrace{P_{S_k}}^{(3)}\big(\beta_n - \underbrace{s_n}_{(2)} \overbrace{\nabla f(\beta_n)}^{(1)}\big)
\end{aligned}

where $f$ is the function to minimize (i.e. the negative loglikelihood, so $\nabla f = -\nabla L$), $s_n$ is the step size, and $P_{S_k}$ is a projection operator that sets all but the $k$ largest entries in magnitude to $0$. I already discussed above how to compute the gradient of a GLM loglikelihood. To perform $P_{S_k}$, we first partially sort the *dense* vector $\beta_n - s_n \nabla f(\beta_n)$ by magnitude, then set the entries ranked $k+1, \ldots, p$ to $0$. Finally, the step size $s_n$ is derived in our paper to be

\begin{align*}
    s_n = \frac{||\nabla f(\beta_n)||_2^2}{\nabla f(\beta_n)^t J(\beta_n) \nabla f(\beta_n)}
\end{align*}

where $J(\beta_n)$ is the **expected information matrix** which I derived for GLMs above.
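A single IHT update can be sketched as follows, taking the gradient of $f$ and the expected information as inputs. This is a plain Python illustration with hypothetical function names, not the package's code; in particular it uses a full sort for brevity where a partial sort would suffice.

```python
def project_top_k(v, k):
    """P_{S_k}: keep the k largest-magnitude entries of v, zero the rest."""
    order = sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)
    keep = set(order[:k])
    return [vj if j in keep else 0.0 for j, vj in enumerate(v)]

def step_size(grad_f, J):
    """s = ||grad_f||_2^2 / (grad_f^t J grad_f)."""
    numerator = sum(g * g for g in grad_f)
    J_times_grad = [sum(row[c] * grad_f[c] for c in range(len(grad_f))) for row in J]
    denominator = sum(g * jg for g, jg in zip(grad_f, J_times_grad))
    return numerator / denominator

def iht_update(beta, grad_f, J, k):
    """One iteration: gradient step on f, then projection onto k-sparse vectors."""
    s = step_size(grad_f, J)
    dense = [b - s * g for b, g in zip(beta, grad_f)]
    return project_top_k(dense, k)
```

Here `grad_f` is the gradient of the negative loglikelihood, so it is the score from the previous section with its sign flipped.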