# Chapter 5 Generalized Linear Models




## 5.1 Logistic regression

To model a binary outcome $Y_i \in \{0,1\}$ with covariates $X_i \in \mathbb{R}^p$, we use 
$$
{\rm logit}(\pi_i)=X_i^{T} \beta,
$$
where $\pi_i = p(Y_i=1|X_i)$ and ${\rm logit}(a) = \log(a/(1-a))$. 
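As a minimal sketch of the model equation, the logit and its inverse can be written in a few lines of Python. The function names `logit`, `inv_logit`, and `predicted_probability` are illustrative choices, not part of the notes.

```python
import math

def logit(a):
    """Log odds of a probability a in (0, 1)."""
    return math.log(a / (1.0 - a))

def inv_logit(eta):
    """Inverse of logit: maps a real-valued linear predictor to a probability."""
    # Branch on the sign of eta so that exp() is only called on non-positive values.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def predicted_probability(x, beta):
    """pi_i = inv_logit(x^T beta) for one covariate vector x."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))
    return inv_logit(eta)
```

Applying `inv_logit` to `logit(a)` recovers `a` up to floating-point error, which is a quick sanity check on the pair.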



**Estimation** can be done by maximum likelihood. The likelihood is 
$$
\mathcal{L} = \prod_{i=1}^n \pi_i^{Y_i} (1-\pi_i)^{1-Y_i}.
$$
Taking the logarithm yields 
$$
\log \mathcal{L} = \sum_{i=1}^n \big[ Y_i \log \pi_i + (1-Y_i) \log (1-\pi_i)\big].
$$
In the intercept-only model ${\rm logit}(\pi_i)=\beta_0$, we can derive that $\pi_i = \exp(\beta_0)/[\exp(\beta_0)+1]$. 
Hence, $\log \pi_i = \beta_0 - \log[\exp(\beta_0)+1]$ and $\log (1-\pi_i) = 0 - \log[\exp(\beta_0)+1]$. 

Plugging these into the log-likelihood, and writing $n_1 = \sum_{i=1}^n Y_i$ for the number of observations with $Y_i=1$, yields 
$$
\log \mathcal{L} = n_1 \beta_0 - n\log[\exp(\beta_0)+1].
$$
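Setting the derivative with respect to $\beta_0$ to zero (the first-order condition) yields 
$$
\frac{n_1}{n} = \frac{\exp(\beta_0)  }{\exp(\beta_0)+1}.
$$
That is, the fitted probability equals the sample proportion, so $\hat{\beta}_0 = {\rm logit}(n_1/n)$. 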
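The intercept-only derivation can be checked numerically. The sketch below uses a small made-up data vector `y` and evaluates the log-likelihood $n_1 \beta_0 - n\log[\exp(\beta_0)+1]$ at the closed-form solution; the function names are illustrative.

```python
import math

def intercept_only_loglik(beta0, y):
    """log L = n1 * beta0 - n * log(exp(beta0) + 1) for binary y."""
    n, n1 = len(y), sum(y)
    return n1 * beta0 - n * math.log(math.exp(beta0) + 1.0)

def intercept_only_mle(y):
    """Closed-form solution of the first-order condition: logit(n1 / n)."""
    n, n1 = len(y), sum(y)
    if n1 == 0 or n1 == n:
        # The likelihood has no finite maximizer when all outcomes are equal.
        raise ValueError("MLE is not finite when all outcomes are equal")
    return math.log(n1 / (n - n1))

y = [1, 0, 0, 1, 1, 0, 1, 1]  # made-up data: n = 8, n1 = 5
beta0_hat = intercept_only_mle(y)
fitted_pi = math.exp(beta0_hat) / (math.exp(beta0_hat) + 1.0)  # equals n1 / n
```

Moving `beta0_hat` slightly in either direction lowers `intercept_only_loglik`, which is consistent with it being the maximizer.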

With covariates $X$, a closed-form solution is generally not available. However, the log-likelihood is concave in $\beta$, so one can find the solution by numerical optimization (for example, Newton-Raphson). 
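One possible optimization scheme is Newton-Raphson, sketched here for the special case of an intercept plus a single covariate so that the $2 \times 2$ linear solve can be written out by hand. This is an illustration under those assumptions, not the method prescribed by the notes; the data `x`, `y` are made up, and the function name is hypothetical.

```python
import math

def fit_logistic_newton(x, y, n_iter=25, tol=1e-10):
    """Newton-Raphson for logit(pi_i) = b0 + b1 * x_i (one covariate)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        u0 = u1 = 0.0           # gradient of the log-likelihood
        i00 = i01 = i11 = 0.0   # negative Hessian (symmetric 2x2)
        for xi, yi in zip(x, y):
            pi = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = pi * (1.0 - pi)
            u0 += yi - pi
            u1 += (yi - pi) * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2x2 system (negative Hessian) * d = gradient.
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (-i01 * u0 + i00 * u1) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1

# Made-up, non-separable data so that a finite maximizer exists.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0_hat, b1_hat = fit_logistic_newton(x, y)
```

If the two outcome classes were perfectly separated by `x`, the estimates would diverge, so the sketch assumes overlapping classes.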


**Interpretation** 

A key concept for logistic regression is the log odds ratio, which gives the interpretation of $\beta$. For $\beta_1$, holding the other covariates fixed, we can see that 
$$
\beta_1= \log \left\{ \frac{ \pi(X_{1}+1,\cdots )/[1-\pi(X_{1}+1,\cdots )]}{\pi(X_{1},\cdots )/[1-\pi(X_{1},\cdots )]}    \right\}.
$$
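The identity above can be verified numerically. The following sketch assumes a model with two covariates and hypothetical coefficients `beta`; it computes the log odds ratio for a one-unit increase in the first covariate at arbitrary starting values.

```python
import math

def prob(x1, x2, beta):
    """pi(x1, x2) under logit(pi) = beta[0] + beta[1]*x1 + beta[2]*x2."""
    eta = beta[0] + beta[1] * x1 + beta[2] * x2
    return 1.0 / (1.0 + math.exp(-eta))

def log_odds_ratio(x1, x2, beta):
    """Log odds ratio for a one-unit increase in x1 with x2 held fixed."""
    p_hi = prob(x1 + 1.0, x2, beta)
    p_lo = prob(x1, x2, beta)
    return math.log((p_hi / (1.0 - p_hi)) / (p_lo / (1.0 - p_lo)))

beta = [-1.0, 0.7, 0.3]  # hypothetical coefficients
value_at_origin = log_odds_ratio(0.0, 0.0, beta)
value_elsewhere = log_odds_ratio(2.5, -1.0, beta)
```

Both values agree with `beta[1]` up to floating-point error, regardless of the starting point, which is what makes $\beta_1$ interpretable as a log odds ratio.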

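The score function takes the form $U(\beta)= \partial \log \mathcal{L}/\partial \beta$. The MLE $\hat{\beta}$ satisfies $U(\hat{\beta})=0$. The Fisher information is $I(\beta)=\mathbb{E} \big[-\partial^2 \log \mathcal{L}/\partial \beta \partial \beta^{T} \big]$, and the estimated standard errors $\hat{\rm se}(\hat{\beta}_j)$ are the square roots of the diagonal entries of $I(\hat{\beta})^{-1}$. The large-sample $1-\alpha$ confidence interval for each $\beta_j$ is $\hat{\beta}_j \pm z_{1-\alpha/2} \hat{\rm se}(\hat{\beta}_j)$, where $z_{1-\alpha/2}$ is the $1-\alpha/2$ quantile of $N(0,1)$. 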
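To make the interval concrete, the sketch below works in the intercept-only model of the estimation paragraph, where the Fisher information is $n\pi(1-\pi)$ and hence the standard error of $\hat{\beta}_0$ is $1/\sqrt{n\hat{\pi}(1-\hat{\pi})}$. The counts `n1 = 30` and `n = 100` are made up, and the function names are illustrative.

```python
import math
from statistics import NormalDist

def wald_ci(estimate, se, alpha=0.05):
    """Large-sample interval: estimate +/- z_{1 - alpha/2} * se."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return estimate - z * se, estimate + z * se

def intercept_only_se(n1, n):
    """se(beta0_hat) = 1 / sqrt(I(beta0_hat)) with I = n * pi_hat * (1 - pi_hat)."""
    pi_hat = n1 / n
    return 1.0 / math.sqrt(n * pi_hat * (1.0 - pi_hat))

n1, n = 30, 100                      # made-up counts
beta0_hat = math.log(n1 / (n - n1))  # logit(n1 / n)
se = intercept_only_se(n1, n)
lo, hi = wald_ci(beta0_hat, se)
```

The interval is on the log-odds scale; applying the inverse logit to both endpoints gives an interval for $\pi$.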



**Hypothesis testing**

Consider the null and alternative hypotheses $H_0: \beta_1=0$ vs. $H_1: \beta_1 \neq 0$. There are three tests for this task. Note that all three tests are the same in linear regression, whereas they differ in the presence of nonlinearity. 

1. Likelihood ratio test. ${\rm LR}=-2 \big[ \log \mathcal{L}({\rm reduced}) -\log \mathcal{L}({\rm full}) \big]$, where the full model has $K$ parameters and the reduced model has $k$. Under $H_0$ and for large $n$, ${\rm LR}$ is approximately $\chi^2_{K-k}$ distributed. 

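2. Score test. $S=U(\tilde{\beta})^{T} I(\tilde{\beta})^{-1} U(\tilde{\beta})$, where $\tilde{\beta}$ is the MLE under $H_0$; for a single scalar parameter this reduces to $U^2/I$. Under $H_0$ and for large $n$, $S$ is approximately $\chi^2$ distributed with ${\rm d.f.}=1$.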

3. Wald test. For $K-k=1$, we have $W=\hat{\beta}_1/{\rm se}(\hat{\beta}_1)$, which is approximately $N(0,1)$ under $H_0$ and for large $n$. A robust Wald test uses the same statistic with a robust standard error in place of the model-based one. 
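The three statistics can be compared in a setting where everything is available in closed form. The sketch below is an illustration only: it uses the intercept-only model and tests $H_0: \beta_0 = 0$ (that is, $\pi = 1/2$) rather than a slope, with made-up counts. The $\chi^2_1$ tail probability is computed through the identity $P(\chi^2_1 > s) = {\rm erfc}(\sqrt{s/2})$.

```python
import math

def chi2_1_pvalue(stat):
    """Upper-tail probability of a chi-square variable with 1 d.f."""
    return math.erfc(math.sqrt(stat / 2.0))

def three_tests_intercept_only(n1, n):
    """LR, score, and squared Wald statistics for H0: beta0 = 0 (pi = 1/2)."""
    def loglik(b):
        return n1 * b - n * math.log(math.exp(b) + 1.0)

    pi_hat = n1 / n
    beta0_hat = math.log(pi_hat / (1.0 - pi_hat))

    # Likelihood ratio: reduced model fixes beta0 = 0, full model uses the MLE.
    lr = -2.0 * (loglik(0.0) - loglik(beta0_hat))
    # Score test: score and information are both evaluated under H0.
    u_null = n1 - n * 0.5
    i_null = n * 0.25
    score = u_null ** 2 / i_null
    # Wald test: estimate and standard error are both evaluated at the MLE.
    se = 1.0 / math.sqrt(n * pi_hat * (1.0 - pi_hat))
    wald_sq = (beta0_hat / se) ** 2
    return {"LR": lr, "score": score, "Wald^2": wald_sq}

stats = three_tests_intercept_only(n1=60, n=100)  # made-up counts
pvalues = {name: chi2_1_pvalue(s) for name, s in stats.items()}
```

With these counts the three statistics are close but not identical, which illustrates how the tests can differ in a nonlinear model while leading to similar conclusions.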



**Visualization**

Pearson residual: $r_{p_i}=(Y_i-\hat{\pi}_i)/[\hat{\pi}_i(1-\hat{\pi}_i) ]^{1/2}$.

Deviance residual: ${\rm dev}_i = {\rm sign}(Y_i-\hat{\pi}_i) \sqrt{ -2[Y_i \log \hat{\pi}_i +(1-Y_i)\log  (1-\hat{\pi}_i) ]}$.

One can define $H^2 = \sum_i r_{p_i}^2$ and $G^2 = \sum_i {\rm dev}_i^2$, which play a role similar to ${\rm SSE}$ in linear regression. With binary outcomes they do not follow a $\chi^2$ distribution, but they remain useful as descriptive measures of fit. 
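The two residuals and their sums of squares translate directly into code. In the sketch below the outcomes `y` and fitted probabilities `pi_hat` are made-up values standing in for the output of a fitted model.

```python
import math

def pearson_residual(y, pi_hat):
    """(y - pi_hat) / sqrt(pi_hat * (1 - pi_hat))."""
    return (y - pi_hat) / math.sqrt(pi_hat * (1.0 - pi_hat))

def deviance_residual(y, pi_hat):
    """sign(y - pi_hat) * sqrt(-2 * log-likelihood contribution)."""
    # For binary y only one of the two log terms is non-zero.
    loglik_i = math.log(pi_hat) if y == 1 else math.log(1.0 - pi_hat)
    sign = 1.0 if y > pi_hat else -1.0
    return sign * math.sqrt(-2.0 * loglik_i)

y = [1, 0, 1, 0]                # made-up outcomes
pi_hat = [0.8, 0.3, 0.4, 0.6]   # made-up fitted probabilities

H2 = sum(pearson_residual(yi, pi) ** 2 for yi, pi in zip(y, pi_hat))
G2 = sum(deviance_residual(yi, pi) ** 2 for yi, pi in zip(y, pi_hat))
```

By construction `G2` equals $-2$ times the log-likelihood evaluated at the fitted probabilities.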



## 5.2 Generalized linear model 

Logistic regression is a special case of the generalized linear model (GLM). 

Suppose that the outcome $y$ follows a distribution that belongs to the exponential family 
$$
p(y | \theta, \tau)= h(y,\tau) \exp\left\{ \frac{b(\theta) T(y)-A(\theta)  }{d(\tau)} \right\}.
$$
We usually use the canonical form, in which $b(\theta)=\theta$, 
$$
p(y | \theta, \tau)= h(y,\tau) \exp\left\{ \frac{\theta T(y)-A(\theta)  }{d(\tau)} \right\}.
$$
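As a worked example, the Bernoulli distribution with success probability $\pi$ can be rewritten in this form:
$$
p(y | \pi) = \pi^{y}(1-\pi)^{1-y} = \exp\left\{ y \log\frac{\pi}{1-\pi} + \log(1-\pi) \right\}.
$$
Matching terms gives $\theta = {\rm logit}(\pi)$, $T(y)=y$, $A(\theta) = \log[\exp(\theta)+1]$, $h(y,\tau)=1$, and $d(\tau)=1$. The canonical parameter is the log odds, which is why the logit appears in logistic regression.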

For regular use, we specify a GLM in the following manner. 

1. Pick a distribution from the exponential family.
2. Set the linear predictor $\eta = X\beta$.
3. Pick a link function $g$ such that 
$$
\mathbb{E}[Y|X]=\mu = g^{-1}(X\beta).
$$

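Note that steps (1) and (3) are often tied: each distribution has a canonical link, such as the logit link for the Bernoulli distribution used in logistic regression. 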
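The three-step recipe can be sketched as a small lookup from distribution to inverse link. The pairings below (logit for Bernoulli, log for Poisson, identity for Gaussian) are the standard canonical links; the dictionary `INVERSE_LINKS` and the function `glm_mean` are hypothetical names used only for illustration.

```python
import math

# Step 1 and step 3 together: each family is paired with its canonical inverse link.
INVERSE_LINKS = {
    "bernoulli": lambda eta: 1.0 / (1.0 + math.exp(-eta)),  # logit link
    "poisson": lambda eta: math.exp(eta),                   # log link
    "gaussian": lambda eta: eta,                            # identity link
}

def glm_mean(x, beta, family):
    """Return mu = g^{-1}(x^T beta) for one covariate vector x."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))  # step 2: linear predictor
    return INVERSE_LINKS[family](eta)              # step 3: apply inverse link

x = [1.0, 2.0]         # made-up covariates (first entry acts as the intercept)
beta = [0.5, 0.25]     # hypothetical coefficients, so eta = 1.0
means = {family: glm_mean(x, beta, family) for family in INVERSE_LINKS}
```

The same linear predictor yields a mean in $(0,1)$, in $(0,\infty)$, or on the whole real line depending on the family, which is the practical reason the distribution and the link are chosen together.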
