# Measures of Model Fit
- Log-Likelihood
- Deviance

## Log-Likelihood

- A fitted GLM produces a mean for each record and a dispersion parameter. Together with the selected distribution, these give the probability (or density) of observing each record's actual outcome.
    - Multiply the probabilities of the observed outcomes across all records $\rightarrow$ likelihood.
        - A GLM is fit by finding the parameters that maximize the likelihood.
        - The log of the likelihood is used in practice, since the product of many probabilities is an extremely small number.
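
As a rough illustration of the steps above, the following sketch computes a likelihood and log-likelihood, assuming a Poisson distribution. The observed counts and fitted means are made-up values, and the function names are hypothetical.

```python
import math

def poisson_log_pmf(y, mu):
    """Log of the Poisson probability of observing count y when the mean is mu (mu > 0)."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def log_likelihood(ys, mus):
    """Sum of per-record log-probabilities, i.e. the log of their product."""
    return sum(poisson_log_pmf(y, mu) for y, mu in zip(ys, mus))

# Hypothetical observed counts and fitted means (not from any real model).
observed = [0, 1, 3, 2]
fitted = [0.5, 1.2, 2.5, 1.8]

ll = log_likelihood(observed, fitted)
likelihood = math.exp(ll)  # the product of the four probabilities

print(f"log-likelihood = {ll:.4f}, likelihood = {likelihood:.6f}")
```

Even with four records the likelihood is already small; with thousands of records it would underflow, which is why the sum of logs is used instead.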




- Log-likelihood scale:
    - <b>Null Model:</b>  No predictors. Only intercept term.
    - <b>Saturated Model:</b> One parameter for each record, so every record is fit exactly.<br><br>

$$\require{AMScd}
\begin{CD}
    \text{Null Model} @<worse<< \text{Model} @>better>> \text{Saturated Model}
\end{CD}$$


## Deviance
<br><br>
$$ \text{Scaled Deviance = } 2 \cdot (ll_{saturated} - ll_{model})$$
<br>
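- Can be thought of as how far the model is from the "perfect" (saturated) model.
- The fitted GLM coefficients minimize the deviance, which is equivalent to maximizing the log-likelihood.
- Adding predictors never increases the deviance, since the extra parameters give the model more freedom to fit the data.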
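
A minimal sketch of the deviance formula, again assuming a Poisson distribution (where $\phi = 1$, so scaled and unscaled deviance coincide). The data and the "model" fitted means are hypothetical and are not the output of an actual fit. The null model's fitted mean is the overall average, and the saturated model's fitted mean for each record is the observation itself.

```python
import math

def poisson_log_pmf(y, mu):
    """Log Poisson probability; the y == 0 branch also avoids log(0) when mu == 0."""
    if y == 0:
        return -mu
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def scaled_deviance(ys, mus):
    """2 * (ll_saturated - ll_model), with the saturated model predicting mu_i = y_i."""
    ll_model = sum(poisson_log_pmf(y, mu) for y, mu in zip(ys, mus))
    ll_saturated = sum(poisson_log_pmf(y, y) for y in ys)
    return 2.0 * (ll_saturated - ll_model)

observed = [0, 1, 3, 2, 4]                       # hypothetical counts
null_fit = [sum(observed) / len(observed)] * 5   # intercept only: overall mean
model_fit = [0.4, 1.1, 2.8, 2.0, 3.7]            # hypothetical fitted means

dev_null = scaled_deviance(observed, null_fit)
dev_model = scaled_deviance(observed, model_fit)
dev_saturated = scaled_deviance(observed, observed)

print(dev_null, dev_model, dev_saturated)
```

The three values line up on the log-likelihood scale: the null model has the largest deviance, the saturated model has zero, and any other model falls in between.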

## Limitations on the Use of Log-Likelihood and Deviance

1. Model comparison is only valid when the models are fit to identical datasets.
    - If different datasets are used, the log-likelihoods are sums over different records and so are not comparable.
    - Adding a variable can cause rows to be dropped (for example, records with missing values for that variable), which invalidates the comparison.<br><br>
2. The assumed distribution and the dispersion parameter must be identical.  

# Comparing Models

- Nested models
    - F-Test<br><br>
- Non-nested models 
    - AIC
    - BIC
    - Deviance Residuals

## F-Test

- Only valid if one model is a subset of the other (i.e. the second model is the first after adding or removing variable(s)).
- <b>Question answered by F-Test:</b> Did the added predictors significantly reduce the deviance?<br><br>

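$$F = \frac{\text{UD}_{Small}-\text{UD}_{Big}}{(\text{number of added parameters}) \cdot \hat{\phi}_{Big}}$$<br><br>

Here UD is the unscaled deviance. The statistic is compared against an F-distribution with two separate degrees-of-freedom values (not a ratio):

$$\begin{align}
\text{Numerator DoF} & = \text{number of added parameters} \\ \\
\text{Denominator DoF} & = \text{number of records} - \text{number of parameters}_{Big}
\end{align}$$<br><br>

$\hat{\phi}_{Big}$ is a good estimate of the amount by which we can expect the deviance to fall for each parameter with no predictive power that is added to the model.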

<center><img src = 'images/Deviance_F.JPG'></center>
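
The F-test arithmetic can be sketched as follows. All inputs are hypothetical numbers chosen for illustration, the function name is an assumption, and the dispersion estimate $\hat{\phi}_{Big}$ is taken as given rather than estimated here.

```python
def deviance_f_test(ud_small, ud_big, n_added_params, phi_hat_big,
                    n_records, n_params_big):
    """Return (F statistic, numerator DoF, denominator DoF) for two nested GLMs."""
    f_stat = (ud_small - ud_big) / (n_added_params * phi_hat_big)
    dof_numerator = n_added_params
    dof_denominator = n_records - n_params_big
    return f_stat, dof_numerator, dof_denominator

# Hypothetical unscaled deviances and model sizes.
f_stat, dof_num, dof_den = deviance_f_test(
    ud_small=1250.0,     # unscaled deviance of the smaller model
    ud_big=1210.0,       # unscaled deviance of the bigger model
    n_added_params=2,
    phi_hat_big=1.25,    # dispersion estimate from the bigger model
    n_records=1000,
    n_params_big=12,
)

print(f_stat, dof_num, dof_den)
```

The resulting statistic would then be compared with the F-distribution on those degrees of freedom, for instance with `scipy.stats.f.sf(f_stat, dof_num, dof_den)` if SciPy is available.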

## AIC and BIC

$$ \begin{align}
AIC & = -2 \cdot \text{log-likelihood} + 2p \\ \\
BIC & = -2 \cdot \text{log-likelihood} + p \cdot log(n)
\end{align}$$<br><br>

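- Here $p$ is the number of parameters and $n$ is the number of records.
- Smaller values are better: each measure is the fit term ($-2 \cdot$ log-likelihood) plus a penalty for the number of parameters, so a lower value means a better trade-off between fit and complexity.
- The authors prefer AIC, since the $\log(n)$ factor makes BIC's penalty grow with the size of the dataset, which tends to overpenalize additional parameters.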
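
A short sketch of both formulas applied to two candidate models. The log-likelihoods, parameter counts, and record count are hypothetical values picked so that the two criteria disagree.

```python
import math

def aic(log_likelihood, n_params):
    """AIC = -2 * log-likelihood + 2p."""
    return -2.0 * log_likelihood + 2.0 * n_params

def bic(log_likelihood, n_params, n_records):
    """BIC = -2 * log-likelihood + p * log(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_records)

n_records = 10_000
# Hypothetical (log-likelihood, number of parameters) for each candidate.
candidates = {"small": (-5120.0, 8), "big": (-5105.0, 20)}

scores = {
    name: (aic(ll, p), bic(ll, p, n_records))
    for name, (ll, p) in candidates.items()
}

for name, (a, b) in scores.items():
    print(f"{name}: AIC = {a:.1f}, BIC = {b:.1f}")
```

With these inputs AIC favors the bigger model while BIC favors the smaller one, because each extra parameter costs $\log(10000) \approx 9.2$ under BIC versus 2 under AIC.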

## Deviance Residuals
<br><br>
$$\begin{align}
\text{Deviance Residual} & = \sqrt{2\phi \cdot \big(\ln f(y_i|\mu_i = y_i) - \ln f(y_i|\mu_i)\big)} \cdot sign(y_i-\mu_i) \\ \\
& = \sqrt{\text{Record's contribution to the unscaled deviance}}  \cdot sign(y_i-\mu_i)
\end{align}$$<br><br>

- Can be thought of as the raw residual adjusted for the shape of the assumed GLM distribution $\rightarrow$ if the model is appropriate, the deviance residuals should be approximately normally distributed.
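
As an illustration, the sketch below computes deviance residuals under an assumed Poisson distribution, where $\phi = 1$ and each record's contribution to the deviance works out to $2\big(y_i \ln(y_i/\mu_i) - (y_i - \mu_i)\big)$. The observed counts and fitted means are hypothetical.

```python
import math

def poisson_deviance_residual(y, mu):
    """Signed square root of one record's contribution to the Poisson deviance."""
    log_term = y * math.log(y / mu) if y > 0 else 0.0  # y*ln(y/mu) -> 0 when y == 0
    unit_deviance = 2.0 * (log_term - (y - mu))
    sign = 1.0 if y > mu else (-1.0 if y < mu else 0.0)
    # max() guards against a tiny negative value caused by rounding.
    return sign * math.sqrt(max(unit_deviance, 0.0))

observed = [0, 1, 3, 2, 4]             # hypothetical counts
fitted = [0.4, 1.1, 2.8, 2.0, 3.7]     # hypothetical fitted means

residuals = [poisson_deviance_residual(y, mu) for y, mu in zip(observed, fitted)]
total_deviance = sum(r * r for r in residuals)

print([round(r, 4) for r in residuals], round(total_deviance, 4))
```

Squaring and summing the residuals recovers the model's total deviance, which is why each one can be read as a record's signed share of the misfit.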

#### Properties of deviance residuals

1. They follow no predictable pattern.
2. They are normally distributed with constant variance.
    - Constant variance/<b>homoscedasticity</b> - The spread of the deviance residuals does not change as $\mu$ increases.

#### Ways of assessing normality of deviance residuals

- Create a histogram of the deviance residuals and overlay a fitted normal curve.
- Create a Q-Q plot. The points should fall on the y = x line.

<center><img src='images/Dev_Resid.JPG'></center><br><br>

- The histogram appears more right-skewed than the normal distribution.
- The Q-Q plot shows that the sample has more values in the left and right tails than a normal distribution would predict.
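
The points behind a normal Q-Q plot can be computed by hand, as in this sketch. The residual values are hypothetical, and the plotting positions $(i + 0.5)/n$ are one common convention among several.

```python
from statistics import NormalDist, mean, stdev

def qq_points(residuals):
    """Pair each standardized, sorted residual with its theoretical normal quantile."""
    n = len(residuals)
    center, spread = mean(residuals), stdev(residuals)
    sample_quantiles = sorted((r - center) / spread for r in residuals)
    standard_normal = NormalDist()
    theoretical_quantiles = [standard_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical_quantiles, sample_quantiles))

# Hypothetical deviance residuals.
residuals = [-1.3, -0.6, -0.4, -0.1, 0.2, 0.3, 0.9, 2.4]
points = qq_points(residuals)

for theoretical, sample in points:
    print(f"{theoretical:7.3f} {sample:7.3f}")
```

If the residuals were normal, the two columns would be close to each other; large gaps at the first or last rows correspond to points peeling away from the line in the tails.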

#### Discrete distributions and deviance residuals

- Deviance residuals for discrete distributions do not follow a normal distribution.
    - Deviance residuals do not adjust for discreteness.
        - You end up with cluster of values.<br><br>
        
<b>Solution</b><br>
- Use randomized quantile residuals, which add random jitter to the discrete points so that they are spread smoothly over the distribution.
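
A minimal sketch of one standard construction of randomized quantile residuals, assuming a Poisson distribution: for each record, draw a uniform value between the fitted CDF just below the observation and the CDF at the observation, then map it through the inverse normal CDF. The data, fitted means, and seed are hypothetical.

```python
import math
import random
from statistics import NormalDist

def poisson_cdf(k, mu):
    """P(Y <= k) for a Poisson variable with mean mu > 0; zero when k < 0."""
    if k < 0:
        return 0.0
    return sum(math.exp(i * math.log(mu) - mu - math.lgamma(i + 1)) for i in range(k + 1))

def randomized_quantile_residual(y, mu, rng):
    """Jitter uniformly within [F(y - 1), F(y)], then convert to a normal quantile."""
    lower = poisson_cdf(y - 1, mu)
    upper = poisson_cdf(y, mu)
    u = rng.uniform(lower, upper)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep strictly inside (0, 1) for inv_cdf
    return NormalDist().inv_cdf(u)

rng = random.Random(0)                 # fixed seed so the jitter is reproducible
observed = [0, 1, 3, 2, 4]             # hypothetical counts
fitted = [0.4, 1.1, 2.8, 2.0, 3.7]     # hypothetical fitted means

rq_residuals = [randomized_quantile_residual(y, mu, rng) for y, mu in zip(observed, fitted)]
print([round(r, 3) for r in rq_residuals])
```

Because the jitter is random, a different seed gives different residuals for the same fit, so it can be worth checking that conclusions do not depend on a single draw.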

## Assessing Model Stability

- Use <b>Cook's distance</b> to identify most influential records.   
    - Give less weight to these records if their removal dramatically changes the parameter estimates.<br><br>
- Use <b>cross validation</b> to see if the parameter estimates are consistent between different folds.<br><br>
- Use <b>bootstrapping</b> to create new datasets and fit parameters.
    - Provides a sense of stability of parameter estimates.
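
The bootstrapping idea can be sketched with the simplest possible case: an intercept-only Poisson model with a log link, whose maximum-likelihood intercept is the log of the sample mean. The data, the number of resamples, and the function names are all assumptions made for illustration; a real model would be refit with a GLM library on each resample.

```python
import math
import random
from statistics import mean, stdev

def fit_intercept_only_poisson(ys):
    """MLE of the intercept for a log-link Poisson model with no predictors."""
    return math.log(mean(ys))

def bootstrap_estimates(ys, n_resamples, rng):
    """Refit the model on datasets drawn with replacement from the original records."""
    estimates = []
    for _ in range(n_resamples):
        resample = rng.choices(ys, k=len(ys))
        if sum(resample) == 0:
            continue  # the intercept is undefined when every resampled count is zero
        estimates.append(fit_intercept_only_poisson(resample))
    return estimates

rng = random.Random(42)
observed = [0, 1, 3, 2, 4, 1, 2, 0, 5, 2, 1, 3]   # hypothetical counts

point_estimate = fit_intercept_only_poisson(observed)
estimates = bootstrap_estimates(observed, n_resamples=200, rng=rng)

print(f"estimate = {point_estimate:.3f}, bootstrap std dev = {stdev(estimates):.3f}")
```

A small spread across resamples suggests a stable estimate, while a wide spread indicates the parameter is sensitive to which records happen to be in the data.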