## 14.9 Example 3: Dementia

We now study whether sex modifies the relationship between dementia and bmi. The model is written

$$\mathrm{logit}(\pi_i) = \beta_0 + \beta_1bmi_i + \beta_2sex_i + \beta_3bmi_i\times sex_i$$

and can be estimated in `R` using the `glm` function. To include an interaction between two variables `x` and `z` in a model in `R`, we can specify both variables and the interaction term explicitly using the `:` symbol: `y ~ x + z + x:z`. Alternatively, placing the `*` symbol between two variables includes both variables and their interaction term: `y ~ x*z`. Both formulas estimate exactly the same model. Be careful, however, when using `*` with more than two variables: it includes every interaction term up to the highest order, so check that you actually want all of them.
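
The expansion performed by `*` can be inspected without fitting anything. The sketch below uses the base `R` function `terms` on formulas with hypothetical variable names (`y`, `a`, `b`, `c` are placeholders, not columns of the dementia data) and lists the terms that each formula would put in a model.

```r
# Hypothetical variable names: no data are needed to expand a formula.
two_way_star  <- attr(terms(y ~ a * b), "term.labels")
two_way_colon <- attr(terms(y ~ a + b + a:b), "term.labels")
three_way     <- attr(terms(y ~ a * b * c), "term.labels")

# Both notations produce the same set of terms
print(identical(two_way_star, two_way_colon))

# With three variables, `*` adds all two-way and the three-way interactions
print(three_way)
```

With three variables, the formula already contains seven terms, which is why the expansion is worth checking before fitting.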

In [1]:
dementia3 <- glm(dementia ~ bmi + sex + bmi:sex, data = dementia, family = binomial(link="logit"))
summary(dementia3)



In [13]:
cbind("coefficients" = coefficients(dementia3), "exp(coefficients)" = exp(coefficients(dementia3)))

| | coefficients | exp(coefficients) |
|---|---|---|
| (Intercept) | -2.09422922 | 0.1231651 |
| bmi | -0.07554006 | 0.9272426 |
| sex | -0.15820459 | 0.8536751 |
| bmi:sex | 0.02104249 | 1.0212654 |


> *Exercise:* Run this model using the `*` notation in the formula and check that you are actually estimating the exact same model.


We interpret the model separately in the two strata defined by the levels of the variable $sex_i$:

* when $sex_i=0$, i.e. for women, $\mathrm{logit}(\pi_i) = \beta_0 + \beta_1bmi_i$
    * we do not interpret $\beta_0$ because a bmi of zero is not clinically relevant
    * $\exp(\beta_1) = 0.927$ is the odds-ratio of dementia for a $1$ unit increase of bmi at inclusion for a woman
* when $sex_i=1$, i.e. for men, $\mathrm{logit}(\pi_i) = (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\times bmi_i$
    * we do not interpret $\beta_0 + \beta_2$ because a bmi of zero is not clinically relevant
    * $\exp(\beta_1 + \beta_3) = 0.947$ is the odds-ratio of dementia for a $1$ unit increase of bmi at inclusion for a man
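
The two stratum-specific odds-ratios can be recomputed by hand. The sketch below does not use the fitted object: it assumes the estimates are copied from the coefficient table printed above, and the names `beta`, `or_women` and `or_men` are introduced only for this illustration.

```r
# Estimates copied from the coefficient table of dementia3 printed above
beta <- c("(Intercept)" = -2.09422922, bmi = -0.07554006,
          sex = -0.15820459, "bmi:sex" = 0.02104249)

# Odds-ratio for a 1 unit increase of bmi among women (sex = 0)
or_women <- exp(beta[["bmi"]])

# Odds-ratio for a 1 unit increase of bmi among men (sex = 1):
# the slope of bmi in this stratum is beta_1 + beta_3
or_men <- exp(beta[["bmi"]] + beta[["bmi:sex"]])

print(round(c(women = or_women, men = or_men), 3))
```

With the fitted model at hand, the same quantities would be obtained by replacing `beta` with `coefficients(dementia3)`.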
    
We note that, according to the Wald test results, there is a significant association between bmi and dementia status but, overall, no association with sex. However, we reject the null hypothesis $\beta_3=0$. Therefore, there is an interaction between bmi and sex, and sex affects dementia status only through its interaction with bmi.
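
The Wald tests mentioned here can be reproduced approximately from the printed outputs. The sketch below assumes the estimates are copied from the coefficient table above and the variances from the diagonal of the covariance matrix `vcov(dementia3)` displayed later in this section. The names `est`, `variance`, `z` and `p` are illustrative only.

```r
# Estimates and variances copied from the printed outputs of dementia3
est <- c(bmi = -0.07554006, sex = -0.15820459, "bmi:sex" = 0.02104249)
variance <- c(bmi = 3.29684e-05, sex = 0.0364941671, "bmi:sex" = 5.146734e-05)

# Wald statistic for H0: coefficient = 0 (estimate divided by its standard error)
z <- est / sqrt(variance)

# Two-sided p-value under the normal approximation
p <- 2 * pnorm(-abs(z))

print(round(cbind(z = z, p.value = p), 4))
```

These are the same quantities that `summary(dementia3)` reports in its `z value` and `Pr(>|z|)` columns, up to the rounding of the copied numbers.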

The AIC for this last model is 37788, which is higher than the AIC of the previous model. Recall that the lower the AIC, the better. This result is not very surprising, as the second model included three different covariates that added more information to the model.

*Remark:* As mentioned earlier, to compute the $95\%$ confidence interval for the OR of dementia among men for a $1$ unit increase of bmi at inclusion, $\exp(\beta_1 + \beta_3)$, we need to know the covariance between $\hat{\beta}_1$ and $\hat{\beta}_3$. Thankfully, it is easy to access in `R` when the model is estimated with the `glm` function: the function `vcov`, which takes a `glm` object as argument, returns the estimated covariance matrix of the coefficients. For example, we can compute the estimated covariance matrix for the third model `dementia3`. There is, however, no direct way to obtain this CI from the raw `R` output of the `glm` function.


In [14]:
vcov(dementia3)

| | (Intercept) | bmi | sex | bmi:sex |
|---|---|---|---|---|
| (Intercept) | 0.0236847582 | -0.0008724659 | -0.0236847582 | 0.0008724659 |
| bmi | -0.0008724659 | 3.29684e-05 | 0.0008724659 | -3.29684e-05 |
| sex | -0.0236847582 | 0.0008724659 | 0.0364941671 | -0.00135022 |
| bmi:sex | 0.0008724659 | -3.29684e-05 | -0.0013502196 | 5.146734e-05 |


By taking the square root of the diagonal of this matrix, we find the same standard errors as displayed in the summary of the object `dementia3` shown above.

In [15]:
sqrt(diag(vcov(dementia3)))
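
Returning to the remark above, the confidence interval for $\exp(\beta_1 + \beta_3)$ can be sketched by hand using $\mathrm{Var}(\hat{\beta}_1 + \hat{\beta}_3) = \mathrm{Var}(\hat{\beta}_1) + \mathrm{Var}(\hat{\beta}_3) + 2\,\mathrm{Cov}(\hat{\beta}_1, \hat{\beta}_3)$. The code below is only an illustration: it assumes the values are copied from the printed coefficient table and covariance matrix, builds a Wald interval on the log-odds scale and exponentiates its bounds. All variable names are introduced for this example.

```r
# Values copied from the printed outputs of dementia3
b1 <- -0.07554006          # coefficient of bmi
b3 <-  0.02104249          # coefficient of bmi:sex
var_b1 <- 3.29684e-05      # vcov entry [bmi, bmi]
var_b3 <- 5.146734e-05     # vcov entry [bmi:sex, bmi:sex]
cov_b1_b3 <- -3.29684e-05  # vcov entry [bmi, bmi:sex]

# Log odds-ratio among men and its standard error
log_or_men <- b1 + b3
se_men <- sqrt(var_b1 + var_b3 + 2 * cov_b1_b3)

# Wald 95% confidence interval on the log-odds scale, then exponentiated
ci_men <- exp(log_or_men + c(-1, 1) * qnorm(0.975) * se_men)

print(round(c(OR = exp(log_or_men), lower = ci_men[1], upper = ci_men[2]), 3))
```

With these copied values the interval is roughly $(0.939, 0.955)$. When the fitted object is available, the same entries can be read directly from the matrix, for instance `vcov(dementia3)["bmi", "bmi:sex"]` for the covariance.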

> *Exercise:* We can expect that by adding interaction terms into the second model between some of the covariates, the fit of the model might be better. Try to add some interaction terms into a fourth model and try to correctly interpret each parameter when possible. Compare the AIC of this fourth estimated model to the second and third models we estimated and comment.