# Linear Mixed Models in R

## Table of Contents
1. [Introduction](#intro)
2. [Environment Setup](#setup)
3. [Fundamental Concepts](#concepts)
4. [Analysis with Normal Distribution](#normal)
5. [Analysis with Poisson Distribution](#poisson)
6. [Analysis with Negative Binomial Distribution](#negative-binomial)
7. [Analysis with Zero-Inflated Distribution](#zero-inflated)
8. [Post Hoc Tests and Marginal Effects](#post-hoc)
9. [Conclusions](#conclusions)
10. [References](#references)

## 1. Introduction <a name="intro"></a>

Linear mixed models are powerful statistical tools for analyzing data with complex structures, such as repeated measures or hierarchical data. This document shows how to fit and interpret these models in R for four response distributions: normal, Poisson, negative binomial, and zero-inflated. The non-normal cases are, strictly speaking, generalized linear mixed models (GLMMs) or extensions of them.

## 2. Environment Setup <a name="setup"></a>

First, install and load the required packages:

In [2]:
# Create a directory in your user space for R packages
lib_dir <- '~/R/x86_64-pc-linux-gnu-library/4.0'
dir.create(lib_dir, recursive = TRUE, showWarnings = FALSE)

# Install the packages into that directory
install.packages(c("nlme", "glmmTMB", "emmeans", "multcomp", "MASS", "ggplot2", "patchwork"), lib = lib_dir)
install.packages('lme4', repos='http://cran.rstudio.com/', lib = lib_dir)
install.packages("TMB", type = "source", lib = lib_dir)

# Make sure R uses the local directory for packages
.libPaths(lib_dir)







In [3]:


# Load the libraries (adjust the library path to your own environment)
.libPaths(c("/home/docker/R/x86_64-pc-linux-gnu-library/4.1", .libPaths()))
library(lme4)        # Linear mixed models
library(nlme)        # Linear and nonlinear mixed models (masks lmList; use lme4::lmList for the lme4 version)
library(glmmTMB)     # Mixed models with advanced distributions
library(emmeans)     # Estimated marginal means
library(multcomp)    # Multiple comparisons (loads TH.data, which masks MASS::geyser)
library(MASS)        # Additional statistical functions
library(ggplot2)     # Data visualization
library(patchwork)   # Combining plots (masks area; use MASS::area for the MASS version)

Loading required package: Matrix

Attaching package: ‘nlme’

The following object is masked from ‘package:lme4’:

    lmList

“Package version inconsistency detected.
TMB was built with Matrix version 1.4.0
Current Matrix version is 1.3.4
Please re-install 'TMB' from source using install.packages('TMB', type = 'source') or ask CRAN for a binary version of 'TMB' matching CRAN's 'Matrix' package”
Loading required package: mvtnorm
Loading required package: survival
Loading required package: TH.data
Loading required package: MASS

Attaching package: ‘TH.data’

The following object is masked from ‘package:MASS’:

    geyser

Attaching package: ‘patchwork’

The following object is masked from ‘package:MASS’:

    area




## 3. Fundamental Concepts <a name="concepts"></a>

> **Note:** Linear mixed models (LMM) combine fixed effects, which capture systematic relationships, with random effects, which capture variability between groups (such as schools or machines) that the fixed effects do not explain.

Choosing an appropriate response distribution is crucial for modeling the data correctly: normal for continuous outcomes, Poisson for counts, negative binomial for overdispersed counts, and zero-inflated for counts with an excess of zeros.
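The combination of fixed and random effects described above can be written compactly for the simplest case, a random-intercept model. The notation below (a single predictor $x_{ij}$, observation $i$ in group $j$, link function $g$) is chosen here for illustration and does not come from the original text.

```latex
\begin{align}
y_{ij} &= \beta_0 + \beta_1 x_{ij} + b_j + \varepsilon_{ij},
  \qquad b_j \sim \mathcal{N}(0, \sigma_b^2),
  \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2) \\
g\bigl(\mathbb{E}[\,y_{ij} \mid b_j\,]\bigr) &= \beta_0 + \beta_1 x_{ij} + b_j
\end{align}
```

The first line is the normal (linear) case: $\beta_0$ and $\beta_1$ are fixed effects and $b_j$ is the random intercept of group $j$. The second line is the generalized case used for count data, where the same linear predictor acts on the mean through a link function such as the logarithm.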

## 4. Analysis with Normal Distribution <a name="normal"></a>

Here we fit a linear mixed model with normally distributed errors to a continuous response. For example, we can study how an educational program affects students' scores, with a random intercept for each school.
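The data frame `data_normal` is not defined anywhere in this document. To try the model without real data, one option is to simulate a data set with the three columns the formula needs. Everything below is hypothetical: the number of schools, the level names `control` and `experimental`, the effect sizes, and the seed are all assumptions made only for illustration.

```r
# Simulated stand-in for `data_normal` (all numbers are illustrative assumptions)
set.seed(42)

n_schools <- 20       # assumed number of schools
n_per_school <- 30    # assumed number of students per school
n_total <- n_schools * n_per_school

school <- factor(rep(seq_len(n_schools), each = n_per_school))
program <- factor(
  sample(c("control", "experimental"), n_total, replace = TRUE),
  levels = c("control", "experimental")
)

# One random intercept per school, shared by all students of that school
school_effect <- rnorm(n_schools, mean = 0, sd = 5)

# Score = baseline + program effect + school effect + individual noise
score <- 70 +
  4 * (program == "experimental") +
  school_effect[as.integer(school)] +
  rnorm(n_total, mean = 0, sd = 8)

data_normal <- data.frame(school = school, program = program, score = score)
str(data_normal)
```

Because the true program effect (4 points) and the school-level standard deviation (5 points) are known here, the fitted estimates can be compared against them as a sanity check.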

In [4]:
# Fit the model (requires a data frame `data_normal` with the columns
# `score`, `program`, and `school`)
model_normal <- lmer(score ~ program + (1 | school), data = data_normal)

# Model summary
summary(model_normal)


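Interpretation: The coefficient for `program` estimates the difference in mean score between the experimental program and the control group. Note that `summary()` for an `lmer` fit reports t values but no p-values, so significance has to be assessed separately, for example with confidence intervals or a likelihood ratio test.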

## 5. Analysis with Poisson Distribution <a name="poisson"></a>

The Poisson distribution is used to model event counts. Here we show how to fit a model using this distribution to analyze, for example, the number of defects on a production line.
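The data frame `data_poisson` is likewise not defined in this document. A minimal simulated version, assuming the columns `defects`, `hours`, and `machine` used in the model formula, might look like this. The number of machines, the coefficients, and the seed are hypothetical values picked for illustration.

```r
# Simulated stand-in for `data_poisson` (all numbers are illustrative assumptions)
set.seed(123)

n_machines <- 15   # assumed number of machines
n_obs <- 40        # assumed observations per machine
n_total <- n_machines * n_obs

machine <- factor(rep(seq_len(n_machines), each = n_obs))
hours <- runif(n_total, min = 1, max = 12)

# Random intercept per machine on the log scale
machine_effect <- rnorm(n_machines, mean = 0, sd = 0.3)

# Log link: the linear predictor is the log of the expected count
log_rate <- -0.5 + 0.08 * hours + machine_effect[as.integer(machine)]
defects <- rpois(n_total, lambda = exp(log_rate))

data_poisson <- data.frame(machine = machine, hours = hours, defects = defects)
summary(data_poisson$defects)
```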

In [None]:
# Fit the Poisson model
model_poisson <- glmer(defects ~ hours + (1 | machine), family = poisson, data = data_poisson)

# Model summary
summary(model_poisson)

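Interpretation: The model estimates how operating hours affect the number of observed defects. Because the Poisson family uses a log link by default, the coefficient for `hours` is on the log scale; exponentiating it gives the multiplicative change in the expected number of defects per additional operating hour.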
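Because a Poisson model with the default log link reports coefficients on the log scale, the effect of `hours` is easiest to read after exponentiating. The short calculation below uses a made-up coefficient of 0.08, not a value from a fitted model, just to show the arithmetic.

```r
# Hypothetical log-scale coefficient for `hours` (NOT taken from a fitted model)
beta_hours <- 0.08

# Multiplicative change in expected defects for one additional hour
rate_ratio <- exp(beta_hours)

# The same effect expressed as a percentage change
percent_change <- (rate_ratio - 1) * 100

# Effects multiply on the count scale: ten extra hours
rate_ratio_10h <- exp(10 * beta_hours)

round(c(rate_ratio = rate_ratio,
        percent_change = percent_change,
        rate_ratio_10h = rate_ratio_10h), 3)
```

For a fitted `glmer` model, the same transformation can be applied to the estimated fixed effects with `exp(fixef(model_poisson))`.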

## 6. Analysis with Negative Binomial Distribution <a name="negative-binomial"></a>

The negative binomial distribution is useful when count data show overdispersion, that is, when the variance is larger than the mean assumed by the Poisson model. In that situation a negative binomial model usually fits better than a Poisson model.
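One common way to check for overdispersion is the Pearson dispersion statistic: the sum of squared Pearson residuals divided by the residual degrees of freedom. Values close to 1 are consistent with a Poisson model, while values well above 1 suggest overdispersion. The sketch below uses a hypothetical helper, `dispersion_ratio()`, and plain `glm()` fits on simulated counts so that it runs with base R alone; for mixed models the statistic is only approximate.

```r
# Hypothetical helper: Pearson chi-square divided by residual degrees of freedom
dispersion_ratio <- function(model) {
  pearson <- residuals(model, type = "pearson")
  sum(pearson^2) / df.residual(model)
}

set.seed(1)
y_pois <- rpois(500, lambda = 5)           # variance equals the mean
y_nb <- rnbinom(500, mu = 5, size = 1)     # variance = mu + mu^2 / size

# Fit an intercept-only Poisson model to each response
fit_pois <- glm(y_pois ~ 1, family = poisson)
fit_nb <- glm(y_nb ~ 1, family = poisson)

ratio_pois <- dispersion_ratio(fit_pois)   # expected to be close to 1
ratio_nb <- dispersion_ratio(fit_nb)       # expected to be well above 1

c(poisson_data = ratio_pois, negbin_data = ratio_nb)
```

The helper relies only on `residuals(..., type = "pearson")` and `df.residual()`, so it may also be usable on a `glmer` fit such as `model_poisson`, although that is an assumption worth verifying against the lme4 documentation.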

In [None]:
# Fit the negative binomial model
model_negbin <- glmer.nb(defects ~ hours + (1 | machine), data = data_poisson)

# Model summary
summary(model_negbin)

Interpretation: The negative binomial model estimates an additional dispersion parameter, which absorbs variability in the counts beyond what the Poisson model allows.

## 7. Analysis with Zero-Inflated Distribution <a name="zero-inflated"></a>

In some cases, count data contain more zeros than a standard count distribution would predict. A zero-inflated model represents such data more accurately. An example is the number of website visits per user, where many users never visit the site at all.
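To make the idea of excess zeros concrete, the sketch below simulates a hypothetical `data_zero_inflated` with a single `visits` column and compares the observed share of zeros with the share a plain Poisson distribution with the same mean would produce. The sample size, the 40% share of users who never visit, and the mean of 3 visits are assumptions chosen for illustration.

```r
# Simulated stand-in for `data_zero_inflated` (all numbers are illustrative assumptions)
set.seed(2024)

n_users <- 1000
p_structural_zero <- 0.4   # assumed share of users who never visit
mean_visits <- 3           # assumed Poisson mean for the remaining users

# Step 1: decide which users are structural zeros
never_visits <- rbinom(n_users, size = 1, prob = p_structural_zero)

# Step 2: everyone else draws a Poisson count (which can still be zero)
visits <- ifelse(never_visits == 1, 0L, rpois(n_users, lambda = mean_visits))

data_zero_inflated <- data.frame(visits = visits)

# Observed share of zeros versus the share implied by a Poisson with the same mean
observed_zero_share <- mean(data_zero_inflated$visits == 0)
poisson_zero_share <- exp(-mean(data_zero_inflated$visits))

c(observed = observed_zero_share, poisson_implied = poisson_zero_share)
```

A large gap between the two shares is the kind of pattern that motivates adding a zero-inflation component.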

In [None]:
# Fit the zero-inflated model
model_zero_inflated <- glmmTMB(visits ~ 1, ziformula = ~1, family = poisson, data = data_zero_inflated)

# Model summary
summary(model_zero_inflated)

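Interpretation: The model includes a separate component for the excess zeros, which allows a better representation of the data. With `ziformula = ~1`, a single zero-inflation probability is estimated on the logit scale. Note that this intercept-only specification contains neither covariates nor random effects; both can be added to the formula.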
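The zero-inflated Poisson model can be written as a mixture of a point mass at zero and an ordinary Poisson distribution. The symbols below ($\pi$ for the probability of a structural zero, $\lambda$ for the Poisson mean, $\gamma_0$ and $\beta_0$ for the two intercepts) are notation introduced here for illustration.

```latex
\begin{align}
P(Y = 0) &= \pi + (1 - \pi)\, e^{-\lambda} \\
P(Y = k) &= (1 - \pi)\, \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 1, 2, \dots \\
\operatorname{logit}(\pi) &= \gamma_0, \qquad \log(\lambda) = \beta_0
\end{align}
```

The last line corresponds to the intercept-only specification used in this section: the zero-inflation intercept $\gamma_0$ is reported on the logit scale and the conditional intercept $\beta_0$ on the log scale, so in R they can be back-transformed with `plogis()` and `exp()`, respectively.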

## 8. Post Hoc Tests and Marginal Effects <a name="post-hoc"></a>

Post hoc tests and marginal effects are useful for interpreting model results and comparing groups. We use the `emmeans` package to estimate marginal means and perform multiple comparisons.

In [None]:
# Marginal means for the normal model
emmeans_normal <- emmeans(model_normal, ~ program)

# Multiple comparisons
contrast(emmeans_normal, method = "pairwise", adjust = "tukey")

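Interpretation: The pairwise contrasts estimate the differences between program levels after fitting the model, with p-values adjusted for multiple comparisons. With only two levels there is a single comparison and the adjustment has no effect; it matters once the factor has three or more levels.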

## 9. Conclusions <a name="conclusions"></a>

In this document, we have explored how to apply linear mixed models in R using different data distributions. Each distribution has its own characteristics, and it is important to choose the right one to model the data correctly.

Linear mixed models allow us to capture both fixed and random effects, providing a more complete understanding of hierarchical or repeated measures data.

## 10. References <a name="references"></a>

- Pinheiro, J. C., & Bates, D. M. (2000). *Mixed-Effects Models in S and S-PLUS*. Springer.
- Gelman, A., & Hill, J. (2007). *Data Analysis Using Regression and Multilevel/Hierarchical Models*. Cambridge University Press.
- [lme4 package](https://cran.r-project.org/web/packages/lme4/index.html)
- [glmmTMB package](https://cran.r-project.org/web/packages/glmmTMB/index.html)
- [emmeans vignettes](https://cran.r-project.org/web/packages/emmeans/vignettes/emmeans.html)