**Linear mixed modeling from DataCamp**

These are notes for the DataCamp course Hierarchical and Mixed Effects Models in R ([link](https://learn.datacamp.com/courses/hierarchical-and-mixed-effects-models)). The sections correspond to the sections in the course.

# Overview and introduction to hierarchical and mixed models

In [2]:
library(tidyverse)
library(WWGbook)   # for data loading
library(lme4)

Course overview:

- Components for mixed-effect models (see how it can be applied to student test scores)
- Applying and interpreting linear mixed-effect models (e.g. regression)
- Generalized linear mixed-effect models
- Repeated measure models (e.g. time-series analysis)

What is a hierarchical model?

- Data can have several kinds of structure, including being nested within itself, which makes it "hierarchical". (Example: when evaluating whether sitting or standing affects engagement time, the device an observation comes from, iPhone or Samsung, may matter more, so observations are grouped by device.)

Why do we use a hierarchical model?

- Data nested within itself.
    - Example: Each student has their own test score but student performance can vary because of classroom-level factors, such as teacher quality, or school-level factors such as building quality. Hence, one might ask "Are students really independent from other students in the same classroom or school?" (Probably not.)
- Pool information across small sample sizes. (What if each classroom has a different number of students?)
    - Example: Maybe 5th grade has 30 students while 3rd grade only has 5. By chance, the 3rd grade's average test score is more likely to be unusually high or low, because means of small samples vary more (the law of large numbers only stabilizes an average as the sample grows). By treating classrooms as a "random-effect" within the model, we can pool shared information about means across the classrooms within the same school.
- Repeated observations across groups or individuals.
    - Example: What if we revisit the same group of students year after year? Here, the observations are *not* independent across years. A repeated-measures analysis is another example of a hierarchical model and allows us to correct for this (described in chapter 4).
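
The small-sample point above can be checked with a quick simulation. This is a minimal sketch in base R, not part of the course material: the score distribution (mean 50, standard deviation 10) and the number of simulated classes are arbitrary assumptions chosen only for illustration.

```r
# Simulate the average score of many small classes (n = 5) and many
# large classes (n = 30) drawn from the same population.
# Assumption: scores are normal with mean 50 and sd 10 (illustrative only).
set.seed(42)
n_sims <- 2000
mean_small <- replicate(n_sims, mean(rnorm(5,  mean = 50, sd = 10)))
mean_large <- replicate(n_sims, mean(rnorm(30, mean = 50, sd = 10)))

# Spread of the class averages. Theory gives sd / sqrt(n),
# i.e. roughly 4.5 for n = 5 and roughly 1.8 for n = 30.
sd_small <- sd(mean_small)
sd_large <- sd(mean_large)
c(small = sd_small, large = sd_large)
```

The averages of the 5-student classes spread out far more than those of the 30-student classes, which is why a small classroom is more likely to look extreme purely by chance.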

Other names for hierarchical models

- Hierarchical models: nested models, multi-level models
- Regression framework
    - "Pool" information
    - "Random-effect" versus a "fixed-effect"
    - "Mixed-effect" (linear mixed-effect model; LMM)
    - Linear mixed-effect regression (lmer)
- Repeated sampling: getting the same measurements of individuals or groups over time
    - Repeated measures
    - Paired-tests
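
To make the "random-effect" versus "fixed-effect" vocabulary concrete, here is a hedged sketch that fits the same grouping variable both ways. The data are simulated, and the names `d`, `classroom`, `x`, and `y` are hypothetical; only `lm()` from base R and `lmer()`, `fixef()`, and `ranef()` from `lme4` are real functions.

```r
library(lme4)

# Simulated data (assumption): 10 classrooms with 20 students each,
# where every classroom shifts the outcome by its own random amount.
set.seed(1)
n_class <- 10
n_per   <- 20
d <- data.frame(
  classroom = factor(rep(seq_len(n_class), each = n_per)),
  x         = rnorm(n_class * n_per)
)
class_shift <- rnorm(n_class, sd = 5)
d$y <- 50 + 2 * d$x + class_shift[as.integer(d$classroom)] +
  rnorm(nrow(d), sd = 3)

# Fixed-effect version: one separate coefficient per classroom.
fit_fixed <- lm(y ~ x + classroom, data = d)

# Mixed-effect version: classroom intercepts are treated as draws from
# a common distribution, written (1 | classroom) in lmer syntax.
fit_mixed <- lmer(y ~ x + (1 | classroom), data = d)

length(coef(fit_fixed))  # intercept + slope + 9 classroom dummies
fixef(fit_mixed)         # only the intercept and the slope for x
ranef(fit_mixed)         # estimated classroom-level deviations
```

The fixed-effect fit spends a coefficient on every classroom, while the mixed-effect fit estimates a single variance for the classroom intercepts and pools information across classrooms.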
    
Example for learning: school test scores

Meta-data:
- Gain in math scores for individual students from kindergarten to first grade
- Part of a national-level assessment in the US
- Subset of data from West, Welch, and Galecki

Student-level variables:
- Studentid: `childid`
- Math test-score gain: `mathgain`
- Math kindergarten score: `mathkind`
- Student's sex: `sex`
- Student's minority status: `minority`
    
We'll start by exploring the data and, at the end of the chapter, fit a multi-level model to it. The purpose of the coding exercise is to show that linear models don't always produce intuitive results and that it is necessary to add a new technique to your modeling toolbox.

The data set also contains classroom-level and school-level variables.

Classroom-level variables:
- Classroom id: `classid`
- Teacher's math training: `mathprep`
- Teacher's math knowledge test score: `mathknow`
- Teacher's years teaching: `yearstea`

School-level variables:
- School id: `schoolid`
- School's household poverty level: `housepov`
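- Socioeconomic status: `ses` (note: in the `head(classroom)` output this value differs between students in the same school, so it appears to be measured per student rather than per school)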



In [6]:
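data(classroom, package = "WWGbook")  # load the classroom data set explicitly
attach(classroom)                      # make its columns accessible by name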

In [7]:
head(classroom)

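|   | sex | minority | mathkind | mathgain | ses | yearstea | mathknow | housepov | mathprep | classid | schoolid | childid |
|---|-----|----------|----------|----------|-----|----------|----------|----------|----------|---------|----------|---------|
|   | `<int>` | `<int>` | `<int>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` | `<int>` | `<int>` |
| 1 | 1 | 1 | 448 | 32 | 0.46 | 1 | NA | 0.082 | 2.00 | 160 | 1 | 1 |
| 2 | 0 | 1 | 460 | 109 | -0.27 | 1 | NA | 0.082 | 2.00 | 160 | 1 | 2 |
| 3 | 1 | 1 | 511 | 56 | -0.03 | 1 | NA | 0.082 | 2.00 | 160 | 1 | 3 |
| 4 | 0 | 1 | 449 | 83 | -0.38 | 2 | -0.11 | 0.082 | 3.25 | 217 | 1 | 4 |
| 5 | 0 | 1 | 425 | 53 | -0.03 | 2 | -0.11 | 0.082 | 3.25 | 217 | 1 | 5 |
| 6 | 1 | 1 | 450 | 65 | 0.76 | 2 | -0.11 | 0.082 | 3.25 | 217 | 1 | 6 |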
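
The nesting of students in classrooms in schools is already visible in these six rows. As a rough sketch, the block below retypes a few columns of the `head(classroom)` output by hand (so it runs without `WWGbook`) and counts the units at each level. The object names `first_rows`, `students_per_class`, `classes_per_school`, and `gain_by_class` are made up for this illustration.

```r
# The six rows shown by head(classroom), retyped by hand.
# Only the id columns and mathgain are kept.
first_rows <- data.frame(
  childid  = 1:6,
  classid  = c(160, 160, 160, 217, 217, 217),
  schoolid = rep(1, 6),
  mathgain = c(32, 109, 56, 83, 53, 65)
)

# Level 1 within level 2: how many students sit in each classroom?
students_per_class <- table(first_rows$classid)

# Level 2 within level 3: how many distinct classrooms per school?
classes_per_school <- tapply(
  first_rows$classid, first_rows$schoolid,
  function(ids) length(unique(ids))
)

# Average math gain per classroom, a first look at classroom-level variation.
gain_by_class <- aggregate(mathgain ~ classid, data = first_rows, FUN = mean)

students_per_class
classes_per_school
gain_by_class
```

The same three calls should work on the full `classroom` data frame, where the group sizes are no longer equal.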
