Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First lmer try #104

Merged
merged 11 commits into from Sep 7, 2020
Merged

First lmer try #104

merged 11 commits into from Sep 7, 2020

Conversation

datalorax
Copy link
Owner

This is not complete, but if you look at may last comment from #22 you'll see that all equations basically have a "distributed as" part and a variance/covariance part. This PR is the first try at getting the "distributed as" part. Only part way there, but I think it's a pretty good step. Examples below. I put all of this in a new R file for now.

For some reason the variable ordering is not holding. I'll look into that and get that fixed before we merge.

library(tidyverse)
library(lme4)

data("benchmarks", package = "esvis")

d <- benchmarks %>% 
  add_count(sid) %>% 
  filter(n == 3) %>% 
  mutate(wave = case_when(season == "Fall" ~ 0, 
                          season == "Winter" ~ 1,
                          TRUE ~ 2))

set.seed(123)
d$district <- rep(sample(1:9), length.out = nrow(d))

d <- d %>% 
  count(district, frl) %>% 
  pivot_wider(names_from = frl, values_from = n) %>% 
  mutate(district_prop_frl = FRL / (FRL + `Non-FRL`)) %>% 
  select(district, district_prop_frl) %>% 
  right_join(d) %>% 
  rename(sid_frl = frl) 

m0 <- lmer(math ~ 1 + (1|sid), d)
equatiomatic:::create_distributed_as_merMod(m0, ital_vars = FALSE) %>% 
  cat("$$", ., "$$")

Screen Shot 2020-09-01 at 1 22 04 PM

m1 <- lmer(math ~ wave + (1|sid), d)
equatiomatic:::create_distributed_as_merMod(m1, ital_vars = FALSE) %>% 
  cat("$$", ., "$$")

Screen Shot 2020-09-01 at 1 22 34 PM

m2 <- lmer(math ~ wave + (wave|sid), d)
equatiomatic:::create_distributed_as_merMod(m2, ital_vars = FALSE) %>% 
  cat("$$", ., "$$")

Screen Shot 2020-09-01 at 1 23 15 PM

m3 <- lmer(math ~ wave + cohort + (wave|sid), d)
equatiomatic:::create_distributed_as_merMod(m3, ital_vars = FALSE) %>% 
  cat("$$", ., "$$")

Screen Shot 2020-09-01 at 1 28 37 PM

m4 <- lmer(math ~ wave + cohort + (wave|sid) + (wave + cohort|district), d)
#> boundary (singular) fit: see ?isSingular
equatiomatic:::create_distributed_as_merMod(m4, ital_vars = FALSE) %>% 
  cat("$$", ., "$$")

Screen Shot 2020-09-01 at 1 25 21 PM

m5 <- lmer(math ~ wave*cohort + (wave|sid) + (wave*cohort|district), d)
#> boundary (singular) fit: see ?isSingular
equatiomatic:::create_distributed_as_merMod(m5, ital_vars = FALSE) %>% 
  cat("$$", ., "$$")

Screen Shot 2020-09-01 at 1 26 06 PM

@datalorax
Copy link
Owner Author

Okay, the misordering was coming from a merge. It should be fixed now.

@datalorax
Copy link
Owner Author

ah shoot... also realized there's some code formatting that needs to happen here. I went well past the 80 character line. Will fix before we merge. Trying to work on the var-covar part now.

@datalorax
Copy link
Owner Author

I'm done for the day but just wanted to share this quickly (haven't pushed any of this yet).

m5 <- lmer(math ~ wave * sid_frl + cohort + district_prop_frl + (wave|sid) + (wave + cohort + wave:sid_frl|district), d)

Overly complicated model that doesn't fit, but, with my local version anyway, you can now do this!

equatiomatic:::create_vcov_structure_merMod(model) %>% 
  cat("$$", ., "$$")

Which renders as

Screen Shot 2020-09-01 at 9 27 21 PM

The actual LaTeX for this is kind of crazy. It ends up looking like this.

\left(
       \begin{array}{c} \alpha_{j} \\ \beta_{1j} \\ \beta_{2j} \\ \beta_{3j} \end{array}
     \right)\sim N \left(\left(
       \begin{array}{c} \mu_{\alpha} \\ \mu_{\beta_{1}} \\ \mu_{\beta_{2}} \\ \mu_{\beta_{3}} \end{array}
     \right),\left(
    \begin{array}{cccc}\sigma^2_{\alpha} & \rho\sigma_{\alpha}\sigma_{\beta_{1}} & \rho\sigma_{\alpha}\sigma_{\beta_{2}} & \rho\sigma_{\alpha}\sigma_{\beta_{3}}\\\rho\sigma_{\alpha}\sigma_{\beta_{1}} & \sigma^2_{\beta_{1}} & \rho\sigma_{\beta_{1}}\sigma_{\beta_{2}} & \rho\sigma_{\beta_{1}}\sigma_{\beta_{3}}\\\rho\sigma_{\alpha}\sigma_{\beta_{2}} & \rho\sigma_{\beta_{1}}\sigma_{\beta_{2}} & \sigma^2_{\beta_{2}} & \rho\sigma_{\beta_{2}}\sigma_{\beta_{3}}\\\rho\sigma_{\alpha}\sigma_{\beta_{3}} & \rho\sigma_{\beta_{1}}\sigma_{\beta_{3}} & \rho\sigma_{\beta_{2}}\sigma_{\beta_{3}} & \sigma^2_{\beta_{3}}\end{array}
    \right) \right) \\ \left(
       \begin{array}{c} \alpha_{k} \\ \beta_{1k} \end{array}
     \right)\sim N \left(\left(
       \begin{array}{c} \mu_{\alpha} \\ \mu_{\beta_{1}} \end{array}
     \right),\left(
    \begin{array}{cc}\sigma^2_{\alpha} & \rho\sigma_{\alpha}\sigma_{\beta_{1}}\\\rho\sigma_{\alpha}\sigma_{\beta_{1}} & \sigma^2_{\beta_{1}}\end{array}
    \right) \right)

But it all builds automatically! Still a fair amount to do, including how to work in group-level predictors (they all go to top level at this point) and building in things like align environments. I'd also like to add the "for SID j in 1,...,J" part for each level. But this is promising!

@andrewheiss
Copy link
Collaborator

zomg this is incredible!

@datalorax
Copy link
Owner Author

Wooohooo!!!! This is all working now. There's still a fair amount of things I'd like to implement but we're basically there for these basic models!

Last thing that needs to be changed before we merge, though, is the codebase needs to be cleaned up. This is really unorganized right now. I'll work on that later today, and then we can merge it in if it all looks good to you?

library(equatiomatic)
library(lme4)
#> Loading required package: Matrix
library(tidyverse)

data("benchmarks", package = "esvis")

d <- benchmarks %>% 
  add_count(sid) %>% 
  filter(n == 3) %>% 
  mutate(wave = case_when(season == "Fall" ~ 0, 
                          season == "Winter" ~ 1,
                          TRUE ~ 2))

set.seed(123)
d$district <- rep(sample(1:9), length.out = nrow(d))

d <- d %>% 
  count(district, frl) %>% 
  pivot_wider(names_from = frl, values_from = n) %>% 
  mutate(district_prop_frl = FRL / (FRL + `Non-FRL`)) %>% 
  select(district, district_prop_frl) %>% 
  right_join(d) %>% 
  rename(sid_frl = frl) 
#> Joining, by = "district"

m0 <- lmer(math ~ 1 + (1|sid), d)
extract_eq(m0)
#> Registered S3 method overwritten by 'broom.mixed':
#>   method      from 
#>   tidy.gamlss broom

Screen Shot 2020-09-02 at 12 32 58 PM

m1 <- lmer(math ~ 1 + wave + (1|sid), d)
extract_eq(m1)

Screen Shot 2020-09-02 at 12 33 55 PM

m2 <- lmer(math ~ 1 + wave + (wave|sid), d)
extract_eq(m2)

Screen Shot 2020-09-02 at 12 33 19 PM

m3 <- lmer(math ~ 1 + wave + (wave|sid) + (1|district), d)
#> boundary (singular) fit: see ?isSingular
extract_eq(m3)

Screen Shot 2020-09-02 at 12 34 16 PM

m4 <- lmer(math ~ 1 + wave*cohort + (wave|sid) + (wave*cohort|district), d)
#> boundary (singular) fit: see ?isSingular
extract_eq(m4)

Screen Shot 2020-09-02 at 12 34 44 PM

@jrosen48
Copy link
Collaborator

jrosen48 commented Sep 2, 2020

Magical. Kicking the tires, on this branch:

On branch first-lmer-try
Your branch is up to date with 'origin/first-lmer-try'.

After building the package, I got what is probably a super easy to fix namespace error for stringr::str_split():

library(lme4)
#> Loading required package: Matrix
library(equatiomatic)

fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

extract_eq(fm1)
#> Registered S3 method overwritten by 'broom.mixed':
#>   method      from 
#>   tidy.gamlss broom
#> Error in str_split(text, " "): could not find function "str_split"

Created on 2020-09-02 by the reprex package (v0.3.0)

@datalorax
Copy link
Owner Author

Ah shoot! I meant to use strsplit instead of str_split. Just pushed a fix.

@jrosen48
Copy link
Collaborator

jrosen48 commented Sep 2, 2020

library(lme4)
#> Loading required package: Matrix
library(equatiomatic)

fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

extract_eq(fm1)
#> Registered S3 method overwritten by 'broom.mixed':
#>   method      from 
#>   tidy.gamlss broom

Screen Shot 2020-09-02 at 4 25 44 PM

Seriously magical. I will try a few more examples. My challenge now is not fully understanding the Gelman and Hill notation... wondering if some kind of documentation around how to "translate" between the Raudenbush & Bryk-esque notation and it might help others like me. On that note, perhaps as a newcomer to this notation I can find the section of Data Analysis Using Regression and Multilevel/Hierarchical Models where they show five (or six?) ways to write the same model...

@datalorax
Copy link
Owner Author

Thanks, that would be super helpful. Also - I realize the tests are failing. I'll fix that later tonight too when I organize the codebase a bit better.

@andrewheiss
Copy link
Collaborator

This is absolutely amazing.

@datalorax
Copy link
Owner Author

Okay, I just documented all the functions and cleaned up the codebase some (gave better names to a few functions, etc.). The next thing I really want to implement is being able to define group-level coefficients. I have a few ideas on that, but I think this probably ready to merge in?

@datalorax
Copy link
Owner Author

datalorax commented Sep 4, 2020

Okay, I refactored the code and fixed a couple small bugs. I also added wrapping and the ability to have the mean structure inside or outside of the normal distribution part. I called that argument mean_separate, which I don't love and am fully open to suggestions for other names.

I will wait until this evening but I do believe this is officially ready to merge, and then we can submit new PRs for further development (e.g. #105).

@bbolker would you possibly have time to glance at the below and let me know if anything looks incorrect? Also, I'm currently using broom.mixed::tidy but I'm wondering if there's an easy way I can use just the tidy.merMod method. I can access that via broom.mixed:::tidy.merMod but of course that's not allowed on CRAN.

library(equatiomatic)
library(lme4)
library(tidyverse)
data("benchmarks", package = "esvis")

d <- benchmarks %>% 
  add_count(sid) %>% 
  filter(n == 3) %>% 
  mutate(wave = case_when(season == "Fall" ~ 0, 
                          season == "Winter" ~ 1,
                          TRUE ~ 2))

set.seed(123)
d$district <- rep(sample(1:9), length.out = nrow(d))

d <- d %>% 
  count(district, frl) %>% 
  pivot_wider(names_from = frl, values_from = n) %>% 
  mutate(district_prop_frl = FRL / (FRL + `Non-FRL`)) %>% 
  select(district, district_prop_frl) %>% 
  right_join(d) %>% 
  rename(sid_frl = frl) 

m0 <- lmer(math ~ 1 + (1|sid), d)
extract_eq(m0)

Screen Shot 2020-09-04 at 2 57 05 PM

m1 <- lmer(math ~ 1 + wave + (1|sid), d)
extract_eq(m1)

Screen Shot 2020-09-04 at 2 57 21 PM

m2 <- lmer(math ~ 1 + wave + (wave|sid), d)
extract_eq(m2)

Screen Shot 2020-09-04 at 2 58 03 PM

m3 <- lmer(math ~ 1 + wave + (wave|sid) + (1|district), d)
extract_eq(m3)

Screen Shot 2020-09-04 at 2 58 33 PM

m4 <- lmer(math ~ 1 + wave*cohort + (wave|sid) + (wave*cohort|district), d)
extract_eq(m4)

Screen Shot 2020-09-04 at 2 59 04 PM

Show wrapping

m5 <- lmer(math ~ wave*cohort + district_prop_frl + 
            (wave|sid) + (wave*cohort|district),
           data = d)  
extract_eq(m5, wrap = TRUE, terms_per_line = 2)

Screen Shot 2020-09-04 at 2 59 37 PM

Show wrapping inside normal distribution

extract_eq(m5, wrap = TRUE, terms_per_line = 2, mean_separate = FALSE)

Screen Shot 2020-09-04 at 3 00 00 PM

@bbolker
Copy link
Contributor

bbolker commented Sep 4, 2020

Wow! I'll take a look.

If you need to access tidy.merMod directly, broom.mixed could certainly export it, although it's not immediately obvious to me why regular S3 dispatch doesn't do what you need ... ?

@datalorax
Copy link
Owner Author

It's not a huge deal, but because already use broom::tidy I get the following message when running extract_eq() with a merMod. Maybe there's another way around this though?

Registered S3 method overwritten by 'broom.mixed':
method from
tidy.gamlss broom

@bbolker
Copy link
Contributor

bbolker commented Sep 4, 2020

It's not a huge deal, but because already use broom::tidy I get the following message when running extract_eq() with a merMod. Maybe there's another way around this though?

Registered S3 method overwritten by 'broom.mixed':
method from
tidy.gamlss broom

The correct way to get around this would be to have the broom and broom.mixed maintainers agree that only one of them was going to export such a method. Based on this forgotten issue, it looks like I should remove tidy.gamlss from broom.mixed ...

PS it's rather confusing but this suggests that the gamlss tidiers are already on their way out of broom, so this message should go away eventually.

@datalorax datalorax merged commit 5ba29c1 into master Sep 7, 2020
@datalorax
Copy link
Owner Author

Went ahead and merged this in. I have a nearly working branch on my local to incorporate group-level predictors. Will have a PR for that soon.

@datalorax datalorax deleted the first-lmer-try branch September 7, 2020 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants