lme4: additive notation #196

Open
elbersb opened this issue Sep 3, 2021 · 33 comments

Comments

@elbersb commented Sep 3, 2021

I took this example from the documentation:

Screen Shot 2021-09-03 at 09 40 49

I think the notation of the "alpha" term is somewhat confusing, and also (I think) non-standard, as it is not clear how alpha_j, alpha_k, and alpha_l are combined to produce the mean of score_i. Gelman & Hill (p. 289) use the following additive notation, which feels much more natural to me:

Screen Shot 2021-09-03 at 09 41 57
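(The screenshot is not reproduced here. As a sketch of the G&H-style additive notation, extended to the three grouping factors in the example, with sid indexed by j, school by k, and district by l; the symbol choices are my assumption, not the original image:)

```latex
\begin{aligned}
\text{score}_i &\sim N\!\left(\mu + \gamma_{j[i]} + \delta_{k[i]} + \zeta_{l[i]},\ \sigma^2_y\right) \\
\gamma_j &\sim N(0,\ \sigma^2_\gamma), \qquad
\delta_k \sim N(0,\ \sigma^2_\delta), \qquad
\zeta_l \sim N(0,\ \sigma^2_\zeta)
\end{aligned}
```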

@datalorax (Owner)

Hi - the equation you have from Gelman & Hill, p. 289, is for a non-nested model, but I see what you're saying. I think the issue is probably that the means should instead just be zeros.
Screen Shot 2021-09-03 at 9 38 01 AM

Thoughts?

@elbersb (Author) commented Sep 3, 2021

Thanks! Thinking a bit more about this, there seem to be three issues:

  1. The model should have an intercept, e.g., $\mu$.
  2. You're right, the three latter means of the normals should be zero. (But this requires the intercept.)
  3. Lastly, I still think the Gelman & Hill notation is more appropriate. The example I copied from the documentation is also non-nested (the data may be nested, but in the model sid, school, and district enter additively). In the current notation it's not clear that you have to add the three alphas to obtain the mean of score_i.

Some code for illustration is here:

library(equatiomatic)
library(lme4)
#> Loading required package: Matrix

mod <- lmer(score ~ wave + (1 | sid) + (1 | school), data = sim_longitudinal)

# model includes an intercept ("fixed effect")
fixef(mod)
#> (Intercept)        wave 
#>  97.1122331   0.1736105

# all of the random effects are mean zero
mean(ranef(mod)$school[,1])
#> [1] 9.453623e-13
mean(ranef(mod)$sid[,1])
#> [1] 5.764416e-11

# to predict the first observation: (sid = 1, school = 1, wave = 0)
fixef(mod)[1] + ranef(mod)$sid[1,1] + ranef(mod)$school[1,1]
#> (Intercept) 
#>    101.9404
predict(mod)[1] # check
#>        1 
#> 101.9404

# whether the data is nested or not is irrelevant for the model

mod2 <- lmer(score ~ (1 | wave) + (1 | sid) + (1 | school), data = sim_longitudinal)

# model includes an intercept ("fixed effect")
fixef(mod2)
#> (Intercept) 
#>    97.89348

# all of the random effects are mean zero
mean(ranef(mod2)$wave[,1])
#> [1] 5.902878e-12
mean(ranef(mod2)$school[,1])
#> [1] 1.460903e-10
mean(ranef(mod2)$sid[,1])
#> [1] 1.642559e-10

# to predict the first observation: (sid = 1, school = 1, wave = 0)
fixef(mod2)[1] + ranef(mod2)$sid[1,1] + ranef(mod2)$school[1,1] + ranef(mod2)$wave[1,1]
#> (Intercept) 
#>     101.959
predict(mod2)[1] # check
#>       1 
#> 101.959

Created on 2021-09-03 by the reprex package (v2.0.1)

@datalorax (Owner)

Yes, I understand how the parameters add together (I teach advanced courses on multilevel modeling). What I'm not understanding, however, is how the additive nature isn't clear from the screenshot with the means set to zero.

The intercept is \alpha, which is randomly varying across j, k, and l. Each of these has its own distribution with a mean of zero. If the parameter is specified as being generated by more than one distribution, then additivity is implied (i.e., the single value is generated by two or more distributions).
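Spelled out, the combined-subscript notation would then imply the following (my sketch; $\alpha_0$ denotes the fixed intercept, which per the surrounding discussion still needs to appear somewhere):

```latex
\alpha_{j[i],k[i],l[i]} = \alpha_0 + \alpha_{j[i]} + \alpha_{k[i]} + \alpha_{l[i]}, \qquad
\alpha_j \sim N\!\left(0,\ \sigma^2_{\alpha^{\text{sid}}}\right), \quad
\alpha_k \sim N\!\left(0,\ \sigma^2_{\alpha^{\text{school}}}\right), \quad
\alpha_l \sim N\!\left(0,\ \sigma^2_{\alpha^{\text{district}}}\right)
```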

I don't like the idea of including additional greek notation to imply the additivity because it makes generalizing to n levels more difficult. For example, in the case you included from Gelman/Hill, we have gamma and delta. Would we move to zeta for the next level? What about for even more levels? iota, kappa, etc.? That seems overly complicated to me.

@elbersb (Author) commented Sep 3, 2021

If the parameter is specified as being generated by more than one distribution, then additivity is implied.

I guess that's where I'd prefer a more explicit notation, especially for teaching purposes. I also wouldn't think of alpha as one parameter, although I now see how one can (but don't you still need the overall intercept, given that the three alphas have mean zero?). Here's a proposal for writing this model with fewer Greek letters but explicit additivity:

Screen Shot 2021-09-03 at 21 27 06

(By the way, the variances have subscripts for the individual indices, e.g. $\sigma_{\alpha_j}$, but the variance doesn't depend on j. So this should just be $\sigma_{\alpha}$, as in the G&H example.)

@elbersb (Author) commented Sep 3, 2021

Oops, the variances had the wrong subscript:
Screen Shot 2021-09-03 at 21 40 09

@datalorax (Owner)

In this specific case I agree that's clearer. But if you have a random slope for wave, then you have a mean for wave (the fixed effect) plus all those same offsets.

Currently, a model like that would look like this:

library(equatiomatic)
m <- lme4::lmer(score ~ wave + (wave | sid) + (wave | school) + (wave | district),
          data = sim_longitudinal)
#> boundary (singular) fit: see ?isSingular
extract_eq(m)

Created on 2021-09-03 by the reprex package (v0.3.0)

I do think we should swap each mean for a zero. But how would this model look with the notation you've proposed?

@datalorax (Owner)

The notation gets pretty small, but maybe this is clearer?
Screen Shot 2021-09-03 at 12 55 57 PM

@elbersb (Author) commented Sep 3, 2021

Gotcha, I get now how this notation came about. But how about writing it like this:

Screen Shot 2021-09-03 at 22 00 40

This overloads \alpha and \beta, but that seems like an ok tradeoff.
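(One plausible rendering of the overloaded notation, for the model score ~ wave + (wave | sid) + (wave | school) + (wave | district); this is my reconstruction, since the screenshots are not reproduced:)

```latex
\begin{aligned}
\text{score}_i &\sim N\!\left(\alpha + \alpha_{j[i]} + \alpha_{k[i]} + \alpha_{l[i]} +
\left(\beta + \beta_{j[i]} + \beta_{k[i]} + \beta_{l[i]}\right)\text{wave}_i,\ \sigma^2\right) \\
\begin{pmatrix}\alpha_j \\ \beta_j\end{pmatrix} &\sim
N\!\left(\begin{pmatrix}0\\0\end{pmatrix},\ \Sigma^{\text{sid}}\right),
\quad \text{and similarly for } k \text{ (school) and } l \text{ (district)}
\end{aligned}
```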

@elbersb (Author) commented Sep 3, 2021

Of course, one could also reorder such that the \alpha come all first -- maybe that makes more sense.

@elbersb (Author) commented Sep 3, 2021

This would be another option, although it's fairly noisy (but it's also a complicated model). The problem with the previous version is that variance components such as $\sigma_{\alpha_j}$ are ambiguous. This version doesn't have that problem, as the three alphas are distinguishable.

Screen Shot 2021-09-03 at 22 11 44

It's fairly late here, so I'll leave this for now and see if I have another idea tomorrow.

@datalorax (Owner) commented Sep 3, 2021

I like it. It's sort of an expanded notation. Perhaps I could include that as an additional argument so you could have it output either way.

I do think we'd still need to specify the mean for wave though, so perhaps something like this.

Screen Shot 2021-09-03 at 1 13 51 PM

EDIT: Thinking about this more - I think I'd probably swap the betas for lower case b's, but otherwise I think it could work.

@datalorax (Owner)

The problem with the previous version is that variance components such as $\sigma_{\alpha_j}$ are ambiguous

How so? Each index corresponds to the given level, j for sid, k for school, and l for district. So $\sigma_{\alpha_j}$ corresponds to the SID variance of the alpha parameter (intercept).

@elbersb (Author) commented Sep 4, 2021

The index j implies that for each sid, a separate variance is estimated. That's not the case in this model, but it could be a possible model, so it seems worth distinguishing the two cases. I added the intercept for \beta in this notation. I realize it's verbose, but it seems to avoid all ambiguities.

Screen Shot 2021-09-04 at 09 34 47

@datalorax (Owner)

The index j implies that for each sid, a separate variance is estimated

I guess I'm not clear on what you mean by this. Are you talking about the residual variance structure? I would say that j implies parameter-level variance across sid levels (i.e., the offsets from the main effect).

I will think about this more and perhaps try to get some feedback from others. The below seems clear to me, but maybe I'm too close to it.

Screen Shot 2021-09-04 at 4 52 57 PM

@datalorax (Owner)

I guess the trouble with the above is it doesn't fully make clear the connection between the main effect and the random effects as yours does.

Alright... I think I'm convinced (and yes, this is only a few moments after stating I felt like I needed to think about it more). I will probably try to work on this next week. Thank you for sticking with me on this.

@datalorax (Owner)

Okay, what do you think about something like this?

Screen Shot 2021-09-04 at 5 20 07 PM

@datalorax (Owner)

Okay, an even more complicated model. What do you think of this? One thing I'm not sure about is all the superscripts denoting the level of variation. They get pretty burdensome.

Notice this model has group-level predictors, and the predictors at the sid level vary at the schid level.

library(tidyverse)
library(lme4)
#> Loading required package: Matrix
#> 
#> Attaching package: 'Matrix'
#> The following objects are masked from 'package:tidyr':
#> 
#>     expand, pack, unpack

three_lev_link <- "https://stats.idre.ucla.edu/wp-content/uploads/2016/02/eg_hlm.sav"
three_lev <- haven::read_sav(three_lev_link) %>% 
  haven::as_factor() %>% 
  mutate(school = as.numeric(as.character(school)),
         cid = as.numeric(as.character(cid))) %>% 
  select(schid = school,
         sid = cid,
         size:mobility,
         female:hispanic, retained,
         grade, year,
         math)

m <- lmer(math ~ year * female + mobility + lowinc +
            (year|sid) +  (year * female|schid), 
          data = three_lev)
#> boundary (singular) fit: see ?isSingular

equatiomatic::extract_eq(m)

Screen Shot 2021-09-04 at 9 18 54 PM

Note this is not really a reprex, because I swapped out the equation for one I created by hand.

Created on 2021-09-04 by the reprex package (v0.3.0)

@datalorax (Owner)

Sorry, one last thing here. Just thinking about this again:

The index j implies that for each sid, a separate variance is estimated.

Were you referring specifically to the j in the variance-covariance matrix of the random effects? I agree for sure it should be dropped there. But I'm thinking if we do that then we could drop all the superscripts for sid, schid, etc. Do you think it would still be ambiguous if we did that?

@elbersb (Author) commented Sep 6, 2021

Thanks for all the work! I think the superscripts are required -- I also think they really help with understanding the model by explicitly associating each parameter with what it means.

Regarding the "j": Yes, I was explicitly referring to the variance component. See the following:
Screen Shot 2021-09-06 at 08 33 29
The first model says each parameter alpha is drawn from a normal distribution with mean zero and a variance parameter that differs for each sid (depends on j). The second model says each alpha is drawn from a normal with mean zero and a single, fixed variance parameter. I feel the notation should make clear that we mean the latter case, which is why I introduced the superscripts.
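In symbols, the two readings being contrasted (my rendering of the point, since the screenshot is not reproduced):

```latex
\alpha_j \sim N\!\left(0,\ \sigma^2_{\alpha_j}\right) \ \text{(a separate variance for each sid } j\text{)}
\qquad \text{vs.} \qquad
\alpha_j \sim N\!\left(0,\ \sigma^2_{\alpha^{\text{sid}}}\right) \ \text{(one variance shared by all sids)}
```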

I'm not quite sure I understand all the decisions regarding the more complicated model you showed. I will try to translate it myself and see where we differ.

@elbersb (Author) commented Sep 6, 2021

Here's my attempt for this model. I retained Greek letters for all parameters, because I feel that makes clear that they are related parameters. Of course, I would be fine separating out \mu and writing it on a different line (the mean of the first normal). Although there are lots of superscripts, I feel they clarify what exactly each quantity means.

Screen Shot 2021-09-06 at 08 57 23

@datalorax (Owner) commented Sep 6, 2021

Thanks. I really prefer having the group-level predictors at the "level" they are at (even though it doesn't make a difference in estimation, it can really help clarity when you have a bunch of predictors at a bunch of different levels). For example, see Gelman and Hill, Equation 12.15, p. 266, or Equation 13.2 on p. 280.

The point about the j makes total sense.

@elbersb (Author) commented Sep 6, 2021

The best I could come up with is the version below. I like that it avoids some of the bracketed [i]s. But I don't think it's possible to pull the equations for \mu into the distributional statements.

Screen Shot 2021-09-06 at 21 36 18

@datalorax (Owner)

Alright well... starting to feel less certain about this move. I think there are a few things we could clean up with the current notation, but I'm not sure I want to fully move to this proposal.

I'm about to release v0.3 and was considering trying to implement some of these changes before then, but I think I'll stick with it as is for now and we can continue discussing this if you'd like for v0.3.1 or v0.4. To me, this notation is less clear than what I proposed in this comment.

@elbersb (Author) commented Sep 6, 2021

This is not urgent for me at all -- no rush. I would use this purely for teaching (and I think it's great for that!)

To be honest, though, with your previous example I have trouble seeing the relationship between the equations and the model... The year covariate and the interaction are missing, and why do the parameters for sid involve g, which comes from the schid level? If we think generatively about this model, then you would first have to draw from the last multivariate normal to generate the g's, and then make another draw from the sid distribution, for which the means depend on g. I don't think that's the model estimated by lmer.

@datalorax (Owner)

The year covariate and the interaction are missing

Yeah, I see what you're saying. The current version looks like this:

Screen Shot 2021-09-06 at 1 20 09 PM

Which I think is mostly clear. The interaction is shown by this part: female, at the sid level, predicts the beta_1 slope.

Screen Shot 2021-09-06 at 1 21 27 PM

I definitely think the variance-covariance part should be cleaned up by removing the subscripts. But the rest of it is, from my perspective, pretty clear. I agree that the additive nature is not explicit, and I like the proposal to get that included, at least as an option. But I'm going to take some time to think on it a bit more before I actually devote any time to it.

@elbersb (Author) commented Sep 7, 2021

If I substitute the higher-level equations back into the first equation, this is what comes out:

Screen Shot 2021-09-07 at 12 24 23

Maybe it's just that I'm not used to this notation, but I find it far from clear -- especially in terms of relating the quantities in the notation to the lmer output.
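For reference, the fully substituted (reduced) form of this model, math ~ year * female + mobility + lowinc + (year | sid) + (year * female | schid), would read as follows in my own notation (the a's and b's are zero-mean random offsets; j indexes sid, k indexes schid):

```latex
\begin{aligned}
\text{math}_i ={}& \left(\alpha + a_{j[i]} + a_{k[i]}\right)
+ \left(\beta_1 + b_{1,j[i]} + b_{1,k[i]}\right)\text{year}_i
+ \left(\beta_2 + b_{2,k[i]}\right)\text{female}_i \\
&+ \beta_3\,\text{mobility}_i + \beta_4\,\text{lowinc}_i
+ \left(\beta_5 + b_{5,k[i]}\right)\text{year}_i\,\text{female}_i + \varepsilon_i
\end{aligned}
```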

I've updated the other example like this to more clearly distinguish the levels:

Screen Shot 2021-09-07 at 12 32 41

These quantities relate directly to the lme4 output:

  • fixef(m) gives alpha and beta_1 through beta_5.
  • ranef(m)$sid gives the deviations from alpha and beta_1 for each sid.
  • ranef(m)$schid gives the deviations from alpha, beta_1, beta_2, and beta_5 for each schid.
> fixef(m)
 (Intercept)         year       female     mobility       lowinc  year:female 
 0.148895814  0.757746730 -0.028796146 -0.010256601 -0.007573428  0.012771590 
> head(ranef(m)$sid)
          (Intercept)        year
101480302  0.01228463 -0.01490673
173559292  0.46964220  0.03304830
174743401 -0.09559708 -0.04008357
174755092  0.18573450 -0.02796675
179986251 -1.15298238 -0.13414677
179989511 -1.10176634 -0.14879591
> head(ranef(m)$schid)
      (Intercept)        year      female  year:female
2020  0.223138136  0.18956016 -0.17981166  0.007486238
2040  0.097887246  0.11542878 -0.09179669  0.004223772
2180 -0.003541226 -0.11315393  0.03201687  0.001413563
2330  0.144209600 -0.06703731  0.05946655 -0.032387448
2340  0.234471831 -0.03703165 -0.07267607 -0.005913269
2380 -0.043747381  0.11065318 -0.06507523  0.015913536

@elbersb (Author) commented Sep 7, 2021

A small variation that aligns a bit more with matrix notation:

Screen Shot 2021-09-07 at 13 06 21

@datalorax (Owner)

Yes, now that I'm looking at that one more closely, I see it does have some issues. Let me look at a different model (I'll file an issue to fix that specific model if I end up sticking with (mostly) the current notation) and I'll try to work up a simulation to illustrate the notation I'm going for.

@datalorax (Owner) commented Sep 8, 2021

Okay, I think the primary issue with the notation was that I was trying not to use different Greek letters at each level. But that just doesn't end up working. If we do use different Greek notation at each level, however, I think it all works. This is kind of a mashup between the Gelman and Hill notation and the Raudenbush and Bryk notation, but I think it works.

The model I'm working with is this one

library(equatiomatic)
library(lme4)
#> Loading required package: Matrix
library(tidyverse)

# calculate district means
dist_mean <- tapply(
  sim_longitudinal$score,
  sim_longitudinal$district,
  mean
)

# put them in a df to merge
dist_mean <- data.frame(
  district = names(dist_mean),
  dist_mean = dist_mean
)

# create a new df with dist_mean added
d <- merge(sim_longitudinal, dist_mean, by = "district")

d <- d %>% 
  mutate(prop_low = (prop_low - mean(prop_low)) * 100,
         dist_mean = dist_mean - mean(dist_mean))

m <- lmer(score ~ wave + group + treatment + prop_low * dist_mean +
            (wave | sid) + (wave | school) + (wave + prop_low | district),
          data = d
)
#> boundary (singular) fit: see ?isSingular

Created on 2021-09-08 by the reprex package (v0.3.0)

It's highly overfit, but it illustrates a lot of the complexities I'd like it to. There are group-level predictors, cross-level interactions, and terms varying across various levels (including the group-level predictors).

I propose modifying the existing output from extract_eq() to the below. By using different Greek notation at each level, and using the superscripts to keep track of everything, it should all work.

Screen Shot 2021-09-08 at 10 44 39 AM

Here's a simulation that illustrates this equation formulation. It does take a fair amount of time to run, however (and assumes you've already run the above).

########### Set params ########### 
theta0_delta0.gamma0.alpha <- 102 # overall intercept
theta0_delta0.gamma0.beta1 <- 0.2 # fixed effect: wave
gamma1_alpha <- -5.3 # fixed effect: grouplow
gamma2_alpha <- -4.08 # fixed effect: group medium
gamma3_alpha <- -2.12 # fixed effect: treatment1
theta0_delta1.gamma0.alpha <- 0.18 # fixed effect: prop_low
theta1_delta0.gamma0.alpha <- 1.36 # fixed effect: dist_mean
theta1_delta1.gamma0.alpha <- -0.03 # fixed effect: prop_low:dist_mean

# Variance-Covariance matrices
sid_vcv <- VarCorr(m)$sid
sch_vcv <- VarCorr(m)$school
dist_vcv <- matrix(
  c(0.15, -0.5, 0.01,
    -0.5, 2.09, -0.01,
    0.01, 0.01, 0.01),
  byrow = TRUE,
  ncol = 3,
  dimnames = list(c("intercept", "prop_low", "wave"),
                  c("intercept", "prop_low", "wave"))
)

# Residual variance
m_sigma <- 0.5

# N's
n_districts <- 100
n_schools_per_dist <- 20
stu_per_school <- 50
n_waves <- 8

# create some data; note `d_` is used to denote "data", not parameters
sim_data <- tibble(
  district = seq_len(n_districts),
  d_dist_mean = rnorm(n_districts, mean(d$dist_mean), sd(d$dist_mean)),
) %>% 
  mutate(sch_data = map(
    seq_len(nrow(.)), function(x) {
      tibble(schid = seq_len(n_schools_per_dist),
             d_prop_low = rnorm(n_schools_per_dist, mean(d$prop_low), sd(d$prop_low)))
    }
  )) %>% 
  unnest(sch_data) %>% 
  mutate(schid = paste0(district, ".", schid)) %>% 
  group_by(schid) %>% 
  nest() %>% 
  ungroup() %>% 
  mutate(stu_data = map(
    seq_len(nrow(.)), function(x) {
      tibble(sid = seq_len(stu_per_school),
             d_group_low = c(rep(1, floor(stu_per_school/3)), 
                             rep(0, stu_per_school - floor(stu_per_school/3))),
             d_group_medium = c(rep(0, stu_per_school - floor(stu_per_school/3)),
                                rep(1, floor(stu_per_school/3))),
             d_treatment = c(rep(0, stu_per_school/2), rep(1, stu_per_school/2)))
    }
  )) %>% 
  unnest(data) %>%
  unnest(stu_data) %>% 
  mutate(sid = paste0(schid, ".", sid)) %>% 
  nest_by(sid) %>% 
  mutate(d_wave = list(seq_len(n_waves) - 1)) %>% 
  ungroup() %>% 
  unnest(data) %>% 
  unnest(d_wave)

# simulation function
simulate_obs <- function(means, vcv) {
  sim <- MASS::mvrnorm(1, means, vcv)
  nms <- names(sim)
  matrix(sim, nrow = 1) %>% 
    as_tibble() %>% 
    setNames(nms)
}

########### Conduct simulation ########### 

# district level
dist_pars <- sim_data %>% 
  distinct(district, .keep_all = TRUE) %>% 
  rowwise() %>% 
  mutate(
    intercept = theta0_delta0.gamma0.alpha + 
      (theta1_delta0.gamma0.alpha * d_dist_mean),
    plow = theta0_delta1.gamma0.alpha,
    slope = theta0_delta0.gamma0.beta1,
    dist = list(
     simulate_obs(c(intercept, plow, slope), dist_vcv)
  )) %>% 
  select(district, dist) %>% 
  unnest(dist) %>% 
  rename(dist_intercept = intercept,
         dist_prop_low = prop_low,
         dist_wave = wave)

sim_data <- sim_data %>% 
  left_join(dist_pars)

# school level, simulated from each corresponding district mean
sch_pars <- sim_data %>% 
  distinct(schid, .keep_all = TRUE) %>% 
  rowwise() %>% 
  mutate(
    intercept = dist_intercept + 
      (dist_prop_low * d_prop_low) +
      (theta1_delta1.gamma0.alpha * (d_prop_low * d_dist_mean)),
    slope = dist_wave,
    sch = list(
     simulate_obs(c(intercept, slope), sch_vcv) 
  )) %>% 
  select(schid, sch) %>% 
  unnest(sch) %>% 
  rename(sch_intercept = `(Intercept)`,
         sch_wave = wave)

sim_data <- sim_data %>% 
  left_join(sch_pars)

# student level, simulated from each corresponding school mean
sid_pars <- sim_data %>% 
  distinct(sid, .keep_all = TRUE) %>% 
  rowwise() %>%
  mutate(intercept = sch_intercept + (gamma1_alpha * d_group_low) + 
           (gamma2_alpha * d_group_medium) + (gamma3_alpha * d_treatment),
         slope = sch_wave) %>%
  mutate(stu = list(
    simulate_obs(c(intercept, slope), sid_vcv) 
  )) %>% 
  select(sid, stu) %>% 
  unnest(stu) %>% 
  rename(stu_intercept = `(Intercept)`,
         stu_wave = wave)

sim_data <- sim_data %>% 
  left_join(sid_pars)

# Observation level, simulated for each student
simulated <- sim_data %>% 
  rowwise() %>% 
  mutate(mean = stu_intercept + (stu_wave * d_wave) ,
         score = rnorm(1, mean, m_sigma)) %>% 
  select(sid:d_wave, score)

# estimate model
m_sim <- lmer(score ~ d_wave + d_group_low + d_group_medium + d_treatment + 
                d_prop_low * d_dist_mean +
            (d_wave | sid) + (d_wave | schid) + (d_wave + d_prop_low | district),
          data = simulated
)

# compare
tibble(
  param = names(fixef(m)),
  true = c(102, 0.2, -5.3, -4.08, -2.12, 0.18, 1.36, -0.03),
  estimated = fixef(m_sim)
)
#> # A tibble: 8 × 3
#>   param                true   estimated
#>   <chr>               <dbl>       <dbl>
#> 1 (Intercept)        102    101.8998   
#> 2 wave                 0.2    0.1989466
#> 3 grouplow            -5.3   -5.231331 
#> 4 groupmedium         -4.08  -4.028604 
#> 5 treatment1          -2.12  -2.161790 
#> 6 prop_low             0.18   0.2271691
#> 7 dist_mean            1.36   1.361268 
#> 8 prop_low:dist_mean  -0.03  -0.1093143
 
# true student vcv
sid_vcv 
#>             (Intercept)        wave
#> (Intercept)  86.6246703 -0.57107437
#> wave         -0.5710744  0.08556414
#> attr(,"stddev")
#> (Intercept)        wave 
#>   9.3072375   0.2925135 
#> attr(,"correlation")
#>             (Intercept)       wave
#> (Intercept)   1.0000000 -0.2097616
#> wave         -0.2097616  1.0000000

# estimated student vcv
VarCorr(m_sim)$sid
#>             (Intercept)      d_wave
#> (Intercept)  86.2525184 -0.57587790
#> d_wave       -0.5758779  0.08577146
#> attr(,"stddev")
#> (Intercept)      d_wave 
#>   9.2872234   0.2928676 
#> attr(,"correlation")
#>             (Intercept)     d_wave
#> (Intercept)   1.0000000 -0.2117255
#> d_wave       -0.2117255  1.0000000

# true school vcv
sch_vcv
#>             (Intercept)          wave
#> (Intercept)  9.93157307 -1.399767e-02
#> wave        -0.01399767  1.972852e-05
#> attr(,"stddev")
#> (Intercept)        wave 
#> 3.151439841 0.004441679 
#> attr(,"correlation")
#>             (Intercept)       wave
#> (Intercept)   1.0000000 -0.9999991
#> wave         -0.9999991  1.0000000

# estimated school vcv
VarCorr(m_sim)$schid
#>             (Intercept)        d_wave
#> (Intercept)   9.5328845 -1.157120e-02
#> d_wave       -0.0115712  1.404535e-05
#> attr(,"stddev")
#> (Intercept)      d_wave 
#> 3.087536967 0.003747713 
#> attr(,"correlation")
#>             (Intercept) d_wave
#> (Intercept)           1     -1
#> d_wave               -1      1

# true district vcv
dist_vcv
#>           intercept prop_low  wave
#> intercept      0.15    -0.50  0.01
#> prop_low      -0.50     2.09 -0.01
#> wave           0.01     0.01  0.01

cov2cor(dist_vcv)
#>            intercept    prop_low        wave
#> intercept  1.0000000 -0.89299953  0.25819889
#> prop_low  -0.8929995  1.00000000 -0.06917145
#> wave       0.2581989  0.06917145  1.00000000

# estimated district vcv (note the columns/rows are in a different order)
VarCorr(m_sim)$district
#>             (Intercept)     d_wave  d_prop_low
#> (Intercept)  0.22203609 0.03213051 -0.35245848
#> d_wave       0.03213051 0.01099242  0.01759074
#> d_prop_low  -0.35245848 0.01759074  1.82081966
#> attr(,"stddev")
#> (Intercept)      d_wave  d_prop_low 
#>   0.4712071   0.1048447   1.3493775 
#> attr(,"correlation")
#>             (Intercept)    d_wave d_prop_low
#> (Intercept)   1.0000000 0.6503679 -0.5543228
#> d_wave        0.6503679 1.0000000  0.1243380
#> d_prop_low   -0.5543228 0.1243380  1.0000000

I'm still not going to implement this immediately, but I think this is the direction I'd like to move in. I also think it would be good to have an additional option that includes all of the fixed effects on a single line, similar to what you have outlined above. That could be an additional argument to extract_eq(), perhaps a logical named something like "mixed_model", where FALSE would return as above and TRUE would return something similar to what you've outlined.

@elbersb (Author) commented Sep 15, 2021

Sorry for the late reply. Yes, that seems to fix the problems. I do think the alternative notation helps with relating the quantities in the equation to the model output, so if you consider putting in a switch to allow for that kind of notation, I'd be all for it.

@jarbet commented Jun 12, 2024

@datalorax : first, thanks so much for making the equatiomatic R package. I'm finding it extremely useful!

I want to echo @elbersb's OP, since I am also confused about the lack of additive notation for random effects. For example:

library(lme4);
#> Loading required package: Matrix
library(equatiomatic); # extract math latex formula for LMM
library(texPreview); # display math latex formula

### simulate example multilevel data ###########################
n.patients <- 100;
sim.data.one.patient.one.gene <- function(patient.id = 1, n.foci = 2, n.lesion = 2) {
    data <- data.frame(
        lesion.id = rep(1:n.lesion, each = n.foci),
        foci.id = rep(1:n.foci, times = n.lesion)
        );
    data$abundance <- rnorm(n = nrow(data));
    data$patient.id <- patient.id;
    return(data);
    }

set.seed(123);
sim.data <- lapply(
    X = 1:n.patients,
    FUN = function(x) {
        sim.data.one.patient.one.gene(patient.id = x)
        }
    );
sim.data <- do.call(rbind, sim.data);
sim.data <- sim.data[,c('patient.id', colnames(sim.data)[colnames(sim.data) != 'patient.id'])];
sim.data$treat <- sample(
    x = c('pre', 'post'),
    size = nrow(sim.data),
    replace = TRUE
    );
sim.data$cohort <- sample(
    x = c('cohort.1', 'cohort.2'),
    size = nrow(sim.data),
    replace = TRUE
    );
sim.data$treat <- factor(sim.data$treat);
sim.data$cohort <- factor(sim.data$cohort);
sim.data$patient.id <- factor(sim.data$patient.id);
sim.data$lesion.id <- factor(sim.data$lesion.id);
sim.data$foci.id <- factor(sim.data$foci.id);

### fit lmer ###########################

fit <- lmer(
    formula = abundance ~ treat + cohort + (1|patient.id/lesion.id),
    data = sim.data
    );
#> boundary (singular) fit: see help('isSingular')

tex_preview(extract_eq(fit))

Created on 2024-06-12 with reprex v2.0.2

image

Question:

Am I correct that your random intercept notation implies $\alpha_{j[i],k[i]} = \alpha_{j[i]} + \alpha_{k[i]}$?

The latter seems more intuitive to me. Any reason you went with the former? I suppose if I use your formulas for a publication, I can just include a note explaining the above notation.
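If the combined subscript is read the way this thread settled on (zero-mean, level-specific offsets that add), the identity would hold, as a sketch (which index corresponds to patient vs. lesion in the nested (1 | patient.id/lesion.id) specification is my assumption):

```latex
\alpha_{j[i],k[i]} = \alpha_{j[i]} + \alpha_{k[i]}, \qquad
\alpha_j \sim N\!\left(0,\ \sigma^2_{\alpha^{\text{lesion}}}\right), \quad
\alpha_k \sim N\!\left(0,\ \sigma^2_{\alpha^{\text{patient}}}\right)
```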

@datalorax (Owner)

I think I would use equatiomatic to extract the code but then manually modify it so it matches what is shown above. Unfortunately, I do not have time to make fixes to this now, but the package has a new maintainer (@phgrosjean), and he may be able to implement a fix.

@jarbet commented Jun 12, 2024

I think I would use equatiomatic to extract the code but then manually modify it so it matches what is shown above. I unfortunately do not have time to make fixes to this now, but it does have a new maintainer (@phgrosjean), and he may be able to implement a fix.

Got it, will do. I'm having a pretty hard time following the notation in your post above. In any case, I can just manually edit the LaTeX if I find a notation that makes more sense to me.
