Skip to content

Programmatically Creating RE Terms #462

@palday

Description

@palday

This is at least partly a StatsModels.jl issue, partly a mixedmodels issue, and partly a julia/macro issue.

I think what you're trying to do is construct a formula from a list of variable names. For more information on that, see the docs for statsmodels on constructing a formula at run-time: https://juliastats.org/StatsModels.jl/stable/formula/#Constructing-a-formula-programmatically-1. You can do the fixed effects part easily:

julia> fix = term(1) + sum(term.("x$n" for n in 1:3))
1
x1(unknown)
x2(unknown)
x3(unknown)

The mixedmodels issue is that constructing a random-effects term at run-time using that technique is tricky, but it can be done. The reason is that you have to construct the special RandomEffectsTerm used by mixedmodels, but that's assumed to happen after the schema has already been extracted (see this section of the statsmodels docs: https://juliastats.org/StatsModels.jl/stable/internals/#The-lifecycle-of-a-@formula-1, and this code from mixedmodels on how the random effects syntax with | is implemented: https://github.com/JuliaStats/MixedModels.jl/blob/master/src/randomeffectsterm.jl; note especially how the constructor for RandomEffectsTerm checks whether the grouping is an interaction or categorical term, which is only known after the schema is applied). This is to my mind a legitimate quality of life issue. At the moment you'll have to construct the concrete versions of all the terms and assemble the RandomEffectsTerm manually. While this technically works (in that it will give you a formula you can fit with mixedmodels) it isn't going to give you any of the bumpers that mixedmodels normally provides (e.g., if you create multiple random effects terms this way, then you might end up with ... "interesting" (mis-specified) models), so it's definitely a "here be dragons" kinda thing that we should fix. So with the caveat that I'm handing you a big foot-gun here, this is how I'd do it:

julia> ran = sum(term.("z$n" for n in 1:2))
z1(unknown)
z2(unknown)

julia> ran = apply_schema(sum(term.("z$n" for n in 1:2)), schema(dat), MixedModel)
z1(continuous)
z2(continuous)

julia> grouping = StatsModels.concrete_term(term(:g), dat, Dict(:g => Grouping()))
g(Grouping:55)

julia> reterm = MixedModels.RandomEffectsTerm(ran, grouping)
(z1 + z2 | g)

julia> formula = term(:y) ~ term(1) + fix + reterm
FormulaTerm
Response:
  y(unknown)
Predictors:
  1
  x1(unknown)
  x2(unknown)
  x3(unknown)
  (z1 + z2 | g)

This gets us most of the way there but it still doesn't work unfortunately. The issue is that MixedModels tries to extract a schema and doesn't recognize that RandomEffectsTerms don't need them...we can duck patch that by defining StatsModels.needs_schema(::RandomEffectsTerm) = false but that's better fixed here or in statsmodels IMO.

In the interest of fixing this the right way, here are the missing pieces in MixedModels at the moment

  • make apply_schema(::FunctionTerm{typeof(|)}, ...) create an "non-concrete" RandomEffectsTerm
  • add additional apply_schema(::RandomEffectsTerm, ...) which makes it concrete, does the validity checks done in teh inner constructor right now
  • add run-time methods for Base.|(::AbstractTerm, ::AbstractTerm) to create a RandomEffectsTerm, or if we don't want to do type piracy, use a different syntax (like re or reterm or something...)

Finally, the most superficial reason why your attempt doesn't work is a Julia/macro issue: @formula(f) only sees one thing as its argument, the Symbol :f. Macros in julia can only do transformations of the surface syntax (symbols, exprs, strings, and literal numbers). Reading the docs on macros in julia will be helpful, and people on slack/discourse/zulip can answer other questions in that direction. If you're going to go this way (which is not recommended), then you need to use eval and $f.

Originally posted by @kleinschmidt in #426 (comment)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions