-
Notifications
You must be signed in to change notification settings - Fork 49
Description
This is at least partly a StatsModels.jl issue, partly a mixedmodels issue, and partly a julia/macro issue.
I think what you're trying to do is construct a formula from a list of variable names. For more information on that, see the docs for statsmodels on constructing a formula at run-time: https://juliastats.org/StatsModels.jl/stable/formula/#Constructing-a-formula-programmatically-1. You can do the fixed effects part easily:
julia> fix = term(1) + sum(term.("x$n" for n in 1:3))
1
x1(unknown)
x2(unknown)
x3(unknown)The mixedmodels issue is that constructing a random-effects term at run-time using that technique is tricky, but it can be done. The reason is that you have to construct the special RandomEffectsTerm used by mixedmodels, but that's assumed to happen after the schema has already been extracted (see this section of the statsmodels docs: https://juliastats.org/StatsModels.jl/stable/internals/#The-lifecycle-of-a-@formula-1, and this code from mixedmodels on how the random effects syntax with | is implemented: https://github.com/JuliaStats/MixedModels.jl/blob/master/src/randomeffectsterm.jl; note especially how the constructor for RandomEffectsTerm checks whether the grouping is an interaction or categorical term, which is only known after the schema is applied). This is to my mind a legitimate quality of life issue. At the moment you'll have to construct the concrete versions of all the terms and assemble the RandomEffectsTerm manually. While this technically works (in that it will give you a formula you can fit with mixedmodels) it isn't going to give you any of the bumpers that mixedmodels normally provides (e.g., if you create multiple random effects terms this way, then you might end up with ... "interesting" (mis-specified) models), so it's definitely a "here be dragons" kinda thing that we should fix. So with the caveat that I'm handing you a big foot-gun here, this is how I'd do it:
julia> ran = sum(term.("z$n" for n in 1:2))
z1(unknown)
z2(unknown)
julia> ran = apply_schema(sum(term.("z$n" for n in 1:2)), schema(dat), MixedModel)
z1(continuous)
z2(continuous)
julia> grouping = StatsModels.concrete_term(term(:g), dat, Dict(:g => Grouping()))
g(Grouping:5→5)
julia> reterm = MixedModels.RandomEffectsTerm(ran, grouping)
(z1 + z2 | g)
julia> formula = term(:y) ~ term(1) + fix + reterm
FormulaTerm
Response:
y(unknown)
Predictors:
1
x1(unknown)
x2(unknown)
x3(unknown)
(z1 + z2 | g)This gets us most of the way there but it still doesn't work unfortunately. The issue is that MixedModels tries to extract a schema and doesn't recognize that RandomEffectsTerms don't need them...we can duck patch that by defining StatsModels.needs_schema(::RandomEffectsTerm) = false but that's better fixed here or in statsmodels IMO.
In the interest of fixing this the right way, here are the missing pieces in MixedModels at the moment
- make
apply_schema(::FunctionTerm{typeof(|)}, ...)create an "non-concrete"RandomEffectsTerm - add additional
apply_schema(::RandomEffectsTerm, ...)which makes it concrete, does the validity checks done in teh inner constructor right now - add run-time methods for
Base.|(::AbstractTerm, ::AbstractTerm)to create aRandomEffectsTerm, or if we don't want to do type piracy, use a different syntax (likereorretermor something...)
Finally, the most superficial reason why your attempt doesn't work is a Julia/macro issue: @formula(f) only sees one thing as its argument, the Symbol :f. Macros in julia can only do transformations of the surface syntax (symbols, exprs, strings, and literal numbers). Reading the docs on macros in julia will be helpful, and people on slack/discourse/zulip can answer other questions in that direction. If you're going to go this way (which is not recommended), then you need to use eval and $f.
Originally posted by @kleinschmidt in #426 (comment)