New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add Grouping contrasts #339
Conversation
based on JuliaStats/StatsModels.jl#181 |
Codecov Report
@@ Coverage Diff @@
## master #339 +/- ##
==========================================
- Coverage 95.75% 95.63% -0.13%
==========================================
Files 22 23 +1
Lines 1485 1512 +27
==========================================
+ Hits 1422 1446 +24
- Misses 63 66 +3
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The catch is that, at the moment, you have to manually specify the
grouping contrasts when constructing the model. I've added a guard to catch OOM
errors during schema construction which prints a warning, but we could also
throw a different error there instead to be clearer.
I think this is the way to go for now, but we should consider automatically setting the contrasts on grouping variables and doing this separately from the fixed-effects contrasts. This would allow us a bit of magic that could be good or bad, namely having a discrete, yet numeric fixed effect that also appears as a grouping variable. You can think of this as an approximation for a GAM (it's not quite the same as a GAM, but GAMs can indeed be rewritten as mixed models). Right now, you can of course still do this by having two copies of the same variable, one categorical and one numeric. For our planned neuro time-series stuff, this is actually surprisingly relevant because we time as a discrete numeric variable and expect non linear temporal trends.
if isa(e, OutOfMemoryError) | ||
@warn "Random effects grouping variables with many levels can cause out-of-memory errors. Try manually specifying `Grouping()` contrasts for those variables." | ||
end | ||
rethrow(e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the way to go right now, but I'm thinking about whether we should automatically set Grouping
contrasts for grouping variables.
Co-authored-by: Phillip Alday <palday@users.noreply.github.com>
Yeah I agree @palday, it would be nice to be able to do this automatically. It actually may be possible now but would take some annoying fiddling; you'd have to unpack the |
I think at least some of the CI failures are not unrelated to this PR but have to do with how the OS handles OOM. The MacOS one failed before too, apparently during the tests for the added functionality where I intentionally trigger an OOM error. And the cancelled jobs might be related to this as well, but I'm not sure, although the passing tests on at least one windows build tells me that OOM doesn't always result in a canceled job. I guess the thing to do is to remove the tests which intentionally cause an OOM because otherwise we'll always have failing CI which...defeats the purpose. |
@kleinschmidt The failures on Future are a known issue. For Windows .... I'll have to look at the details of the GitHub Action / its docker image to figure out why it just cancels. For OS X: do you have a local Mac? |
I'm afraid not. |
I have many local macs |
@kliegl can you check out this PR branch and run the tests? The procedure is to
|
The |
ah yeah you'll need to first go to that directory, with
|
It is running now. I just had to break it into two commands. |
ERROR: Package MixedModels errored during testing (received signal: KILL) |
Yeah looks from that like the process is just killed when it hits the OOM tests. I'll push a change that only tests the good path (I'm sufficiently convinced that trying to construct a 1,000,000x1,000,000 dense matrix will OOM...) |
Just ran it again: Do you need the report? |
Looks like the CI now is failing because of a problem with the artifacts... |
@kleinschmidt Yeah, I'm still sorting out the artifact link stuff. Right now, the default link always points to the newest version, but that of course doesn't match the SHA for older versions. I'll see if I can't fix that as part of #341 |
@kleinschmidt assuming that all the CIs (potentially except Future) pass, I think this is good to go. |
* add Grouping contrasts * export and add a getproperty patch to work around statsmodels * warn users to use Grouping when OOM during schema call * Update src/grouping.jl Co-authored-by: Phillip Alday <phillip.alday@mpi.nl> (cherry picked from commit 2f9d423)
* add Grouping contrasts (#339) Co-authored-by: Dave Kleinschmidt <dave.f.kleinschmidt@gmail.com>
This is in lieu of having these defined in StatsModels.jl; as we've discussed
they may be a bit to specialized for the specific demands of grouping variables
in MixedModels without defining enough functionality to be of use in other
settings, so it makes sense to have them here until some kind of sensible
handling of grouping is incorporated into StatsModels.
The
Grouping
contrasts are a "pseudocontrast" that never defines a contrastsmatrix, just keeps track of the unique values and an "inverse index" which maps
levels to a numeric index. This allows you to get around the
OutOfMemoryError
that's triggered when you have a grouping variable with a very large number of
levels. The catch is that, at the moment, you have to manually specify the
grouping contrasts when constructing the model. I've added a guard to catch OOM
errors during schema construction which prints a warning, but we could also
throw a different error there instead to be clearer. The downside with that
would be that if it's NOT a problem with the ranef grouping term, then they'd
just get useless advice and no useful stacktrace, so that's why I went with
printing a warning.