non-positive definite Hessian #240
Replies: 3 comments 13 replies
-
|
Thanks. You'll get lower AIC when you include extra fixed effects compared to a random effects autocorrelation structure, because that is usually how it works - random effects add a penalty to the likelihood whereas fixed-effects do not. As such, comparing models with random vs. fixed effects is challenging. In fact, you shouldn't do it; comparing a model with and without random effects constitutes a test on the boundary of the valid parameter space, which is where AIC breaks down (i.e., is invalid to use). Residual diagnostics (as in DHARMa, or the native residuals included in gllvm) do not measure goodness-of-fit, only validity of assumptions. Usually, if a model is finicky it is a sample size problem. How many non-zero observations do you have for each species, and how many effects in the model? |
Beta Was this translation helpful? Give feedback.
-
Beta Was this translation helpful? Give feedback.
-
|
Thank you! For this example I'm looking at factors explaining salmon diet across a sound. After removing empty stomachs, there are 575 fish samples from 13 locations during May-September over three years. It's not a balanced sampling design and some 'sites' are very close to one another and farther from others, so I've coverted latitute and longitude to four PCNM (principle coordinates of neighbour matrices) values. The salmon are genetically IDed to five stocks (based on where they spawn), although one of those groups is 'unknown'. I also have data on the fork length and condition index of the salmon and whether each fish is likely from a hatchery or a natural population. I'm interested in how spatial, temporal, and biological (size, condition, stock, hatchery) affect diet. Diet is mass in grams so I'm using a Tweedie distribution. The variables in the model are: length (fork length) Comparing unconstrained and constrained models: Unconstrained Constrained UCmod gives me 4/15 taxa with mild DHARMa residual issues, predict(se.fit = TRUE) works. Also nice because I can make a species correlation plot. Cmod gives me 6/15 taxa with DHARMa residual issues, predict(se.fit = TRUE) does not work and tells me that the fixed effect covariance matrix is not semi positive definite. In the end, simulating from either model ends up telling a similar story about the impact of most variables (larger fish are more likely to eat bigger things as they grow later in the season, with some spatial differences in what they're eating). The major difference is that the unconstrained model suggests that stock explains 18% of variation, on average (from a variance partitioning plot) , while the constrained model is closer to 1%. That said, running EMmeans on the unconstrained model shows overlapping 95% CIs on the stock means. I'm trusting the unconstrained model here because DHARMa seems a bit better, and it's not throwing the error with singularity in the Hessian. A single LV unconstrained model actually has a lower AICc, but I'm going with the 2 LV model because it's more interpretable (can look at degree of species covariance), with the caveat that it doesn't necessarily explain the data better. What are your thoughts on this? |
Beta Was this translation helpful? Give feedback.

Uh oh!
There was an error while loading. Please reload this page.
-
Thanks for the insight @BertvanderVeen! I hadn't thought to look at the Hessian (and before this didn't know what to look for), but I think you were right as I was seeing both negative and postive values. I was indeed setting Power = NULL already.
Very briefly, I'm modelling diet (biomass) in salmon collected over multiple seasons and years in a large sound (lots and lots of zero biomass and small values for different taxa, then some large values when a salmon eats something big like a fish). So I have biological variables (condition, body size), temporal variables (day of year, year), and spatial variables (I've ended up using PCNM values as predictors rather than an autocorrelation structure on the row effect of collection site, as DHARMa scaled residual plots/tests and AIC both seem to think PCNM does a better job). I was avoiding interactions because I got to a place where DHARMa seemed OK and I didn't want to further complicate matters. However trying emmeans made me realized that my 'best' model likely did have non-positive-defiteness in the Hessian, as you say. Because I know biologically the salmon were using different areas (spatially) depending on body size, I have just tried putting a body size interaction on all my PCNM values, and now I can get emmeans with confience intervals. DHARMa still seems OK as it's going to get for 15 taxa likely, and AIC is lower than the no-interaction model.
So, probably still more exploring to make sure I'm not bungling up anything else, but thanks a ton for pointing me in a productive direction again!
Originally posted by @GCov in #159 (reply in thread)
Beta Was this translation helpful? Give feedback.
All reactions