Replies: 3 comments 4 replies
-
|
To show you the difference: Two graphs...same data...same R code....different gllvm versions (<> May 2024) |
Beta Was this translation helpful? Give feedback.
-
|
Yes....I will do that on Friday. Will use your online spider data.
By the way...20 people...20 laptops..1/3th with no problems for this specific simulation, 1/3th with an error, 1/3th with simulated CIs that don't make sense. So...something going on there.
Kind regards,
Alain
…On 3 Dec 2024 at 16:54 +0000, Bert ***@***.***>, wrote:
Can you turn this into a reproducible example? When I try to use your function I get the error "MyData.theta is missing", because the data is never created.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you authored the thread.Message ID: ***@***.***>
|
Beta Was this translation helpful? Give feedback.
-
|
Thanks for sending my the details on this per e-mail, Alain. Your code seems fine, and there is no bug. This is a classical case of a situation where the model struggles to converge, and there are multiple similar solutions in close vicinity of each other. Some of those solutions provide more numerical issues than others, which is why the confidence intervals for some of your participants are much larger than for others. This is somewhat related to a change in the calculation of the standard errors, in an attempt to retain some numerical accuracy for models that poorly converge, and to flag them better. Previously, these could sometimes result in considerably too optimistic standard errors (read: close to zero). Here are a few tips/workarounds for this:
There are a bunch of other tips I have collected over time, but you get the idea: this particularly model struggles to converge, and that is causing the issues. This will be different for different participants, just because their computer architecture is different. |
Beta Was this translation helpful? Give feedback.


Uh oh!
There was an error while loading. Please reload this page.
-
Hello Bert,
Last year (23 November 2023), you helped me to write some code to simulate sigma.lv * theta a large number of times, and use these to get an (approximate) 95% CI. It worked ok up to at least May 2024. I've now updated gllvm to the latest development version (and updated all packages), but the simulated results don't match the original 'sigma.lv * theta' coming out of the model anymore.
So..something must have changed. Is that a bug in gllvm, or do I need to update the R code below?
#' Function for simulating theta and sigma.lv a large
#' number of times for a GLLVM.
MySimGLLVMthetas <- function(MyModel = MyModel,
Np = Np) {
#MyModel = M1.nb
#Np = ncol(SpecParasit)
#' Input:
#' MyModel: Name of the gllvm model.
#' Np: Number of species
#' Output of function: MyData.theta
#' - The MyData.theta contains Np rows (1 for each species).
#' - The gllvm software estimates a theta and a sigma.lv, but
#' we need the cross-product, and a 95% CI for this cross-product.
#' - The theta.Sigmalv is the cross product.
#' - The lower and upper bounds define a 95% CI.
#' Calculate a 95% confidence interval for the loadings.
#' Our first attempt:
#' MyData.theta$lowerWrong <- MyData.theta$theta.Sigmalv - 1.96 * MyModel$sd$theta
#' MyData.theta$upperWrong <- MyData.theta$theta.Sigmalv + 1.96 * MyModel$sd$theta
#' Technically, this is not correct. The MyModel$params$theta are
#' parameters, and MyModel$params$sigma.lv is also a (strictly positive)
#' parameter. Hence, we have a cross-product of two parameters,
#' and one of them is positive. Getting a mathematical expression for the
#' SE of this cross-product is tricky. When something is tricky, to
#' bootstrap/simulate.
#' Plan:
#' 1. Simulate 10,000 new sets of theta and sigma.lv.
#' This is achieved by simulating from a multivariate normal
#' distribution with mean and covariance matrix coming from
#' the estimated model.
#' 2. Extract the 10,000 simulated theta and sigma.lv values.
#' 3. Calculate 10,000 times: theta * sigma.lv
#' 4. Access the 2.5% and 97.5% values of this cross-product.
#' 5. Plot the whole thing.
#' Let's implement this.
#' 1. Simulate 10,000 new sets of all parameter. We need to get
#' the estimated values and the corresponding covariance matrix.
#' That is slightly ugly:
#' - mean values: MyModel$TMBfn$par[MyModel$Hess$incl]
#' - covariance matrix: MyModel$Hess$cov.mat.mod
#' And not all parameters are on the real scale; some use a different
#' internal parametrisation. For example, for the simulated
#' sigma.lv we need to take abs(sigma.lv).
SimPar <- MASS::mvrnorm(n = 10000,
mu = MyModel$TMBfn$par[MyModel$Hess$incl],
Sigma = MyModel$Hess$cov.mat.mod)
#' dim(SimPar)
#' This object contains 10,000 rows and ** parameters.
#' Each row contains a set of simulated parameters from the model.
#' SimPar[1,] #' First simulated data set.
#' SimPar[2,] #' Second simulated data set.
#' SimPar[3,] #' Third simulated data set.
#' Explain the number of parameters
#' Answer:
#' -We have Np species, 6 covariates and the sample size = N
#' -The total number of parameters =
#' Np intercepts +
#' 6 * Np slopes +
#' Np thetas (factor loadings)
#' Np + 6 * Np + Np
#' Why are those names so funny?
#' These are the names in that Hessian matrix
#'names(MyModel$TMBfn$par[MyModel$Hess$incl])
#' Those are rather cumbersome names.
#' - The lambdas are the thetas (factor loadings).
#' - The bs are the b0 (intercepts) and betas (slopes).
#' - And there is also a sigmaLV.
#' - Really? This is TMB?
#' Anyway...let us stop complaining and go on with the
#' simulation.
#' Grab the 10,000 simulated sigmaLV:
sigma.lv.10000 <- SimPar[,colnames(SimPar) == "sigmaLV"]
length(sigma.lv.10000)
#' Apply the abs() function (see text above).
sigma.lv.10000 <- abs(sigma.lv.10000)
#' Grab the simulated theta values.
#' There are only Np-1 of them as the first theta is set to 1 as otherwise
#' the algorithm cannot estimate the thetas.
theta10000 <- SimPar[,colnames(SimPar) == "lambda"]
#' dim(theta10000)
#' CATCH UP:
#' All we have done is:
#' -Simulate 10,000 sets of parameters.
#' -Extract the 10,000 sigma.lv.
#' -Extract the 10,000 theta
#' We can now 10,000 times calculate theta * sigma.lv
#' Some fancy code from dr. Bert van Veen to calculate
#' theta * sigma.lv for each simulated value, where theta
#' is a vector and sigma.lv is a scalar value.
Perm <- matrix(0,
nrow = MyModel$num.lv,
ncol = ncol(theta10000))
for(i in 1:MyModel$num.lv){
Perm[i,(sum(Perm)+1):(sum(Perm)+ncol(MyModel$y)-i)] <- 1
}
#' Calculate 10,000 times sigma.lv * theta.
SigmalvTimestheta <- (sigma.lv.10000 %*% Perm) * theta10000
#' RECAP:
#' - We used the estimated model parameters and covariance matrix,
#' and simulated 10,000 sets of theta and sigma.lv.
#' - A little issue was that gllvm does not estimate the first
#' factor loading, hence it not simulated neither.
#' - We can calculate 10,000 times sigma.lv * theta,
#' where sigma.lv is a scalar and theta is a vector (in this example).
#' And for each column determine the 2.5% and 97.5% values
loadings.CI <- apply(SigmalvTimestheta,
MARGIN = 2,
FUN = quantile,
probs = c(0.025,0.975))
#' Each column contains the lower and upper bounds for a factor loading
#' for a specific species.
#' head(loadings.CI)
#' Add the results to MyData.theta, but recall that the first
#' loading was set to 1. Hence, it was never simulated.
MyData.theta$lower <- 0
MyData.theta$lower[2:Np] <- loadings.CI["2.5%", ]
MyData.theta$upper <- 0
MyData.theta$upper[2:Np] <- loadings.CI["97.5%", ]
MyData.theta
}
MyData.theta <- MySimGLLVMthetas(MyModel = M1.nb,
Np = ncol(SpecParasit))
head(MyData.theta)
#' This was the outline of the simulation study.
#' 1. Simulate 10,000 new sets of theta and sigma.lv. This is
#' achieved by simulating from a multivariate normal distribution
#' with mean given by the estimated values and covariance matrix
#' also coming from the estimated model.
#' 2. Extract the 10,000 theta and sigma.lv values.
#' 3. Calculate 10,000 times theta * sigma.lv
#' 4. Access the 2.5% and 97.5% values of this cross-product.
#' 5. Plot the whole thing.
#' Creating a nice plot.
#' Reorder the y-axis based on theta.Sigmalv values
MyData.theta$Species <- factor(MyData.theta$Species,
levels = MyData.theta$Species[order(MyData.theta$theta.Sigmalv)])
#' Plot the results.
p2 <- ggplot(MyData.theta,
aes(x = theta.Sigmalv, y = Species)) +
geom_point() +
geom_errorbarh(aes(xmin = lower,
xmax = upper),
height = 0.2) +
geom_vline(xintercept = 0, lty = 2, col = "red") +
theme(text = element_text(size = 8),
legend.position = "none") +
xlab("Theta") + ylab("Species")
p2
Beta Was this translation helpful? Give feedback.
All reactions