Progress Bar for GLMM #331

DarioS · 2024-05-22T06:00:09Z

Is it feasible to add a progress bar if a mixed model is used? It has been running for the past five hours without a progress message.

Running GLMM model - this may take a few minutes
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
3410790 dario     20   0   39.0g  21.2g  99840 R  4250   2.8    6d+18h rsession

It also appears to consume all CPUs on a server, causing CPU starvation due to RStudio Server ignoring a user's ~/.bashrc file with:

export OPENBLAS_NUM_THREADS=1

I wonder how to choose between different options for glmm.solver in case one of them is less resource-intensive. I tried "Fisher".

The text was updated successfully, but these errors were encountered:

MikeDMorgan · 2024-05-22T08:28:38Z

Resource use is controlled by a combination of OpenMP and BiocParallel - the latter for parallelising over the nhoods, and OpenMP for multi-threading in the Rcpp code. A default for glmm.solver will be added. In terms of choosing between solvers, Fisher is Quasi-Newton, so should have fast convergence when close to the (local) optimum. The HE is good for solving with a pre-defined covariance matrix, such as a genetic relationship matrix. HE-NNLS is implemented for the cases where variance parameters are negative, which is caused by "true" values being very close to zero. The solver will give a message and highlight that it is switching to the constrained optimisation.

GLMMs tend to scale poorly - while I have implemented a range of measures to cut this down, i.e. parallelisation and multi-threading, however, the bottlenecks are several dense matrix multiplications which are ~ $O(n^3)$.

I'm not sure why your Rstudio is ignoring environmental variables, as this is the recommendation: https://cran.r-project.org/doc/manuals/r-release/R-exts.html#OpenMP-support - you may need to seek Rstudio support to resolve the issue.

DarioS · 2024-06-13T08:00:02Z

Specifying sample ID as a fixed effect isn't a quick workaround to avoid enriched neighbourhoods comprised of one sample.

design <- data.frame(colData(MILOdata))[, c("Sample", "Smoker")]
design <- distinct(design)
rownames(design) <- design$Sample
testNhoods(MILOdata, design = ~ Smoker + Sample, design.df = design, fdr.weighting = "graph-overlap")
Using TMM normalisation
Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset,  : 
  Design matrix not of full rank. The following coefficients not estimable:
 SampleOSCC_26-P

yulijia · 2024-06-15T01:07:32Z

> library(BiocParallel)
> da_results <- testNhoods(hn_milo, design = ~ Condition+(1|patient_ID),
+                          design.df = hn_design,
+                          fdr.weighting="graph-overlap")
Random effects found
Using TMM normalisation
Running GLMM model - this may take a few minutes
Error in testNhoods(hn_milo, design = ~Condition + (1 | patient_ID), design.df = hn_design,  : 
  Lowest traceback returned: 5: handle_error(e)
4: h(simpleError(msg, call))
3: .handleSimpleError(function (e) 
   {
       annotated_condition <- handle_error(e)
       stop(annotated_condition)
   }, "argument is of length zero", base::quote(if (!glmm.control$solver %in% 
       c("HE", "Fisher", "HE-NNLS")) {
       stop(glmm.control$solver, " not recognised - must be HE, HE-NNLS or Fisher")
   }))
2: fitGLMM(X = Xmodel, Z = Zmodel, y = Y[i, ], offsets = off.sets, 
       random.levels = randlevels, REML = reml, dispersion = disper[i], 
       geno.only = genonly, Kin = kinship, glmm.control = glmm.contr)
1: FUN(...)
In addition: There were 16 warnings (use warnings() to see them)

Hi @MikeDMorgan, I performed a GLMM analysis. However, it threw an error that I couldn't identify. Is there any way to check where the error is coming from in this situation?

DarioS · 2024-06-15T04:00:03Z

Parameter glmm.solver is mandatory if a mixed model fitted, but it had no default. That one was already resolved by #328.

yulijia · 2024-06-16T00:55:35Z

Here is another problem: after performing the GLMM, my DA results include NA values. However, plotNhoodGraphDA and groupNhoods cannot handle NA values.

da_results
logFC logCPM SE tvalue PValue patient_ID variance Converged Dispersion Logliklihood Nhood
1 -22.8268334 8.004828 10552.160078 -0.002163238 0.9982740 3.540525e+01 FALSE 0.01097306 -82.278581 1
2 -22.4599347 10.410420 10536.597689 -0.002131612 0.9982992 9.792410e+00 FALSE 0.01154928 -84.030955 2
3 -21.0766982 9.975377 10408.624273 -0.002024926 0.9983843 9.374782e-01 FALSE 0.01201461 -129.165084 3
4 NA 10.662702 NA NA NA NA FALSE NA NA 4
5 -22.1270556 9.093788 10354.004528 -0.002137053 0.9982949 7.603262e-01 FALSE 0.01194637 -64.592597 5
6 -22.3102589 8.740509 10457.424887 -0.002133437 0.9982978 3.258474e+01 FALSE 0.01143842 -110.076552 6
7 -21.4542960 13.159483 10246.905729 -0.002093734 0.9983294 2.088114e+01 FALSE 0.01157411 -81.276525 7

plotNhoodGraphDA(hn_milo, da_results, layout="UMAP",alpha=0.1)
Error in [<-.data.frame(*tmp*, signif_res$SpatialFDR > alpha, res_column, :
missing values are not allowed in subscripted assignments of data frames

da_results <- groupNhoods(hn_milo, da_results,compute.new =T,da.fdr =0.1)# max.lfc.delta = 2,
Found NA DA neighbourhoods at FDR 10%
Error in ..subscript.2ary(x, l[[1L]], l[[2L]], drop = drop[1L]) :
NA subscripts in x[i,j] not supported for 'x' inheriting from sparseMatrix
In addition: Warning message:
In groupNhoods(hn_milo, da_results, compute.new = T, da.fdr = 0.1) :
NA values found in SpatialFDR vector

MikeDMorgan · 2024-06-20T08:15:48Z

@yulijia You can use subset.nhoods to plot the nhoods with non-NA values. These usually arise because there is model separation - you have a variable that perfectly separates zero from non-zero counts in one or more nhoods. You can check for these using the checkSeparation function, with the goal being to adjust either your NN-graph or nhood refinement such that you don't get this separation (or you exclude these from your DA testing).

On a separate note - your issue is unrelated to the OP, so you should open a new issue not append yours to the end of this one.

MikeDMorgan closed this as completed Jun 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Progress Bar for GLMM #331

Progress Bar for GLMM #331

DarioS commented May 22, 2024

MikeDMorgan commented May 22, 2024

DarioS commented Jun 13, 2024

yulijia commented Jun 15, 2024

DarioS commented Jun 15, 2024

yulijia commented Jun 16, 2024

MikeDMorgan commented Jun 20, 2024 •

edited

Progress Bar for GLMM #331

Progress Bar for GLMM #331

Comments

DarioS commented May 22, 2024

MikeDMorgan commented May 22, 2024

DarioS commented Jun 13, 2024

yulijia commented Jun 15, 2024

DarioS commented Jun 15, 2024

yulijia commented Jun 16, 2024

MikeDMorgan commented Jun 20, 2024 • edited

MikeDMorgan commented Jun 20, 2024 •

edited