Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Progress Bar for GLMM #331

Closed
DarioS opened this issue May 22, 2024 · 6 comments
Closed

Progress Bar for GLMM #331

DarioS opened this issue May 22, 2024 · 6 comments

Comments

@DarioS
Copy link

DarioS commented May 22, 2024

Is it feasible to add a progress bar if a mixed model is used? It has been running for the past five hours without a progress message.

Running GLMM model - this may take a few minutes
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
3410790 dario     20   0   39.0g  21.2g  99840 R  4250   2.8    6d+18h rsession

It also appears to consume all CPUs on a server, causing CPU starvation due to RStudio Server ignoring a user's ~/.bashrc file with:

export OPENBLAS_NUM_THREADS=1

I wonder how to choose between different options for glmm.solver in case one of them is less resource-intensive. I tried "Fisher".

@MikeDMorgan
Copy link
Member

Resource use is controlled by a combination of OpenMP and BiocParallel - the latter for parallelising over the nhoods, and OpenMP for multi-threading in the Rcpp code. A default for glmm.solver will be added. In terms of choosing between solvers, Fisher is Quasi-Newton, so should have fast convergence when close to the (local) optimum. The HE is good for solving with a pre-defined covariance matrix, such as a genetic relationship matrix. HE-NNLS is implemented for the cases where variance parameters are negative, which is caused by "true" values being very close to zero. The solver will give a message and highlight that it is switching to the constrained optimisation.

GLMMs tend to scale poorly - while I have implemented a range of measures to cut this down, i.e. parallelisation and multi-threading, however, the bottlenecks are several dense matrix multiplications which are ~ $O(n^3)$.

I'm not sure why your Rstudio is ignoring environmental variables, as this is the recommendation: https://cran.r-project.org/doc/manuals/r-release/R-exts.html#OpenMP-support - you may need to seek Rstudio support to resolve the issue.

@DarioS
Copy link
Author

DarioS commented Jun 13, 2024

Specifying sample ID as a fixed effect isn't a quick workaround to avoid enriched neighbourhoods comprised of one sample.

design <- data.frame(colData(MILOdata))[, c("Sample", "Smoker")]
design <- distinct(design)
rownames(design) <- design$Sample
testNhoods(MILOdata, design = ~ Smoker + Sample, design.df = design, fdr.weighting = "graph-overlap")
Using TMM normalisation
Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset,  : 
  Design matrix not of full rank. The following coefficients not estimable:
 SampleOSCC_26-P

@yulijia
Copy link

yulijia commented Jun 15, 2024

> library(BiocParallel)
> da_results <- testNhoods(hn_milo, design = ~ Condition+(1|patient_ID),
+                          design.df = hn_design,
+                          fdr.weighting="graph-overlap")
Random effects found
Using TMM normalisation
Running GLMM model - this may take a few minutes
Error in testNhoods(hn_milo, design = ~Condition + (1 | patient_ID), design.df = hn_design,  : 
  Lowest traceback returned: 5: handle_error(e)
4: h(simpleError(msg, call))
3: .handleSimpleError(function (e) 
   {
       annotated_condition <- handle_error(e)
       stop(annotated_condition)
   }, "argument is of length zero", base::quote(if (!glmm.control$solver %in% 
       c("HE", "Fisher", "HE-NNLS")) {
       stop(glmm.control$solver, " not recognised - must be HE, HE-NNLS or Fisher")
   }))
2: fitGLMM(X = Xmodel, Z = Zmodel, y = Y[i, ], offsets = off.sets, 
       random.levels = randlevels, REML = reml, dispersion = disper[i], 
       geno.only = genonly, Kin = kinship, glmm.control = glmm.contr)
1: FUN(...)
In addition: There were 16 warnings (use warnings() to see them)

Hi @MikeDMorgan, I performed a GLMM analysis. However, it threw an error that I couldn't identify. Is there any way to check where the error is coming from in this situation?

@DarioS
Copy link
Author

DarioS commented Jun 15, 2024

Parameter glmm.solver is mandatory if a mixed model fitted, but it had no default. That one was already resolved by #328.

@yulijia
Copy link

yulijia commented Jun 16, 2024

Here is another problem: after performing the GLMM, my DA results include NA values. However, plotNhoodGraphDA and groupNhoods cannot handle NA values.

da_results
logFC logCPM SE tvalue PValue patient_ID variance Converged Dispersion Logliklihood Nhood
1 -22.8268334 8.004828 10552.160078 -0.002163238 0.9982740 3.540525e+01 FALSE 0.01097306 -82.278581 1
2 -22.4599347 10.410420 10536.597689 -0.002131612 0.9982992 9.792410e+00 FALSE 0.01154928 -84.030955 2
3 -21.0766982 9.975377 10408.624273 -0.002024926 0.9983843 9.374782e-01 FALSE 0.01201461 -129.165084 3
4 NA 10.662702 NA NA NA NA FALSE NA NA 4
5 -22.1270556 9.093788 10354.004528 -0.002137053 0.9982949 7.603262e-01 FALSE 0.01194637 -64.592597 5
6 -22.3102589 8.740509 10457.424887 -0.002133437 0.9982978 3.258474e+01 FALSE 0.01143842 -110.076552 6
7 -21.4542960 13.159483 10246.905729 -0.002093734 0.9983294 2.088114e+01 FALSE 0.01157411 -81.276525 7

plotNhoodGraphDA(hn_milo, da_results, layout="UMAP",alpha=0.1)
Error in [<-.data.frame(*tmp*, signif_res$SpatialFDR > alpha, res_column, :
missing values are not allowed in subscripted assignments of data frames

da_results <- groupNhoods(hn_milo, da_results,compute.new =T,da.fdr =0.1)# max.lfc.delta = 2,
Found NA DA neighbourhoods at FDR 10%
Error in ..subscript.2ary(x, l[[1L]], l[[2L]], drop = drop[1L]) :
NA subscripts in x[i,j] not supported for 'x' inheriting from sparseMatrix
In addition: Warning message:
In groupNhoods(hn_milo, da_results, compute.new = T, da.fdr = 0.1) :
NA values found in SpatialFDR vector

@MikeDMorgan
Copy link
Member

MikeDMorgan commented Jun 20, 2024

@yulijia You can use subset.nhoods to plot the nhoods with non-NA values. These usually arise because there is model separation - you have a variable that perfectly separates zero from non-zero counts in one or more nhoods. You can check for these using the checkSeparation function, with the goal being to adjust either your NN-graph or nhood refinement such that you don't get this separation (or you exclude these from your DA testing).

On a separate note - your issue is unrelated to the OP, so you should open a new issue not append yours to the end of this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants