Warnings/errors produced while running MAST with mixed effect #133

jmchan88 · 2020-04-29T06:14:38Z

Hi there,

I was wondering if you might be able to help me with an issue I've been having when running MAST including a random effect for batch. Despite following previous threads related to this topic, I keep getting a number of warnings and errors.

In particular, I have data for ~53,000 cells from 20 patients (batch) where I want to compare 2 groups of cells (condition), after calculating the centered cellular detection rate (cngeneson). I placed "batch" as a vector of 20 factors in my sca data frame. I then run the following command:

library(lme4)
zlmCond <- zlm(~condition + cngeneson + (1|batch), sca = sca, method = 'glmer', ebayes = F, strictConvergence=F, fitArgsD = list(nAGQ = 0))

However, I get a number of different warnings and errors, 3 of which worry me:

==============

Warning message:
In nobs.default(object, use.fallback = use.fallback) :
no 'nobs' method is available

Error : number of levels of each grouping factor must be < number of observations

Error in (function (cl, name, valueClass) :
assignment of an object of class “list” is not valid for @‘optimMsg’ in an object of class “LMERlike”; is(value, "character") is not TRUE

==============

It's unclear to me why 'nobs' is not recognized as a method, as it appears to be included in lme4 which I already loaded. It seems like the levels of my grouping factor (20) are well below the number of observations (53,000). I have no idea what to make of the 3rd error.

Thank you for any help you can provide!

gfinak · 2020-04-29T14:10:38Z

Can you provide a minimal reproducible example? Otherwise we are just speculating.

jmchan88 · 2020-04-29T17:07:44Z

Thank you so much for taking the time to take a look at this! I've attached the sca object that I'm working with but downsampled to 1000 cells in each condition to reduce file size. I anticipate that the above command should reproduce the errors that I'm seeing.

sca.mast_example.rds.zip

gfinak · 2020-04-29T18:03:44Z

Could you post the sca object with all cells but downsampled to several genes that give the error and others that don't?

gfinak · 2020-04-29T18:06:29Z

Okay so just looking quickly at your data set, it looks like you haven't filtered out genes that aren't expressed in any cells. That's likely the problem. When I filter on genes with at least 10% expression the errors you describe go away.
We describe the whole filtering and analysis pipeline in the MAST paper. I suggest reading it.

jmchan88 · 2020-04-29T19:17:21Z

Ah I see! I will make those changes and see how it goes. Thank you so much for checking!

jmchan88 · 2020-04-30T18:05:42Z

Hmm taking a look, it seems that filtering out genes expressed in at least 10% cells still produces the same errors/warnings for many, albeit possibly at lower frequency. They include:

Error : number of levels of each grouping factor must be < number of observations

In addition: Warning messages:
1: In nobs.default(object, use.fallback = us

2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00948906 (tol = 0.002, component 1)

5: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model is nearly unidentifiable: very large eigenvalue

Rescale variables?

None of these strictly stop the code from running, so I'll take a look at the end results once it completes on the computer cluster (the code noticeably takes a much longer time than with fixed effects using bayesglm, understandably)

gfinak · 2020-04-30T18:09:19Z

Can you post code that does the filtering and leads to the same errors? I am not experiencing this side-effect.

jmchan88 · 2020-04-30T18:38:05Z

Sure! This is the code I've been using below:

==================================

suppressMessages(library(MAST))
suppressMessages(library(lme4))
suppressMessages(library(stringr))
suppressMessages(library(data.table))
suppressMessages(require("Matrix"))

args <- commandArgs(trailingOnly = TRUE)

# command line arguments
mtx_file <- args[1]
g_file <- args[2]
bc_file <- args[3]
obs_file <- args[4]
obs <- args[5]
cluster1 <- args[6]
cluster2 <- args[7]
out_dir <- args[8]

# Normalized data frame
counts = as.matrix(readMM(mtx_file))
bc = read.csv(bc_file,header=F,col.names = 'Cell', stringsAsFactors = F)
g = read.csv(g_file,header=F,col.names = 'Gene', stringsAsFactors = F)

rownames(counts) = bc$Cell
colnames(counts) = g$Gene

ms <- rowSums(counts)
norm_df <- counts/ms * median(ms)

# Log transform
data_df <- log(norm_df + 1)

# Clusters
obs_df = read.csv(obs_file,header=T,stringsAsFactors=F, sep = '\t', row.names=1)
cell_clusters = as.character(obs_df[,obs]); names(cell_clusters) <- rownames(obs_df)
batch0 = as.character(obs_df[,'batch']); names(batch0) <- rownames(obs_df)

pairwise_de <- function(df1, df2, batch1, batch2, title){
        # ORdered data frame
        o_df <- rbind(df1, df2)
        #Filter out genes expressed by less than 10% of cells
        ind_col = colSums(o_df!=0)/nrow(o_df) > 0.1
        o_df = o_df[,ind_col]
        batch = c(batch1, batch2)
        condition <- rep(c(1, 2), times=c(nrow(df1), nrow(df2)))
        # Prepare for MAST
        wellKey <- rownames(o_df)
        cdata <- data.frame(cbind(wellKey=wellKey, condition=condition))
        fdata <- data.frame(primerid=colnames(o_df))

        # SCa data
        sca <- FromMatrix( t(o_df), cdata, fdata)
        cdr2 <-colSums(assay(sca)>0)
        colData(sca)$cngeneson <- scale(cdr2)
    colData(sca)$condition <- factor(unlist(as.list(condition)))
    colData(sca)$condition <- relevel(colData(sca)$condition, 2)
    colData(sca)$batch <- factor(batch)

    # Fits
    zlmCond <- zlm(~condition + cngeneson + (1|batch), sca = sca, method = 'glmer', ebayes = F, strictConvergence=F,fitArgsD = list(nAGQ = 0),
                   fitArgsC = list(control = lmerControl(check.conv.singular = .makeCC(action = "ignore",  tol = 1e-4)), verbose=0))
    summaryDt <- summary(zlmCond, doLRT='condition1')$datatable

        # Significance table
        fcHurdle <- merge(summaryDt[contrast=='condition1' & component=='H',.(primerid, `Pr(>Chisq)`)],
                summaryDt[contrast=='condition1' & component=='logFC', .(primerid,coef, ci.hi, ci.lo)], by='primerid')

        # FDR
        fcHurdle[,fdr:=p.adjust(`Pr(>Chisq)`, 'fdr')]
        data.table::setorder(fcHurdle, fdr)

        # Export data
        write.csv(as.data.frame(fcHurdle), sprintf("%s/%s.csv", out_dir, title), quote=FALSE)
}

df1 <- data_df[cell_clusters == cluster1, ]
df2 <- data_df[cell_clusters == cluster2, ]
batch1 = batch0[ cell_clusters == cluster1]
batch2 = batch0[ cell_clusters == cluster2]

pairwise_de(df1, df2, batch1, batch2, sprintf("%s.%s_vs_%s", obs, cluster1, cluster2))

gfinak · 2020-04-30T20:08:02Z

> library(MAST)
> sca<-readRDS("~/Downloads/sca.mast_example.rds")
> sca_filtered<-filterLowExpressedGenes(sca,threshold=0.1)
Filtered out 42.64 percent of genes at 10.00 percent threshold
zlmCond <- zlm(~condition + cngeneson + (1|batch), sca = sca_filtered, method = 'glmer', ebayes = F, strictConvergence=F, fitArgsD = list(nAGQ = 0))
> zlmCond <- zlm(~condition + cngeneson + (1|batch), sca = sca_filtered, method = 'glmer', ebayes = F, strictConvergence=F, fitArgsD = list(nAGQ = 0))
Loading required namespace: lme4
 Completed [==============================>]  100% with 0 failures
fixed-effect model matrix is rank deficient so dropping 1 column / coefficient
Done!
Warning messages:
1: In optwrap(optimizer, devfun, start, rho$lower, control = control,  :
  convergence code 3 from bobyqa: bobyqa -- a trust region step failed to reduce q
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge with max|grad| = 0.0031284 (tol = 0.002, component 1)
3: In rbind(...) :
  number of columns of result is not a multiple of vector length (arg 1841)

> table(colData(sca)$batch)

PleuralEffusion         RU1065C          RU1066         RU1080C      RU1124A_LN 
             30               5             130             131              44 
      RU1144_LN        RU1144_T          RU1145          RU1152         RU1181B 
            106              61             162              64             103 
        RU1181C         RU1195A          RU1215  RU1229A_Frozen         RU1231A 
            119              24             193             185             264 
        RU1293A     RU1311A_T_1          RU426B          RU779D 
            295               8              59              17

So the above works relatively fine, some convergence errors.
Looking at your batch structure, you've got a pretty unbalanced design, I suspect this is the source of your problems.

gfinak · 2020-04-30T20:21:23Z

Also, don't fit different models for each pariwise comparison. Fit one model and compute contrasts.

jmchan88 · 2020-04-30T22:11:04Z

Ah gotcha ... I was hoping that including the random effect might allow me to account for batch effect without downsampling, but I guess it makes sense that this might lead to unstable fits that don't converge.

One warning for the life of me I can't understand though is this:
In addition: Warning message:
In nobs.default(object, use.fallback = use.fallback) :
no 'nobs' method is available

It seems like it's a function in both lme4 and stats, but both are loaded for me. Do you have a sense of what this method is when I'm running MAST and if it's something specific to my R environment? In the meantime, I'll try to downsample and balance the batches. Thank you!

gfinak · 2020-04-30T22:12:55Z

To be clear, the problem is not that you're donwnsampling, the problem is that you appear to be fitting a bunch of pairwise models. Instead fit one model and compute the contrasts of interest.

jmchan88 · 2020-04-30T22:21:50Z

Ah I see what you are saying about fitting a single model and using contrasts for multiple pairwise comparisons.

I guess for this particular case though I'm just running this code once and making a single pairwise comparison, and these 2 conditions that I'm comparing essentially cover the entire dataset; so in essence, I'm doing a single pairwise comparison for the entire dataset and getting these warnings/errors.

But your point is well taken, and definitely for instances where I'm doing multiple pairwise comparisons in the future, I'll be sure to change my code to fit a single model!

gfinak · 2020-04-30T23:51:27Z

See if you can isolate one gene that gives the nobs error (after filtering appropriately).

amcdavid · 2020-05-01T07:02:07Z

I don't think the nobs error is significant. It has to do with some generic not being present with a calculation about the residual degrees of freedom that we save for models where we are doing shrinkage of the gaussian variance, but don't actually use it here. I would like to get to the bottom of it, though, and silence it.
I suspect the Error : number of levels of each grouping factor must be < number of observations, are due to some genes with low expression vs the relatively high number of batch groups. In that case, we just don't fit the continuous component of the model, so this also should be harmless.
I'm perplexed by the Error in (function (cl, name, valueClass) : assignment of an object of class “list” is not valid for @‘optimMsg’ in an object of class “LMERlike”; is(value, "character") is not TRUE report.
If you are getting a lot of eigenvalue or convergence complaints, etc, that probably means that some component of the design is problematic. I am a bit worried about some of the batches that only had 5 or 8 cells measured, in particular.

jmchan88 · 2020-05-01T16:23:37Z

Thank you both so much for your comments! I do think you're right that the batch sizes are unbalanced and that may be leading to errors. The batch sizes that I sent you were uniformly downsampled actually just to give you guys a smaller file size to work with, but are actually much larger. That said, there is def the same imbalance with one batch size far outnumbering the others. I just did a run without that large batch and found that a number of these errors had disappeared, so I do believe I might need to downsample a bit to try to restore some balance. I'll see how the downsampling works ... thank you so much!

gfinak closed this as completed Apr 30, 2020

jgamache014 mentioned this issue Mar 30, 2021

simulate() error when running Power.Rmd script kdzimm/PseudoreplicationPaper#2

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Warnings/errors produced while running MAST with mixed effect #133

Warnings/errors produced while running MAST with mixed effect #133

jmchan88 commented Apr 29, 2020

gfinak commented Apr 29, 2020

jmchan88 commented Apr 29, 2020

gfinak commented Apr 29, 2020

gfinak commented Apr 29, 2020 •

edited

Loading

jmchan88 commented Apr 29, 2020

jmchan88 commented Apr 30, 2020

gfinak commented Apr 30, 2020

jmchan88 commented Apr 30, 2020 •

edited

Loading

gfinak commented Apr 30, 2020

gfinak commented Apr 30, 2020

jmchan88 commented Apr 30, 2020

gfinak commented Apr 30, 2020

jmchan88 commented Apr 30, 2020

gfinak commented Apr 30, 2020

amcdavid commented May 1, 2020

jmchan88 commented May 1, 2020

Warnings/errors produced while running MAST with mixed effect #133

Warnings/errors produced while running MAST with mixed effect #133

Comments

jmchan88 commented Apr 29, 2020

gfinak commented Apr 29, 2020

jmchan88 commented Apr 29, 2020

gfinak commented Apr 29, 2020

gfinak commented Apr 29, 2020 • edited Loading

jmchan88 commented Apr 29, 2020

jmchan88 commented Apr 30, 2020

gfinak commented Apr 30, 2020

jmchan88 commented Apr 30, 2020 • edited Loading

gfinak commented Apr 30, 2020

gfinak commented Apr 30, 2020

jmchan88 commented Apr 30, 2020

gfinak commented Apr 30, 2020

jmchan88 commented Apr 30, 2020

gfinak commented Apr 30, 2020

amcdavid commented May 1, 2020

jmchan88 commented May 1, 2020

gfinak commented Apr 29, 2020 •

edited

Loading

jmchan88 commented Apr 30, 2020 •

edited

Loading