Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error: grouping factors must have > 1 sampled level #146

Open
esrasefik opened this issue Nov 28, 2020 · 4 comments
Open

Error: grouping factors must have > 1 sampled level #146

esrasefik opened this issue Nov 28, 2020 · 4 comments

Comments

@esrasefik
Copy link

esrasefik commented Nov 28, 2020

Dear MAST Team,
I am trying to use MAST for genotype-based differential expression analysis in a number of cell-type specific clusters that I identified in my single cell RNA-Seq data. I have two genotypes (wild type and a mutant strain with a hemizygous deletion of 21 genes). I have 4 replicates per genotype, so 8 mice in total (metadata on replicate ids is referred to as orig.ident below). Due to concerns related to pseudo-replication, I would like to add a random effect for replicate to my model. However, when I try to run the code pasted below, I get an error that states:

Error: grouping factors must have > 1 sampled level no 'nobs' method is available. 
Error: number of levels of each grouping factor must be < number of observations (problems: orig.ident).

The pipeline does not crash in response to this error, but I would like to understand what is causing the error message. Any feedback would be appreciated.

Note: I performed my data preprocessing and clustering using Seurat. Hence, I first had to transform my Seurat object into an SCA object. I believe I followed the vignette correctly, but in case the error is caused by a mistake in one of the steps prior to running zlm, I included all relevant code:

s_obj4.sce <- as.SingleCellExperiment(s_obj4, assay = "RNA") # s_obj4 is my Seurat object that includes the raw counts
s_obj4.sce <- logNormCounts(s_obj4.sce, log = TRUE) # normalization
s_obj4.sca <- SceToSingleCellAssay(s_obj4.sce, check_sanity = FALSE) 

cdr <-colSums(assay(s_obj4.sca)>0) # calculcate cellular detection rate
colData(s_obj4.sca)$cngeneson <- scale(cdr)

cond <-factor(colData(s_obj4.sca)$genotype)
cond <-relevel(cond, "1") # "genotype = 1" refers to wild type
colData(s_obj4.sca)$genotype <-cond

s_obj4.sca_cl1 <- subset(s_obj4.sca, CellType =='1') # subset the data to only cells from CellType =='1' to perform the analysis on only neurons.

zlm_cl1_test <- zlm(~ genotype + cngeneson + (1 | orig.ident),
     sca = s_obj4.sca_cl1, 
     exprs_value = 'logcounts', 
     method="glmer", 
     ebayes=FALSE,
      silent=T, 
     fitArgsD = list(nAGQ = 0))

Thank you for any help you can give me!
Esra

@gfinak
Copy link
Member

gfinak commented Nov 28, 2020

Look at the crosstabulation of what you're actually modeling.
What does the table of genotype by orig.ident for cell type 1 look like?
The error suggests that you have only one genotype or one value of orig.ident in your data, probably after subsetting to celltype 1.
The error is probably only occuring for some genes, and may be in the continuous part of the model since that is conditional on a gene being expressed.

@esrasefik
Copy link
Author

Ah that makes so much sense! Thank you for your quick response. In that case, do you think filtering the data for each cluster in the following way could help fix this issue: A gene should be expressed in at least 10% (just as an example) of the wild type or mutant cells of a given cluster. I believe the FindMarkers function in Seurat, which provides an option to use MAST for DE analysis, implements a thresholding rule like this. Is there a similar way I could filter my data in MAST prior to zlm or in the same function as zlm? Thank you again!

Esra

@gfinak
Copy link
Member

gfinak commented Nov 28, 2020

Basically yes. When I do per cluster analysis I'll do the filtering in a loop. There's no specific API for this in MAST.

@esrasefik
Copy link
Author

I see. Thank you again! Given the need for cluster-specific filtering prior to zlm with random effect, I assume that cellular detection rate should be re-calculated for each cluster after filtering. Could you please confirm if that is accurate?

Esra

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants