error message when I am using estimateDispersions() function. #20

Closed
ikwak2 opened this issue Mar 20, 2017 · 8 comments

Comments

@ikwak2

ikwak2 commented Mar 20, 2017

Thank you for developing such nice tools for analyzing scRNA-seq data. I enjoyed using Monocle 1. I have now reinstalled Monocle to try Census counts and to visualize my data.

However, I am getting errors that I did not get before.
Below are the error message from the estimateDispersions() function and my sessionInfo().

load("Xerr.RData") # npX : scRNA-seq expression data, pd = pheno, AnnotatedDataFrame, fd = feature, AnnotatedDataFrame.
pXX <- newCellDataSet(npX, phenoData = pd, featureData = fd)

rpc_matrix <- relative2abs(pXX)

pXX <- newCellDataSet(as(as.matrix(rpc_matrix), "sparseMatrix"),
                      phenoData = pd,
                      featureData = fd,
                      lowerDetectionLimit = 1,
                      expressionFamily = negbinomial.size())

pXX <- estimateSizeFactors(pXX)
pXX <- estimateDispersions(pXX)
Error in intI(i, n = x@Dim[1], dn[[1]], give.dn = FALSE) :
invalid character indexing
In addition: Warning message:
Deprecated, use tibble::rownames_to_column() instead.

sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X El Capitan 10.11.6

locale:
[1] C

attached base packages:
[1] splines stats4 parallel stats graphics grDevices utils
[8] datasets methods base

other attached packages:
[1] monocle_2.2.0 DDRTree_0.1.4 irlba_2.1.2
[4] VGAM_1.0-3 ggplot2_2.2.1 Biobase_2.34.0
[7] BiocGenerics_0.20.0 Matrix_1.2-7.1

loaded via a namespace (and not attached):
[1] Rcpp_0.12.9 compiler_3.3.2 RColorBrewer_1.1-2
[4] plyr_1.8.4 tools_3.3.2 tibble_1.2
[7] gtable_0.2.0 lattice_0.20-34 igraph_1.0.1
[10] DBI_0.5-1 HSMMSingleCell_0.108.0 fastICA_1.2-0
[13] dplyr_0.5.0 stringr_1.1.0 cluster_2.0.5
[16] combinat_0.0-8 grid_3.3.2 R6_2.2.0
[19] qlcMatrix_0.9.5 pheatmap_1.0.8 limma_3.30.8
[22] reshape2_1.4.2 magrittr_1.5 scales_0.4.1
[25] matrixStats_0.51.0 assertthat_0.1 colorspace_1.3-2
[28] stringi_1.1.2 lazyeval_0.2.0 munsell_0.4.3
[31] slam_0.1-40

I am not sure what I have done wrong. I can send the "Xerr.RData" file if needed.

Thank you so much!
Sincerely,
ilyoup

@Xiaojieqiu
Collaborator

We have not seen this error before. Yes, please attach the CDS file in a reply here and we will be happy to take a look at it. Thanks.

@ikwak2
Author

ikwak2 commented Mar 21, 2017

GitHub does not support the RData file format, so I sent the file to xqiu@uw.edu.
Thank you so much!

@Xiaojieqiu
Collaborator

Xiaojieqiu commented Mar 22, 2017

Thanks for your email. I have looked at your data. The 34th cell (column C34) has NaN values in your npX matrix (and therefore in the pXX CDS too). Because of this, the line Matrix::rowSums(rounded > cds@lowerDetectionLimit, na.rm = T) in disp_calc_helper_NB, which is called by the estimateDispersions function, returns all NA values, and that leads to the error you saw.

After removing this cell, you can run estimateSizeFactors and estimateDispersions without error:

pXX_valid <- pXX[, -34]
pXX_valid <- estimateSizeFactors(pXX_valid)
pXX_valid <- estimateDispersions(pXX_valid)

Also, please note that estimateDispersions works better when you pool all the genes in your single-cell sample. Your example contains only a few hundred genes.
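
For readers hitting the same error, here is a minimal sketch of how a cell with NaN values could be located before calling estimateDispersions. It uses a toy base R matrix rather than the poster's data, and the names expr, na_per_cell, bad_cells, and expr_valid are hypothetical. With a real CellDataSet one would presumably start from exprs(pXX), and for a sparse matrix use Matrix::colSums, but that is an assumption and not something shown in this thread.

```r
# Toy stand-in for a genes x cells expression matrix (hypothetical data).
expr <- matrix(c(5, 0,   2,
                 1, NaN, 3,
                 0, NaN, 4),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("geneA", "geneB", "geneC"),
                               c("C1", "C2", "C3")))

# Count missing entries per cell (column); is.na() is TRUE for both NA and NaN.
na_per_cell <- colSums(is.na(expr))

# Names of the cells that contain at least one missing value.
bad_cells <- names(na_per_cell)[na_per_cell > 0]
print(bad_cells)

# Keep only the cells without missing values; drop = FALSE keeps the matrix shape.
expr_valid <- expr[, na_per_cell == 0, drop = FALSE]
print(dim(expr_valid))
```

Selecting cells by name or by a logical mask, as above, avoids hard-coding a column position such as 34 when the offending cells are not known in advance.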

@ikwak2
Author

ikwak2 commented Mar 22, 2017

Oh, got it. Thank you so much!

@ikwak2 ikwak2 closed this as completed Mar 22, 2017
@jgarces02

jgarces02 commented Jul 21, 2017

Hi @Xiaojieqiu, I have the same problem, but with the markerDiffTable function. I searched for zero or NA values but found none. My code is below:

path <- paste(getwd(), "2_count_outs/outs/filtered_gene_bc_matrices/GRCh38/", sep = "/")
matrix <- readMM(paste(path, "matrix.mtx", sep = ""))
pd <- read.table(paste(path, "barcodes.tsv", sep = ""))
colnames(pd) <- "cell_ID"
rownames(pd) <- pd$cell_ID
fd <- read.table(paste(path, "genes.tsv", sep = ""))
colnames(fd) <- c("transcript_ID", "gene_short_name")
rownames(fd) <- fd$transcript_ID
colnames(matrix) <- pd$cell_ID; rownames(matrix) <- fd$transcript_ID
pdata <- new("AnnotatedDataFrame", data = pd)
fdata <- new("AnnotatedDataFrame", data = fd)
rawdata <- newCellDataSet(matrix, phenoData = pdata, featureData = fdata, expressionFamily = negbinomial.size())

rawdata <- rawdata[1:30000,1:500]
rawdata <- estimateSizeFactors(rawdata)
rawdata <- estimateDispersions(rawdata)

rawdata <- detectGenes(rawdata, min_expr = 1) #zero
expressed_genes <- row.names(subset(fData(rawdata), num_cells_expressed >= 1))

gata1 <- row.names(subset(fData(rawdata), gene_short_name == "GATA1"))
gypa <- row.names(subset(fData(rawdata), gene_short_name == "GYPA"))
mpo <- row.names(subset(fData(rawdata), gene_short_name == "MPO"))
cebpb <- row.names(subset(fData(rawdata), gene_short_name == "CEBPB"))
dntt <- row.names(subset(fData(rawdata), gene_short_name =="DNTT"))
ebf1 <- row.names(subset(fData(rawdata), gene_short_name =="EBF1"))
fos <- row.names(subset(fData(rawdata), gene_short_name == "FOS"))
prdm1 <- row.names(subset(fData(rawdata), gene_short_name == "PRDM1"))
thy1 <- row.names(subset(fData(rawdata), gene_short_name == "THY1"))

cth <- newCellTypeHierarchy()
cth <- addCellType(cth, "Erythrocyte", classify_func = function(x) {x[gata1,] >= 1 & x[gypa,] >= 1})
cth <- addCellType(cth, "Myeloid", classify_func = function(x) {x[mpo,] >= 1 & x[cebpb,] >= 1})
cth <- addCellType(cth, "LiT", classify_func = function(x) {x[ebf1,] >= 1 & x[dntt,] >= 1})
cth <- addCellType(cth, "LiB", classify_func = function(x) {x[fos,] >= 1 & x[prdm1,] >= 1})
cth <- addCellType(cth, "Progenitors", classify_func = function(x) {x[thy1,] >= 1 & x[fos,] < 1})
rawdata_ct <- classifyCells(rawdata, cth)

marker_diff <- markerDiffTable(rawdata[expressed_genes,], cth, cores = 2)
## and here the error appears:
## Error in intI(i, n = x@Dim[1], dn[[1]], give.dn = FALSE) : invalid character indexing 

I can't upload my count matrix because it is in .mtx format, but if you need it I can send it to you by email.

Thanks in advance!
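
One check that might be worth running here, offered only as a sketch and not as a confirmed diagnosis: as far as I know, the Matrix package raises "invalid character indexing" when a character index does not match any row name, which could happen if a marker ID is missing from rawdata[expressed_genes,] or if a gene lookup returned an empty result. The helper missing_ids and the toy IDs below are hypothetical and not part of monocle.

```r
# Hypothetical helper: report marker IDs that are not row names of the matrix.
missing_ids <- function(ids, mat) {
  ids <- unlist(ids, use.names = FALSE)  # flatten the list of ID vectors
  setdiff(ids, rownames(mat))
}

# Toy genes x cells matrix and marker lookups (made-up IDs).
expr <- matrix(1:6, nrow = 3,
               dimnames = list(c("ENSG1", "ENSG2", "ENSG3"), c("c1", "c2")))
markers <- list(gata1 = "ENSG1",        # present in the matrix
                gypa  = "ENSG9",        # absent from the matrix
                mpo   = character(0))   # lookup that found no gene

# IDs that would fail when used as row indices.
not_found <- missing_ids(markers, expr)
print(not_found)

# Marker lookups that returned nothing at all.
empty <- names(markers)[lengths(markers) == 0]
print(empty)
```

If either result is non-empty for the real data, the classify_func closures would be indexing rows that do not exist in the subsetted object.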

@vertesy

vertesy commented Dec 30, 2017

I have the same issue. There are no NA or NaN values in my expression matrix, yet I got the error:

> MyCellDataSet <- estimateDispersions(MyCellDataSet)
Error in intI(i, n = x@Dim[1], dn[[1]], give.dn = FALSE) : 
  invalid character indexing
In addition: Warning message:
Deprecated, use tibble::rownames_to_column() instead. 

Solution

  • It looks like the relative2abs() function introduces NaN values into the expression matrix.
  • This function also causes some cells (~10% in my case) to have only NaN values.
    - When I ran Monocle with an unfiltered dataset (10K genes instead of the top 1000), 27% of the cells were set to NaN-only values. Odd.

Replacing NAs with 0 and then removing all-zero cells helped.

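rpc_matrix <- relative2abs(HSMM)

# Count the NA/NaN entries, then replace them with 0 (plain indexing, no extra package needed)
NA_count <- sum(is.na(rpc_matrix))
rpc_matrix[is.na(rpc_matrix)] <- 0

# Flag the cells (columns) that have zero reads in total
OnlyZeros <- (colSums(rpc_matrix) == 0)
paste(sum(OnlyZeros), "cells have zero reads in total, and there were", NA_count, "NA values before replacing NA with 0")

# Keep only the cells with at least one read
Valid <- which(!OnlyZeros)
rpc_matrix <- rpc_matrix[, Valid]; dim(rpc_matrix)

# You need to subset the phenotype data too!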
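
The last comment above can be sketched as follows, using a plain data.frame as a stand-in for the phenotype table. The names pheno, keep, and the toy values are hypothetical. With an AnnotatedDataFrame the same row subsetting by cell name should apply, though that is an assumption I have not verified against this dataset.

```r
# Toy genes x cells matrix; cell2 has zero reads in total (hypothetical data).
rpc_matrix <- matrix(c(3, 0, 1,
                       2, 0, 0),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("geneA", "geneB"),
                                     c("cell1", "cell2", "cell3")))

# Toy phenotype table with one row per cell, keyed by cell name.
pheno <- data.frame(batch = c("a", "b", "a"),
                    row.names = c("cell1", "cell2", "cell3"))

# Drop the all-zero cells from the expression matrix.
keep <- colSums(rpc_matrix) > 0
rpc_matrix <- rpc_matrix[, keep, drop = FALSE]

# Subset the phenotype rows by cell name so both objects stay aligned.
pheno <- pheno[colnames(rpc_matrix), , drop = FALSE]

# Sanity check: same cells, same order.
stopifnot(identical(rownames(pheno), colnames(rpc_matrix)))
```

Subsetting by name rather than by position keeps the two objects in the same order even if one of them was reordered earlier.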

PS: Additionally, the vignette code uses melt() but does not require(reshape2).

@fereshtehizadi

Sorry, I am working with the URD package. Whenever I try to plot markers on clusters, I get this error:

> plotDot(object.6s.mnn, genes = c("DDB_G0267178", "DDB_G0267178", "DDB_G0285311", "DDB_G0290079", "DDB_G0267180", "DDB_G0273181"), clustering="Infomap-60")
Error in intI(i, n = d[1], dn[[1]], give.dn = FALSE) : 
  invalid character indexing
> 

Could somebody please help me with this?

Thanks a lot

@rpa12356

Hello, I am using the Monocle R package to analyze single-cell data, but when I call the estimateDispersions() function, I get the following error: Error in log (ifelse (y==0, 1, y/mu)): (converted from warning to) NaNs generated
Here is my code:
cells_2 <- subset(Data_harmony, labels %in% c("High-Malignant cells", "low-Malignant cells"))
# as() performs the conversion to a sparse matrix; as.matrix() alone returns a dense one
cells_2_matrix <- as(as.matrix(cells_2@assays$RNA@counts), 'sparseMatrix')
p2_data <- cells_2@meta.data
p2_data$celltype <- cells_2@active.ident
f2_data <- data.frame(gene_short_name = row.names(cells_2_matrix), row.names = row.names(cells_2_matrix))
pd2 <- new('AnnotatedDataFrame', data = p2_data)
fd2 <- new('AnnotatedDataFrame', data = f2_data)
cds2 <- newCellDataSet(cells_2_matrix,
                       phenoData = pd2,
                       featureData = fd2,
                       lowerDetectionLimit = 0.5,
                       expressionFamily = negbinomial.size())
cds2 <- estimateSizeFactors(cds2)
cds2 <- estimateDispersions(cds2)
