table of droplets (tod) and table of counts (toc) have different numbers of genes #148

Open
al3xmlt030 opened this issue Feb 7, 2024 · 10 comments

@al3xmlt030

Hi,
I've encountered an issue after running the cellranger multi pipeline for scRNASeq with a fixed protocol. The pipeline has generated a directory structure with "per_sample_outs", wherein each sample contains a "count" folder with both "filtered_feature_bc_matrix" and "raw_feature_bc_matrix". However, I've noticed a discrepancy in the number of genes between these two matrices, leading to the following problem:

toc = Seurat::Read10X(file.path(tmpDir, "filtered_feature_bc_matrix"))
tod = Seurat::Read10X(file.path(tmpDir, "raw_feature_bc_matrix"))
sc = SoupChannel(tod, toc)

Error in SoupChannel(tod, toc): The provided table of droplets (tod) and table of counts (toc) have different numbers of genes. Both tod and toc must have the same genes in the same order.
Traceback:

Thanks a lot in advance!

@jcorn427

jcorn427 commented Feb 27, 2024

I've run into the same issue. I initially tried the load10X() function, but it doesn't work because cell ranger multi uses a different directory structure from regular cell ranger. I then tried what you've done here, building the SoupChannel object from the raw matrices, and I get the same error as you.

Edit: Just wanted to add that I was hoping it would work since you're using cell ranger multi to generate singleplex data from the fixed samples. However, I don't know if support for cell ranger multi will be added.

@changostraw

I get this error as well trying to apply SoupX to cellranger multi outs. Does anyone know a workaround?

@gyanmishra

I am having a similar issue. Has anyone found a fix for this?

h5.files = list.files("results/20240304D3A_Seur_R/",pattern = "*.h5",full.names = TRUE)
raw.matrix.files = h5.files[grepl('_raw',h5.files)]
filt.matrix_files =  h5.files[!(grepl('_raw',h5.files))]

raw.matrix <- lapply(raw.matrix.files,
                      function(x){
                        Read10X_h5(x,use.names = F)})

filt.matrix  <- lapply(filt.matrix_files, 
                      function(x){
                        Read10X_h5(x,use.names = F)})

soup.channel  <- for(i in 1:length(raw.matrix)){SoupChannel(raw.matrix[i], filt.matrix[i])}

Error in if (nrow(tod) != nrow(toc)) stop("The provided table of droplets (tod) and table of counts (toc) have different numbers of genes. Both tod and toc must have the same genes in the same order.") :
argument is of length zero
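
Note that this particular message ("argument is of length zero") appears to come from the indexing rather than from the gene counts: `raw.matrix[i]` returns a length-1 list, so `nrow()` gives `NULL`, and a `for` loop always evaluates to `NULL`, so `soup.channel` would end up empty either way. A minimal sketch with toy matrices standing in for the `Read10X_h5()` output; `check_pair` is a hypothetical stand-in for `SoupChannel(tod, toc)`:

```r
# Toy stand-ins for the lists returned by lapply(..., Read10X_h5)
raw.matrix  <- list(matrix(0, nrow = 5, ncol = 10), matrix(0, nrow = 5, ncol = 8))
filt.matrix <- list(matrix(0, nrow = 5, ncol = 4),  matrix(0, nrow = 5, ncol = 3))

# Single brackets return a length-1 list, which has no dimensions
is.null(nrow(raw.matrix[1]))

# Double brackets return the matrix itself
nrow(raw.matrix[[1]])

# Hypothetical stand-in for SoupChannel(tod, toc): only compares gene counts
check_pair <- function(tod, toc) nrow(tod) == nrow(toc)

# Map() pairs the i-th raw matrix with the i-th filtered matrix and
# collects one result per sample in a list
results <- Map(check_pair, raw.matrix, filt.matrix)
```

With real data, once the indexing is fixed the underlying gene-count mismatch discussed in this thread may still appear and would need to be handled separately.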

@wblashka

I'm having a similar issue. I won't have time to try and troubleshoot myself, but I am wondering if this is the result of Cellranger automatically filtering out deprecated probes from their FRP protocol. Based on the description of Cellranger multi's outputs, it seems like the raw matrix includes these probes while the filtered matrix does not. Perhaps these are responsible for the discrepancy? If anyone is able to attempt to remove these probes from a raw matrix and see if that resolves the issue, I would love to know... otherwise I will attempt this in a couple of weeks.

@NathanKochhar

If your object has multiple assays, this will fix it:

toc <- Read10X(data.dir = "/filtered_feature_bc_matrix")
tod <- Read10X(data.dir = "/raw_feature_bc_matrix")
toc <- toc$"Gene Expression"
tod <- tod$"Gene Expression" 
sc = SoupChannel(tod, toc, calcSoupProfile = FALSE)
sc = estimateSoup(sc)
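
The `$"Gene Expression"` step is needed because `Read10X()` returns a named list of matrices, rather than a single matrix, when the input contains more than one feature type. A small sketch that handles both cases; `get_gex` is a hypothetical helper (not part of Seurat or SoupX), and the toy objects, including the "Antibody Capture" entry, only stand in for real `Read10X()` output:

```r
# Hypothetical helper: return the gene expression matrix whether the
# input is a single matrix or a named list of matrices
get_gex <- function(x, name = "Gene Expression") {
  if (is.list(x)) {
    stopifnot(name %in% names(x))
    x[[name]]
  } else {
    x
  }
}

# Toy data: one plain matrix, and one list with two feature types
single <- matrix(1:6, nrow = 3,
                 dimnames = list(paste0("gene", 1:3), c("bc1", "bc2")))
multi  <- list("Gene Expression"  = single,
               "Antibody Capture" = matrix(0, nrow = 2, ncol = 2))

gex_from_single <- get_gex(single)
gex_from_multi  <- get_gex(multi)
```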

@aspides-js

@wblashka following your suggestion, I filtered the raw matrix to include only the probes marked as included = TRUE in cellranger's probe_set.csv output, but unfortunately this doesn't resolve the issue: in my case it accounted for only 419 of the 13285-gene discrepancy. As a workaround, the function works after simply filtering the raw matrix using setdiff() on the rownames between the raw and filtered matrices.
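
A rough sketch of what the setdiff() workaround could look like, using toy matrices in place of the real raw and filtered matrices (the names `only_in_raw` and `only_in_filt` are made up for illustration). It also reports how many genes differ in each direction, which helps confirm that the filtered genes are a strict subset of the raw ones:

```r
# Toy stand-ins for the raw (tod) and filtered (toc) matrices
genes <- paste0("gene", 1:6)
raw.matrix  <- matrix(0, nrow = 6, ncol = 4,
                      dimnames = list(genes, paste0("bc", 1:4)))
filt.matrix <- raw.matrix[c("gene1", "gene3", "gene4"), 1:2]

# Genes present in one matrix but not the other
only_in_raw  <- setdiff(rownames(raw.matrix), rownames(filt.matrix))
only_in_filt <- setdiff(rownames(filt.matrix), rownames(raw.matrix))

length(only_in_raw)   # size of the discrepancy
length(only_in_filt)  # should be 0 if filtered genes are a subset of raw

# Drop the raw-only genes; drop = FALSE keeps the result two-dimensional
raw.matrix_subset <- raw.matrix[!rownames(raw.matrix) %in% only_in_raw, ,
                                drop = FALSE]
```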

@RB786

RB786 commented Apr 19, 2024

I faced a similar issue. My raw and filtered h5 files have different numbers of genes. I filtered out the unmatched genes between the two files and then it worked. However, I am not sure if this is the right way. Has anyone solved it?

@jnmnbals

jnmnbals commented May 3, 2024

Adding myself to the list of users running into this issue. Hoping someone has found a workaround or two for this.

@RB786 Would you mind sharing how you went about filtering? Still very new to the bioinformatics world.

@imet-k

imet-k commented May 15, 2024

I solved it like this if anyone is interested:
(filt.matrix is toc and raw.matrix tod)

filt_genes <- rownames(filt.matrix)

# Subset raw.matrix to keep only the genes in filt.matrix
raw.matrix_subset <- raw.matrix[rownames(raw.matrix) %in% filt_genes, ]
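
The SoupChannel error message also asks for the genes to be in the same order. `%in%` keeps the row order of `raw.matrix`, so a slightly stricter variant is to index by the filtered row names directly. This is only a sketch: it assumes every gene in `filt.matrix` is also present in `raw.matrix` and that row names are unique, and it uses toy matrices in place of real data:

```r
# Toy stand-ins for the raw (tod) and filtered (toc) matrices
genes <- paste0("gene", 1:5)
raw.matrix <- matrix(seq_len(20), nrow = 5, ncol = 4,
                     dimnames = list(genes, paste0("bc", 1:4)))
# Filtered matrix with fewer genes, deliberately in a different order
filt.matrix <- raw.matrix[c("gene4", "gene1", "gene3"), 1:2]

# Fail early if the assumption (filtered genes are a subset of raw) is wrong
stopifnot(all(rownames(filt.matrix) %in% rownames(raw.matrix)))

# Indexing by name returns rows in the order of filt.matrix
raw.matrix_subset <- raw.matrix[rownames(filt.matrix), , drop = FALSE]

identical(rownames(raw.matrix_subset), rownames(filt.matrix))
```

`raw.matrix_subset` can then be passed as tod alongside `filt.matrix` as toc, as in the original post.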

@afletch00

Adding myself as well. @imet-k, your method worked, thank you!!! I am wondering whether other issues have popped up when using the FLEX assay.
