Is matrixValidity printing row and column wrong? #52
Thanks for catching the error in the error message! You are right, the actual error is in the (88, 1) portion. Usually this happens when the filtering or the input is unexpected in some way. For instance, you say that you have 333 cells, but they are rows in the input matrix. Cell Ranger's Matrix Market output has cells as columns, so that is what we used for input. Could you check that your barcode and feature files match that? If not, you can also transpose the matrix with the …
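For reference, transposing a Matrix Market coordinate file by hand only means swapping the row and column counts on the size line and swapping the row and column index of every entry; the barcode and feature files stay the same. The Python sketch below does just that. `transpose_mtx` is a hypothetical helper, not part of too-many-cells, and it assumes a `coordinate` file with `general` symmetry.

```python
import io


def transpose_mtx(src, dst):
    """Write the transpose of a Matrix Market coordinate file.

    Hypothetical helper; assumes 'coordinate' format with 'general' symmetry.
    """
    header = src.readline()
    if "coordinate" not in header.lower():
        raise ValueError("expected a MatrixMarket coordinate file")
    dst.write(header)
    line = src.readline()
    # Copy comment lines unchanged until the size line is reached.
    while line.startswith("%"):
        dst.write(line)
        line = src.readline()
    n_rows, n_cols, n_entries = line.split()
    # Size line is "rows cols entries": swap the first two numbers.
    dst.write(f"{n_cols} {n_rows} {n_entries}\n")
    for entry in src:
        if not entry.strip():
            continue
        row, col, *value = entry.split()
        # Entries are "row col [value]" with 1-based indices: swap row and col.
        dst.write(" ".join([col, row, *value]) + "\n")


if __name__ == "__main__":
    example = io.StringIO(
        "%%MatrixMarket matrix coordinate integer general\n"
        "3 2 2\n"
        "1 2 5\n"
        "3 1 7\n"
    )
    out = io.StringIO()
    transpose_mtx(example, out)
    print(out.getvalue(), end="")
```

On real files, open the source and destination with `open(...)` and pass the handles in the same way.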
I had initially tried cells as columns, but only got one cell in the output with the default filter thresholds:
I tried different threshold combinations, but still only had this one cell in the output. Not including filter thresholds gives:
However, using the previous mtx format (cells as rows and genes as columns), I ran
This listed the genes instead of the barcodes as cells in its output, so I swapped the genes and barcodes filenames so that
I'm not sure where I've gone wrong, but doing it the opposite (but correct) way doesn't work: matrix with cells as columns and genes as rows, …
Can I see a … for the actual case (non-transposed)?
> wc -l barcodes.tsv
333 barcodes.tsv
> wc -l genes.tsv
60329 genes.tsv
And can I see the command you ran with those files, along with the error? Is it the one in the first comment of this thread?
Yes, with this data it was:
too-many-cells make-tree \
--matrix-path input \
--output out
Adding the filter thresholds:
too-many-cells make-tree \
--filter-thresholds "(250, 1)" \
--matrix-path input \
--output out
removes the error, giving one cell in the output.
I'm starting to confuse myself. I've reverted the pull request, as I realized I was thinking of features as rows and cells as columns, as with Cell Ranger (even though our convention in the program is cells as rows). Hence the swap. For your problem, this means that the original error is that the feature file had 60329 rows but the matrix had 333 rows (and vice versa for columns). Based on what you sent me (the …
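One way to check which layout a set of files actually has is to compare the size line of the mtx file (rows, columns, entries) with the number of lines in the barcode and feature files. A minimal Python sketch, assuming the file names used in this thread (`matrix.mtx`, `barcodes.tsv`, `genes.tsv`); `check_orientation` and its helpers are hypothetical and not part of too-many-cells:

```python
from pathlib import Path


def mtx_shape(path):
    """Return (rows, cols) from the size line of a Matrix Market file."""
    with open(path) as handle:
        for line in handle:
            # Skip the %%MatrixMarket header, comments and blank lines.
            if line.startswith("%") or not line.strip():
                continue
            rows, cols, _entries = line.split()
            return int(rows), int(cols)
    raise ValueError(f"no size line found in {path}")


def count_lines(path):
    """Count non-empty lines, i.e. one barcode or feature per line."""
    with open(path) as handle:
        return sum(1 for line in handle if line.strip())


def check_orientation(matrix_dir, barcodes="barcodes.tsv", features="genes.tsv"):
    """Compare the mtx dimensions with the barcode and feature file lengths."""
    matrix_dir = Path(matrix_dir)
    rows, cols = mtx_shape(matrix_dir / "matrix.mtx")
    n_cells = count_lines(matrix_dir / barcodes)
    n_features = count_lines(matrix_dir / features)
    if (rows, cols) == (n_features, n_cells):
        return "features x cells (Cell Ranger layout)"
    if (rows, cols) == (n_cells, n_features):
        return "cells x features (transposed relative to Cell Ranger)"
    return (f"mismatch: matrix is {rows} x {cols}, but there are "
            f"{n_cells} barcodes and {n_features} features")
```

With the directory from this thread, `check_orientation("input")` should report the Cell Ranger layout for a 60329 x 333 matrix with 333 barcodes and 60329 genes.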
Silly question: what is in the …? What happens if you do not change your features and barcode files but use …?
Also, you should not use that "default" filtering threshold, as your data are definitely not 10x scRNA-seq; just leave them at …. Could you also send the …?
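To illustrate why a 10x-style threshold can remove almost every cell from a dataset with different count depths, here is a hedged sketch of count-based filtering. It assumes, without confirmation from this thread, that a pair such as `(250, 1)` means a minimum total count per cell and a minimum total count per feature; the order in which too-many-cells applies the two filters, and any normalization it does first, may differ. `filter_by_counts` is hypothetical.

```python
from collections import defaultdict


def filter_by_counts(entries, min_cell_total, min_feature_total):
    """Return (kept_cells, kept_features) for (feature, cell, value) triples.

    Hypothetical illustration of count-based filtering: cells whose total
    count is below min_cell_total are dropped first, then features whose
    total over the remaining cells is below min_feature_total. Cells with
    no non-zero entries never appear in `entries`, so they are never kept.
    """
    cell_totals = defaultdict(float)
    for _feature, cell, value in entries:
        cell_totals[cell] += value
    kept_cells = {c for c, t in cell_totals.items() if t >= min_cell_total}

    feature_totals = defaultdict(float)
    for feature, cell, value in entries:
        if cell in kept_cells:
            feature_totals[feature] += value
    kept_features = {f for f, t in feature_totals.items() if t >= min_feature_total}
    return kept_cells, kept_features


if __name__ == "__main__":
    # Two cells: cell 1 has 305 counts in total, cell 2 only 42.
    entries = [(1, 1, 300.0), (2, 1, 5.0), (1, 2, 40.0), (3, 2, 2.0)]
    print(filter_by_counts(entries, 250, 1))  # only cell 1 survives
    print(filter_by_counts(entries, 0, 0))    # both cells survive
```

Under this reading, a cell-level threshold tuned for 10x libraries silently discards every cell whose total is lower, which would match seeing only one cell in the output.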
> ls input
barcodes.tsv
genes.tsv
matrix.mtx
too-many-cells make-tree \
--matrix-path input \
-T \
--output input
If it's helpful, the data are from a new scRNA-seq protocol, and I'm using the data from that paper. I got a SingleCellExperiment object, pulled the count matrix from it, and wrote it to an mtx file.
I'm a little confused: in the first comment in this thread you had cells as rows, but in the latest one you have them as columns. Which is the original, and which error goes with which matrix?
Just to be clear, in a perfect world where it works, the matrix should have 333 columns, the barcode file should have 333 lines, there is no transposing, and the filters are 0.
Try testing on the example from the TooManyCells workshop to see if it produces the appropriate output, and check whether its inputs match yours.
The first comment had cells as rows and genes as columns, and produced that error about mismatched dimensions (…). The last comment from me uses the same data but, as expected, with cells as columns and genes as rows. The perfect-world scenario also produced an error, with different mismatched dimensions (…). In both cases, using the filter-thresholds flag removes the error. When the matrix is set up as too-many-cells expects, only the first cell is present in the output, regardless of the filter threshold values.
I ran this and it worked perfectly. There is probably something wrong with my matrix, but I'm having a hard time tracking down the source of the error. As far as I can tell, the inputs match in the mtx header and in the counts of features and barcodes. My comment with the dendrogram picture was the only time I transposed the 'wrong' matrix format (as in my first comment) and swapped the genes/barcodes filenames. It's also the only one that produces results without filters. Sorry for the confusion, and thanks for taking the time to go through this!
The only difference I found between the workshop data and mine is the numeric type (real vs. integer).
Mine:
…
Workshop brain:
…
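Whether the real/integer difference is what actually trips up the reader is not established here. As a hedged way to narrow it down, the sketch below reads the field from the `%%MatrixMarket` header and checks whether every stored value is a whole number. If the field is `real` but all values are integral, an `integer` version of the file (with the values rewritten as integers) is a cheap thing to test. `inspect_mtx_values` is a hypothetical helper.

```python
import io


def inspect_mtx_values(src):
    """Return (field, all_integral) for a Matrix Market coordinate file.

    `field` is the value type from the header, e.g. "integer" or "real";
    `all_integral` says whether every stored value is a whole number.
    """
    header = src.readline().split()
    # Header: %%MatrixMarket matrix coordinate <field> <symmetry>
    if len(header) != 5 or header[2].lower() != "coordinate":
        raise ValueError("expected a MatrixMarket coordinate header")
    field = header[3].lower()
    line = src.readline()
    while line.startswith("%"):
        line = src.readline()  # skip comments; `line` ends as the size line
    all_integral = True
    for entry in src:
        parts = entry.split()
        # Only "row col value" lines carry a value (pattern files have none).
        if len(parts) == 3 and not float(parts[2]).is_integer():
            all_integral = False
            break
    return field, all_integral


if __name__ == "__main__":
    example = io.StringIO(
        "%%MatrixMarket matrix coordinate real general\n"
        "2 2 2\n"
        "1 1 3.0\n"
        "2 2 1\n"
    )
    print(inspect_mtx_values(example))  # ('real', True)
```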
Very weird. What if you use a CSV? It will be slower, but we can see whether it has something to do with the mtx file, which I think is the culprit.
That worked! The tree also makes more biological sense than the transposed approach. I'll try converting between csv/mtx and figure out where my mtx file is broken. |
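For converting between the two formats while hunting for the broken part of the mtx file, here is a minimal mtx-to-CSV sketch. The exact CSV layout too-many-cells expects should be checked against its documentation; this sketch assumes features as rows, a header row of cell barcodes, and feature names in the first column. `mtx_to_dense_csv` is hypothetical, and since it writes every zero, the output grows as cells × features.

```python
import csv
import io
from collections import defaultdict


def _format_value(v):
    """Write whole numbers without a trailing '.0', other values as-is."""
    return str(int(v)) if v.is_integer() else repr(v)


def mtx_to_dense_csv(mtx, barcodes, features, out):
    """Write a dense features x cells CSV from a coordinate mtx file.

    Assumes the mtx has features as rows and cells as columns (Cell Ranger
    layout); `barcodes` and `features` are lists of names in file order.
    Missing entries are written as 0.
    """
    line = mtx.readline()
    while line.startswith("%"):
        line = mtx.readline()  # skip the header and comment lines
    n_rows, n_cols, _ = (int(x) for x in line.split())
    if (n_rows, n_cols) != (len(features), len(barcodes)):
        raise ValueError(f"matrix is {n_rows} x {n_cols}, but got "
                         f"{len(features)} features and {len(barcodes)} barcodes")
    rows = defaultdict(dict)  # row index -> {col index: value}
    for entry in mtx:
        parts = entry.split()
        if len(parts) == 3:
            row, col, value = int(parts[0]), int(parts[1]), float(parts[2])
            rows[row][col] = rows[row].get(col, 0.0) + value
    writer = csv.writer(out)
    writer.writerow(["", *barcodes])  # blank corner cell, then cell barcodes
    for i, name in enumerate(features, start=1):  # mtx indices are 1-based
        values = rows.get(i, {})
        writer.writerow([name, *(_format_value(values.get(j, 0.0))
                                 for j in range(1, n_cols + 1))])


if __name__ == "__main__":
    mtx = io.StringIO(
        "%%MatrixMarket matrix coordinate integer general\n"
        "2 3 2\n"
        "1 2 5\n"
        "2 3 1\n"
    )
    out = io.StringIO()
    mtx_to_dense_csv(mtx, ["c1", "c2", "c3"], ["g1", "g2"], out)
    print(out.getvalue(), end="")
```

The dimension check up front is deliberate: it fails loudly on exactly the kind of mismatch discussed in this thread instead of producing a misaligned CSV.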
I'm trying out a dataset with 333 cells and 60329 genes.
When running:
I wasn't sure what the error meant, so I checked the code. It looks like it's printing the matrix's (cols, rows) instead of (rows, cols) as the message says. However, the matrix does have 333 rows and 60329 columns, so I'm not sure how to interpret it or whether my matrix is set up wrong. I'm also not sure what the (88, 1) means.
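The point of the report is that a message labelled (rows, cols) has to be fed its numbers in that order. A hypothetical Python analogue of such a dimension check (not too-many-cells' actual `matrixValidity` code) that keeps every pair next to the label it belongs to:

```python
def check_dimensions(n_rows, n_cols, n_row_names, n_col_names):
    """Return None if the name lists fit the matrix, else an error message.

    Hypothetical analogue of a dimension check: every pair is printed in
    the (rows, cols) order that its label announces.
    """
    if (n_rows, n_cols) == (n_row_names, n_col_names):
        return None
    return (f"Mismatched dimensions: matrix (rows, cols) = ({n_rows}, {n_cols}), "
            f"names (rows, cols) = ({n_row_names}, {n_col_names})")


if __name__ == "__main__":
    # 333 cells as rows and 60329 genes as columns, names given the other way round.
    print(check_dimensions(333, 60329, 60329, 333))
```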
I'm having issues where not all 333 cells are in the final clusters.csv output; fewer than 100 make it, depending on the filter-thresholds used. Since the documentation said this was optional, I took it out, but then got this error. I've made a draft PR swapping the two prints in case it is actually wrong.
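To see which cells are dropped, not just how many, the barcode list can be compared against clusters.csv. This sketch assumes clusters.csv has a header row with a `cell` column holding the barcodes; that column name is an assumption, so check the file's first line. `missing_cells` is a hypothetical helper.

```python
import csv
import io


def missing_cells(barcodes, clusters_csv, cell_column="cell"):
    """Return the barcodes that never appear in a clusters CSV.

    `cell_column` defaults to "cell", which is an assumption about the
    clusters.csv header; pass the real column name if it differs.
    """
    reader = csv.DictReader(clusters_csv)
    if reader.fieldnames is None or cell_column not in reader.fieldnames:
        raise ValueError(f"no {cell_column!r} column in header {reader.fieldnames}")
    clustered = {row[cell_column] for row in reader}
    # Keep the original barcode order so the result lines up with barcodes.tsv.
    return [b for b in barcodes if b not in clustered]


if __name__ == "__main__":
    clusters = io.StringIO("cell,cluster\nAAAC,1\nAAAG,2\n")
    print(missing_cells(["AAAC", "AAAG", "AAAT"], clusters))  # ['AAAT']
```

On real output, open clusters.csv with `open(path, newline="")` and pass the first column of barcodes.tsv as the barcode list.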