
Error in DESeqDataSet : counts matrix should be numeric, currently it has mode: character #210

Open
ifeelsostupid opened this issue Feb 25, 2020 · 4 comments


@ifeelsostupid

Hello,
I encountered the above error while running DESeq. I know it's because I manually set the row.names of my matrix to the gene names, but the example file ran with zero problems (https://informatics.fas.harvard.edu/differential-expression-with-deseq2.html). I checked that this file also has gene names, rather than numbers, as the row names. I don't understand why their file doesn't trigger the error message while mine does.

This is what head(theirData) looks like: (dim = 6)

            dmel_unf1 dmel_unf2 dmel_unf3 dmel_inf1 dmel_inf2 dmel_inf3
FBgn0000003       209       164       143       162        80       151
FBgn0000008       572       467       580       509       435       297
FBgn0000014       387       276       383       289       237       141
FBgn0000015       158       123       157       117       110        70
FBgn0000017      2351      2126      2885      2896      2041      1467
FBgn0000018       368       296       314       318       272       169

This is what head(myData) looks like: (dim = 4)

         CHIP_DHT1 CHIP_DHT2 CHIP_VEH1 CHIP_VEH2
A1BG            16        10        23        17
A1BG-AS1        26        15        20        22
A1CF            62        61        83        56
A2M             46        33        46        46
A2M-AS1         16        19        18        16
A2ML1           51        50        56        61

Any help is appreciated. Thanks!

@nephantes
Member

In the first column of the header, you need to write genename or something similar. The number of columns in the header and in the table has to be the same. In your case it isn't.
Another thing: make sure the delimiters are consistent. If you chose TSV, it has to be tab-separated everywhere. If there is a space somewhere, it won't work. If you attach a portion of your data, I can help you find the problem in your table.
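
One way to check both points (header width and consistent delimiters) before loading the table is to count the tab-separated fields on each line. This is only a rough sketch in base R: the toy file below is made up from the first rows posted above, and the space in the second data row is a deliberately planted mistake, not something known to be in the real file.

```r
# Toy tab-separated file (hypothetical content, written to a temp file)
lines <- c(
  "CHIP_DHT1\tCHIP_DHT2\tCHIP_VEH1\tCHIP_VEH2",  # header: 4 fields, no label for the gene column
  "A1BG\t16\t10\t23\t17",
  "A1BG-AS1\t26 15\t20\t22",                     # planted error: a space instead of a tab
  "A1CF\t62\t61\t83\t56"
)
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

# Number of tab-separated fields on every line, header included
n_fields <- count.fields(path, sep = "\t")
print(n_fields)  # 4 5 4 5

# Header width versus the widest data row
header_n <- n_fields[1]
body_n   <- n_fields[-1]
cat("header fields:", header_n, " widest data row:", max(body_n), "\n")

# Data rows that are narrower than the widest row usually contain a wrong delimiter
bad_rows <- which(body_n != max(body_n))
print(bad_rows)  # 2
```

If the header count is one lower than the data rows, adding a label such as genename to the header makes the two match.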

@ifeelsostupid
Author

Thank you.
My data file is a mess because I combined it from four smaller files, each containing the counts of only one sample. It's also a txt file, not csv. I think I'll just look into it and see if that's where the problem is...

But clarification on my original problem: my data had 4 columns, not 5. The gene names are the row.names, not a column, and when you run dim() on the file it did say 4 columns. Exactly like the example file, where they had 6 columns plus gene names as row.names. So their row names were not numerical either. I just did not understand why that file would be allowed to run by DESeq.
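
A possible explanation for why the example file loads cleanly, assuming it was read with read.table() (the thread does not say how it was loaded): when the header has one field fewer than the data rows, read.table() uses the first column as row names, so only the counts end up in the matrix. A small sketch with two rows and two columns taken from the example table above:

```r
# Header has 2 fields, data rows have 3: the gene IDs have no header label
lines <- c(
  "dmel_unf1\tdmel_unf2",
  "FBgn0000003\t209\t164",
  "FBgn0000008\t572\t467"
)
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

df <- read.table(path, header = TRUE, sep = "\t")

print(rownames(df))        # the gene IDs became row names
print(dim(df))             # 2 rows, 2 count columns
print(mode(as.matrix(df))) # "numeric"
```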

That said, I could totally make the gene names my first column. It just causes later complications because it seems that DESeq will mistake it for a column of samples. If there's a way to fix this problem then all these previous problems would not matter.
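
For what it's worth, a labeled gene column does not have to end up among the samples. Below is a minimal sketch in base R, assuming the table is read with read.table(); the header label "gene" and the toy rows are placeholders. Passing row.names = 1 moves the first column into the row names, so the matrix handed to DESeqDataSetFromMatrix would contain counts only.

```r
# Toy file with a labeled gene column (the label "gene" is a placeholder)
lines <- c(
  "gene\tCHIP_DHT1\tCHIP_DHT2\tCHIP_VEH1\tCHIP_VEH2",
  "A1BG\t16\t10\t23\t17",
  "A1BG-AS1\t26\t15\t20\t22",
  "A1CF\t62\t61\t83\t56"
)
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

# Gene names kept as an ordinary column: the whole matrix becomes character
bad <- as.matrix(read.table(path, header = TRUE, sep = "\t"))
print(mode(bad))   # "character"

# Gene names moved into the row names: only the four count columns remain
good <- as.matrix(read.table(path, header = TRUE, sep = "\t", row.names = 1))
print(mode(good))  # "numeric"
print(colnames(good))
```

The first case reproduces the "mode: character" situation from the error message; the second leaves four numeric sample columns.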

I successfully ran one trial before, not sure why it stopped working... Here's the output from that run if it helps.

class: DESeqDataSet
dim: 26586 4
metadata(1): version
assays(1): counts
rownames(26586): A1BG A1BG-AS1 ... ZZEF1 ZZZ3
rowData names(0):
colnames(4): CHIP_DHT1 CHIP_DHT2 CHIP_VEH1 CHIP_VEH2
colData names(1): condition

@ifeelsostupid
Author

Update: Actually, I was building a txt file from my HTSeq output so that I could use the DESeqDataSetFromMatrix function.
I did not know that there was a DESeqDataSetFromHTSeqCount function.
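
For anyone landing here with the same setup, here is a rough sketch of how DESeqDataSetFromHTSeqCount can be called on per-sample HTSeq files directly. Everything in it is illustrative: the file names, the temp directory, the toy counts, and the DHT/VEH condition labels (guessed from the sample names) are assumptions, not the actual data from this thread. The sample table needs the sample name in the first column, the file name in the second, and the sample information after that.

```r
# Write tiny HTSeq-count style files (gene<TAB>count, no header) to a temp directory
count_dir <- file.path(tempdir(), "htseq_counts")
dir.create(count_dir, showWarnings = FALSE)

samples    <- c("CHIP_DHT1", "CHIP_DHT2", "CHIP_VEH1", "CHIP_VEH2")
genes      <- c("A1BG", "A1BG-AS1", "A1CF")
toy_counts <- list(c(16, 26, 62), c(10, 15, 61), c(23, 20, 83), c(17, 22, 56))

for (i in seq_along(samples)) {
  write.table(data.frame(genes, toy_counts[[i]]),
              file = file.path(count_dir, paste0(samples[i], ".txt")),
              sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
}

# Sample table: sample name, file name, then the design variable(s)
sampleTable <- data.frame(
  sampleName = samples,
  fileName   = paste0(samples, ".txt"),
  condition  = factor(c("DHT", "DHT", "VEH", "VEH"))  # assumed grouping
)

# Only runs if DESeq2 is installed
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                            directory   = count_dir,
                                            design      = ~ condition)
  print(dds)
}
```

This skips the manual step of merging the four files into one text table, which is where the character-mode matrix most likely came from.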

So it's all good now...
I'll close this thread in a few days in case someone has something to add to the FromMatrix problem.

@nephantes
Member

Good to know.
