Salmon indexing on UCSC genome.fa files fail for mm9 #49
Comments
It appears that you're trying to index the entire mm9 genome using salmon. Both salmon and rapmap are designed to work with a smaller sequence space, such as what you would find in a transcriptome. Your log file shows that salmon processes 615,000,000 bases from the genome and then aborts. Depending on how many transcripts are in your feature file, a human transcriptome might be 5-10X smaller.
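A quick way to check whether a FASTA file looks like a genome or a transcriptome before indexing is to count its records and bases. The following is only a minimal sketch: `summarize_fasta` and the in-memory demo records are hypothetical names made up for illustration, not part of salmon or rapmap.

```python
import io


def summarize_fasta(handle):
    """Return (record_count, total_bases, longest_record) for a FASTA stream."""
    records = 0   # number of '>' header lines seen
    total = 0     # total sequence characters across all records
    longest = 0   # length of the longest single record
    current = 0   # length of the record currently being read
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records += 1
            longest = max(longest, current)
            current = 0
        else:
            current += len(line)
            total += len(line)
    longest = max(longest, current)  # account for the final record
    return records, total, longest


# Hypothetical two-record input, used only to show the call.
demo = io.StringIO(">chr_demo\nACGTACGT\nACGT\n>contig_demo\nGGCC\n")
print(summarize_fasta(demo))  # (2, 16, 12)
```

For a real file you would pass `open("genome.fa")` instead of the demo stream. A genome FASTA typically shows a few very long records (chromosomes and contigs), while a transcriptome shows many short ones.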
Hi @vd4mmind, Indeed, @mdshw5 is spot on. The issue you're seeing is a result of the hash table doubling failing to allocate sufficient memory when attempting to build a hash table for all 31-mers in the mouse genome. In addition to the memory requirements of building a quasi-index on the genome (which we're actually working to mitigate b/c we think it could be useful in another context), this won't be particularly useful for quantification. Salmon treats each entry in the multifasta file as a distinct transcriptional target. Thus, here, even if the index did build successfully, you'd be quantifying the abundance of different chromosomes & contigs, rather than the transcripts. What you should do (as pointed out by @mdshw5 above), is to grab a file that contains the mouse transcripts (or take your mm9 genome and an appropriate gtf file and use a tool like
Ah yes, this is actually true; I realise it now. In fact, I have always run salmon on the transcripts file for human rather than on the genome. Our lab does not have a transcripts FASTA file for mm9, so I will create one and then build the index on it. My bad. Thanks for the suggestions. I will do what is needed, run the index, and report back here once it is done. If it's not a problem, I would like to keep this ticket open until then.
I have a question, if you would like to answer it: where can I get the transcripts.gtf file for mm9? Is there a link from which I can download it, or do I have to create it on my own? I am a bit confused, and different forums are adding to my confusion, so any suggestion would help.
I have done the required work; sorry for bothering everyone. I downloaded the refGene.gtf file for mm9 from UCSC, which has the transcript information, and then used
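For readers who want to see what the genome-plus-GTF extraction step discussed in this thread amounts to, here is a rough pure-Python sketch. It is not the tool the commenters used (that name is cut off above) and is not part of salmon; `read_fasta`, `transcripts_from_gtf`, and the demo data are hypothetical names for illustration. It assumes standard tab-separated GTF with 1-based inclusive coordinates, a `transcript_id` attribute on each exon line, and each transcript ID occurring at a single locus.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Parse FASTA lines into {name: sequence}; name is the first header word."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def transcripts_from_gtf(genome, gtf_lines):
    """Build {transcript_id: spliced sequence} from GTF 'exon' records."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        tid = None
        for attr in fields[8].split(";"):
            parts = attr.strip().split(None, 1)
            if len(parts) == 2 and parts[0] == "transcript_id":
                tid = parts[1].strip('"')
        if tid is not None:
            exons[tid].append((chrom, start, end, strand))

    result = {}
    for tid, parts in exons.items():
        parts.sort(key=lambda e: e[1])  # order exons by genomic start
        chrom, strand = parts[0][0], parts[0][3]
        # GTF is 1-based inclusive, Python slices are 0-based half-open.
        seq = "".join(genome[chrom][s - 1:e] for _, s, e, _ in parts)
        if strand == "-":
            seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
        result[tid] = seq
    return result


# Hypothetical toy data, only to show the calls.
demo_genome = read_fasta([">chrDemo", "AACCGGTTAACC"])
demo_gtf = [
    'chrDemo\tsrc\texon\t1\t4\t.\t+\t.\tgene_id "g1"; transcript_id "tx_plus";',
    'chrDemo\tsrc\texon\t9\t12\t.\t+\t.\tgene_id "g1"; transcript_id "tx_plus";',
    'chrDemo\tsrc\texon\t5\t8\t.\t-\t.\tgene_id "g2"; transcript_id "tx_minus";',
]
demo_transcripts = transcripts_from_gtf(demo_genome, demo_gtf)
print(demo_transcripts)  # {'tx_plus': 'AACCAACC', 'tx_minus': 'AACC'}
```

Writing each entry out as its own FASTA record gives a file in which every record is one transcript, which is the kind of input the maintainers describe above. The sketch holds the whole genome in memory and skips the edge cases a dedicated tool handles, so treat it as an illustration only.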
I intend to run salmon on a set of RNA-Seq data that has been sitting in our lab for a long time. The data are for mm9, and since there are >50 samples I intended to process them with Salmon version 0.6.0. I have used earlier versions of salmon on hg19 data from both UCSC and NCBI (spiked-in and non-spiked-in data) without alignment mode and ran them successfully. Recently we downloaded and compiled the latest version, and I am trying to run the indexing on the UCSC mm9 genome.fa file so that I can use the quasi-mapping index to run quant on my samples downstream, getting read counts as well as TPM much faster than with any other tool. Can you tell me what the problem is?
Command line used:
salmon index -t /path_to/genome.fa -i salmonquasi-indexes --type quasi -k 31
Here is the error message from RapMap:
I also checked the log file, and it shows nothing except:
output:
So can you give me a workaround or inputs to solve this issue? Thanks