massive memory use by hisat2-build when attempting to index the rat genome #123
Comments
Hi,
What settings are recommended?
Kapeel, you can download an index for the GRCh38 release of the human reference sequence from the authors' website (look on the right-hand side as you scroll down). This will save you the frustration of trying to build it yourself! Trevor
I am facing the same issue. I downloaded the "R. norvegicus, UCSC rn6, genome index" from the hisat2 website, but it does not include splice sites and exons. I do not think my results are reliable, because for some genes reads map to the genome but the FPKM is 0.
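If a prebuilt index lacks splice sites and exons, HISAT2 can still be given known splice sites at alignment time through its `--known-splicesite-infile` option. The sketch below only assembles the command line. The file names are hypothetical, and the splice-site file is assumed to have been produced from a GTF with `hisat2_extract_splice_sites.py` (the script name may differ between HISAT2 versions). Chromosome names in the annotation would also need to match those in the index (UCSC-style `chr1` versus Ensembl-style `1`).

```python
def hisat2_align_command(index_prefix, reads_1, reads_2, splice_sites, out_sam, threads=4):
    """Assemble a paired-end hisat2 command that supplies known splice sites.

    All paths are placeholders chosen by the caller; nothing is executed here.
    """
    return [
        "hisat2",
        "-p", str(threads),                         # alignment threads
        "-x", index_prefix,                         # prefix of the downloaded index
        "--known-splicesite-infile", splice_sites,  # splice sites extracted from a GTF
        "-1", reads_1,
        "-2", reads_2,
        "-S", out_sam,
    ]
```

For example, `hisat2_align_command("rn6/genome", "sample_R1.fastq.gz", "sample_R2.fastq.gz", "rn6.ss", "sample.sam")` returns an argument list that could be passed to `subprocess.run`.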
Also have this issue - can't build a genome_trans index (using version 2.1.0) for mm10 with Ensembl 91 annotations due to lack of memory on a node with 1TB of RAM.
I am trying to index the rat genome (Ensembl release 89) using the "make_rnor6_tran.sh" script in the hisat2 installation folder (this script includes known transcript structure in the index) on a workstation with 512 GB of RAM and 28 cores running CentOS 7. The program is consistently killed by the kernel after exhausting system memory, even though the workstation has over twice the amount you recommend for indexing the human genome with known SNPs, splice sites, and exons. The rat genome is comparable in size to the human genome, but it has far fewer known SNPs and transcripts, so I am at a loss as to why this keeps happening.
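For reference, a transcript-aware build of this kind generally comes down to three commands: extract splice sites from the GTF, extract exons, and pass both files to `hisat2-build` via `--ss` and `--exon`. The following is a minimal sketch under that assumption, not a copy of `make_rnor6_tran.sh`. The file names are hypothetical, and the extraction script names are the ones shipped with recent HISAT2 releases (older releases may name them differently).

```python
import subprocess


def tran_index_steps(genome_fa, gtf, index_prefix, threads=4):
    """Return (argv, stdout_path) pairs for a splice-site- and exon-aware build."""
    ss_file = index_prefix + ".ss"
    exon_file = index_prefix + ".exon"
    return [
        # The extraction scripts write to stdout, so each gets an output file.
        (["hisat2_extract_splice_sites.py", gtf], ss_file),
        (["hisat2_extract_exons.py", gtf], exon_file),
        # The build itself reads both files and writes <index_prefix>.*.ht2
        (["hisat2-build", "-p", str(threads),
          "--ss", ss_file, "--exon", exon_file,
          genome_fa, index_prefix], None),
    ]


def run_steps(steps):
    """Run each step in order, redirecting stdout to a file where one is given."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, stdout=out, check=True)
```

With HISAT2 on the `PATH`, `run_steps(tran_index_steps("genome.fa", "genes.gtf", "genome_tran"))` would run the three commands in sequence and stop at the first failure.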
One possibility is a bug in how hisat2-build assesses available system memory. On systems with 256 GB of RAM or less, it prints an "out of memory, trying more friendly settings" message and continues to search the parameter space (albeit ultimately unsuccessfully), whereas on systems with more than 256 GB of RAM it exhausts the memory without a second thought and is killed by the kernel.
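One way to probe this behaviour is to run the build under a hard address-space limit, so that an oversized allocation fails inside the process instead of triggering the kernel's OOM killer. The sketch below does this with Python's `resource.setrlimit` on Linux. The cap value and the command line are hypothetical, and whether hisat2-build then falls back to its more memory-friendly settings is an assumption that this thread does not confirm.

```python
import resource
import subprocess


def run_with_memory_cap(argv, max_bytes):
    """Run argv with its virtual address space capped at max_bytes (Unix only).

    Returns the child's exit code.
    """
    def set_cap():
        # Executed in the child just before exec: allocations that would push
        # the address space past the cap fail instead of succeeding lazily.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    return subprocess.run(argv, preexec_fn=set_cap).returncode
```

A call such as `run_with_memory_cap(["hisat2-build", "-p", "4", "--ss", "genome.ss", "--exon", "genome.exon", "genome.fa", "genome_tran"], 400 * 1024**3)` would cap the build at a hypothetical 400 GiB on a 512 GB machine.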
Are you able to provide the relevant settings you used when indexing the human genome? Or, since indexing is a relatively quick process, could you index Ensembl release 89 of the rat genome, including known SNPs and transcripts in the index, and post the "rnor6_snp_tran" index on your group's hisat2 web page?
Thanks in advance,
Trevor