massive memory use by hisat2-build when attempting to index the rat genome #123

drtjpemberton · 2017-06-22T22:23:43Z

I am trying to index the rat genome (Ensembl release 89) using the "make_rnor6_tran.sh" script in the hisat2 installation folder (this script includes known transcript structure in the index) on a workstation with 512 GB of RAM and 28 cores running CentOS 7. The program is consistently killed by the kernel after exhausting system memory, even though the workstation has more than twice the amount you recommend for indexing the human genome with known SNPs, splice sites, and exons. The rat genome is comparable in size to the human genome, but the number of SNPs and transcripts is much lower, so I am at a loss as to why this keeps happening.
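For readers who have not opened the script, a transcript-aware build of this kind can be sketched roughly as below. This is a minimal sketch, not the contents of "make_rnor6_tran.sh": the helper name `transcript_index_commands` and the file names are hypothetical, and it assumes the usual sequence of extracting splice sites and exons from a GTF file with the scripts shipped with hisat2, then passing them to hisat2-build via `--ss` and `--exon`.

```python
import shlex


def transcript_index_commands(genome_fa, gtf, base, threads=4):
    """Return shell command lines for a splice-site/exon-aware index build.

    Hypothetical helper: it only builds the command strings, it does not run them.
    """
    q = shlex.quote
    ss_file = f"{base}.ss"      # splice sites extracted from the annotation
    exon_file = f"{base}.exon"  # exons extracted from the annotation
    return [
        f"hisat2_extract_splice_sites.py {q(gtf)} > {q(ss_file)}",
        f"hisat2_extract_exons.py {q(gtf)} > {q(exon_file)}",
        f"hisat2-build -p {int(threads)} --ss {q(ss_file)} --exon {q(exon_file)} "
        f"{q(genome_fa)} {q(base)}",
    ]
```

Printing the result of `transcript_index_commands("genome.fa", "genes.gtf", "rnor6_tran")` gives three lines that can be reviewed before being run by hand; the last one is the step whose memory use is at issue here.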

One possible explanation is a bug in how hisat2-build assesses available system memory. On systems with 256 GB of RAM or less, it prints an out-of-memory, "trying more friendly settings" message and continues to search the parameter space (albeit ultimately unsuccessfully), while on systems with more than 256 GB of RAM it exhausts the memory without a second thought and is killed by the kernel.
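One way to test this hypothesis on a large machine is to cap the address space of the build process, so that an oversized allocation fails inside the program instead of triggering the kernel's OOM killer. The sketch below is an assumption-laden illustration, not something taken from hisat2: `run_with_memory_cap` is a hypothetical helper, it relies on POSIX `RLIMIT_AS` (so Linux, as on the CentOS workstation above), and whether hisat2-build then falls back to friendlier settings is exactly what would need to be observed.

```python
import resource
import subprocess
import sys


def run_with_memory_cap(cmd, max_bytes):
    """Run cmd with its virtual address space capped; return its exit code.

    Hypothetical helper (POSIX only). The cap is applied in the child
    process just before the command is executed.
    """
    def limit_address_space():
        soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        # Never ask for more than the existing hard limit allows.
        cap = max_bytes if hard == resource.RLIM_INFINITY else min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (cap, hard))

    return subprocess.run(cmd, preexec_fn=limit_address_space).returncode
```

For example, `run_with_memory_cap(["hisat2-build", "-p", "28", "genome.fa", "rnor6"], 200 * 1024**3)` would run a plain build under a 200 GiB cap; the file names and the cap are placeholders.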

Are you able to provide the relevant settings you used when indexing the human genome? Or, since indexing is a relatively quick process, could you index Ensembl release 89 of the rat genome, including known SNPs and transcripts, and post the "rnor6_snp_tran" index on your group's hisat2 web page?

Thanks in advance,

Trevor

Kapeel · 2017-12-12T15:23:21Z

Hi,
I am facing a similar issue indexing the human genome. Below is the error I get:

hisat2-indexing Homo_sapiens.GRCh38.dna.toplevel
Settings:
  Output files: "Homo_sapiens.GRCh38.dna.toplevel.*.ht2l"
  Line rate: 7 (line is 128 bytes)
  Lines per side: 1 (side is 128 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Local offset rate: 3 (one in 8)
  Local fTable chars: 6
  Local sequence length: 57344
  Local sequence overlap between two consecutive indexes: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  Homo_sapiens.GRCh38.dna.toplevel.fa
Reading reference sizes
  Time reading reference sizes: 00:02:49
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:06:15
  Time to read SNPs and splice sites: 00:00:00
Using parameters --bmax 576132141 --dcv 1024
  Doing ahead-of-time memory usage test
Executing: hisat2 -I 0 --min-intronlen 20 --max-intronlen 500000 --dta -X 500 --dta-cufflinks -x Homo_sapiens.GRCh38.dna.toplevel -U SRR849504.fastq -p 4 | samtools view -bS - > SRR849504.fastq.bam

Error reading _rstarts[] array: 36448, 42288
Error: Encountered internal HISAT2 exception (#1)
Command: /hisat2/hisat2-align-l --wrapper basic-0 -I 0 --min-intronlen 20 --max-intronlen 500000 --dta -X 500 --dta-cufflinks -x Homo_sapiens.GRCh38.dna.toplevel -p 4 -U SRR849504.fastq 
(ERR): hisat2-align exited with value 1
[samopen] no @SQ lines in the header.
[sam_read1] missing header? Abort!
[bam_header_read] EOF marker is absent. The input is probably truncated.

What settings are recommended?
Thanks
Kapeel
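In the log above, the build output stops at the ahead-of-time memory test and the alignment step starts straight away, so one plausible reading (not confirmed in this thread) is that the aligner was pointed at an incomplete index. A quick check for that is sketched below. The helper `missing_index_files` is hypothetical, and the default of eight numbered files per index is an assumption based on the layout of the prebuilt indexes; the `.ht2l` suffix for large indexes matches the "Output files" line in the log.

```python
from pathlib import Path


def missing_index_files(base, large=False, n_files=8):
    """List index files that are absent or empty for the given base name.

    Hypothetical helper; n_files=8 is an assumption about the index layout.
    """
    suffix = "ht2l" if large else "ht2"
    problems = []
    for i in range(1, n_files + 1):
        path = Path(f"{base}.{i}.{suffix}")
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(path.name)
    return problems
```

Calling `missing_index_files("Homo_sapiens.GRCh38.dna.toplevel", large=True)` before aligning returns an empty list only if every expected file exists and is non-empty; it cannot tell whether a non-empty file was cut short.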

drtjpemberton · 2017-12-13T15:13:05Z

Kapeel,

You can download an index for the GRCh38 release of the human reference sequence from the authors' website (look on the right-hand side as you scroll down). This will save you the frustration of trying to do this yourself!

Trevor

Lee211 · 2018-01-30T13:47:09Z

I am facing the same issue. I have downloaded the "R. norvegicus, UCSC rn6, genome index" from the hisat2 website, but it does not include splice sites and exons. I don't think my results are reliable, because some genes have reads mapped to the genome but an FPKM of 0.

snsansom · 2018-02-15T14:02:15Z

I also have this issue: I can't build a genome_trans index (using version 2.1.0) for mm10 with Ensembl 91 annotations due to lack of memory on a node with 1 TB of RAM.

suryasaha mentioned this issue Mar 19, 2018

High RAM usage during indexing and function of exon file #162

Open

outpaddling mentioned this issue Jun 27, 2023

hisat2 hangs aligning axolotl reads #413

Closed
