Genome preparation fails in multi-thread mode for extremely large genomes #251

FelixKrueger · 2019-04-14T08:30:31Z

When running the Bowtie 2 indexer on a very large genome, e.g. the Axolotl genome (~32GB), the auto-detection of small/large genome sequences doesn't seem to work as expected:

...
Building a SMALL index
Reading reference sizes
Building a SMALL index
Reading reference sizes
Error: Reference sequence has more than 2^32-1 characters!  Please build a large index by passing the --large-index option to bowtie2-build
  Time reading reference sizes: 00:00:57
Total time for call to driver() for forward index: 00:01:00
Error: Encountered internal Bowtie 2 exception (#1)
Command: bowtie2-build --wrapper basic-0 -f --threads 2 genome_mfa.GA_conversion.fa BS_GA 
Deleting "BS_GA.3.bt2" file written during aborted indexing attempt.
Deleting "BS_GA.4.bt2" file written during aborted indexing attempt.
Error: Reference sequence has more than 2^32-1 characters!  Please build a large index by passing the --large-index option to bowtie2-build
  Time reading reference sizes: 00:00:57
Total time for call to driver() for forward index: 00:01:00
Error: Encountered internal Bowtie 2 exception (#1)
Command: bowtie2-build --wrapper basic-0 -f --threads 2 genome_mfa.CT_conversion.fa BS_CT 
Deleting "BS_CT.3.bt2" file written during aborted indexing attempt.
Deleting "BS_CT.4.bt2" file written during aborted indexing attempt.
Parent process: Failed to build index

It appears that we need to allow passing the indexing option --large-index on to bowtie2-build to make this work.

PS: It works in the default (single-core) indexing mode, i.e. it detects the genome size and automatically generates a large index. That run took roughly 2d 6h of wall-clock time and ~150GB of RAM.
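For anyone who wants to check up front whether a converted genome will hit this limit, here is a minimal sketch in Python. It only compares a character count against the 2^32-1 figure quoted in the error message; the function names are made up for illustration, and counting every non-header character is an assumption about how the indexer measures reference length, so treat the result as an estimate rather than what bowtie2-build does internally.

```python
import io
from typing import Iterable

# Limit quoted in the bowtie2-build error message above.
SMALL_INDEX_MAX_CHARS = 2**32 - 1


def count_reference_chars(fasta_lines: Iterable[str]) -> int:
    """Count sequence characters in FASTA text, skipping '>' header lines.

    Assumption: every non-header character counts towards the limit.
    """
    total = 0
    for line in fasta_lines:
        if line.startswith(">"):
            continue
        total += len(line.strip())
    return total


def needs_large_index(n_chars: int) -> bool:
    """Return True if the reference exceeds the small-index limit."""
    return n_chars > SMALL_INDEX_MAX_CHARS


if __name__ == "__main__":
    # Tiny in-memory FASTA standing in for a real file such as
    # genome_mfa.CT_conversion.fa (open that file and pass the handle instead).
    demo = io.StringIO(">chr1\nACGT\nNN\n>chr2\nGG\n")
    n = count_reference_chars(demo)
    print(n, needs_large_index(n))
```

A ~32GB genome is several times over the limit of roughly 4.29 billion characters, so it would need a large index under this check.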

FelixKrueger · 2019-04-14T09:53:15Z

It appears that HISAT2 is also failing, with the very same Bowtie 2 message... 📦

...Reading reference sizes
Reading reference sizes
Error: Reference sequence has more than 2^32-1 characters!  Please build a large index by passing the --large-index option to bowtie2-build
  Time reading reference sizes: 00:00:53
Total time for call to driver() for forward index: 00:00:57
Error: Encountered internal HISAT2 exception (#1)
Command: hisat2-build --wrapper basic-0 -f --threads 2 genome_mfa.CT_conversion.fa BS_CT 
Deleting "BS_CT.1.ht2" file written during aborted indexing attempt.
Deleting "BS_CT.2.ht2" file written during aborted indexing attempt.
Deleting "BS_CT.3.ht2" file written during aborted indexing attempt.
Deleting "BS_CT.4.ht2" file written during aborted indexing attempt.
Parent process: Failed to build index

FelixKrueger · 2019-04-14T13:54:04Z

I have now added a new option --large-index to bismark_genome_preparation, which should hopefully fix the auto-detection problem. Tests are currently under way, but will probably take a day or two to complete. We should consider reporting this to the Bowtie 2 and HISAT2 developers. Added here: 5de68d5.

FelixKrueger · 2019-04-16T07:53:21Z

I can confirm that indexing with both Bowtie 2 and HISAT2 now works with multi-core support when --large-index is specified explicitly.

This also brings the time of indexing the Axolotl genome down to ~18-20 hours when using --parallel 4 (8 cores total, and ~183GB RAM usage).
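As a rough illustration of the invocation described above, the following Python sketch assembles the command line. The --large-index and --parallel options are the ones named in this thread; the helper name and the genome folder path are hypothetical, and --hisat2 is assumed from the Bismark documentation rather than from this issue.

```python
import shlex
from typing import List, Optional


def genome_prep_command(
    genome_dir: str,
    large_index: bool = True,
    parallel: Optional[int] = None,
    hisat2: bool = False,
) -> List[str]:
    """Build an argument list for bismark_genome_preparation (hypothetical helper)."""
    cmd = ["bismark_genome_preparation"]
    if hisat2:
        # Assumption: --hisat2 selects HISAT2 instead of the default Bowtie 2.
        cmd.append("--hisat2")
    if large_index:
        # Force a large index instead of relying on auto-detection.
        cmd.append("--large-index")
    if parallel is not None:
        cmd += ["--parallel", str(parallel)]
    cmd.append(genome_dir)
    return cmd


if __name__ == "__main__":
    # The path is a placeholder; pass the list to subprocess.run to execute it.
    print(shlex.join(genome_prep_command("/path/to/axolotl_genome/", parallel=4)))
```

Building the command as a list rather than a single string keeps paths with spaces intact if it is later handed to subprocess.run.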

FelixKrueger added the bug label Apr 14, 2019

FelixKrueger self-assigned this Apr 14, 2019

FelixKrueger closed this as completed Apr 16, 2019

FelixKrueger mentioned this issue May 16, 2019

Bismark mapping problem #169

Closed

FelixKrueger mentioned this issue Sep 1, 2020

Bismark_genome_preparation fails to make index for large genome (22GB) #368

Closed
